WO2022127333A1 - Training method for image segmentation model, image segmentation method, apparatus, and device - Google Patents

Training method for image segmentation model, image segmentation method, apparatus, and device

Info

Publication number
WO2022127333A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
scale
sample
activation map
classification
Prior art date
Application number
PCT/CN2021/124337
Other languages
English (en)
French (fr)
Inventor
卢东焕
马锴
郑冶枫
Original Assignee
腾讯科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Priority to EP21905262.8A (published as EP4220555A4)
Publication of WO2022127333A1
Priority to US17/955,726 (published as US20230021551A1)

Classifications

    • G06T7/11 Region-based segmentation
    • G06T7/149 Segmentation; edge detection involving deformable models, e.g. active contour models
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06V10/40 Extraction of image or video features
    • G06V10/764 Image or video recognition or understanding using classification, e.g. of video objects
    • G06V10/774 Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V10/82 Image or video recognition or understanding using neural networks
    • G06F18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30201 Face

Definitions

  • the present application relates to the field of image segmentation, and in particular, to an image segmentation model training method, image segmentation method, apparatus, and equipment.
  • Image segmentation refers to dividing an image into several disjoint regions according to features such as grayscale, color, spatial texture, and geometric shape, so that these features show consistency or similarity within the same region but differ markedly between regions. Simply put, it separates the foreground target from the background in an image.
  • a threshold method is used for image segmentation.
  • The basic idea of the thresholding method is to calculate one or more grayscale thresholds based on the grayscale features of the image, compare the grayscale value of each pixel in the image with the thresholds, and finally classify the pixels into appropriate categories according to the comparison results. Therefore, the most critical step of this method is to solve for the optimal grayscale threshold according to a certain criterion function.
  • the method in the related art requires the target to have obvious edge or grayscale differences, and can only determine the target based on shallow features such as image pixel values, with low precision.
  • A training method for an image segmentation model is provided, where the image segmentation model includes an encoder and a decoder, and the method includes:
  • invoking the encoder to perform feature extraction on a sample image and a scale image to obtain the sample image feature of the sample image and the scale image feature of the scale image, where the scale image includes at least one of an image obtained by enlarging the sample image and an image obtained by reducing the sample image;
  • calculating a class activation map based on the sample image feature to obtain a sample class activation map of the sample image, and calculating a class activation map based on the scale image feature to obtain a scale class activation map of the scale image; the class activation map is used to represent the contribution value of each pixel in the image to the classification result of the image;
  • invoking the decoder to decode the sample image feature to obtain the sample segmentation result of the sample image, and invoking the decoder to decode the scale image feature to obtain the scale segmentation result of the scale image;
  • the sample segmentation result includes the classification probability value of each pixel in the sample image;
  • calculating a class activation map loss based on the sample class activation map, the sample segmentation result, the scale class activation map and the scale segmentation result, and calculating a scale loss based on the sample segmentation result and the scale segmentation result; and training the decoder based on the class activation map loss and the scale loss; the class activation map loss is used to train the decoder to make the sample segmentation result close to the sample class activation map and to make the scale segmentation result close to the scale class activation map; the scale loss is used to train the decoder to make the sample segmentation result close to the scale segmentation result.
  • A training device for an image segmentation model is provided, where the image segmentation model includes an encoder and a decoder, and the device includes:
  • an encoding module configured to call the encoder to perform feature extraction on the sample image and the scale image, and obtain the sample image feature of the sample image and the scale image feature of the scale image, where the scale image includes at least one of an image obtained by enlarging the sample image and an image obtained by reducing the sample image;
  • a class activation map module configured to calculate a class activation map based on the sample image features to obtain a sample class activation map of the sample image, and calculate a class activation map based on the scale image features to obtain a scale class activation map of the scale image;
  • the class activation map is used to represent the contribution value of each pixel in the image to the classification result of the image;
  • a decoding module configured to call the decoder to decode the sample image features to obtain the sample segmentation result of the sample image, and call the decoder to decode the scale image features to obtain the scale segmentation result of the scale image ;
  • the sample segmentation result includes the classification probability value of each pixel in the sample image;
  • a loss module configured to calculate a class activation map loss based on the sample class activation map, the sample segmentation result, the scale class activation map and the scale segmentation result, and to calculate a scale loss based on the sample segmentation result and the scale segmentation result;
  • a training module for training the decoder based on the class activation map loss and the scale loss;
  • the class activation map loss is used for training the decoder so that the sample segmentation result is close to the sample class activation map and the scale segmentation result is close to the scale class activation map;
  • the scale loss is used to train the decoder to bring the sample segmentation result close to the scale segmentation result.
  • An image segmentation method comprising:
  • acquiring an input image; invoking an encoder to perform feature extraction on the input image to obtain input image features of the input image; and invoking a decoder to decode the input image features to obtain an image segmentation result of the input image;
  • the decoder is trained according to a class activation map loss and a scale loss; the class activation map loss is used to train the decoder to output a segmentation result close to the class activation map; the class activation map is used to represent the contribution value of each pixel in the image to the classification result of the image; and the scale loss is used to train the decoder to output similar segmentation results for multiple images with the same image content and different scales.
  • An image segmentation device includes:
  • the acquisition module is used to acquire the input image
  • a feature extraction module used for invoking an encoder to perform feature extraction on an input image to obtain input image features of the input image
  • the image segmentation module is used to call the decoder to decode the input image features and obtain the image segmentation result of the input image.
  • the decoder is trained according to the class activation map loss and the scale loss.
  • the class activation map loss is used to train the decoder to output a segmentation result close to the class activation map; the class activation map is used to represent the contribution of each pixel in the image to the classification result of the image; and the scale loss is used to train the decoder to output similar segmentation results for multiple images with the same image content and different scales.
  • A computer device comprising a memory and one or more processors, the memory storing a computer program that, when executed by the one or more processors, causes the one or more processors to implement the image segmentation model training method or the image segmentation method described above.
  • A non-volatile computer-readable storage medium storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to implement the image segmentation model training method or the image segmentation method described above.
  • a computer program product or computer program comprising computer readable instructions stored in a computer readable storage medium.
  • The processor of the computer device reads the computer-readable instructions from the computer-readable storage medium and executes them, so that the computer device executes the image segmentation model training method or the image segmentation method provided in the above-mentioned optional implementations.
  • FIG. 1 is a block diagram of a computer device provided by an exemplary embodiment of the present application.
  • FIG. 2 is a method flowchart of a training method for an image segmentation model provided by another exemplary embodiment of the present application
  • FIG. 3 is a method flowchart of a training method for an image segmentation model provided by another exemplary embodiment of the present application.
  • Fig. 4 is a method flowchart of a training method of an image classification model provided by another exemplary embodiment of the present application.
  • FIG. 5 is a schematic diagram of a training method for an image classification model provided by another exemplary embodiment of the present application.
  • FIG. 6 is a schematic diagram of a training method for an image segmentation model provided by another exemplary embodiment of the present application.
  • FIG. 7 is a method flowchart of an image segmentation method provided by another exemplary embodiment of the present application.
  • FIG. 8 is a block diagram of an apparatus for training an image segmentation model provided by another exemplary embodiment of the present application.
  • FIG. 9 is a block diagram of an image segmentation apparatus provided by another exemplary embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a server provided by another exemplary embodiment of the present application.
  • FIG. 11 is a block diagram of a terminal provided by another exemplary embodiment of the present application.
  • FIG. 1 shows a schematic diagram of a computer device 101 provided by an exemplary embodiment of the present application, where the computer device 101 may be a terminal or a server.
  • the terminal may include at least one of a smart phone, a notebook computer, a desktop computer, a tablet computer, a smart speaker, a vehicle-mounted terminal, and an intelligent robot.
  • a client supporting the image segmentation function is installed on the terminal.
  • the client supporting the image segmentation function may be an image processing application or a client of a video processing application.
  • image segmentation functions are provided in image processing applications to intelligently identify face regions in images, so that image optimization can be performed automatically on face regions.
  • The target batch modification function is provided in the video processing application and is used to intelligently identify the target (people, plants, animals, objects, etc.) in each frame of the video and to perform uniform image processing on the target in each frame, such as adjusting color, brightness, and saturation.
  • an image segmentation model is stored on the terminal, and when the client needs to use the image segmentation function, the client can call the image segmentation model to complete the image segmentation. For example, when the user needs to perform image optimization on the target image, the client invokes the image segmentation model to perform image segmentation on the target image, obtains the target area in the target image, and automatically optimizes the target area.
  • the terminal and the server are connected to each other through a wired or wireless network.
  • the method provided in this application may be executed by a client on a terminal, or may be executed by a server. That is, the training of the image segmentation model can be done by the client or by the server.
  • In the application stage after training of the image segmentation model is completed, the client can call the locally stored image segmentation model to perform image segmentation; the client can also send an image segmentation request to the server, and the server invokes the image segmentation model to perform image segmentation; alternatively, when the server needs to perform image segmentation on a received image, it can call the image segmentation model to perform image segmentation.
  • the terminal includes a first memory and a first processor.
  • An image segmentation model is stored in the first memory; the above-mentioned image segmentation model is called and executed by the first processor to implement the training method of the image segmentation model provided by the present application.
  • the first memory may include but is not limited to the following: random access memory (Random Access Memory, RAM), read only memory (Read Only Memory, ROM), programmable read only memory (Programmable Read-Only Memory, PROM), Erasable Programmable Read-Only Memory (EPROM), and Electrically Erasable Programmable Read-Only Memory (EEPROM).
  • the first processor may be composed of one or more integrated circuit chips.
  • the first processor may be a general-purpose processor, such as a central processing unit (Central Processing Unit, CPU) or a network processor (Network Processor, NP).
  • the first processor may implement the training method of the image segmentation model provided by the present application by running a program or code.
  • the server includes a second memory and a second processor.
  • An image segmentation model is stored in the second memory, and the image segmentation model is called by the second processor to implement the training method of the image segmentation model provided by the present application.
  • the server receives and stores user data sent by the terminal, and marks the information objects based on the user data.
  • the second memory may include but not limited to the following: RAM, ROM, PROM, EPROM, and EEPROM.
  • the second processor may be a general-purpose processor, such as a CPU or NP.
  • the image segmentation model stored in the terminal or server includes an encoder 102 and a decoder 103 .
  • the computer device invokes the encoder 102 to perform feature extraction on the sample image X and the scale image R(X) to obtain the sample image feature of the sample image X and the scale image feature of the scale image R(X), and the scale image R(X) is an image obtained by up-sampling the sample image X; that is, the scale image R(X) is twice the size of the sample image X.
  • The computer device calculates class activation maps based on the sample image feature and the scale image feature to obtain a sample class activation map and a scale class activation map; the class activation map 104 is used to represent the contribution value of each pixel in the image to the classification result of the image;
  • The computer device invokes the decoder 103 to decode the sample image feature to obtain the sample segmentation result of the sample image, and invokes the decoder to decode the scale image feature to obtain the scale segmentation result of the scale image;
  • the sample segmentation result includes the classification probability value of each pixel in the sample image;
  • the loss L seg is calculated based on the sample class activation map, the sample segmentation result, the scale class activation map and the scale segmentation result.
  • the loss L_seg includes the class activation map loss and the scale loss; the class activation map loss is used to train the decoder to make the sample segmentation result close to the sample class activation map and the scale segmentation result close to the scale class activation map; the scale loss is used to train the decoder to make the sample segmentation result close to the scale segmentation result;
  • the decoder 103 is trained based on the class activation map loss and scale loss.
  • FIG. 2 shows a flowchart of a training method for an image segmentation model provided by an exemplary embodiment of the present application.
  • The method may be performed by a computer device, e.g., the computer device shown in FIG. 1.
  • the method includes the following steps.
  • Step 201: Call the encoder to perform feature extraction on the sample image and the scale image, and obtain the sample image feature of the sample image and the scale image feature of the scale image, where the scale image includes at least one of an image obtained by enlarging the sample image and an image obtained by reducing the sample image.
  • the image segmentation model includes an encoder and a decoder.
  • the image segmentation model is used to segment the input image to obtain the region where the classification target is located on the image.
  • the image segmentation model can be used to identify at least one classification target.
  • the image segmentation model can be used to identify the pixels where cats, dogs, and people are located in an image.
  • The image segmentation model can output, for each pixel of the input image, N probability values that the pixel belongs to the N classification targets, from which N classification probability maps of the N classification targets are obtained; the pixel value of each pixel on the i-th classification probability map is the probability value that the pixel belongs to the i-th classification target, where N is a positive integer and i is a positive integer not greater than N.
  • the classification target to which each pixel on the image belongs may be determined according to the N classification probability maps of the image.
  • For example, if the probability value of a pixel belonging to a cat is 0.1, the probability value of belonging to a dog is 0.2, and the probability value of belonging to a person is 0.7, then the classification target to which this pixel belongs is a person.
  • the output of the image segmentation model may be a classification target map of the image, and the value of each pixel on the classification target map represents the classification target to which the pixel belongs. Therefore, the location (pixel point) of each classification target can be seen on the classification target map.
  • the segmentation result output by the image segmentation model may be the above-mentioned N classification probability maps or the above-mentioned classification target map.
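  • As an illustrative sketch (not part of the original disclosure), the classification target map can be obtained from the N classification probability maps by taking, for each pixel, the classification target with the largest probability value; the tensor layout below is an assumption:

```python
import torch

def probability_maps_to_target_map(prob_maps: torch.Tensor) -> torch.Tensor:
    """prob_maps: [N, H, W] tensor, where prob_maps[i, k, j] is the probability that
    pixel (k, j) belongs to the i-th classification target. Returns an [H, W]
    classification target map holding, per pixel, the index of the winning target."""
    return prob_maps.argmax(dim=0)

# Example with 3 classification targets (cat, dog, person) on a 1x2 image:
prob_maps = torch.tensor([[[0.1, 0.8]],   # cat
                          [[0.2, 0.1]],   # dog
                          [[0.7, 0.1]]])  # person
print(probability_maps_to_target_map(prob_maps))  # tensor([[2, 0]]): person, cat
```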
  • Scaling an image does not change the classification target to which each pixel belongs; the method provided in this embodiment trains the image segmentation network based on this idea, so that the segmentation results output by the image segmentation network for the original image and the scaled image are similar.
  • the manner of enlarging the sample image may be up-sampling, and the manner of reducing the sample image may be down-sampling.
  • A set of training samples includes at least one of the following three combinations: the sample image and an enlarged scale image; the sample image and a reduced scale image; or the sample image, an enlarged scale image, and a reduced scale image.
  • The method steps of this embodiment take only one set of training samples as an example to describe the training method of the image segmentation model provided by this application; in practice, multiple sets of training samples are used for iterative training.
  • the sample image and the scale image are respectively input into the encoder, and feature extraction is performed to obtain the sample image feature of the sample image and the scale image feature of the scale image.
  • the encoder used in this embodiment is an encoder that has been trained and has a high feature extraction capability.
  • only the decoder of the image segmentation model is trained; the encoder is not trained.
  • the encoder can use any encoder from an image classification model that has been trained.
  • the image classification model is used to output the classification target to which the image belongs according to the input image.
  • the classification target identified by the image classification model is the same as the classification target identified by the image segmentation model, that is, the image classification model is also used to identify N classification targets.
  • the encoder is a convolutional neural network (Convolutional Neural Networks, CNN), and its specific network structure can be adjusted according to the size of the image.
  • Step 202: Calculate the class activation map based on the sample image features to obtain the sample class activation map of the sample image, and calculate the class activation map based on the scale image features to obtain the scale class activation map of the scale image; the class activation map is used to represent the contribution of each pixel in the image to the classification result of the image.
  • Since the encoder adopts the encoder of a trained image classification model, the image features extracted by the encoder already contain the feature information used to classify the image. Therefore, calculating the class activation map based on the image features output by the encoder reveals the pixels on which the image classification model relies when classifying the image.
  • the class activation map is used to train the decoder so that the image segmentation results output by the decoder are close to the class activation map.
  • Step 203 calling the decoder to decode the sample image features to obtain the sample segmentation result of the sample image, and calling the decoder to decode the scale image features to obtain the scale segmentation result of the scale image;
  • the sample segmentation result includes the classification probability value of each pixel in the sample image.
  • the decoder is configured to decode the image features output by the encoder to obtain the segmentation result.
  • the decoder consists of a multi-layer convolutional neural network.
  • the decoder adopts a network structure that is reciprocal to the encoder.
  • the encoder includes four convolution blocks, and each convolution block consists of two convolution layers, where the convolution kernel size is 3*3.
  • After each convolution block, the encoder is followed by a pooling layer with a kernel size of 2*2, which reduces the image features output by the convolution block to 1/2 of their original size.
  • The decoder can also include four convolution blocks, and each convolution block consists of two convolution layers with a convolution kernel size of 3*3.
  • After each convolution block, the decoder is followed by an upsampling layer that doubles the size of the image features. In this way, the image input to the encoder and the image output by the decoder have the same size.
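  • A minimal PyTorch sketch of this encoder/decoder structure follows; the channel widths, activation functions, and the final 1*1 output layer are assumptions not specified above:

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Two 3*3 convolution layers per block, as described above (padding keeps the spatial size).
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    )

class Encoder(nn.Module):
    def __init__(self, in_ch=3, widths=(64, 128, 256, 512)):
        super().__init__()
        layers, prev = [], in_ch
        for w in widths:
            # Each convolution block is followed by a 2*2 pooling layer that halves the feature map.
            layers += [conv_block(prev, w), nn.MaxPool2d(kernel_size=2)]
            prev = w
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    def __init__(self, num_targets=1, widths=(512, 256, 128, 64)):
        super().__init__()
        layers, prev = [], widths[0]
        for w in widths:
            # Each convolution block is followed by an upsampling layer that doubles the feature map.
            layers += [conv_block(prev, w),
                       nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)]
            prev = w
        layers += [nn.Conv2d(prev, num_targets, kernel_size=1), nn.Sigmoid()]
        self.net = nn.Sequential(*layers)

    def forward(self, feat):
        return self.net(feat)

# Four poolings and four upsamplings: the decoder output matches the encoder input size.
x = torch.randn(1, 3, 96, 96)
seg = Decoder()(Encoder()(x))
assert seg.shape[-2:] == x.shape[-2:]
```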
  • Step 204 calculate the class activation map loss based on the sample class activation map, the sample segmentation result, the scale class activation map and the scale segmentation result, and calculate the scale loss based on the sample segmentation result and the scale segmentation result;
  • the class activation map loss is used to train the decoder to make the sample segmentation result close to the sample class activation map and to make the scale segmentation result close to the scale class activation map;
  • the scale loss is used to train the decoder to make the sample segmentation result close to the scale segmentation result.
  • Based on the idea that the segmentation result should be close to the class activation map, the class activation map loss between the class activation map and the segmentation result is calculated; based on the idea that the segmentation result of an image does not change after its scale is changed, the scale loss between the sample segmentation result and the scale segmentation result is calculated.
  • Step 205 train the decoder based on the class activation map loss and the scale loss.
  • the decoder is trained based on the above-mentioned class activation map loss and scale loss, so that the decoder outputs a segmentation result based on the class activation map, and the output result of the decoder ensures that the segmentation result of the image remains unchanged after the scale is changed.
  • In the method provided in this embodiment, the sample image and the scale image obtained by scaling the sample image are respectively input into the encoder and the decoder, and the decoder performs image segmentation to obtain the image segmentation results of the two images.
  • Since scaling an image should not change which region each pixel belongs to, the scale loss can be calculated from the two image segmentation results and used to train the decoder, so that the two segmentation results are similar and the segmentation result remains unchanged after the image size is changed.
  • The class activation maps of the two images are calculated separately. Since the image segmentation result should be close to the class activation map, the class activation map loss is calculated from the image segmentation results and the class activation maps of the images and is used to train the decoder, so that the image segmentation result output by the decoder is finally close to the class activation map while the segmentation result obtained after image scaling remains unchanged.
  • FIG. 3 shows a flowchart of a training method for an image segmentation model provided by an exemplary embodiment of the present application.
  • the method can be performed by a computer device, for example, a terminal or a server as shown in FIG. 1 .
  • In the embodiment shown in FIG. 3, step 202 further includes step 301 and step 2021, step 204 further includes steps 2041 to 2043, and step 205 further includes steps 2051 to 2052.
  • Step 301 call the fully connected layer to classify and predict the sample image features to obtain the sample classification result of the sample image; call the fully connected layer to classify and predict the scale image features to obtain the scale classification result of the scale image.
  • the image segmentation model further includes a pretrained fully connected layer.
  • The image classification model mentioned in step 201 of the embodiment shown in FIG. 2 also includes a fully connected layer; that is, the image classification model includes an encoder and a fully connected layer. After an image is input to the encoder for feature extraction, the classification result is obtained through the fully connected layer.
  • the image segmentation model also adopts the fully connected layer of the image classification model, and outputs the classification results of the sample image and the scale image according to the image features output by the encoder.
  • the classification result is a vector composed of N probability values that the image belongs to the N classification targets.
  • Step 2021: Calculate the sample class activation map of the sample image based on the sample image feature and the sample classification result; calculate the scale class activation map of the scale image based on the scale image feature and the scale classification result.
  • The class activation map of the c-th classification target can be computed as $CAM_c = \mathrm{ReLU}\big(\sum_i w_i^c A^i\big)$ with $w_i^c = \frac{1}{N \times M}\sum_{k}\sum_{j}\frac{\partial S_c}{\partial A^i_{kj}}$, where c is the c-th classification target among the N classification targets, $S_c$ is the probability value of the c-th class in the classification result, N*M is the size of the image feature, $\frac{\partial S_c}{\partial A^i_{kj}}$ represents the contribution value of the pixel in the k-th row and j-th column of the i-th image feature to the image being classified into the c-th classification target, ReLU is the activation function, indicating that the class activation map only focuses on pixels with a contribution value greater than 0, and $A^i$ is the i-th image feature.
  • N class activation maps of the N classification targets of the image can be calculated according to the above formula; the i-th class activation map is used to represent the contribution value of each pixel in the image to the image being classified into the i-th classification target.
  • The sample class activation map includes N class activation maps of the sample image at the N classification targets; the scale class activation map includes N class activation maps of the scale image at the N classification targets.
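  • The following is an illustrative PyTorch sketch of a gradient-based class activation map computation consistent with the definitions above; the function signature and tensor layout are assumptions, not the literal implementation of the application:

```python
import torch
import torch.nn.functional as F

def class_activation_map(features: torch.Tensor, class_score: torch.Tensor) -> torch.Tensor:
    """features: [C, H, W] image features A^i output by the encoder, with requires_grad set.
    class_score: scalar S_c, the classification probability of target c from the fully
    connected layer. Returns an [H, W] class activation map for target c."""
    # dS_c / dA^i_{kj}: per-pixel contribution of each feature map to class c.
    grads = torch.autograd.grad(class_score, features, retain_graph=True)[0]   # [C, H, W]
    weights = grads.mean(dim=(1, 2), keepdim=True)   # w_i^c = (1 / (N*M)) * sum over the pixels
    cam = F.relu((weights * features).sum(dim=0))    # ReLU keeps only positive contributions
    return cam
```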
  • Step 2041 Calculate the sample class activation map loss based on the sample class activation map and the sample segmentation result.
  • the sample class activation map loss is calculated according to the error between the sample segmentation result and the sample class activation map.
  • The cross-entropy of the sample class activation map and the sample segmentation result is determined as the sample class activation map loss; this can be written as $L_{seg,1} = -\big[y_1\log s_1 + (1-y_1)\log(1-s_1)\big]$ summed over all pixels, where $L_{seg,1}$ is the sample class activation map loss, $y_1$ is the sample class activation map, and $s_1$ is the sample segmentation result.
  • the size of the class activation map is different from that of the segmentation result. It is also necessary to upsample the class activation map to the same size as the segmentation result, and then calculate the loss of the class activation map according to the above formula. Since the size of the segmentation result is the same as the size of the original image of the input image segmentation model, that is, the size of the sample segmentation result is the same as the size of the sample image.
  • the class activation map is calculated based on the image features output by the encoder.
  • the size of the class activation map is the same as the size of the image features output by the encoder, and the size of the image features is smaller than the original image. Therefore, the size of the class activation map is smaller than the segmentation. result. In this regard, it is necessary to upsample the class activation map, enlarge it to the size of the segmentation result, and then use the enlarged class activation map to substitute the above formula to calculate the class activation map loss.
  • the above formula is the calculation formula of the sample class activation map loss when the classification target of the image segmentation network is 1.
  • When there are N classification targets, the calculation formula of the sample class activation map loss is $L_{seg,1} = -\sum_{i=1}^{N}\big[y_{1i}\log s_{1i} + (1-y_{1i})\log(1-s_{1i})\big]$, where $L_{seg,1}$ is the sample class activation map loss, $y_{1i}$ is the sample class activation map of the i-th classification target, and $s_{1i}$ is the sample segmentation result of the i-th classification target.
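  • A short sketch of the class activation map loss, assuming it is the pixel-wise cross-entropy written above, with the class activation map first upsampled to the size of the segmentation result as described earlier; the names are illustrative:

```python
import torch
import torch.nn.functional as F

def cam_loss(seg_result: torch.Tensor, cam: torch.Tensor) -> torch.Tensor:
    """seg_result: [N, H, W] segmentation probability maps s (one per classification target).
    cam: [N, h, w] class activation maps y at the encoder-feature resolution."""
    # Upsample the class activation map to the size of the segmentation result (see above).
    cam_up = F.interpolate(cam.unsqueeze(0), size=seg_result.shape[-2:],
                           mode='bilinear', align_corners=False).squeeze(0)
    cam_up = cam_up.clamp(0.0, 1.0)  # treat the (normalized) activation map as a soft label
    # Pixel-wise cross-entropy; reduction='mean' averages where the text sums, a constant factor.
    return F.binary_cross_entropy(seg_result, cam_up)
```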
  • Step 2042 Calculate the scale class activation map loss based on the scale class activation map and the scale segmentation result.
  • the scale class activation map loss is calculated according to the error between the scale segmentation result and the scale class activation map.
  • The cross-entropy of the scale class activation map and the scale segmentation result is determined as the scale class activation map loss; this can be written as $L_{seg,2} = -\big[y_2\log s_2 + (1-y_2)\log(1-s_2)\big]$ summed over all pixels, where $L_{seg,2}$ is the scale class activation map loss, $y_2$ is the scale class activation map, and $s_2$ is the scale segmentation result.
  • the above formula is the calculation formula of the scale class activation map loss when the classification target of the image segmentation network is 1.
  • When there are N classification targets, the calculation formula of the scale class activation map loss is $L_{seg,2} = -\sum_{i=1}^{N}\big[y_{2i}\log s_{2i} + (1-y_{2i})\log(1-s_{2i})\big]$, where $L_{seg,2}$ is the scale class activation map loss, $y_{2i}$ is the scale class activation map of the i-th classification target, and $s_{2i}$ is the scale segmentation result of the i-th classification target.
  • the above-mentioned formula can be used to calculate the scaled class activation map loss.
  • the scale image includes two scale images, the reduced scale image and the enlarged scale image
  • the scale class activation map loss includes the reduced-scale class activation map loss and the enlarged-scale class activation map loss. Then, the reduced-scale class activation map loss $L_{seg,2.1}$ of the reduced-scale image and the enlarged-scale class activation map loss $L_{seg,2.2}$ of the enlarged-scale image are respectively calculated using the above formula.
  • Step 2043 Calculate the scale loss based on the sample segmentation result and the scale segmentation result.
  • The output sample segmentation result and the scale segmentation result have different sizes. Therefore, in order to compare the difference between the two, it is necessary to scale the sample segmentation result and the scale segmentation result to the same size.
  • the output segmentation result is the same size as the input image. Therefore, according to the scale relationship between the sample image and the scale image, the sample segmentation result and the scale segmentation result are scaled to the same scale.
  • the computer equipment scales the sample segmentation result to the same size as the scale segmentation result according to the scale relationship between the sample image and the scale image, and obtains the scaled sample segmentation result; according to the error calculation between the scale segmentation result and the scaled sample segmentation result scale loss.
  • For example, if the scale image is obtained by upsampling the sample image to twice its size, the sample segmentation result is likewise upsampled to twice its size to obtain the scaled sample segmentation result.
  • the first matrix difference between the scale segmentation result and the scaled sample segmentation result is calculated, and the 2-norm of the first matrix difference is determined as the scale loss.
  • The formula for calculating the scale loss can be $L_{seg,3} = \lVert s_2 - R(s_1)\rVert_2$, where $L_{seg,3}$ is the scale loss, $s_2$ is the scale segmentation result, $s_1$ is the sample segmentation result, and $R(s_1)$ is the scaled sample segmentation result.
  • the above formula is the calculation formula of the scale loss when the classification target of the image segmentation network is 1.
  • When there are N classification targets, the calculation formula of the scale loss is $L_{seg,3} = \sum_{i=1}^{N}\lVert s_{2i} - R(s_{1i})\rVert_2$, where $L_{seg,3}$ is the scale loss, $s_{2i}$ is the scale segmentation result of the i-th classification target, $s_{1i}$ is the sample segmentation result of the i-th classification target, and $R(s_{1i})$ is the scaled sample segmentation result of the i-th classification target.
  • the above formula can be used to calculate the scale loss.
  • In the case that the scale image includes two scale images, the reduced-scale image and the enlarged-scale image, the calculation formula of the scale loss is $L_{seg,3} = \lVert s_4 - R_4(s_1)\rVert_2 + \lVert s_5 - R_5(s_1)\rVert_2$, where $L_{seg,3}$ is the scale loss, $s_4$ is the reduced-scale segmentation result of the reduced-scale image, $s_5$ is the enlarged-scale segmentation result of the enlarged-scale image, $s_1$ is the sample segmentation result, $R_4(s_1)$ is the sample segmentation result reduced according to the scale relationship between the reduced-scale image and the sample image, and $R_5(s_1)$ is the sample segmentation result enlarged according to the scale relationship between the enlarged-scale image and the sample image.
  • the above formula is the calculation formula of the scale loss when the classification target of the image segmentation network is 1.
  • When there are N classification targets, the calculation formula of the scale loss is $L_{seg,3} = \sum_{i=1}^{N}\big(\lVert s_{4i} - R_4(s_{1i})\rVert_2 + \lVert s_{5i} - R_5(s_{1i})\rVert_2\big)$, where $L_{seg,3}$ is the scale loss, $s_{4i}$ is the reduced-scale segmentation result of the reduced-scale image at the i-th classification target, $s_{5i}$ is the enlarged-scale segmentation result of the enlarged-scale image at the i-th classification target, $s_{1i}$ is the sample segmentation result of the i-th classification target, $R_4(s_{1i})$ is the sample segmentation result of the i-th classification target reduced according to the scale relationship between the reduced-scale image and the sample image, and $R_5(s_{1i})$ is the sample segmentation result of the i-th classification target enlarged according to the scale relationship between the enlarged-scale image and the sample image.
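  • An illustrative sketch of the scale loss for the case with both a reduced-scale and an enlarged-scale image; realizing R_4 and R_5 with bilinear interpolation is an assumption:

```python
import torch
import torch.nn.functional as F

def rescale(seg: torch.Tensor, size) -> torch.Tensor:
    """Rescale an [N, H, W] segmentation result to the given spatial size (the R(.) operation)."""
    return F.interpolate(seg.unsqueeze(0), size=size, mode='bilinear',
                         align_corners=False).squeeze(0)

def scale_loss(s1: torch.Tensor, s4: torch.Tensor, s5: torch.Tensor) -> torch.Tensor:
    """s1: sample segmentation result; s4: reduced-scale segmentation result;
    s5: enlarged-scale segmentation result. Each is an [N, H, W] probability map tensor."""
    r4_s1 = rescale(s1, s4.shape[-2:])   # R_4(s_1): sample result reduced to the reduced-scale size
    r5_s1 = rescale(s1, s5.shape[-2:])   # R_5(s_1): sample result enlarged to the enlarged-scale size
    loss = s1.new_zeros(())
    for i in range(s1.shape[0]):         # sum the per-target 2-norms over the N classification targets
        loss = loss + torch.norm(s4[i] - r4_s1[i], p=2) + torch.norm(s5[i] - r5_s1[i], p=2)
    return loss
```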
  • Step 2051 Calculate the weighted sum of sample class activation map loss, scale class activation map loss and scale loss.
  • $L_{seg,1}$ and $L_{seg,2}$ constrain the segmentation result output by the decoder to be close to the corresponding class activation map.
  • $L_{seg,3}$ constrains the segmentation result obtained after scale transformation of the image to be consistent with the segmentation result of the original image after the same scale transformation.
  • The total loss can be written as $L_{seg} = L_{seg,1} + L_{seg,2} + \lambda L_{seg,3}$, where $L_{seg}$ is the total loss, $L_{seg,1}$ is the sample class activation map loss, $L_{seg,2}$ is the scale class activation map loss, $L_{seg,3}$ is the scale loss, and $\lambda$ is the weight of the scale loss.
  • the above formula can be used to calculate the total loss.
  • In the case that the scale image includes two scale images, the reduced-scale image and the enlarged-scale image, the scale class activation map loss includes the reduced-scale class activation map loss and the enlarged-scale class activation map loss. Then the total loss is $L_{seg} = L_{seg,1} + L_{seg,2.1} + L_{seg,2.2} + \lambda L_{seg,3}$, where $L_{seg}$ is the total loss, $L_{seg,1}$ is the sample class activation map loss, $L_{seg,2.1}$ is the reduced-scale class activation map loss, $L_{seg,2.2}$ is the enlarged-scale class activation map loss, $L_{seg,3}$ is the scale loss, and $\lambda$ is the weight of the scale loss.
  • Step 2052 train the decoder according to the weighted sum.
  • the decoder is trained according to the calculated weighted sum (total loss), the segmentation results are constrained to be close to the class activation map, and the segmentation results of different sizes of the same image are constrained to be consistent.
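  • A hedged end-to-end sketch of one decoder training step that combines the losses above; the tiny encoder/decoder stand-ins, the class activation maps passed in as tensors, and the weight value lam are all assumptions made only so the example runs:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Tiny stand-ins so the sketch runs; the real encoder/decoder are the networks described earlier.
encoder = nn.Conv2d(3, 8, 3, padding=1)
decoder = nn.Sequential(nn.Conv2d(8, 1, 1), nn.Sigmoid())
optimizer = torch.optim.Adam(decoder.parameters())   # only the decoder is trained
lam = 1.0  # lambda, the weight of the scale loss (assumed value)

def train_step(x, x_scaled, cam1, cam2):
    s1 = decoder(encoder(x))           # sample segmentation result
    s2 = decoder(encoder(x_scaled))    # scale segmentation result
    # L_seg,1 and L_seg,2: keep each segmentation result close to its class activation map.
    l_cam = F.binary_cross_entropy(s1, cam1) + F.binary_cross_entropy(s2, cam2)
    # L_seg,3: keep the rescaled sample result close to the scale result.
    r_s1 = F.interpolate(s1, size=s2.shape[-2:], mode='bilinear', align_corners=False)
    l_scale = torch.norm(s2 - r_s1, p=2)
    loss = l_cam + lam * l_scale       # L_seg = L_seg,1 + L_seg,2 + lambda * L_seg,3
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

x = torch.rand(1, 3, 32, 32)
x_scaled = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)
cam1, cam2 = torch.rand(1, 1, 32, 32), torch.rand(1, 1, 64, 64)
print(train_step(x, x_scaled, cam1, cam2))
```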
  • the image segmentation model can be used to perform image segmentation.
  • the computer device invokes the image segmentation model to perform image segmentation on the input image, and obtains the image segmentation result, and the image segmentation model includes an encoder and a decoder.
  • calling the encoder to perform feature extraction on the input image to obtain the input image features of the input image, and calling the decoder to decode the input image features to obtain the image segmentation result of the input image.
  • The method provided in this embodiment uses the encoder of the pre-trained image classification model to perform feature extraction, so that the image features output by the encoder carry the feature information required for classification, and uses the fully connected layer of the image classification model to output the classification result of the image; the classification result and the image features are then used to obtain the class activation map of the image.
  • Although the class activation map could itself be output as an image segmentation result, a change of image scale greatly affects the class activation map, whereas normally a change of scale should not affect the image segmentation result. Therefore, based on the idea of scale invariance, two scales of the same image are introduced, and the error between the segmentation results at the two scales is used to train the decoder, so that the decoder can accurately segment the image while ensuring that the segmentation result remains unchanged after scale transformation.
  • the encoder and fully connected layers in the image segmentation model use the encoder and fully connected layers of the trained image classification model.
  • the encoder in the image classification model is a classification encoder
  • the fully connected layer is a classification fully connected layer.
  • The parameters of the encoder are set according to the parameters of the classification encoder in the trained image classification model, where the image classification model has the same classification targets as the image segmentation model; the parameters of the fully connected layer are set according to the parameters of the classification fully connected layer in the trained image classification model, and the image classification model is used to output a classification result according to the input image.
  • An exemplary embodiment of training an image classification model is given.
  • FIG. 4 shows a flowchart of a training method for an image classification model provided by an exemplary embodiment of the present application.
  • the method can be performed by a computer device, for example, a terminal or a server as shown in FIG. 1 .
  • the method includes the following steps.
  • Step 401 acquiring data samples.
  • Step 402 Invoke the classification encoder and the classification fully connected layer to perform feature extraction and classification on the data samples, obtain classification vectors of the data samples, and determine the correlation between the data samples and the classification vectors; the classification vectors include class vectors and intra-class style vectors.
  • a computer device acquires a data sample set.
  • the data sample set includes data samples.
  • Data samples are images.
  • An image classification model is pre-established in the computer equipment, and the image classification model includes an encoder and a fully connected layer connected after the encoder.
  • Image classification models can employ a variety of neural networks.
  • An image classification model can employ a convolutional neural network whose number of convolution blocks can be adjusted according to the size of the image: the larger the image, the more convolution blocks are used. For example, for a 32*32 image, 2 convolution blocks can be used, and for a 96*96 image, 4 convolution blocks can be used.
  • the computer equipment inputs the data samples into the image classification model, the encoder of the image classification model performs feature extraction on the data samples to obtain sample features, and the fully connected layer outputs the classification vectors of the data samples according to the sample features.
  • Categorical vectors contain category vectors as well as intra-class style vectors. Among them, the element in the category vector is the probability that the data sample belongs to each classification target.
  • the intra-class style vector describes the intra-class style information of the data sample.
  • the computer device can use other network models to train the image classification model.
  • The computer equipment can use a discriminator to determine the correlation between the data sample and the classification vector (i.e., whether the classification vector was obtained from that data sample), and use an evaluator to determine the score value of the classification vector obeying the prior distribution.
  • the overall network structure diagram of the computer equipment for training the image classification model can be shown in Figure 5.
  • the discriminator 501 is a deep neural network composed of multiple fully connected layers.
  • it can be a deep neural network consisting of three or more fully connected layers.
  • the evaluator 502 is a deep neural network composed of multiple fully connected layers. It can be a deep neural network consisting of three or more fully connected layers.
  • the discriminator 501 can determine whether there is a correlation between the data sample and the classification vector, so as to maximize the mutual information between the data sample and the classification vector.
  • the computer device may simultaneously input the data samples and the extracted classification vectors to the discriminator 501 .
  • the data samples include a first sample and a second sample.
  • When a data sample is paired with a classification vector obtained from a different data sample, the discriminator 501 judges that the two are not related; when a data sample is paired with its own classification vector, the discriminator 501 judges that the two are related.
  • For example, the shoe image 503 can be used as the first sample, and the clothes image 504 can be used as the second sample.
  • The encoder and the fully connected layer obtain the first classification vector according to the input first sample, and obtain the second classification vector according to the input second sample. The first sample is related to the first classification vector, and the first sample is not related to the second classification vector.
  • If the discriminator 501 can correctly determine whether the data sample is related to the classification vector, it means that the classification vector contains information related to the data sample, so that the purpose of maximizing mutual information can be achieved.
  • Step 403: A category prior distribution is introduced for the category vector, and an intra-class style prior distribution is introduced for the intra-class style vector, so as to determine the score value of the classification vector subject to the prior distribution.
  • the evaluator introduces a prior distribution for the categorical vector.
  • the evaluator is also a deep neural network consisting of multiple fully connected layers. It can be a deep neural network consisting of three or more fully connected layers.
  • the prior distribution includes the category prior distribution and the intra-class style prior distribution.
  • the class prior distribution can be simply referred to as the class distribution, and the intra-class style prior distribution can be a Gaussian distribution.
  • The evaluator introduces a category distribution for the category vector $z_c$ and a Gaussian distribution for the intra-class style vector $z_s$; in this way, the category vector can be effectively decoupled from the intra-class style vector.
  • the output category feature part is a one-hot vector, and the element with the largest value in the one-hot vector can be directly used to represent the category of the data sample, avoiding the need for the next classification operation.
  • the output category feature part can also prevent data samples from being clustered into only one or several categories, and can ensure that they are clustered into the required number of categories, such as clustering into 10 categories.
  • Step 404 train the image classification model at least according to the correlation and the score value.
  • the computer device can perform reverse optimization on the network parameters of the image classification model by using the correlation between the data sample and the classification vector, and the score value of the classification vector obeying the prior distribution.
  • the back-propagation method can be used to optimize each network parameter in the image classification model.
  • the back-propagation method can use Adam-based gradient descent.
  • the weights of the network parameters of the image classification model, the discriminator and the evaluator can be updated.
  • For example, the learning rate is 0.0001, the parameters $\beta_1$ and $\beta_2$ that control the convergence of the loss function are set to 0.5 and 0.9 respectively, and the batch size is set to 64.
  • the evaluator, image classification model and discriminator can be optimized alternately using the same batch of data samples each time.
  • When the loss function of the evaluator begins to converge, it means that the classification vector learned by the image classification model is close to the prior distribution, and the training can be stopped.
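  • A minimal sketch of this optimizer configuration; the placeholder modules are assumptions, while the hyperparameter values come from the description above:

```python
import torch
import torch.nn as nn

# Placeholder modules; the real ones are the image classification model (encoder plus
# fully connected layer), the discriminator, and the evaluator described above.
classifier = nn.Linear(10, 10)
discriminator = nn.Linear(10, 1)
evaluator = nn.Linear(10, 1)

# Adam-based gradient descent with learning rate 0.0001, beta1 = 0.5, beta2 = 0.9, batch size 64.
lr, betas, batch_size = 1e-4, (0.5, 0.9), 64
opt_cls = torch.optim.Adam(classifier.parameters(), lr=lr, betas=betas)
opt_dis = torch.optim.Adam(discriminator.parameters(), lr=lr, betas=betas)
opt_eva = torch.optim.Adam(evaluator.parameters(), lr=lr, betas=betas)
# The three models are optimized alternately on the same batch of data samples, and training
# stops once the evaluator's loss begins to converge.
```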
  • The training method for the image classification model further includes: performing enhancement processing on the data samples, and mapping through the image classification model to obtain an enhanced classification vector, where the enhanced classification vector includes an enhanced category vector and an enhanced intra-class style vector; and determining the class feature difference between the category vector and the enhanced category vector. Training the image classification model according to at least the correlation and the score value includes: training the image classification model according to the correlation, the class feature difference, and the score value.
  • Categorical vectors include category vectors and intra-class style vectors.
  • the category vector is the vector activated by the Softmax function; the elements in the vector represent the probability that the data sample belongs to each classification target, and the vector dimension is set to the number of classification targets.
  • the intra-class style vector is the vector after linear activation.
  • the vector describes the intra-class style information of the data sample, and the dimension of the vector can be a preset number, such as 50.
  • the intra-class style information refers to, for a plurality of images belonging to the same classification target, information on the style differences existing between images.
  • the category vector and the intra-class style vector get different values after different excitations, but some information may be mixed.
  • the category vector and the intra-class style vector can be effectively decoupled.
  • the enhanced classification vector does not change through training.
  • the computer equipment performs enhancement processing on the data samples, and the enhancement processing includes random cropping, random horizontal flipping, color dithering, and random combination of color channels on the image.
  • the enhanced data samples are input into the image classification model, and the enhanced classification vector is obtained through the encoder and the fully connected layer.
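  • An illustrative torchvision sketch of the enhancement processing listed above; the crop size, jitter strengths, and the grayscale stand-in for random channel recombination are assumptions:

```python
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomResizedCrop(32),            # random cropping (assumed output size)
    transforms.RandomHorizontalFlip(p=0.5),      # random horizontal flipping
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),  # color dithering
    transforms.RandomGrayscale(p=0.2),           # stand-in for random combination of color channels
    transforms.ToTensor(),
])
```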
  • The computer equipment extracts the class vector from the classification vector and the enhanced class vector from the enhanced classification vector, inputs the class vector and the enhanced class vector to the evaluator, and identifies the class feature difference between them through the evaluator.
  • the element in the vector of the category vector is the probability value of the data sample belonging to each classification target.
  • the class feature difference between the class vector and the enhanced class vector can be measured by divergence.
  • the computer device can reversely optimize the network parameters of the image classification model by using the correlation between the data sample and the classification vector, the score value of the classification vector obeying the prior distribution, and the class feature difference between the class vector and the enhanced class vector.
  • gradient descent is used to update the weight values corresponding to the network parameters of the image classification model, discriminator and evaluator.
  • This enables the image classification model to learn that the classification vector is related to the data sample, the learned category vector can represent the classification target of the data sample, and the learned intra-class style vector can represent the difference between the same type of data samples.
  • After the enhancement processing, the category vector of the data sample remains unchanged; that is, the style of the data sample may change to a certain extent, but it still belongs to the same category.
  • the category vector can be made as close to the one-hot vector as possible, that is, the value of most elements is close to 0, and only one element is close to 1, so that the corresponding classification of the data sample can be directly determined according to the category vector. Target.
  • the data samples include a first sample and a second sample, wherein the first sample and the second sample may be completely different or the same.
  • the first sample is input to the image classification model, and the first classification vector corresponding to the first sample is obtained by mapping.
  • the second sample is input to the image classification model, and a second classification vector corresponding to the second sample is obtained by mapping. Both the first classification vector and the second classification vector may be multi-dimensional vectors, eg, 50-dimensional.
  • the computer device converts the first sample into a first sample vector.
  • the computer device splices the first classification vector and the first sample vector to generate the spliced first sample vector.
  • the way of splicing may be adding the first sample vector after the first classification vector, or adding the first classification vector after the first sample vector.
  • The computer device may use the above-mentioned splicing method to splice the second classification vector and the first sample vector to generate a spliced second sample vector.
  • If the discriminator can correctly determine whether the data sample and the classification vector are related, it means that the classification vector contains information related to the data sample; maximizing the mutual information in this way makes the classification vector learned by the image classification model related to the data sample.
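  • A hedged sketch of how the related/unrelated pairs fed to the discriminator can be built with the splicing described above; the vector dimensions are assumptions:

```python
import torch

def build_discriminator_inputs(x1_vec, z1, z2):
    """x1_vec: flattened sample vector of the first sample; z1, z2: classification vectors
    of the first and second samples. Returns a related pair (first sample with its own
    classification vector) and an unrelated pair (first sample with the second sample's vector)."""
    related = torch.cat([z1, x1_vec], dim=-1)     # classification vector followed by the sample vector
    unrelated = torch.cat([z2, x1_vec], dim=-1)   # mismatched pair the discriminator should reject
    return related, unrelated

# Example with assumed dimensions: a flattened 32*32*3 image and 60-dimensional classification vectors.
x1_vec = torch.rand(3072)
z1, z2 = torch.rand(60), torch.rand(60)
related, unrelated = build_discriminator_inputs(x1_vec, z1, z2)
print(related.shape, unrelated.shape)  # torch.Size([3132]) torch.Size([3132])
```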
  • introducing a category prior distribution for the category vector and an intra-class style prior distribution for the intra-class style vector to determine the score value of the classification vector obeying the prior distribution includes: introducing, by the evaluator, a category prior distribution for the category vector to obtain the category distribution result of the category vector; introducing, by the evaluator, an intra-class style prior distribution for the intra-class style vector to obtain the intra-class style prior distribution result of the intra-class style vector; and scoring the category distribution result and the intra-class style prior distribution result to obtain the score value of the classification vector obeying the prior distribution.
  • the evaluator introduces a prior distribution for the categorical vector.
  • the prior distribution includes the category prior distribution and the intra-class style prior distribution.
  • the class prior distribution can be simply referred to as the class distribution, and the intra-class style prior distribution can be a Gaussian distribution.
  • the class distribution can be p(z_c) = Cat(K, p), where p(z_c) is the distribution of category vectors, Cat is the categorical distribution, z_c is a one-hot vector, K is the number of classification targets, and p is the reciprocal of K.
  • the intra-class style vector can follow p(z_s) = N(0, σ²), where p(z_s) is the distribution of the intra-class style vector, N is the Gaussian distribution, σ is the standard deviation, and σ can be a preset value, such as 0.1.
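  • As a minimal sketch of how such a prior could be sampled (a non-authoritative example; the function name, `style_dim` and the batch handling are assumptions not taken from the original):

```python
import torch
import torch.nn.functional as F

def sample_prior(batch_size: int, num_classes: int, style_dim: int, sigma: float = 0.1) -> torch.Tensor:
    """Draw samples from the assumed prior over classification vectors.

    The category part follows a uniform categorical distribution encoded as one-hot
    vectors (each of the K targets has probability 1/K); the intra-class style part
    follows a zero-mean Gaussian with standard deviation sigma.
    """
    labels = torch.randint(0, num_classes, (batch_size,))
    z_c = F.one_hot(labels, num_classes).float()          # category prior (one-hot)
    z_s = sigma * torch.randn(batch_size, style_dim)      # intra-class style prior
    return torch.cat([z_c, z_s], dim=1)                   # spliced prior distribution vector
```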
  • the computer equipment simultaneously inputs the category vector and the intra-class style vector to the evaluator, and the evaluator outputs the category distribution result corresponding to the category vector and the Gaussian distribution result corresponding to the intra-class style vector respectively.
  • the category distribution result can be a category vector
  • the category vector can be a one-hot vector.
  • Gaussian distribution results can be style vectors.
  • scoring the category distribution result and the intra-class style prior distribution result by the evaluator includes: splicing the category distribution vector of the category vector and the Gaussian distribution vector of the intra-class style vector to generate the prior distribution vector; The prior distribution vector is scored by the evaluator, and the rating value that the classification vector obeys the prior distribution is obtained.
  • the computer equipment splices the category result with the Gaussian distribution result, that is, splices the corresponding category vector and the style vector.
  • the concatenation can be done by adding the elements of the style vector after the last element of the category vector, or by adding the elements of the category vector after the last element of the style vector.
  • the evaluator scores the spliced vector and obtains the corresponding score, which is the probability that the classification vector obeys the prior distribution. The higher the probability, the more the classification vector obeys the prior distribution.
  • the output category vector can be made as close to the one-hot vector as possible, so that the element with the largest value in the one-hot vector can be directly used to represent the category of the data sample, avoiding the need for the next classification operation.
  • the data samples can be prevented from being classified into only one or several classes, so that the data samples can be guaranteed to be classified into the desired classification targets.
  • the training method of the image classification model further includes: determining the correlation between the data sample and the classification vector by the discriminator; determining, by the evaluator, the scoring value of the classification vector obeying the prior distribution; Training the image classification model includes alternately optimizing the image classification model, the discriminator, and the evaluator based on at least correlation and scoring values.
  • the discriminator identifies correlations between data samples and classification vectors.
  • the loss function of the discriminator to identify the correlation between the data samples and the classification vector can be called the mutual information loss function.
  • the discriminator can be trained with a mutual information loss function.
  • the mutual information loss function can be expressed as follows:
  • L_MI = -E_{X~P_X, Z~Q(Z|X)}[log S(D(X, Z))] - E_{X~P_X, Z~Q_Z}[log(1 - S(D(X, Z)))]
  • X is the data sample
  • Z is the classification vector
  • S is the sigmoid function
  • E is the expectation
  • D is the discriminator, which is used to judge whether X and Z are related
  • Q(Z|X) is the posterior distribution of the image classification model; P_X is the prior distribution of the input image; Q_Z is the aggregated posterior distribution of Z; E_{X~P_X, Z~Q(Z|X)} indicates the mathematical expectation with X obeying P_X and Z obeying Q(Z|X).
  • the discriminator can correctly judge whether the data sample and the feature are related, it means that the feature contains information related to the data sample, so as to maximize the mutual information.
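  • A minimal sketch of how this discriminator term could be computed in practice (all names are illustrative; pairing a sample with a shuffled classification vector is used here as a stand-in for drawing from the aggregated posterior):

```python
import torch
import torch.nn.functional as F

def mutual_information_loss(discriminator, x_flat: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
    """Discriminator-based mutual-information term.

    Paired (x, z) batches are treated as related; pairing x with a shuffled z
    (an approximation of the aggregated posterior) is treated as unrelated.
    """
    joint = discriminator(torch.cat([x_flat, z], dim=1))
    shuffled = z[torch.randperm(z.size(0))]
    marginal = discriminator(torch.cat([x_flat, shuffled], dim=1))
    return (F.binary_cross_entropy_with_logits(joint, torch.ones_like(joint))
            + F.binary_cross_entropy_with_logits(marginal, torch.zeros_like(marginal)))
```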
  • the class feature difference between the class vector and the enhanced class vector can be measured by divergence.
  • the divergence can be a KL divergence.
  • the corresponding loss function can be called the class difference loss function, which adopts the following formula:
  • L_Aug = KL( Q(Z_c|X) || Q(Z_c|T(X)) )
  • KL is the KL divergence
  • Q is the image classification model
  • Z_c is the category vector
  • X is the data sample
  • T is the data enhancement
  • Q(Z_c|X) is the aggregated posterior distribution of Z_c, and Q(Z_c|T(X)) is the posterior distribution of the enhanced classification vector.
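  • A sketch of this class difference term on the category outputs of an original sample and its augmented view (the logits-based formulation and detaching of the original prediction are assumptions, not taken from the original):

```python
import torch.nn.functional as F

def class_difference_loss(cat_logits, aug_cat_logits):
    """KL divergence between category predictions of a sample and its augmented view."""
    log_p_aug = F.log_softmax(aug_cat_logits, dim=1)   # posterior for the augmented sample
    p_orig = F.softmax(cat_logits, dim=1).detach()     # target: posterior for the original sample
    return F.kl_div(log_p_aug, p_orig, reduction="batchmean")
```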
  • the classification vector is scored by the evaluator for conforming to the prior distribution.
  • the loss function that introduces the prior distribution for the classification vector can be called the prior distribution loss function.
  • different prior distribution loss functions can be defined for the image classification model and the evaluator respectively.
  • the prior distribution loss function can make the classification vector mapped by the image classification model as close to the prior distribution as possible.
  • the prior distribution loss function of the image classification model can be as follows:
  • L_Adv = -E_{Z~Q_Z}[log C(Z)]
  • Q is the image classification model
  • Z is the classification vector of the data sample, and C(Z) is the probability value of whether the classification vector obeys the prior distribution
  • Q_Z is the aggregated posterior distribution of Z
  • the prior distribution loss function of the evaluator can be as follows:
  • L_C = -E_{Z~P_Z}[log C(Z)] - E_{Z~Q_Z}[log(1 - C(Z))] + λ_gp·L_gp
  • C is the evaluator
  • P_Z is the prior distribution
  • Q_Z is the aggregated posterior distribution
  • L_gp is the gradient penalty term, and λ_gp is the gradient penalty term coefficient, set to 10.
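  • A sketch of one way such an evaluator loss with a gradient penalty could be implemented (the interpolation-based penalty is an assumption borrowed from common practice, not a detail stated in the original):

```python
import torch
import torch.nn.functional as F

def evaluator_loss(evaluator, z_prior: torch.Tensor, z_post: torch.Tensor, gp_coeff: float = 10.0):
    """Adversarial loss for the evaluator, with a gradient penalty term.

    Samples from the prior should be scored close to 1, vectors mapped from data
    close to 0; the penalty keeps the evaluator's gradients well behaved.
    """
    score_prior = evaluator(z_prior)
    score_post = evaluator(z_post)
    adv = (F.binary_cross_entropy(score_prior, torch.ones_like(score_prior))
           + F.binary_cross_entropy(score_post, torch.zeros_like(score_post)))

    # Gradient penalty on random interpolations between prior and posterior samples.
    alpha = torch.rand(z_prior.size(0), 1, device=z_prior.device)
    z_hat = (alpha * z_prior + (1 - alpha) * z_post).requires_grad_(True)
    grad = torch.autograd.grad(evaluator(z_hat).sum(), z_hat, create_graph=True)[0]
    penalty = ((grad.norm(2, dim=1) - 1.0) ** 2).mean()
    return adv + gp_coeff * penalty
```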
  • the mutual information loss function, the class difference loss function, and the prior distribution loss function of the image classification model may be used as sub-loss functions to define the total loss function of the image classification model.
  • Each sub-loss function can have corresponding weights respectively.
  • the total loss function of the discriminator can be defined using the mutual information loss function and its corresponding weights.
  • the overall loss function of the evaluator can be defined using the prior distribution loss function of the evaluator and its weights.
  • the total loss function of the image classification model is as follows:
  • L_Q = λ_MI·L_MI + λ_Aug·L_Aug + λ_Adv·L_Adv
  • the total loss function of the discriminator is as follows:
  • L_D = λ_MI·L_MI
  • the total loss function of the evaluator is as follows:
  • L_E = λ_Adv·L_C
  • L_Q is the total loss function of the image classification model.
  • L_MI is the mutual information loss function
  • L_Aug is the class difference loss function
  • λ_MI is the weight of L_MI
  • λ_Aug is the weight of L_Aug
  • λ_Adv is the weight of the prior distribution loss function.
  • ⁇ MI and ⁇ Adv can be set to corresponding fixed values, for example, ⁇ MI is set to 0.5 and ⁇ Adv is set to 1.
  • ⁇ Aug is related to the dataset of data samples and can be set in the following ways.
  • the computer device can generate a corresponding visual dimension reduction graph by performing nonlinear dimension reduction processing on the classification vector, and select the weight of the class difference loss function according to the visual dimension reduction graph.
  • the visual dimension reduction graph is the result of reducing the dimension of high-dimensional data to low-dimensional data, so that the result is visualized.
  • Low dimensions such as two or three dimensions.
  • t-SNE can be used to perform nonlinear dimensionality reduction processing on the classification vector, and a visual dimensionality reduction graph, that is, a t-SNE graph, can be generated according to the processing result.
  • in the visual dimension reduction graph, the data samples are classified to form classification clusters.
  • for some values of the class difference loss weight, the classification clusters of the data samples are relatively scattered.
  • for other values, the obtained features tend to converge and the classification clusters may even overlap; different datasets give different classification results.
  • for example, below a certain weight the classification clusters in the t-SNE graph do not overlap,
  • while above it the clusters in the t-SNE plots overlap. Therefore, the largest value between 2 and 3 at which the classification clusters do not overlap can be selected as the value of λ_Aug, which makes the total loss function of the image classification model more accurate, so that the classification results of the trained image classification model can be more accurate.
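  • A sketch of the dimension-reduction step used for this inspection (sklearn's t-SNE; the function name and parameters are illustrative assumptions):

```python
import numpy as np
from sklearn.manifold import TSNE

def tsne_projection(classification_vectors: np.ndarray) -> np.ndarray:
    """Project high-dimensional classification vectors to 2-D so that cluster overlap
    can be inspected visually when choosing the class difference loss weight."""
    return TSNE(n_components=2, init="pca", random_state=0).fit_transform(classification_vectors)
```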
  • the training of the image classification model can be carried out in a reverse optimization manner.
  • the evaluator, image classification model, and discriminator can be optimized alternately. Among them, the evaluator is first optimized, and then the image classification model and discriminator are optimized. Specifically, the overall loss function of the evaluator is used to reversely optimize the evaluator, so that the probability of the classification vector that obeys the prior distribution is close to 1, and the probability of the classification vector that does not obey the prior distribution is close to 0.
  • alternately optimizing the image classification model, the discriminator and the evaluator according to at least the correlation and the score value includes: first optimizing the network parameters of the evaluator at least once according to the score value; then optimizing the network parameters of the image classification model according to the score value, and optimizing the network parameters of the discriminator according to the correlation.
  • the data samples can be randomly divided into multiple batches, and each batch uses a fixed number of data samples, which can also be called batch samples.
  • batch samples can be set to 64 data samples, that is, the batch size is set to 64.
  • the computer device determines the score value of the classification vector obeying the prior distribution, and determines the correlation between the data sample and the classification vector.
  • when the image classification model, the discriminator and the evaluator are alternately optimized, the weight values corresponding to their network parameters are updated.
  • the network parameters of the image classification model are optimized according to the score value, the class feature difference and the total loss function of the image classification model, and the network parameters of the discriminator are optimized according to the correlation between the data samples and the classification vector and the total loss function of the discriminator. For example, after 4 optimizations of the evaluator, the image classification model and the discriminator are optimized once. When reverse optimization is performed on the image classification model and the discriminator, they can be optimized successively or simultaneously.
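  • A minimal sketch of one such alternating round (the `compute_*` helpers are assumed wrappers around the loss terms sketched above, not functions from the original):

```python
def train_round(batch, model, discriminator, evaluator, opt_model, opt_disc, opt_eval,
                evaluator_steps: int = 4):
    """One alternating round: several evaluator updates, then one model/discriminator update."""
    for _ in range(evaluator_steps):            # e.g. optimize the evaluator 4 times ...
        opt_eval.zero_grad()
        compute_evaluator_loss(batch, model, evaluator).backward()
        opt_eval.step()

    opt_model.zero_grad()                       # ... then the image classification model once
    compute_model_loss(batch, model, discriminator, evaluator).backward()
    opt_model.step()

    opt_disc.zero_grad()                        # ... and the discriminator once
    compute_discriminator_loss(batch, model, discriminator).backward()
    opt_disc.step()
```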
  • when the evaluator is reversely optimized, for an input from the prior distribution, the closer the output is to 1, the smaller the loss function value and the smaller the change to the parameters during backpropagation; for an input mapped from a data sample, the closer the output is to 0, the smaller the loss function and the smaller the change to the parameters during backpropagation.
  • when the image classification model is reversely optimized, for an input mapped from a data sample, the closer the output is to 1, the smaller the loss function value and the smaller the change to the parameters during backpropagation.
  • the prior distribution is not considered when inversely optimizing an image classification model.
  • the total loss function of the evaluator can indicate the difference between the feature distribution learned by the current image classification model and the prior distribution. When the total loss function of the evaluator begins to converge, it indicates that the feature distribution learned by the image classification model is close to the prior distribution, and training can be stopped.
  • the method provided in this embodiment does not need to perform an additional classification algorithm for the data samples of the classification objects in the classification business, nor does it need to generate a real image for comparison with the original image. Instead, it determines the correlation between the data sample and the classification vector,
  • introduces a category prior distribution for the category vector,
  • and introduces an intra-class style prior distribution for the intra-class style vector to determine the score value of the classification vector obeying the prior distribution, and then uses the correlation and the score value
  • to train the image classification model, which can effectively improve the learning of the classification vector by the image classification model. Since the feature distribution learned by the image classification model is close to the prior distribution, and the category vector and the intra-class style vector are effectively decoupled, the classification target corresponding to the data sample can be obtained according to the category vector. Thus, the accuracy of data classification can be effectively improved without manual annotation, and effective training of the image segmentation network without manual annotation is realized.
  • In FIG. 6, a schematic diagram of training an image segmentation model by applying the training method for an image segmentation model provided by the present application is given.
  • the encoder 102 in FIG. 6 adopts the encoder of the image classification model in the embodiment shown in FIG. 4 .
  • the encoder 102 is further connected with a fully connected layer (not shown in the figure).
  • the sample image x 1 and the scale image x 2 are respectively input into the encoder, and the encoder outputs the sample image feature of the sample image x 1 and the scale image feature of the scale image x 2 .
  • the sample class activation map C 1 of the sample image is calculated.
  • the scale class activation map C 2 of the scale image is calculated.
  • the sample image features are input into the decoder 103 to obtain the sample segmentation result s 1 .
  • the scale image features are input into the decoder 103 to obtain the scale segmentation result s 2 .
  • the sample class activation map loss is calculated based on the sample class activation map and the sample segmentation results
  • the scale class activation map loss is calculated based on the scale class activation map and the scale segmentation results
  • the scale loss is calculated based on the sample segmentation results and the scale segmentation results. Calculate the weighted sum of sample class activation map loss, scale class activation map loss, and scale loss to obtain the total loss L seg .
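  • A minimal sketch of this weighted total loss for a single classification target (the tensor shapes, the cross-entropy form and the default weight are assumptions; the scale weight of 2 follows the λ = 2 example given later in the description):

```python
import torch
import torch.nn.functional as F

def segmentation_total_loss(s1, s2, cam1, cam2, scale_weight: float = 2.0):
    """Weighted total loss for the decoder.

    s1 / s2    : sample and scale segmentation probability maps, e.g. (N, 1, H, W) and (N, 1, 2H, 2W).
    cam1 / cam2: class activation maps already up-sampled to the matching segmentation size.
    """
    cam_loss_sample = -(cam1 * torch.log(s1 + 1e-8)).mean()     # sample class activation map loss
    cam_loss_scale = -(cam2 * torch.log(s2 + 1e-8)).mean()      # scale class activation map loss
    s1_up = F.interpolate(s1, size=s2.shape[-2:], mode="bilinear", align_corners=False)
    scale_loss = torch.norm(s2 - s1_up, p=2)                    # 2-norm of the matrix difference
    return cam_loss_sample + cam_loss_scale + scale_weight * scale_loss
```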
  • the network parameters in the decoder are optimized by back propagation.
  • the optimization method uses Adam-based gradient descent; the learning rate is 0.0001, the exponential decay rate β1 of Adam's first-order moment estimate is set to 0.5, and the exponential decay rate β2 of the second-order moment estimate is set to 0.9.
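  • In PyTorch terms this corresponds to the following configuration (a sketch; `decoder` stands for the decoder module being trained):

```python
import torch

# Adam settings described above: learning rate 1e-4, beta1 = 0.5, beta2 = 0.9.
optimizer = torch.optim.Adam(decoder.parameters(), lr=1e-4, betas=(0.5, 0.9))
```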
  • the image classification model is optimized first, then the encoder part is fixed, migrated to the image segmentation model, and then the decoder part of the image segmentation model is optimized.
  • FIG. 7 shows a flowchart of an image segmentation method provided by an exemplary embodiment of the present application.
  • the method can be performed by a computer device, for example, a terminal or a server as shown in FIG. 1 .
  • the computer device that executes the training method for the image segmentation model and the computer device that executes the image segmentation method may be the same computer device or different computer devices.
  • the method includes the following steps.
  • Step 701 acquiring an input image.
  • the input image may be any image that needs to be segmented.
  • when an image segmentation model is trained to segment faces in an image, the input image may be an image containing a human face; when an image segmentation model is trained to segment lesions in an image, the input image may be an image containing a lesion.
  • the input image may not contain the classification target of the image segmentation model, that is, the input image may not contain human faces, or may not contain lesions.
  • Step 702 invoking the encoder to perform feature extraction on the input image to obtain input image features of the input image.
  • the encoder is the encoder in the image segmentation model mentioned in any of the above embodiments.
  • the source of the encoder is different according to the classification target of the image segmentation model.
  • the classification target of the image segmentation model is a face
  • the parameters of the encoder are set according to the parameters of the classification encoder of the face classification model, and the face classification model is used to identify whether the input image contains a human face.
  • the parameters of the encoder are set according to the parameters of the classification encoder of the lesion classification model, and the lesion classification model is used to identify whether the input image contains lesions.
  • the encoder of the image segmentation model can accurately extract features of the image for the classification task. Since the classification target of image segmentation is the same as the classification target of image classification, the image segmentation model can perform accurate image segmentation according to the extracted features, which not only simplifies the training process of the image segmentation model, but also improves the segmentation accuracy of the image segmentation model.
  • Step 703 call the decoder to decode the input image features, and obtain the image segmentation result of the input image.
  • the decoder is obtained by training according to the class activation map loss and the scale loss, and the class activation map loss is used to train the decoder to output close to the class activation map.
  • the class activation map is used to represent the contribution of each pixel in the image to the classification result of the image, and the scale loss is used to train the decoder to output similar segmentation results for multiple images with the same image content and different scales.
  • the decoder is the decoder in the image segmentation model mentioned in any of the above embodiments.
  • the training method of the decoder reference may be made to the above-mentioned embodiments.
  • the segmentation result output by the decoder includes the probability value of each pixel in the input image belonging to each classification target, or the segmentation result output by the decoder includes the classification target to which each pixel in the input image belongs.
  • the image segmentation result includes the probability value that each pixel in the input image is a human face, or the image segmentation result includes whether each pixel in the input image is a human face.
  • the image segmentation result includes the probability value that each pixel in the input image is a lesion, or the image segmentation result includes whether each pixel in the input image is a lesion.
  • the image can be segmented based on the deep features of the image, which improves the accuracy of image segmentation.
  • the image segmentation model outputs similar image segmentation results for multiple images with the same image content but different scales, which is closer to the actual situation of image segmentation and further improves the accuracy of image segmentation.
  • the image segmentation model provided in this application can be used to perform image segmentation on different classification targets in an image in various application scenarios. Taking the above-mentioned application scenarios of face recognition and lesion recognition as examples, the present application also provides an exemplary embodiment of training corresponding image segmentation models for these two application scenarios.
  • an image classification model for face recognition is trained.
  • the image classification model includes a classification encoder and a classification fully connected layer.
  • Image classification models are used for clustering, that is, grouping multiple input images into a class that contains faces and a class that doesn't.
  • the first step is to obtain a data sample.
  • the second step is to call the classification encoder and the classification fully connected layer to extract and classify the data samples, obtain the classification vectors of the data samples, and determine the correlation between the data samples and the classification vectors.
  • the classification vector includes a category vector and an intra-class style vector, and the category vector is used to describe whether the input image contains a human face.
  • the category prior distribution is introduced into the category vector
  • the intra-class style prior distribution is introduced into the intra-class style vector, so as to determine the rating value of the category vector subject to the prior distribution.
  • an image classification model is trained at least on the correlation and score values.
  • the image segmentation model includes an encoder, a fully connected layer and a decoder in the training phase, and the image segmentation model includes an encoder and a decoder in the application phase.
  • the first step is to initialize the encoder according to the parameters of the classification encoder, and initialize the fully connected layer according to the parameters of the classification fully connected layer, so that the encoder and the fully connected layer can accurately classify the input image and identify whether the image contains a face.
  • the second step is to obtain a training data set.
  • the training data set includes at least one set of sample images and scale images.
  • the sample images are images including or not including faces
  • the scale images are images obtained by up-sampling the sample images.
  • the size is twice that of the sample image.
  • the third step is to call the encoder to perform feature extraction on the sample image to obtain the sample image features, and call the fully connected layer to classify the sample image features to obtain the sample classification result of the sample image.
  • the sample classification result includes the probability value of the sample image including the face.
  • the encoder is called to extract the scale image features to obtain the scale image features, and the fully connected layer is called to classify the scale image features to obtain the scale classification result of the scale image.
  • the scale classification result includes the probability value of the scale image including the face.
  • the decoder is called to decode the features of the sample image to obtain a sample segmentation result of the sample image, and the sample segmentation result includes a probability value that each pixel in the sample image is a face.
  • the decoder is called to decode the scale image features to obtain the scale segmentation result of the scale image, and the scale segmentation result includes the probability value that each pixel in the scale image is a face.
  • the sample class activation map of the sample image is calculated according to the sample image feature and the sample classification result.
  • the scale class activation map of the scale image is calculated according to the scale image features and the scale classification results.
  • the sixth step is to calculate the sample class activation map loss from the sample class activation map and the sample segmentation result, calculate the scale class activation map loss from the scale class activation map and the scale segmentation result, and calculate the scale loss from the sample segmentation result and the scale segmentation result. Calculate the weighted sum of the sample class activation map loss, the scale class activation map loss, and the scale loss to get the total loss.
  • the decoder is trained according to the total loss.
  • Step 8 Repeat steps 3 to 7 to iteratively train the decoder to obtain the final image segmentation model.
  • the ninth step use the trained image segmentation model to segment the face region in the image.
  • an exemplary embodiment of an image segmentation model for segmenting a lesion area in an image is given by using the training method of the image segmentation model provided by the present application.
  • an image classification model for identifying lesions is trained.
  • the image classification model includes a classification encoder and a classification fully connected layer.
  • the image classification model is used for clustering, ie, clustering the input multiple images into a class that contains lesions and a class that does not.
  • the image classification model can also be used to identify specific types of lesions, for example, to cluster multiple input images into lesion 1, lesion 2, lesion 3, and normal.
  • the first step is to obtain a data sample.
  • the second step is to call the classification encoder and the classification fully connected layer to extract and classify the data samples, obtain the classification vectors of the data samples, and determine the correlation between the data samples and the classification vectors.
  • the classification vector includes a class vector and an intra-class style vector, and the class vector is used to describe whether the input image contains lesions, or, it is used to describe the probability value of the input image belonging to various types of lesions.
  • the category prior distribution is introduced into the category vector
  • the intra-class style prior distribution is introduced into the intra-class style vector, so as to determine the rating value of the category vector subject to the prior distribution.
  • an image classification model is trained at least on the correlation and score values.
  • the image segmentation model includes an encoder, a fully connected layer and a decoder in the training phase, and the image segmentation model includes an encoder and a decoder in the application phase.
  • the encoder is initialized according to the parameters of the classification encoder
  • the fully connected layer is initialized according to the parameters of the classification fully connected layer, so that the encoder and the fully connected layer can accurately classify the input image and identify whether the image contains lesions, or identify which type of lesion the image belongs to.
  • the second step is to acquire a training data set.
  • the training data set includes at least one set of sample images and scale images.
  • the sample images are images including or not including lesions.
  • the scale images are images obtained by up-sampling the sample images. The size of the scale images is twice as large as the sample image.
  • the third step is to call the encoder to perform feature extraction on the sample image to obtain the sample image features, and call the fully connected layer to classify the sample image features to obtain the sample classification result of the sample image.
  • the sample classification result includes the probability value of the sample image including the lesion area, or the sample classification result includes the probability values of the sample image belonging to each type of lesion.
  • the scale classification result includes the probability value of the scale image including the lesion, or the scale classification result includes the probability values of the scale image belonging to each type of lesion.
  • the decoder is called to decode the features of the sample image to obtain the sample segmentation result of the sample image.
  • the sample segmentation result includes the probability value that each pixel in the sample image is a lesion, or the sample segmentation result includes the probability values that each pixel in the sample image belongs to each type of lesion.
  • the scale segmentation result includes the probability value that each pixel in the scale image is a lesion, or the scale segmentation result includes the probability values that each pixel in the scale image belongs to each type of lesion.
  • the sample class activation map of the sample image is calculated according to the sample image feature and the sample classification result.
  • the scale class activation map of the scale image is calculated according to the scale image features and the scale classification results.
  • the sixth step is to calculate the sample class activation map loss from the sample class activation map and the sample segmentation result, calculate the scale class activation map loss from the scale class activation map and the scale segmentation result, and calculate the scale loss from the sample segmentation result and the scale segmentation result. Calculate the weighted sum of the sample class activation map loss, the scale class activation map loss, and the scale loss to get the total loss.
  • the decoder is trained according to the total loss.
  • Step 8 Repeat steps 3 to 7 to iteratively train the decoder to obtain the final image segmentation model.
  • the ninth step use the trained image segmentation model to segment the lesion area in the image.
  • FIG. 8 shows a schematic structural diagram of an apparatus for training an image segmentation model provided by an exemplary embodiment of the present application.
  • the apparatus can be implemented as all or a part of computer equipment through software, hardware or a combination of the two, the image segmentation model includes an encoder and a decoder, and the apparatus includes:
  • the encoding module 601 is used to call the encoder to perform feature extraction on the sample image and the scale image to obtain the sample image feature of the sample image and the scale image feature of the scale image; the scale image includes at least one of an image obtained by enlarging the sample image and an image obtained by reducing the sample image;
  • the class activation map module 602 is used to calculate the class activation map based on the sample image features to obtain the sample class activation map of the sample image, and to calculate the class activation map based on the scale image features to obtain the scale class activation map of the scale image; the class activation map is used to represent the contribution of each pixel in the image to the classification result of the image;
  • the decoding module 603 is configured to call the decoder to decode the sample image feature to obtain the sample segmentation result of the sample image, and call the decoder to decode the scale image feature to obtain the scale segmentation result of the scale image; the sample segmentation result includes the classification probability value of each pixel in the sample image;
  • the loss module 604 is used to calculate the class activation map loss based on the sample class activation map, the sample segmentation result, the scale class activation map and the scale segmentation result, and calculate the scale loss based on the sample segmentation result and the scale segmentation result;
  • the class activation map loss is used to train the decoder to make the sample segmentation result close to the sample class activation map, and the scale segmentation result close to the scale class activation map;
  • the scale loss is used to train the decoder to make the sample segmentation result close to the scale segmentation result;
  • the class activation map loss includes sample class activation map loss and scale class activation map loss
  • a loss module 604 configured to calculate the sample class activation map loss based on the sample class activation map and the sample segmentation result
  • the loss module 604 is used to calculate the scale class activation map loss based on the scale class activation map and the scale segmentation result;
  • the loss module 604 is configured to calculate the scale loss based on the sample segmentation result and the scale segmentation result.
  • the loss module 604 is configured to scale the sample segmentation result to the same size as the scale segmentation result according to the scale relationship between the sample image and the scale image, to obtain the scaled sample segmentation result;
  • the loss module 604 is configured to calculate the scale loss based on the error between the scale segmentation result and the scaled sample segmentation result.
  • the loss module 604 is configured to calculate the first matrix difference between the scale segmentation result and the scaled sample segmentation result, and determine the 2-norm of the first matrix difference as the scale loss.
  • the loss module 604 is configured to determine the cross-entropy of the sample class activation map and the sample segmentation result as the sample class activation map loss;
  • the loss module 604 is configured to determine the cross-entropy of the scale class activation map and the scale segmentation result as the scale class activation map loss.
  • the loss module 604 is configured to calculate the weighted sum of the sample class activation map loss, the scale class activation map loss and the scale loss;
  • the image segmentation model further includes a pre-trained fully connected layer; the apparatus further includes:
  • the classification module 606 is configured to call the fully connected layer to classify and predict the sample image features to obtain the sample classification result of the sample image; call the fully connected layer to classify and predict the scale image features to obtain the scale classification result of the scale image;
  • the class activation map module 602 is used to calculate the sample class activation map of the sample image based on the sample image feature and the sample classification result;
  • the class activation map module 602 is configured to calculate a scale class activation map of the scale image based on the scale image feature and the scale classification result.
  • the encoder is a pre-trained encoder; the apparatus further includes:
  • the initialization module 607 is configured to set the parameters of the encoder according to the parameters of the classification encoder in the trained image classification model, and the classification target of the image classification model and the image segmentation model is the same.
  • the device further includes:
  • the initialization module 607 is configured to set the parameters of the fully connected layer according to the parameters of the classification fully connected layer in the trained image classification model, and the classification target of the image classification model and the image segmentation model is the same.
  • the image classification model includes a classification encoder and a classification fully connected layer; the apparatus further includes:
  • the classification training module 608 is used to obtain data samples; call the classification encoder and the classification fully connected layer to perform feature extraction and classification on the data samples, obtain the classification vectors of the data samples, and determine the correlation between the data samples and the classification vectors; the classification vectors include categories Vectors and intra-class style vectors; introduce a category prior distribution to the category vector, and the intra-class style prior distribution to the intra-class style vector to determine the rating value of the categorical vector subject to the prior distribution; at least train on correlation and rating values Image classification model.
  • FIG. 9 shows a schematic structural diagram of an image segmentation apparatus provided by an exemplary embodiment of the present application.
  • the apparatus can be implemented as all or a part of computer equipment through software, hardware or a combination of the two.
  • the image segmentation model includes an encoder and a decoder, and the apparatus includes:
  • an acquisition module 1001 for acquiring an input image
  • the feature extraction module 1002 is used for invoking the encoder to perform feature extraction on the input image to obtain the input image feature of the input image;
  • the image segmentation module 1003 is used to call the decoder to decode the input image features and obtain the image segmentation result of the input image.
  • the decoder is trained according to the class activation map loss and the scale loss; the class activation map loss is used to train the decoder to output segmentation results close to the class activation map, the class activation map is used to represent the contribution of each pixel in the image to the classification result of the image, and the scale loss is used to train the decoder to output similar segmentation results for multiple images with the same image content and different scales.
  • the parameters of the encoder are set according to the parameters of the classification encoder of the face classification model, and the face classification model is used to identify whether the input image contains a human face;
  • the image segmentation result includes the probability value that each pixel in the input image is a human face.
  • the parameters of the encoder are set according to the parameters of the classification encoder of the lesion classification model, and the lesion classification model is used to identify whether the input image contains lesions;
  • the image segmentation result includes the probability value that each pixel in the input image is a lesion.
  • FIG. 10 is a schematic structural diagram of a server provided by an embodiment of the present application.
  • the server 800 includes a central processing unit (CPU) 801, a system memory 804 including a random access memory (RAM) 802 and a read-only memory (ROM) 803, and a system bus 805 connecting the system memory 804 and the central processing unit 801.
  • Server 800 also includes a basic input/output system (I/O system) 806 that facilitates the transfer of information between various components within the computer, and a mass storage device 807 for storing operating system 813, application programs 814, and other program modules 815 .
  • the basic input/output system 806 includes a display 808 for displaying information and an input device 809, such as a mouse or keyboard, for a user to enter information. Both the display 808 and the input device 809 are connected to the central processing unit 801 through an input/output controller 810 connected to the system bus 805.
  • the basic input/output system 806 may also include an input/output controller 810 for receiving and processing input from various other devices such as a keyboard, mouse, or electronic stylus. Similarly, input/output controller 810 also provides output to a display screen, printer, or other type of output device.
  • Mass storage device 807 is connected to the central processing unit 801 through a mass storage controller (not shown) connected to the system bus 805. The mass storage device 807 and its associated computer-readable storage medium provide non-volatile storage for the server 800. That is, the mass storage device 807 may include a computer-readable storage medium (not shown) such as a hard disk or a compact disc read-only memory (CD-ROM) drive.
  • Computer-readable storage media can include computer storage media and communication media.
  • Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
  • Computer storage media include RAM, ROM, erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other solid-state storage technology, CD-ROM, digital versatile disc (DVD) or other optical storage, tape cassettes, magnetic tape, disk storage or other magnetic storage devices.
  • the system memory 804 and the mass storage device 807 described above may be collectively referred to as memory.
  • the server 800 may also run on a remote computer connected to a network through a network such as the Internet. That is, the server 800 can be connected to the network 812 through the network interface unit 811 connected to the system bus 805, or can also use the network interface unit 811 to connect to other types of networks or remote computer systems (not shown).
  • the present application further provides a terminal, the terminal includes a processor and a memory, the memory stores at least one instruction, and the at least one instruction is loaded and executed by the processor to implement the training method of the image segmentation model provided by the above method embodiments or Image segmentation method.
  • the terminal may be the terminal provided in FIG. 11 below.
  • FIG. 11 shows a structural block diagram of a terminal 900 provided by an exemplary embodiment of the present application.
  • the terminal 900 can be a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop computer, or a desktop computer.
  • Terminal 900 may also be called a user account device, portable terminal, laptop terminal, desktop terminal, and the like by other names.
  • the terminal 900 includes: a processor 901 and a memory 902 .
  • the processor 901 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like.
  • Memory 902 may include one or more computer-readable storage media, which may be non-transitory.
  • a non-transitory computer-readable storage medium in the memory 902 is used to store at least one computer-readable instruction for execution by the processor 901 to implement the method in this application.
  • Embodiments provide an image segmentation model training method or an image segmentation method.
  • the terminal 900 may optionally further include: a peripheral device interface 903 and at least one peripheral device.
  • the processor 901, the memory 902 and the peripheral device interface 903 may be connected through a bus or a signal line.
  • Each peripheral device can be connected to the peripheral device interface 903 through a bus, a signal line or a circuit board.
  • the peripheral device includes: at least one of a radio frequency circuit 904 , a display screen 905 , a camera assembly 906 , an audio circuit 907 , a positioning assembly 908 and a power supply 909 .
  • terminal 900 also includes one or more sensors 910 .
  • the one or more sensors 910 include, but are not limited to, an acceleration sensor 911 , a gyro sensor 912 , a pressure sensor 913 , a fingerprint sensor 914 , an optical sensor 915 , and a proximity sensor 916 .
  • the structure shown in FIG. 11 does not constitute a limitation on the terminal 900, which may include more or fewer components than shown, combine some components, or adopt a different component arrangement.
  • the memory also includes one or more programs, the one or more programs are stored in the memory, and the one or more programs are configured to perform the training method for the image segmentation model or the image segmentation method provided by the embodiments of the present application.
  • the present application also provides a computer device, the computer device includes a processor and a memory, the memory stores at least one instruction, at least one piece of program, a code set or an instruction set, and the at least one instruction, at least one piece of program, code set or instruction set is loaded and executed by the processor to implement the training method for the image segmentation model or the image segmentation method provided by the above method embodiments.
  • the present application also provides a computer-readable storage medium, which stores at least one instruction, at least one piece of program, code set or instruction set, and the at least one instruction, at least one piece of program, code set or instruction set is loaded by a processor And execute it to implement the training method or image segmentation method of the image segmentation model provided by the above method embodiments.
  • the present application also provides a computer program product or computer readable instructions, the computer program product comprising computer readable instructions stored in a computer readable storage medium.
  • the processor of the computer device reads the computer-readable instructions from the computer-readable storage medium, and the processor executes the computer-readable instructions, so that the computer device executes the training method for the image segmentation model or the image segmentation method provided in the above optional implementations.
  • references herein to "a plurality” means two or more.
  • "And/or" which describes the association relationship of the associated objects, means that there can be three kinds of relationships, for example, A and/or B, which can mean that A exists alone, A and B exist at the same time, and B exists alone.
  • the character “/” generally indicates that the related objects are an "or" relationship.
  • the storage medium can be a read-only memory, a magnetic disk or an optical disk, and the like.


Abstract

A training method for an image segmentation model, including: performing feature extraction on a sample image and a scale image to obtain sample image features of the sample image and scale image features of the scale image; calculating class activation maps to obtain a sample class activation map of the sample image and a scale class activation map of the scale image; calling a decoder to decode the sample image features to obtain a sample segmentation result of the sample image, and calling the decoder to decode the scale image features to obtain a scale segmentation result of the scale image; calculating a class activation map loss and a scale loss based on the sample class activation map, the sample segmentation result, the scale class activation map and the scale segmentation result; and training the decoder based on the class activation map loss and the scale loss.

Description

Training method for image segmentation model, image segmentation method, apparatus, and device
This application claims priority to the Chinese patent application No. 202011487554.0, filed with the China Patent Office on December 16, 2020 and entitled "Training method for image segmentation model, image segmentation method, apparatus, and device", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of image segmentation, and in particular to a training method for an image segmentation model, an image segmentation method, an apparatus, and a device.
Background
Image segmentation refers to dividing an image into several disjoint regions according to features such as grayscale, color, spatial texture and geometric shape, so that these features show consistency or similarity within the same region and obvious differences between different regions. Simply put, it separates the foreground target from the background in an image.
In the related art, a threshold method is used for image segmentation. The basic idea of the threshold method is to calculate one or more grayscale thresholds based on the grayscale features of the image, compare the grayscale value of each pixel in the image with the thresholds, and finally assign the pixels to appropriate categories according to the comparison results. Therefore, the most critical step of this method is to solve for the optimal grayscale threshold according to some criterion function.
The methods in the related art require the target to have obvious edges or grayscale differences, and can only make target decisions based on shallow features such as image pixel values, so the accuracy is low.
Summary
A training method for an image segmentation model, the image segmentation model including an encoder and a decoder, the method including:
calling the encoder to perform feature extraction on a sample image and a scale image to obtain sample image features of the sample image and scale image features of the scale image, the scale image including at least one of an image obtained by enlarging the sample image and an image obtained by reducing the sample image;
calculating a class activation map based on the sample image features to obtain a sample class activation map of the sample image, and calculating a class activation map based on the scale image features to obtain a scale class activation map of the scale image, the class activation map being used to represent the contribution of each pixel in an image to the classification result of the image;
calling the decoder to decode the sample image features to obtain a sample segmentation result of the sample image, and calling the decoder to decode the scale image features to obtain a scale segmentation result of the scale image, the sample segmentation result including the classification probability value of each pixel in the sample image;
calculating a class activation map loss based on the sample class activation map, the sample segmentation result, the scale class activation map and the scale segmentation result, and calculating a scale loss based on the sample segmentation result and the scale segmentation result; and
training the decoder based on the class activation map loss and the scale loss, the class activation map loss being used to train the decoder to make the sample segmentation result close to the sample class activation map and the scale segmentation result close to the scale class activation map, and the scale loss being used to train the decoder to make the sample segmentation result close to the scale segmentation result.
A training apparatus for an image segmentation model, the image segmentation model including an encoder and a decoder, the apparatus including:
an encoding module, configured to call the encoder to perform feature extraction on a sample image and a scale image to obtain sample image features of the sample image and scale image features of the scale image, the scale image including at least one of an image obtained by enlarging the sample image and an image obtained by reducing the sample image;
a class activation map module, configured to calculate a class activation map based on the sample image features to obtain a sample class activation map of the sample image, and calculate a class activation map based on the scale image features to obtain a scale class activation map of the scale image, the class activation map being used to represent the contribution of each pixel in an image to the classification result of the image;
a decoding module, configured to call the decoder to decode the sample image features to obtain a sample segmentation result of the sample image, and call the decoder to decode the scale image features to obtain a scale segmentation result of the scale image, the sample segmentation result including the classification probability value of each pixel in the sample image;
a loss module, configured to calculate a class activation map loss based on the sample class activation map, the sample segmentation result, the scale class activation map and the scale segmentation result, and calculate a scale loss based on the sample segmentation result and the scale segmentation result; and
a training module, configured to train the decoder based on the class activation map loss and the scale loss, the class activation map loss being used to train the decoder to make the sample segmentation result close to the sample class activation map and the scale segmentation result close to the scale class activation map, and the scale loss being used to train the decoder to make the sample segmentation result close to the scale segmentation result.
An image segmentation method, the method including:
acquiring an input image;
calling an encoder to perform feature extraction on the input image to obtain input image features of the input image; and
calling a decoder to decode the input image features to obtain an image segmentation result of the input image, the decoder being trained according to a class activation map loss and a scale loss, the class activation map loss being used to train the decoder to output segmentation results close to the class activation map, the class activation map being used to represent the contribution of each pixel in an image to the classification result of the image, and the scale loss being used to train the decoder to output similar segmentation results for multiple images with the same image content and different scales.
An image segmentation apparatus, the apparatus including:
an acquisition module, configured to acquire an input image;
a feature extraction module, configured to call an encoder to perform feature extraction on the input image to obtain input image features of the input image; and
an image segmentation module, configured to call a decoder to decode the input image features to obtain an image segmentation result of the input image, the decoder being trained according to a class activation map loss and a scale loss, the class activation map loss being used to train the decoder to output segmentation results close to the class activation map, the class activation map being used to represent the contribution of each pixel in an image to the classification result of the image, and the scale loss being used to train the decoder to output similar segmentation results for multiple images with the same image content and different scales.
A computer device, including a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to implement the training method for the image segmentation model or the image segmentation method described above.
One or more non-volatile computer-readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to implement the training method for the image segmentation model or the image segmentation method described above.
A computer program product or computer program, including computer-readable instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer-readable instructions from the computer-readable storage medium and executes them, so that the computer device performs the training method for the image segmentation model or the image segmentation method provided in the above optional implementations.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application, and a person of ordinary skill in the art may derive other drawings from these drawings without creative effort.
FIG. 1 is a block diagram of a computer device provided by an exemplary embodiment of this application;
FIG. 2 is a flowchart of a training method for an image segmentation model provided by another exemplary embodiment of this application;
FIG. 3 is a flowchart of a training method for an image segmentation model provided by another exemplary embodiment of this application;
FIG. 4 is a flowchart of a training method for an image classification model provided by another exemplary embodiment of this application;
FIG. 5 is a schematic diagram of a training method for an image classification model provided by another exemplary embodiment of this application;
FIG. 6 is a schematic diagram of a training method for an image segmentation model provided by another exemplary embodiment of this application;
FIG. 7 is a flowchart of an image segmentation method provided by another exemplary embodiment of this application;
FIG. 8 is a block diagram of a training apparatus for an image segmentation model provided by another exemplary embodiment of this application;
FIG. 9 is a block diagram of an image segmentation apparatus provided by another exemplary embodiment of this application;
FIG. 10 is a schematic structural diagram of a server provided by another exemplary embodiment of this application;
FIG. 11 is a block diagram of a terminal provided by another exemplary embodiment of this application.
Detailed Description
To make the objectives, technical solutions and advantages of this application clearer, the implementations of this application are further described in detail below with reference to the accompanying drawings.
FIG. 1 shows a schematic diagram of a computer device 101 provided by an exemplary embodiment of this application. The computer device 101 may be a terminal or a server.
The terminal may include at least one of a smart phone, a notebook computer, a desktop computer, a tablet computer, a smart speaker, an in-vehicle terminal and an intelligent robot. In an optional implementation, a client supporting an image segmentation function is installed on the terminal; for example, the client may be an image processing application or a video processing application. For example, the image processing application provides an image segmentation function for intelligently recognizing the face region in an image so that the face region can be automatically optimized. Alternatively, the video processing application provides a batch target modification function for intelligently recognizing targets (people, plants, animals, objects, etc.) in each frame of a video and performing unified image processing on the targets in each frame, for example, adjusting color, brightness and saturation.
Exemplarily, an image segmentation model is stored on the terminal. When the client needs to use the image segmentation function, the client can call the image segmentation model to complete image segmentation. For example, when a user needs to optimize a target image, the client calls the image segmentation model to segment the target image, obtains the target region in the target image, and automatically optimizes the target region.
The terminal and the server are connected to each other through a wired or wireless network.
Exemplarily, the method provided by this application may be executed by the client on the terminal or by the server. That is, the training of the image segmentation model may be completed by the client or by the server. Exemplarily, in the application stage after the image segmentation model is trained, the client may call the locally stored image segmentation model to perform image segmentation; the client may also send an image segmentation request to the server, and the server calls the image segmentation model to perform image segmentation; or, when the server needs to segment a received image, it calls the image segmentation model to perform image segmentation.
The terminal includes a first memory and a first processor. The first memory stores an image segmentation model, which is called and executed by the first processor to implement the training method for the image segmentation model provided by this application. The first memory may include but is not limited to: random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), and electrically erasable programmable read-only memory (EEPROM).
The first processor may be composed of one or more integrated circuit chips. Optionally, the first processor may be a general-purpose processor, such as a central processing unit (CPU) or a network processor (NP). Optionally, the first processor may implement the training method for the image segmentation model provided by this application by running a program or code.
The server includes a second memory and a second processor. The second memory stores an image segmentation model, which is called by the second processor to implement the training method for the image segmentation model provided by this application. Exemplarily, the server receives and stores user data sent by the terminal and annotates information objects based on the user data. Optionally, the second memory may include but is not limited to RAM, ROM, PROM, EPROM and EEPROM. Optionally, the second processor may be a general-purpose processor, such as a CPU or an NP.
Exemplarily, the image segmentation model stored in the terminal or the server includes an encoder 102 and a decoder 103.
Exemplarily, the computer device calls the encoder 102 to perform feature extraction on the sample image X and the scale image R(X) to obtain the sample image features of the sample image X and the scale image features of the scale image R(X), where the scale image R(X) is an image obtained by up-sampling the sample image X; that is, the size of the scale image R(X) is twice that of the sample image X.
A class activation map 104 is calculated based on the sample image features to obtain the sample class activation map of the sample image, and a class activation map 104 is calculated based on the scale image features to obtain the scale class activation map of the scale image; the class activation map 104 is used to represent the contribution of each pixel in an image to the classification result of the image.
The decoder 103 is called to decode the sample image features to obtain the sample segmentation result of the sample image, and the decoder is called to decode the scale image features to obtain the scale segmentation result of the scale image; the sample segmentation result includes the classification probability value of each pixel in the sample image.
A loss L_seg is calculated based on the sample class activation map, the sample segmentation result, the scale class activation map and the scale segmentation result; the loss L_seg includes the class activation map loss and the scale loss; the class activation map loss is used to train the decoder to make the sample segmentation result close to the sample class activation map and the scale segmentation result close to the scale class activation map; the scale loss is used to train the decoder to make the sample segmentation result close to the scale segmentation result.
The decoder 103 is trained based on the class activation map loss and the scale loss.
FIG. 2 shows a flowchart of a training method for an image segmentation model provided by an exemplary embodiment of this application. The method may be executed by a computer device, for example, the computer device shown in FIG. 1. The method includes the following steps.
Step 201: call the encoder to perform feature extraction on the sample image and the scale image to obtain the sample image features of the sample image and the scale image features of the scale image, the scale image including at least one of an image obtained by enlarging the sample image and an image obtained by reducing the sample image.
Exemplarily, the image segmentation model includes an encoder and a decoder. The image segmentation model is used to segment an input image to obtain the region where the classification target is located in the image. The image segmentation model may be used to identify at least one classification target; for example, it may be used to identify the pixels where cats, dogs and people are located in an image.
Exemplarily, when the image segmentation model is used to segment N classification targets in an image, the image segmentation model can output, for the input image, N probability values of each pixel belonging to the N classification targets, yielding N classification probability maps of the image for the N classification targets. The pixel value of each pixel on the i-th classification probability map is the probability value of that pixel belonging to the i-th classification target, where N is a positive integer and i is a positive integer not greater than N.
Exemplarily, the classification target to which each pixel of the image belongs can be determined from the N classification probability maps of the image; for example, among the N probability values of a pixel, the classification target with the largest probability value is determined as the classification target of that pixel. For example, if a pixel has a probability of 0.1 of belonging to a cat, 0.2 of belonging to a dog and 0.7 of belonging to a person, the classification target of this pixel is person. Exemplarily, the output of the image segmentation model may be a classification target map of the image, in which the value of each pixel represents the classification target to which the pixel belongs; the positions (pixels) of the classification targets can thus be seen from the classification target map.
In summary, the segmentation result output by the image segmentation model may be the above N classification probability maps or the above classification target map.
Since enlarging or reducing an image does not change the position distribution of the classification targets in the image, the image segmentation result of the enlarged or reduced image should be the same as that of the original image (after the segmentation results are scaled to the same scale). Based on this idea, the method provided in this embodiment trains the image segmentation network so that the segmentation results it outputs for the original image and the scaled image approach each other.
Therefore, when acquiring training samples for the image segmentation model, it is necessary to acquire a sample image and a scale image corresponding to the sample image, the scale image including at least one of an enlarged sample image and a reduced sample image. Exemplarily, the sample image may be enlarged by up-sampling and reduced by down-sampling.
That is, a group of training samples includes at least one of the following three combinations:
1) a sample image and an enlarged-scale image obtained by up-sampling the sample image;
2) a sample image and a reduced-scale image obtained by down-sampling the sample image;
3) a sample image, an enlarged-scale image obtained by up-sampling the sample image, and a reduced-scale image obtained by down-sampling the sample image.
Exemplarily, the method steps of this embodiment describe the training method for the image segmentation model provided by this application by taking one group of training samples as an example; based on the method provided in this embodiment, multiple groups of training samples may be used to iteratively train the image segmentation model.
Exemplarily, after the sample image and the scale image are acquired, the sample image and the scale image are respectively input into the encoder for feature extraction to obtain the sample image features of the sample image and the scale image features of the scale image.
Exemplarily, the encoder used in this embodiment is an encoder that has already been trained and has a high feature extraction capability. In the training method of this embodiment, only the decoder of the image segmentation model is trained, and the encoder is not trained. The encoder may be the encoder of any trained image classification model. The image classification model is used to output the classification target of an input image. Exemplarily, the classification targets recognized by the image classification model are the same as those recognized by the image segmentation model, that is, the image classification model is also used to identify the N classification targets.
Exemplarily, the encoder is a convolutional neural network (CNN), and its specific network structure may be adjusted according to the image size.
Step 202: calculate a class activation map based on the sample image features to obtain the sample class activation map of the sample image, and calculate a class activation map based on the scale image features to obtain the scale class activation map of the scale image; the class activation map is used to represent the contribution of each pixel in an image to the classification result of the image.
Exemplarily, since the encoder is taken from a trained image classification model, the image features extracted by the encoder already contain the feature information used to classify the image. Therefore, by calculating the class activation map based on the image features output by the encoder, the pixels on which the image classification model relies when classifying the image can be obtained. The class activation map is used to train the decoder so that the image segmentation result output by the decoder is close to the class activation map.
Step 203: call the decoder to decode the sample image features to obtain the sample segmentation result of the sample image, and call the decoder to decode the scale image features to obtain the scale segmentation result of the scale image; the sample segmentation result includes the classification probability value of each pixel in the sample image.
Exemplarily, the decoder is used to decode the image features output by the encoder to obtain the segmentation result.
Exemplarily, the decoder is composed of a multi-layer convolutional neural network. Usually, the decoder adopts a network structure that is the inverse of the encoder. For example, the encoder includes four convolution blocks, each consisting of two convolution layers with 3×3 kernels; after each convolution block, the encoder is followed by a pooling layer with a 2×2 kernel, which halves the size of the image features output by the block. Correspondingly, the decoder may also include four convolution blocks, each consisting of two convolution layers with 3×3 kernels; before each convolution block, the decoder has an up-sampling layer that doubles the size of the image features, so that the size of the image input to the encoder is the same as the size of the image output by the decoder. A minimal sketch of such a decoder follows (the channel counts and the output activation are assumptions, not values stated in this description):

```python
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    # Two 3x3 convolution layers, as described for each block of the encoder/decoder.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class Decoder(nn.Module):
    """Mirror of a four-block encoder: upsample by 2 before each convolution block."""

    def __init__(self, channels=(512, 256, 128, 64), num_targets: int = 1):
        super().__init__()
        stages = []
        out_channels = channels[1:] + (channels[-1],)
        for in_ch, out_ch in zip(channels, out_channels):
            stages.append(nn.Sequential(
                nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
                conv_block(in_ch, out_ch),
            ))
        self.stages = nn.Sequential(*stages)
        self.head = nn.Conv2d(channels[-1], num_targets, kernel_size=1)

    def forward(self, features):
        return self.head(self.stages(features)).sigmoid()
```
Step 204: calculate the class activation map loss based on the sample class activation map, the sample segmentation result, the scale class activation map and the scale segmentation result, and calculate the scale loss based on the sample segmentation result and the scale segmentation result; the class activation map loss is used to train the decoder to make the sample segmentation result close to the sample class activation map and the scale segmentation result close to the scale class activation map; the scale loss is used to train the decoder to make the sample segmentation result close to the scale segmentation result.
Exemplarily, based on the above idea of making the segmentation result approach the class activation map, the class activation map loss between the class activation map and the segmentation result is calculated. Based on the above idea that the segmentation result of the image remains unchanged after the scale is changed, the scale loss between the sample segmentation result and the scale segmentation result is calculated.
Step 205: train the decoder based on the class activation map loss and the scale loss.
Exemplarily, the decoder is trained based on the above class activation map loss and scale loss, so that the decoder outputs segmentation results based on the class activation map while ensuring that the segmentation result of the image remains unchanged after the scale is changed.
In summary, the method provided in this embodiment is based on the idea that the image segmentation result of the same picture should not change after scaling. The sample image and the scale image obtained by scaling the sample image are respectively input into the encoder and the decoder for image segmentation to obtain the image segmentation results of the two images. The scale loss can be calculated from the two image segmentation results and is used to train the decoder so that the two image segmentation results approach each other, ensuring that the image segmentation result does not change when the image size changes. Based on the image features output by the encoder, the class activation maps of the two images are calculated respectively. Since the class activation map reflects the main pixels on which the image classification is based, the image segmentation result should be close to the class activation map. The class activation map loss is calculated from the segmentation result of the image and the class activation map of the image and is used to train the decoder, so that the image segmentation result output by the decoder can be close to the class activation map while ensuring that the image segmentation result obtained after the image is scaled does not change.
示例性的,给出一种计算类激活图和损失的示例性实施例。
图3示出了本申请一个示例性实施例提供的图像分割模型的训练方法的流程图。该方法可以由计算机设备来执行,例如,如图1所示的终端或服务器来执行。在图2所示的示 例性实施例的基础上,步骤202之前还包括步骤301,步骤202还包括步骤2021,步骤204还包括步骤2041至步骤2043,步骤205还包括步骤2051至步骤2052。
步骤301,调用全连接层对样本图像特征进行分类预测得到样本图像的样本分类结果;调用全连接层对尺度图像特征进行分类预测得到尺度图像的尺度分类结果。
示例性的,图像分割模型还包括经过预训练的全连接层。
示例性的,图2所示的实施例中步骤201所提到的图像分类模型中还包括了全连接层,即,图像分类模型包括编码器和全连接层,图像输入编码器进行特征提取后,经过全连接层得到分类结果。图像分割模型还采用了图像分类模型的全连接层,根据编码器输出的图像特征分别输出样本图像和尺度图像的分类结果。
示例性的，以图像分割模型和图像分类模型是用于识别N个分类目标的模型为例，则分类结果为图像属于N个分类目标的N个概率值组成的向量。
步骤2021,基于样本图像特征和样本分类结果计算得到样本图像的样本类激活图;基于尺度图像特征和尺度分类结果计算得到尺度图像的尺度类激活图。
示例性的，类激活图的计算公式为:

w_i^c = (1/(N×M)) · Σ_{k=1}^{N} Σ_{j=1}^{M} ∂S_c/∂A^i_{kj}

其中，c为N个分类目标中的第c个分类目标，S_c为分类结果中第c类的概率值，N*M为图像特征的大小，A^i_{kj}为第i个图像特征第k行第j列的像素值，∂S_c/∂A^i_{kj}表示第i个图像特征第k行第j列的像素对图像被分为第c个分类目标的贡献值，w_i^c为据此得到的第i个图像特征对第c个分类目标的权重。

CAM_c = ReLU( Σ_i w_i^c · A_i )

其中，CAM_c为图像在第c个分类目标的类激活图；ReLU为激活函数，表示类激活图只关注取值大于0的像素点，A_i为第i个图像特征。
示例性的,当图像分割模型用于识别N个分类目标时,根据上述计算公式可以计算得到图像在N个分类目标的N个类激活图,第i个类激活图用于表示图像中各个像素点对图像被分为第i个分类目标的贡献值。
即,样本类激活图包括了样本图像在N个分类目标的N个类激活图;尺度类激活图包括了尺度图像在N个分类目标的N个类激活图。
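示例性的，下面给出一个按照上述公式计算某一分类目标的类激活图的示意性代码草图（基于PyTorch的自动求导计算贡献值，函数与变量命名为示例性假设）：

```python
import torch
import torch.nn.functional as F

def class_activation_map(features, class_score):
    """features: 编码器输出的图像特征，形状 (C, N, M)，需位于计算图中;
       class_score: 全连接层输出的第 c 类概率值 S_c（标量）。"""
    # 贡献值：S_c 对每个特征像素的偏导数，形状与 features 相同
    grads = torch.autograd.grad(class_score, features, retain_graph=True)[0]
    # 对每个图像特征在空间维度上取平均，得到权重 w_i^c
    weights = grads.mean(dim=(1, 2))                                   # (C,)
    # 加权求和后经 ReLU，只保留取值大于 0 的像素点，得到类激活图 (N, M)
    cam = F.relu((weights[:, None, None] * features).sum(dim=0))
    return cam
```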
步骤2041,基于样本类激活图和样本分割结果计算样本类激活图损失。
示例性的,根据样本分割结果与样本类激活图的误差计算得到样本类激活图损失。示例性的,将样本类激活图和样本分割结果的交叉熵确定为样本类激活图损失。
样本类激活图损失的计算公式如下:
L seg,1=-y 1log(s 1)
其中,L seg,1为样本类激活图损失,y 1为样本类激活图,s 1为样本分割结果。
示例性的,类激活图与分割结果的尺寸不同,还需要将类激活图上采样至与分割结果相同的尺寸,然后根据上述公式进行类激活图损失的计算。由于分割结果的尺寸与输入图像分割模型的原图像的尺寸相同,即,样本分割结果的尺寸与样本图像的尺寸相同。而类激活图是基于编码器输出的图像特征计算得到的,类激活图的尺寸与编码器输出的图像特征的尺寸相同,而图像特征的尺寸小于原图像,因此,类激活图的尺寸小于分割结果。对此,需要对类激活图进行上采样,放大至分割结果的尺寸,然后使用放大后的类激活图代入上述公式进行类激活图损失的计算。
示例性的,上述式子是图像分割网络的分类目标为1个时的样本类激活图损失的计算公式,当图像分割网络的分类目标为N个时,样本类激活图损失的计算公式为:
L_seg,1 = −Σ_{i=1}^{N} y_1i·log(s_1i)
其中,L seg,1为样本类激活图损失,y 1i为第i个分类目标的样本类激活图,s 1i为第i个分类目标的样本分割结果。
步骤2042,基于尺度类激活图和尺度分割结果计算尺度类激活图损失。
示例性的,根据尺度分割结果与尺度类激活图的误差计算得到尺度类激活图损失。示例性的,将尺度类激活图和尺度分割结果的交叉熵确定为尺度类激活图损失。
尺度类激活图损失的计算公式如下:
L seg,2=-y 2log(s 2)
其中,L seg,2为尺度类激活图损失,y 2为尺度类激活图,s 2为尺度分割结果。
示例性的,上述式子是图像分割网络的分类目标为1个时的尺度类激活图损失的计算公式,当图像分割网络的分类目标为N个时,尺度类激活图损失的计算公式为:
L_seg,2 = −Σ_{i=1}^{N} y_2i·log(s_2i)
其中,L seg,2为尺度类激活图损失,y 2i为第i个分类目标的尺度类激活图,s 2i为第i个分类目标的尺度分割结果。
示例性的,当尺度图像只包括缩小尺度图像和放大尺度图像中的一种时,可以采用上述公式计算尺度类激活图损失。当尺度图像包括缩小尺度图像和放大尺度图像两个尺度图像时,则尺度类激活图损失包括缩小尺度类激活图损失和放大尺度类激活图损失。则分别利用上述公式计算出缩小尺度图像的缩小尺度类激活图损失L seg,2.1,和放大尺度图像的放大尺度类激活图损失L seg,2.2
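示例性的，下面给出一个计算类激活图损失的示意性代码草图（基于PyTorch，先将类激活图上采样至分割结果的尺寸，再按交叉熵形式计算损失；命名与形状为示例性假设）：

```python
import torch
import torch.nn.functional as F

def cam_loss(cam, seg_prob, eps=1e-7):
    """cam: 类激活图 y，形状 (N_cls, h, w)，尺寸小于分割结果;
       seg_prob: 解码器输出的分割结果 s，形状 (N_cls, H, W)，取值为分类概率值。"""
    # 先将类激活图上采样至与分割结果相同的尺寸
    cam_up = F.interpolate(cam[None], size=seg_prob.shape[-2:],
                           mode='bilinear', align_corners=False)[0]
    # 交叉熵形式的类激活图损失：对每个像素点计算 -Σ_i y_i·log(s_i)，再对像素取平均
    return -(cam_up * torch.log(seg_prob + eps)).sum(dim=0).mean()

# 样本类激活图损失与尺度类激活图损失可分别计算：
# l_seg1 = cam_loss(sample_cam, sample_seg)
# l_seg2 = cam_loss(scale_cam,  scale_seg)
```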
步骤2043,基于样本分割结果和尺度分割结果计算尺度损失。
示例性的，由于样本图像和尺度图像的大小不同，则输出的样本分割结果和尺度分割结果的大小也不同，因此，为了比较二者的差异，需要将样本分割结果和尺度分割结果缩放至同一尺寸。示例性的，由于编码器和解码器的结构是对应的，则输出的分割结果和输入的图像尺寸相同，因此，按照样本图像和尺度图像的尺度关系，将样本分割结果和尺度分割结果缩放至同一尺度。
即,计算机设备根据样本图像和尺度图像的尺度关系,将样本分割结果缩放至与尺度分割结果相同的尺寸,得到缩放后的样本分割结果;根据尺度分割结果与缩放后的样本分割结果的误差计算尺度损失。
例如,尺度图像是对样本图像进行上采样,放大至两倍后获得的图像。则将样本分割结果进行上采样,放大至两倍后获得缩放后的样本分割结果。
示例性的,计算尺度分割结果与缩放后的样本分割结果的第一矩阵差,将第一矩阵差的2范数确定为尺度损失。
尺度损失的计算公式可以是:
L seg,3=||s 2-R(s 1)|| 2
其中,L seg,3为尺度损失,s 2为尺度分割结果,s 1为样本分割结果,R(s 1)为缩放后的样本分割结果。
示例性的,上述式子是图像分割网络的分类目标为1个时的尺度损失的计算公式,当图像分割网络的分类目标为N个时,尺度损失的计算公式为:
L_seg,3 = Σ_{i=1}^{N} ||s_2i − R(s_1i)||_2
其中,L seg,3为尺度损失,s 2i为第i个分类目标的尺度分割结果,s 1i为第i个分类目标的样本分割结果,R(s 1i)为第i个分类目标的缩放后的样本分割结果。
示例性的,当尺度图像只包括缩小尺度图像和放大尺度图像中的一种时,可以采用上述公式计算尺度损失。当尺度图像包括缩小尺度图像和放大尺度图像两个尺度图像时,则尺度损失的计算公式为:
L seg,3=||s 4-R 4(s 1)|| 2+||s 5-R 5(s 1)|| 2
其中,L seg,3为尺度损失,s 4为缩小尺度图像的缩小尺度分割结果,s 5为放大尺度图像的放大尺度分割结果,s 1为样本分割结果,R 4(s 1)为按照缩小尺度图像与样本图像的尺度关系缩小后的样本分割结果,R 5(s 1)为按照放大尺度图像与样本图像的尺度关系放大后的样本分割结果。
示例性的,上述式子是图像分割网络的分类目标为1个时的尺度损失的计算公式,当图像分割网络的分类目标为N个时,尺度损失的计算公式为:
L_seg,3 = Σ_{i=1}^{N} ( ||s_4i − R_4(s_1i)||_2 + ||s_5i − R_5(s_1i)||_2 )
其中,L seg,3为尺度损失,s 4i为缩小尺度图像在第i个分类目标的缩小尺度分割结果,s 5i为放大尺度图像在第i个分类目标的放大尺度分割结果,s 1i为第i个分类目标的样本分割结果;R 4(s 1i)为第i个分类目标的样本分割结果,按照缩小尺度图像与样本图像的尺度关系缩小后的样本分割结果;R 5(s 1i)为第i个分类目标的样本分割结果,按照放大尺度图像与样本图像的尺度关系放大后的样本分割结果。
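示例性的，下面给出一个计算尺度损失的示意性代码草图（基于PyTorch，以尺度图像为样本图像2倍上采样为例，先将样本分割结果放大至相同尺寸，再按2范数计算差异；命名与形状为示例性假设）：

```python
import torch
import torch.nn.functional as F

def scale_loss(sample_seg, scale_seg):
    """sample_seg: 样本分割结果 s1，形状 (N_cls, H, W);
       scale_seg:  尺度分割结果 s2，形状 (N_cls, 2H, 2W)（假设尺度图像为 2 倍上采样）。"""
    # 按样本图像与尺度图像的尺度关系，将样本分割结果放大至与尺度分割结果相同的尺寸
    resized = F.interpolate(sample_seg[None], size=scale_seg.shape[-2:],
                            mode='bilinear', align_corners=False)[0]
    # 对每个分类目标计算第一矩阵差的 2 范数，再求和作为尺度损失
    diffs = [torch.linalg.norm((scale_seg[i] - resized[i]).flatten())
             for i in range(scale_seg.shape[0])]
    return torch.stack(diffs).sum()
```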
步骤2051,计算样本类激活图损失、尺度类激活图损失和尺度损失的加权和。
示例性的,L seg,1和L seg,2代表约束解码器输出的分割结果向对应的类激活图靠近。L seg,3代表图像经过尺度变换后得到的分割结果,应该与原图像的分割结果经过相同的尺度变换后保持一致。
基于上述三种损失(样本类激活图损失、尺度类激活图损失和尺度损失),计算总损失:
L seg=L seg,1+L seg,2+λL seg,3
其中,L seg为总损失,L seg,1为样本类激活图损失,L seg,2为尺度类激活图损失,L seg,3为尺度损失,λ为尺度损失的权重。
示例性的,样本类激活图损失的权重为1、尺度类激活图损失的权重为1、尺度损失的权重为2,即,λ=2。
示例性的,当尺度图像只包括缩小尺度图像和放大尺度图像中的一种时,可以采用上述公式计算总损失。当尺度图像包括缩小尺度图像和放大尺度图像两个尺度图像时,则尺度类激活图损失包括缩小尺度类激活图损失和放大尺度类激活图损失。则总损失为:
L seg=L seg,1+L seg,2.1+L seg,2.2+λL seg,3
其中,L seg为总损失,L seg,1为样本类激活图损失,L seg,2.1为缩小尺度类激活图损失,L seg,2.2为放大尺度类激活图损失,L seg,3为尺度损失,λ为尺度损失的权重。
步骤2052,根据加权和训练解码器。
示例性的,根据计算出的加权和(总损失)训练解码器,约束分割结果向类激活图靠近,约束相同图像不同尺寸的分割结果保持一致。
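示例性的，下面给出一个根据加权和训练解码器的单步优化的示意性代码草图（基于PyTorch，三项损失假设已按前述公式计算得到；优化器超参数为示例性取值，与下文图6实施例中的取值一致）：

```python
import torch

def make_decoder_optimizer(decoder):
    # 基于 Adam 的梯度下降，学习率与 β 取值为示例性取值
    return torch.optim.Adam(decoder.parameters(), lr=1e-4, betas=(0.5, 0.9))

def train_decoder_step(optimizer, l_seg1, l_seg2, l_seg3, lam=2.0):
    """l_seg1/l_seg2/l_seg3 为样本类激活图损失、尺度类激活图损失和尺度损失，lam 为 λ。"""
    total = l_seg1 + l_seg2 + lam * l_seg3   # L_seg = L_seg,1 + L_seg,2 + λ·L_seg,3
    optimizer.zero_grad()
    total.backward()      # 反向传播，只更新解码器参数；编码器与全连接层保持固定
    optimizer.step()
    return total.detach()
```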
示例性的,在完成对图像分割模型的训练后,可以利用图像分割模型进行图像分割。
即,计算机设备调用图像分割模型对输入图像进行图像分割,得到图像分割结果,图像分割模型包括编码器和解码器。
例如,调用编码器对输入图像进行特征提取,得到输入图像的输入图像特征;调用解码器对输入图像特征进行解码,得到输入图像的图像分割结果。
综上所述,本实施例提供的方法,通过使用预先训练好的图像分类模型的编码器进行特征提取,使编码器输出的图像特征具有分类所需的特征信息,使用图像分类模型的全连接层输出图像的分类结果,然后利用分类结果和图像特征求出图像的类激活图。虽然类激活图也可以输出为一种图像分割结果,但由于图像尺度变化会极大地影响类激活图,而正常来说尺度的变化不会影响图像的分割结果,因此,基于尺度不变性的思路,引入同一张图像的两种尺度,用两种尺度的两个分割结果的误差训练解码器,使解码器能够在准确进行图像分割的同时,保证尺度变换后分割结果不变。
示例性的,图像分割模型中的编码器和全连接层使用了已经训练好的图像分类模型的编码器和全连接层。示例性的,假设图像分类模型中的编码器为分类编码器,全连接层为分类全连接层,在图像分割模型的初始化阶段,计算机设备根据已训练完毕的图像分类模型中分类编码器的参数,设置编码器的参数,图像分类模型与图像分割模型的分类目标相同;根据已训练完毕的图像分类模型中分类全连接层的参数,设置全连接层的参数,图像分类模型用于根据输入的图像输出分类结果。
给出一种训练图像分类模型的示例性实施例。
图4示出了本申请一个示例性实施例提供的图像分割模型的训练方法的流程图。该方法可以由计算机设备来执行,例如,如图1所示的终端或服务器来执行。该方法包括以下步骤。
步骤401,获取数据样本。
步骤402,调用分类编码器和分类全连接层对数据样本进行特征提取和分类,得到数据样本的分类向量,确定数据样本和分类向量的相关性;分类向量包括类别向量和类内风格向量。
计算机设备获取数据样本集。数据样本集中包括数据样本。数据样本为图像。计算机设备中预先建立了图像分类模型，图像分类模型包括编码器以及连接在编码器之后的全连接层。图像分类模型可以采用多种神经网络。例如，图像分类模型可以采用卷积神经网络，其卷积块的数量可以根据图像大小进行调整。图像越大，卷积块的数量相应地越多。如，对于32*32的图像，可以采用2个卷积块，对于96*96的图像，可以采用4个卷积块。
计算机设备将数据样本输入至图像分类模型,图像分类模型的编码器对数据样本进行特征提取得到样本特征,全连接层根据样本特征输出数据样本的分类向量。分类向量包含类别向量以及类内风格向量。其中,类别向量中的元素为该数据样本属于各分类目标的概率。类内风格向量描述了数据样本的类内风格信息。计算机设备可以利用其它网络模型对图像分类模型进行训练。其中,计算机设备可以利用判别器确定数据样本和分类向量的相关性(数据样本与根据数据样本得到的分类向量相对应),利用评价器确定分类向量服从于先验分布的评分值。计算机设备对图像分类模型进行训练的整体网络结构图,可以如图5所示。
判别器501是由多个全连接层组成的深度神经网络。例如,可以是由三个或三个以上的全连接层组成的深度神经网络。
评价器502是由多个全连接层组成的深度神经网络。可以是由三个或三个以上的全连接层组成的深度神经网络。
判别器501可以判断数据样本与分类向量之间是否相关,以此对数据样本与分类向量之间的互信息进行最大化。计算机设备可以将数据样本和提取到的分类向量同时输入至判别器501。其中,数据样本中包括第一样本和第二样本。当输入至判别器501的数据样本为第一样本,提取到的分类向量来源于第二样本,且第一样本与第二样本不同时,则第一样本与该分类向量是负样本,判别器501判断两者不相关。当输入至判别器501的数据样本为第一样本,提取到的分类向量来源于第一样本,则第一样本与提取到的分类向量是正样本,判别器501判断两者相关。在图5中,鞋子图像503可以作为第一样本,衣服图像504可以作为第二样本,编码器和全连接层根据输入的第一样本得到第一分类向量,根据输入的第二样本得到第二分类向量。第一样本与第一分类向量相关,第一样本与第二分类向量不相关。当判别器501能够正确地判断数据样本与分类向量是否相关时,说明分类向量中蕴含了与数据样本相关的信息,从而能够达到最大化互信息的目的。
步骤403,对类别向量引入类别先验分布,对类内风格向量引入类内风格先验分布,以确定分类向量服从于先验分布的评分值。
评价器为分类向量引入先验分布。评价器也是由多个全连接层组成的深度神经网络。 可以是由三个或三个以上的全连接层组成的深度神经网络。
先验分布包括类别先验分布和类内风格先验分布。类别先验分布可以简称为类别分布，类内风格先验分布可以是高斯分布。评价器为类别向量z_c引入类别分布Cat(K, p=1/K)，为类内风格向量z_s引入高斯分布N(0, σ²)，由此可以将类别向量与类内风格向量进行有效解耦。
当分类向量服从先验分布时,使得输出的类别特征部分为独热向量,可以直接利用独热向量中数值最大的元素来代表数据样本的类别,避免还需要进行下一步分类操作。同时,还可以防止数据样本只被聚到1类或几类中,能够保证聚成所需的类别数,如聚到10类。
步骤404,至少根据相关性和评分值训练图像分类模型。
计算机设备可以利用数据样本和分类向量的相关性、分类向量服从于先验分布的评分值对图像分类模型的网络参数进行反向优化。其中,可以采用反向传播法对图像分类模型中各网络参数进行优化。例如,反向传播法可以采用基于Adam的梯度下降法。对图像分类模型进行反向优化时,可以对图像分类模型、判别器以及评价器的网络参数进行权重更新。训练时,学习率为0.0001,控制损失函数收敛的参数β 1设为0.5,β 2设为0.9。批大小(batch size)设为64。在反向优化过程中,可以每次利用同一批次的数据样本对评价器、图像分类模型和判别器交替进行优化。当评价器的损失函数开始收敛时,说明图像分类模型学习到的分类向量已经靠近先验分布,可以停止训练。
在一个实施例中,图像分类模型的训练方法还包括:对数据样本进行增强处理,通过图像分类模型,映射得到增强后的分类向量;增强后的分类向量包括增强后的类别向量和增强后的类内风格向量;确定类别向量和增强后的类别向量的类别特征差异;至少根据相关性和评分值训练图像分类模型包括:根据相关性、类别特征差异和评分值训练图像分类模型。
计算机设备将数据样本输入图像分类模型,经过编码器和全连接层得到对应的分类向量。分类向量包括类别向量和类内风格向量。其中,类别向量是经过Softmax函数激活后的向量,该向量中的元素表示数据样本属于各分类目标的概率,向量维度设为分类目标的数量。类内风格向量为线性激活后的向量。该向量描述了数据样本的类内风格信息,向量维度可以是预设数量,例如可以是50。示例性的,类内风格信息是指,对于属于同一分类目标的多个图像,图像与图像间存在的风格上的差异信息。类别向量与类内风格向量经过不同的激励后,得到的数值不同,但部分信息可能混在一起。通过对类别向量引入类别先验分布,对类内风格向量引入类内风格先验分布,可以将类别向量与类内风格向量进行有效解耦。
由于同一类数据样本会存在不同的风格，风格的改变不会改变数据样本所属的分类目标。即，基于特定的数据增强不会改变数据样本所属的分类目标这一现象，本实施例中，通过对数据样本进行数据增强处理，通过训练使得增强后的分类向量不会发生变化。计算机设备对数据样本进行增强处理，增强处理包括对图像进行随机裁剪、随机水平翻转、颜色抖动和随机组合颜色通道等。将增强处理后的数据样本输入至图像分类模型，经过编码器和全连接层得到增强后的分类向量。计算机设备在分类向量中提取类别向量，在增强后的分类向量中提取增强后的类别向量，将类别向量与增强后的类别向量输入至评价器，通过评价器识别类别向量与增强后的类别向量之间的类别特征差异。其中，类别向量中的元素是数据样本属于各分类目标的概率值。类别向量与增强后的类别向量之间的类别特征差异可以通过散度来进行衡量。
计算机设备可以利用数据样本和分类向量的相关性、分类向量服从于先验分布的评分值以及类别向量和增强后的类别向量的类别特征差异对图像分类模型的网络参数进行反向优化。在网络的反向传播过程中，使用梯度下降更新图像分类模型、判别器和评价器的网络参数对应的权重值。由此能够使得图像分类模型学习到分类向量与数据样本相关，学习到类别向量可以代表数据样本的分类目标，学习到类内风格向量可以代表同一类数据样本的区别。经过数据增强处理后，数据样本的类别向量保持不变，即数据样本的风格可能会发生一定变化，但仍然属于同一类别。而且，由于引入了先验分布的约束，可以使得类别向量尽量贴近独热向量，即大部分元素的数值接近0，只有一个元素的值接近1，从而能够根据类别向量直接确定数据样本对应的分类目标。
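示例性的，下面给出一个数据增强处理与类别特征差异计算的示意性代码草图（基于PyTorch与torchvision，增强参数为示例性假设；随机组合颜色通道可由自定义变换实现，此处未展开）：

```python
import torch
from torchvision import transforms

# 示例性的数据增强流水线：随机裁剪、随机水平翻转、颜色抖动（具体参数为假设值）
augment = transforms.Compose([
    transforms.RandomResizedCrop(32, scale=(0.8, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1),
])

def class_difference_loss(z_c, z_c_aug, eps=1e-7):
    """类别特征差异 L_Aug = KL( Q(Z_c|X) || Q(Z_c|T(X)) )。
       z_c / z_c_aug 为原数据样本与增强后数据样本经 Softmax 激活的类别向量，形状 (B, K)。"""
    return (z_c * (torch.log(z_c + eps) - torch.log(z_c_aug + eps))).sum(dim=1).mean()
```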
在一个实施例中，数据样本包括第一样本和第二样本；确定数据样本和分类向量的相关性包括：获取第一样本，利用第一样本的分类向量与第一样本向量进行拼接，生成拼接后的第一样本向量(正样本)；利用第二样本的分类向量与第一样本向量进行拼接，生成拼接后的第二样本向量(负样本)；训练判别器根据拼接后的第一样本向量输出"相关"结果，根据拼接后的第二样本向量输出"不相关"结果。
数据样本中包括第一样本和第二样本，其中，第一样本与第二样本可以完全不同，也可以相同。第一样本输入图像分类模型，映射得到与第一样本对应的第一分类向量。第二样本输入图像分类模型，映射得到与第二样本对应的第二分类向量。第一分类向量与第二分类向量都可以是多维向量，例如50维。计算机设备将第一样本转换为第一样本向量。计算机设备将第一分类向量与第一样本向量进行拼接，生成拼接后的第一样本向量。拼接的方式可以是在第一分类向量后添加第一样本向量，也可以在第一样本向量之后添加第一分类向量。计算机设备可以采用上述拼接方式将第二分类向量与第一样本向量进行拼接，生成拼接后的第二样本向量。将拼接后的第一样本向量输入判别器，判别器若判断二者相关，则输出1，若判断两者不相关，输出0；将拼接后的第二样本向量输入判别器，判别器若判断二者相关，则输出1，若判断两者不相关，输出0。当判别器能够正确判断数据样本与分类向量之间是否相关时，说明分类向量中蕴含了与数据样本相关的信息，达到最大化互信息的目的，由此能够使得图像分类模型学习到的分类向量与数据样本相关。
在一个实施例中,对类别向量引入类别先验分布,对类内风格向量引入类内风格先验分布,以确定分类向量服从于先验分布的评分值包括:通过评价器对类别向量引入类别先验分布,得到类别向量的类别分布结果;通过评价器对类内风格向量引入类内风格先验分布,得到类内风格向量的类内风格先验分布结果;通过评价器对类别分布结果以及类内风格先验分布结果进行评分,得到分类向量服从于先验分布的评分值。
评价器为分类向量引入先验分布。先验分布包括类别先验分布和类内风格先验分布。类别先验分布可以简称为类别分布，类内风格先验分布可以是高斯分布。类别分布可以是:

p(z_c) = Cat(K, p = 1/K)

其中，p(z_c)为类别向量的分布，Cat为类别分布，是独热向量，K为分类目标数，p为K的倒数。类内风格向量的分布可以是:

p(z_s) = N(0, σ²)

其中，p(z_s)为类内风格向量的分布，N为高斯分布，σ为标准差，σ可以是预设数值，如0.1。
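示例性的，下面给出一个从先验分布中采样并拼接为先验分布向量的示意性代码草图（基于PyTorch，类别数、风格向量维度与σ均为示例性取值）：

```python
import torch

def sample_prior(batch_size, num_classes=10, style_dim=50, sigma=0.1):
    """从先验分布中采样：类别部分服从 Cat(K, p=1/K)（以独热向量表示），
       类内风格部分服从标准差为 σ 的高斯分布。维度与 σ 均为示例性取值。"""
    idx = torch.randint(0, num_classes, (batch_size,))
    z_c = torch.nn.functional.one_hot(idx, num_classes).float()   # 类别先验：独热向量
    z_s = torch.randn(batch_size, style_dim) * sigma               # 类内风格先验：高斯分布
    return torch.cat([z_c, z_s], dim=1)                            # 拼接为先验分布向量
```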
计算机设备将类别向量与类内风格向量同时输入至评价器,评价器分别输出类别向量对应的类别分布结果以及类内风格向量对应的高斯分布结果。其中,类别分布结果可以是类别向量,类别向量可以是独热向量。高斯分布结果可以是风格向量。
在一个实施例中,通过评价器对类别分布结果以及类内风格先验分布结果进行评分包括:对类别向量的类别分布向量与类内风格向量的高斯分布向量进行拼接,生成先验分布向量;通过评价器对先验分布向量进行评分,得到分类向量服从于先验分布的评分值。
计算机设备将类别分布结果与高斯分布结果进行拼接，即将相应的类别向量与风格向量进行拼接。拼接方式可以是在类别向量的最后一个元素之后添加风格向量的元素，也可以是在风格向量的最后一个元素之后添加类别向量的元素。评价器对拼接后的向量进行评分，得到相应分数，该分数为分类向量服从于先验分布的概率。概率越高，说明分类向量越服从于先验分布。分类向量服从先验分布时，可以使得输出的类别向量尽量接近独热向量，由此可以直接利用独热向量中数值最大的元素代表数据样本的类别，避免还需要进行下一步分类操作。此外，在服从先验分布时，可以防止数据样本只被分到一类或者几类中，从而能保证将数据样本分到想要的分类目标中。
在一个实施例中,图像分类模型的训练方法还包括:通过判别器确定数据样本和分类向量的相关性;通过评价器确定分类向量服从于先验分布的评分值;至少根据相关性和评分值训练图像分类模型包括:至少根据相关性和评分值对图像分类模型、判别器和评价器进行交替优化。
通过判别器识别数据样本和分类向量之间的相关性。判别器识别数据样本和分类向量之间相关性的损失函数，可以称为互信息损失函数。判别器可以通过互信息损失函数进行训练。互信息损失函数可以采用如下表示:

L_MI = −E_{X~P_X, Z~Q(Z|X)}[log S(D(X, Z))] − E_{X~P_X, Z~Q_Z}[log(1 − S(D(X, Z)))]

其中，X为数据样本，Z为分类向量，S为sigmoid函数，E表示期望，D为判别器，用于判断X和Z是否相关，Q(Z|X)为图像分类模型映射得到的Z的后验分布；P_X为输入图像的先验分布，Q_Z为Z的聚合后验分布，E_{X~P_X, Z~Q(Z|X)}表示X、Z服从Q(Z|X)P_X(X)的数学期望。当X与Z为正样本时，(X, Z)服从联合分布Q(Z|X)P_X(X)；当X与Z为负样本时，X服从P_X，Z服从聚合后验分布Q_Z，即X与Z相互独立地采样。
通过互信息损失函数对判别器进行训练的过程中,损失函数值越小,相关性判断越准确,反向优化时,对判别器网络中的每一层权重的影响就越小。当判别器能够正确判断数据样本与特征之间是否相关时,说明特征中蕴含了与数据样本相关的信息,达到最大化互信息的目的。
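示例性的，下面给出一个按上述互信息损失函数训练判别器的示意性代码草图（基于PyTorch，判别器以样本向量与分类向量的拼接为输入；函数签名与命名为示例性假设）：

```python
import torch
import torch.nn.functional as F

def mutual_info_loss(disc, x_feat, z_pos, z_neg):
    """disc: 判别器 D(X, Z)，输入为样本向量与分类向量拼接后的向量，输出一个实数打分;
       x_feat: 第一样本向量; z_pos: 来源于第一样本的分类向量（正样本）;
       z_neg: 来源于第二样本的分类向量（负样本）。"""
    pos = disc(torch.cat([x_feat, z_pos], dim=1))   # 正样本对，期望判别输出接近 1
    neg = disc(torch.cat([x_feat, z_neg], dim=1))   # 负样本对，期望判别输出接近 0
    # 二分类交叉熵形式的互信息损失（S 为 sigmoid，由 BCEWithLogits 内部实现）
    return (F.binary_cross_entropy_with_logits(pos, torch.ones_like(pos)) +
            F.binary_cross_entropy_with_logits(neg, torch.zeros_like(neg)))
```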
类别向量与增强后的类别向量之间的类别特征差异可以通过散度来进行衡量。散度可以是KL散度。相应的损失函数可以称为类别差异损失函数,采用如下公式:
L Aug=KL(Q(Z c|X)||Q(Z c|T(X)))
其中,KL为KL散度,Q为图像分类模型,Z c为类别向量,X为数据样本,T为数据增强,Q(Z c|X)为Z c的聚合后验分布,Q(Z c|T(X))为增强后的分类向量的后验分布。
类别差异损失函数的函数值越小,类别向量与增强后的类别向量之间的类别特征差异越小,相应的,数据样本在经过数据增强处理后,类别向量发生变化的几率就越小。
通过评价器对分类向量服从于先验分布进行评分。为分类向量引入先验分布的损失函数，可以称为先验分布损失函数。其中，可以分别针对图像分类模型和评价器定义不同的先验分布损失函数。通过先验分布损失函数可以使得图像分类模型映射的分类向量尽量贴近先验分布。图像分类模型的先验分布损失函数可以如下:

L_Adv^Q = −E_{Z~Q_Z}[C(Z)]

其中，Q为图像分类模型，Z为数据样本的分类向量，C(Z)为分类向量是否服从先验分布的概率值，Q_Z是Z的聚合后验分布，E_{Z~Q_Z}[C(Z)]为Z服从Q_Z时C(Z)的数学期望。
评价器的先验分布损失函数可以如下所示:

L_Adv^C = E_{Z~Q_Z}[C(Z)] − E_{Z~P_Z}[C(Z)] + λ·E_{Ẑ}[(||∇_{Ẑ}C(Ẑ)||_2 − 1)²]

其中，C为评价器，P_Z为先验分布，Ẑ为从先验分布P_Z和聚合后验分布Q_Z采样的特征对连线上的特征，E_{Ẑ}[(||∇_{Ẑ}C(Ẑ)||_2 − 1)²]为梯度惩罚项，用于让评价器C满足Lipschitz约束，让其评价的得分，即服从先验分布的概率变化不会过于剧烈，λ为梯度惩罚项系数，设为10。
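示例性的，下面给出一个评价器先验分布损失（含梯度惩罚项）的示意性代码草图（基于PyTorch，按上文重建的损失形式实现，命名与形状为示例性假设）：

```python
import torch

def critic_loss(critic, z_post, z_prior, gp_weight=10.0):
    """critic: 评价器 C(Z); z_post: 来自聚合后验分布 Q_Z 的分类向量 (B, D);
       z_prior: 来自先验分布 P_Z 的采样 (B, D); gp_weight: 梯度惩罚项系数 λ。"""
    # 评价器希望给服从先验分布的样本高分、给后验样本低分
    loss = critic(z_post).mean() - critic(z_prior).mean()

    # 梯度惩罚项：在先验与聚合后验采样的特征对连线上取点，约束评价器满足 Lipschitz 条件
    alpha = torch.rand(z_post.size(0), 1, device=z_post.device)
    z_hat = (alpha * z_prior + (1 - alpha) * z_post).requires_grad_(True)
    grads = torch.autograd.grad(critic(z_hat).sum(), z_hat, create_graph=True)[0]
    gp = ((grads.norm(2, dim=1) - 1) ** 2).mean()
    return loss + gp_weight * gp
```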
在一个实施例中,可以将互信息损失函数、类别差异损失函数、图像分类模型的先验分布损失函数,作为子损失函数来定义图像分类模型的总损失函数。每个子损失函数可以分别具有对应的权重。可以利用互信息损失函数以及其对应的权重定义判别器的总损失函数。可以利用评价器的先验分布损失函数及其权重定义评价器的总损失函数。
图像分类模型的总损失函数如下:

L_Q = β_MI·L_MI + β_Aug·L_Aug + β_Adv·L_Adv^Q

判别器的总损失函数如下:

L_D = β_MI·L_MI

评价器的总损失函数如下:

L_C = β_Adv·L_Adv^C
其中，L_Q为图像分类模型的总损失函数，L_MI为互信息损失函数，L_Aug为类别差异损失函数，L_Adv^Q为图像分类模型的先验分布损失函数，L_Adv^C为评价器的先验分布损失函数，β_MI为L_MI的权重，β_Aug为L_Aug的权重，β_Adv为L_Adv^Q的权重。β_MI、β_Adv可以设置为相应的固定值，例如，β_MI设为0.5，β_Adv设为1。β_Aug与数据样本的数据集相关，可以通过以下方式设置。具体的，计算机设备可以通过对分类向量进行非线性降维处理，生成相应的可视化降维图，根据可视化降维图选择类别差异损失函数的权重。可视化降维图，是将高维数据降维到低维数据后的结果，使得该结果是可视化的。低维比如二维或三维。例如，可以采用t-SNE对分类向量进行非线性降维处理，根据处理结果生成可视化降维图，即t-SNE图。在t-SNE图中，数据样本会进行分类，形成分类簇，在β_Aug的值较低时，各数据样本的分类簇较为分散，随着β_Aug的值升高，得到的特征趋向于聚合，分类簇甚至会出现重叠。不同的数据类型的数据集，所分类的结果不同。以数据样本为图像为例，在β_Aug=2时，t-SNE图中的分类簇无重叠。在β_Aug=3时，t-SNE图中的分类簇出现重叠。由此可以在2和3之间选择分类簇无重叠的最大值，作为β_Aug的值，由此可以使得图像分类模型总损失函数更准确，从而使得训练后的图像分类模型的分类结果更准确。
图像分类模型的训练可以采用反向优化的方式进行。在进行反向优化时，可以对评价器、图像分类模型和判别器交替进行优化。其中，首先优化评价器，然后优化图像分类模型和判别器。具体的，首先利用评价器的总损失函数反向优化评价器，使其对服从先验分布的分类向量的概率接近1，对不服从先验分布的分类向量的概率接近0。然后再利用图像分类模型的总损失函数反向优化图像分类模型，以及利用判别器的总损失函数反向优化判别器，使得图像分类模型输出的分类向量尽量获得高分，即分类向量服从先验分布的概率尽可能高，重复这样的交替优化过程，使得图像分类模型输出的分类向量获取高分，即分类向量服从先验分布的概率接近1，从而服从先验分布。
在其中一个实施例中,至少根据相关性和评分值对图像分类模型、判别器和评价器进行交替优化包括:先根据评分值对评价器的网络参数进行至少一次优化;再至少根据相关性和评分值对图像分类模型的网络参数进行优化,及根据相关性对判别器的网络参数进行优化。
具体的,由于数据样本的数量较多,不能一次性将所有的数据样本都输入图像分类模型进行训练。在反向优化时,可以将数据样本随机分成多批,每一批次采用固定数量的数据样本,也可以称为批次样本。例如,批次样本可以设定为64个数据样本,即批大小(batch size)设为64。
训练时,计算机设备确定分类向量服从于先验分布的评分值,确定数据样本和分类向量的相关性。对图像分类模型、判别器和评价器进行交替优化时,更新各网络参数对应的权重。首先根据分类向量服从于先验分布的评分值和评价器的总损失函数对评价器的网络参数进行至少一次优化之后,再根据数据样本和分类向量的相关性、分类向量服从于先验分布的评分值、类别特征差异和图像分类模型的总损失函数对图像分类模型的网络参数进行优化,及根据数据样本和分类向量的相关性和判别器的总损失函数对判别器的网络参数进行优化。例如,首先对评价器进行4次优化之后,再对图像分类模型和判别器进行1次优化。对图像分类模型和判别器进行反向优化时,可以先后进行反向优化,也可以同时进行反向优化。
对评价器进行反向优化时，对于先验分布的输入，其输出越接近于1时，损失函数值越小，反向传播时，对参数的变化越小，对于数据样本的输入，其输出越接近于0，损失函数越小，反向传播时，对参数的变化越小。对图像分类模型进行反向优化时，数据样本的输入，其输出越接近于1，损失函数值越小，反向传播时，对参数的变化越小。对图像分类模型进行反向优化时不考虑先验分布。在图像分类模型进行反向优化时，可以由评价器的总损失函数指示当前图像分类模型学习的特征分布与先验分布之间的差异，当评价器的总损失函数开始收敛时，说明图像分类模型学习到的特征分布已经靠近先验分布，可以停止训练。
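示例性的，下面给出一个交替优化流程的示意性代码草图（其中critic_step与model_step为假设的单步优化函数，分别在内部按评价器的总损失、图像分类模型与判别器的总损失进行一次参数更新，并非本申请的限定实现）：

```python
def alternating_train(data_loader, critic_step, model_step, critic_iters=4):
    """交替优化的示意流程：每经过 critic_iters 次评价器优化后，
       再对图像分类模型和判别器各进行 1 次优化。
       critic_step / model_step 为假设的单步优化函数，batch 为一个批次的数据样本。"""
    for step, batch in enumerate(data_loader):
        if step % (critic_iters + 1) != critic_iters:
            critic_step(batch)    # 先优化评价器（如连续 4 次）
        else:
            model_step(batch)     # 再对图像分类模型和判别器各优化 1 次
```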
综上所述，本实施例提供的方法，对于分类业务中分类对象的数据样本，不需要执行额外的分类算法，也无需生成真实图像与原图图像进行比对，通过确定数据样本与分类向量之间的相关性，以及对类别向量引入类别先验分布，对类内风格向量引入类内风格先验分布，以确定分类向量服从于先验分布的评分值，由此利用相关性和评分值对图像分类模型进行训练，可以有效改善图像分类模型对分类向量的学习。由于图像分类模型学习到的特征分布靠近先验分布，并且对类别向量与类内风格向量进行有效解耦，由此根据类别向量即可得到数据样本对应的分类目标。从而实现了在无需人工标注的情况下有效提高数据分类的精度。进而实现了图像分割网络在无需人工标注的情况下的有效训练。
示例性的，如图6所示，给出了一种应用本申请提供的图像分割模型的训练方法训练图像分割模型的示意图。
图6中的编码器102采用了图4所示的实施例中图像分类模型的编码器,在图像分割模型的训练阶段,编码器102后还接有全连接层(图中未示出)。
在训练时，首先获取样本图像x 1，对样本图像x 1进行上采样（一般将长宽扩增为原来的2倍），得到尺度图像x 2=R(x 1)。
将样本图像x 1、尺度图像x 2各自输入到编码器中,编码器输出样本图像x 1的样本图像特征和尺度图像x 2的尺度图像特征。
将样本图像特征输入全连接层输出样本分类结果,将尺度图像特征输入全连接层输出尺度分类结果。
基于样本分类结果和样本图像特征,按照类激活图的计算公式,计算样本图像的样本类激活图C 1。基于尺度分类结果和尺度图像特征,按照类激活图的计算公式,计算尺度图像的尺度类激活图C 2
将样本图像特征输入解码器103得到样本分割结果s 1。将尺度图像特征输入解码器103得到尺度分割结果s 2
引入两种约束,一是分割结果应该尽量贴近类激活图,二是在尺度变换后分割出的目标区域(分类目标所在的区域)经过同样的尺度变换应该保持一致。基于这种思路,基于样本类激活图和样本分割结果计算样本类激活图损失;基于尺度类激活图和尺度分割结果计算尺度类激活图损失;基于样本分割结果和尺度分割结果计算尺度损失。计算样本类激活图损失、尺度类激活图损失和尺度损失的加权和得到总损失L seg
根据总损失L seg,通过反向传播法(back propagation)优化解码器中的网络参数。优化方法使用基于Adam的梯度下降法,学习率为0.0001,Adam的参数一阶矩估计的指数衰减率 β 1设为0.5,二阶矩估计的指数衰减率β 2设为0.9。如前文所述,先优化图像分类模型,再将编码器部分固定,迁移到图像分割模型中,再对图像分割模型的解码器部分进行优化。
示例性的,在训练得到图像分割模型后,给出一种使用训练好的图像分割模型进行图像分割的示例性实施例。
图7示出了本申请一个示例性实施例提供的图像分割方法的流程图。该方法可以由计算机设备来执行,例如,如图1所示的终端或服务器来执行。示例性的,执行图像分割模型的训练方法的计算机设备,与,执行图像分割方法的计算机设备,可以是同一个计算机设备,也可以是不同的计算机设备。该方法包括以下步骤。
步骤701,获取输入图像。
示例性的,输入图像可以是任意的、需要进行图像分割的图像。
例如,当图像分割模型被训练为用于分割图像中的人脸,则输入图像可以是包含人脸的图像;当图像分割模型被训练为用于分割图像中的病灶,则输入图像可以是包含病灶的图像。当然,输入图像中也可以并不包含图像分割模型的分类目标,即,输入图像中也可以不包含人脸,或,不包含病灶。
步骤702,调用编码器对输入图像进行特征提取,得到输入图像的输入图像特征。
示例性的,编码器是上述任意实施例中所提到的图像分割模型中的编码器。
示例性的,根据图像分割模型的分类目标的不同,编码器的来源也不同。当图像分割模型的分类目标是人脸时,编码器的参数是根据人脸分类模型的分类编码器的参数设置的,人脸分类模型用于识别输入的图像是否包含人脸。
当图像分割模型的分类目标是病灶时,编码器的参数是根据病灶分类模型的分类编码器的参数设置的,病灶分类模型用于识别输入的图像是否包含病灶。
使用图像分类模型的分类编码器来设置图像分割模型的编码器,则图像分割模型的编码器能够准确提取图像在分类任务上的特征,由于图像分割的分类目标与图像分类的分类目标相同,则图像分割模型可以根据提取的特征进行准确的图像分割,既简化了图像分割模型的训练过程,又提升了图像分割模型的分割准确度。
步骤703,调用解码器对输入图像特征进行解码,得到输入图像的图像分割结果,解码器是根据类激活图损失和尺度损失训练得到的,类激活图损失用于训练解码器输出靠近类激活图的分割结果,类激活图用于表示图像中各个像素点对图像的分类结果的贡献值,尺度损失用于训练解码器对图像内容相同、尺度不同的多个图像输出相近的分割结果。
示例性的,解码器是上述任意实施例中所提到的图像分割模型中的解码器。解码器的训练方法可以参照上述实施例。
解码器输出的分割结果包括输入图像中各个像素点属于各个分类目标的概率值，或，解码器输出的分割结果包括输入图像中各个像素点所属的分类目标。
例如,当图像分割模型的分类目标是人脸时,图像分割结果包括输入图像中各个像素点为人脸的概率值,或,图像分割结果包括输入图像中各个像素点是否为人脸。
例如,当图像分割模型的分类目标是病灶时,图像分割结果包括输入图像中各个像素点为病灶的概率值,或,图像分割结果包括输入图像中各个像素点是否为病灶。
综上所述，本实施例提供的方法，通过使用训练好的图像分割模型对输入图像进行图像分割，用神经网络对输入的图像进行特征提取和图像分割，可以基于图像的深层特征来分割图像，提高了图像分割的准确度。并且，由于图像分割模型的解码器基于尺度不变性进行了训练，使图像分割模型对图像内容相同但尺度不同的多个图像输出相近的图像分割结果，更贴近图像分割的实际情况，进一步提高了图像分割的准确度。
示例性的,本申请提供的图像分割模型可以在多种应用场景中用于对图像中的不同分类目标进行图像分割。以上述提到的人脸识别和病灶识别的应用场景为例,本申请还给出针对这两个应用场景训练对应的图像分割模型的示例性实施例。
示例性的,给出一种使用本申请提供的图像分割模型的训练方法,训练用于分割图像中的人脸区域的图像分割模型的示例性实施例。
首先,训练用于识别人脸的图像分类模型,图像分类模型包括分类编码器和分类全连接层。图像分类模型用于聚类,即,将输入的多个图像聚为包含人脸的一类和不包含人脸的一类。
第一步,获取数据样本。
第二步,调用分类编码器和分类全连接层对数据样本进行特征提取和分类,得到数据样本的分类向量,确定数据样本与分类向量的相关性。其中,分类向量包括类别向量和类内风格向量,类别向量用于描述输入的图像中是否包含人脸。
第三步,对类别向量引入类别先验分布,对类内风格向量引入类内风格先验分布,以确定分类向量服从于先验分布的评分值。
第四步,至少根据相关性和评分值训练图像分类模型。
然后,基于已训练完毕的图像分类模型训练图像分割模型,图像分割模型在训练阶段包括编码器、全连接层和解码器,图像分割模型在应用阶段包括编码器和解码器。
第一步,根据分类编码器的参数初始化编码器,根据分类全连接层的参数初始化全连接层,即,编码器和全连接层可以准确对输入的图像进行分类,识别图像中是否包含人脸。
第二步,获取训练数据集,训练数据集中包括至少一组样本图像和尺度图像,样本图像是包括或不包括人脸的图像,尺度图像是对样本图像进行上采样得到的图像,尺度图像的尺寸是样本图像的二倍。
第三步,调用编码器对样本图像进行特征提取得到样本图像特征,调用全连接层对样本图像特征进行分类得到样本图像的样本分类结果,样本分类结果包括样本图像包括人脸的概率值。调用编码器对尺度图像进行特征提取得到尺度图像特征,调用全连接层对尺度图像特征进行分类得到尺度图像的尺度分类结果,尺度分类结果包括尺度图像包括人脸的概率值。
第四步,调用解码器对样本图像特征进行解码得到样本图像的样本分割结果,样本分割结果包括样本图像中每个像素点为人脸的概率值。调用解码器对尺度图像特征进行解码得到尺度图像的尺度分割结果,尺度分割结果包括尺度图像中每个像素点为人脸的概率值。
第五步,根据样本图像特征和样本分类结果计算样本图像的样本类激活图。根据尺度图像特征和尺度分类结果计算尺度图像的尺度类激活图。
第六步,计算样本类激活图和样本分割结果的样本类激活图损失,计算尺度类激活图和尺度分割结果的尺度类激活图损失,计算样本分割结果和尺度分割结果的尺度损失。计算样本类激活图损失、尺度类激活图损失和尺度损失的加权和,得到总损失。
第七步,根据总损失训练解码器。
第八步,重复第三步至第七步,迭代训练解码器,得到最终的图像分割模型。
第九步,使用训练好的图像分割模型分割图像中的人脸区域。
示例性的,给出一种使用本申请提供的图像分割模型的训练方法,训练用于分割图像中的病灶区域的图像分割模型的示例性实施例。
首先,训练用于识别病灶的图像分类模型,图像分类模型包括分类编码器和分类全连接层。图像分类模型用于聚类,即,将输入的多个图像聚为包含病灶的一类和不包含病灶的一类。示例性的,图像分类模型还可以用于识别具体的几种病灶,例如,用于将输入的多种图像聚类为病灶一、病灶二、病灶三和正常。
第一步,获取数据样本。
第二步,调用分类编码器和分类全连接层对数据样本进行特征提取和分类,得到数据样本的分类向量,确定数据样本与分类向量的相关性。其中,分类向量包括类别向量和类内风格向量,类别向量用于描述输入的图像中是否包含病灶,或,用于描述输入的图像属于各类病灶的概率值。
第三步,对类别向量引入类别先验分布,对类内风格向量引入类内风格先验分布,以确定分类向量服从于先验分布的评分值。
第四步,至少根据相关性和评分值训练图像分类模型。
然后,基于已训练完毕的图像分类模型训练图像分割模型,图像分割模型在训练阶段包括编码器、全连接层和解码器,图像分割模型在应用阶段包括编码器和解码器。
第一步,根据分类编码器的参数初始化编码器,根据分类全连接层的参数初始化全连接层,即,编码器和全连接层可以准确对输入的图像进行分类,识别图像中是否包含病灶,或,识别图像属于哪一类病灶。
第二步,获取训练数据集,训练数据集中包括至少一组样本图像和尺度图像,样本图像是包括或不包括病灶的图像,尺度图像是对样本图像进行上采样得到的图像,尺度图像的尺寸是样本图像的二倍。
第三步,调用编码器对样本图像进行特征提取得到样本图像特征,调用全连接层对样本图像特征进行分类得到样本图像的样本分类结果,样本分类结果包括样本图像包括病灶区域的概率值,或,样本分类结果包括样本图像属于每一类病灶的概率值。调用编码器对尺度图像进行特征提取得到尺度图像特征,调用全连接层对尺度图像特征进行分类得到尺度图像的尺度分类结果,尺度分类结果包括尺度图像包括病灶的概率值,或,尺度分类结果包括尺度图像属于每一类病灶的概率值。
第四步,调用解码器对样本图像特征进行解码得到样本图像的样本分割结果,样本分割结果包括样本图像中每个像素点为病灶的概率值,或,样本分割结果包括样本图像中每个像素点属于每个病灶的概率值。调用解码器对尺度图像特征进行解码得到尺度图像的尺度分割结果,尺度分割结果包括尺度图像中每个像素点为病灶的概率值,或,尺度分割结果包括尺度图像中每个像素点属于每个病灶的概率值。
第五步,根据样本图像特征和样本分类结果计算样本图像的样本类激活图。根据尺度图像特征和尺度分类结果计算尺度图像的尺度类激活图。
第六步，计算样本类激活图和样本分割结果的样本类激活图损失，计算尺度类激活图和尺度分割结果的尺度类激活图损失，计算样本分割结果和尺度分割结果的尺度损失。计算样本类激活图损失、尺度类激活图损失和尺度损失的加权和，得到总损失。
第七步,根据总损失训练解码器。
第八步,重复第三步至第七步,迭代训练解码器,得到最终的图像分割模型。
第九步,使用训练好的图像分割模型分割图像中的病灶区域。
以下为本申请的装置实施例,对于装置实施例中未详细描述的细节,可以结合参考上述方法实施例中相应的记载,本文不再赘述。
图8示出了本申请的一个示例性实施例提供的图像分割模型的训练装置的结构示意图。该装置可以通过软件、硬件或者两者的结合实现成为计算机设备的全部或一部分,所述图像分割模型包括编码器和解码器,该装置包括:
编码模块601,用于调用编码器对样本图像和尺度图像进行特征提取,得到样本图像的样本图像特征和尺度图像的尺度图像特征,尺度图像包括:放大样本图像得到的图像,或,缩小样本图像得到的图像中的至少一种;
类激活图模块602,用于基于样本图像特征计算类激活图得到样本图像的样本类激活图,基于尺度图像特征计算类激活图得到尺度图像的尺度类激活图;类激活图用于表示图像中各个像素点对图像的分类结果的贡献值;
解码模块603,用于调用解码器对样本图像特征进行解码得到样本图像的样本分割结果,调用解码器对尺度图像特征进行解码得到尺度图像的尺度分割结果;样本分割结果包括样本图像中各个像素点的分类概率值;
损失模块604,用于基于样本类激活图、样本分割结果、尺度类激活图和尺度分割结果计算类激活图损失,基于样本分割结果和尺度分割结果计算尺度损失;类激活图损失用于训练解码器使样本分割结果靠近样本类激活图,使尺度分割结果靠近尺度类激活图;尺度损失用于训练解码器使样本分割结果靠近尺度分割结果;
训练模块605,用于基于类激活图损失和尺度损失训练解码器。
在一种可选的实施例中,类激活图损失包括样本类激活图损失和尺度类激活图损失;
损失模块604,用于基于样本类激活图和样本分割结果计算样本类激活图损失;
损失模块604,用于基于尺度类激活图和尺度分割结果计算尺度类激活图损失;
损失模块604,用于基于样本分割结果和尺度分割结果计算尺度损失。
在一种可选的实施例中,损失模块604,用于根据样本图像和尺度图像的尺度关系,将样本分割结果缩放至与尺度分割结果相同的尺寸,得到缩放后的样本分割结果;
损失模块604,用于基于尺度分割结果与缩放后的样本分割结果的误差,计算尺度损失。
在一种可选的实施例中,损失模块604,用于计算尺度分割结果与缩放后的样本分割结果的第一矩阵差,将第一矩阵差的2范数确定为尺度损失。
在一种可选的实施例中,损失模块604,用于将样本类激活图和样本分割结果的交叉熵确定为样本类激活图损失;
损失模块604,用于将尺度类激活图和尺度分割结果的交叉熵确定为尺度类激活图损失。
在一种可选的实施例中,损失模块604,用于计算样本类激活图损失、尺度类激活图 损失和尺度损失的加权和;
训练模块605,用于根据加权和训练解码器。
在一种可选的实施例中,图像分割模型还包括经过预训练的全连接层;装置还包括:
分类模块606,用于调用全连接层对样本图像特征进行分类预测得到样本图像的样本分类结果;调用全连接层对尺度图像特征进行分类预测得到尺度图像的尺度分类结果;
类激活图模块602,用于基于样本图像特征和样本分类结果计算得到样本图像的样本类激活图;
类激活图模块602,用于基于尺度图像特征和尺度分类结果计算得到尺度图像的尺度类激活图。
在一种可选的实施例中,编码器是经过预训练的编码器;装置还包括:
初始化模块607,用于根据已训练完毕的图像分类模型中分类编码器的参数,设置编码器的参数,图像分类模型与图像分割模型的分类目标相同。
在一种可选的实施例中,装置还包括:
初始化模块607,用于根据已训练完毕的图像分类模型中分类全连接层的参数,设置全连接层的参数,图像分类模型与图像分割模型的分类目标相同。
在一种可选的实施例中,图像分类模型包括分类编码器和分类全连接层;装置还包括:
分类训练模块608,用于获取数据样本;调用分类编码器和分类全连接层对数据样本进行特征提取和分类,得到数据样本的分类向量,确定数据样本和分类向量的相关性;分类向量包括类别向量和类内风格向量;对类别向量引入类别先验分布,对类内风格向量引入类内风格先验分布,以确定分类向量服从于先验分布的评分值;至少根据相关性和评分值训练图像分类模型。
图9示出了本申请的一个示例性实施例提供的图像分割装置的结构示意图。该装置可以通过软件、硬件或者两者的结合实现成为计算机设备的全部或一部分,图像分割模型包括编码器和解码器,该装置包括:
获取模块1001,用于获取输入图像;
特征提取模块1002,用于调用编码器对输入图像进行特征提取,得到输入图像的输入图像特征;
图像分割模块1003,用于调用解码器对输入图像特征进行解码,得到输入图像的图像分割结果,解码器是根据类激活图损失和尺度损失训练得到的,类激活图损失用于训练解码器输出靠近类激活图的分割结果,类激活图用于表示图像中各个像素点对图像的分类结果的贡献值,尺度损失用于训练解码器对图像内容相同、尺度不同的多个图像输出相近的分割结果。
在一种可选的实施例中,编码器的参数是根据人脸分类模型的分类编码器的参数设置的,人脸分类模型用于识别输入的图像是否包含人脸;
图像分割结果包括输入图像中各个像素点为人脸的概率值。
在一种可选的实施例中,编码器的参数是根据病灶分类模型的分类编码器的参数设置的,病灶分类模型用于识别输入的图像是否包含病灶;
图像分割结果包括输入图像中各个像素点为病灶的概率值。
图10是本申请一个实施例提供的服务器的结构示意图。具体来讲：服务器800包括中央处理单元（英文：Central Processing Unit，简称：CPU）801、包括随机存取存储器（英文：Random Access Memory，简称：RAM）802和只读存储器（英文：Read-Only Memory，简称：ROM）803的系统存储器804，以及连接系统存储器804和中央处理单元801的系统总线805。服务器800还包括帮助计算机内的各个器件之间传输信息的基本输入/输出系统（I/O系统）806，和用于存储操作系统813、应用程序814和其他程序模块815的大容量存储设备807。
基本输入/输出系统806包括有用于显示信息的显示器808和用于用户输入信息的诸如鼠标、键盘之类的输入设备809。其中显示器808和输入设备809都通过连接到系统总线805的输入/输出控制器810连接到中央处理单元801。基本输入/输出系统806还可以包括输入/输出控制器810以用于接收和处理来自键盘、鼠标、或电子触控笔等多个其他设备的输入。类似地，输入/输出控制器810还提供输出到显示屏、打印机或其他类型的输出设备。
大容量存储设备807通过连接到系统总线805的大容量存储控制器（未示出）连接到中央处理单元801。大容量存储设备807及其相关联的计算机可读存储介质为服务器800提供非易失性存储。也就是说，大容量存储设备807可以包括诸如硬盘或者只读光盘（英文：Compact Disc Read-Only Memory，简称：CD-ROM）驱动器之类的计算机可读存储介质（未示出）。
不失一般性，计算机可读存储介质可以包括计算机存储介质和通信介质。计算机存储介质包括以用于存储诸如计算机可读指令、数据结构、程序模块或其他数据等信息的任何方法或技术实现的易失性和非易失性、可移动和不可移动介质。计算机存储介质包括RAM、ROM、可擦除可编程只读存储器（英文：Erasable Programmable Read-Only Memory，简称：EPROM）、电可擦除可编程只读存储器（英文：Electrically Erasable Programmable Read-Only Memory，简称：EEPROM）、闪存或其他固态存储技术，CD-ROM、数字通用光盘（英文：Digital Versatile Disc，简称：DVD）或其他光学存储、磁带盒、磁带、磁盘存储或其他磁性存储设备。当然，本领域技术人员可知计算机存储介质不局限于上述几种。上述的系统存储器804和大容量存储设备807可以统称为存储器。
根据本申请的各种实施例,服务器800还可以通过诸如因特网等网络连接到网络上的远程计算机运行。也即服务器800可以通过连接在系统总线805上的网络接口单元811连接到网络812,或者说,也可以使用网络接口单元811来连接到其他类型的网络或远程计算机系统(未示出)。
本申请还提供了一种终端,该终端包括处理器和存储器,存储器中存储有至少一条指令,至少一条指令由处理器加载并执行以实现上述各个方法实施例提供的图像分割模型的训练方法或图像分割方法。需要说明的是,该终端可以是如下图11所提供的终端。
图11示出了本申请一个示例性实施例提供的终端900的结构框图。该终端900可以是:智能手机、平板电脑、MP3播放器(Moving Picture Experts Group Audio Layer III,动态影像专家压缩标准音频层面3)、MP4(Moving Picture Experts Group Audio Layer IV,动态影像专家压缩标准音频层面4)播放器、笔记本电脑或台式电脑。终端900还可能被称为用户帐号设备、便携式终端、膝上型终端、台式终端等其他名称。
通常,终端900包括有:处理器901和存储器902。
处理器901可以包括一个或多个处理核心,比如4核心处理器、8核心处理器等。存储器902可以包括一个或多个计算机可读存储介质,该计算机可读存储介质可以是非暂态的。在一些实施例中,存储器902中的非暂态的计算机可读存储介质用于存储至少一个计算机可读指令,该至少一个计算机可读指令用于被处理器901所执行以实现本申请中方法实施例提供的图像分割模型的训练方法或图像分割方法。
在一些实施例中,终端900还可选包括有:外围设备接口903和至少一个外围设备。处理器901、存储器902和外围设备接口903之间可以通过总线或信号线相连。各个外围设备可以通过总线、信号线或电路板与外围设备接口903相连。具体地,外围设备包括:射频电路904、显示屏905、摄像头组件906、音频电路907、定位组件908和电源909中的至少一种。
在一些实施例中,终端900还包括有一个或多个传感器910。该一个或多个传感器910包括但不限于:加速度传感器911、陀螺仪传感器912、压力传感器913、指纹传感器914、光学传感器915以及接近传感器916。
本领域技术人员可以理解,图11中示出的结构并不构成对终端900的限定,可以包括比图示更多或更少的组件,或者组合某些组件,或者采用不同的组件布置。
所述存储器还包括一个或者一个以上的程序，所述一个或者一个以上程序存储于存储器中，所述一个或者一个以上程序包含用于执行本申请实施例提供的图像分割模型的训练方法或图像分割方法的指令。
本申请还提供一种计算机设备，该计算机设备包括：处理器和存储器，该存储器中存储有至少一条指令、至少一段程序、代码集或指令集，该至少一条指令、至少一段程序、代码集或指令集由处理器加载并执行以实现上述各方法实施例提供的图像分割模型的训练方法或图像分割方法。
本申请还提供一种计算机可读存储介质,该存储介质中存储有至少一条指令、至少一段程序、代码集或指令集,该至少一条指令、至少一段程序、代码集或指令集由处理器加载并执行以实现上述各方法实施例提供的图像分割模型的训练方法或图像分割方法。
本申请还提供一种计算机程序产品或计算机可读指令,该计算机程序产品包括计算机可读指令,该计算机可读指令存储在计算机可读存储介质中。计算机设备的处理器从计算机可读存储介质读取该计算机可读指令,处理器执行该计算机可读指令,使得该计算机设备执行上述可选实现方式中提供的图像分割模型的训练方法或图像分割方法。
应当理解的是,在本文中提及的“多个”是指两个或两个以上。“和/或”,描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。字符“/”一般表示前后关联对象是一种“或”的关系。
本领域普通技术人员可以理解实现上述实施例的全部或部分步骤可以通过硬件来完成,也可以通过程序来指令相关的硬件完成,程序可以存储于一种计算机可读存储介质中,上述提到的存储介质可以是只读存储器,磁盘或光盘等。
以上仅为本申请的可选实施例,并不用以限制本申请,凡在本申请的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。

Claims (20)

  1. 一种图像分割模型的训练方法,由计算机设备执行,所述图像分割模型包括编码器和解码器,所述方法包括:
    调用所述编码器对样本图像和尺度图像进行特征提取,得到所述样本图像的样本图像特征和所述尺度图像的尺度图像特征,所述尺度图像包括:放大所述样本图像得到的图像和缩小所述样本图像得到的图像中的至少一种;
    基于所述样本图像特征计算类激活图得到所述样本图像的样本类激活图,基于所述尺度图像特征计算类激活图得到所述尺度图像的尺度类激活图;所述类激活图用于表示图像中各个像素点对所述图像的分类结果的贡献值;
    调用所述解码器对所述样本图像特征进行解码得到所述样本图像的样本分割结果,调用所述解码器对所述尺度图像特征进行解码得到所述尺度图像的尺度分割结果;所述样本分割结果包括所述样本图像中各个像素点的分类概率值;
    基于所述样本类激活图、所述样本分割结果、所述尺度类激活图和所述尺度分割结果计算类激活图损失,基于所述样本分割结果和所述尺度分割结果计算尺度损失;
    基于所述类激活图损失和所述尺度损失训练所述解码器;所述类激活图损失用于训练所述解码器使所述样本分割结果靠近所述样本类激活图,使所述尺度分割结果靠近所述尺度类激活图;所述尺度损失用于训练所述解码器使所述样本分割结果靠近所述尺度分割结果。
  2. 根据权利要求1所述的方法,其特征在于,所述类激活图损失包括样本类激活图损失和尺度类激活图损失;
    所述基于所述样本类激活图、所述样本分割结果、所述尺度类激活图和所述尺度分割结果计算类激活图损失,基于所述样本分割结果和所述尺度分割结果计算尺度损失,包括:
    基于所述样本类激活图和所述样本分割结果计算所述样本类激活图损失;
    基于所述尺度类激活图和所述尺度分割结果计算所述尺度类激活图损失;
    基于所述样本分割结果和所述尺度分割结果计算所述尺度损失。
  3. 根据权利要求2所述的方法,其特征在于,所述基于所述样本分割结果和所述尺度分割结果计算所述尺度损失,包括:
    根据所述样本图像和所述尺度图像的尺度关系,将所述样本分割结果缩放至与所述尺度分割结果相同的尺寸,得到缩放后的样本分割结果;
    基于所述尺度分割结果与所述缩放后的样本分割结果的误差,计算所述尺度损失。
  4. 根据权利要求3所述的方法,其特征在于,所述基于所述尺度分割结果与所述缩放后的样本分割结果的误差,计算所述尺度损失,包括:
    计算所述尺度分割结果与所述缩放后的样本分割结果的第一矩阵差,将所述第一矩阵差的2范数确定为所述尺度损失。
  5. 根据权利要求2所述的方法,其特征在于,所述基于所述样本类激活图和所述样本分割结果计算所述样本类激活图损失,包括:
    将所述样本类激活图和所述样本分割结果的交叉熵确定为所述样本类激活图损失;
    所述基于所述尺度类激活图和所述尺度分割结果计算所述尺度类激活图损失,包括:
    将所述尺度类激活图和所述尺度分割结果的交叉熵确定为所述尺度类激活图损失。
  6. 根据权利要求2至5任一所述的方法,其特征在于,所述基于所述类激活图损失和所述尺度损失训练所述解码器,包括:
    计算所述样本类激活图损失、所述尺度类激活图损失和所述尺度损失的加权和;
    根据所述加权和训练所述解码器。
  7. 根据权利要求1至5任一所述的方法,其特征在于,所述图像分割模型还包括经过预训练的全连接层;所述方法还包括:
    调用所述全连接层对所述样本图像特征进行分类预测得到所述样本图像的样本分类结果;调用所述全连接层对所述尺度图像特征进行分类预测得到所述尺度图像的尺度分类结果;
    所述样本类激活图的计算步骤包括:
    基于所述样本图像特征和所述样本分类结果计算得到所述样本图像的所述样本类激活图;
    所述尺度类激活图的计算步骤包括:
    基于所述尺度图像特征和所述尺度分类结果计算得到所述尺度图像的所述尺度类激活图。
  8. 根据权利要求1至5任一所述的方法,其特征在于,所述编码器是经过预训练的编码器;所述调用编码器对样本图像和尺度图像进行特征提取之前,还包括:
    根据已训练完毕的图像分类模型中分类编码器的参数,设置所述编码器的参数,所述图像分类模型与所述图像分割模型的分类目标相同。
  9. 根据权利要求7所述的方法,其特征在于,所述方法还包括:
    根据已训练完毕的图像分类模型中分类全连接层的参数,设置所述全连接层的参数,所述图像分类模型与所述图像分割模型的分类目标相同。
  10. 根据权利要求1至5任一项所述的方法，其特征在于，所述图像分类模型包括分类编码器和分类全连接层；所述方法还包括：
    获取数据样本;
    调用所述分类编码器和所述分类全连接层对所述数据样本进行特征提取和分类,得到所述数据样本的分类向量,确定所述数据样本和所述分类向量的相关性;所述分类向量包括类别向量和类内风格向量;
    对所述类别向量引入类别先验分布,对所述类内风格向量引入类内风格先验分布,以确定所述分类向量服从于先验分布的评分值;
    至少根据所述相关性和所述评分值训练所述图像分类模型。
  11. 一种图像分割方法,由计算机设备执行,所述方法包括:
    获取输入图像;
    调用编码器对输入图像进行特征提取,得到所述输入图像的输入图像特征;
    调用解码器对所述输入图像特征进行解码,得到所述输入图像的图像分割结果,所述解码器是根据类激活图损失和尺度损失训练得到的,所述类激活图损失用于训练所述解码器输出靠近类激活图的分割结果,所述类激活图用于表示图像中各个像素点对所述图像的分类结果的贡献值,所述尺度损失用于训练所述解码器对图像内容相同、尺度不同的多个图像输出相近的分割结果。
  12. 根据权利要求11所述的方法,其特征在于,所述编码器的参数是根据人脸分类模型的分类编码器的参数设置的,所述人脸分类模型用于识别输入的图像是否包含人脸;
    所述图像分割结果包括所述输入图像中各个像素点为人脸的概率值。
  13. 根据权利要求11所述的方法,其特征在于,所述编码器的参数是根据病灶分类模型的分类编码器的参数设置的,所述病灶分类模型用于识别输入的图像是否包含病灶;
    所述图像分割结果包括所述输入图像中各个像素点为病灶的概率值。
  14. 一种图像分割模型的训练装置,所述图像分割模型包括编码器和解码器,所述装置包括:
    编码模块,用于调用所述编码器对样本图像和尺度图像进行特征提取,得到所述样本图像的样本图像特征和所述尺度图像的尺度图像特征,所述尺度图像包括:放大所述样本图像得到的图像和缩小所述样本图像得到的图像中的至少一种;
    类激活图模块,用于基于所述样本图像特征计算类激活图得到所述样本图像的样本类激活图,基于所述尺度图像特征计算类激活图得到所述尺度图像的尺度类激活图;所述类激活图用于表示图像中各个像素点对所述图像的分类结果的贡献值;
    解码模块,用于调用所述解码器对所述样本图像特征进行解码得到所述样本图像的样本分割结果,调用所述解码器对所述尺度图像特征进行解码得到所述尺度图像的尺度分割结果;所述样本分割结果包括所述样本图像中各个像素点的分类概率值;
    损失模块,用于基于所述样本类激活图、所述样本分割结果、所述尺度类激活图和所述尺度分割结果计算类激活图损失,基于所述样本分割结果和所述尺度分割结果计算尺度损失;
    训练模块,用于基于所述类激活图损失和所述尺度损失训练所述解码器;所述类激活图损失用于训练所述解码器使所述样本分割结果靠近所述样本类激活图,使所述尺度分割结果靠近所述尺度类激活图;所述尺度损失用于训练所述解码器使所述样本分割结果靠近所述尺度分割结果。
  15. 根据权利要求14所述的装置,其特征在于,所述类激活图损失包括样本类激活图损失和尺度类激活图损失;
    所述损失模块还用于基于所述样本类激活图和所述样本分割结果计算所述样本类激活图损失;基于所述尺度类激活图和所述尺度分割结果计算所述尺度类激活图损失;基于所述样本分割结果和所述尺度分割结果计算所述尺度损失。
  16. 根据权利要求15所述的装置，其特征在于，所述损失模块还用于根据所述样本图像和所述尺度图像的尺度关系，将所述样本分割结果缩放至与所述尺度分割结果相同的尺寸，得到缩放后的样本分割结果；基于所述尺度分割结果与所述缩放后的样本分割结果的误差，计算所述尺度损失。
  17. 根据权利要求14至16任一项所述的装置,其特征在于,所述图像分割模型还包括经过预训练的全连接层;所述装置还包括:
    分类模块,用于调用所述全连接层对所述样本图像特征进行分类预测得到所述样本图像的样本分类结果;调用所述全连接层对所述尺度图像特征进行分类预测得到所述尺度图像的尺度分类结果;
    所述类激活图模块，用于基于所述样本图像特征和所述样本分类结果计算得到所述样本图像的所述样本类激活图；
    所述类激活图模块,用于基于所述尺度图像特征和所述尺度分类结果计算得到所述尺度图像的所述尺度类激活图。
  18. 一种图像分割装置,所述装置包括:
    获取模块,用于获取输入图像;
    特征提取模块,用于调用编码器对输入图像进行特征提取,得到所述输入图像的输入图像特征;
    图像分割模块,用于调用解码器对所述输入图像特征进行解码,得到所述输入图像的图像分割结果,所述解码器是根据类激活图损失和尺度损失训练得到的,所述类激活图损失用于训练所述解码器输出靠近类激活图的分割结果,所述类激活图用于表示图像中各个像素点对所述图像的分类结果的贡献值,所述尺度损失用于训练所述解码器对图像内容相同、尺度不同的多个图像输出相近的分割结果。
  19. 一种计算机设备，包括存储器和一个或多个处理器，所述存储器存储有计算机可读指令，所述计算机可读指令被所述一个或多个处理器执行时，使得所述一个或多个处理器实现如权利要求1至13中任一项所述方法的步骤。
  20. 一个或多个存储有计算机可读指令的非易失性计算机可读存储介质，所述计算机可读指令被一个或多个处理器执行时，使得所述一个或多个处理器实现如权利要求1至13中任一项所述方法的步骤。
PCT/CN2021/124337 2020-12-16 2021-10-18 图像分割模型的训练方法、图像分割方法、装置、设备 WO2022127333A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21905262.8A EP4220555A4 (en) 2020-12-16 2021-10-18 TRAINING METHOD AND APPARATUS FOR IMAGE SEGMENTATION MODEL, IMAGE SEGMENTATION METHOD AND APPARATUS, AND DEVICE
US17/955,726 US20230021551A1 (en) 2020-12-16 2022-09-29 Using training images and scaled training images to train an image segmentation model

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011487554.0A CN113516665A (zh) 2020-12-16 2020-12-16 图像分割模型的训练方法、图像分割方法、装置、设备
CN202011487554.0 2020-12-16

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/955,726 Continuation US20230021551A1 (en) 2020-12-16 2022-09-29 Using training images and scaled training images to train an image segmentation model

Publications (1)

Publication Number Publication Date
WO2022127333A1 true WO2022127333A1 (zh) 2022-06-23

Family

ID=78060679

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/124337 WO2022127333A1 (zh) 2020-12-16 2021-10-18 图像分割模型的训练方法、图像分割方法、装置、设备

Country Status (4)

Country Link
US (1) US20230021551A1 (zh)
EP (1) EP4220555A4 (zh)
CN (1) CN113516665A (zh)
WO (1) WO2022127333A1 (zh)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180108137A1 (en) * 2016-10-18 2018-04-19 Adobe Systems Incorporated Instance-Level Semantic Segmentation System
CN110543911A (zh) * 2019-08-31 2019-12-06 华南理工大学 一种结合分类任务的弱监督目标分割方法
CN111401247A (zh) * 2020-03-17 2020-07-10 杭州趣维科技有限公司 一种基于级联卷积神经网络的人像分割方法
CN111582175A (zh) * 2020-05-09 2020-08-25 中南大学 一种共享多尺度对抗特征的高分遥感影像语义分割方法


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4220555A4

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115147526A (zh) * 2022-06-30 2022-10-04 北京百度网讯科技有限公司 服饰生成模型的训练、生成服饰图像的方法和装置
CN115147526B (zh) * 2022-06-30 2023-09-26 北京百度网讯科技有限公司 服饰生成模型的训练、生成服饰图像的方法和装置

Also Published As

Publication number Publication date
EP4220555A1 (en) 2023-08-02
EP4220555A4 (en) 2024-03-20
CN113516665A (zh) 2021-10-19
US20230021551A1 (en) 2023-01-26


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21905262

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021905262

Country of ref document: EP

Effective date: 20230425

NENP Non-entry into the national phase

Ref country code: DE