WO2021164534A1 - Image processing method, apparatus, device and storage medium - Google Patents

Image processing method, apparatus, device and storage medium

Info

Publication number
WO2021164534A1
WO2021164534A1 · PCT/CN2021/074722
Authority
WO
WIPO (PCT)
Prior art keywords
sample
image
feature
network
feature map
Prior art date
Application number
PCT/CN2021/074722
Other languages
English (en)
French (fr)
Inventor
刘钰安
Original Assignee
Oppo广东移动通信有限公司
Priority date
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司
Publication of WO2021164534A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20016Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Definitions

  • the embodiments of the present application relate to the field of image processing, and in particular, to an image processing method, apparatus, device, and storage medium.
  • Image segmentation refers to the process of accurately separating the foreground object of interest from the background in a static image or continuous video sequence. It has a wide range of applications in portrait blurring and background replacement.
  • In image segmentation, the main task is to obtain a transparency channel image.
  • The transparency value corresponding to each pixel is marked in the transparency channel image.
  • The area with a transparency value of 1 is the foreground image area, and the area with a transparency value of 0 is the background image area.
  • The obtained transparency channel image can therefore be used to separate the foreground image from the original image.
  • In the related art, an image segmentation method needs to generate a trimap (three-part image) based on the original image.
  • The trimap divides the original image into three parts, namely a determined foreground image area, a determined background image area, and an uncertain area. The trimap is first used to locate the uncertain area, and then the trimap and the original image are input into a trained neural network to determine the transparency value corresponding to each pixel in the uncertain area, which finally outputs the transparency channel image used for image segmentation.
  • The transparency channel image obtained in the related art therefore depends on the accuracy of the trimap, and the trimap has to be generated by training a dedicated neural network or by manual annotation, which lowers the accuracy of the generated transparency channel image.
  • the embodiments of the present application provide an image processing method, device, equipment, and storage medium.
  • the technical solution is as follows:
  • an embodiment of the present application provides an image processing method, and the method includes:
  • the first transparency channel image and the original image are input into a second prediction model to obtain a second transparency channel image output by the second prediction model, and the fineness of the second transparency channel image is higher than the fineness of the first transparency channel image;
  • an image processing device which includes:
  • the first acquisition module is configured to acquire an original image, and the original image contains at least one target object;
  • the first prediction module is configured to input the original image into a first prediction model to obtain a first transparency channel image output by the first prediction model, and the first transparency channel image includes the predicted transparency value corresponding to each pixel in the original image;
  • the second prediction module is configured to input the first transparency channel image and the original image into a second prediction model to obtain a second transparency channel image output by the second prediction model, where the fineness of the second transparency channel image is higher than the fineness of the first transparency channel image;
  • the segmentation processing module is configured to perform segmentation processing on the original image according to the second transparency channel image to obtain an image corresponding to the target object.
  • an embodiment of the present application provides a computer device.
  • the computer device includes a processor and a memory.
  • the memory stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the image processing method as described in the foregoing aspect.
  • an embodiment of the present application provides a computer-readable storage medium, and the computer-readable storage medium stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the image processing method as described in the above aspect.
  • the embodiments of the present application provide a computer program product or computer program.
  • the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the computer device executes the image processing method provided in the various optional implementation manners of the foregoing aspects.
  • Fig. 1 shows a flowchart of an image processing method provided by an exemplary embodiment of the present application
  • Fig. 2 shows a flowchart of an image processing method shown in an exemplary embodiment of the present application
  • FIG. 3 shows a flowchart of a training method of a first prediction model shown in an exemplary embodiment of the present application
  • Fig. 4 shows a flowchart of a training method of a first prediction model shown in another exemplary embodiment of the present application
  • FIG. 5 shows a schematic diagram of the process of the training method of the first prediction model shown in an exemplary embodiment of the present application
  • Figure 6 shows a schematic diagram of the structure of each convolution block used by the multi-scale decoding network
  • FIG. 7 shows a flowchart of a training method of a second prediction model shown in an exemplary embodiment of the present application
  • FIG. 8 shows a flowchart of a training method of a second prediction model shown in another exemplary embodiment of the present application.
  • FIG. 9 shows a schematic diagram of a process of a second prediction model training method shown in an exemplary embodiment of the present application.
  • Fig. 10 shows a flowchart of an image processing method shown in another exemplary embodiment of the present application.
  • Fig. 11 shows a flowchart of an image processing method shown in another exemplary embodiment of the present application.
  • Fig. 12 shows a network deployment diagram of an image processing method shown in an exemplary embodiment of the present application
  • Fig. 13 shows a structural block diagram of an image processing apparatus provided by an exemplary embodiment of the present application
  • Fig. 14 shows a schematic structural diagram of a computer device provided by an exemplary embodiment of the present application.
  • the "plurality” mentioned herein means two or more.
  • “And/or” describes the association relationship of the associated objects, indicating that there can be three types of relationships, for example, A and/or B, which can mean: A alone exists, A and B exist at the same time, and B exists alone.
  • the character “/” generally indicates that the associated objects before and after are in an "or” relationship.
  • Image segmentation refers to the process of accurately separating the foreground object of interest from the background in a static image or continuous video sequence.
  • To this end, it is necessary to generate a transparency channel image for segmenting the foreground image, and the transparency channel image contains the transparency value corresponding to each pixel.
  • The area with a transparency value of 1 represents the foreground image area, and the area with a transparency value of 0 represents the background image area.
  • Therefore, the obtained transparency channel image can be used to separate the foreground image from the original image.
  • An image processing method proposed in the related art is mainly divided into two stages.
  • The first stage is to generate a trimap based on the original image.
  • The trimap divides the original image into three parts, namely the determined foreground image area, the determined background image area, and the uncertain area, so that the uncertain area in the original image can be located through the generated trimap.
  • The second stage is to input the generated trimap and the original image into a trained neural network, which determines the transparency value corresponding to each pixel in the uncertain area and outputs the transparency channel image used for image segmentation.
  • However, the trimap has to be generated by a dedicated neural network or by manual annotation, which makes the training process more cumbersome, and the transparency channel image cannot be generated directly from the original image.
  • FIG. 1 shows a flowchart of an image processing method provided by an exemplary embodiment of the present application.
  • the method includes:
  • Step 101 Train a first prediction model based on the sample image and the sample segmentation image.
  • the first prediction model is used to generate a first transparency channel image, and the first transparency channel image includes the predicted transparency value corresponding to each pixel in the original image.
  • Step 102 Train a second prediction model based on the sample image, the first sample transparent channel image, and the sample labeled image.
  • the first sample transparency channel image is obtained by inputting the sample image into the first prediction model, and the second prediction model is used to generate the second transparency channel image; the fineness of the second transparency channel image is higher than the fineness of the first transparency channel image.
  • Step 103 After preprocessing the original image, input the trained first prediction model to obtain the first transparent channel image output by the first prediction model.
  • Step 104 Input the first transparent channel image and the original image into a second prediction model to obtain a second transparent channel image output by the second prediction model.
  • Step 105 Perform segmentation processing on the original image according to the second transparency channel image to obtain a foreground image.
  • In the embodiments of the present application, a first prediction model and a second prediction model that can generate transparency channel images are trained. The original image is preprocessed and input into the first prediction model to obtain the first transparency channel image output by the first prediction model, and the first transparency channel image together with the original image is then input into the second prediction model to obtain the second transparency channel image output by the second prediction model, which is used for image segmentation. Compared with the related art, a transparency channel image can be generated directly from the original image without generating a trimap, which improves the accuracy of the transparency channel image and thereby the accuracy of image segmentation.
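The data flow of steps 103 to 105 can be illustrated with a minimal PyTorch-style sketch. The function and model names below (segment, first_model, second_model) are hypothetical placeholders; only the two-stage flow described above is shown, not the models' internal structure.

```python
# Minimal sketch of the two-stage pipeline (steps 103-105), assuming already-trained models.
import torch
import torch.nn as nn

def segment(original: torch.Tensor,
            first_model: nn.Module,
            second_model: nn.Module) -> torch.Tensor:
    """original: (N, 3, H, W) RGB image tensor in [0, 1]."""
    with torch.no_grad():
        # Stage 1: coarse transparency (alpha) channel image, one channel in [0, 1].
        alpha_coarse = first_model(original)                        # (N, 1, H, W)
        # Stage 2: refine using the original image concatenated with the coarse alpha.
        alpha_fine = second_model(torch.cat([original, alpha_coarse], dim=1))
        # Segmentation: weight each pixel of the original image by its transparency.
        foreground = original * alpha_fine
    return foreground
```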
  • the image processing methods provided in the various embodiments of the present application can be used in computer equipment with image processing functions, and the computer equipment can be a smart phone, a tablet computer, a personal portable computer, and the like.
  • the image processing method provided in the embodiment of the present application may be applied to an application program that requires tasks such as image segmentation, background replacement, and target object blurring.
  • For example, an application program with a beautification (beauty) function. Optionally, the training process of the prediction models in the image processing method provided in the embodiments of the present application can be performed on a server; after the training of the prediction models is completed, the trained prediction models are deployed on a computer device for subsequent image processing. Optionally, the image processing method provided in the embodiments of the present application can also be used on a server with image processing functions.
  • FIG. 2 shows a flowchart of an image processing method shown in an exemplary embodiment of the present application.
  • the method is used in a computer device as an example for description, and the method includes the following steps.
  • Step 201 Obtain an original image.
  • the foreground image is the image corresponding to the target object. Therefore, the original image should contain at least one target object.
  • the target object may be a person, a scene, an animal, an article, etc., and the embodiment of the present application does not limit the type of the target object.
  • Optionally, the original image may be preprocessed before it is processed by the model.
  • The preprocessing methods may include random rotation, random left-right flipping, random cropping, gamma transformation, and the like.
  • In this way, the data is augmented for use in the subsequent feature extraction process.
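Below is a minimal augmentation sketch for the preprocessing listed above, using torchvision transforms. The crop size, rotation range, and gamma range are illustrative assumptions not taken from this application, and during training the same geometric transforms would also have to be applied to the sample annotated image.

```python
# Assumed augmentation pipeline: random rotation, left-right flip, crop, gamma transform.
import random
from torchvision import transforms
from torchvision.transforms import functional as TF

preprocess = transforms.Compose([
    transforms.RandomRotation(degrees=15),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomCrop(size=(512, 512), pad_if_needed=True),
    transforms.Lambda(lambda img: TF.adjust_gamma(img, gamma=random.uniform(0.8, 1.2))),
    transforms.ToTensor(),
])
```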
  • Step 202 Input the original image into the first prediction model to obtain the first transparency channel image output by the first prediction model.
  • the first transparency channel image includes the predicted transparency value corresponding to each pixel in the original image.
  • The first transparency channel image is a probability map, that is, the predicted transparency value corresponding to each pixel in the first transparency channel image is between 0 and 1; for example, the predicted transparency value of a certain pixel is 0.9.
  • The first prediction model is obtained by training based on sample images, sample annotated images, and sample segmented images, where the sample annotated images are annotated with the standard transparency value corresponding to each pixel in the sample image. Therefore, in a possible implementation manner, after the original image is preprocessed and input into the trained first prediction model, the predicted first transparency channel image can be obtained, and the first transparency channel image includes the predicted transparency value corresponding to each pixel in the original image.
  • Step 203 Input the first transparency channel image and the original image into a second prediction model to obtain a second transparency channel image output by the second prediction model.
  • the fineness of the second transparency channel image is higher than that of the first transparency channel image.
  • In the embodiment of the present application, a second prediction model is deployed, which can generate a second transparency channel image with higher precision based on the input original image and the first transparency channel image.
  • the second prediction model is mainly used to correct the transparency value of each pixel in the first predicted transparency channel image, so that the predicted transparency value of each pixel in the second transparency channel image is closer to the standard transparency value.
  • a more refined second transparent channel image can be obtained for subsequent use.
  • Step 204 Perform segmentation processing on the original image according to the second transparent channel image to obtain an image corresponding to the target object.
  • The second transparency channel image contains the predicted transparency value corresponding to each pixel. Since the transparency value corresponding to the foreground image is 1 and the transparency value corresponding to the background image is 0, the foreground image and the background image in the original image can be separated according to the second transparency channel image, so that the image corresponding to the target object can be obtained.
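A small sketch of how a predicted transparency channel image can be used to separate the foreground and, for example, replace the background (one of the applications mentioned above). The tensor shapes and the compositing formula follow common alpha-matting practice rather than a formula stated in this application.

```python
# image: (3, H, W) RGB in [0, 1]; alpha: (1, H, W) transparency values; new_background: (3, H, W).
import torch

def separate_and_composite(image: torch.Tensor,
                           alpha: torch.Tensor,
                           new_background: torch.Tensor):
    foreground = image * alpha                        # pixels with alpha near 1 are kept
    background = image * (1.0 - alpha)                # pixels with alpha near 0 are kept
    composite = image * alpha + new_background * (1.0 - alpha)  # background replacement
    return foreground, background, composite
```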
  • the obtained original image is input into the first prediction model to obtain the first transparency channel image output by the first prediction model (including the predicted transparency value corresponding to each pixel in the original image),
  • the first transparency channel image and the original image are input into the second prediction model to obtain the second transparency channel image output by the second prediction model, which is used to segment the original image according to the second transparency channel image to obtain the image corresponding to the target object .
  • In this way, the accuracy of image segmentation can be improved. Compared with the image segmentation method in the related art, no trimap needs to be introduced, and the transparency channel image used for segmentation can be generated directly from the original image, which further improves the accuracy of image segmentation.
  • the method further includes:
  • the sample annotated image is annotated with the transparency value corresponding to each pixel in the sample image;
  • the sample segmented image is a binary image obtained by binarizing the sample annotated image;
  • the second prediction model is trained based on the first sample transparency channel image, the sample annotated image, and the sample image.
  • training the second prediction model based on the first sample transparent channel image, sample labeled image, and sample image includes:
  • the refined network obtained by training is determined as the second prediction model.
  • inputting the sample image, the second sample transparency channel image, and the sample annotated image into the edge gradient network to obtain the edge gradient loss corresponding to the second sample transparency channel image includes:
  • the sample edge image is used to indicate the boundary area between the foreground image and the background image in the sample annotated image;
  • the sample edge gradient image is used to indicate the boundary area between the foreground image and the background image in the sample image;
  • the edge transparency channel image is used to indicate the boundary area between the foreground image and the background image in the second sample transparency channel image;
  • the edge gradient loss is calculated.
  • the first prediction model includes a multi-scale coding network, a feature pyramid network, a multi-scale decoding network, and a deep supervision network;
  • training the first prediction model includes:
  • the multi-scale coding network is used for feature extraction of the sample image;
  • the m first sample feature maps are input into the feature pyramid network to obtain m second sample feature maps output by the feature pyramid network, where different second sample feature maps have the same number of channels and different resolutions, and the feature pyramid network is used to process the number of channels of the m first sample feature maps into the target number of channels;
  • the multi-scale decoding network is used to perform addition and resolution conversion operations on the m second sample feature maps, and the resolution of the first sample transparency channel image is the same as the resolution of the sample image;
  • the deep supervision network is used to up-sample the m second sample feature maps, where different second sample feature maps correspond to different upsampling multiples, and the resolution of the m third sample transparency channel images is the same as the resolution of the sample image;
  • the first prediction model is trained.
  • inputting the original image into the first prediction model to obtain the first transparent channel image output by the first prediction model includes:
  • inputting the n first feature maps into the feature pyramid network to obtain n second feature maps output by the feature pyramid network, where different second feature maps have the same number of channels and different resolutions, and the number of channels of the n second feature maps is the target number of channels;
  • the n second feature maps are input into the multi-scale decoding network to obtain the first transparent channel image output by the multi-scale decoding network.
  • inputting the n first feature maps into the feature pyramid network to obtain n second feature maps output by the feature pyramid network includes:
  • the n-th first feature map is subjected to convolution processing to obtain a fourth feature map, the (n+1)-th first feature map is subjected to convolution and upsampling processing to obtain a fifth feature map, and the fourth feature map and the fifth feature map are mixed and then subjected to convolution processing to obtain the n-th second feature map.
  • inputting n second feature maps into the multi-scale decoding network to obtain the first transparent channel image output by the multi-scale decoding network includes:
  • the n second feature maps are processed through the convolution block to obtain n third feature maps.
  • the resolutions of the n third feature maps are the same, where different second feature maps correspond to different convolution blocks, and the numbers of convolution blocks corresponding to different second feature maps are different;
  • inputting the m first sample feature maps into the feature pyramid network to obtain m second sample feature maps output by the feature pyramid network includes:
  • the m first sample feature maps are formed into a sample feature pyramid according to the resolution, and the resolution of the first sample feature map in the sample feature pyramid is in a negative correlation with the level of the first sample feature map;
  • the m-th first sample feature map is subjected to convolution processing to obtain a first intermediate sample feature map, the (m+1)-th first sample feature map is subjected to convolution and upsampling processing to obtain a second intermediate sample feature map, and the first intermediate sample feature map and the second intermediate sample feature map are mixed and then subjected to convolution processing to obtain the m-th second sample feature map.
  • Since the process of generating the second transparency channel image is divided into two prediction model stages, namely the first prediction model and the second prediction model, the model training stage should also include the training phase of the first prediction model and the training phase of the second prediction model.
  • FIG. 3 shows a flowchart of the training method of the first prediction model shown in an exemplary embodiment of the present application.
  • the method includes:
  • Step 301 Obtain the sample image, sample annotation image and sample segmentation image.
  • the sample annotation image is annotated with the transparency value corresponding to each pixel in the sample image.
  • The sample segmented image is a binary image obtained by binarizing the sample annotated image.
  • The data set used includes a preset number of data pairs, each data pair being a sample image and the sample annotated image corresponding to that sample image, where the sample annotated image is annotated with the standard transparency value corresponding to each pixel in the sample image.
  • the preset number can be set by the developer.
  • the data set can include 5000 data pairs.
  • the sample annotated image can be annotated by the developer.
  • the first prediction model may be trained based on a deep learning tensor library (PyTorch) framework and a graphics processing unit (GPU).
  • In the embodiment of the present application, the first prediction model is trained using the sample segmented image and the sample image.
  • The sample segmented image can be obtained by binarizing the sample annotated image, that is, a transparency threshold is set: if the transparency value corresponding to a pixel is greater than the transparency threshold, the transparency value corresponding to the pixel is represented by 1; if the transparency value corresponding to the pixel is less than the transparency threshold, the transparency value corresponding to the pixel is represented by 0. In this way, the transparency value corresponding to each pixel in the resulting sample segmented image is either 0 or 1.
  • the transparency threshold can be set by the developer.
  • the transparency threshold can be 0.8, that is, the transparency value of pixels greater than 0.8 is represented by 1, and the transparency value of pixels less than 0.8 is represented by 0.
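A one-line binarization sketch matching the description above; the 0.8 threshold follows the example in the text.

```python
# Threshold a transparency (alpha) map into a binary segmentation map.
import torch

def binarize(alpha: torch.Tensor, threshold: float = 0.8) -> torch.Tensor:
    """alpha: tensor of transparency values in [0, 1]; returns a 0/1 tensor."""
    return (alpha > threshold).float()
```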
  • the obtained preset number of sample images and sample segmentation images are divided into a test set and a sample set according to a certain ratio, where the sample set is used in the subsequent training process of the first prediction model,
  • the test set is used to verify the first prediction model.
  • the preset ratio can be set by the developer. For example, if the preset ratio is 2:8, the data set can be divided into a test set and a sample set at a ratio of 2:8.
  • Step 302 Train a first prediction model based on the sample image and the sample segmented image.
  • the amount of sample data can be expanded by preprocessing the image, for example, preprocessing such as random rotation, random left-to-right flipping, random cropping, and Gamma transformation on the sample images in the sample set.
  • the embodiment of the present application does not limit the preprocessing method of the sample image.
  • The first prediction model may include a multi-scale coding network for feature extraction, a feature pyramid network for multiplexing and integrating features, a multi-scale decoding network for feature decoding, and a deep supervision network that supervises deep features at multiple scales, so as to achieve rapid convergence and produce preliminary segmentation results.
  • step 302 may include steps 302A to 302F.
  • Step 302A Input the sample image into the multi-scale coding network to obtain m first sample feature maps output by the multi-scale coding network, where m is an integer greater than or equal to 2.
  • the multi-scale coding network is used to extract the features of sample images from different scales (resolutions)
  • The sample image is input into the multi-scale coding network, and features are extracted at different resolutions to obtain sample feature maps with different resolutions; that is, different first sample feature maps have different resolutions and numbers of channels, and m is an integer greater than or equal to 2.
  • The multi-scale coding network may adopt a neural network model for feature extraction, for example, a MobileNetV2 model.
  • the embodiment of the present application does not limit the neural network model adopted by the multi-scale coding network.
  • The preprocessed sample image is input into the multi-scale coding network, and multi-scale feature extraction is performed on the sample image through the multi-scale coding network to obtain m first sample feature maps. Because feature extraction is performed at multiple scales, the resolutions and numbers of channels of the obtained m first sample feature maps are different.
  • The obtained m first sample feature maps can be: 320×1/32, 64×1/16, 32×1/8, 24×1/4, where 320, 64, 32, 24 represent the number of channels corresponding to each first sample feature map, and 1/32, 1/16, 1/8, 1/4 represent the resolution of each first sample feature map relative to the sample image; for example, 1/4 indicates that the resolution of the corresponding first sample feature map is 1/4 of that of the sample image.
  • The sample image is input into the multi-scale coding network 501, and after multi-scale feature extraction is performed, four first sample feature maps output by the multi-scale coding network 501 are obtained; the number of channels and resolution corresponding to these four first sample feature maps are: 320×1/32, 64×1/16, 32×1/8, 24×1/4.
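The multi-scale coding network can be sketched with torchvision's MobileNetV2 as follows. The specific layer indices used to obtain the 24×1/4, 32×1/8, 64×1/16, and 320×1/32 feature maps are an assumption about torchvision's implementation; the application itself only names MobileNetV2 as one possible backbone.

```python
# Assumed MobileNetV2-based multi-scale encoder producing the four first feature maps.
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

class MultiScaleEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        feats = mobilenet_v2(weights=None).features
        self.stage1 = feats[:4]     # -> 24 channels, 1/4 resolution
        self.stage2 = feats[4:7]    # -> 32 channels, 1/8 resolution
        self.stage3 = feats[7:11]   # -> 64 channels, 1/16 resolution
        self.stage4 = feats[11:18]  # -> 320 channels, 1/32 resolution

    def forward(self, x):
        c1 = self.stage1(x)
        c2 = self.stage2(c1)
        c3 = self.stage3(c2)
        c4 = self.stage4(c3)
        return [c1, c2, c3, c4]     # m = 4 first feature maps

encoder = MultiScaleEncoder()
maps = encoder(torch.randn(1, 3, 256, 256))
print([m.shape for m in maps])      # channels 24, 32, 64, 320 at 1/4, 1/8, 1/16, 1/32
```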
  • Step 302B input the m first sample feature maps into the feature pyramid network to obtain m second sample feature maps output by the feature pyramid network.
  • The feature pyramid network is used to mix the extracted feature maps and to process the number of channels of the corresponding feature maps into the target number of channels.
  • The m first sample feature maps are input into the feature pyramid network, and the m first sample feature maps are integrated and multiplexed through the feature pyramid network to obtain m second sample feature maps.
  • the process of processing the first sample feature map by the feature pyramid network may include the following steps:
  • the m first sample feature maps are formed into a sample feature pyramid according to the resolution.
  • the resolution of the first sample feature map in the sample feature pyramid has a negative correlation with the level of the first sample feature map.
  • The m first sample feature maps output by the multi-scale coding network are input into the feature pyramid network, and a sample feature pyramid is first arranged according to the resolution of each first sample feature map, where the level of a first sample feature map in the sample feature pyramid has a negative correlation with its resolution.
  • The 4 first sample feature maps output by the multi-scale coding network are input into the feature pyramid network 502, and a first sample feature pyramid (shown as the pyramid on the left side of the feature pyramid network) is first arranged according to resolution; that is, each level of the first sample feature pyramid and its corresponding first sample feature map are: 24×1/4 (first layer), 32×1/8 (second layer), 64×1/16 (third layer), 320×1/32 (fourth layer).
  • The first sample feature maps are mixed through upsampling and convolution processing, so that the obtained second sample feature maps not only focus on features at the same sampling size but also make full use of the first sample feature maps at other resolutions.
  • When the number of channels corresponding to a first sample feature map is the maximum number of channels (that is, it corresponds to the minimum resolution), only convolution processing is performed on that first sample feature map to obtain the second sample feature map corresponding to the minimum resolution.
  • For example, the channel number and resolution corresponding to the fourth-layer first sample feature map are 320×1/32, which corresponds to the maximum number of channels; therefore, only the first sample feature map of the fourth layer needs to be convolved to obtain the second sample feature map with a resolution of 1/32.
  • For the other resolutions, in the process of generating the second sample feature map it is necessary to mix first sample feature maps of different resolutions: first, convolution processing is performed on the first sample feature map to obtain a first intermediate sample feature map; then the upper-layer features are mixed in, that is, the first sample feature map one level higher is subjected to convolution and upsampling to obtain a second intermediate sample feature map; after the first intermediate sample feature map and the second intermediate sample feature map are mixed, convolution processing is performed to obtain the second sample feature map at that resolution.
  • For example, the second sample feature map with a resolution of 1/16 is represented by 128×1/16; in the same way, the second sample feature map with a resolution of 1/8 (128×1/8) and the second sample feature map with a resolution of 1/4 (128×1/4) can be obtained respectively.
  • The right half of the feature pyramid network 502 in FIG. 5 shows each second sample feature map output by the feature pyramid network.
  • Each second sample feature map is also arranged according to resolution to form a second sample feature pyramid, where the level of a second sample feature map in the second sample feature pyramid is negatively correlated with its resolution.
  • each level of the second sample feature pyramid and its corresponding second sample feature map are: 128×1/4 (first layer), 128×1/8 (second layer), 128×1/16 (third layer), 128×1/32 (fourth layer).
  • the number of target channels corresponding to the m second sample feature maps can be set by the developer.
  • the number of target channels is 128, and the embodiment of the present application does not limit the number of target channels.
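A sketch of the feature pyramid step described above: each first feature map is projected to the target number of channels (128 in the example), lower-resolution features are upsampled and mixed into higher-resolution levels, and a convolution is applied afterwards. Using 1×1 lateral convolutions, addition as the mixing operation, and 3×3 output convolutions is an assumption; the text only specifies convolution, upsampling, and mixing.

```python
# Assumed feature pyramid network unifying channel counts to 128 and mixing resolutions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeaturePyramid(nn.Module):
    def __init__(self, in_channels=(24, 32, 64, 320), out_channels=128):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)
        self.smooth = nn.ModuleList(nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                    for _ in in_channels)

    def forward(self, feats):
        # feats are ordered from the highest resolution (1/4) to the lowest (1/32)
        laterals = [conv(f) for conv, f in zip(self.lateral, feats)]
        outputs = [None] * len(laterals)
        outputs[-1] = self.smooth[-1](laterals[-1])          # lowest resolution: conv only
        for i in range(len(laterals) - 2, -1, -1):           # top-down mixing
            upsampled = F.interpolate(outputs[i + 1], size=laterals[i].shape[-2:],
                                      mode='bilinear', align_corners=False)
            outputs[i] = self.smooth[i](laterals[i] + upsampled)
        return outputs                                        # all with 128 channels
```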
  • Step 302C Input the m second sample feature maps into the multi-scale decoding network to obtain the first sample transparent channel image output by the multi-scale decoding network.
  • the resolution of the first sample transparent channel image is the same as the resolution of the sample image.
  • The m second sample feature maps output by the feature pyramid network are input into the multi-scale decoding network, and the multi-scale decoding network performs addition and resolution conversion operations on the second sample feature maps to obtain the first sample transparency channel image corresponding to the sample image; the first sample transparency channel image contains the predicted transparency value corresponding to each pixel in the sample image and is used for subsequent comparison with the sample segmented image to calculate the cross-entropy loss.
  • Since each second sample feature map corresponds to a different resolution, they cannot be directly added; therefore, it is necessary to unify the resolution of each second sample feature map to 1/4 of the original image.
  • each second sample feature map is processed by a convolution block, and different resolutions correspond to different convolution blocks, and the number of convolution blocks corresponding to different resolutions is different.
  • the types of convolutional blocks used by the multi-scale decoding network include cgr2x, sgr2x, sgr, etc.
  • cgr2x includes a convolutional layer, a group normalization (Group Normalization) layer, an activation function (ReLU) layer, and a bilinear-interpolation 2× upsampling layer, where the numbers of input and output channels of the convolutional layer are the same, for example, 128 input channels and 128 output channels;
  • sgr2x includes a convolutional layer, a Group Normalization layer, a ReLU layer, and a bilinear-interpolation 2× upsampling layer, where the numbers of input and output channels of the convolutional layer are different, for example, 128 input channels and 64 output channels;
  • sgr includes a convolutional layer, a Group Normalization layer, and a ReLU layer, where the numbers of input and output channels of the convolutional layer are different, for example, 128 input channels and a different number of output channels.
  • FIG. 6 shows a schematic structural diagram of each convolution block used by the multi-scale decoding network.
  • A in FIG. 6 is a schematic diagram of the structure corresponding to cgr2x
  • B in FIG. 6 is a schematic diagram of the structure corresponding to sgr2x
  • C in FIG. 6 is a schematic diagram of the structure corresponding to sgr.
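The three block types can be sketched as follows; the 3×3 kernel size, the number of GroupNorm groups, and the default output channel counts are assumptions beyond the examples given in the text.

```python
# Assumed implementations of the cgr2x / sgr2x / sgr convolution blocks.
import torch.nn as nn

def _conv_gn_relu(in_ch, out_ch, groups=8):
    return [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.GroupNorm(groups, out_ch),
            nn.ReLU(inplace=True)]

def cgr2x(channels=128):
    # same number of input and output channels, followed by bilinear 2x upsampling
    return nn.Sequential(*_conv_gn_relu(channels, channels),
                         nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False))

def sgr2x(in_ch=128, out_ch=64):
    # different numbers of input and output channels, followed by bilinear 2x upsampling
    return nn.Sequential(*_conv_gn_relu(in_ch, out_ch),
                         nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False))

def sgr(in_ch=128, out_ch=64):
    # different numbers of input and output channels, no upsampling
    return nn.Sequential(*_conv_gn_relu(in_ch, out_ch))
```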
  • The four second sample feature maps output by the feature pyramid network 502 are input into the multi-scale decoding network 503 and processed by different convolution blocks to form four third sample feature maps (not shown in the figure) with a resolution of 1/4 of the original image.
  • For example, for the second sample feature map 128×1/32, the corresponding third sample feature map can be obtained by passing it through two cgr2x blocks and one sgr2x block in turn.
  • For the second sample feature map 128×1/16, the corresponding third sample feature map can be obtained by passing it through one cgr2x block and one sgr2x block in turn; in the same way, the third sample feature maps corresponding to the other second sample feature maps can be obtained.
  • Step 302D input the m second sample feature maps into the deep supervision network to obtain m third sample transparent channel images output by the deep supervision network.
  • different second sample feature maps correspond to different upsampling multiples, and the resolution of the m third sample transparent channel images is the same as the resolution of the sample image.
  • The m second sample feature maps output by the feature pyramid network are input into a deep supervision network, and the deep supervision network performs up-sampling processing on the m second sample feature maps to obtain m third sample transparency channel images with the same resolution as the sample image, which are used to provide the first prediction model with cross-entropy losses at different resolutions.
  • the upsampling multiples corresponding to different second sample feature maps are related to their corresponding resolutions.
  • For example, the upsampling multiple corresponding to the second sample feature map with a resolution of 1/32 is 32 times, and the upsampling multiple corresponding to the second sample feature map with a resolution of 1/16 is 16 times.
  • The second sample feature map 128×1/4 is upsampled 4 times to obtain the third sample transparency channel image 4, and the second sample feature map 128×1/8 is upsampled 8 times to obtain the third sample transparency channel image 8; in the same way, the third sample transparency channel image 16 and the third sample transparency channel image 32 can be obtained.
  • Step 302E Binarize the first sample transparent channel image and m third sample transparent channel images to obtain the first sample segmented image and m second sample segmented images.
  • Since the first sample transparency channel image and the m third sample transparency channel images are probability maps, and the sample segmented image used to accelerate the convergence of the first prediction model is a binary image, the first sample transparency channel image and the m third sample transparency channel images need to be binarized to obtain the first sample segmented image and the m second sample segmented images before they can be compared with the sample segmented image to calculate the cross-entropy loss of the first prediction model.
  • the method of binarizing the first sample transparent channel image and the m third sample transparent channel images can refer to the sample segmentation image generation process in the above embodiment, which will not be repeated in this embodiment.
  • Binarization processing is performed on the four third sample transparency channel images to obtain four second sample segmented images, which are respectively denoted as the second sample segmented image 32, the second sample segmented image 16, the second sample segmented image 8, and the second sample segmented image 4; the first sample transparency channel image is binarized to obtain the first sample segmented image.
  • Step 302F Train a first prediction model according to the first sample segmentation image, m second sample segmentation images, and sample segmentation images.
  • The loss of the first prediction model uses cross-entropy loss; that is, by summing the cross-entropy loss between the first sample segmented image and the sample segmented image and the cross-entropy losses between the m second sample segmented images and the sample segmented image, the cross-entropy loss corresponding to the first prediction model can be obtained.
  • The cross-entropy loss between the first sample segmented image and the sample segmented image and the cross-entropy loss between each second sample segmented image and the sample segmented image are calculated separately, and the sum of these cross-entropy losses is the cross-entropy loss corresponding to the first prediction model.
  • the loss corresponding to the first prediction model can be expressed as:
  • The comprehensive loss corresponding to the first prediction model can be calculated according to the above formula (1) and formula (2), and the comprehensive loss is used to execute the back-propagation algorithm on the first prediction model and update each parameter in the first prediction model.
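A simplified sketch of the overall loss in step 302F. Binary cross-entropy on the predicted probability maps is used here so the loss remains differentiable; the exact formulas (1) and (2) referred to above are not reproduced.

```python
# Overall first-model loss: main prediction loss plus the deep-supervision losses.
import torch
import torch.nn.functional as F

def first_model_loss(main_pred, aux_preds, sample_segmentation):
    """main_pred: (N,1,H,W) probabilities; aux_preds: list of (N,1,H,W) probabilities;
    sample_segmentation: (N,1,H,W) binary ground truth (0/1)."""
    loss = F.binary_cross_entropy(main_pred, sample_segmentation)
    for aux in aux_preds:                       # one term per deep-supervision output
        loss = loss + F.binary_cross_entropy(aux, sample_segmentation)
    return loss
```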
  • In summary, the first prediction model is trained using the acquired sample images and sample segmented images, and a cross-entropy loss is introduced in the training process of the first prediction model, so that the first prediction model converges quickly, which improves the training efficiency of the first prediction model.
  • After the training of the first prediction model is completed, the sample image can be input into the trained first prediction model to obtain the first sample transparency channel image, which is used in the subsequent training process of the second prediction model.
  • FIG. 7 shows a flowchart of a training method of a second prediction model according to an exemplary embodiment of the present application.
  • the method includes:
  • Step 701 Input a sample image into a first prediction model obtained by training, to obtain a first sample transparent channel image output by the first prediction model.
  • For the second prediction model, its training process can be performed only after the training of the first prediction model is completed.
  • The sample image is input into the first prediction model obtained by training, and the first sample transparency channel image corresponding to the sample image output by the first prediction model is obtained, which is used in the subsequent training process of the second prediction model.
  • Each sample image in the data set can be input into the first prediction model to obtain the first sample transparency channel image corresponding to each sample image, so that the sample images and their corresponding first sample transparency channel images are used as the data set for training the second prediction model; that is, the data set of the second prediction model is composed of data pairs each consisting of a sample image and its first sample transparency channel image.
  • Similarly, when training the second prediction model, the data set can also be divided into a sample set and a test set, where the sample set is used to train the second prediction model and the test set is used to verify the second prediction model.
  • Step 702 Train a second prediction model based on the first sample transparency channel image, the sample annotated image, and the sample image.
  • Since the main task of the first prediction model is to obtain preliminary segmentation results, the sample segmented images are used in the training process of the first prediction model in order to improve its convergence speed, whereas the purpose of the second prediction model is to improve the fineness of each pixel in the first transparency channel image.
  • Based on the relationship between the sample segmented image and the sample annotated image (the sample segmented image is obtained by binarizing the sample annotated image), the sample annotated image is obviously more accurate; therefore, the first sample transparency channel image, the sample annotated image, and the sample image are used to train the second prediction model.
  • Since the transparency channel image is mainly used to realize the separation between the foreground image and the background image in the original image, losses other than the basic matting loss are introduced, such as the connectivity difference loss, the structural similarity loss, and the edge gradient loss.
  • These losses pay more attention to the transparency values at the boundary between the foreground image and the background image, so that the fineness of the second sample transparency channel image output by the refined network is higher than that of the first sample transparency channel image.
  • step 702 may include steps 702A to 702F.
  • Step 702A Input the first sample transparent channel image and the sample image into the refined network to obtain the second sample transparent channel image output by the refined network.
  • The second prediction model mainly includes a refined network, which performs convolution processing on the first sample transparency channel image and the sample image; it can correct some incorrect transparency values in the first sample transparency channel image as well as the transparency values in the boundary area between the foreground image and the background image, so as to improve the fineness of the transparency value corresponding to each pixel, and it outputs the second sample transparency channel image.
  • the first sample transparent channel image and the sample image may be subjected to Concat processing, and then input into the refined network.
  • The refined network may include three convolutional blocks and one convolutional layer, which reduces the amount of computation.
  • As shown in FIG. 9, the sample image is input into the trained first prediction model 901 to obtain the first sample transparency channel image corresponding to the sample image; the first sample transparency channel image and the sample image are then concatenated (Concat) and input into the refined network 902, which outputs the second sample transparency channel image, where the refined network 902 is composed of three convolutional blocks and one convolutional layer.
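A sketch of a refined network with the structure described above (concatenated 4-channel input, three convolutional blocks, one final convolutional layer). The intermediate channel width, kernel sizes, normalization, and final sigmoid are illustrative assumptions.

```python
# Assumed refined-network structure: 3 conv blocks + 1 conv layer on a 4-channel input.
import torch
import torch.nn as nn

class RefineNet(nn.Module):
    def __init__(self, mid_channels=64):
        super().__init__()
        def block(in_ch, out_ch):
            return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1),
                                 nn.GroupNorm(8, out_ch),
                                 nn.ReLU(inplace=True))
        self.blocks = nn.Sequential(block(4, mid_channels),
                                    block(mid_channels, mid_channels),
                                    block(mid_channels, mid_channels))
        self.head = nn.Conv2d(mid_channels, 1, kernel_size=3, padding=1)

    def forward(self, image, coarse_alpha):
        x = torch.cat([image, coarse_alpha], dim=1)   # Concat of sample image and coarse alpha
        return torch.sigmoid(self.head(self.blocks(x)))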
  • Step 702B Input the sample image, the second sample transparent channel image, and the sample labeled image into the edge gradient network to obtain the edge gradient loss corresponding to the second sample transparent channel image.
  • In the embodiment of the present application, a network that specifically calculates the edge gradient loss, namely the edge gradient network, is provided; the sample image, the second sample transparency channel image, and the sample annotated image are input into the edge gradient network to obtain the edge gradient loss corresponding to the second sample transparency channel image, which provides an edge loss for the subsequent training process.
  • the process of obtaining the edge gradient loss may include the following steps:
  • Since the edge of an image is the boundary area between the foreground image and the background image, to obtain the edge gradient loss it is necessary to first obtain the edge images corresponding to the sample image and the second sample transparency channel image.
  • A preset operator is provided in the edge gradient network; the preset operator can perform a first-derivative operation on the sample image to obtain the gradients of the sample image in the x and y directions, thereby outputting the sample gradient image.
  • The preset operator can be the Sobel operator, or another filter operator that generates image gradients, such as the Scharr operator or the Laplacian operator.
  • the embodiment of the present application does not limit the preset operator used.
  • A represents the input sample image
  • Gx represents the gradient image of the sample image in the x direction
  • Gy represents the gradient image of the sample image in the y direction
  • G represents the sample gradient image output after applying the Sobel operator.
  • In this way, the gradient maps of the sample image in the x and y directions are obtained, and from them the sample gradient image corresponding to the sample image is obtained.
  • sample edge image is used to indicate the boundary area between the foreground image and the background image in the sample annotated image.
  • Binarization, dilation, and erosion operations are performed on the sample annotated image to obtain the sample edge image, and the sample edge image is used to extract the edge regions from the second sample transparency channel image and the sample gradient image.
  • the sample edge gradient image is generated, and the sample edge gradient image is used to indicate the boundary area between the foreground image and the background image in the sample image.
  • The boundary area between the foreground image and the background image in the sample image can thus be separated out from the sample gradient image, that is, the sample edge gradient image is obtained.
  • an edge transparent channel image is generated, and the edge transparent channel image is used to indicate the boundary area between the foreground image and the background image in the second sample transparent channel image.
  • Similarly, the boundary area between the foreground image and the background image in the second sample transparency channel image can be separated out, that is, the corresponding edge transparency channel image can be obtained.
  • the edge gradient loss corresponding to the second sample transparency channel image can be calculated.
  • G_input represents the sample gradient image;
  • E_label represents the sample edge image;
  • G_Refined represents the second sample transparency channel image;
  • ‖·‖₁ indicates that the edge gradient loss is calculated using the L1 norm.
  • As shown in FIG. 9, the sample image is input into the edge gradient network 903, and the sample gradient image is first obtained through the Sobel operator; the sample annotated image is input into the edge gradient network 903, and the sample edge image is obtained after binarization, dilation, and erosion operations; the second sample transparency channel image is input into the edge gradient network 903 and multiplied by the sample edge image to obtain the edge transparency channel image; the sample gradient image is multiplied by the sample edge image to obtain the sample edge gradient image; and the edge gradient loss is calculated from the sample edge gradient image and the edge transparency channel image.
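A sketch of the edge gradient loss computation described above. The interpretation of the omitted formula, i.e. an L1-style difference between the edge-masked sample gradient image and the edge-masked second sample transparency channel image, is an assumption based on the surrounding text.

```python
# Sobel gradient image plus edge-masked L1 comparison, following the description above.
import torch
import torch.nn.functional as F

def sobel_gradient(gray: torch.Tensor) -> torch.Tensor:
    """gray: (N,1,H,W) single-channel image; returns the gradient magnitude image G."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(gray, kx.to(gray), padding=1)
    gy = F.conv2d(gray, ky.to(gray), padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

def edge_gradient_loss(sample_gray, refined_alpha, sample_edge_mask):
    g_input = sobel_gradient(sample_gray) * sample_edge_mask   # sample edge gradient image
    g_refined = refined_alpha * sample_edge_mask               # edge transparency channel image
    return torch.mean(torch.abs(g_input - g_refined))          # L1-style edge gradient loss
```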
  • Step 702C According to the second sample transparent channel image and the sample annotation image, calculate the structural similarity loss and the matting loss corresponding to the second sample transparent channel image.
  • the second sample transparent channel image and the sample label image are brought into the above formula to obtain the matting loss corresponding to the second sample transparent channel image.
  • SSIM(x, y) represents the structural similarity index;
  • μ_x is the mean value of the sample annotated image;
  • σ_x² is the variance of the sample annotated image;
  • μ_y is the mean value of the second sample transparency channel image;
  • σ_y² is the variance of the second sample transparency channel image;
  • C_1 and C_2 are constants.
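A sketch of the structural similarity loss and a basic matting loss between the second sample transparency channel image and the sample annotated image. A global (whole-image) SSIM with the standard constants is used for brevity, and the covariance term follows the standard SSIM definition; the application's exact omitted formulas may differ.

```python
# Assumed SSIM-based loss and a simple L1 matting loss on the alpha values.
import torch

def ssim_loss(pred: torch.Tensor, target: torch.Tensor,
              c1: float = 0.01 ** 2, c2: float = 0.03 ** 2) -> torch.Tensor:
    mu_x, mu_y = target.mean(), pred.mean()
    var_x, var_y = target.var(), pred.var()
    cov_xy = ((target - mu_x) * (pred - mu_y)).mean()
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return 1.0 - ssim                      # smaller is better

def matting_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # absolute difference of transparency values, a common form of the basic matting loss
    return torch.mean(torch.abs(pred - target))
```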
  • Step 702D Input the second sample transparent channel image and the sample labeled image into the connectivity difference network to obtain the connectivity difference loss corresponding to the second sample transparent channel image.
  • Connectivity means that, for a single pixel in a grayscale image, neighboring pixels with the same value exist above, below, to the left, and to the right of it. If the prediction of the second prediction model is good, the predicted second sample transparency channel image and the sample annotated image should have similar connected graphs and similar connectivity.
  • In a possible implementation manner, a connectivity difference network is preset, and the second sample transparency channel image and the sample annotated image can be input into the connectivity difference network to calculate the connectivity difference loss corresponding to the second sample transparency channel image.
  • Ω represents the connected region, with a maximum value of 1, that is shared by the second sample transparency channel image and the sample annotated image;
  • the connectivity function computes the degree of connectivity between the i-th pixel p_i of the second sample transparency channel image and Ω, where a value of 1 means fully connected and 0 means disconnected; l_i indicates the i-th pixel of the sample annotated image;
  • θ is a threshold parameter;
  • d_i represents the critical threshold distance between the current pixel values p_i and l_i;
  • values of d_i smaller than θ are negligible;
  • dist_k(i) represents the normalized Euclidean distance between pixel i and the source region it is connected to when the threshold is set to k.
  • the second sample transparent channel image and the sample labeled image are input into the connectivity difference network 904, and the connectivity difference loss output by the connectivity difference network 904 can be obtained.
  • Step 702E Train the refinement network based on the edge gradient loss, the connectivity difference loss, the matting loss, and the structural similarity loss.
  • In one possible implementation, the refinement network is trained by combining the losses obtained in the above embodiments; compared with using the matting loss alone, this can significantly improve the fineness of the generated second sample transparency channel image.
  • Step 702F Determine the refinement network obtained by training as the second prediction model.
  • In one possible implementation, the back propagation algorithm is performed on the refinement network to update the parameters of each of its convolutional layers, and the training process in the above embodiment is repeated in each training period until the loss function corresponding to the second prediction model fully converges; the refinement network whose training is completed is then determined as the second prediction model.
  • In this embodiment, the refinement network is trained by introducing multiple loss functions, namely the connectivity difference loss, the edge gradient loss, the matting loss, and the structural similarity loss, so that the second sample transparency channel image output by the refinement network pays more attention to the transparency channel values in the edge region, which helps to improve the accuracy of image segmentation.
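  • A minimal sketch of one training iteration of the refinement network is shown below, assuming the loss functions sketched earlier in this section, an externally constructed optimizer, and equal loss weights; none of these weights or hyper-parameters are specified in this application, and `connectivity_loss_net` stands for an assumed differentiable connectivity difference module.

```python
import torch

def train_refinement_step(refine_net, connectivity_loss_net, optimizer,
                          sample_img, sample_gray, first_alpha, label_alpha,
                          weights=(1.0, 1.0, 1.0, 1.0)):
    # Concatenate the first sample transparency channel image with the sample
    # image and predict the second sample transparency channel image.
    pred_alpha = refine_net(torch.cat([sample_img, first_alpha], dim=1))

    w_mat, w_ssim, w_edge, w_conn = weights
    loss = (w_mat * matting_loss(pred_alpha, label_alpha)
            + w_ssim * ssim_loss(pred_alpha, label_alpha)
            + w_edge * edge_gradient_loss(sample_gray, pred_alpha, label_alpha)
            + w_conn * connectivity_loss_net(pred_alpha, label_alpha))

    optimizer.zero_grad()
    loss.backward()        # back propagation updates every convolutional layer
    optimizer.step()
    return loss.item()
```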
  • In one possible implementation, after the training of the first prediction model and the second prediction model is completed by the methods shown in the above embodiments, the trained prediction models can be deployed on a computer device, and the first prediction model and the second prediction model are used to perform segmentation processing on the original image.
  • FIG. 10 shows a flowchart of an image processing method according to another exemplary embodiment of the present application.
  • the method is used in a computer device as an example for description, and the method includes the following steps.
  • Step 1001 Obtain an original image.
  • For the implementation of this step, reference may be made to step 201; details are not repeated in this embodiment.
  • Step 1002 Input the original image into the multi-scale coding network to obtain n first feature maps output by the multi-scale coding network, where different first feature maps have different resolutions and channel numbers, and n is an integer greater than or equal to 2.
  • the multi-scale coding network is used to extract features of the original image.
  • the original image is preprocessed and then input into the multi-scale coding network.
  • The multi-scale coding network is used to extract features at different scales (resolutions) from the original image, so that n first feature maps with different resolutions and channel numbers are obtained.
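  • For illustration, the sketch below taps intermediate stages of a MobileNetV2 backbone (named in this application only as one possible choice of multi-scale coding network) to obtain feature maps at 1/4, 1/8, 1/16 and 1/32 of the input resolution; the tap indices assume the torchvision implementation and the channel counts of the training example, and are not taken from this application.

```python
import torch
import torchvision

class MultiScaleEncoder(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.features = torchvision.models.mobilenet_v2(weights=None).features
        # Assumed stage indices yielding 24x(1/4), 32x(1/8), 64x(1/16), 320x(1/32)
        # feature maps, matching the channel counts used in the training example.
        self.taps = (3, 6, 10, 17)

    def forward(self, x):
        first_feature_maps = []
        for idx, layer in enumerate(self.features):
            x = layer(x)
            if idx in self.taps:
                first_feature_maps.append(x)
        return first_feature_maps   # resolutions and channel numbers all differ
```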
  • Step 1003 Input the n first feature maps into the feature pyramid network to obtain n second feature maps output by the feature pyramid network, where different second feature maps have the same number of channels and different resolutions, and the number of channels of the n second feature maps is the target channel number.
  • the feature pyramid network is used to integrate features of different resolutions.
  • In one possible implementation, the n first feature maps are input into the feature pyramid network, and the feature maps at different resolutions are mixed and convolved through the feature pyramid network to obtain the n second feature maps output by the feature pyramid network.
  • step 1003 may include step 1003A, step 1003B, and step 1003C.
  • step 1003A the n first feature maps are arranged according to the resolution to form a feature pyramid, and the resolution of the first feature map in the feature pyramid is in a negative correlation with the level of the first feature map.
  • the n first feature maps are first sorted according to resolution to form a feature pyramid, where the lower the resolution of the first feature map, the higher its level in the feature pyramid network.
  • Step 1003B in response to the number of channels corresponding to the n-th first feature map being the maximum number of channels, perform convolution processing on the n-th first feature map to obtain the n-th second feature map.
  • In one possible implementation, if the number of channels corresponding to a first feature map is the maximum channel number (corresponding to the minimum resolution), only the first feature map of that layer needs to be convolved to obtain the second feature map corresponding to the minimum resolution.
  • Step 1003C In response to the number of channels corresponding to the n-th first feature map not being the maximum channel number, perform convolution processing on the n-th first feature map to obtain a fourth feature map, perform convolution and up-sampling processing on the (n+1)-th first feature map to obtain a fifth feature map, mix the fourth feature map and the fifth feature map, and perform convolution processing to obtain the n-th second feature map.
  • In the process of generating the second feature map, first feature maps of different resolutions need to be mixed: the first feature map at this level is convolved to obtain the fourth feature map; the upper-level features are then mixed in, that is, the first feature map one level higher than this first feature map is convolved and up-sampled to obtain the fifth feature map; and after the fourth feature map and the fifth feature map are mixed, convolution processing is performed to obtain the second feature map corresponding to this resolution.
  • It should be noted that step 1003B and step 1003C can be performed at the same time; step 1003A can be performed first and then step 1003C, or step 1003C can be performed first and then step 1003B. The execution order of step 1003B and step 1003C is not limited in this embodiment.
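  • The mixing in steps 1003A to 1003C can be sketched as follows; this is an FPN-style approximation in which "mixing" is realized as element-wise addition, the target channel number 128 follows the training example, and the 1x1/3x3 kernel sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeaturePyramid(nn.Module):
    def __init__(self, in_channels=(24, 32, 64, 320), out_channels=128):
        super().__init__()
        self.lateral = nn.ModuleList(
            [nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels])
        self.smooth = nn.ModuleList(
            [nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
             for _ in in_channels])

    def forward(self, first_maps):
        # first_maps: ordered from high resolution (1/4) to low resolution (1/32).
        out = [None] * len(first_maps)
        prev = None
        for i in reversed(range(len(first_maps))):
            lat = self.lateral[i](first_maps[i])        # convolve this level
            if prev is not None:                        # not the lowest-resolution level:
                lat = lat + F.interpolate(prev, scale_factor=2, mode='bilinear',
                                          align_corners=False)   # mix upper-level features
            prev = lat
            out[i] = self.smooth[i](lat)                # second feature map for this level
        return out    # n second feature maps, all with the target channel number
```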
  • Step 1004 Input the n second feature maps into the multi-scale decoding network to obtain the first transparent channel image output by the multi-scale decoding network.
  • the multi-scale decoding network is used to decode the features.
  • In one possible implementation, the n second feature maps are input into the multi-scale decoding network, and the multi-scale decoding network performs addition and resolution conversion operations on the second feature maps to obtain the first transparency channel image corresponding to the original image; the first transparency channel image contains the predicted transparency value corresponding to each pixel in the original image.
  • step 1004 includes step 1004A and step 1004B.
  • Step 1004A The n second feature maps are respectively processed by convolution blocks to obtain n third feature maps, and the n third feature maps correspond to the same resolution, where different second feature maps correspond to different convolution blocks, and the numbers of convolution blocks used for different second feature maps are different.
  • the second feature maps of different resolutions correspond to different convolution blocks, and the number of convolution blocks corresponding to different resolutions is different.
  • Step 1004B Perform addition, convolution and up-sampling processing on the n third feature maps to obtain the first transparent channel image.
  • the generated n third feature maps of the same resolution are added, convolved, and up-sampled, so as to obtain the first transparent channel image with the same resolution as the original image.
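  • A sketch of the decoding in steps 1004A and 1004B is given below, using the cgr2x / sgr2x / sgr convolution blocks described for the training stage (convolution + Group Normalization + ReLU, with or without 2x bilinear up-sampling and with or without channel reduction); the group count, kernel sizes and the final sigmoid are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def cgr2x(ch=128):            # conv + GroupNorm + ReLU + 2x bilinear upsample, channels kept
    return nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.GroupNorm(8, ch),
                         nn.ReLU(inplace=True),
                         nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False))

def sgr2x(cin=128, cout=64):  # conv + GroupNorm + ReLU + 2x upsample, channels reduced
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.GroupNorm(8, cout),
                         nn.ReLU(inplace=True),
                         nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False))

def sgr(cin=128, cout=64):    # conv + GroupNorm + ReLU, no upsampling
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.GroupNorm(8, cout),
                         nn.ReLU(inplace=True))

class MultiScaleDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.branch_1_4 = sgr()                                      # 1/4  -> 1/4
        self.branch_1_8 = sgr2x()                                    # 1/8  -> 1/4
        self.branch_1_16 = nn.Sequential(cgr2x(), sgr2x())           # 1/16 -> 1/4
        self.branch_1_32 = nn.Sequential(cgr2x(), cgr2x(), sgr2x())  # 1/32 -> 1/4
        self.head = nn.Conv2d(64, 1, 3, padding=1)

    def forward(self, p4, p8, p16, p32):
        # Add the n third feature maps, which now share the same 1/4 resolution.
        third = (self.branch_1_4(p4) + self.branch_1_8(p8)
                 + self.branch_1_16(p16) + self.branch_1_32(p32))
        alpha = torch.sigmoid(self.head(third))                      # convolution
        return F.interpolate(alpha, scale_factor=4, mode='bilinear',
                             align_corners=False)                    # back to input resolution
```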
  • Unlike the training process of the first prediction model, the model training phase requires the deep supervision network to provide cross-entropy losses at different resolutions, while the model application phase does not need to deploy the deep supervision network; that is, in the model application phase, the first prediction model only includes the multi-scale coding network, the feature pyramid network, and the multi-scale decoding network. Through feature extraction, multi-scale feature fusion, and feature decoding of the original image, the first transparency channel image corresponding to the original image can be generated.
  • Step 1005 Input the first transparency channel image and the original image into a second prediction model to obtain a second transparency channel image output by the second prediction model.
  • the fineness of the second transparency channel image is higher than that of the first transparency channel image.
  • In the model application phase, the second prediction model is the refinement network trained in the above embodiment.
  • In one possible implementation, the first transparency channel image and the original image are concatenated and then input into the refinement network to obtain the second transparency channel image output by the refinement network.
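  • For illustration, a minimal refinement network matching the description above (three convolution blocks followed by one convolution layer, operating on the concatenation of the original image and the first transparency channel image) might look as follows; the block width, normalization, activation and output sigmoid are assumptions.

```python
import torch
import torch.nn as nn

class RefinementNet(nn.Module):
    def __init__(self, mid=64):
        super().__init__()
        def block(cin, cout):
            return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                                 nn.BatchNorm2d(cout), nn.ReLU(inplace=True))
        # Input: RGB image concatenated with the first transparency channel image.
        self.blocks = nn.Sequential(block(4, mid), block(mid, mid), block(mid, mid))
        self.out_conv = nn.Conv2d(mid, 1, 3, padding=1)

    def forward(self, image, first_alpha):
        x = torch.cat([image, first_alpha], dim=1)              # concatenate along channels
        return torch.sigmoid(self.out_conv(self.blocks(x)))     # second transparency channel image
```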
  • Step 1006 Perform segmentation processing on the original image according to the second transparency channel image to obtain an image corresponding to the target object.
  • For the implementation of step 1005 and step 1006, reference may be made to step 201 and step 202; details are not repeated in this embodiment.
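  • The segmentation in step 1006 amounts to using the second transparency channel image as an alpha matte. The sketch below uses the standard alpha-blending rule, which is a common realization rather than a formula stated in this application.

```python
import numpy as np

def extract_foreground(original, alpha):
    """original: (H, W, 3) uint8 image; alpha: (H, W) float matte in [0, 1]."""
    return (original.astype(np.float32) * alpha[..., None]).astype(np.uint8)

def replace_background(original, alpha, new_background):
    # Composite the separated target object over a new background image.
    fg = original.astype(np.float32) * alpha[..., None]
    bg = new_background.astype(np.float32) * (1.0 - alpha[..., None])
    return (fg + bg).astype(np.uint8)
```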
  • the first prediction model and the second prediction model that have been trained are deployed, and the original image is preprocessed and input into the first prediction model to obtain the first transparent channel image output by the first prediction model.
  • The generated first transparency channel image and the original image are then input into the second prediction model to obtain the second transparency channel image output by the second prediction model, so that the second transparency channel image can be used for image processing.
  • Compared with the related art, there is no need to generate a trimap, and the transparency channel image can be generated directly from the original image, which further improves the accuracy of the transparency channel image and thereby improves the accuracy of image segmentation.
  • FIG. 12 shows a network deployment diagram of an image processing method shown in an exemplary embodiment of the present application.
  • The network deployment diagram includes a multi-scale coding network, a feature pyramid network, a multi-scale decoding network, and a refinement network.
  • After the original image is preprocessed, it is input into the multi-scale coding network 1201 to obtain n first feature maps output by the multi-scale coding network 1201; the n first feature maps are input into the feature pyramid network 1202 to obtain n second feature maps output by the feature pyramid network 1202, where the number of channels of the n second feature maps is the target channel number; the n second feature maps are input into the multi-scale decoding network 1203, and after addition and resolution conversion operations, the first transparency channel image output by the multi-scale decoding network 1203 is obtained; the first transparency channel image and the original image are input into the refinement network 1204 to obtain the second transparency channel image output by the refinement network 1204, and the second transparency channel image is then used to perform segmentation processing on the original image to obtain the image corresponding to the target object.
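  • Putting the deployed networks of FIG. 12 together, an inference sketch could look as follows; `preprocess` stands for the data pre-processing mentioned above (resizing, normalization, conversion to a tensor) and is an assumed helper, and the module classes and `extract_foreground` are the illustrative sketches from earlier in this section, assumed to preserve the spatial size of the input.

```python
import torch

@torch.no_grad()
def segment_original_image(original, preprocess, encoder, pyramid, decoder, refiner):
    x = preprocess(original)                 # pre-processed original image, (1, 3, H, W)
    first_maps = encoder(x)                  # multi-scale coding network 1201
    second_maps = pyramid(first_maps)        # feature pyramid network 1202
    first_alpha = decoder(*second_maps)      # multi-scale decoding network 1203
    second_alpha = refiner(x, first_alpha)   # refinement network 1204
    alpha = second_alpha[0, 0].cpu().numpy()
    # Segmentation with the second transparency channel image (see step 1006 sketch).
    return extract_foreground(original, alpha)
```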
  • FIG. 13 shows a structural block diagram of an image processing apparatus provided by an exemplary embodiment of the present application.
  • the device can be implemented as all or a part of computer equipment through software, hardware or a combination of the two.
  • the device includes:
  • the first acquisition module 1301 is configured to acquire an original image, and the original image contains at least one target object;
  • the first prediction module 1302 is configured to input the original image into a first prediction model to obtain a first transparency channel image output by the first prediction model, and the first transparency channel image includes each pixel in the original image The predicted transparency value corresponding to the point;
  • the second prediction module 1303 is configured to input the first transparency channel image and the original image into a second prediction model to obtain a second transparency channel image output by the second prediction model, where the fineness of the second transparency channel image is higher than the fineness of the first transparency channel image;
  • the segmentation processing module 1304 is configured to perform segmentation processing on the original image according to the second transparency channel image to obtain an image corresponding to the target object.
  • the device further includes:
  • the second acquisition module is configured to acquire a sample image, a sample annotation image, and a sample segmentation image.
  • the sample annotation image is annotated with a transparency value corresponding to each pixel in the sample image
  • the sample segmentation image is a binarized image obtained by performing binarization processing on the sample annotation image
  • the first training module is configured to train the first prediction model according to the sample image and the sample segmented image
  • the third prediction module is configured to input the sample image into the first prediction model obtained by training to obtain the first sample transparent channel image output by the first prediction model;
  • the second training module is configured to train the second prediction model according to the first sample transparent channel image, the sample annotation image, and the sample image.
  • the second training module includes:
  • a refinement unit configured to input the first sample transparent channel image and the sample image into a refinement network to obtain a second sample transparency channel image output by the refinement network;
  • An edge gradient unit configured to input the sample image, the second sample transparent channel image, and the sample labeled image into an edge gradient network to obtain the edge gradient loss corresponding to the second sample transparent channel image;
  • a calculation unit configured to calculate a structural similarity loss and a matting loss corresponding to the second sample transparency channel image according to the second sample transparency channel image and the sample annotation image;
  • a connectivity difference unit configured to input the second sample transparent channel image and the sample labeled image into a connectivity difference network to obtain the connectivity difference loss corresponding to the second sample transparent channel image;
  • the first training unit is configured to train the refined network according to the edge gradient loss, the connectivity difference loss, the matting loss, and the structural similarity loss;
  • the determining unit is configured to determine the refined network obtained by training as the second prediction model.
  • the edge gradient unit is also used for:
  • inputting the sample image into a preset operator to obtain a sample gradient image corresponding to the sample image, the preset operator being used to perform a first-order derivative operation on the sample image;
  • performing binarization and dilation-erosion operations on the sample annotation image to obtain a sample edge image, the sample edge image being used to indicate the boundary area between the foreground image and the background image in the sample annotation image;
  • generating a sample edge gradient image according to the sample edge image and the sample gradient image;
  • generating an edge transparency channel image according to the second sample transparency channel image and the sample edge image;
  • calculating the edge gradient loss according to the edge transparency channel image and the sample edge gradient image.
  • the first prediction model includes a multi-scale coding network, a feature pyramid network, a multi-scale decoding network, and a deep supervision network;
  • the first training module includes:
  • the first multi-scale encoding unit is configured to input the sample image into the multi-scale encoding network to obtain m first sample feature maps output by the multi-scale encoding network, wherein the features of different first sample feature maps are The resolution and the number of channels are different, m is an integer greater than or equal to 2, and the multi-scale coding network is used for feature extraction of the sample image;
  • the first feature pyramid unit is configured to input the m first sample feature maps into the feature pyramid network to obtain m second sample feature maps output by the feature pyramid network, wherein different second sample feature maps have the same number of channels and different resolutions, and the feature pyramid network is used to process the number of channels of the m first sample feature maps into the target channel number;
  • the first multi-scale decoding unit is configured to input the m second sample feature maps into the multi-scale decoding network to obtain the first sample transparency channel image output by the multi-scale decoding network, wherein the multi-scale decoding network is configured to perform addition and resolution conversion operations on the m second sample feature maps, and the resolution of the first sample transparency channel image is the same as the resolution of the sample image;
  • a deep supervision unit configured to input m feature maps of the second sample into the deep supervision network to obtain m third sample transparent channel images output by the deep supervision network, and the deep supervision network is used to Up-sampling processing is performed on m second sample feature maps, different second sample feature maps correspond to different upsampling multiples, and the resolution of the m third sample transparent channel images is the same as the resolution of the sample image;
  • a binarization processing unit configured to perform binarization processing on the first sample transparent channel image and the m third sample transparent channel images to obtain a first sample segmented image and m second sample segmented images ;
  • the second training unit is configured to train the first prediction model according to the first sample segmentation image, the m second sample segmentation images, and the sample segmentation image.
  • the first prediction module 1302 includes:
  • the second multi-scale encoding unit is configured to input the original image into the multi-scale encoding network to obtain n first feature maps output by the multi-scale encoding network, wherein the resolutions and channels of the first feature maps are different Different numbers, n is an integer greater than or equal to 2;
  • the second feature pyramid unit is configured to input n of the first feature maps into the feature pyramid network to obtain n second feature maps output by the feature pyramid network, wherein the number of channels of different second feature maps is the same And the resolution is different, the number of channels of the n second feature maps is the number of target channels;
  • the second multi-scale decoding unit is configured to input the n second feature maps into the multi-scale decoding network to obtain the first transparent channel image output by the multi-scale decoding network.
  • the second feature pyramid unit is also used for:
  • the n first feature maps are arranged according to resolution to form a feature pyramid, where the resolution of a first feature map in the feature pyramid is negatively correlated with the level at which the first feature map is located;
  • in response to the number of channels corresponding to the n-th first feature map being the maximum channel number, the n-th first feature map is subjected to convolution processing to obtain the n-th second feature map;
  • in response to the number of channels corresponding to the n-th first feature map not being the maximum channel number, the n-th first feature map is subjected to convolution processing to obtain a fourth feature map, the (n+1)-th first feature map is subjected to convolution and up-sampling processing to obtain a fifth feature map, and the fourth feature map and the fifth feature map are mixed and subjected to convolution processing to obtain the n-th second feature map.
  • the second multi-scale decoding unit is further configured to:
  • the n second feature maps are respectively processed through convolution blocks to obtain n third feature maps, where the resolutions of the n third feature maps are the same, different second feature maps correspond to different convolution blocks, and the numbers of convolution blocks used for different second feature maps are different;
  • Adding, convolution, and up-sampling processing are performed on the n third feature maps to obtain the first transparent channel image.
  • the first feature pyramid unit is also used for:
  • the m first sample feature maps are arranged according to resolution to form a sample feature pyramid, and the resolution of a first sample feature map in the sample feature pyramid is negatively correlated with the level at which the first sample feature map is located;
  • in response to the number of channels corresponding to the m-th first sample feature map being the maximum channel number, the m-th first sample feature map is subjected to convolution processing to obtain the m-th second sample feature map;
  • in response to the number of channels corresponding to the m-th first sample feature map not being the maximum channel number, the m-th first sample feature map is subjected to convolution processing to obtain a first intermediate sample feature map, the (m+1)-th first sample feature map is subjected to convolution and up-sampling processing to obtain a second intermediate sample feature map, and the first intermediate sample feature map and the second intermediate sample feature map are mixed and subjected to convolution processing to obtain the m-th second sample feature map.
  • In the embodiments of the present application, the obtained original image is input into the first prediction model to obtain the first transparency channel image output by the first prediction model (including the predicted transparency value corresponding to each pixel in the original image), and the first transparency channel image and the original image are then input into the second prediction model to obtain the second transparency channel image output by the second prediction model, which is used to perform segmentation processing on the original image according to the second transparency channel image to obtain the image corresponding to the target object.
  • Since the fineness of the second transparency channel image is higher than that of the first transparency channel image, the accuracy of image segmentation can be improved; compared with image segmentation methods in the related art, there is no need to introduce a trimap, and the transparency channel image used for segmentation can be generated directly from the original image, which further improves the accuracy of image segmentation.
  • FIG. 14 shows a schematic structural diagram of a computer device provided by an exemplary embodiment of the present application.
  • the computer equipment 1400 includes a central processing unit (CPU) 1401, a system memory 1404 including a random access memory (Random Access Memory, RAM) 1402 and a read-only memory (Read-Only Memory, ROM) 1403, and a system bus 1405 connecting the system memory 1404 and the central processing unit 1401.
  • the computer device 1400 also includes a basic input/output system (Input/Output system, I/O system) 1406 that helps to transfer information between the various components within the computer device, and a mass storage device 1407 for storing an operating system 1413, application programs 1414, and other program modules 1415.
  • the basic input/output system 1406 includes a display 1408 for displaying information and an input device 1409 such as a mouse and a keyboard for the user to input information.
  • the display 1408 and the input device 1409 are both connected to the central processing unit 1401 through the input and output controller 1410 connected to the system bus 1405.
  • the basic input/output system 1406 may also include an input and output controller 1410 for receiving and processing input from multiple other devices such as a keyboard, a mouse, or an electronic stylus.
  • the input and output controller 1410 also provides output to a display screen, a printer, or other types of output devices.
  • the mass storage device 1407 is connected to the central processing unit 1401 through a mass storage controller (not shown) connected to the system bus 1405.
  • the mass storage device 1407 and its associated computer-readable storage medium provide non-volatile storage for the computer device 1400. That is, the mass storage device 1407 may include a computer-readable storage medium (not shown) such as a hard disk or a CD-ROM (Compact Disc Read-Only Memory) drive.
  • the computer-readable storage medium may include a computer storage medium and a communication medium.
  • Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable storage instructions, data structures, program modules, or other data.
  • Computer storage media include RAM, ROM, Erasable Programmable Read-Only Memory (EPROM), Electrically-Erasable Programmable Read-Only Memory (EEPROM), flash memory or other solid-state storage technologies; CD-ROM, Digital Versatile Disc (DVD) or other optical storage; and tape cartridges, magnetic tape, magnetic disk storage or other magnetic storage devices.
  • The memory stores one or more programs, the one or more programs are configured to be executed by the one or more central processing units 1401 and contain instructions for implementing the above method embodiments, and the central processing unit 1401 executes the one or more programs to implement the methods provided by the foregoing method embodiments.
  • According to various embodiments of the present application, the computer device 1400 may also operate by being connected, through a network such as the Internet, to a remote server on the network. That is, the computer device 1400 can be connected to the network 1412 through the network interface unit 1411 connected to the system bus 1405; in other words, the network interface unit 1411 can also be used to connect to other types of networks or remote server systems (not shown).
  • The memory further includes one or more programs, the one or more programs are stored in the memory, and the one or more programs include instructions for performing the steps executed by the computer device in the methods provided by the embodiments of the present application.
  • The embodiments of the present application also provide a computer-readable storage medium that stores at least one instruction, and the at least one instruction is loaded and executed by the processor to implement the image processing method described in each of the above embodiments.
  • a computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the computer device executes the image processing method provided in the various optional implementation manners of the foregoing aspects.
  • the functions described in the embodiments of the present application may be implemented by hardware, software, firmware, or any combination thereof. When implemented by software, these functions can be stored in a computer-readable storage medium or transmitted as one or more instructions or codes on the computer-readable storage medium.
  • the computer-readable storage medium includes a computer storage medium and a communication medium, where the communication medium includes any medium that facilitates the transfer of a computer program from one place to another.
  • the storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

一种图像处理方法、装置、设备及存储介质,属于图像处理领域。该方法包括:获取原始图像,该原始图像中包含至少一个目标对象(201);将原始图像输入第一预测模型,得到第一预测模型输出的第一透明通道图像,第一透明通道图像中包括原始图像中各个像素点对应的预测透明度值(202);将第一透明通道图像和原始图像输入第二预测模型,得到第二预测模型输出的第二透明通道图像,第二透明通道图像的精细度高于第一透明通道图像的精细度(203);根据第二透明通道图像对原始图像进行分割处理,得到目标对象对应的图像(204)。相较于相关技术中的图像分割方法,无需引入三分图,可以实现从原始图像直接生成透明通道图像,进一步提升了进行图像分割的准确性。

Description

图像处理方法、装置、设备及存储介质
本申请要求于2020年02月18日提交的申请号为202010099612.6、发明名称为“图像处理方法、装置、设备及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请实施例涉及图像处理领域,特别涉及一种图像处理方法、装置、设备及存储介质。
背景技术
图像分割指在静态图像或连续的视频序列中精确地将感兴趣的前景对象从背景中分离出来的过程,在人像虚化、背景替换等方面具有广泛应用,对于图像分割任务来说,主要任务是得到透明通道图像,透明通道图像中标注有各个像素点对应的透明度值,其中,透明度值为1的区域为前景图像区域,透明度为0的区域为背景图像区域,利用得到的透明通道图像可以将原图中的前景图像分离出来。
相关技术中,提供了一种图像分割方法,需要根据原图生成三分图,三分图用来将原图分为三个部分,分别是确定的前景图像区域、确定的背景图像区域和不确定区域,利用三分图首先确定出不确定区域,再将三分图和原图输入训练完成的神经网络中,确定出不确定区域中的各个像素点对应的透明度值,从而输出用于图像分割的透明通道图像。
显然,相关技术中得到的透明通道图像依赖于三分图的精确性,而三分图需要依赖于训练特定神经网络产生或由人工标注得到,导致生成的透明通道图像的精确性较低。
发明内容
本申请实施例提供了一种图像处理方法、装置、设备及存储介质。所述技术方案如下:
一方面,本申请实施例提供了一种图像处理方法,所述方法包括:
获取原始图像,所述原始图像中包含至少一个目标对象;
将所述原始图像输入第一预测模型,得到所述第一预测模型输出的第一透明通道图像,所述第一透明通道图像中包括所述原始图像中各个像素点对应的预测透明度值;
将所述第一透明通道图像和所述原始图像输入第二预测模型,得到所述第二预测模型输出的第二透明通道图像,所述第二透明通道图像的精细度高于所述第一透明通道图像的精细度;
根据所述第二透明通道图像对所述原始图像进行分割处理,得到所述目标对象对应的图像。
另一方面,本申请实施例提供了一种图像处理装置,所述装置包括:
第一获取模块,用于获取原始图像,所述原始图像中包含至少一个目标对象;
第一预测模块,用于将所述原始图像输入第一预测模型,得到所述第一预测模型输出的第一透明通道图像,所述第一透明通道图像中包括所述原始图像中各个像素点对应的预测透明度值;
第二预测模块,用于将所述第一透明通道图像和所述原始图像输入第二预测模型,得到所述第二预测模型输出的第二透明通道图像,所述第二透明通道图像的精细度高于所述第一透明通道图像的精细度;
分割处理模块,用于根据所述第二透明通道图像对所述原始图像进行分割处理,得到所述目标对象对应的图像。
另一方面,本申请实施例提供了一种计算机设备,所述计算机设备包括处理器和存储器,所述存储器中存储有至少一条指令、至少一段程序、代码集或指令集,所述至少一条指令、所述至少一段程序、所述代码集或指令集由所述处理器加载并执行以实现如上述方面所述的图像处理方法。
另一方面,本申请实施例提供了一种计算机可读存储介质,所述计算机可读存储介质中存储有至少一条指令、至少一段程序、代码集或指令集,所述至少一条指令、所述至少一段程序、所述代码集或指令集由处理器加载并执行以实现如上述方面所述的图像处理方法。
另一方面,本申请实施例提供了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机指令,该计算机指令存储在计算机可读存储介质中。计算机设备的处理器从计算机可读存储介质读取该计算机指令,处理器执行该计算机指令,使得该计算机设备执行上述方面的各种可选实现方式中提供的图像处理方法。
附图说明
图1示出了本申请一个示例性实施例提供的图像处理方法的流程图;
图2示出了本申请一个示例性实施例示出的图像处理方法的流程图;
图3示出了本申请一个示例性实施例示出的第一预测模型的训练方法的流程图;
图4示出了本申请另一个示例性实施例示出的第一预测模型的训练方法的流程图;
图5示出了本申请一个示例性实施例示出的第一预测模型的训练方法的过程示意图;
图6示出了多尺度解码网络使用的各个卷积块的结构示意图;
图7示出了本申请一个示例性实施例示出的第二预测模型的训练方法的流程图;
图8示出了本申请另一个示例性实施例示出的第二预测模型的训练方法的流程图;
图9示出了本申请一个示例性实施例示出的第二预测模型的训练方法的过程示意图;
图10示出了本申请另一个示例性实施例示出的图像处理方法的流程图;
图11示出了本申请另一个示例性实施例示出的图像处理方法的流程图;
图12示出了本申请一个示例性实施例示出的图像处理方法的网络部署图;
图13示出了本申请一个示例性实施例提供的图像处理装置的结构框图;
图14示出了本申请一个示例性实施例提供的计算机设备的结构示意图。
具体实施方式
为使本申请的目的、技术方案和优点更加清楚,下面将结合附图对本申请实施方式作进一步地详细描述。
在本文中提及的“多个”是指两个或两个以上。“和/或”,描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。字符“/”一般表示前后关联对象是一种“或”的关系。
图像分割指在静态图像或连续的视频序列中精确地将感兴趣的前景对象从背景中分离出来的过程,在图像分割任务中,需要生成用于分割出前景图像的透明通道图像,透明通道图像中包含有各个像素点对应的透明度值,比如,透明度值为1的区域代表前景图像区域,透明度为0的区域代表背景图像区域,因此,可以利用得到的透明通道图像将原图中的前景图像分离出来。
相关技术中提出了一种图像处理方法,主要分为两个阶段,第一阶段是:根据原图生成三分图,该三分图用于将原图分为三个部分,分别是确定的前景图像区域、确定的背景图像区域和不确定区域,通过生成的三分图可以划分出原图中的不确定区域;第二阶段是:将生成的三分图和原图输入训练完成的神经网络中,确定出不确定区域各个像素点对应的透明度值,从而输出用于图像分割的透明通道图像。
采用上述相关技术中的方法,若三分图划分出的不确定区域不够准确,则生成的透明通道图的精确性就会较低,从而影响图像分割的准确率。而且,三分图还需依赖特定的神经网络生成或需要人工进行标注,增加了训练过程的繁琐性,不能直接由原图生成对应的透明通道图像。
为了解决上述问题,本申请实施例提供了一种图像处理方法。请参考图1,其示出了本申请一个示例性实施例提供的图像处理方法的流程图。该方法包括:
步骤101,根据样本图像和样本分割图像训练第一预测模型,第一预测模型用于生成第一透明通道图像,第一透明通道图像中包括原始图像中各个像素点对应的预测透明度值。
步骤102,根据样本图像、第一样本透明通道图像和样本标注图像训练第二预测模型,第一样本透明通道图像为将样本图像输入第一预测模型中得到,第二预测模型用于生成第二透明通道图像,第二透明通道图像的精细度高于第一透明通道图像的精细度。
步骤103,将原始图像进行预处理后,输入训练完成的第一预测模型,得到第一预测模型输出的第一透明通道图像。
步骤104,将第一透明通道图像和原始图像输入第二预测模型,得到第二预测模型输出的第二透明通道图像。
步骤105,根据第二透明通道图像对原始图像进行分割处理,得到前景图像。
本申请实施例中,通过训练可以生成透明通道图像的第一预测模型和第二预测模型,将原始图像预处理后输入第一预测模型,得到第一预测模型输出的第一透明通道图像,并将生成的第一透明通道图像和原始图像重新输入第二预测模型,得到第二预测模型输出的第二透明通道图像,以便将该第二透明通道图像用于图像处理,相比于相关技术中,无需生成三分图,可以实现由原始图像直接生成透明通道图像,进一步提高了透明通道图像的精度,从而提高了图像分割的准确率。
需要说明的是,本申请各个实施例提供的图像处理方法可以用于具有图像处理功能的计算机设备,该计算机设备可以是智能手机、平板电脑、个人便携式计算机等。在一种可能的实施方式中,本申请实施例提供的图像处理方法可以应用于需要进行图像分割、背景替换、目标对象虚化等任务的应用程序。比如, 具有美颜功能的应用程序;可选的,本申请各个实施例提供的图像处理方法中预测模型的训练过程可以在服务器中进行,并在预测模型训练完成之后,将训练完成的预测模型部署在计算机设备中进行后续的图像处理;可选的,本申请各个实施例提供的图像处理方法也可以用于具有图像处理功能的服务器。
为了便于描述,在下述方法实施例中,仅以图像处理方法的执行主体是计算机设备为例进行介绍说明。
请参考图2,其示出了本申请一个示例性实施例示出的图像处理方法的流程图。本实施例以该方法用于计算机设备为例进行说明,该方法包括如下步骤。
步骤201,获取原始图像。
由于本申请实施例中图像处理的目的在于将前景图像从原始图像中分离出来,前景图像即目标对象对应的图像。因此,原始图像中应该至少包含一个目标对象。
其中,目标对象可以是人物、景物、动物、物品等,本申请实施例对目标对象的类型不构成限定。
在一种可能的实施方式中,在对原始图像进行处理之前,可以对原始图像进行预处理,进行预处理的方式可以包括随机旋转、随机左右翻转、随机裁剪、伽马(Gamma)变换等在内的数据增强处理,以便用于后续的特征提取过程。
步骤202,将原始图像输入第一预测模型,得到第一预测模型输出的第一透明通道图像,第一透明通道图像中包括原始图像中各个像素点对应的预测透明度值。
其中,第一透明通道图像为概率图,即第一透明通道图像中各个像素点对应的预测透明度值的取值为0-1,比如,某一像素点的预测透明度值为0.9。
针对第一预测模型,其根据样本图像、样本标注图像、样本分割图像训练得到,而样本标注图像中标注有样本图像中各个像素点对应的标准透明度值,因此,在一种可能的实施方式中,将原始图像经过预处理之后,输入训练完成的第一预测模型中,可以得到预测的第一透明通道图像,该第一透明通道图像中包括原始图像中各个像素点对应的预测透明度值。
步骤203,将第一透明通道图像和原始图像输入第二预测模型,得到第二预测模型输出的第二透明通道图像,第二透明通道图像的精细度高于第一透明通道图像的精细度。
由于分割图像的精确度取决于获得的透明通道图像的精确度,因此,为了提高第一透明通道图像的精确度,部署有第二预测模型,该第二预测模型可以根据输入的原始图像和第一透明通道图像,生成精细度更高的第二透明通道图像。
其中,第二预测模型主要用于修正第一预测透明通道图像中各个像素点的透明度值,使得第二透明通道图像中各个像素点的预测透明度值更接近标准透明度值。
在一种可能的实施方式中,将第一透明通道图像和原始图像经过拼接(Concat)处理后,输入第二预测模型,可以得到精细度更高的第二透明通道图像,以便用于后续的图像分割处理过程中。
步骤204,根据第二透明通道图像对原始图像进行分割处理,得到目标对象对应的图像。
在一种可能的实施方式中,第二透明通道图像中包含有各个像素点对应的预测透明度值,由于前景图像对应的透明度值为1,背景图像对应的透明度值为0,因此,根据第二透明通道图像可以将原始图像中的前景图像和背景图像分离,从而可以得到目标对象对应的图像。
综上所述,本申请实施例中,将获取到的原始图像输入第一预测模型,得到第一预测模型输出的第一透明通道图像(包括原始图像中各个像素点对应的预测透明度值),从而将第一透明通道图像和原始图像输入第二预测模型,得到第二预测模型输出的第二透明通道图像,用于根据第二透明通道图像对原始图像进行分割处理,得到目标对象对应的图像。由于第二透明通道图像的精细度高于第一透明通道图像的精细度,因此,可以提高图像分割的准确率;相较于相关技术中的图像分割方法,无需引入三分图,可以实现从原始图像直接生成用于分割的透明通道图像,进一步提升了进行图像分割的准确性。
可选的,获取原始图像之前,方法还包括:
获取样本图像、样本标注图像和样本分割图像,样本标注图像中标注有样本图像中各个像素点对应的透明度值,样本分割图像是对样本标注图像进行二值化处理得到的二值化图像;
根据样本图像和样本分割图像,训练第一预测模型;
将样本图像输入训练得到的第一预测模型,得到第一预测模型输出的第一样本透明通道图像;
根据第一样本透明通道图像、样本标注图像和样本图像,训练第二预测模型。
可选的,根据第一样本透明通道图像、样本标注图像和样本图像,训练第二预测模型,包括:
将第一样本透明通道图像和样本图像输入精细化网络,得到精细化网络输出的第二样本透明通道图像;
将样本图像、第二样本透明通道图像和样本标注图像输入边缘梯度网络,得到第二样本透明通道图像对应的边缘梯度损失;
根据第二样本透明通道图像和样本标注图像,计算第二样本透明通道图像对应的结构相似性损失和抠 图损失;
将第二样本透明通道图像和样本标注图像输入连通性差异网络,得到第二样本透明通道图像对应的连通性差异损失;
根据边缘梯度损失、连通性差异损失、抠图损失和结构相似性损失,训练精细化网络;
将训练得到的精细化网络确定为第二预测模型。
可选的,将样本图像、第二样本透明通道图像和样本标注图像输入边缘梯度网络,得到第二样本透明通道图像对应的边缘梯度损失,包括:
将样本图像输入预设算子中,得到样本图像对应的样本梯度图像,预设算子用于对原始样本图像进行一阶倒数运算;
对样本标注图像进行二值化和膨胀腐蚀操作,得到样本边缘图像,样本边缘图像用于指示样本标注图像中前景图像和背景图像的交界区域;
根据样本边缘图像和样本梯度图像,生成样本边缘梯度图像,样本边缘梯度图像用于指示样本图像中前景图像和背景图像的交界区域;
根据第二样本透明通道图像和样本边缘图像,生成边缘透明通道图像,边缘透明通道图像用于指示第二样本透明通道图像中前景图像和背景图像的交界区域;
根据边缘透明通道图像和样本边缘梯度图像,计算得到边缘梯度损失。
可选的,第一预测模型包括多尺度编码网络、特征金字塔网络、多尺度解码网络和深度监督网络;
根据样本图像和样本分割图像,训练第一预测模型,包括:
将样本图像输入多尺度编码网络,得到多尺度编码网络输出的m个第一样本特征图,其中,不同第一样本特征图的分辨率和通道数不同,m为大于等于2的整数,多尺度编码网络用于对样本图像进行特征提取;
将m个第一样本特征图输入特征金字塔网络,得到特征金字塔网络输出的m个第二样本特征图,其中,不同第二样本特征图的通道数相同且分辨率不同,特征金字塔网络用于将m个第一样本特征图的通道数处理为目标通道数;
将m个第二样本特征图输入多尺度解码网络,得到多尺度解码网络输出的第一样本透明通道图像,多尺度解码网络用于对m个第二样本特征图进行相加和分辨率转换操作,第一样本透明通道图像的分辨率与样本图像的分辨率相同;
将m个第二样本特征图输入深度监督网络,得到深度监督网络输出的m个第三样本透明通道图像,深度监督网络用于对m个第二样本特征图进行上采样处理,不同第二样本特征图对应不同上采样倍数,m个第三样本透明通道图像的分辨率与样本图像的分辨率相同;
对第一样本透明通道图像和m个第三样本透明通道图像进行二值化处理,得到第一样本分割图像和m个第二样本分割图像;
根据第一样本分割图像、m个第二样本分割图像和样本分割图像,训练第一预测模型。
可选的,将原始图像输入第一预测模型,得到第一预测模型输出的第一透明通道图像,包括:
将原始图像输入多尺度编码网络,得到多尺度编码网络输出的n个第一特征图,其中,不同第一特征图的分辨率和通道数不同,n为大于等于2的整数;
将n个第一特征图输入特征金字塔网络,得到特征金字塔网络输出的n个第二特征图,其中,不同第二特征图的通道数相同且分辨率不同,n个第二特征图的通道数为目标通道数;
将n个第二特征图输入多尺度解码网络,得到多尺度解码网络输出的第一透明通道图像。
可选的,将n个第一特征图输入特征金字塔网络,得到特征金字塔网络输出的n个第二特征图,包括:
将n个第一特征图按照分辨率排列形成特征金字塔,特征金字塔中第一特征图的分辨率与第一特征图所在层级呈负相关关系;
响应于第n第一特征图对应的通道数为最大通道数,对第n第一特征图进行卷积处理,得到第n第二特征图;
响应于第n第一特征图对应的通道数不是最大通道数,对第n第一特征图进行卷积处理后得到第四特征图,对第n+1第一特征图进行卷积和上采样处理后得到第五特征图,对第四特征图和第五特征图进行混合,并进行卷积处理后得到第n第二特征图。
可选的,将n个第二特征图输入多尺度解码网络,得到多尺度解码网络输出的第一透明通道图像,包括:
将n个第二特征图分别通过卷积块处理,得到n个第三特征图,n个第三特征图对应的分辨率相同,其中,不同第二特征图对应不同卷积块,且不同第二特征图对应使用的卷积块数目不同;
对n个第三特征图进行相加、卷积和上采样处理,得到第一透明通道图像。
可选的,将m个第一样本特征图输入特征金字塔网络,得到特征金字塔网络输出的m个第二样本特 征图,包括:
将m个第一样本特征图按照分辨率形成样本特征金字塔,样本特征金字塔中第一样本特征图的分辨率与第一样本特征图所在层级呈负相关关系;
响应于第m第一样特征图对应的通道数为最大通道数,对第m第一样本特征图进行卷积处理,得到第m第二样本特征图;
响应于第m第一样本特征图对应的通道数不是最大通道数,对第m第一样本特征图进行卷积处理后得到第一中间样本特征图,对第m+1第一样本特征图进行卷积和上采样处理后得到第二中间样本特征图,对第一中间样本特征图和第二中间样本特征图进行混合,并进行卷积处理后得到第m第二样本特征图。
在一种可能的实施方式中,由于生成第二透明通道图像的过程分为两个预测模型阶段,即第一预测模型和第二预测模型,因此,在模型训练阶段,也应该包括第一预测模型的训练阶段和第二预测模型的训练阶段。
请参考图3,其示出了本申请一个示例性实施例示出的第一预测模型的训练方法的流程图。该方法包括:
步骤301,获取样本图像、样本标注图像和样本分割图像,样本标注图像中标注有样本图像中各个像素点对应的透明度值,样本分割图像是对样本标注图像进行二值化处理得到的二值化图像。
针对第一预测模型的训练过程,所采用的数据集中包括预设数量的数据对,每个数据对均为样本图像和样本图像对应的样本标注图像,其中,样本标注图像中标注有对应的样本图像中各个像素点对应的标准透明度值。
可选的,预设数量可以由开发人员自行设置,数据对的数量越多,第一预测模型的预测准确性就越高。比如,数据集中可以包括5000个数据对。
可选的,样本标注图像可以由开发人员进行标注得到。
可选的,第一预测模型可以基于深度学习张量库(PyTorch)框架,以及图形处理器(Graphics Processing Unit,GPU)来进行训练。
为了使得第一预测模型可以快速的收敛,从而提高第一预测模型的训练速度,使用样本分割图像和原始图像训练第一预测模型。
针对样本分割图像的获取方式,在一种可能的实施方式中,可以将样本标注图像进行二值化处理得到,即设置有透明度阈值,若像素点对应的透明度值大于该透明度阈值时,则将该像素点对应的透明度值用1表示,若像素点对应的透明度值小于该透明度阈值时,则将该像素点对应的透明度值用0表示,使得样本标注图像中各个像素点对应的透明度值均为0或1。
可选的,透明度阈值可以由开发人员进行设置,比如,透明度阈值可以为0.8,即将大于0.8的像素点的透明度值用1表示,小于0.8的像素点的透明度值用0表示。
在一种可能的实施方式中,将获取到的预设数量的样本图像和样本分割图像按照一定比例分为测试集和样本集,其中,样本集用于后续对第一预测模型的训练过程,测试集用于对第一预测模型的校验过程。
可选的,预设比例可以由开发人员进行设置,比如,预设比例为2:8,则可以将数据集按照2:8的比例划分为测试集和样本集。
步骤302,根据样本图像和样本分割图像,训练第一预测模型。
在一种可能的实施方式中,可以通过对图像进行预处理,来扩充样本数据量,比如,对样本集中的样本图像进行随机旋转、随机左右翻转、随机裁剪、Gamma变换等预处理。本申请实施例对样本图像的预处理方式不构成限定。
在一种可能的实施方式中,第一预测模型可以包括用于特征提取的多尺度编码网络、用于复合和整合特征的特征金字塔网络、用于特征解码的多尺度解码网络以及用于从多个尺度对深层特征进行监督的深度监督网络等,以实现快速收敛并产生初步分割结果。
示意性的,如图4所示,步骤302可以包括步骤302A至302F。
步骤302A,将样本图像输入多尺度编码网络,得到多尺度编码网络输出的m个第一样本特征图,m为大于等于2的整数。
由于多尺度编码网络用于从不同尺度(分辨率)上对样本图像进行特征提取,因此,将样本图像输入多尺度编码网络中,通过在不同分辨率上对样本图像进行特征提取,可以得到具有不同分辨率的样本特征图,即不同第一样本特征图的分辨率和通道数不同,m为大于等于2的整数。
其中,多尺度编码网络可以采用用于特征提取的神经网络模型,比如,移动面部网络(MobileNetV2)模型,本申请实施例对多尺度编码网络采用的神经网络模型不构成限定。
在一种可能的实施方式中,将经过预处理之后的样本图像输入多尺度编码网络,通过多尺度编码网络对样本图像进行多尺度特征提取,可以得到m个第一样本特征图,由于是进行了多个尺度上的特征提取, 因此,得到的m个第一样本特征图的分辨率和通道数均不相同。
示意性的,若m取4,则得到的m个第一样本特征图可以为:320×1/32、64×1/16、32×1/8、24×1/4,其中,320、64、32、24表示各个第一样本特征图对应的通道数,1/32、1/16、1/8、1/4表示各个第一样本特征图相对于样本图像的分辨率,比如,1/4表示第一样本特征图对应的分辨率为样本图像的1/4。
示意性的,如图5所示,将样本图像输入多尺度编码网络501中,进行多尺度特征提取之后,得到多尺度编码网络501输出的4个第一样本特征图,4个第一样本特征图对应的通道数和分辨率分别为:320×1/32、64×1/16、32×1/8、24×1/4。
步骤302B,将m个第一样本特征图输入特征金字塔网络,得到特征金字塔网络输出的m个第二样本特征图。
其中,不同第二样本特征图的通道数相同且分辨率不同。
其中,特征金字塔网络用于对提取到的特征图进行混合,并将其对应的通道数处理为目标通道数。
在一种可能的实施方式中,将m个第一样本特征图输入特征金字塔网络,通过特征金字塔网络对m个第一样本特征图进行特征整合和复用,得到m个第二样本特征图。
在一个示例性的例子中,特征金字塔网络对第一样本特征图进行处理的过程可以包括以下步骤:
一、将m个第一样本特征图按照分辨率形成样本特征金字塔,样本特征金字塔中第一样本特征图的分辨率与第一样本特征图所在层级呈负相关关系。
在一种可能的实施方式中,将多尺度编码网络输出的m个第一样本特征图,输入特征金字塔网络,首先按照各个第一样本特征图分辨率的大小排列形成样本特征金字塔,其中,第一样本特征图在该样本特征金字塔上所处的层级与其分辨率呈负相关关系。
示意性的,如图5所示,将多尺度编码网络输出的4个第一样本特征图输入特征金字塔网络502中,首先按照分辨率的高低排列形成第一样本特征金字塔(如特征金字塔网络的左边金字塔所示),即第一样本特征金字塔中包含的各个层级与其对应的第一样本特征图分别为:24×1/4(第一层)、32×1/8(第二层)、64×1/16(第三层)、320×1/32(第四层)。在一种可能的实施方式中,形成第一样本特征金字塔之后,通过上采样和卷积处理,混合各个第一样本特征图,使得获得的第二样本特征图不仅关注同一采样尺寸的特征,且充分利用到其他分辨率的第一样本特征图。
其中,在对第一样本特征图进行整合过程中,对于位于不同层级的第一样本特征图采用不同的整合方式。
二、响应于第m第一样特征图对应的通道数为最大通道数,对第m第一样本特征图进行卷积处理,得到第m第二样本特征图。
在一种可能的实施方式中,若第一样本特征图对应的通道数为最大通道数(即对应最小分辨率),则仅对该第一样本特征图进行卷积处理,即可得到最小分辨率对应的第二样本特征图。
示例性的,如图5所示,基于第一样本特征金字塔与分辨率的关系,对于第四层的第一样本特征图,由于其对应的通道数和分辨率为:320×1/32,可见其对应最大通道数,在进行特征整合过程中,仅需要对第四层的第一样本特征图进行卷积处理,即可以得到分辨率为1/32的第二样本特征图。
三、响应于第m第一样本特征图对应的通道数不是最大通道数,对第m第一样本特征图进行卷积处理后得到第一中间样本特征图,对第m+1第一样本特征图进行卷积和上采样处理后得到第二中间样本特征图,对第一中间样本特征图和第二中间样本特征图进行混合,并进行卷积处理后得到第m第二样本特征图。
在一种可能的实施方式中,若第一样本特征图对应的通道数不是最大通道数,在生成第二样本特征图过程中,需要混合不同分辨率的第一样本特征图,即首先对该第一样本特征图进行卷积处理,得到第一中间样本特征图,再混合上层特征,即对比该第一样本特征图高一层级的第一样本特征图进行卷积和上采样操作,得到第二中间样本特征图,并将第一中间特征图和第二中间特征图混合后,进行卷积处理,得到该分辨率对应的第二样本特征图。
示意性的,如图5所示,对于第四层对应的第一样本特征图320×1/32,仅对该第一样本特征图进行卷积操作,即可得到分辨率为1/32的第二样本特征图,用128×1/32来表示;对于第三层对应的第一样本特征图64×1/16,首先对第一样本特征图64×1/16进行卷积处理得到第一中间样本特征图,并对第四层(高一层级)对应的第一样本特征图320×1/32进行卷积和双线性差值2倍上采样(up2x)处理,得到第二中间样本特征图,再将第一中间样本特征图和第二中间样本特征图混合,并对混合后的样本特征图进行卷积处理,即得到分辨率为1/16的第二样本特征图,用128×1/16表示;同理,可以分别得到分辨率为1/8的第二样本特征图(128×1/8),以及分辨率为1/4的第二样本特征图(128×1/4)。如图5中特征金字塔网络502中的右半边所示,即为特征金字塔网络输出的各个第二特征图。
在一种可能的实施方式中,各个第二样本特征图也同样按照分辨率的大小排列形成第二样本特征金字塔,其中,第二样本特征图在该第二样本特征金字塔上所处的层级与其分辨率呈负相关关系。
比如,第二样本特征金字塔包含的各个层级与其对应的第二样本特征图分别为:128×1/4(第一层)、 128×1/8(第二层)、128×1/16(第三层)、128×1/32(第四层)。
可选的,m个第二样本特征图对应的目标通道数可以由开发人员自行设置,比如,目标通道数为128,本申请实施例对目标通道数不构成限定。
步骤302C,将m个第二样本特征图输入多尺度解码网络,得到多尺度解码网络输出的第一样本透明通道图像。
其中,第一样本透明通道图像的分辨率与样本图像的分辨率相同。
在一种可能的实施方式中,将特征金字塔输出的m个第二样本特征图输入多尺度解码网络中,由多尺度解码网络对第二样本特征图进行相加和分辨率转换操作,得到样本图像对应的第一样本透明通道图像,该第一样本透明通道图像包含样本图像中各个像素点对应的预测透明度值,用于后续与样本分割图像进行比较,计算交叉熵损失。
由于各个第二样本特征图对应不同分辨率,不能直接进行相加处理,且最小分辨率为1/4,因此,首先需要将各个第二样本特征图的分辨率统一为原图的1/4。在一种可能的实施方式中,将各个第二样本特征图经过卷积块处理,且不同分辨率对应不同卷积块,不同分辨率对应的卷积块的数量不同。
其中,多尺度解码网络使用的卷积块的类型包括cgr2x、sgr2x、sgr等,cgr2x包括卷积层、组群归一化(Group Normalization)层、激活函数(ReLU)层以及双线性差值2倍上采样层,其中,卷积层对应的输入输出通道数相同,比如,输入通道数为128,输出通道数为128;sgr2x包括卷积层、Group Normalization层、ReLU层以及双线性差值2倍上采样层,其中,卷积层对应的输入输出通道数不同,比如,输入通道数为128,输出通道数为64;sgr包括卷积层、Group Normalization层、ReLU层,卷积层对应的输入输出通道数不同,比如,输入通道数为128,输出通道数为64。
示意性的,如图6所示,其示出了多尺度解码网络使用的各个卷积块的结构示意图。其中,图6中的(A)为cgr2x对应的结构示意图,图6中的(B)为sgr2x对应的结构示意图,图6中的(C)为sgr对应的结构示意图。
示意性的,如图5所示,将由特征金字塔网络502输出的4个第二样本特征图输入多尺度解码网络503,经过不同卷积块处理,形成4个分辨率为原图1/4的第三样本特征图(图中未示出),比如,对第二样本特征图128×1/32,依次经过两个cgr2x和一个sgr2x卷积块,即可以得到其对应的第三样本特征图,对第二样本特征图128×1/16,依次经过一个cgr2x和一个sgr2x卷积块,即可以得到其对应的第三样本特征图,同理,可以得到各个第二样本特征图对应的第三样本特征图;再将得到的4个第三样本特征图进行相加处理后,经过卷积和4倍的上采样操作,即可以得到第一样本透明通道图像。
步骤302D,将m个第二样本特征图输入深度监督网络,得到深度监督网络输出的m个第三样本透明通道图像。
其中,不同第二样本特征图对应不同上采样倍数,m个第三样本透明通道图像的分辨率与样本图像的分辨率相同。
在一种可能的实施方式中,将特征金字塔网络输出的m个第二样本特征图输入深度监督网络,该深度监督网络用于对m个第二样本特征图进行上采样处理,得到与样本图像相同分辨率的m个第三样本透明通道图像,用于为第一预测模型提供不同分辨率上的交叉熵损失。
其中,不同第二样本特征图对应的上采样倍数与其对应的分辨率有关,比如,分辨率为1/32的第二样本特征图对应的上采样倍数为32倍,分辨率为1/16的第二样本特征图对应的上采样倍数为16倍。
示意性的,如图5所示,将第二样本特征图128×1/4进行4倍上采样得到第三样本透明通道图像4,将第二样本特征图128×1/8进行8倍上采样得到第三样本透明通道图像8,同理,可以得到第三样本透明通道图像16和第三样本透明通道图像32。
步骤302E,对第一样本透明通道图像和m个第三样本透明通道图像进行二值化处理,得到第一样本分割图像和m个第二样本分割图像。
由于第一样本透明通道图像和m个第三样本透明通道图像均是概率图像,而为了加速第一预测模型的收敛速度,采用样本分割图像,而样本分割图像为二值化图像,因此,需要对第一样本透明通道图像和m个第三样本透明通道图像进行二值化处理,得到第一样本分割图像和m个第二样本分割图像,才可以与样本分割图像进行比较,计算第一预测模型的交叉熵损失。
其中,对第一样本透明通道图像和m个第三样本透明通道图像进行二值化处理的方式,可以参考上文实施例中样本分割图像的生成过程,本实施例在此不做赘述。
示意性的,如图5所示,对4个第三样本透明通道图像进行二值化处理,得到4个第二样本分割图像,分别表示为第二样本分割图像32、第二样本分割图像16、第二样本分割图像8、第二样本分割图像4。对第一样本透明通道图像进行二值化处理,得到第一样本分割图像。
步骤302F,根据第一样本分割图像、m个第二样本分割图像和样本分割图像,训练第一预测模型。
其中,第一预测模型的损失采用交叉熵损,即将第一样本分割图像和样本分割图像之间的交叉熵损失, 以及m个第三样本分割图像和样本分割图像之间的交叉熵损失求和,可以得到第一预测模型对应的交叉熵损失。
示意性的,如图5所示,分别计算第一样本分割图像和样本分割图像之间的交叉熵损失,以及各个第二样本分割图像与样本分割图像的交叉熵损失,对各个交叉熵损失求和,即为第一预测模型对应的交叉熵损失。
其中,交叉熵损失的公式可以表示为:
Figure PCTCN2021074722-appb-000001
其中,若
Figure PCTCN2021074722-appb-000002
为样本分割图像和第一样本分割图像的交叉熵损失,则y i表示样本图像对应的样本分割图像,p i表示第一样本分割图像,对所有样本的对数损失表示对每个样本的对数损失的平均值,在理想情况下,对数损失为0。
示意性的,第一预测模型对应的损失可以表示为:
Figure PCTCN2021074722-appb-000003
其中,
Figure PCTCN2021074722-appb-000004
表示第一预测模型对应的综合损失,
Figure PCTCN2021074722-appb-000005
表示第一样本分割图像和样本分割图像的交叉熵损失,
Figure PCTCN2021074722-appb-000006
表示第二样本分割图像32与样本分割图像的交叉熵损失,
Figure PCTCN2021074722-appb-000007
表示第二样本分割图像16与样本分割图像的交叉熵损失,
Figure PCTCN2021074722-appb-000008
表示第二样本分割图像8与样本分割图像的交叉熵损失,
Figure PCTCN2021074722-appb-000009
表示第二样本分割图像4与样本分割图像的交叉熵损失。
在一种可能的实施方式中,可以根据上述公式(1)和公式(2)计算得到第一预测模型对应的综合损失,从而利用该综合损失对第一预测模型执行反向传播算法,更新第一预测模型中的各个参数。
可选的,在多个训练周期内,按照上文实施例所示的方法重复对一预测模型进行训练,直至第一预测模型对应的损失函数完全收敛时,完成第一预测模型的训练,保存第一预测模型,不冻结参数。
本实施例中,通过获取到的样本图像和样本分割图像,训练第一预测模型,并在第一预测模型的训练过程中引入了交叉熵损失,使得第一预测模型可以快速收敛,提升第一预测模型的训练效率。
在一种可能的实施方式中,当第一预测模型训练完成之后,即可以将样本图像输入训练完成的第一预测模型中,得到第一样本透明通道图像,用于第二预测模型的训练过程。
请参考图7,其示出了本申请一个示例性实施例示出的第二预测模型的训练方法的流程图。该方法包括:
步骤701,将样本图像输入训练得到的第一预测模型,得到第一预测模型输出的第一样本透明通道图像。
由于第二预测模型的训练需要依赖于第一预测模型的输出结果,即需要由第一预测模型输出对应的第一样本透明通道图像。因此,第二预测模型需要在第一预测模型训练完成之后,才可以进行第二预测模型的训练过程。
在一种可能的实施方式中,将样本图像输入训练得到的第一预测模型中,得到第一预测模型输出的该样本图像对应的第一透明通道图像,用于后续的第二预测模型的训练过程。
可选的,可以将数据集中各个样本图像均输入第一预测模型中,得到各个样本图像对应的第一样本透明通道图像,从而将样本图像和其对应的第一样本透明通道图像作为训练第二预测模型的数据集,也就是说,第二预测模型的数据集中由若干样本图像和第一样本透明通道图像组成的数据对构成。
可选的,与第一预测模型类似,在训练第二预测模型时,也可以将数据集划分为样本集和训练集,其中,样本集用于训练第二预测模型,训练集用于校验第二预测模型。
可选的,样本集和训练集的数据对比值可以由开发人员自行设置,比如,样本集:训练集=8:2。
步骤702,根据第一样本透明通道图像、样本标注图像和样本图像,训练第二预测模型。由于第一预测模型的主要任务是得到初步分割结果,为了进一步提高第一预测模型的收敛速度,采用了样本标注图像参与第一预测模型的训练过程,而第二预测模型的目的是为了提高第一透明通道图像中各个像素点的精细度,基于样本分割图像和样本标注图像之间的关系(样本分割图像是通过对样本标注图像进行二值化处理得到),显然样本标注图像更精确,因此,在一种可能的实施方式中,采用第一样本透明通道图像、样本标注图像和样本图像训练第二预测模型。
在一种可能的实施方式中,由于透明度通道图像主要是为了实现原图中前景图像和背景图像之间的分离,为了提高第二预测模型输出的第二样本透明通道图像的精度,引入了除基本的抠图损失之外的损失,比如,连通性差异损失、结构相似性损失以及边缘梯度损失,这些损失均更关注前景图像和背景图像交界区域的透明度通道值,使得由精细化网络输出的第二样本透明通道图像的精细度高于第一样本透明通道图 像。
示意性的,如图8所示,步骤702可以包括步骤702A至702F。
步骤702A,将第一样本透明通道图像和样本图像输入精细化网络,得到精细化网络输出的第二样本透明通道图像。
在一种可能的实施方式中,第二预测模型中主要包含精细化网络,该精细化网络用于对第一样本透明通道图像和样本图像进行卷积处理,可以修正第一样本透明通道图像中的某些错误透明度通道值,以及对前景图像和背景图像交界区域的透明度通道值进行修正,从而提高各个像素点对应的透明度通道值的精细度,输出第二样本透明通道图像。
可选的,可以将第一样本透明通道图像和样本图像进行Concat处理后,输入精细化网络。
可选的,精细化网络可以包括三个卷积网络块以及一个卷积层,减少运算过程。
示意性的,如图9所示,将样本图像输入训练完成的第一预测模型901中,可以得到样本图像对应的第一样本透明通道图像,再将第一样本透明通道图像和样本图像Concat后,输入精细化网络902中,输出第二样本透明通道图像,其中,精细化网络902由三个卷积块和一个卷积层构成。
步骤702B,将样本图像、第二样本透明通道图像和样本标注图像输入边缘梯度网络,得到第二样本透明通道图像对应的边缘梯度损失。
在一种可能的实施方式中,对于边缘梯度损失的计算方式,设置有专门计算边缘梯度损失的网络,即边缘梯度网络,将样本图像、第二样本透明通道图像和样本标注图像输入边缘梯度网络,即可以得到第二样本透明通道图像对应的边缘梯度损失,为后续的训练过程提供边缘上的损失。
在一种可能的实施方式中,获得边缘梯度损失的过程可以包括以下步骤:
一、将样本图像输入预设算子中,得到样本图像对应的样本梯度图像,预设算子用于对样本图像进行一阶倒数运算。
由于图像边缘为前景图像和背景图像之间的交界区域,因此,要获得边缘梯度损失,需要首先获得样本图像和第二样本透明通道图像分别对应的边缘图像。
在一种可能的实施方式中,边缘梯度网络中设置有预设算子,该预设算子可以对样本图像进行一阶导数运算,得到样本图像在x和y方向上的梯度,从而输出样本梯度图像。
可选的,预设算子可以采用索贝尔(Sobel)算子,也可以采用其他产生图像梯度的滤波算子,比如,沙尔(Scharr)算子、拉普拉斯(Laplacian)算子等。本申请实施例对采用的预设算子不构成限定。
示意性的,以采用Sobel算子为例,生成样本梯度图像的过程可以表示为:
Figure PCTCN2021074722-appb-000010
Figure PCTCN2021074722-appb-000011
Figure PCTCN2021074722-appb-000012
其中,A表示输入的样本图像,Gx表示样本图像在x方向上的梯度图,Gy表示样本图像在y方向上的梯度图,G表示经过Sobel算子后输出的样本梯度图像。
在一种可能的实施方式中,根据公式(3)和公式(4)对样本图像进行梯度运算后,得到样本图像在x和y方向上的梯度图,带入公式(5)中,可以计算得到样本图像对应的样本梯度图像。
二、对样本标注图像进行二值化和膨胀腐蚀操作,得到样本边缘图像,样本边缘图像用于指示样本标注图像中前景图像和背景图像的交界区域。
在一种可能的实施方式中,对标注图像进行二值化和膨胀腐蚀操作,可以得到样本边缘图像,该样本边缘图像用于划分出第二样本通道图像和样本梯度图像之间的边缘区域。
三、根据样本边缘图像和样本梯度图像,生成样本边缘梯度图像,样本边缘梯度图像用于指示样本图像中前景图像和背景图像的交界区域。
在一种可能的实施方式中,将样本边缘图像和样本梯度图像相乘,可以将样本图像中前景图像和背景图像的交界区域从样本梯度图像中划分出来,即得到样本边缘梯度图像。
四、根据第二样本透明通道图像和样本边缘图像,生成边缘透明通道图像,边缘透明通道图像用于指示第二样本透明通道图像中前景图像和背景图像的交界区域。
在一种可能的实施方式中,将第二样本透明通道图像和样本边缘图像相乘,可以将第二样本透明通道 图像中前景图像和背景图像的交界区域划分出来,即得到对应的边缘透明通道图像。
五、根据边缘透明通道图像和样本边缘梯度图像,计算得到边缘梯度损失。
在一种可能的实施方式中,根据获得的边缘透明通道图像和样本边缘梯度图像,即可以计算得到第二样本透明通道图像对应的边缘梯度损失。
示意性的,边缘梯度损失的计算方式可以表示为:
Figure PCTCN2021074722-appb-000013
其中,
Figure PCTCN2021074722-appb-000014
表示第二样本透明通道图像对应的边缘梯度损失,G input表示样本梯度图像,E label表示样本边缘图像,G Refindd,Mask表示第二样本透明通道图像,‖…‖ 1表示边缘梯度损失采用L 1范数的计算方式。
示意性的,如图9所示,将样本图像输入边缘梯度网络903,首先经过sobel算子,得到样本梯度图像;将样本标注图像输入边缘梯度网络903,经过二值化和膨胀腐蚀操作后,得到样本边缘图像;将第二样本透明通道图像输入边缘梯度网络903,与样本边缘图像相乘后,得到边缘透明通道图像;将样本梯度图像和样本边缘图像相乘,得到样本边缘梯度图像;根据样本边缘梯度图像和边缘透明通道图像计算边缘梯度损失。
步骤702C,根据第二样本透明通道图像和样本标注图像,计算第二样本透明通道图像对应的结构相似性损失和抠图损失。
其中,抠图损失的计算公式可以表示为:
Figure PCTCN2021074722-appb-000015
其中,
Figure PCTCN2021074722-appb-000016
表示第二样本透明通道图像对应的抠图损失,
Figure PCTCN2021074722-appb-000017
表示第二样本透明通道图像中第i个像素点对应的透明度通道值,
Figure PCTCN2021074722-appb-000018
表示样本标注图像中第i个像素点对应的透明度通道值,∈为常数。
在一种可能的实施方式中,将第二样本透明通道图像和样本标注图像带入上述公式中,即可得到第二样本透明通道图像对应的抠图损失。
其中,结构相似性损失的计算公式可以表示为:
Figure PCTCN2021074722-appb-000019
Figure PCTCN2021074722-appb-000020
其中,SSIM(x,y)表示结构相似性指数,
Figure PCTCN2021074722-appb-000021
为第二样本透明通道图像对应的结构相似性损失,μ x为样本标注图像的均值,σ x为样本标注图像的方差,μ y为第二样本透明通道图像的均值,σ y为第二样本透明通道图像的方差,C 1和C 2为常数。
步骤702D,将第二样本透明通道图像和样本标注图像输入连通性差异网络,得到第二样本透明通道图像对应的连通性差异损失。
其中,连通性指灰度图片中针对单个像素,其相邻的上下左右存在相同值的像素。若第二预测模型的预测效果越好,那么预测的第二样本透明通道图像和样本标注图像也应有越相似的连通图,连通性也越相似。
在一种可能的实施方式中,开发人员预先设置有连通性差异网络,可以将第二样本透明通道图像和样本标注图像输入该连通性差异网络,用于计算第二样本透明通道图像对应的连通性差异损失。
其中,连通性差异损失的计算公式可以表示为:
Figure PCTCN2021074722-appb-000022
其中,
Figure PCTCN2021074722-appb-000023
表示各个像素点的连通性差异的累积和,Ω表示第二样本透明通道图像和样本标注图像共有的最大值为1的连通区域。
Figure PCTCN2021074722-appb-000024
函数计算的是第二样本透明通道图像的第i个像素p i与Ω的连通度,其值为1则表示全连通,为0则表示不连通,
Figure PCTCN2021074722-appb-000025
表示样本标注图像上第i个像素点。
其中,
Figure PCTCN2021074722-appb-000026
函数可以采用如下形式表示:
Figure PCTCN2021074722-appb-000027
d i=p i-l i
其中,θ是一个阈值参数,d i表示当前像素值p i到临界阈值l i的距离,当d i小于θ则忽略不计。其中,
Figure PCTCN2021074722-appb-000028
表示p i到l i之间离散像素值的集合,dist k(i)表示设置阈值为k时,对于像素i距离最近的连通到源域的像素,与像素i之间的标准化欧氏距离。
示意性的,如图9所示,将第二样本透明通道图像和样本标注图像输入连通性差异网络904中,可以得到连通性差异网络904输出的连通性差异损失。
步骤702E,根据边缘梯度损失、连通性差异损失、抠图损失和结构相似性损失,训练精细化网络。
在一种可能的实施方式中,通过综合以上实施例中得到的多种损失,来训练精细化网络,相比于仅使用抠图损失,可以明显提高生成的第二样本透明通道图像的精细度。
步骤702F,将训练得到的精细化网络确定为第二预测模型。
在一种可能的实施方式中,对精细化网络执行反向传播算法,更新精细网络各个卷积层的参数,并在各个训练周期内重复进行上文实施例中的训练过程,直至第二预测模型对应的损失函数完全收敛,则将训练完成的精细化网络确定为第二预测模型。
本实施例中,通过引入多个损失函数连通性差异损失、边缘梯度损失、抠图损失和结构相似性损失,来训练精细化网络,使得由精细化网络输出的第二样本透明通道图像更关注在边缘区域上的透明度通道值,从而有利于提高图像分割的精确度。
在一种可能的实施方式中,当按照上文各个实施例所示的方法完成第一预测模型和第二预测模型的训练之后,即可以将训练完成的预测模型部署在计算机设备上,并利用该第一预测模型和第二预测模型实现对原始图像的分割处理。
请参考图10,其示出了本申请另一个示例性实施例示出的图像处理方法的流程图。本实施例以该方法用于计算机设备为例进行说明,该方法包括如下步骤。
步骤1001,获取原始图像。
本步骤的实施方式可以参考步骤201,本实施例在此不做赘述。
步骤1002,将原始图像输入多尺度编码网络,得到多尺度编码网络输出的n个第一特征图,其中,不同第一特征图的分辨率和通道数不同,n为大于等于2的整数。
其中,多尺度编码网络用于对原始图像进行特征提取,在一种可能的实施方式中,将原始图像进行预处理后,输入多尺度编码网络中,通过多尺度编码网络提取原始图像中不同尺度(分辨率)上的特征,从而得到不同分辨率和通道数的n个第一特征图。步骤1003,将n个第一特征图输入特征金字塔网络,得到特征金字塔网络输出的n个第二特征图,其中,不同第二特征图的通道数相同且分辨率不同,n个第二特征图的通道数为目标通道数。
其中,特征金字塔网络用于整合不同分辨率的特征,在一种可能的实施方式中,将n个第一样本特征图输入特征金字塔网络中,通过特征金字塔网络对不同分辨率上的特征图进行特征混合,以及卷积处理,从而得到特征金字塔网络输出的n个第二特征图。
由于不同第一特征图对应的通道数不同,而特征金字塔网络需要将特征图对应的通道数处理为同一通道数,因此,对于不同通道数的第一特征图,需要进行不同的特征整合处理。
在一个示例性的例子中,如图11所示,步骤1003可以包括步骤1003A、步骤1003B和步骤1003C。
步骤1003A,将n个第一特征图按照分辨率排列形成特征金字塔,特征金字塔中第一特征图的分辨率与第一特征图所在层级呈负相关关系。
在一种可能的实施方式中,首先将n个第一特征图按照分辨率进行排序,形成特征金字塔,其中,第一特征图的分辨率越低,其在特征金字塔网络中的层级越高。
步骤1003B,响应于第n第一特征图对应的通道数为最大通道数,对第n第一特征图进行卷积处理,得到第n第二特征图。
在一种可能的实施方式中,若第一特征图对应的通道数为最大通道数(对应最小分辨率),则仅需要对该层第一特征图进行卷积处理,即可以得到最小分辨率对应的第二特征图。
步骤1003C,响应于第n第一特征图对应的通道数不是最大通道数,对第n第一特征图进行卷积处理后得到第四特征图,对第n+1第一特征图进行卷积和上采样处理后得到第五特征图,对第四特征图和第五特征图进行混合,并进行卷积处理后得到第n第二特征图。
在一种可能的实施方式中,若第一特征图对应的通道数不是最大通道数,在生成第二特征图过程中,需要混合不同分辨率的第一特征图,即首先对该第一特征图进行卷积处理,得到第四特征图,再混合上层特征,即对比该第一特征图高一层级的第一特征图进行卷积和上采样操作,得到第五特征图,并将第四特征图和第五特征图混合后,进行卷积处理,得到该分辨率对应的第二特征图
需要说明的是,步骤1003B和步骤1003C可以同时执行;也可以先执行步骤1003A,再执行步骤1003C;也可以先执行步骤1003C,再执行步骤1003B,本实施例对步骤1003B和步骤1003C的执行顺序不构成限定。
步骤1004,将n个第二特征图输入多尺度解码网络,得到多尺度解码网络输出的第一透明通道图像。
其中,多尺度解码网络用于对特征进行解码,在一种可能的实施方式中,将n个第二特征图输入多尺 度解码网络,由多尺度解码网络对第二特征图进行相加和分辨率转换操作,得到原始图像对应的第一透明通道图像,该第一透明通道图像包含原始图像中各个像素点对应的预测透明度值。
示意性的,如图11所示,步骤1004包括步骤1004A和步骤1004B。
步骤1004A,将n个第二特征图分别通过卷积块处理,得到n个第三特征图,n个第三特征图对应的分辨率相同,其中,不同第二特征图对应不同卷积块,且不同第二特征图对应使用的卷积块数目不同。
由于各个第二特征图对应的分辨率不同,不能直接进行相加处理,且存在最小分辨率1/4,因此,首先需要将各个第二特征图的分辨率统一为原始图像的1/4。在一种可能的实施方式中,将各个第二特征图经过卷积块处理,可以得到n个分辨率为最小分辨率的第三特征图。
其中,为了统一分辨率,则不同分辨率的第二特征图对应不同卷积块,且不同分辨率对应的卷积块的数量不同。
步骤1004B,对n个第三特征图进行相加、卷积和上采样处理,得到第一透明通道图像。
在一种可能的实施方式中,将生成的n个相同分辨率的第三特征图进行相加、卷积和上采样处理,从而得到分辨率与原始图像相同的第一透明通道图像。
其中,上述生成第一透明通道图像的过程可以参考上文实施例中的第一预测模型训练过程中第一样本透明通道图像的生成过程,本申请实施例在此不做赘述。
与第一预测模型训练过程不同的是,模型训练阶段需要深度监督网络提供不同分辨率上的交叉熵损失,而模型应用阶段无需部署深度监督网络,也就是说,在模型应用阶段,第一预测模型中仅包含多尺度编码网络、特征金字塔网络以及多尺度解码网络,通过对原始图像进行特征提取、多尺度特征融合以及特征解码,即可以生成原始图像对应的第一透明通道图像。
步骤1005,将第一透明通道图像和原始图像输入第二预测模型,得到第二预测模型输出的第二透明通道图像,第二透明通道图像的精细度高于第一透明通道图像的精细度。
在模型应用阶段,第二预测模型即上文实施例中训练得到的精细化网络,在一种可能的实施方式中,将第一透明通道图像和原始图像进行拼接处理,输入精细化网络,可以得到精细化网络输出的第二透明通道图像。
步骤1006,根据第二透明通道图像对原始图像进行分割处理,得到目标对象对应的图像。
步骤1005和步骤1006的实施方式可以参考步骤201和步骤202,本实施例在此不做赘述。
本申请实施例中,通过部署训练完成的第一预测模型和第二预测模型,将原始图像预处理后输入第一预测模型,得到第一预测模型输出的第一透明通道图像,并将生成的第一透明通道图像和原始图像重新输入第二预测模型,得到第二预测模型输出的第二透明通道图像,以便将该第二透明通道图像用于图像处理,相比于相关技术中,无需生成三分图,可以实现由原始图像直接生成透明通道图像,进一步提高了透明通道图像的精度,从而提高了图像分割的准确率。
请参考图12,其示出了本申请一个示例性实施例示出的图像处理方法的网络部署图。该网络部署图包括:多尺度编码网络、特征金字塔网络、多尺度解码网络和精细化网络。
在一种可能的实施方式中,将原始图像经过预处理后,输入多尺度编码网络1201,得到多尺度编码网络1201输出的n个第一特征图;将该n个第一特征图输入特征金字塔网络1202,得到特征金字塔网络1202输出的n个第二特征图,且该n个第二特征图的通道数为目标通道数;将n个第二特征图输入多尺度解码网络1203,进行相加和分辨率转换等操作后,得到多尺度解码网络1203输出的第一透明通道图像;将该第一透明度通道图像和原始图像输入精细化网络1204,得到精细化网络1204输出的第二透明通道图像,从而利用该第二透明通道图像对原始图像进行分割处理,得到目标对象对应的图像。
请参考图13,其示出了本申请一个示例性实施例提供的图像处理装置的结构框图。该装置可以通过软件、硬件或者两者的结合实现成为计算机设备的全部或一部分,该装置包括:
第一获取模块1301,用于获取原始图像,所述原始图像中包含至少一个目标对象;
第一预测模块1302,用于将所述原始图像输入第一预测模型,得到所述第一预测模型输出的第一透明通道图像,所述第一透明通道图像中包括所述原始图像中各个像素点对应的预测透明度值;
第二预测模块1303,用于将所述第一透明通道图像和所述原始图像输入第二预测模型,得到所述第二预测模型输出的第二透明通道图像,所述第二透明通道图像的精细度高于所述第一透明通道图像的精细度;
分割处理模块1304,用于根据所述第二透明通道图像对所述原始图像进行分割处理,得到所述目标对象对应的图像。
可选的,所述装置还包括:
第二获取模块,用于获取样本图像、样本标注图像和样本分割图像,所述样本标注图像中标注有所述样本图像中各个像素点对应的透明度值,所述样本分割图像是对样本标注图像进行二值化处理得到的二值 化图像;
第一训练模块,用于根据所述样本图像和所述样本分割图像,训练所述第一预测模型;
第三预测模块,用于将所述样本图像输入训练得到的所述第一预测模型,得到所述第一预测模型输出的第一样本透明通道图像;
第二训练模块,用于根据所述第一样本透明通道图像、所述样本标注图像和所述样本图像,训练所述第二预测模型。
可选的,所述第二训练模块,包括:
精细化单元,用于将所述第一样本透明通道图像和所述样本图像输入精细化网络,得到所述精细化网络输出的第二样本透明通道图像;
边缘梯度单元,用于将所述样本图像、所述第二样本透明通道图像和所述样本标注图像输入边缘梯度网络,得到所述第二样本透明通道图像对应的边缘梯度损失;
计算单元,用于根据所述第二样本透明通道图像和所述样本标注图像,计算所述第二样本透明通道图像对应的结构相似性损失和抠图损失;
连通性差异单元,用于将所述第二样本透明通道图像和所述样本标注图像输入连通性差异网络,得到所述第二样本透明通道图像对应的连通性差异损失;
第一训练单元,用于根据所述边缘梯度损失、所述连通性差异损失、所述抠图损失和所述结构相似性损失,训练所述精细化网络;
确定单元,用于将训练得到的所述精细化网络确定为所述第二预测模型。
可选的,所述边缘梯度单元,还用于:
将所述样本图像输入预设算子中,得到所述样本图像对应的样本梯度图像,所述预设算子用于对所述原始样本图像进行一阶倒数运算;
对所述样本标注图像进行二值化和膨胀腐蚀操作,得到样本边缘图像,所述样本边缘图像用于指示所述样本标注图像中前景图像和背景图像的交界区域;
根据所述样本边缘图像和所述样本梯度图像,生成样本边缘梯度图像,所述样本边缘梯度图像用于指示所述样本图像中前景图像和背景图像的交界区域;
根据所述第二样本透明通道图像和所述样本边缘图像,生成边缘透明通道图像,所述边缘透明通道图像用于指示所述第二样本透明通道图像中前景图像和背景图像的交界区域;
根据所述边缘透明通道图像和所述样本边缘梯度图像,计算得到所述边缘梯度损失。
可选的,所述第一预测模型包括多尺度编码网络、特征金字塔网络、多尺度解码网络和深度监督网络;
可选的,所述第一训练模块,包括:
第一多尺度编码单元,用于将所述样本图像输入所述多尺度编码网络,得到所述多尺度编码网络输出的m个第一样本特征图,其中,不同第一样本特征图的分辨率和通道数不同,m为大于等于2的整数,所述多尺度编码网络用于对所述样本图像进行特征提取;
第一特征金字塔单元,用于将m个所述第一样本特征图输入所述特征金字塔网络,得到所述特征金字塔网络输出的m个第二样本特征图,其中,不同第二样本特征图的通道数相同且分辨率不同,所述特征金字塔网络用于将m个所述第一样本特征图的通道数处理为目标通道数;
第一多尺度解码单元,用于将m个所述第二样本特征图输入所述多尺度解码网络,得到所述多尺度解码网络输出的所述第一样本透明通道图像,所述多尺度解码网络用于对m个所述第二样本特征图进行相加和分辨率转换操作,所述第一样本透明通道图像的分辨率与所述样本图像的分辨率相同;
深度监督单元,用于将m个所述第二样本特征图输入所述深度监督网络,得到所述深度监督网络输出的m个第三样本透明通道图像,所述深度监督网络用于对所述m个第二样本特征图进行上采样处理,不同第二样本特征图对应不同上采样倍数,m个所述第三样本透明通道图像的分辨率与所述样本图像的分辨率相同;
二值化处理单元,用于对所述第一样本透明通道图像和m个所述第三样本透明通道图像进行二值化处理,得到第一样本分割图像和m个第二样本分割图像;
第二训练单元,用于根据所述第一样本分割图像、m个所述第二样本分割图像和所述样本分割图像,训练所述第一预测模型。
可选的,所述第一预测模块1302,包括:
第二多尺度编码单元,用于将所述原始图像输入所述多尺度编码网络,得到所述多尺度编码网络输出的n个第一特征图,其中,不同第一特征图的分辨率和通道数不同,n为大于等于2的整数;
第二特征金字塔单元,用于将n个所述第一特征图输入所述特征金字塔网络,得到所述特征金字塔网络输出的n个第二特征图,其中,不同第二特征图的通道数相同且分辨率不同,n个所述第二特征图的通道数为所述目标通道数;
第二多尺度解码单元,用于将n个所述第二特征图输入所述多尺度解码网络,得到所述多尺度解码网络输出的所述第一透明通道图像。
可选的,所述第二特征金字塔单元,还用于:
将n个所述第一特征图按照分辨率排列形成特征金字塔,所述特征金字塔中所述第一特征图的分辨率与所述第一特征图所在层级呈负相关关系;
响应于第n第一特征图对应的通道数为最大通道数,对所述第n第一特征图进行卷积处理,得到第n第二特征图;
响应于第n第一特征图对应的通道数不是最大通道数,对所述第n第一特征图进行卷积处理后得到第四特征图,对第n+1第一特征图进行卷积和上采样处理后得到第五特征图,对所述第四特征图和所述第五特征图进行混合,并进行卷积处理后得到所述第n第二特征图。
可选的,所述第二多尺度解码单元,还用于:
将n个所述第二特征图分别通过卷积块处理,得到n个第三特征图,n个所述第三特征图对应的分辨率相同,其中,不同第二特征图对应不同卷积块,且不同第二特征图对应使用的卷积块数目不同;
对n个所述第三特征图进行相加、卷积和上采样处理,得到所述第一透明通道图像。
可选的,所述第一特征金字塔单元,还用于:
将m个所述第一样本特征图按照分辨率形成样本特征金字塔,所述样本特征金字塔中所述第一样本特征图的分辨率与所述第一样本特征图所在层级呈负相关关系;
响应于第m第一样特征图对应的通道数为最大通道数,对所述第m第一样本特征图进行卷积处理,得到第m第二样本特征图;
响应于所述第m第一样本特征图对应的通道数不是最大通道数,对所述第m第一样本特征图进行卷积处理后得到第一中间样本特征图,对第m+1第一样本特征图进行卷积和上采样处理后得到第二中间样本特征图,对所述第一中间样本特征图和所述第二中间样本特征图进行混合,并进行卷积处理后得到所述第m第二样本特征图。
本申请实施例中,通过将获取到的原始图像输入第一预测模型,得到第一预测模型输出的第一透明通道图像(包括原始图像中各个像素点对应的预测透明度值),从而将第一透明通道图像和原始图像输入第二预测模型,得到第二预测模型输出的第二透明通道图像,用于根据第二透明通道图像对原始图像进行分割处理,得到目标对象对应的图像。由于第二透明通道图像的精细度高于第一透明通道图像的精细度,因此,可以提高图像分割的准确率;相较于相关技术中的图像分割方法,无需引入三分图,可以实现从原始图像直接生成用于分割的透明通道图像,进一步提升了进行图像分割的准确性。
请参考图14,其示出了本申请一个示例性实施例提供的计算机设备的结构示意图。所述计算机设备1400包括中央处理单元(Central Processing Unit,CPU)1401、包括随机存取存储器(Random Access Memory,RAM)1402和只读存储器(Read-Only Memory,ROM)1403的系统存储器1404,以及连接系统存储器1404和中央处理单元1401的系统总线1405。所述计算机设备1400还包括帮助计算机设备内的各个器件之间传输信息的基本输入/输出系统(Input/Output系统,I/O系统)1406,和用于存储操作系统1413、应用程序1414和其他程序模块1415的大容量存储设备1407。
所述基本输入/输出系统1406包括有用于显示信息的显示器1408和用于用户输入信息的诸如鼠标、键盘之类的输入设备1409。其中所述显示器1408和输入设备1409都通过连接到系统总线1405的输入输出控制器1410连接到中央处理单元1401。所述基本输入/输出系统1406还可以包括输入输出控制器1410以用于接收和处理来自键盘、鼠标、或电子触控笔等多个其他设备的输入。类似地,输入输出控制器1410还提供输出到显示屏、打印机或其他类型的输出设备。
所述大容量存储设备1407通过连接到系统总线1405的大容量存储控制器(未示出)连接到中央处理单元1401。所述大容量存储设备1407及其相关联的计算机可读存储介质为计算机设备1400提供非易失性存储。也就是说,所述大容量存储设备1407可以包括诸如硬盘或者只读光盘(Compact Disc Read-Only Memory,CD-ROM)驱动器之类的计算机可读存储介质(未示出)。
不失一般性,所述计算机可读存储介质可以包括计算机存储介质和通信介质。计算机存储介质包括以用于存储诸如计算机可读存储指令、数据结构、程序模块或其他数据等信息的任何方法或技术实现的易失性和非易失性、可移动和不可移动介质。计算机存储介质包括RAM、ROM、可擦除可编程只读寄存器(Erasable Programmable Read Only Memory,EPROM)、电子抹除式可复写只读存储器(Electrically-Erasable Programmable Read-Only Memory,EEPROM)、闪存或其他固态存储其技术,CD-ROM、数字多功能光盘(Digital Versatile Disc,DVD)或其他光学存储、磁带盒、磁带、磁盘存储或其他磁性存储设备。当然,本领域技术人员可知所述计算机存储介质不局限于上述几种。上述的系统存储器1404和大容量存储设备1407可以统称为存储器。
存储器存储有一个或多个程序,一个或多个程序被配置成由一个或多个中央处理单元1401执行,一个或多个程序包含用于实现上述方法实施例的指令,中央处理单元1401执行该一个或多个程序实现上述各个方法实施例提供的方法。
根据本申请的各种实施例,所述计算机设备1400还可以通过诸如因特网等网络连接到网络上的远程服务器运行。也即计算机设备1400可以通过连接在所述系统总线1405上的网络接口单元1411连接到网络1412,或者说,也可以使用网络接口单元1411来连接到其他类型的网络或远程服务器系统(未示出)。
所述存储器还包括一个或者一个以上的程序,所述一个或者一个以上程序存储于存储器中,所述一个或者一个以上程序包含用于进行本申请实施例提供的方法中由计算机设备所执行的步骤。
本申请实施例还提供了一种计算机可读存储介质,该计算机可读存储介质存储有至少一条指令,所述至少一条指令由所述处理器加载并执行以实现如上各个实施例所述的图像处理方法。
根据本申请的一个方面,提供了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机指令,该计算机指令存储在计算机可读存储介质中。计算机设备的处理器从计算机可读存储介质读取该计算机指令,处理器执行该计算机指令,使得该计算机设备执行上述方面的各种可选实现方式中提供的图像处理方法。
本领域技术人员应该可以意识到,在上述一个或多个示例中,本申请实施例所描述的功能可以用硬件、软件、固件或它们的任意组合来实现。当使用软件实现时,可以将这些功能存储在计算机可读存储介质中或者作为计算机可读存储介质上的一个或多个指令或代码进行传输。计算机可读存储介质包括计算机存储介质和通信介质,其中通信介质包括便于从一个地方向另一个地方传送计算机程序的任何介质。存储介质可以是通用或专用计算机能够存取的任何可用介质。
以上所述仅为本申请的可选实施例,并不用以限制本申请,凡在本申请的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。

Claims (20)

  1. 一种图像处理方法,所述方法包括:
    获取原始图像,所述原始图像中包含至少一个目标对象;
    将所述原始图像输入第一预测模型,得到所述第一预测模型输出的第一透明通道图像,所述第一透明通道图像中包括所述原始图像中各个像素点对应的预测透明度值;
    将所述第一透明通道图像和所述原始图像输入第二预测模型,得到所述第二预测模型输出的第二透明通道图像,所述第二透明通道图像的精细度高于所述第一透明通道图像的精细度;
    根据所述第二透明通道图像对所述原始图像进行分割处理,得到所述目标对象对应的图像。
  2. 根据权利要求1所述的方法,其中,所述获取原始图像之前,所述方法还包括:
    获取样本图像、样本标注图像和样本分割图像,所述样本标注图像中标注有所述样本图像中各个像素点对应的透明度值,所述样本分割图像是对样本标注图像进行二值化处理得到的二值化图像;
    根据所述样本图像和所述样本分割图像,训练所述第一预测模型;
    将所述样本图像输入训练得到的所述第一预测模型,得到所述第一预测模型输出的第一样本透明通道图像;
    根据所述第一样本透明通道图像、所述样本标注图像和所述样本图像,训练所述第二预测模型。
  3. 根据权利要求2所述的方法,其中,所述根据所述第一样本透明通道图像、所述样本标注图像和所述样本图像,训练所述第二预测模型,包括:
    将所述第一样本透明通道图像和所述样本图像输入精细化网络,得到所述精细化网络输出的第二样本透明通道图像;
    将所述样本图像、所述第二样本透明通道图像和所述样本标注图像输入边缘梯度网络,得到所述第二样本透明通道图像对应的边缘梯度损失;
    根据所述第二样本透明通道图像和所述样本标注图像,计算所述第二样本透明通道图像对应的结构相似性损失和抠图损失;
    将所述第二样本透明通道图像和所述样本标注图像输入连通性差异网络,得到所述第二样本透明通道图像对应的连通性差异损失;
    根据所述边缘梯度损失、所述连通性差异损失、所述抠图损失和所述结构相似性损失,训练所述精细化网络;
    将训练得到的所述精细化网络确定为所述第二预测模型。
  4. 根据权利要求3所述的方法,其中,所述将所述样本图像、所述第二样本透明通道图像和所述样本标注图像输入边缘梯度网络,得到所述第二样本透明通道图像对应的边缘梯度损失,包括:
    将所述样本图像输入预设算子中,得到所述样本图像对应的样本梯度图像,所述预设算子用于对所述原始样本图像进行一阶倒数运算;
    对所述样本标注图像进行二值化和膨胀腐蚀操作,得到样本边缘图像,所述样本边缘图像用于指示所述样本标注图像中前景图像和背景图像的交界区域;
    根据所述样本边缘图像和所述样本梯度图像,生成样本边缘梯度图像,所述样本边缘梯度图像用于指示所述样本图像中前景图像和背景图像的交界区域;
    根据所述第二样本透明通道图像和所述样本边缘图像,生成边缘透明通道图像,所述边缘透明通道图像用于指示所述第二样本透明通道图像中前景图像和背景图像的交界区域;
    根据所述边缘透明通道图像和所述样本边缘梯度图像,计算得到所述边缘梯度损失。
  5. The method according to claim 2, wherein the first prediction model comprises a multi-scale encoding network, a feature pyramid network, a multi-scale decoding network and a deep supervision network; and
    training the first prediction model according to the sample image and the sample segmentation image comprises:
    inputting the sample image into the multi-scale encoding network to obtain m first sample feature maps output by the multi-scale encoding network, wherein different first sample feature maps differ in resolution and channel number, m is an integer greater than or equal to 2, and the multi-scale encoding network is used to perform feature extraction on the sample image;
    inputting the m first sample feature maps into the feature pyramid network to obtain m second sample feature maps output by the feature pyramid network, wherein different second sample feature maps have the same channel number and different resolutions, and the feature pyramid network is used to convert the channel numbers of the m first sample feature maps into a target channel number;
    inputting the m second sample feature maps into the multi-scale decoding network to obtain the first sample alpha channel image output by the multi-scale decoding network, the multi-scale decoding network being used to perform addition and resolution conversion operations on the m second sample feature maps, and the resolution of the first sample alpha channel image being the same as the resolution of the sample image;
    inputting the m second sample feature maps into the deep supervision network to obtain m third sample alpha channel images output by the deep supervision network, the deep supervision network being used to upsample the m second sample feature maps, different second sample feature maps corresponding to different upsampling factors, and the resolutions of the m third sample alpha channel images being the same as the resolution of the sample image;
    binarizing the first sample alpha channel image and the m third sample alpha channel images to obtain a first sample segmentation image and m second sample segmentation images; and
    training the first prediction model according to the first sample segmentation image, the m second sample segmentation images and the sample segmentation image.
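One way to wire the four sub-networks together for a supervision step is sketched below; using binary cross-entropy on the (already sigmoid-activated) outputs as a differentiable stand-in for "binarize, then compare with the sample segmentation image", and 1x1-convolution deep-supervision heads, are assumptions.

```python
import torch
import torch.nn.functional as F

def first_model_step(encoder, fpn, decoder, heads, image, seg_gt):
    """encoder -> m first sample feature maps, fpn -> m second sample feature
    maps with a shared channel count, decoder -> the first sample alpha channel
    image, heads -> one per-scale head used only for deep supervision."""
    h, w = image.shape[-2:]
    c_feats = encoder(image)
    p_feats = fpn(c_feats)
    alpha = decoder(p_feats)                         # full resolution, values in [0, 1]
    loss = F.binary_cross_entropy(alpha, seg_gt)
    for head, feat in zip(heads, p_feats):
        side = torch.sigmoid(head(feat))             # third sample alpha channel image
        side = F.interpolate(side, size=(h, w), mode='bilinear',
                             align_corners=False)    # per-scale upsampling factor
        loss = loss + F.binary_cross_entropy(side, seg_gt)
    return loss
```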
  6. The method according to claim 5, wherein inputting the original image into the first prediction model to obtain the first alpha channel image output by the first prediction model comprises:
    inputting the original image into the multi-scale encoding network to obtain n first feature maps output by the multi-scale encoding network, wherein different first feature maps differ in resolution and channel number, and n is an integer greater than or equal to 2;
    inputting the n first feature maps into the feature pyramid network to obtain n second feature maps output by the feature pyramid network, wherein different second feature maps have the same channel number and different resolutions, and the channel number of the n second feature maps is the target channel number; and
    inputting the n second feature maps into the multi-scale decoding network to obtain the first alpha channel image output by the multi-scale decoding network.
  7. The method according to claim 6, wherein inputting the n first feature maps into the feature pyramid network to obtain the n second feature maps output by the feature pyramid network comprises:
    arranging the n first feature maps by resolution to form a feature pyramid, wherein in the feature pyramid the resolution of a first feature map is negatively correlated with the level at which that first feature map is located;
    in response to the channel number corresponding to an n-th first feature map being the maximum channel number, performing convolution on the n-th first feature map to obtain an n-th second feature map; and
    in response to the channel number corresponding to the n-th first feature map not being the maximum channel number, performing convolution on the n-th first feature map to obtain a fourth feature map, performing convolution and upsampling on an (n+1)-th first feature map to obtain a fifth feature map, mixing the fourth feature map and the fifth feature map, and performing convolution to obtain the n-th second feature map.
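The level-wise merge described in this claim can be sketched as follows; the 1x1 lateral and 3x3 smoothing convolutions, the output channel count of 64, bilinear upsampling, and element-wise addition as the "mixing" operation are assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class FeaturePyramid(nn.Module):
    """feats[0] has the highest resolution; feats[-1] has the most channels."""

    def __init__(self, in_channels, out_channels=64):
        super().__init__()
        self.lateral = nn.ModuleList(
            [nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels])
        self.smooth = nn.ModuleList(
            [nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
             for _ in in_channels[:-1]])

    def forward(self, feats):
        outputs = []
        for i, feat in enumerate(feats):
            fourth = self.lateral[i](feat)           # convolve the i-th first feature map
            if i == len(feats) - 1:
                outputs.append(fourth)               # deepest level: convolution only
                continue
            fifth = self.lateral[i + 1](feats[i + 1])            # convolve the (i+1)-th map
            fifth = F.interpolate(fifth, size=fourth.shape[-2:],
                                  mode='bilinear', align_corners=False)  # and upsample it
            outputs.append(self.smooth[i](fourth + fifth))       # mix, then convolve again
        return outputs                               # same channel count, different resolutions
```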
  8. The method according to claim 6, wherein inputting the n second feature maps into the multi-scale decoding network to obtain the first alpha channel image output by the multi-scale decoding network comprises:
    processing the n second feature maps respectively through convolution blocks to obtain n third feature maps, the n third feature maps having the same resolution, wherein different second feature maps correspond to different convolution blocks, and different second feature maps use different numbers of convolution blocks; and
    performing addition, convolution and upsampling on the n third feature maps to obtain the first alpha channel image.
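A sketch of such a decoder is given below; the assumed layout places the i-th second feature map at 1/2^(i+1) of the input resolution, so deeper levels pass through more convolution blocks before all third feature maps are summed at a common resolution.

```python
import torch
import torch.nn as nn

def up_block(channels):
    # One assumed "convolution block": 3x3 convolution + ReLU, then 2x upsampling.
    return nn.Sequential(
        nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False))

class MultiScaleDecoder(nn.Module):
    def __init__(self, channels=64, levels=4):
        super().__init__()
        # Level i uses i upsampling blocks plus one plain convolution block,
        # so different levels use different numbers of convolution blocks.
        self.branches = nn.ModuleList(
            [nn.Sequential(*([up_block(channels) for _ in range(i)] +
                             [nn.Conv2d(channels, channels, 3, padding=1),
                              nn.ReLU(inplace=True)]))
             for i in range(levels)])
        self.head = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=3, padding=1),
            nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
            nn.Sigmoid())

    def forward(self, feats):
        thirds = [branch(f) for branch, f in zip(self.branches, feats)]  # same resolution
        fused = torch.stack(thirds, dim=0).sum(dim=0)   # element-wise addition
        return self.head(fused)                         # convolve + upsample -> first alpha
```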
  9. The method according to claim 5, wherein inputting the m first sample feature maps into the feature pyramid network to obtain the m second sample feature maps output by the feature pyramid network comprises:
    arranging the m first sample feature maps by resolution to form a sample feature pyramid, wherein in the sample feature pyramid the resolution of a first sample feature map is negatively correlated with the level at which that first sample feature map is located;
    in response to the channel number corresponding to an m-th first sample feature map being the maximum channel number, performing convolution on the m-th first sample feature map to obtain an m-th second sample feature map; and
    in response to the channel number corresponding to the m-th first sample feature map not being the maximum channel number, performing convolution on the m-th first sample feature map to obtain a first intermediate sample feature map, performing convolution and upsampling on an (m+1)-th first sample feature map to obtain a second intermediate sample feature map, mixing the first intermediate sample feature map and the second intermediate sample feature map, and performing convolution to obtain the m-th second sample feature map.
  10. An image processing apparatus, the apparatus comprising:
    a first acquisition module, configured to acquire an original image, the original image containing at least one target object;
    a first prediction module, configured to input the original image into a first prediction model to obtain a first alpha channel image output by the first prediction model, the first alpha channel image comprising a predicted transparency value corresponding to each pixel in the original image;
    a second prediction module, configured to input the first alpha channel image and the original image into a second prediction model to obtain a second alpha channel image output by the second prediction model, the fineness of the second alpha channel image being higher than the fineness of the first alpha channel image; and
    a segmentation module, configured to segment the original image according to the second alpha channel image to obtain an image corresponding to the target object.
  11. The apparatus according to claim 10, wherein the apparatus further comprises:
    a second acquisition module, configured to acquire a sample image, a sample annotated image and a sample segmentation image, the sample annotated image being annotated with a transparency value corresponding to each pixel in the sample image, and the sample segmentation image being a binary image obtained by binarizing the sample annotated image;
    a first training module, configured to train the first prediction model according to the sample image and the sample segmentation image;
    a third prediction module, configured to input the sample image into the trained first prediction model to obtain a first sample alpha channel image output by the first prediction model; and
    a second training module, configured to train the second prediction model according to the first sample alpha channel image, the sample annotated image and the sample image.
  12. The apparatus according to claim 11, wherein the second training module comprises:
    a refinement unit, configured to input the first sample alpha channel image and the sample image into a refinement network to obtain a second sample alpha channel image output by the refinement network;
    an edge gradient unit, configured to input the sample image, the second sample alpha channel image and the sample annotated image into an edge gradient network to obtain an edge gradient loss corresponding to the second sample alpha channel image;
    a computing unit, configured to compute a structural similarity loss and a matting loss corresponding to the second sample alpha channel image according to the second sample alpha channel image and the sample annotated image;
    a connectivity difference unit, configured to input the second sample alpha channel image and the sample annotated image into a connectivity difference network to obtain a connectivity difference loss corresponding to the second sample alpha channel image;
    a first training unit, configured to train the refinement network according to the edge gradient loss, the connectivity difference loss, the matting loss and the structural similarity loss; and
    a determining unit, configured to determine the trained refinement network as the second prediction model.
  13. The apparatus according to claim 12, wherein the edge gradient unit is further configured to:
    input the sample image into a preset operator to obtain a sample gradient image corresponding to the sample image, the preset operator being used to perform a first-order derivative operation on the sample image;
    perform binarization and dilation-erosion operations on the sample annotated image to obtain a sample edge image, the sample edge image being used to indicate a boundary region between a foreground image and a background image in the sample annotated image;
    generate a sample edge gradient image according to the sample edge image and the sample gradient image, the sample edge gradient image being used to indicate the boundary region between the foreground image and the background image in the sample image;
    generate an edge alpha channel image according to the second sample alpha channel image and the sample edge image, the edge alpha channel image being used to indicate the boundary region between the foreground image and the background image in the second sample alpha channel image; and
    compute the edge gradient loss according to the edge alpha channel image and the sample edge gradient image.
  14. The apparatus according to claim 11, wherein the first prediction model comprises a multi-scale encoding network, a feature pyramid network, a multi-scale decoding network and a deep supervision network; and
    the first training module comprises:
    a first multi-scale encoding unit, configured to input the sample image into the multi-scale encoding network to obtain m first sample feature maps output by the multi-scale encoding network, wherein different first sample feature maps differ in resolution and channel number, m is an integer greater than or equal to 2, and the multi-scale encoding network is used to perform feature extraction on the sample image;
    a first feature pyramid unit, configured to input the m first sample feature maps into the feature pyramid network to obtain m second sample feature maps output by the feature pyramid network, wherein different second sample feature maps have the same channel number and different resolutions, and the feature pyramid network is used to convert the channel numbers of the m first sample feature maps into a target channel number;
    a first multi-scale decoding unit, configured to input the m second sample feature maps into the multi-scale decoding network to obtain the first sample alpha channel image output by the multi-scale decoding network, the multi-scale decoding network being used to perform addition and resolution conversion operations on the m second sample feature maps, and the resolution of the first sample alpha channel image being the same as the resolution of the sample image;
    a deep supervision unit, configured to input the m second sample feature maps into the deep supervision network to obtain m third sample alpha channel images output by the deep supervision network, the deep supervision network being used to upsample the m second sample feature maps, different second sample feature maps corresponding to different upsampling factors, and the resolutions of the m third sample alpha channel images being the same as the resolution of the sample image;
    a binarization unit, configured to binarize the first sample alpha channel image and the m third sample alpha channel images to obtain a first sample segmentation image and m second sample segmentation images; and
    a second training unit, configured to train the first prediction model according to the first sample segmentation image, the m second sample segmentation images and the sample segmentation image.
  15. The apparatus according to claim 14, wherein the first prediction module comprises:
    a second multi-scale encoding unit, configured to input the original image into the multi-scale encoding network to obtain n first feature maps output by the multi-scale encoding network, wherein different first feature maps differ in resolution and channel number, and n is an integer greater than or equal to 2;
    a second feature pyramid unit, configured to input the n first feature maps into the feature pyramid network to obtain n second feature maps output by the feature pyramid network, wherein different second feature maps have the same channel number and different resolutions, and the channel number of the n second feature maps is the target channel number; and
    a second multi-scale decoding unit, configured to input the n second feature maps into the multi-scale decoding network to obtain the first alpha channel image output by the multi-scale decoding network.
  16. The apparatus according to claim 15, wherein the second feature pyramid unit is further configured to:
    arrange the n first feature maps by resolution to form a feature pyramid, wherein in the feature pyramid the resolution of a first feature map is negatively correlated with the level at which that first feature map is located;
    in response to the channel number corresponding to an n-th first feature map being the maximum channel number, perform convolution on the n-th first feature map to obtain an n-th second feature map; and
    in response to the channel number corresponding to the n-th first feature map not being the maximum channel number, perform convolution on the n-th first feature map to obtain a fourth feature map, perform convolution and upsampling on an (n+1)-th first feature map to obtain a fifth feature map, mix the fourth feature map and the fifth feature map, and perform convolution to obtain the n-th second feature map.
  17. The apparatus according to claim 15, wherein the second multi-scale decoding unit is further configured to:
    process the n second feature maps respectively through convolution blocks to obtain n third feature maps, the n third feature maps having the same resolution, wherein different second feature maps correspond to different convolution blocks, and different second feature maps use different numbers of convolution blocks; and
    perform addition, convolution and upsampling on the n third feature maps to obtain the first alpha channel image.
  18. The apparatus according to claim 14, wherein the first feature pyramid unit is further configured to:
    arrange the m first sample feature maps by resolution to form a sample feature pyramid, wherein in the sample feature pyramid the resolution of a first sample feature map is negatively correlated with the level at which that first sample feature map is located;
    in response to the channel number corresponding to an m-th first sample feature map being the maximum channel number, perform convolution on the m-th first sample feature map to obtain an m-th second sample feature map; and
    in response to the channel number corresponding to the m-th first sample feature map not being the maximum channel number, perform convolution on the m-th first sample feature map to obtain a first intermediate sample feature map, perform convolution and upsampling on an (m+1)-th first sample feature map to obtain a second intermediate sample feature map, mix the first intermediate sample feature map and the second intermediate sample feature map, and perform convolution to obtain the m-th second sample feature map.
  19. A computer device, comprising a processor and a memory, the memory storing at least one instruction, at least one program, a code set or an instruction set, the at least one instruction, the at least one program, the code set or the instruction set being loaded and executed by the processor to implement the image processing method according to any one of claims 1 to 9.
  20. A computer-readable storage medium, storing at least one instruction, at least one program, a code set or an instruction set, the at least one instruction, the at least one program, the code set or the instruction set being loaded and executed by a processor to implement the image processing method according to any one of claims 1 to 9.
PCT/CN2021/074722 2020-02-18 2021-02-01 图像处理方法、装置、设备及存储介质 WO2021164534A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010099612.6 2020-02-18
CN202010099612.6A CN111369581B (zh) 2020-02-18 2020-02-18 图像处理方法、装置、设备及存储介质

Publications (1)

Publication Number Publication Date
WO2021164534A1 true WO2021164534A1 (zh) 2021-08-26

Family

ID=71210735

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/074722 WO2021164534A1 (zh) 2020-02-18 2021-02-01 图像处理方法、装置、设备及存储介质

Country Status (2)

Country Link
CN (1) CN111369581B (zh)
WO (1) WO2021164534A1 (zh)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114359307A (zh) * 2022-01-04 2022-04-15 浙江大学 一种全自动高分辨率图像抠图方法
CN114399454A (zh) * 2022-01-18 2022-04-26 平安科技(深圳)有限公司 图像处理方法、装置、电子设备及存储介质
CN114418999A (zh) * 2022-01-20 2022-04-29 哈尔滨工业大学 基于病变关注金字塔卷积神经网络的视网膜病变检测系统
CN114468977A (zh) * 2022-01-21 2022-05-13 深圳市眼科医院 一种眼科视力检查数据收集分析方法、系统及计算机存储介质
CN115052154A (zh) * 2022-05-30 2022-09-13 北京百度网讯科技有限公司 一种模型训练和视频编码方法、装置、设备及存储介质
CN115470830A (zh) * 2022-10-28 2022-12-13 电子科技大学 一种基于多源域适应的脑电信号跨用户警觉性监测方法
CN117252892A (zh) * 2023-11-14 2023-12-19 江西师范大学 基于轻量化视觉自注意力网络的双分支人像自动抠图模型
CN117314741A (zh) * 2023-12-01 2023-12-29 成都华栖云科技有限公司 一种绿幕背景抠像方法、装置、设备及可读存储介质
CN118071867A (zh) * 2024-04-19 2024-05-24 腾讯科技(深圳)有限公司 将文本数据转换为图像数据的方法和装置

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111369581B (zh) * 2020-02-18 2023-08-08 Oppo广东移动通信有限公司 图像处理方法、装置、设备及存储介质
CN111311629B (zh) * 2020-02-21 2023-12-01 京东方科技集团股份有限公司 图像处理方法、图像处理装置及设备
CN112001923B (zh) * 2020-11-02 2021-01-05 中国人民解放军国防科技大学 一种视网膜图像分割方法及装置
CN116389753A (zh) * 2021-12-30 2023-07-04 科大讯飞股份有限公司 数据封装方法和相关装置、设备、介质
CN117422757B (zh) * 2023-10-31 2024-05-03 安徽唯嵩光电科技有限公司 一种果蔬大小分选方法、装置、计算机设备及存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101980537A (zh) * 2010-10-21 2011-02-23 北京航空航天大学 一种基于对象和分形的双目立体视频压缩编解码方法
CN109325954A (zh) * 2018-09-18 2019-02-12 北京旷视科技有限公司 图像分割方法、装置及电子设备
US20190102656A1 (en) * 2017-09-29 2019-04-04 Here Global B.V. Method, apparatus, and system for providing quality assurance for training a feature prediction model
CN110148102A (zh) * 2018-02-12 2019-08-20 腾讯科技(深圳)有限公司 图像合成方法、广告素材合成方法及装置
CN110782466A (zh) * 2018-07-31 2020-02-11 阿里巴巴集团控股有限公司 图片分割方法、装置和系统
CN111369581A (zh) * 2020-02-18 2020-07-03 Oppo广东移动通信有限公司 图像处理方法、装置、设备及存储介质

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10255681B2 (en) * 2017-03-02 2019-04-09 Adobe Inc. Image matting using deep learning
CN109447994B (zh) * 2018-11-05 2019-12-17 陕西师范大学 结合完全残差与特征融合的遥感图像分割方法
CN110689083B (zh) * 2019-09-30 2022-04-12 苏州大学 一种上下文金字塔融合网络及图像分割方法


Also Published As

Publication number Publication date
CN111369581B (zh) 2023-08-08
CN111369581A (zh) 2020-07-03


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21757400

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21757400

Country of ref document: EP

Kind code of ref document: A1