WO2021164429A1 - Image processing method, image processing apparatus and device (图像处理方法、图像处理装置及设备) - Google Patents

Image processing method, image processing apparatus and device (图像处理方法、图像处理装置及设备)

Info

Publication number
WO2021164429A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature maps, image, network, convolution, sampling
Prior art date
Application number
PCT/CN2020/140781
Other languages
English (en)
French (fr)
Inventor
卢运华
刘瀚文
那彦波
张丽杰
朱丹
Original Assignee
京东方科技集团股份有限公司
Priority date
Filing date
Publication date
Application filed by 京东方科技集团股份有限公司
Priority to US 17/419,726 (published as US20220319155A1)
Publication of WO2021164429A1

Classifications

    • G06V10/7715: Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G06T7/194: Segmentation; Edge detection involving foreground-background segmentation
    • G06V10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06F18/253: Fusion techniques of extracted features
    • G06N3/045: Combinations of networks
    • G06N3/0455: Auto-encoder networks; Encoder-decoder networks
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G06N3/08: Learning methods
    • G06N3/09: Supervised learning
    • G06T3/4046: Scaling of whole images or parts thereof, e.g. expanding or contracting, using neural networks
    • G06V10/454: Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06T2207/10024: Color image
    • G06T2207/20016: Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • G06T2207/20081: Training; Learning
    • G06T2207/20084: Artificial neural networks [ANN]
    • G06T2207/30196: Human being; Person

Definitions

  • the present disclosure relates to an image processing method, image processing device and equipment.
  • Image matting is a research direction in the field of image processing and computer vision.
  • the foreground and background in the image can be separated by matting.
  • the result of matting can have multiple applications, such as background replacement, ID photo generation, virtual group photo generation, virtual scenery, background blur, etc.
  • the present disclosure provides an image processing method, image processing device and equipment.
  • In one aspect, the present disclosure provides an image processing method, including: acquiring an input image; performing down-sampling and feature extraction on the input image through an encoder network to obtain multiple feature maps; and performing up-sampling and feature extraction on the multiple feature maps through a decoder network to obtain a target segmentation image. The encoder network and the decoder network each include multiple processing levels; the multiple feature maps processed by the L-th processing level in the encoder network and the multiple feature maps processed by the J-th processing level in the decoder network are fused and then input to the (J+1)-th processing level in the decoder network, and the multiple feature maps processed by the L-th processing level in the encoder network and the multiple feature maps processed by the J-th processing level in the decoder network have the same resolution; L and J are both positive integers. At least one of the multiple processing levels of the encoder network includes a dense calculation block, and at least one of the multiple processing levels of the decoder network includes a dense calculation block; the M-th dense calculation block in the encoder network and the decoder network includes N convolution modules, and the input of the i-th convolution module among the N convolution modules includes the outputs of the i-1 convolution modules before the i-th convolution module; at least one of the N convolution modules includes at least one set of asymmetric convolution kernels; i, N, and M are all integers, M is greater than or equal to 1 and less than or equal to the total number of dense calculation blocks in the encoder network and the decoder network, N is greater than or equal to 3, and i is greater than or equal to 3 and less than or equal to N.
  • In another aspect, the present disclosure provides an image processing device, including: an image acquisition module configured to acquire an input image; and an image processing module configured to perform down-sampling and feature extraction on the input image through an encoder network to obtain multiple feature maps, and to perform up-sampling and feature extraction on the multiple feature maps through a decoder network to obtain a target segmentation image. The encoder network and the decoder network each include multiple processing levels; the multiple feature maps processed by the L-th processing level in the encoder network and the multiple feature maps processed by the J-th processing level in the decoder network are fused and then input to the (J+1)-th processing level in the decoder network, and the multiple feature maps processed by the L-th processing level in the encoder network and the multiple feature maps processed by the J-th processing level in the decoder network have the same resolution; L and J are both positive integers. At least one of the multiple processing levels of the encoder network includes a dense calculation block, and at least one of the multiple processing levels of the decoder network includes a dense calculation block; the M-th dense calculation block in the encoder network and the decoder network includes N convolution modules, and the input of the i-th convolution module among the N convolution modules includes the outputs of the i-1 convolution modules before the i-th convolution module; at least one of the N convolution modules includes at least one set of asymmetric convolution kernels; i, N, and M are all integers, M is greater than or equal to 1 and less than or equal to the total number of dense calculation blocks in the encoder network and the decoder network, N is greater than or equal to 3, and i is greater than or equal to 3 and less than or equal to N.
  • the present disclosure provides an image processing device including a memory and a processor, the memory is configured to store program instructions, and the processor implements the steps of the image processing method as described above when the program instructions are executed.
  • the present disclosure provides a computer-readable storage medium that stores program instructions, and when the program instructions are executed, the steps of the image processing method described above are implemented.
  • FIG. 1 is a flowchart of an image processing method provided by an embodiment of the present disclosure
  • FIG. 2 is an example diagram of an input image and a generated target segmented image according to an embodiment of the present disclosure
  • FIG. 3 is an example diagram of the application effect of the target segmented image shown in FIG. 2;
  • FIG. 4 is an example diagram of an encoder network and a decoder network provided by an embodiment of the present disclosure
  • FIG. 5 is an exemplary diagram of the first dense calculation block provided by an embodiment of the present disclosure.
  • FIG. 6 is an example diagram of the first convolution module provided by an embodiment of the present disclosure.
  • FIG. 7 is an example diagram of a training process of a convolutional neural network provided by an embodiment of the disclosure.
  • FIG. 8 is a schematic diagram of training a convolutional neural network provided by an embodiment of the present disclosure.
  • FIG. 9 is a schematic diagram of an image processing device provided by an embodiment of the disclosure.
  • FIG. 10 is a schematic diagram of an image processing device provided by an embodiment of the disclosure.
  • The specification may have presented the method or process as a specific sequence of steps. However, to the extent that the method or process does not depend on the specific order of the steps described herein, the method or process should not be limited to the steps in the specific order described. As those of ordinary skill in the art will understand, other sequences of steps are also possible. Therefore, the specific order of the steps set forth in the specification should not be construed as a limitation on the claims. In addition, the claims for the method or process should not be limited to performing their steps in the written order, and those skilled in the art can readily understand that these orders can be changed while still remaining within the spirit and scope of the embodiments of the present disclosure.
  • A convolutional neural network (CNN) is a neural network structure that uses, for example, images as input and output, and replaces scalar weights with filters (convolution kernels).
  • the convolution process can be seen as using a trainable filter to convolve an input image or convolution feature map, and output a convolution feature plane.
  • the convolution feature plane can also be called a feature map.
  • the convolutional layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network. In the convolutional layer of a convolutional neural network, a neuron is only connected to some of the neurons in the adjacent layer.
  • the convolutional layer can apply several convolution kernels to the input image to extract multiple types of features of the input image. Each convolution kernel can extract one type of feature.
  • The convolution kernel is generally initialized as a matrix of random values. During the training process of the convolutional neural network, the convolution kernel learns to obtain reasonable weights. In the same convolutional layer, multiple convolution kernels can be used to extract different image information.
  • the embodiments of the present disclosure provide an image processing method, an image processing device, and equipment, which can use a convolutional neural network to process an input image to automatically generate a target segmentation image.
  • By combining dense calculation blocks having asymmetric convolution kernels with an encoder-decoder network having skip connections, the convolutional neural network provided by the embodiments of the present disclosure can improve the matting effect and processing speed, reduce the time required for computation, and support real-time automatic matting of the input image, giving it better and broader application prospects.
  • Fig. 1 is a flowchart of an image processing method provided by an embodiment of the present disclosure. As shown in FIG. 1, the image processing method provided by the embodiment of the present disclosure includes the following steps:
  • Step 101: Obtain an input image.
  • Step 102: Perform down-sampling and feature extraction on the input image through the encoder network to obtain multiple feature maps.
  • Step 103: Perform up-sampling and feature extraction on the multiple feature maps through the decoder network to obtain the target segmentation image.
  • the image processing method provided in this embodiment is used to separate the target object in the input image from the background, and the target segmented image may be a matting mask of the target object.
  • the target object may be a portrait in the input image, or may be a preset detection object (for example, an animal, a building, etc.).
  • this disclosure is not limited to this.
  • the input image may include an image of a person.
  • The input image may be an image of a person captured by an image capture device such as a digital camera or a mobile phone, or may be an image of a person obtained in other ways; this disclosure is not limited to this.
  • Fig. 2 is an example diagram of an input image and a generated target segmented image according to an embodiment of the present disclosure.
  • the convolutional neural network includes an encoder network and a decoder network.
  • the target segmented image is a matting mask of the target person in the input image.
  • this disclosure is not limited to this.
  • the target segmented image may be an image of the target person extracted from the input image.
  • The input image may be a grayscale image, or it may be a color image, such as an RGB image.
  • FIG. 3 is an example diagram of the application effect of the target segmented image shown in FIG. 2.
  • The target segmented image (the matting mask of the target person shown in FIG. 2) obtained by the image processing method provided by the embodiment of the present disclosure can be used to cut out the human body area in the input image (the first image in the first row of FIG. 3) and composite it into other natural scenes that do not contain a human body, so as to realize background replacement; for example, the fourth image in the first row of FIG. 3 is an example of the effect obtained after replacing the background of the input image.
  • Similarly, the target segmented image (the matting mask shown in FIG. 2) obtained by the image processing method provided by the embodiment of the present disclosure can be used to cut out the human body area in the input image (the first image in the first row of FIG. 3) and composite it into other images containing human bodies and natural scenes, so as to achieve a virtual group photo; for example, in FIG. 3, the images other than the first and fourth images in the first row are examples of the effect of taking a virtual group photo based on the input image.
  • The convolutional neural network includes an encoder network and a decoder network, where the encoder network is configured to perform down-sampling and feature extraction on an input image to obtain multiple feature maps, and the decoder network is configured to perform up-sampling and feature extraction on the multiple feature maps to obtain the target segmentation image.
  • The encoder network and the decoder network each include multiple processing levels. The multiple feature maps processed by the L-th processing level in the encoder network and the multiple feature maps processed by the J-th processing level in the decoder network are fused and then input to the (J+1)-th processing level in the decoder network; the multiple feature maps processed by the L-th processing level in the encoder network and the multiple feature maps processed by the J-th processing level in the decoder network have the same resolution, where L and J are both positive integers.
  • In the encoder network, each down-sampling operation and each feature extraction operation can be regarded as a processing level; likewise, in the decoder network, each up-sampling operation and each feature extraction operation can be regarded as a processing level.
  • There may be one or more values of L, and one or more values of J. For example, L and J may each take three values: if L is 1 and J is 5, the multiple feature maps obtained by the first processing level in the encoder network and the multiple feature maps processed by the fifth processing level in the decoder network have the same resolution, and the feature maps processed by these two processing levels are fused and input to the sixth processing level in the decoder network; if L is 2 and J is 1, the multiple feature maps processed by the second processing level in the encoder network and the multiple feature maps processed by the first processing level in the decoder network have the same resolution, and the feature maps processed by these two processing levels are fused and input to the second processing level in the decoder network; if L is 3 and J is 3, the multiple feature maps processed by the third processing level in the encoder network and the multiple feature maps processed by the third processing level in the decoder network have the same resolution, and the feature maps processed by these two processing levels are fused and input to the fourth processing level in the decoder network.
  • Merging the multiple feature maps processed by the L-th processing level in the encoder network with the multiple feature maps processed by the J-th processing level in the decoder network, and then inputting them to the (J+1)-th processing level in the decoder network, achieves a skip connection between the encoder network and the decoder network; that is, the processing levels in the encoder network and the decoder network that produce feature maps of the same resolution are connected, and the multiple feature maps obtained by these two processing levels are fused and input to the next processing level in the decoder network. The skip connection between the encoder network and the decoder network increases the preservation of image details by the decoder network, thereby improving the accuracy of the matting result.
  • Merging the multiple feature maps processed by the L-th processing level in the encoder network with the multiple feature maps processed by the J-th processing level in the decoder network and inputting them to the (J+1)-th processing level in the decoder network may include: splicing, in the channel dimension, the multiple feature maps processed by the L-th processing level in the encoder network and the multiple feature maps processed by the J-th processing level in the decoder network, and inputting the result to the (J+1)-th processing level in the decoder network; in other words, the two groups of feature maps are merged through a concat operation.
  • However, this disclosure is not limited to this. The multiple feature maps processed by the L-th processing level in the encoder network and the multiple feature maps processed by the J-th processing level in the decoder network may also be fused through operations such as element-wise addition (Add) or multiplication.
  • By fusing feature maps of the same size and resolution obtained by the encoder network and the decoder network and inputting them to the decoder network, the image details and information lost during the down-sampling process of the encoder network can be transferred to the decoder network; the decoder network can then use this information to generate a more accurate target segmentation image, thereby improving the matting effect.
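  • As a purely illustrative sketch of the fusion described above (the disclosure does not name a specific framework; PyTorch, the tensor shapes, and the channel counts below are assumptions chosen for illustration):

```python
import torch

# Fusing the feature maps produced by encoder level L with those produced by
# decoder level J before feeding the result to decoder level J+1. Both groups
# of feature maps must have the same resolution.
enc_l = torch.randn(1, 60, 64, 64)   # feature maps from encoder level L
dec_j = torch.randn(1, 60, 64, 64)   # feature maps from decoder level J

fused_concat = torch.cat([enc_l, dec_j], dim=1)  # splicing in the channel dimension (concat)
fused_add = enc_l + dec_j                        # alternative fusion: element-wise addition
fused_mul = enc_l * dec_j                        # alternative fusion: element-wise multiplication

print(fused_concat.shape)  # torch.Size([1, 120, 64, 64]) -> input to processing level J+1
```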
  • In some examples, the L-th processing level in the encoder network and the J-th processing level in the decoder network perform corresponding processing. If the multiple feature maps obtained by the L-th processing level in the encoder network and the multiple feature maps processed by the J-th processing level in the decoder network have the same resolution, then the multiple feature maps obtained by these two processing levels can be merged and input to the (J+1)-th processing level in the decoder network.
  • The corresponding processing performed by the L-th processing level in the encoder network and the J-th processing level in the decoder network may be, for example: the L-th processing level in the encoder network performs down-sampling and the J-th processing level in the decoder network performs up-sampling; or, the L-th processing level in the encoder network performs multi-level feature extraction and the J-th processing level in the decoder network also performs multi-level feature extraction.
  • this disclosure is not limited to this.
  • Fusing the feature maps of the same resolution obtained by corresponding processing levels in the encoder network and the decoder network and then inputting them to the decoder network can improve how well the fused feature maps preserve image details, and can improve the accuracy of the target segmentation image that the decoder network obtains using the fused feature maps, thereby improving the matting result.
  • At least one of the multiple processing levels of the encoder network includes a dense calculation block, and at least one of the multiple processing levels of the decoder network includes a dense calculation block. The M-th dense calculation block in the encoder network and the decoder network includes N convolution modules, and the input of the i-th convolution module among the N convolution modules includes the outputs of the i-1 convolution modules before the i-th convolution module; at least one of the N convolution modules includes at least one set of asymmetric convolution kernels. Here, i, N, and M are all integers; M is greater than or equal to 1 and less than or equal to the total number of dense calculation blocks in the encoder network and the decoder network; N is greater than or equal to 3; and i is greater than or equal to 3 and less than or equal to N.
  • all the convolution modules in the N convolution modules include at least one set of asymmetric convolution kernels, or only part of the convolution modules in the N convolution modules include at least one set of asymmetric convolution kernels.
  • this disclosure is not limited to this.
  • any dense calculation block may include N convolution modules, and the number of convolution modules included in different dense calculation blocks may be the same or different.
  • For example, the first dense calculation block may include five convolution modules, the second dense calculation block may include eight convolution modules, and the third dense calculation block may include five convolution modules.
  • any dense calculation block is configured to perform multi-level feature extraction, and one dense calculation block corresponds to one processing level.
  • the order of the multiple dense calculation blocks can be determined according to the order of the processing level of the encoder network and the processing level of the decoder network.
  • the encoder network includes two dense calculation blocks (corresponding to the third and fifth processing levels in the encoder network), and the decoder network includes one dense calculation block (corresponding to the third processing level in the decoder network).
  • The dense calculation block corresponding to the third processing level in the encoder network can be marked as the first dense calculation block, the dense calculation block corresponding to the fifth processing level can be marked as the second dense calculation block, and the dense calculation block corresponding to the third processing level in the decoder network can be marked as the third dense calculation block.
  • this disclosure is not limited to this.
  • In this embodiment, the dense calculation block is an efficient dense calculation block with asymmetric convolution (EDA block, Effective Dense Asymmetric block).
  • A dense calculation block includes multiple convolution modules; among these modules, except for the first convolution module, the input of each convolution module includes the outputs of all convolution modules before that convolution module, so that the multiple convolution modules in the dense calculation block form dense connections.
  • Using dense calculation blocks for feature extraction can greatly reduce the number of parameters, reduce the amount of calculation, increase the processing speed, and provide better resistance to overfitting.
  • at least one convolution module in the dense calculation block of this embodiment includes one or more sets of asymmetric convolution kernels. By adopting the asymmetric convolution kernel for feature extraction, the amount of calculation can be greatly reduced, thereby increasing the processing speed.
  • Among the N convolution modules, a convolution module that includes at least one set of asymmetric convolution kernels may further include an asymmetric convolution kernel obtained by dilating (expanding) the at least one set of asymmetric convolution kernels.
  • For example, a certain convolution module among the N convolution modules may include two sets of asymmetric convolution kernels, where the second set of asymmetric convolution kernels is obtained by dilating the first set; the first set of asymmetric convolution kernels may be a 3×1 convolution kernel and a 1×3 convolution kernel.
  • A dilated asymmetric convolution kernel can be obtained by performing a dilation operation on an asymmetric convolution kernel. Using the dilated asymmetric convolution kernel can not only increase the receptive field, but also reduce the loss of spatial information during image processing, and allows the densely connected convolution modules to generate feature maps with consistent resolution.
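  • The following is a minimal sketch of one group of asymmetric kernels and its dilated counterpart (PyTorch, the channel count of 40, and the dilation coefficient of 2 are illustrative assumptions, not values fixed by the disclosure):

```python
import torch
from torch import nn

# One group of asymmetric kernels (3x1 followed by 1x3) and the same pair with a
# dilation coefficient d. A 3x1 + 1x3 pair has 6*C*C weights versus 9*C*C for a
# single 3x3 kernel, which is the source of the computational saving; the padding
# is chosen so that the output feature maps keep the same resolution as the input.
channels, d = 40, 2
asym = nn.Sequential(
    nn.Conv2d(channels, channels, kernel_size=(3, 1), padding=(1, 0)),
    nn.Conv2d(channels, channels, kernel_size=(1, 3), padding=(0, 1)),
)
asym_dilated = nn.Sequential(
    nn.Conv2d(channels, channels, kernel_size=(3, 1), padding=(d, 0), dilation=(d, 1)),
    nn.Conv2d(channels, channels, kernel_size=(1, 3), padding=(0, d), dilation=(1, d)),
)
x = torch.randn(1, channels, 64, 64)
print(asym(x).shape, asym_dilated(x).shape)  # both keep the 64x64 resolution
```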
  • In an exemplary embodiment, step 102 may include: down-sampling the input image to obtain multiple first down-sampled feature maps with a first resolution; down-sampling the multiple first down-sampled feature maps to obtain multiple second down-sampled feature maps with a second resolution; performing multi-level feature extraction on the multiple second down-sampled feature maps through the first dense calculation block to obtain multiple first densely calculated feature maps with the second resolution; down-sampling the multiple first densely calculated feature maps to obtain multiple third down-sampled feature maps with a third resolution; and performing multi-level feature extraction on the multiple third down-sampled feature maps through the second dense calculation block to obtain multiple second densely calculated feature maps with the third resolution.
  • In an exemplary embodiment, step 103 may include: up-sampling the multiple second densely calculated feature maps to obtain multiple first up-sampled feature maps with the second resolution; splicing the multiple first up-sampled feature maps and the multiple second down-sampled feature maps in the channel dimension to obtain a first fusion feature map group; performing feature extraction on the first fusion feature map group to obtain multiple first intermediate feature maps with the second resolution; performing multi-level feature extraction on the multiple first intermediate feature maps through the third dense calculation block to obtain multiple third densely calculated feature maps with the second resolution; splicing the multiple third densely calculated feature maps and the multiple first densely calculated feature maps in the channel dimension to obtain a second fusion feature map group; performing feature extraction on the second fusion feature map group to obtain multiple second intermediate feature maps with the second resolution; up-sampling the multiple second intermediate feature maps to obtain multiple second up-sampled feature maps with the first resolution; splicing the multiple second up-sampled feature maps and the multiple first down-sampled feature maps in the channel dimension to obtain a third fusion feature map group; performing feature extraction on the third fusion feature map group to obtain multiple third intermediate feature maps; and up-sampling the multiple third intermediate feature maps to obtain the target segmentation image with the same resolution as the input image.
  • For example, the first dense calculation block may include five convolution modules, the second dense calculation block may include eight convolution modules, and the third dense calculation block may include five convolution modules.
  • Each convolution module in the first dense calculation block, the second dense calculation block, and the third dense calculation block includes a 1 ⁇ 1 convolution kernel and two groups of asymmetric convolution kernels.
  • The first group of asymmetric convolution kernels is a 3×1 convolution kernel and a 1×3 convolution kernel, and the second group of asymmetric convolution kernels is obtained from the first group of asymmetric convolution kernels and the corresponding dilation (expansion) coefficient.
  • Fig. 4 is an exemplary diagram of an encoder network and a decoder network provided by an embodiment of the disclosure.
  • As shown in FIG. 4, the convolutional neural network can matte a color person image (for example, the input image shown in FIG. 2) to obtain a black-and-white matting mask (for example, the target segmentation image shown in FIG. 2).
  • The encoder network 201 includes: a first down-sampling block 301, a second down-sampling block 302, a first dense calculation block 303, a third down-sampling block 304, and a second dense calculation block 305.
  • the second down-sampling block 302 is located between the first down-sampling block 301 and the first dense calculation block 303
  • the third down-sampling block 304 is located between the first dense calculation block 303 and the second dense calculation block 305.
  • The first down-sampling block 301 is configured to down-sample the input image to obtain multiple first down-sampled feature maps with a first resolution; the second down-sampling block 302 is configured to down-sample the multiple first down-sampled feature maps to obtain multiple second down-sampled feature maps with a second resolution; the first dense calculation block 303 is configured to perform multi-level feature extraction on the multiple second down-sampled feature maps to obtain multiple first densely calculated feature maps with the second resolution; the third down-sampling block 304 is configured to down-sample the multiple first densely calculated feature maps to obtain multiple third down-sampled feature maps with a third resolution; and the second dense calculation block 305 is configured to perform multi-level feature extraction on the multiple third down-sampled feature maps to obtain multiple second densely calculated feature maps with the third resolution.
  • the first resolution is greater than the second resolution
  • the second resolution is greater than the third resolution
  • the first resolution is less than the resolution of the input image.
  • The encoder network 201 includes five processing levels, corresponding respectively to the three down-sampling blocks and the two dense calculation blocks.
  • Using multiple down-sampling blocks to gradually reduce the spatial dimensions of the feature maps can expand the receptive field, so that the encoder network can better extract local and global features of different scales; the down-sampling blocks can also compress the extracted feature maps, thereby saving computation and memory and increasing the processing speed.
  • The decoder network 202 includes: a first up-sampling block 306, a first convolution block 307, a third dense calculation block 308, a second convolution block 309, a second up-sampling block 310, a third convolution block 311, and a third up-sampling block 312.
  • the first up-sampling block 306 is located between the second dense calculation block 305 and the first convolution block 307
  • the third dense calculation block 308 is located between the first convolution block 307 and the second convolution block 309
  • The second up-sampling block 310 is located between the second convolution block 309 and the third convolution block 311, and the third convolution block 311 is located between the second up-sampling block 310 and the third up-sampling block 312.
  • The first up-sampling block 306 is configured to up-sample the multiple second densely calculated feature maps output by the encoder network 201 to obtain multiple first up-sampled feature maps with the second resolution.
  • The first convolution block 307 is configured to perform feature extraction on the first fusion feature map group obtained by splicing the multiple first up-sampled feature maps and the multiple second down-sampled feature maps in the channel dimension, to obtain multiple first intermediate feature maps with the second resolution; the multiple first up-sampled feature maps and the multiple second down-sampled feature maps have the same resolution.
  • The third dense calculation block 308 is configured to perform multi-level feature extraction on the multiple first intermediate feature maps to obtain multiple third densely calculated feature maps with the second resolution.
  • The second convolution block 309 is configured to perform feature extraction on the second fusion feature map group obtained by splicing the multiple third densely calculated feature maps and the multiple first densely calculated feature maps in the channel dimension, to obtain multiple second intermediate feature maps with the second resolution; the multiple third densely calculated feature maps and the multiple first densely calculated feature maps have the same resolution.
  • The second up-sampling block 310 is configured to up-sample the multiple second intermediate feature maps to obtain multiple second up-sampled feature maps with the first resolution.
  • The third convolution block 311 is configured to perform feature extraction on the third fusion feature map group obtained by splicing the multiple second up-sampled feature maps and the multiple first down-sampled feature maps in the channel dimension, to obtain multiple third intermediate feature maps; the multiple second up-sampled feature maps and the multiple first down-sampled feature maps have the same resolution.
  • The third up-sampling block 312 is configured to perform an up-sampling operation on the multiple third intermediate feature maps to obtain the target segmentation image with the same resolution as the input image.
  • the decoder network 202 includes seven processing levels, corresponding to three upsampling blocks (Upsampling Block), three convolution blocks (Convolution Block), and one dense calculation block.
  • The up-sampling blocks restore the spatial resolution of the multiple feature maps extracted by the encoder network 201 to be consistent with the input image; through the three up-sampling blocks, the three convolution blocks, and the dense calculation block, the multiple feature maps extracted by the encoder network 201 are gradually transformed into the target segmentation image of the input image.
  • For example, the first resolution may be 1/2 of the resolution of the input image, the second resolution may be 1/4, and the third resolution may be 1/8. If the input image has a size of H × W, the size of a feature map with the first resolution is (H/2) × (W/2), the size of a feature map with the second resolution is (H/4) × (W/4), and the size of a feature map with the third resolution is (H/8) × (W/8).
  • A skip connection is established between the encoder network 201 and the decoder network 202, and the skip connection between the encoder network 201 and the decoder network 202 may use the concat method.
  • The resolution of the multiple first down-sampled feature maps obtained by the first down-sampling block 301 and the resolution of the multiple second up-sampled feature maps obtained by the second up-sampling block 310 are both the first resolution. The multiple first down-sampled feature maps obtained by the first down-sampling block 301 are not only input to the second down-sampling block 302, but are also spliced in the channel dimension with the multiple second up-sampled feature maps obtained by the second up-sampling block 310 and input to the third convolution block 311.
  • The resolution of the multiple second down-sampled feature maps obtained by the second down-sampling block 302 and the resolution of the multiple first up-sampled feature maps obtained by the first up-sampling block 306 are both the second resolution. The multiple second down-sampled feature maps obtained by the second down-sampling block 302 are not only input to the first dense calculation block 303, but are also spliced in the channel dimension with the multiple first up-sampled feature maps obtained by the first up-sampling block 306 and input to the first convolution block 307.
  • The resolution of the multiple first densely calculated feature maps obtained by the first dense calculation block 303 and the resolution of the multiple third densely calculated feature maps obtained by the third dense calculation block 308 are both the second resolution. The multiple first densely calculated feature maps obtained by the first dense calculation block 303 are not only input to the third down-sampling block 304, but are also spliced in the channel dimension with the multiple third densely calculated feature maps obtained by the third dense calculation block 308 and input to the second convolution block 309.
  • Splicing feature maps of the same resolution and size along the channel dimension amounts to increasing the number of channels of feature maps of that resolution and size.
  • For example, the feature maps output by the first dense calculation block 303 are Channel1 × h × w, where h and w represent the height and width of the feature maps and Channel1 represents the number of output channels of the first dense calculation block 303; the feature maps output by the third dense calculation block 308 are Channel2 × h × w, where Channel2 represents the number of output channels of the third dense calculation block 308; and the feature maps output by the first dense calculation block 303 and the third dense calculation block 308 have the same size and resolution. The second fusion feature map group obtained by splicing the feature maps output by the first dense calculation block 303 and the third dense calculation block 308 in the channel dimension is therefore (Channel1 + Channel2) × h × w.
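  • A tiny shape check of this channel-dimension splicing, using the concrete channel counts of this example (Channel1 = Channel2 = 260), is given below; PyTorch and the spatial size are assumptions used only for illustration:

```python
import torch

# (Channel1 x h x w) spliced with (Channel2 x h x w) in the channel dimension
# gives (Channel1 + Channel2) x h x w; here 260 + 260 = 520 channels.
h, w = 64, 64
out_block_303 = torch.randn(1, 260, h, w)   # output of the first dense calculation block 303
out_block_308 = torch.randn(1, 260, h, w)   # output of the third dense calculation block 308
second_fusion_group = torch.cat([out_block_303, out_block_308], dim=1)
assert second_fusion_group.shape == (1, 520, h, w)
```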
  • the image details and information lost in the encoder network 201 during multiple downsampling processes can be transferred to the decoder network 202.
  • the decoder network 202 can use this information to generate a more accurate target segmentation image, thereby improving the matting effect.
  • In an exemplary embodiment, the number of input channels of the first down-sampling block 301 is 3 and the number of output channels is 15; the number of input channels of the second down-sampling block 302 is 15 and the number of output channels is 60; the number of input channels of the first dense calculation block 303 is 60 and the number of output channels is 260; the number of input channels of the third down-sampling block 304 is 260 and the number of output channels is 130; and the number of input channels of the second dense calculation block 305 is 130 and the number of output channels is 450.
  • The number of input channels of the first up-sampling block 306 is 450 and the number of output channels is 60; the number of input channels of the first convolution block 307 is 120 and the number of output channels is 60; the number of input channels of the third dense calculation block 308 is 60 and the number of output channels is 260; the number of input channels of the second convolution block 309 is 520 and the number of output channels is 260; the number of input channels of the second up-sampling block 310 is 260 and the number of output channels is 15; the number of input channels of the third convolution block 311 is 30 and the number of output channels is 15; and the number of input channels of the third up-sampling block 312 is 15 and the number of output channels is 1.
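  • The channel counts listed above can be cross-checked at the three skip connections: each convolution block in the decoder receives the channel-dimension concatenation of a decoder-side feature map group and the encoder-side feature map group of matching resolution, so its number of input channels is the sum of the two output channel counts. A small illustrative check (plain Python, not part of the disclosure):

```python
# (decoder branch output channels, encoder branch output channels, expected decoder input channels)
skip_connections = {
    "first convolution block 307":  (60, 60, 120),    # first up-sampling block 306 + second down-sampling block 302
    "second convolution block 309": (260, 260, 520),  # third dense calculation block 308 + first dense calculation block 303
    "third convolution block 311":  (15, 15, 30),     # second up-sampling block 310 + first down-sampling block 301
}
for name, (decoder_ch, encoder_ch, expected_in) in skip_connections.items():
    assert decoder_ch + encoder_ch == expected_in, name
print("all skip-connection channel counts are consistent")
```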
  • any convolutional block in the decoder network may include a convolutional layer and an activation layer, where the activation layer is located after the convolutional layer.
  • the convolutional layer is configured to perform a convolution operation, and may include one or more convolution kernels.
  • the structures and parameters of multiple convolutional blocks in the convolutional neural network of the present disclosure may be different from each other, or at least partially the same. However, this disclosure is not limited to this.
  • any down-sampling block in the encoder network may include a convolutional layer, a pooling layer, and an activation layer.
  • the convolutional layer is configured to perform a convolution operation, and may include one or more convolution kernels.
  • the pooling layer is a form of downsampling; the pooling layer can be configured to reduce the size of the input image, simplify the complexity of calculation, and reduce the phenomenon of overfitting to a certain extent; the pooling layer can perform feature compression , To extract the main features of the input image.
  • the structures and parameters of multiple downsampling blocks may be different from each other, or at least partially the same. However, this disclosure is not limited to this.
  • any down-sampling block in the encoder network is configured to perform a down-sampling operation, which can reduce the size of the feature map, perform feature compression, and extract main features to simplify calculation complexity, and to a certain extent To reduce the phenomenon of over-fitting.
  • The down-sampling operation can include: max pooling, average pooling, stochastic pooling, decimation (for example, selecting fixed pixels), demultiplexing output (demuxout, splitting the input image into multiple smaller images), and so on.
  • this disclosure is not limited to this.
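  • A hedged sketch of one possible down-sampling block (a convolution layer, a pooling layer, and an activation layer, as described above) is given below; the kernel sizes, the choice of max pooling and ReLU, and the use of PyTorch are assumptions, while the channel counts follow the first down-sampling block of this example (3 in, 15 out):

```python
import torch
from torch import nn

class DownsamplingBlock(nn.Module):
    """Convolution layer + pooling layer + activation layer (illustrative composition)."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)  # feature extraction
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)               # halves the spatial size
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.pool(self.conv(x)))

x = torch.randn(1, 3, 256, 256)               # input image of size 256 x 256
print(DownsamplingBlock(3, 15)(x).shape)      # torch.Size([1, 15, 128, 128]) -> first resolution (1/2)
```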
  • any upsampling block in the decoder network may include an upsampling layer and an activation layer, where the upsampling layer may include a convolutional layer.
  • the convolutional layer is configured to perform a convolution operation, and may include one or more convolution kernels.
  • the structures and parameters of multiple upsampling blocks may be different from each other, or at least partially the same. However, this disclosure is not limited to this.
  • Any up-sampling layer in the decoder network is configured to perform an up-sampling operation; the up-sampling operation may include max unpooling, strided transposed convolutions, interpolation (for example, bicubic interpolation), and so on.
  • In an exemplary embodiment, the number of up-sampling blocks in the decoder network 202 is the same as the number of down-sampling blocks in the encoder network 201, so that the target segmentation image and the input image have the same resolution, and it can be ensured that the feature maps obtained by the two processing levels joined by a skip connection have the same resolution.
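  • A corresponding hedged sketch of one possible up-sampling block (an up-sampling layer containing a convolution layer, followed by an activation layer) is given below; a strided transposed convolution is used here, although interpolation followed by a convolution would be an equally valid reading of the text, and the channel counts follow the second up-sampling block of this example (260 in, 15 out):

```python
import torch
from torch import nn

class UpsamplingBlock(nn.Module):
    """Up-sampling layer (transposed convolution) + activation layer (illustrative composition)."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)  # doubles the spatial size
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.up(x))

x = torch.randn(1, 260, 32, 32)
print(UpsamplingBlock(260, 15)(x).shape)  # torch.Size([1, 15, 64, 64])
```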
  • The first dense calculation block 303 includes five convolution modules, and the input of any convolution module other than the first convolution module in the first dense calculation block 303 includes the outputs of all convolution modules before that convolution module.
  • The second dense calculation block 305 includes eight convolution modules, and the input of any convolution module other than the first convolution module in the second dense calculation block 305 includes the outputs of all convolution modules before that convolution module.
  • The third dense calculation block 308 includes five convolution modules, and the input of any convolution module other than the first convolution module in the third dense calculation block 308 includes the outputs of all convolution modules before that convolution module.
  • The convolution modules in the first dense calculation block 303, the second dense calculation block 305, and the third dense calculation block 308 are connected in series to achieve dense connections.
  • FIG. 5 is an exemplary diagram of the first dense calculation block 303 provided by an embodiment of the disclosure.
  • the first dense calculation block 303 includes a first convolution module 315, a second convolution module 316, a third convolution module 317, a fourth convolution module 318, and a fifth convolution module 319 connected in series.
  • The first convolution module 315 is configured to receive and process C1 feature maps to obtain K1 feature maps, and to splice the C1 feature maps and the K1 feature maps in the channel dimension to obtain C1+K1 feature maps.
  • The second convolution module 316 is configured to receive and process C2 feature maps to obtain K2 feature maps, and to splice the C2 feature maps and the K2 feature maps in the channel dimension to obtain C2+K2 feature maps, where the C2 feature maps are the C1+K1 feature maps obtained by the first convolution module 315.
  • In other words, the number of input channels of the first convolution module 315 is C1 and its number of output channels is C1+K1; the number of input channels of the second convolution module 316 is C1+K1 and its number of output channels is C1+K1+K2; the number of input channels of the third convolution module 317 is C1+K1+K2 and its number of output channels is C1+K1+K2+K3; the number of input channels of the fourth convolution module 318 is C1+K1+K2+K3 and its number of output channels is C1+K1+K2+K3+K4; and the number of input channels of the fifth convolution module 319 is C1+K1+K2+K3+K4 and its number of output channels is C1+K1+K2+K3+K4+K5.
  • The input of the third convolution module 317 includes the outputs of the preceding first convolution module 315 and second convolution module 316; the input of the fourth convolution module 318 includes the outputs of the preceding first convolution module 315, second convolution module 316, and third convolution module 317; and the input of the fifth convolution module 319 includes the outputs of the preceding first convolution module 315, second convolution module 316, third convolution module 317, and fourth convolution module 318.
  • In an exemplary embodiment, the growth rate coefficient of every convolution module is the same, where the growth rate coefficient of a convolution module is the number of channels by which its number of output channels exceeds its number of input channels; that is, K1, K2, K3, K4, and K5 are all the same.
  • Each dense calculation block in FIG. 4 illustrates the transfer of the feature maps obtained by each convolution module from its input feature maps (the i-th convolution module obtains Ki feature maps).
  • FIG. 5 depicts a dense calculation block as multiple convolution modules connected in series, which is equivalent to the transfer of the Ki feature maps of the i-th convolution module reflected in FIG. 4.
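  • The channel growth inside a dense calculation block can be illustrated with the concrete numbers of this example: with C1 = 60 input channels and a growth rate coefficient of K = 40 per convolution module, five densely connected modules yield 60 + 5 × 40 = 260 output channels, matching the first dense calculation block 303. The short calculation below is illustrative Python, not part of the disclosure:

```python
c1, k, n_modules = 60, 40, 5          # input channels, growth rate coefficient, number of convolution modules
for i in range(1, n_modules + 1):
    in_ch = c1 + (i - 1) * k          # input: original maps plus all earlier modules' K new maps
    out_ch = in_ch + k                # output: the input maps spliced with K newly generated maps
    print(f"convolution module {i}: {in_ch} -> {out_ch} channels")
print("dense calculation block output channels:", c1 + n_modules * k)   # 260
```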
  • FIG. 6 is an exemplary diagram of the first convolution module 315 provided by an embodiment of the disclosure.
  • The first convolution module includes a convolution layer 401, an activation layer 402, a first asymmetric convolution network 41, a second asymmetric convolution network 42, and a dropout (random deactivation) layer 409.
  • The first asymmetric convolution network 41 includes a convolution layer 403, a convolution layer 404, and an activation layer 405 that are cascaded in sequence; the second asymmetric convolution network 42 includes a convolution layer 406, a convolution layer 407, and an activation layer 408 that are cascaded in sequence.
  • the first convolution module 315 includes two sets of asymmetric convolution kernels. However, this disclosure is not limited to this.
  • The number of input channels of the first convolution module is C1 and its number of output channels is C1+K1; the number of input channels of the convolution layer 401 is C1 and its number of output channels is K1; the numbers of input and output channels of the convolution layers 403 and 404 are both K1; and the numbers of input and output channels of the convolution layers 406 and 407 are both K1.
  • The input of the first convolution module and the output of the Dropout layer 409 are joined by a concat operation to generate the output of the first convolution module; in other words, the output of the first convolution module is the result of splicing the input feature maps of the module and the feature maps generated by its convolution network in the channel dimension. In this way, multiple convolution modules can be connected in series to form dense connections, that is, any convolution module in the dense calculation block receives and processes the outputs of all convolution modules before it.
  • the convolution kernel of the convolution layer 401 is 1 ⁇ 1.
  • the convolution layer 401 with the convolution kernel of 1 ⁇ 1 can reduce the dimension when the convolution module performs feature extraction operations, reduce the number of feature maps, reduce the amount of calculation, and increase the degree of non-linearity of the convolutional neural network.
  • The convolution kernel of the convolution layer 403 is 3×1, and the convolution kernel of the convolution layer 404 is 1×3; the convolution kernel of the convolution layer 406 is obtained by a dilation (expansion) operation on a 3×1 kernel, and the convolution kernel of the convolution layer 407 is obtained by a dilation operation on a 1×3 kernel.
  • The dilation coefficient of the dilation operation may be d. Different convolution modules in the same dense calculation block may use the same or different dilation coefficients, or some convolution modules may use the same dilation coefficient. However, this disclosure is not limited to this.
  • The dilation operation on the asymmetric convolution kernel can not only increase the receptive field, but also reduce the loss of spatial information and keep the resolution of the feature maps output by the densely connected convolution modules consistent.
  • By adopting asymmetric convolution kernels for feature extraction, the amount of calculation can be greatly reduced, thereby increasing the processing speed.
  • the Dropout layer 409 can effectively prevent overfitting, and the Dropout layer 409 can be automatically turned off in the non-training phase.
  • this disclosure is not limited to this.
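  • Putting the pieces above together, the following is a hedged PyTorch sketch of such a convolution module: a 1×1 convolution reduces the C1 input channels to K1, two groups of asymmetric kernels (the second group dilated by d) extract features, a Dropout layer follows, and the K1 newly generated feature maps are spliced with the C1 input maps in the channel dimension. The use of batch normalization as the normalization layer, ReLU as the activation function, and the dropout probability are assumptions not fixed by the text:

```python
import torch
from torch import nn

class AsymmetricConvModule(nn.Module):
    def __init__(self, in_ch: int, growth: int, dilation: int = 1, p_drop: float = 0.1):
        super().__init__()
        d = dilation
        self.reduce = nn.Sequential(                        # convolution layer 401 + activation layer 402
            nn.Conv2d(in_ch, growth, kernel_size=1),
            nn.BatchNorm2d(growth), nn.ReLU(inplace=True),
        )
        self.asym1 = nn.Sequential(                         # convolution layers 403, 404 + activation layer 405
            nn.Conv2d(growth, growth, (3, 1), padding=(1, 0)),
            nn.Conv2d(growth, growth, (1, 3), padding=(0, 1)),
            nn.BatchNorm2d(growth), nn.ReLU(inplace=True),
        )
        self.asym2 = nn.Sequential(                         # dilated convolution layers 406, 407 + activation layer 408
            nn.Conv2d(growth, growth, (3, 1), padding=(d, 0), dilation=(d, 1)),
            nn.Conv2d(growth, growth, (1, 3), padding=(0, d), dilation=(1, d)),
            nn.BatchNorm2d(growth), nn.ReLU(inplace=True),
        )
        self.drop = nn.Dropout2d(p_drop)                    # Dropout layer 409 (inactive in eval mode)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        new_maps = self.drop(self.asym2(self.asym1(self.reduce(x))))
        return torch.cat([x, new_maps], dim=1)              # C1 + K1 output channels

x = torch.randn(1, 60, 64, 64)
print(AsymmetricConvModule(60, 40, dilation=2)(x).shape)    # torch.Size([1, 100, 64, 64])
```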
  • the structures and parameters of the second convolution module 316, the third convolution module 317, the fourth convolution module 318, and the fifth convolution module 319 in FIG. 5 may be entirely or partly the same as those of the first convolution module 315. However, this disclosure is not limited to this.
  • multiple convolution modules in one dense calculation block may use different growth rate coefficients and expansion coefficients.
  • this disclosure is not limited to this.
  • the growth rate coefficients of all convolution modules included in the three dense calculation blocks may all be 40; the expansion coefficients of the five convolution modules included in the first dense calculation block 303 may be (1, 1, 1, 2, 2); the expansion coefficients of the five convolution modules included in the third dense calculation block 308 may likewise be (1, 1, 1, 2, 2); and the expansion coefficients of the eight convolution modules included in the second dense calculation block may be (2, 2, 4, 4, 8, 8, 16, 16).
  • the structures and parameters of the first dense calculation block 303 and the third dense calculation block 308 may be completely the same.
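  • Building on the EDAModule sketch above, a dense calculation block can be assembled by chaining modules with a chosen growth rate and per-module expansion coefficients; the figures below (60 input channels, growth rate 40, expansion coefficients (1, 1, 1, 2, 2), 260 output channels) match the first dense calculation block described in this embodiment, while the helper itself is only an illustrative sketch.

```python
import torch.nn as nn

class DenseBlock(nn.Module):
    """Chain of EDAModule instances forming one dense calculation block (illustrative only)."""
    def __init__(self, in_channels: int, growth: int, dilations):
        super().__init__()
        modules, channels = [], in_channels
        for d in dilations:
            modules.append(EDAModule(channels, growth, dilation=d))
            channels += growth                       # each module adds `growth` channels
        self.body = nn.Sequential(*modules)
        self.out_channels = channels

    def forward(self, x):
        return self.body(x)

# first dense calculation block: 60 input channels, growth rate 40, dilations (1, 1, 1, 2, 2)
block1 = DenseBlock(60, 40, (1, 1, 1, 2, 2))
assert block1.out_channels == 260                    # 60 + 5 * 40
```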
  • the activation layer may include an activation function, and the activation function is used to introduce a non-linear factor to the convolutional neural network, so that the convolutional neural network can better solve more complicated problems.
  • the activation function may include a rectified linear unit (ReLU) function, a sigmoid function, or a hyperbolic tangent (tanh) function.
  • the ReLU function is a non-saturating nonlinear function, while the sigmoid and tanh functions are saturating nonlinear functions.
  • the activation layer can be used as a layer of the convolutional neural network alone, or the activation layer can be included in the convolutional layer.
  • the activation layer may include a normalization (Normalization) layer and an activation function.
  • the activation layer 402 is configured to perform an activation operation on the output of the convolution layer 401, the activation layer 405 is configured to perform an activation operation on the output of the convolution layer 404, and the activation layer 408 is configured to perform an activation operation on the output of the convolutional layer 407.
  • the activation layers 402, 405, and 408 may each include a regularization layer and an activation function. Among them, the activation functions in different activation layers may be the same or different, and the regularization layers in different activation layers may be the same or different. However, this disclosure is not limited to this.
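  • As one possible reading of such an activation layer, the following minimal sketch pairs a normalization layer with a ReLU; batch normalization is chosen here only as an example of the normalization/regularization layer.

```python
import torch.nn as nn

def activation_layer(channels: int) -> nn.Sequential:
    # normalization followed by a non-linearity; ReLU, sigmoid or tanh are all allowed by the text
    return nn.Sequential(nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
```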
  • the image processing method provided by this exemplary embodiment can automatically matte an input portrait image using a single convolutional neural network that combines dense calculation blocks with asymmetric convolution kernels and an encoder-decoder network with skip connections, and the matting result can be obtained in real time, which improves both the processing speed and the accuracy of the matting result.
  • the image processing method provided by the embodiment of the present disclosure further includes: training a convolutional neural network, and the convolutional neural network includes an encoder network and a decoder network. Before using the convolutional neural network for matting, it is necessary to train the convolutional neural network. After training, the parameters of the convolutional neural network remain unchanged during image processing. During the training process, the parameters of the convolutional neural network will be adjusted according to the training results to obtain an optimized convolutional neural network.
  • the parameters of the convolutional neural network may include convolution kernels and biases. Here, the convolution kernels determine how the image being processed is handled, and the biases determine whether the outputs of the convolution kernels are fed to the next layer.
  • FIG. 7 is an example diagram of a training process of a convolutional neural network provided by an embodiment of the present disclosure.
  • FIG. 8 is a schematic diagram of training a convolutional neural network provided by an embodiment of the disclosure.
  • training a convolutional neural network may include the following steps 501 to 504.
  • Step 501 Obtain training images.
  • the training images can be selected from the two matting data sets aisegment matting human and Portrait Matting; alternatively, images in the COCO (Common Objects in Context) data set that do not contain portraits can be used to replace the backgrounds of the images in these two matting data sets, so as to augment the data.
  • Step 502 Use a convolutional neural network to process the training image to generate a training segmentation image. This process is the same as the process of using the convolutional neural network to process the input image to generate the target segmentation image, and will not be repeated here.
  • Step 503 According to the training segmentation image and the standard segmentation image corresponding to the training image, use the loss function to calculate the loss value of the convolutional neural network.
  • Step 504 Optimize the parameters of the convolutional neural network according to the loss value.
  • the loss function is an important equation used to measure the difference between the predicted value (training segmentation image) and the target value (standard segmentation image). For example, the higher the output value (loss) of the loss function, the greater the difference.
  • whether to end the training is determined by judging whether the convolutional neural network has converged, which can be done in at least one of the following ways: judging whether the number of times the parameters of the convolutional neural network have been updated reaches an iteration threshold; judging whether the loss value of the convolutional neural network is lower than a loss threshold.
  • the iteration threshold may be a preset number of iterations. For example, if the number of times the parameters of the convolutional neural network have been updated is greater than the iteration threshold, the training ends.
  • the loss threshold may be preset. For example, if the loss value calculated by the loss function is less than the loss threshold, the training is ended.
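  • A minimal sketch of how the two stopping criteria (iteration threshold and loss threshold) might be combined is given below; the threshold values are placeholders, not values taken from the disclosure.

```python
MAX_UPDATES = 100_000      # placeholder iteration threshold
LOSS_THRESHOLD = 1e-3      # placeholder loss threshold

def should_stop(num_updates: int, loss_value: float) -> bool:
    # training ends when either convergence criterion is met
    return num_updates >= MAX_UPDATES or loss_value < LOSS_THRESHOLD
```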
  • the training module 60 may include a loss calculation unit 601 and an optimizer 602; the loss calculation unit 601 is configured to use a loss function to calculate the loss value of the convolutional neural network 20, and the optimizer 602 is configured to optimize the parameters of the convolutional neural network 20 according to the loss value. The convolutional neural network 20 performs matting on the training images to generate training segmentation images.
  • the loss calculation unit 601 obtains the standard segmentation image corresponding to the training image from the training data set and uses the loss function to calculate the loss value according to the training segmentation image and the standard segmentation image; the optimizer 602 adjusts the parameters of the convolutional neural network 20 according to the loss value calculated by the loss calculation unit 601.
  • the optimizer can use the stochastic gradient descent method, and the learning rate adjustment strategy of the optimizer uses the cosine annealing method with restart.
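  • In PyTorch terms, such an optimizer and learning-rate schedule could be set up as sketched below; the stand-in model, learning rate, momentum, restart period, and dummy data are illustrative assumptions only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Conv2d(3, 1, kernel_size=3, padding=1)    # stand-in for the full matting network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=10, T_mult=2)

for epoch in range(30):                               # illustrative number of epochs
    images = torch.rand(2, 3, 64, 64)                 # dummy batch standing in for training images
    masks = torch.rand(2, 1, 64, 64)                  # dummy standard segmentation images
    optimizer.zero_grad()
    loss = F.l1_loss(model(images), masks)            # placeholder for the weighted total loss
    loss.backward()
    optimizer.step()
    scheduler.step()                                  # cosine annealing with (warm) restarts
```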
  • this disclosure is not limited to this.
  • the loss function may be obtained by weighted addition of an edge loss function, a matting mask loss function, and a foreground loss function. That is, the loss function can be expressed as:
  • L = w1·L_edge + w2·L_alpha + w3·L_foreground;
  • where L_edge is the edge loss function, L_alpha is the matting mask loss function, L_foreground is the foreground loss function, and w1, w2, and w3 are weights. The values of w1, w2, and w3 can be determined according to actual conditions or empirical values, which is not limited in the present disclosure.
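  • Read literally, the weighted sum can be written as the small helper below, where the three loss terms are assumed to already be scalar tensors and the default weights are placeholders.

```python
def total_loss(edge_loss, alpha_loss, foreground_loss, w1=1.0, w2=1.0, w3=1.0):
    # L = w1 * L_edge + w2 * L_alpha + w3 * L_foreground (weights are illustrative)
    return w1 * edge_loss + w2 * alpha_loss + w3 * foreground_loss
```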
  • the edge loss function can be expressed as: L_edge = |G(A_out) − G(A_gt)|, where G(·) is obtained from the horizontal and vertical gradient responses G_x and G_y;
  • G_x(A_out) = K_x × A_out, G_y(A_out) = K_y × A_out;
  • G_x(A_gt) = K_x × A_gt, G_y(A_gt) = K_y × A_gt;
  • K_x and K_y are edge detection operators;
  • A_out is the training segmentation image;
  • A_gt is the standard segmentation image corresponding to the training image.
  • the edge detection operator can be, for example, a Sobel, Prewitt, or Scharr operator; however, the present disclosure is not limited to this.
  • for example, the edge detection operator can be the Scharr operator.
  • since the fineness and accuracy of the edges in the matting result are closely related to the matting quality, the edge loss function is designed using an edge detection operator to constrain the edges of the subject to be extracted in the matting result, so as to obtain a better matting effect.
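  • The sketch below computes such an edge loss with fixed Scharr kernels; the kernel coefficients shown are the commonly used Scharr values and the mean reduction is an assumption, since the disclosure does not reproduce them here.

```python
import torch
import torch.nn.functional as F

# commonly used 3x3 Scharr kernels (assumed values, shaped for F.conv2d)
K_X = torch.tensor([[-3., 0., 3.], [-10., 0., 10.], [-3., 0., 3.]]).view(1, 1, 3, 3)
K_Y = torch.tensor([[-3., -10., -3.], [0., 0., 0.], [3., 10., 3.]]).view(1, 1, 3, 3)

def gradient_magnitude(a: torch.Tensor) -> torch.Tensor:
    # a: (N, 1, H, W) segmentation mask; combine horizontal and vertical responses
    gx = F.conv2d(a, K_X, padding=1)
    gy = F.conv2d(a, K_Y, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-12)      # epsilon for numerical stability

def edge_loss(a_out: torch.Tensor, a_gt: torch.Tensor) -> torch.Tensor:
    # L_edge = |G(A_out) - G(A_gt)|, averaged over all pixels here
    return torch.abs(gradient_magnitude(a_out) - gradient_magnitude(a_gt)).mean()
```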
  • the matting mask loss function can be expressed as: L_alpha = |A_out − A_gt|;
  • the foreground loss function is defined in terms of the training segmentation image A_out, the standard segmentation image A_gt corresponding to the training image, and the training image I;
  • A_out^i is the i-th pixel of the training segmentation image A_out, A_gt^i is the i-th pixel of the standard segmentation image A_gt, and I_ij is the j-th channel of the i-th pixel of the training image I.
  • the image processing method provided by the embodiments of the present disclosure can use a single convolutional neural network that combines dense calculation blocks with asymmetric convolution kernels and an encoder network and decoder network with skip connections to perform real-time automatic matting on an input image and improve the matting quality; that is, it can not only output a high-quality image but also greatly increase the processing speed.
  • the exemplary embodiment of the present disclosure adopts an edge loss function, a matting mask loss function, and a foreground loss function in the training process of the convolutional neural network, which can improve the matting effect.
  • FIG. 9 is a schematic diagram of an image processing device provided by an embodiment of the present disclosure.
  • the image processing device 70 provided in this embodiment includes: an image acquisition module 701 and an image processing module 702. These components are interconnected by a bus system or other form of connection mechanism (not shown).
  • the components and structures of the image processing device shown in FIG. 9 are only exemplary, and not restrictive. The image processing device may also have other components and structures as required.
  • the image acquisition module 701 is configured to acquire an input image.
  • the image acquisition module 701 may include a memory in which the input image is stored; or, the image acquisition module 701 may include one or more cameras to acquire the input image.
  • the image acquisition module 701 may be hardware, software, firmware, or any feasible combination thereof.
  • the image processing module 702 is configured to perform down-sampling and feature extraction on the input image through the encoder network to obtain multiple feature maps, and to perform up-sampling and feature extraction on the multiple feature maps through the decoder network to obtain the target segmentation image.
  • the encoder network and the decoder network each include multiple processing levels; the multiple feature maps obtained at the L-th processing level in the encoder network and the multiple feature maps obtained at the J-th processing level in the decoder network are fused and then input to the (J+1)-th processing level in the decoder network, where the multiple feature maps obtained at the L-th processing level in the encoder network and the multiple feature maps obtained at the J-th processing level in the decoder network have the same resolution; L and J are both positive integers.
  • at least one of the multiple processing levels of the encoder network includes a dense calculation block, and at least one of the multiple processing levels of the decoder network includes a dense calculation block; the M-th dense calculation block in the encoder network and the decoder network includes N convolution modules, and the input of the i-th convolution module among the N convolution modules includes the outputs of the i−1 convolution modules before the i-th convolution module; at least one of the N convolution modules includes at least one set of asymmetric convolution kernels; i, N, and M are all integers, M is greater than or equal to 1 and less than or equal to the total number of dense calculation blocks in the encoder network and the decoder network, N is greater than or equal to 3, and i is greater than or equal to 3 and less than or equal to N.
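  • The skip connection described here, in which same-resolution feature maps from an encoder level and a decoder level are fused by channel concatenation before the next decoder level, can be illustrated by the minimal snippet below; the tensor sizes are arbitrary stand-ins.

```python
import torch

encoder_maps = torch.rand(1, 60, 56, 56)   # feature maps from the L-th encoder processing level
decoder_maps = torch.rand(1, 60, 56, 56)   # same-resolution maps from the J-th decoder processing level

fused = torch.cat([encoder_maps, decoder_maps], dim=1)   # channel-dimension fusion: 120 channels
# `fused` would then be fed to the (J+1)-th decoder processing level, e.g. a convolution block
```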
  • the image processing module 702 in the image processing device 70 provided in this embodiment includes a convolutional neural network, which has the same structure and function as the convolutional neural network in the embodiments of the above image processing method, so details are not repeated here.
  • the image processing device may further include a training module configured to train a convolutional neural network.
  • the training module may include: a loss calculation unit and an optimizer. The process of using the training module to train the convolutional neural network can refer to the relevant description in the embodiment of the above-mentioned image processing method, so it will not be repeated here.
  • FIG. 10 is a schematic diagram of an image processing device provided by an embodiment of the disclosure.
  • the image processing device 80 includes: a processor 801 and a memory 802; the memory 802 is configured to store program instructions; the processor 801 implements the steps of the image processing method in any of the foregoing embodiments when the program instructions are executed.
  • the components of the image processing device 80 shown in FIG. 10 are only exemplary and not restrictive. According to actual application requirements, the image processing device 80 may also have other components.
  • the processor 801 and the memory 802 may directly or indirectly communicate with each other.
  • the network may include a wireless network, a wired network, or any combination of a wired network and a wireless network.
  • the network may include a local area network, the Internet, a telecommunications network, the Internet of Things based on the Internet, the Internet of Things based on a telecommunications network, and any combination of the above networks.
  • Wired networks can, for example, use twisted pair, coaxial cable, or optical fiber transmission for communication, and wireless networks can, for example, use 3G, 4G, 5G mobile communication networks, Bluetooth, or WIFI, and other communication methods.
  • the present disclosure does not limit the types and functions of the network here.
  • the processor 801 may control other components in the image processing apparatus to perform desired functions.
  • the processor 801 may be a central processing unit (CPU, Central Processing Unit), a tensor processing unit (TPU, Tensor Processing Unit), a graphics processing unit (GPU, Graphics Processing Unit), or another device with data processing capability or program execution capability.
  • the GPU can be directly integrated on the motherboard alone, or built into the north bridge chip of the motherboard; or, the GPU can be built into the CPU.
  • the memory 802 may include any combination of one or more computer program products, and the computer program products may include various forms of computer-readable storage media, such as volatile memory and non-volatile memory.
  • Volatile memory may include, for example, random access memory (RAM, Random Access Memory), cache memory (Cache), and the like.
  • Non-volatile memory may include, for example, read only memory (ROM, Read Only Memory), hard disks, erasable programmable read only memory (EPROM, Erasable Programmable Read Only Memory), compact disk read only memory (CD-ROM), universal serial bus (USB, Universal Serial Bus) memory, flash memory, and the like.
  • the computer-readable storage medium may also store one or more application programs and one or more types of data, for example, input images, and one or more types of data used or generated by the application programs.
  • one or more computer-readable codes or program instructions may be stored in the memory 802, and the processor may execute the program instructions to execute the above-mentioned image processing method.
  • the image processing method reference may be made to the relevant description in the embodiment of the above image processing method, so it will not be repeated here.
  • At least one embodiment of the present disclosure also provides a computer-readable storage medium that stores program instructions, and when the program instructions are executed, the foregoing image processing method can be implemented.
  • Such software may be distributed on a computer-readable medium, and the computer-readable medium may include a computer storage medium (or a non-transitory medium) and a communication medium (or a transitory medium).
  • the term computer storage medium includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data).
  • Computer storage media include but are not limited to RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tapes, magnetic disk storage or other magnetic storage devices, or Any other medium used to store desired information and that can be accessed by a computer.
  • communication media usually contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transmission mechanism, and may include any information delivery media.


Abstract

An image processing method, comprising: acquiring an input image; performing down-sampling and feature extraction on the input image through an encoder network to obtain multiple feature maps; and performing up-sampling and feature extraction on the multiple feature maps through a decoder network to obtain a target segmentation image; wherein processing levels of the encoder network and the decoder network that output feature maps of the same resolution are connected, the encoder network and the decoder network each comprise one or more dense calculation blocks, and at least one convolution module in any dense calculation block comprises at least one set of asymmetric convolution kernels.

Description

图像处理方法、图像处理装置及设备
相关文献的交叉引用
本公开要求于2020年2月21日递交的中国专利申请第202010110386.7号的优先权,在此全文引用上述中国专利申请公开的内容以作为本申请的一部分。
技术领域
本公开涉及一种图像处理方法、图像处理装置及设备。
背景技术
图像抠图(Image matting)是图像处理与计算机视觉领域的一个研究方向,通过抠图可以将图像中的前景与背景分离。抠图所得的结果可以有多种应用,比如背景替换、证件照生成、虚拟合影生成、虚拟布景、背景虚化等。
发明内容
本公开提供一种图像处理方法、图像处理装置及设备。
一方面,本公开提供一种图像处理方法,包括:获取输入图像;通过编码器网络对所述输入图像进行下采样和特征提取,得到多个特征图;通过解码器网络对所述多个特征图进行上采样和特征提取,得到目标分割图像;其中,编码器网络和解码器网络分别包括多个处理层级,编码器网络中第L个处理层级处理得到的多个特征图和解码器网络中第J个处理层级处理得到的多个特征图融合后,输入到解码器网络中第J+1个处理层级,其中,编码器网络中第L个处理层级处理得到的多个特征图和解码器网络中第J个处理层级处理得到的多个特征图具有相同的分辨率;L、J均为正整数;其中,编码器网络的多个处理层级中至少一个处理层级包括密集计算块,解码器网络的多个处理层级中至少一个处理层级包括密集计算块;编码器网络和解码器网络中的第M个密集计算块包括N个卷积模块,所述N个卷积模块中的第i 个卷积模块的输入包括所述第i个卷积模块之前的i-1个卷积模块的输出;所述N个卷积模块中的至少一个卷积模块包括至少一组非对称卷积核;i、N、M均为整数,M大于或等于1且小于或等于编码器网络和解码器网络中的密集计算块的总数量,N大于或等于3,i大于或等于3且小于或等于N。
另一方面,本公开提供一种图像处理装置,包括:图像获取模块,配置为获取输入图像;图像处理模块,配置为通过编码器网络对所述输入图像进行下采样和特征提取,得到多个特征图;通过解码器网络对所述多个特征图进行上采样和特征提取,得到目标分割图像;其中,编码器网络和解码器网络分别包括多个处理层级,编码器网络中第L个处理层级处理得到的多个特征图和解码器网络中第J个处理层级处理得到的多个特征图融合后,输入到解码器网络中第J+1个处理层级,其中,编码器网络中第L个处理层级处理得到的多个特征图和解码器网络中第J个处理层级处理得到的多个特征图具有相同的分辨率;L、J均为正整数;其中,编码器网络的多个处理层级中至少一个处理层级包括密集计算块,解码器网络的多个处理层级中至少一个处理层级包括密集计算块;编码器网络和解码器网络中的第M个密集计算块包括N个卷积模块,所述N个卷积模块中的第i个卷积模块的输入包括所述第i个卷积模块之前的i-1个卷积模块的输出;所述N个卷积模块中的至少一个卷积模块包括至少一组非对称卷积核;i、N、M均为整数,M大于或等于1且小于或等于编码器网络和解码器网络中的密集计算块的总数量,N大于或等于3,i大于或等于3且小于或等于N。
另一方面,本公开提供一种图像处理设备,包括:存储器和处理器,所述存储器配置为存储程序指令,所述处理器执行所述程序指令时实现如上所述的图像处理方法的步骤。
另一方面,本公开提供一种计算机可读存储介质,存储有程序指令,当所述程序指令被执行时实现如上所述的图像处理方法的步骤。
附图说明
附图用来提供对本公开技术方案的理解,并且构成说明书的一部分,与本公开实施例一起用于解释本公开的技术方案,并不构成对本公开技术方案 的限制。
图1为本公开一实施例提供的图像处理方法的流程图;
图2为本公开一实施例的输入图像和生成的目标分割图像的示例图;
图3为图2所示的目标分割图像的应用效果的示例图;
图4为本公开一实施例提供的编码器网络和解码器网络的示例图;
图5为本公开一实施例提供的第一密集计算块的示例图;
图6为本公开一实施例提供的第一卷积模块的示例图;
图7为本公开一实施例提供的卷积神经网络的训练流程示例图;
图8为本公开一实施例提供的卷积神经网络的训练示意图;
图9为本公开一实施例提供的图像处理装置的示意图;
图10为本公开一实施例提供的图像处理设备的示意图。
具体实施方式
本公开描述了多个实施例,但是该描述是示例性的,而不是限制性的,并且对于本领域的普通技术人员来说显而易见的是,在本公开所描述的实施例包含的范围内可以有更多的实施例和实现方案。尽管在附图中示出了许多可能的特征组合,并在实施方式中进行了讨论,但是所公开的特征的许多其它组合方式也是可能的。除非特意加以限制的情况以外,任何实施例的任何特征或元件可以与任何其它实施例中的任何其他特征或元件结合使用,或可以替代任何其它实施例中的任何其他特征或元件。
本公开包括并设想了与本领域普通技术人员已知的特征和元件的组合。本公开已经公开的实施例、特征和元件也可以与任何常规特征或元件组合,以形成由权利要求限定的独特的方案。任何实施例的任何特征或元件也可以与来自其它方案的特征或元件组合,以形成另一个由权利要求限定的独特的技术方案。因此,应当理解,在本公开中示出或讨论的任何特征可以单独地或以任何适当的组合来实现。因此,除了根据所附权利要求及其等同替换所做的限制以外,实施例不受其它限制。此外,可以在所附权利要求的保护范 围内进行各种修改和改变。
此外,在描述具有代表性的实施例时,说明书可能已经将方法或过程呈现为特定的步骤序列。然而,在该方法或过程不依赖于本文所述步骤的特定顺序的程度上,该方法或过程不应限于所述的特定顺序的步骤。如本领域普通技术人员将理解的,其它的步骤顺序也是可能的。因此,说明书中阐述的步骤的特定顺序不应被解释为对权利要求的限制。此外,针对该方法或过程的权利要求不应限于按照所写顺序执行它们的步骤,本领域技术人员可以容易地理解,这些顺序可以变化,并且仍然保持在本公开实施例的精神和范围内。
除非另外定义,本公开使用的技术术语或科学术语为本公开所属领域内具有一般技能的人士所理解的通常意义。本公开中使用的“第一”、“第二”以及类似的词语并不表示任何顺序、数量或者重要性,而只是用来区分不同的组成部分。本说明中,“多个”表示两个或两个以上的数目。
为了保持本公开实施例的以下说明清楚且简明,本公开省略了部分已知功能和已知部件的详细说明。本公开实施例附图只涉及到与本公开实施例涉及到的结构,其他结构可参考通常设计。
随着图像处理技术的发展,利用深度学习技术对图像进行抠图逐渐成为研究热点。比如,利用串联的多个不同结构的全卷积深度神经网络分别进行以下处理:检测包含待抠图图像中所需抠取主体的主体框,对主体框中的像素进行分类得到三分图(Trimap),以及根据三分图抠取主体框中的主体。然而,采用多个全卷积深度神经网络进行抠图时,上一级全卷积神经网络的输出结果的准确性会影响下一级全卷积神经网络的输出结果的准确性,导致抠图效果不佳。而且,目前用于抠图的卷积深度神经网络的计算效率较低、处理速度无法达到实时抠图。
卷积神经网络(CNN,Convolutional Neural Networks)是一种使用例如图像作为输入和输出,并通过滤波器(卷积核)来替代标量权重的神经网络结构。卷积过程可以看作是使用一个可训练的滤波器对一个输入的图像或卷积特征平面(feature map)做卷积,输出一个卷积特征平面,卷积特征平面还可以称为特征图。卷积层是指卷积神经网络中对输入信号进行卷积处理的 神经元层。在卷积神经网络的卷积层中,一个神经元只与部分相邻层的神经元连接。卷积层可以对输入图像应用若干个卷积核,以提取输入图像的多种类型的特征。每个卷积核可以提取一种类型的特征。卷积核一般以随机大小的矩阵的形式初始化,在卷积神经网络的训练过程中,卷积核将通过学习以得到合理的权值。在同一卷积层中,可以使用多个卷积核来提取不同的图像信息。
本公开实施例提供一种图像处理方法、图像处理装置及设备,可以利用一个卷积神经网络对输入图像进行处理,自动生成目标分割图像,其中,本公开实施例提供的卷积神经网络结合带有非对称卷积核的密集计算块、带有跳跃连接的编码器解码器网络,可以提高抠图效果和处理速度,降低运算所需时间,支持实现对输入图像的实时自动抠图,具有更好、更广泛的应用前景。
图1为本公开一实施例提供的图像处理方法的流程图。如图1所示,本公开实施例提供的图像处理方法包括以下步骤:
步骤101、获取输入图像;
步骤102、通过编码器网络对输入图像进行下采样和特征提取,得到多个特征图;
步骤103、通过解码器网络对所述多个特征图进行上采样和特征提取,得到目标分割图像。
本实施例提供的图像处理方法用于将输入图像中的目标对象与背景分离,目标分割图像可以为目标对象的抠图蒙版。其中,目标对象可以是输入图像中的人像,或者可以是预先设置的检测对象(例如,动物、建筑物等)。然而,本公开对此并不限定。
在一示例性实施方式中,输入图像可以包括人物图像,例如,输入图像可以为通过数码相机或手机等图像采集设备拍摄的一张人物图像,或者,可以为通过图像采集设备拍摄的视频中的一帧人物图像。然而,本公开对此并不限定。
图2为本公开一实施例的输入图像和生成的目标分割图像的示例图。如 图2所示,卷积神经网络包括编码器网络和解码器网络。在本示例性实施方式中,目标分割图像为输入图像中目标人物的抠图蒙版。然而,本公开对此并不限定。比如,目标分割图像可以为从输入图像中抠取的目标人物的图像。
如图2所示,在本示例性实施方式中,输入图像为灰度图像。然而,本公开对此并不限定。在实际应用中,输入图像可以为彩色图像,比如RGB图像。
图3为图2所示的目标分割图像的应用效果的示例图。在一示例,利用本公开实施例提供的图像处理方法得到的目标分割图像(如图2所示的目标人物的抠图蒙版),可以将输入图像(如图3中第一行第一幅图)中的人体区域抠出,然后再合成到其它不包含人体的自然场景中,以实现背景替换;比如,图3中第一行第四幅图为对输入图像进行背景替换后得到的效果图示例。在另一示例中,可以利用本公开实施例提供的图像处理方法得到的目标分割图像(如图2所示的抠图蒙版),将输入图像(如图3中第一行第一幅图)中的人体区域抠出,然后再合成到其他包含人体和自然场景的图像中,以实现虚拟合影;比如,图3中除第一行第一幅图和第一行第四幅图以外的其余图像均为对输入图像进行虚拟合影后的效果示例图。
如图2所示,本公开实施例提供的卷积神经网络包括:编码器网络和解码器网络,其中,编码器网络配置为对输入图像进行下采样和特征提取,得到多个特征图;解码器网络配置为对多个特征图进行上采样和特征提取,得到目标分割图像。
在本实施例中,编码器网络和解码器网络分别包括多个处理层级,编码器网络中第L个处理层级处理得到的多个特征图和解码器网络中第J个处理层级处理得到的多个特征图融合后,输入到解码器网络中第J+1个处理层级,其中,编码器网络中第L个处理层级处理得到的多个特征图和解码器网络中第J个处理层级处理得到的多个特征图具有相同的分辨率;其中,L、J均为正整数。其中,在编码器网络中一次下采样和特征提取可以分别作为一个处理层级;在解码器网络中一次上采样和特征提取可以分别作为一个处理层级。在本实施例中,L的取值可以为一个或多个,J的取值可以为一个或多个。在一示例中,L和J的取值可以均为三个;比如,L为1,J为5,则编码器网 络中第一个处理层级处理得到的多个特征图与解码器网络中第五个处理层级处理得到的多个特征图具有相同的分辨率,且这两个处理层级处理得到的特征图融合后,输入到解码器网络中第六个处理层级;L为2,J为1,则编码器网络中第二个处理层级处理得到的多个特征图与解码器网络中第一个处理层级处理得到的多个特征图具有相同的分辨率,且这两个处理层级处理得到的特征图融合后,输入到解码器网络中第二个处理层级;L为3,J为3,则编码器网络中第三个处理层级处理得到的多个特征图与解码器网络中第三个处理层级处理得到的多个特征图具有相同的分辨率,且这两个处理层级处理得到的特征图融合后,输入到解码器网络中第四个处理层级。
在本实施例中,通过将编码器网络中第L个处理层级处理得到的多个特征图和解码器网络中第J个处理层级处理得到的多个特征图进行融合后,输入到解码器网络中第J+1个处理层级,可以在编码器网络和解码器网络之间实现跳跃连接,即,编码器网络和解码器网络中得到相同分辨率的特征图的处理层级相连,且这两个处理层级处理得到的多个特征图进行融合后输入到解码器网络中的下一个处理层级。通过编码器网络和解码器网络之间的跳跃连接可以增加解码器网络对图像细节的保留,从而提高抠图结果的准确性。
在一示例性实施方式中,编码器网络中第L个处理层级处理得到的多个特征图和解码器网络中第J个处理层级处理得到的多个特征图融合后,输入到解码器网络中第J+1个处理层级,可以包括:编码器网络中第L个处理层级处理得到的多个特征图和解码器网络中第J个处理层级处理得到的多个特征图在通道维度上拼接后,输入到所述解码器网络中第J+1个处理层级。比如,通过拼接(Concat)操作对编码器网络中第L个处理层级处理得到的多个特征图和解码器网络中第J个处理层级处理得到的多个特征图进行融合。然而,本公开对此并不限定。在其他实现方式中,可以通过相加(Add)操作或相乘等操作对编码器网络中第L个处理层级处理得到的多个特征图和解码器网络中第J个处理层级处理得到的多个特征图进行融合。通过对编码器网络和解码器网络处理得到的相同尺寸和分辨率的特征图融合后输入到解码器网络,可以将编码器网络在下采样过程中损失的图像细节和信息,传递至解码器网络,使得解码器网络在上采样恢复空间分辨率的过程中,可以利用 这些信息生成出更准确的目标分割图像,从而提升抠图效果。
在一示例性实施方式中,当编码器网络中的第L个处理层级和解码器网络中的第J个处理层级进行相对应的处理,且编码器网络中第L个处理层级处理得到的多个特征图和解码器网络中第J个处理层级处理得到的多个特征图具有相同的分辨率,则编码器网络中第L个处理层级处理得到的多个特征图和解码器网络中第J个处理层级处理得到的多个特征图融合后,可以输入到解码器网络中第J+1个处理层级。其中,编码器网络中的第L个处理层级和解码器网络中的第J个处理层级进行相对应的处理例如可以为:编码器网络中第L个处理层级进行下采样处理,且解码器网络中第J个处理层级进行上采样处理;或者,编码器网络中第L个处理层级进行多层次特征提取,且解码器网络中第J个处理层级也进行多层次特征提取。然而,本公开对此并不限定。本示例性实施方式中通过对编码器网络和解码器网络中相对应的处理层级得到的具有相同分辨率的特征图融合后输入到解码器网络,可以提升融合后特征图对图像细节的保留效果,提高解码器网络利用融合特征图得到的目标分割图像的准确性,从而提高抠图结果。
在本实施例中,编码器网络的多个处理层级中至少一个处理层级包括密集计算块,解码器网络的多个处理层级中至少一个处理层级包括密集计算块;编码器网络和解码器网络中的第M个密集计算块包括N个卷积模块,所述N个卷积模块中的第i个卷积模块的输入包括所述第i个卷积模块之前的i-1个卷积模块的输出;所述N个卷积模块中的至少一个卷积模块包括至少一组非对称卷积核;其中,i、N、M均为整数,M大于或等于1且小于或等于编码器网络和解码器网络中的密集计算块的总数量,N大于或等于3,i大于或等于3且小于或等于N。比如,N个卷积模块中全部的卷积模块均包括至少一组非对称卷积核,或者,N个卷积模块中仅部分的卷积模块包括至少一组非对称卷积核。然而,本公开对此并不限定。
在本实施例中,任一密集计算块可以包括N个卷积模块,且不同密集计算块包括的卷积模块的数目可以相同或不同。比如,第一个密集计算块可以包括五个卷积模块,第二个密集计算块可以包括八个卷积模块,第三个密集计算块可以包括五个卷积模块。
本实施例中,任一密集计算块配置为进行多层次的特征提取,且一个密集计算块对应一个处理层级。当编码器网络和解码器网络中包括多个密集计算块时,可以按照编码器网络的处理层级和解码器网络的处理层级的顺序,确定多个密集计算块的排序。比如,编码器网络中包括两个密集计算块(分别对应编码器网络中的第三处理层级和第五处理层级),解码器网络中包括一个密集计算块(对应解码器网络中的第三处理层级),则可以将编码器网络中对应第三处理层级的密集计算块标记为第一个密集计算块,对应第五处理层级的密集计算块标记为第二个密集计算块,将解码器网络中对应第三处理层级的密集计算块标记为第三个密集计算块。然而,本公开对此并不限定。
在本实施例中,密集计算块为带有非对称卷积的高效的密集计算块(EDA block,Effective Dense Asymmetric block)。其中,一个密集计算块包括多个卷积模块,且多个卷积模块中除第一个卷积模块之外,每个卷积模块的输入包括来自该卷积模块之前的所有卷积模块的输出,使得密集计算块中的多个卷积模块之间可以形成密集连接。本实施例采用密集计算块进行特征提取,可以大幅度减少参数,降低计算量,提升处理速度,而且具有较好的抗过拟合性能。而且,本实施例的密集计算块中的至少一个卷积模块包括一组或多组非对称卷积核,通过采用非对称卷积核进行特征提取,可以大幅减少计算量,从而提高处理速度。
在一示例性实施方式中,N个卷积模块中包括至少一组非对称卷积核的卷积模块可以包括:由所述至少一组非对称卷积核膨胀得到的非对称卷积核。比如,N个卷积模块中的某一卷积模块可以包括两组非对称卷积核,第二组非对称卷积核可以由前一组非对称卷积核膨胀得到,其中,第一组非对称卷积核可以为3×1卷积核和1×3卷积核。本示例性实施方式中,通过对非对称卷积核进行膨胀操作可以得到膨胀的非对称卷积核,采用膨胀的非对称卷积核不仅可以增加感受野(Receptive Field),而且可以减少图像处理过程中空间信息的损失,并保持形成密集连接的多个卷积模块可以生成分辨率一致的特征图。
在一示例性实施方式中,步骤102可以包括:对输入图像进行下采样,得到具有第一分辨率的多个第一下采样特征图;对所述多个第一下采样特征 图进行下采样,得到具有第二分辨率的多个第二下采样特征图;通过第一密集计算块对所述多个第二下采样特征图进行多层次的特征提取,得到具有第二分辨率的多个第一密集计算特征图;对所述多个第一密集计算特征图进行下采样,得到具有第三分辨率的多个第三下采样特征图;通过第二密集计算块对所述多个第三下采样特征图进行多层次的特征提取,得到具有第三分辨率的多个第二密集计算特征图;步骤102中通过编码器网络得到的多个特征图包括所述多个第二密集计算特征图。
在本示例性实施方式中,步骤103可以包括:对所述多个第二密集计算特征图进行上采样,得到具有第二分辨率的多个第一上采样特征图;对所述多个第一上采样特征图和所述多个第二下采样特征图在通道维度上进行拼接,得到第一融合特征图组;对所述第一融合特征图组进行特征提取,得到具有第二分辨率的多个第一中间特征图;通过第三密集计算块对所述多个第一中间特征图进行多层次的特征提取,得到具有第二分辨率的多个第三密集计算特征图;对所述多个第三密集计算特征图和所述多个第一密集计算特征图在通道维度上进行拼接,得到第二融合特征图组;对所述第二融合特征图组进行特征提取,得到具有第二分辨率的多个第二中间特征图;对所述多个第二中间特征图进行上采样,得到具有第一分辨率的多个第二上采样特征图;对所述多个第二上采样特征图和所述多个第一下采样特征图在通道维度上进行拼接,得到第三融合特征图组;对所述第三融合特征图组进行特征提取,得到具有第二分辨率的多个第三中间特征图;对所述多个第三中间特征图进行上采样,得到与输入图像具有相同分辨率的目标分割图像。
在本示例性实施方式中,第一密集计算块可以包括五个卷积模块,第二密集计算块可以包括八个卷积模块,第三密集计算块可以包括五个卷积模块;其中,第一密集计算块、第二密集计算块和第三密集计算块中的每个卷积模块包括1×1卷积核和两组非对称卷积核,第一组非对称卷积核为3×1卷积核和1×3卷积核,第二组非对称卷积核根据第一组非对称卷积核和对应的膨胀系数得到。
图4为本公开一实施例提供的编码器网络和解码器网络的示例图。在本示例性实施例中,卷积神经网络可以对彩色人物图像(比如,图2所示的输 入图像的彩色图像)进行抠图,得到黑白的抠图蒙版(比如,图2所示的目标分割图像)。
如图4所示,编码器网络201包括:第一下采样块301、第二下采样块203、第一密集计算块303、第三下采样块304以及第二密集计算块305。其中,第二下采样块302位于第一下采样块301和第一密集计算块303之间,第三下采样块304位于第一密集计算块303和第二密集计算块305之间。
本示例性实施例中,第一下采样块301配置为对输入图像进行下采样,得到具有第一分辨率的多个第一下采样特征图;第二下采样块302配置为对多个第一下采样特征图进行下采样,得到具有第二分辨率的多个第二下采样特征图;第一密集计算块303配置为对多个第二下采样特征图进行多层次的特征提取,得到具有第二分辨率的多个第一密集计算特征图;第三下采样块304配置为对多个第一密集计算特征图进行下采样,得到具有第三分辨率的多个第三下采样特征图;第二密集计算块305配置为从多个第三下采样特征图进行多层次的特征提取,得到具有第三分辨率的多个第二密集计算特征图。其中,第一分辨率大于第二分辨率,第二分辨率大于第三分辨率,第一分辨率小于输入图像的分辨率。
在本示例性实施例中,编码器网络201包括五个处理层级,分别对应三个下采样块和两个密集计算块;其中,通过三个下采样块(Downsampling Block)和两个密集计算块逐步从输入图像中提取得到多个特征图,并逐步缩小特征图的空间分辨率;其中,特征提取主要通过下采样块和密集计算块来实现,特征图的空间分辨率的缩小通过下采样块实现。利用多个下采样块逐步缩小特征图的空间维度可以扩大感受野,使得编码器网络可以更好地提取不同尺度的局部和全局特征,而且下采样块可以对提取的特征图进行压缩,从而节省计算量和内存的占用,并提高处理速度。
如图4所示,解码器网络202包括:第一上采样块306、第一卷积块307、第三密集计算块308、第二卷积块309、第二上采样块310、第三卷积块311以及第三上采样块312。其中,第一上采样块306位于第二密集计算块305和第一卷积块307之间,第三密集计算块308位于第一卷积块307和第二卷积块309之间,第二上采样块310位于第二卷积块309和第三卷积块311之 间,第三卷积块311位于第二上采样块310和第三上采样块311之间。
在本示例性实施例中,第一上采样块306配置为对编码器网络201输出的多个第二密集计算特征图进行上采样,得到具有第二分辨率的多个第一上采样特征图;第一卷积块307配置为对所述多个第一上采样特征图和多个第二下采样特征图在通道维度上拼接得到的第一融合特征图组进行特征提取,得到具有第二分辨率的多个第一中间特征图,所述多个第一上采样特征图和多个第二下采样特征图具有相同分辨率;第三密集计算块308配置为对多个第一中间特征图进行多层次的特征提取,得到具有第二分辨率的多个第三密集计算特征图;第二卷积块309配置为对所述多个第三密集计算特征图和多个第一密集计算特征图在通道维度上拼接得到的第二融合特征图组进行特征提取,得到具有第二分辨率的多个第二中间特征图,所述多个第三密集计算特征图和多个第一密集计算特征图具有相同分辨率;第二上采样块310配置为对多个第二中间特征图进行上采样,得到具有第一分辨率的多个第二上采样特征图;第三卷积块311配置为对所述多个第二上采样特征图和多个第一下采样特征图在通道维度上拼接得到的第三融合特征图组进行特征提取,得到多个第三中间特征图,所述多个第二上采样特征图和多个第一下采样特征图具有相同分辨率;第三上采样块312配置为对多个第三中间特征图进行上采样操作,得到与输入图像具有相同分辨率的目标分割图像。
在本示例性实施例中,解码器网络202包括七个处理层级,分别对应三个上采样块(Upsampling Block)、三个卷积块(Convolution Block)和一个密集计算块;其中,通过三个上采样块将编码器网络201提取的多个特征图的空间分辨率恢复到与输入图像一致,通过三个上采样块、三个卷积块和一个密集计算块进行特征提取,以将编码器网络201提取到的多个特征图逐步转化生成输入图像的目标分割图像。
在本示例性实施例中,以原始的输入图像的空间分辨率为1,则第一分辨率可以为1/2,第二分辨率可以为1/4,第三分辨率可以为1/8。假设原始的输入图像的尺寸为H×W,H和W代表输入图像的长度和宽度,则具有第一分辨率的特征图的尺寸为(H/2)×(W/2),具有第二分辨率的特征图的尺寸为(H/4)×(W/4),具有第三分辨率的特征图的尺寸为(H/8)×(W/8)。
在本示例性实施例中,如图4所示,在编码器网络201和解码器网络202之间建立有跳跃连接,且在编码器网络201和解码器网络202之间的跳跃连接方式可以为拼接(Concat)方式。如图4所示,第一下采样块301得到的多个第一下采样特征图的分辨率和第二上采样块310得到的多个第二上采样特征图的分辨率均为第一分辨率,则第一下采样块301得到的多个第一下采样特征图既输入到第二下采样块302,又与第二上采样块310得到的多个第二上采样特征图在通道维度进行拼接后输入到第三卷积块311;第二下采样块302得到的多个第二下采样特征图的分辨率和第一上采样块306得到的多个第一上采样特征图的分辨率均为第二分辨率,则第二下采样块302得到的多个第二下采样特征图既输入到第一密集计算块303,又与第一上采样块306得到的多个第一上采样特征图在通道维度进行拼接后输入到第一卷积块307;第一密集计算块303得到的多个第一密集计算特征图的分辨率和第三密集计算块308得到的多个第三密集计算特征图的分辨率均为第二分辨率,则第一密集计算块303得到的多个第一密集计算特征图既输入到第三下采样块304,又与第三密集计算块308得到的多个第三密集计算特征图在通道维度进行拼接后输入到第二卷积块309。
在本示例性实施例中,按照通道维度对分辨率和尺寸相同的特征图进行拼接,是对分辨率和尺寸相同的特征图的通道数的增加。比如,第一密集计算块303输出的特征图为Channel1×h×w,其中,h和w代表特征图的长和宽,Channel1代表第一密集计算块303的输出通道数,第三密集计算块308输出的特征图为Channel2×h×w,其中,Channel2代表第三密集计算块308的输出通道数,第一密集计算块303和第三密集计算块308输出的特征图的尺寸和分辨率相同,则对第一密集计算块303和第三密集计算块308输出的特征图在通道维度进行拼接后得到的第二融合特征图组为(Channel1+Channel2)×h×w。
本示例性实施例中,通过编码器网络201和解码器网络202之间的跳跃连接,可以将编码器网络201在多次下采样过程中损失的图像细节和信息,传递至解码器网络202,使得解码器网络202在上采样恢复空间分辨率的过程中,可以利用这些信息生成出更准确的目标分割图像,从而提升抠图效果。
在本示例性实施例中,第一下采样块301的输入通道数为3,输出通道数为15;第二下采样块302的输入通道数为15,输出通道数为60;第一密集计算块303的输入通道数为60,输出通道数为260;第三下采样304的输入通道数为260,输出通道数为130;第二密集计算块305的输入通道数为130,输出通道数为450。第一上采样块306的输入通道数为450,输出通道数为60;第一卷积块307的输入通道数为120,输出通道数为60;第三密集计算块308的输入通道数为60,输出通道数为260;第二卷积块309的输入通道数为520,输出通道数为260;第二上采样块310的输入通道数为260,输出通道数为15;第三卷积块311的输入通道数为30,输出通道数为15;第三上采样块312的输入通道数为15,输出通道数为1。
在本示例性实施例中,解码器网络中的任一卷积块可以包括卷积层和激活层,其中,激活层位于卷积层之后。其中,卷积层配置为执行卷积操作,可以包括一个或多个卷积核。本公开的卷积神经网络中的多个卷积块的结构和参数可以互不相同,或者至少部分相同。然而,本公开对此并不限定。
在本示例性实施例中,编码器网络中的任一下采样块可以包括卷积层、池化层和激活层。其中,卷积层配置为执行卷积操作,可以包括一个或多个卷积核。其中,池化层是下采样的一种形式;池化层可以配置为缩小输入图像的规模,简化计算的复杂度,在一定程度上减小过拟合的现象;池化层可以进行特征压缩,提取输入图像的主要特征。多个下采样块的结构和参数可以互不相同,或者至少部分相同。然而,本公开对此并不限定。
在本示例性实施例中,编码器网络中的任一下采样块配置为执行下采样操作,可以减少特征图的尺寸,进行特征压缩,提取主要特征,以简化计算的复杂度,并在一定程度上减小过拟合的现象。其中,下采样操作可以包括:最大值合并、平均值合并、随机合并、欠采样(decimation,例如选择固定的像素)、解复用输出(demuxout,将输入图像拆分为多个更小的图像)等。然而,本公开对此并不限定。
在本示例性实施例中,解码器网络中的任一上采样块可以包括上采样层和激活层,其中,上采样层中可以包括卷积层。其中,卷积层配置为执行卷积操作,可以包括一个或多个卷积核。多个上采样块的结构和参数可以互不 相同,或者至少部分相同。然而,本公开对此并不限定。
在本示例性实施例中,解码器网络中的任一上采样层配置为执行上采样操作;其中,上采样操作可以包括:最大值合并、跨度转置卷积(strides transposed convolutions)、插值(例如,内插值、两次立方插值等)等。然而,本公开对此并不限定。
在本示例性实施例中,解码器网络220中的上采样块的个数与编码器网络210中的下采样块的个数相同,使得目标分割图像与输入图像具有相同的分辨率,并可以保证跳跃连接的两个处理层级得到的特征图具有相同的分辨率。
在本示例性实施例中,第一密集计算块303包括五个卷积模块,且第一密集计算块303中除了第一个卷积模块之外的任一卷积模块的输入包括该卷积模块之前的所有卷积模块的输出。第二密集计算块305包括八个卷积模块,且第二密集计算块305中除了第一个卷积模块之外的任一卷积模块的输入包括该卷积模块之前的所有卷积模块的输出。第三密集计算块308包括五个卷积模块,且第三密集计算块308中除了第一个卷积模块之外的任一卷积模块的输入包括该卷积模块之前的所有卷积模块的输出。本实施例中,第一密集计算块303、第二密集计算块305和第三密集计算块308内的卷积模块通过串联实现密集连接。
下面以第一密集计算块303为例说明图4中的密集计算块的结构。图5为本公开一实施例提供的第一密集计算块303的示例图。如图5所示,第一密集计算块303包括串联的第一卷积模块315、第二卷积模块316、第三卷积模块317、第四卷积模块318以及第五卷积模块319。如图5所示,以第一卷积模块315和第二卷积模块316为例,第一卷积模块315配置为接收并处理C 1个特征图得到K 1个特征图,并将C 1个特征图和K 1个特征图在通道维度上进行拼接,以得到C 1+K 1个特征图;第二卷积模块316配置为接收并处理C 2个特征图得到K 2个特征图,并将C 2个特征图和K 2个特征图在通道维度上进行拼接,以得到C 2+K 2个特征图,其中,C 2个特征图为第一卷积模块315得到的C 1+K 1个特征图。本实施例中,第一卷积模块315的输入通道数为C 1,输出通道数为C 1+K 1;第二卷积模块316的输入通道数为C 1+K 1,输出通道 数为C 1+K 1+K 2。依此类推可以得到,第三卷积模块317的输入通道数为C 1+K 1+K 2,输出通道数为C 1+K 1+K 2+K 3;第四卷积模块318的输入通道数为C 1+K 1+K 2+K 3,输出通道数为C 1+K 1+K 2+K 3+K 4;第五卷积模块319的输入通道数为C 1+K 1+K 2+K 3+K 4,输出通道数为C 1+K 1+K 2+K 3+K 4+K 5。换言之,第三卷积模块317的输入包括之前的第一卷积模块315和第二卷积模块316的输出,第四卷积模块318的输入包括之前的第一卷积模块315、第二卷积模块316以及第三卷积模块317的输出,第五卷积模块319的输入包括之前的第一卷积模块315、第二卷积模块316、第三卷积模块317以及第四卷积模块318的输出。
在本示例中,任一卷积模块的增长率系数相同,卷积模块的增长率系数为该卷积模块的输出通道数相比于输入通道数所增长的通道数。如图5所示,K 1、K 2、K 3、K 4和K 5均相同。
在本示例中,图4中的每个密集计算块的结构示意了其中每个卷积模块处理输入的特征图得到的多个特征图(第i个卷积模块得到K i个特征图)的传输方式。图5所示为密集计算块中多个卷积模块的整体串联方式,与图4中体现第i个卷积模块的K i个特征图的传输方式是等效的。
下面以第一卷积模块315为例说明密集计算块中的任一卷积模块的结构。图6为本公开一实施例提供的第一卷积模块315的示例图。如图6所示,在本示例中,第一卷积模块包括依次级联的卷积层401、激活层402、第一非对称卷积网络41、第二非对称卷积网络42以及随机失活(Dropout)层409。其中,第一非对称卷积网络41包括依次级联的卷积层403、卷积层404以及激活层405;第二非对称卷积网络42包括依次级联的卷积层406、卷积层407以及激活层408。在本示例中,第一卷积模块315包括两组非对称卷积核。然而,本公开对此并不限定。
如图6所示,第一卷积模块的输入通道数为C 1,输出通道数为C 1+K 1。其中,卷积层401的输入通道数为C 1,输出通道数为K 1,卷积层403和404的输入通道数和输出通道数均为K 1,卷积层406和407的输入通道数和输出通道数均为K 1
如图6所示,第一卷积模块的输入和Dropout层409的输出结果通过 Concat方式相连,生成第一卷积模块的输出结果;换言之,第一卷积模块的输出结果为第一卷积模块的输入特征图和经过卷积网络生成的特征图在通道维度的拼接结果。如此一来,可以实现多个卷积模块串联后形成密集连接,即密集计算块中的任一卷积模块接收并处理该卷积模块之前的所有卷积模块的输出结果。
在本示例性实施例中,卷积层401的卷积核为1×1。通过卷积核为1×1的卷积层401可以在卷积模块进行特征提取操作时进行降维,减少特征图的数量,降低计算量,并增加卷积神经网络的非线性程度。
在本示例性实施例中,在第一非对称卷积网络41中,卷积层403的卷积核为3×1,卷积层404的卷积核为1×3;在第二非对称卷积网络42中,卷积层406的卷积核由3×1经膨胀操作后得到,卷积层407的卷积核由1×3经膨胀操作后得到,其中,膨胀操作的膨胀系数可以为d。在同一密集计算块中的不同卷积模块,可以采用相同或不同的膨胀系数,或者,部分卷积模块采用相同的膨胀系数。然而,本公开对此并不限定。在本示例中,通过对非对称卷积核进行膨胀操作不仅可以增加感受野,而且可以减少空间信息的损失,并保持密集连接的卷积模块输出的特征图的分辨率一致。在本示例性实施例中,通过采用两组非对称卷积核进行特征提取,可以大幅减少计算量,从而提高处理速度。
在本示例性实施例中,Dropout层409可以有效预防过拟合,且在非训练阶段Dropout层409可以自动关闭。然而,本公开对此并不限定。
在本示例性实施例中,图5中的第二卷积模块316、第三卷积模块317、第四卷积模块318和第五卷积模块319的结构和参数可以与第一卷积模块315的结构和参数相同或部分相同。然而,本公开对此并不限定。
示例性地,一个密集计算块中的多个卷积模块可以选择不同的增长率系数与膨胀系数。然而,本公开对此并不限定。
在图4所示的示例性实施例中,三个密集计算块(第一密集计算块303、第二密集计算块305以及第三密集计算块308)包括的所有卷积模块的增长率系数可以均为40;第一密集计算块303包括的五个卷积模块的膨胀系数可以分别为(1,1,1,2,2);第三密集计算块303包括的五个卷积模块的膨 胀系数可以分别为(1,1,1,2,2);第二密集计算块包括的八个卷积模块的膨胀系数可以分别为(2,2,4,4,8,8,16,16)。其中,第一密集计算块303和第三密集计算块308的结构和参数可以完全相同。
在本示例性实施例中,激活层可以包括激活函数,激活函数用于给卷积神经网络引入非线性因素,以使卷积神经网络可以更好地解决较为复杂的问题。激活函数可以包括线性修正单元(ReLU)函数、S型函数(Sigmoid函数)或者双曲正切函数(tanh函数)等。ReLU函数为非饱和非线性函数,Sigmoid函数和tanh函数为饱和非线性函数。激活层可以单独作为卷积神经网络的一层,或者,激活层可以包含在卷积层中。在一示例中,激活层可以包括正则化(Normalization)层与激活函数。
例如,在图6所示的第一卷积模块中,激活层402配置为对卷积层401的输出执行激活操作,激活层405配置为对卷积层404的输出执行激活操作,激活层408配置为对卷积层407的输出执行激活操作。激活层402、405和408可以均包括正则化层和激活函数。其中,不同激活层中的激活函数可以相同或不同,不同激活层中的正则化层可以相同或不同。然而,本公开对此并不限定。
本示例性实施例提供的图像处理方法可以通过结合带有非对称卷积核的密集计算块、带有跳跃连接的编码器解码器网络的一个卷积神经网络,对输入人像图像进行自动抠取,且可以实时得到抠图结果,提升了处理速度和抠图结果的准确性。
在一示例性实施方式中,本公开实施例提供的图像处理方法还包括:训练卷积神经网络,卷积神经网络包括编码器网络和解码器网络。在利用卷积神经网络进行抠图之前,要对卷积神经网络进行训练,经过训练之后,卷积神经网络的参数在图像处理期间保持不变。在训练过程中,卷积神经网络的参数会根据训练结果进行调整,以获得优化后的卷积神经网络。在本示例中,卷积神经网络的参数可以包括:卷积核和偏置。其中,卷积核决定了对处理的图像进行怎样的处理,偏置则决定了该卷积核的输出是否输入到下一层。
图7为本公开一实施例提供的卷积神经网络的训练流程示例图。图8为本公开一实施例提供的卷积神经网络的训练示意图。
如图7所示,训练卷积神经网络可以包括以下步骤501至步骤504。
步骤501、获取训练图像。比如,训练图像可以从aisegment matting human与Portrait Matting这两个抠图数据集中选择;或者,可以利用COCO(Common Objects in Context)数据集中不包含人像的图片,对上述两个抠图数据集进行背景替换,以实现数据扩充。然而,本公开对此并不限定。
步骤502、利用卷积神经网络对训练图像进行处理,以生成训练分割图像。此过程与利用卷积神经网络对输入图像进行处理生成目标分割图像的过程相同,在此不再赘述。
步骤503、根据训练分割图像和训练图像对应的标准分割图像,利用损失函数计算卷积神经网络的损失值。
步骤504、根据损失值优化卷积神经网络的参数。
其中,损失函数(loss function)是用于衡量预测值(训练分割图像)和目标值(标准分割图像)的差异的重要方程。比如,损失函数的输出值(loss)越高表示差异越大。
在本示例中,通过判断卷积神经网络是否收敛来确定是否结束训练,其中,判断卷积神经网络是否收敛可以通过以下至少之一方式:判断更新卷积神经网络的参数的次数是否达到迭代阈值;判断卷积神经网络的损失值是否低于损失阈值。其中,迭代阈值可以是预先设置的迭代次数,比如,更新卷积神经网络的参数的次数大于跌打阈值,则结束训练。其中,损失阈值可以是预先设置的,比如,若损失函数计算得到的损失值小于损失阈值,则结束训练。
如图8所示,训练模块60可以包括损失计算单元601和优化器602;损失计算单元601配置为利用损失函数计算卷积神经网络20的损失值,优化器602配置为根据损失值优化卷积神经网络20的参数。其中,卷积神经网络20对训练图像进行抠图,可以生成训练分割图像。损失计算单元601从训练数据集获取训练图像对应的标准分割图像,根据训练分割图像和标准分割图像,利用损失函数计算损失值;优化器602根据损失计算单元601计算得到的损失值调整卷积神经网络20的参数。在一示例中,优化器可以使用随机梯度下 降法,且优化器的学习率调整策略使用带重启的余弦退火方法。然而,本公开对此并不限定。
在一示例性实施方式中,损失函数可以由边缘损失函数、抠图蒙版损失函数和前景损失函数加权相加得到。即损失函数可以表示为:
L=w 1L edge+w 2L alpha+w 3L foreground
其中,L edge为边缘损失函数、L alpha为抠图蒙版损失函数、L foreground为前景损失函数,w 1、w 2、w 3为权值。其中,w 1、w 2、w 3可以根据实际情况或依据经验值进行确定,本公开对此并不限定。
在一示例性实施方式中,边缘损失函数可以表示为:L edge=|G(A out)-G(A gt)|;
其中,
Figure PCTCN2020140781-appb-000001
G x(A out)=K x×A out,G y(A out)=K y×A out
Figure PCTCN2020140781-appb-000002
G x(A gt)=K x×A gt,G y(A gt)=K y×A gt
K x和K y为边缘检测算子,A out为训练分割图像,A gt为训练图像对应的标准分割图像。
其中,边缘检测算子可以采用Sobel、Prewitt、Scharr等算子,然而,本公开对此并不限定。
示例性地,边缘检测算子可以选用Scharr算子,即:
Figure PCTCN2020140781-appb-000003
在本示例性实施方式中,由于抠图结果中边缘的精细准确与抠图的效果密切相关,因此,利用边缘检测算子设计边缘损失函数,对抠图结果中所需抠取主体的边缘加以约束,以得到更好的抠图效果。
在一示例性实施方式中,抠图蒙版损失函数可以表示为:L alpha=|A out-A gt|;
前景损失函数可以表示为:
Figure PCTCN2020140781-appb-000004
其中,A out为训练分割图像,A gt为训练图像对应的标准分割图像,I为训练图像;
Figure PCTCN2020140781-appb-000005
为训练分割图像A out的第i个像素,
Figure PCTCN2020140781-appb-000006
为标准分割图像A gt的第i个像素,I ij为训练图像I的第i个像素的第j个通道。
本公开实施例提供的图像处理方法,可以利用结合带有非对称卷积核的密集计算块、带有跳跃连接的编码器网络和解码器网络的一个卷积神经网络实现对输入图像进行实时自动抠图,并提高抠图效果,即不仅可以输出一幅高质量图像且大幅提高处理速度。而且,本公开示例性实施例在卷积神经网络的训练过程中采用边缘损失函数、抠图蒙版损失函数和前景损失函数,可以提高抠图效果。
图9为本公开一实施例提供的一种图像处理装置的示意图。如图9所示,本实施例提供的图像处理装置70包括:图像获取模块701和图像处理模块702。这些组件通过总线系统或其它形式的连接机构(未示出)互连。图9所示的图像处理装置的组件和结构只是示例性的,而非限定性的,根据需要,图像处理装置也可以具有其他组件和结构。
在本示例性实施例中,图像获取模块701,配置为获取输入图像。图像获取模块701可以包括存储器,其中存储有输入图像;或者,图像获取模块701可以包括一个或多个摄像头,以获取输入图像。例如,图像获取模块701可以为硬件、软件、固件以及它们的任意可行的组合。
在本示例性实施例中,图像处理模块702,配置为通过编码器网络对所述输入图像进行下采样和特征提取,得到多个特征图;通过解码器网络对所述多个特征图进行上采样和特征提取,得到目标分割图像。
本示例性实施例中,编码器网络和解码器网络分别包括多个处理层级,编码器网络中第L个处理层级处理得到的多个特征图和解码器网络中第J个处理层级处理得到的多个特征图融合后,输入到解码器网络中第J+1个处理层级,其中,所述编码器网络中第L个处理层级处理得到的多个特征图和解码器网络中第J个处理层级处理得到的多个特征图具有相同的分辨率;L、J均为正整数。
本示例性实施例中,编码器网络的多个处理层级中至少一个处理层级包括密集计算块,解码器网络的多个处理层级中至少一个处理层级包括密集计算块;编码器网络和解码器网络中的第M个密集计算块包括N个卷积模块,所述N个卷积模块中的第i个卷积模块的输入包括所述第i个卷积模块之前的i-1个卷积模块的输出;所述N个卷积模块中的至少一个卷积模块包括至 少一组非对称卷积核;i、N、M均为整数,M大于或等于1且小于或等于编码器网络和解码器网络中的密集计算块的总数量,N大于或等于3,i大于或等于3且小于或等于N。
本实施例提供的图像处理装置70中的图像处理模块702包括卷积神经网络,该卷积神经网络与上述图像处理方法的实施例中的卷积神经网络的结构和功能相同,故在此不再赘述。
在一示例性实施方式中,图像处理装置还可以包括训练模块,配置为训练卷积神经网络。其中,训练模块可以包括:损失计算单元和优化器。利用训练模块对卷积神经网络进行训练的过程可以参照上述图像处理方法的实施例中的相关描述,故于此不再赘述。
图10为本公开一实施例提供的图像处理设备的示意图。如图10所示,图像处理设备80包括:处理器801和存储器802;存储器802配置为存储程序指令;处理器801执行所述程序指令时实现上述任一实施例中图像处理方法的步骤。图10所示的图像处理设备80的组件只是示例性的,而非限制性的,根据实际应用需要,图像处理设备80还可以具有其他组件。例如,处理器801和存储器802之间可以直接或间接地互相通信。
例如,处理器801和存储器802等组件之间可以通过网络连接进行通信。网络可以包括无线网络、有线网络、或者、有线网络和无线网络的任意组合。网络可以包括局域网、互联网、电信网、基于互联网的物联网、基于电信网的物联网、以上网络的任意组合。有线网络例如可以采用双绞线、同轴电缆或光纤传输等方式进行通信,无线网络例如可以采用3G、4G、5G移动通信网络、蓝牙或WIFI等通信方式。本公开对网络的类型和功能在此不作限定。
例如,处理器801可以控制图像处理设备中的其它组件以执行期望的功能。处理器801可以是中央处理单元(CPU,Central Processing Unit)、张量处理器(TPU,Tensor Processing Unit)或者图像处理器(GPU,Graphics Processing Unit)等具有数据处理能力或程序执行能力的器件。GPU可以单独地直接集成到主板上,或内置在主板的北桥芯片中;或者,GPU可以内置在CPU上。
例如,存储器802可以包括一个或多个计算机程序产品的任意组合,计 算机程序产品可以包括多种形式的计算机可读存储介质,例如,易失性存储器、非易失性存储器。易失性存储器例如可以包括随机存取存储器(RAM,Random Access Memory)、高速缓冲存储器(Cache)等。非易失性存储器例如可以包括只读存储器(ROM,Read Only Memory)、硬盘、可擦除可编程只读存储器(EPROM,Erasable Programmable Read Only Memory)、光盘只读存储器(CD-ROM)、通用串行总线(USB,Universal Serial Bus)存储器、闪存等。在计算机可读存储介质中还可以存储一种或多种应用程序和一种或多种数据,例如,输入图像,以及应用程序使用或产生的一种或多种数据等。
例如,在存储器802上可以存储一个或多个计算机可读代码或程序指令,处理器可以运行程序指令,以执行上述图像处理方法。关于图像处理方法可以参考上述图像处理方法的实施例中的相关描述,故于此不再赘述。
本公开至少一实施例还提供一种计算机可读存储介质,存储有程序指令,当该程序指令被执行时可实现上述图像处理方法。
本领域普通技术人员可以理解,上文中所公开方法中的全部或某些步骤、系统、装置中的功能模块/单元可以被实施为软件、固件、硬件及其适当的组合。在硬件实施方式中,在以上描述中提及的功能模块/单元之间的划分不一定对应于物理组件的划分;例如,一个物理组件可以具有多个功能,或者一个功能或步骤可以由若干物理组件合作执行。某些组件或所有组件可以被实施为由处理器,如数字信号处理器或微处理器执行的软件,或者被实施为硬件,或者被实施为集成电路,如专用集成电路。这样的软件可以分布在计算机可读介质上,计算机可读介质可以包括计算机存储介质(或非暂时性介质)和通信介质(或暂时性介质)。如本领域普通技术人员公知的,术语计算机存储介质包括在用于存储信息(诸如计算机可读指令、数据结构、程序模块或其他数据)的任何方法或技术中实施的易失性和非易失性、可移除和不可移除介质。计算机存储介质包括但不限于RAM、ROM、EEPROM、闪存或其他存储器技术、CD-ROM、数字多功能盘(DVD)或其他光盘存储、磁盒、磁带、磁盘存储或其他磁存储装置、或者可以用于存储期望的信息并且可以被计算机访问的任何其他的介质。此外,本领域普通技术人员公知的是,通 信介质通常包含计算机可读指令、数据结构、程序模块或者诸如载波或其他传输机制之类的调制数据信号中的其他数据,并且可包括任何信息递送介质。
本领域的普通技术人员应当理解,可以对本公开实施例的技术方案进行修改或者等同替换,而不脱离本公开技术方案的精神和范围,均应涵盖在本公开的权利要求范围当中。

Claims (14)

  1. 一种图像处理方法,包括:
    获取输入图像;
    通过编码器网络对所述输入图像进行下采样和特征提取,得到多个特征图;
    通过解码器网络对所述多个特征图进行上采样和特征提取,得到目标分割图像;
    其中,所述编码器网络和解码器网络分别包括多个处理层级,所述编码器网络中第L个处理层级处理得到的多个特征图和解码器网络中第J个处理层级处理得到的多个特征图融合后,输入到所述解码器网络中第J+1个处理层级,其中,所述编码器网络中第L个处理层级处理得到的多个特征图和解码器网络中第J个处理层级处理得到的多个特征图具有相同的分辨率;L、J均为正整数;
    其中,所述编码器网络的多个处理层级中至少一个处理层级包括密集计算块,所述解码器网络的多个处理层级中至少一个处理层级包括密集计算块;所述编码器网络和解码器网络中的第M个密集计算块包括N个卷积模块,所述N个卷积模块中的第i个卷积模块的输入包括所述第i个卷积模块之前的i-1个卷积模块的输出;所述N个卷积模块中的至少一个卷积模块包括至少一组非对称卷积核;i、N、M均为整数,M大于或等于1且小于或等于所述编码器网络和解码器网络中的密集计算块的总数量,N大于或等于3,i大于或等于3且小于或等于N。
  2. 根据权利要求1所述的图像处理方法,所述N个卷积模块中的所述至少一个卷积模块还包括:由所述至少一组非对称卷积核膨胀得到的非对称卷积核。
  3. 根据权利要求1所述的图像处理方法,所述编码器网络中第L个处理层级处理得到的多个特征图和解码器网络中第J个处理层级处理得到的多个特征图融合后,输入到所述解码器网络中第J+1个处理层级,包括:
    所述编码器网络中第L个处理层级处理得到的多个特征图和解码器网络 中第J个处理层级处理得到的多个特征图在通道维度上拼接后,输入到所述解码器网络中第J+1个处理层级。
  4. 根据权利要求1所述的图像处理方法,其中,所述通过编码器网络对所述输入图像进行下采样和特征提取,得到多个特征图,包括:
    对所述输入图像进行下采样,得到具有第一分辨率的多个第一下采样特征图;
    对所述多个第一下采样特征图进行下采样,得到具有第二分辨率的多个第二下采样特征图;
    通过第一密集计算块对所述多个第二下采样特征图进行多层次的特征提取,得到具有第二分辨率的多个第一密集计算特征图;
    对所述多个第一密集计算特征图进行下采样,得到具有第三分辨率的多个第三下采样特征图;
    通过第二密集计算块对所述多个第三下采样特征图进行多层次的特征提取,得到具有第三分辨率的多个第二密集计算特征图;
    所述多个特征图包括所述多个第二密集计算特征图。
  5. 根据权利要求4所述的图像处理方法,其中,所述通过解码器网络对所述多个特征图进行上采样和特征提取,得到目标分割图像,包括:
    对所述多个第二密集计算特征图进行上采样,得到具有第二分辨率的多个第一上采样特征图;
    对所述多个第一上采样特征图和所述多个第二下采样特征图在通道维度上进行拼接,得到第一融合特征图组;
    对所述第一融合特征图组进行特征提取,得到具有第二分辨率的多个第一中间特征图;
    通过第三密集计算块对所述多个第一中间特征图进行多层次的特征提取,得到具有第二分辨率的多个第三密集计算特征图;
    对所述多个第三密集计算特征图和所述多个第一密集计算特征图在通道维度上进行拼接,得到第二融合特征图组;
    对所述第二融合特征图组进行特征提取,得到具有第二分辨率的多个第二中间特征图;
    对所述多个第二中间特征图进行上采样,得到具有第一分辨率的多个第二上采样特征图;
    对所述多个第二上采样特征图和所述多个第一下采样特征图在通道维度上进行拼接,得到第三融合特征图组;
    对所述第三融合特征图组进行特征提取,得到具有第二分辨率的多个第三中间特征图;
    对所述多个第三中间特征图进行上采样,得到与所述输入图像具有相同分辨率的目标分割图像。
  6. 根据权利要求5所述的图像处理方法,其中,所述第一密集计算块包括五个卷积模块,第二密集计算块包括八个卷积模块,所述第三密集计算块包括五个卷积模块;其中,所述第一密集计算块、第二密集计算块和第三密集计算块中的每个卷积模块包括1×1卷积核和两组非对称卷积核,第一组非对称卷积核为3×1卷积核和1×3卷积核,第二组非对称卷积核根据第一组非对称卷积核和对应的膨胀系数得到。
  7. 根据权利要求1所述的图像处理方法,其中,所述图像处理方法还包括:
    获取训练图像;
    使用卷积神经网络对训练图像进行处理,得到所述训练图像的训练分割图像;其中,所述卷积神经网络包括所述编码器网络和所述解码器网络;
    根据所述训练图像的训练分割图像和所述训练图像对应的标准分割图像,利用损失函数计算所述卷积神经网络的损失值;
    根据所述损失值优化所述卷积神经网络的参数。
  8. 根据权利要求7所述的图像处理方法,其中,所述损失函数表示为:L=w 1L edge+w 2L alpha+w 3L foreground
    其中,L edge为边缘损失函数、L alpha为抠图蒙版损失函数、L foreground为前景损失函数,w 1、w 2、w 3为权值。
  9. 根据权利要求8所述的图像处理方法,其中,所述边缘损失函数表示为:L edge=|G(A out)-G(A gt)|;
    其中,
    Figure PCTCN2020140781-appb-100001
    G x(A out)=K x×A out,G y(A out)=K y×A out
    Figure PCTCN2020140781-appb-100002
    G x(A gt)=K x×A gt,G y(A gt)=K y×A gt
    K x和K y为边缘检测算子,A out为训练分割图像,A gt为训练图像对应的标准分割图像。
  10. 根据权利要求8所述的图像处理方法,其中,所述抠图蒙版损失函数表示为:L alpha=|A out-A gt|;
    所述前景损失函数表示为:
    Figure PCTCN2020140781-appb-100003
    其中,A out为训练分割图像,A gt为训练图像对应的标准分割图像,I为训练图像;
    Figure PCTCN2020140781-appb-100004
    为训练分割图像A out的第i个像素,
    Figure PCTCN2020140781-appb-100005
    为标准分割图像A gt的第i个像素,I ij为训练图像I的第i个像素的第j个通道。
  11. 根据权利要求1至10中任一项所述的图像处理方法,其中,所述输入图像为人物图像,所述目标分割图像为所述人物图像中目标人物的抠图蒙版。
  12. 一种图像处理装置,其中,包括:
    图像获取模块,配置为获取输入图像;
    图像处理模块,配置为通过编码器网络对所述输入图像进行下采样和特征提取,得到多个特征图;通过解码器网络对所述多个特征图进行上采样和特征提取,得到目标分割图像;
    其中,所述编码器网络和解码器网络分别包括多个处理层级,所述编码器网络中第L个处理层级处理得到的多个特征图和解码器网络中第J个处理层级处理得到的多个特征图融合后,输入到所述解码器网络中第J+1个处理层级,其中,所述编码器网络中第L个处理层级处理得到的多个特征图和解码器网络中第J个处理层级处理得到的多个特征图具有相同的分辨率;L、J均为正整数;
    其中,所述编码器网络的多个处理层级中至少一个处理层级包括密集计 算块,所述解码器网络的多个处理层级中至少一个处理层级包括密集计算块;所述编码器网络和解码器网络中的第M个密集计算块包括N个卷积模块,所述N个卷积模块中的第i个卷积模块包括所述第i个卷积模块之前的i-1个卷积模块的输出;所述N个卷积模块中的至少一个卷积模块包括至少一组非对称卷积核;i、N、M均为整数,M大于或等于1且小于或等于所述编码器网络和解码器网络中的密集计算块的总数量,N大于或等于3,i大于或等于3且小于或等于N。
  13. 一种图像处理设备,其中,包括:存储器和处理器,所述存储器配置为存储程序指令,所述处理器执行所述程序指令时实现如权利要求1至11中任一项所述的图像处理方法的步骤。
  14. 一种计算机可读存储介质,其中,存储有程序指令,当所述程序指令被执行时实现如权利要求1至11中任一项所述的图像处理方法。
PCT/CN2020/140781 2020-02-21 2020-12-29 图像处理方法、图像处理装置及设备 WO2021164429A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/419,726 US20220319155A1 (en) 2020-02-21 2020-12-29 Image Processing Method, Image Processing Apparatus, and Device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010110386.7 2020-02-21
CN202010110386.7A CN111311629B (zh) 2020-02-21 2020-02-21 图像处理方法、图像处理装置及设备

Publications (1)

Publication Number Publication Date
WO2021164429A1 true WO2021164429A1 (zh) 2021-08-26

Family

ID=71160080

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/140781 WO2021164429A1 (zh) 2020-02-21 2020-12-29 图像处理方法、图像处理装置及设备

Country Status (3)

Country Link
US (1) US20220319155A1 (zh)
CN (1) CN111311629B (zh)
WO (1) WO2021164429A1 (zh)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113705548A (zh) * 2021-10-29 2021-11-26 北京世纪好未来教育科技有限公司 题目类型识别方法和装置
CN113887470A (zh) * 2021-10-15 2022-01-04 浙江大学 基于多任务注意力机制的高分辨率遥感图像地物提取方法
CN114022496A (zh) * 2021-09-26 2022-02-08 天翼爱音乐文化科技有限公司 图像处理方法、系统、装置及存储介质
CN114596580A (zh) * 2022-02-14 2022-06-07 南方科技大学 一种多人体目标识别方法、系统、设备及介质
CN114612683A (zh) * 2022-03-21 2022-06-10 安徽理工大学 基于密集多尺度推理网络的显著性目标检测算法
CN114677392A (zh) * 2022-05-27 2022-06-28 珠海视熙科技有限公司 抠图方法、摄像设备、装置、会议系统、电子设备及介质
CN114723760A (zh) * 2022-05-19 2022-07-08 北京世纪好未来教育科技有限公司 人像分割模型的训练方法、装置及人像分割方法、装置
CN114782708A (zh) * 2022-05-12 2022-07-22 北京百度网讯科技有限公司 图像生成方法、图像生成模型的训练方法、装置和设备
CN114897700A (zh) * 2022-05-25 2022-08-12 马上消费金融股份有限公司 图像优化方法、校正模型的训练方法及装置
CN116228763A (zh) * 2023-05-08 2023-06-06 成都睿瞳科技有限责任公司 用于眼镜打印的图像处理方法及系统
CN116342675A (zh) * 2023-05-29 2023-06-27 南昌航空大学 一种实时单目深度估计方法、系统、电子设备及存储介质
WO2023159746A1 (zh) * 2022-02-23 2023-08-31 平安科技(深圳)有限公司 基于图像分割的图像抠图方法、装置、计算机设备及介质
CN117474925A (zh) * 2023-12-28 2024-01-30 山东润通齿轮集团有限公司 一种基于机器视觉的齿轮点蚀检测方法及系统

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111311629B (zh) * 2020-02-21 2023-12-01 京东方科技集团股份有限公司 图像处理方法、图像处理装置及设备
JP7446903B2 (ja) * 2020-04-23 2024-03-11 株式会社日立製作所 画像処理装置、画像処理方法及び画像処理システム
CN111914997B (zh) * 2020-06-30 2024-04-02 华为技术有限公司 训练神经网络的方法、图像处理方法及装置
CN112053363B (zh) * 2020-08-19 2023-12-15 苏州超云生命智能产业研究院有限公司 视网膜血管分割方法、装置及模型构建方法
CN111968122B (zh) * 2020-08-27 2023-07-28 广东工业大学 一种基于卷积神经网络的纺织材料ct图像分割方法和装置
CN112056993B (zh) * 2020-09-07 2022-05-17 上海高仙自动化科技发展有限公司 一种清洁方法、装置、电子设备及计算机可读存储介质
CN112215243A (zh) * 2020-10-30 2021-01-12 百度(中国)有限公司 图像特征提取方法、装置、设备及存储介质
CN112560864B (zh) * 2020-12-22 2024-06-18 苏州超云生命智能产业研究院有限公司 图像语义分割方法、装置及图像语义分割模型的训练方法
CN112866694B (zh) * 2020-12-31 2023-07-14 杭州电子科技大学 联合非对称卷积块和条件上下文的智能图像压缩优化方法
CN112767502B (zh) * 2021-01-08 2023-04-07 广东中科天机医疗装备有限公司 基于医学影像模型的影像处理方法及装置
CN112784897B (zh) * 2021-01-20 2024-03-26 北京百度网讯科技有限公司 图像处理方法、装置、设备和存储介质
CN112949651A (zh) * 2021-01-29 2021-06-11 Oppo广东移动通信有限公司 特征提取方法、装置、存储介质及电子设备
CN112561802B (zh) * 2021-02-20 2021-05-25 杭州太美星程医药科技有限公司 连续序列图像的插值方法、插值模型训练方法及其系统
WO2022205685A1 (zh) * 2021-03-29 2022-10-06 泉州装备制造研究所 一种基于轻量化网络的交通标志识别方法
CN115221102B (zh) * 2021-04-16 2024-01-19 中科寒武纪科技股份有限公司 用于优化片上系统的卷积运算操作的方法和相关产品
CN113034648A (zh) * 2021-04-30 2021-06-25 北京字节跳动网络技术有限公司 图像处理方法、装置、设备和存储介质
CN113723480B (zh) * 2021-08-18 2024-03-05 北京达佳互联信息技术有限公司 一种图像处理方法、装置、电子设备和存储介质
CN113920358A (zh) * 2021-09-18 2022-01-11 广州番禺职业技术学院 一种小麦病害的自动分类方法、装置及系统
CN114040140B (zh) * 2021-11-15 2024-04-12 北京医百科技有限公司 一种视频抠图方法、装置、系统及存储介质
CN116188479B (zh) * 2023-02-21 2024-04-02 北京长木谷医疗科技股份有限公司 基于深度学习的髋关节图像分割方法及系统
CN115953394B (zh) * 2023-03-10 2023-06-23 中国石油大学(华东) 基于目标分割的海洋中尺度涡检测方法及系统
CN116363364B (zh) * 2023-03-27 2023-09-26 南通大学 一种基于改进DSD-LinkNet的电力安全带分割方法
CN116206114B (zh) * 2023-04-28 2023-08-01 成都云栈科技有限公司 一种复杂背景下人像提取方法及装置
CN116912257B (zh) * 2023-09-14 2023-12-29 东莞理工学院 基于深度学习的混凝土路面裂缝识别方法及存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109448006A (zh) * 2018-11-01 2019-03-08 江西理工大学 一种注意力机制u型密集连接视网膜血管分割方法
CN109727253A (zh) * 2018-11-14 2019-05-07 西安大数据与人工智能研究院 基于深度卷积神经网络自动分割肺结节的辅助检测方法
US20190221011A1 (en) * 2018-01-12 2019-07-18 Korea Advanced Institute Of Science And Technology Method for processing x-ray computed tomography image using neural network and apparatus therefor
CN110197516A (zh) * 2019-05-29 2019-09-03 浙江明峰智能医疗科技有限公司 一种基于深度学习的tof-pet散射校正方法
CN110689544A (zh) * 2019-09-06 2020-01-14 哈尔滨工程大学 一种遥感图像细弱目标分割方法
CN111311629A (zh) * 2020-02-21 2020-06-19 京东方科技集团股份有限公司 图像处理方法、图像处理装置及设备

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110188760B (zh) * 2019-04-01 2021-10-22 上海卫莎网络科技有限公司 一种图像处理模型训练方法、图像处理方法及电子设备
CN111369581B (zh) * 2020-02-18 2023-08-08 Oppo广东移动通信有限公司 图像处理方法、装置、设备及存储介质

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190221011A1 (en) * 2018-01-12 2019-07-18 Korea Advanced Institute Of Science And Technology Method for processing x-ray computed tomography image using neural network and apparatus therefor
CN109448006A (zh) * 2018-11-01 2019-03-08 江西理工大学 一种注意力机制u型密集连接视网膜血管分割方法
CN109727253A (zh) * 2018-11-14 2019-05-07 西安大数据与人工智能研究院 基于深度卷积神经网络自动分割肺结节的辅助检测方法
CN110197516A (zh) * 2019-05-29 2019-09-03 浙江明峰智能医疗科技有限公司 一种基于深度学习的tof-pet散射校正方法
CN110689544A (zh) * 2019-09-06 2020-01-14 哈尔滨工程大学 一种遥感图像细弱目标分割方法
CN111311629A (zh) * 2020-02-21 2020-06-19 京东方科技集团股份有限公司 图像处理方法、图像处理装置及设备

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114022496A (zh) * 2021-09-26 2022-02-08 天翼爱音乐文化科技有限公司 图像处理方法、系统、装置及存储介质
CN113887470A (zh) * 2021-10-15 2022-01-04 浙江大学 基于多任务注意力机制的高分辨率遥感图像地物提取方法
CN113705548A (zh) * 2021-10-29 2021-11-26 北京世纪好未来教育科技有限公司 题目类型识别方法和装置
CN113705548B (zh) * 2021-10-29 2022-02-08 北京世纪好未来教育科技有限公司 题目类型识别方法和装置
CN114596580A (zh) * 2022-02-14 2022-06-07 南方科技大学 一种多人体目标识别方法、系统、设备及介质
CN114596580B (zh) * 2022-02-14 2024-05-14 南方科技大学 一种多人体目标识别方法、系统、设备及介质
WO2023159746A1 (zh) * 2022-02-23 2023-08-31 平安科技(深圳)有限公司 基于图像分割的图像抠图方法、装置、计算机设备及介质
CN114612683A (zh) * 2022-03-21 2022-06-10 安徽理工大学 基于密集多尺度推理网络的显著性目标检测算法
CN114782708A (zh) * 2022-05-12 2022-07-22 北京百度网讯科技有限公司 图像生成方法、图像生成模型的训练方法、装置和设备
CN114782708B (zh) * 2022-05-12 2024-04-16 北京百度网讯科技有限公司 图像生成方法、图像生成模型的训练方法、装置和设备
CN114723760A (zh) * 2022-05-19 2022-07-08 北京世纪好未来教育科技有限公司 人像分割模型的训练方法、装置及人像分割方法、装置
CN114897700A (zh) * 2022-05-25 2022-08-12 马上消费金融股份有限公司 图像优化方法、校正模型的训练方法及装置
CN114677392A (zh) * 2022-05-27 2022-06-28 珠海视熙科技有限公司 抠图方法、摄像设备、装置、会议系统、电子设备及介质
CN116228763A (zh) * 2023-05-08 2023-06-06 成都睿瞳科技有限责任公司 用于眼镜打印的图像处理方法及系统
CN116228763B (zh) * 2023-05-08 2023-07-21 成都睿瞳科技有限责任公司 用于眼镜打印的图像处理方法及系统
CN116342675A (zh) * 2023-05-29 2023-06-27 南昌航空大学 一种实时单目深度估计方法、系统、电子设备及存储介质
CN116342675B (zh) * 2023-05-29 2023-08-11 南昌航空大学 一种实时单目深度估计方法、系统、电子设备及存储介质
CN117474925A (zh) * 2023-12-28 2024-01-30 山东润通齿轮集团有限公司 一种基于机器视觉的齿轮点蚀检测方法及系统
CN117474925B (zh) * 2023-12-28 2024-03-15 山东润通齿轮集团有限公司 一种基于机器视觉的齿轮点蚀检测方法及系统

Also Published As

Publication number Publication date
US20220319155A1 (en) 2022-10-06
CN111311629B (zh) 2023-12-01
CN111311629A (zh) 2020-06-19

Similar Documents

Publication Publication Date Title
WO2021164429A1 (zh) 图像处理方法、图像处理装置及设备
WO2020177651A1 (zh) 图像分割方法和图像处理装置
WO2021073493A1 (zh) 图像处理方法及装置、神经网络的训练方法、合并神经网络模型的图像处理方法、合并神经网络模型的构建方法、神经网络处理器及存储介质
WO2021043273A1 (zh) 图像增强方法和装置
EP3923233A1 (en) Image denoising method and apparatus
CN112308200B (zh) 神经网络的搜索方法及装置
WO2021164731A1 (zh) 图像增强方法以及图像增强装置
WO2021164234A1 (zh) 图像处理方法以及图像处理装置
US20190244362A1 (en) Differentiable Jaccard Loss Approximation for Training an Artificial Neural Network
WO2022134971A1 (zh) 一种降噪模型的训练方法及相关装置
CN111914997B (zh) 训练神经网络的方法、图像处理方法及装置
CN112446380A (zh) 图像处理方法和装置
CN113095470B (zh) 神经网络的训练方法、图像处理方法及装置、存储介质
CN112446835B (zh) 图像恢复方法、图像恢复网络训练方法、装置和存储介质
US20220157046A1 (en) Image Classification Method And Apparatus
WO2022100490A1 (en) Methods and systems for deblurring blurry images
CN113450290A (zh) 基于图像修补技术的低照度图像增强方法及系统
CN113096023B (zh) 神经网络的训练方法、图像处理方法及装置、存储介质
CN113284055A (zh) 一种图像处理的方法以及装置
CN114049314A (zh) 一种基于特征重排和门控轴向注意力的医学图像分割方法
CN113076966B (zh) 图像处理方法及装置、神经网络的训练方法、存储介质
CN111724309B (zh) 图像处理方法及装置、神经网络的训练方法、存储介质
Shi et al. LCA-Net: A Context-Aware Light-Weight Network For Low-Illumination Image Enhancement
CN114862685A (zh) 一种图像降噪方法、及图像降噪模组
Xu et al. Underwater image enhancement method based on a cross attention mechanism

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20919527

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20919527

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 20919527

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 30.03.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20919527

Country of ref document: EP

Kind code of ref document: A1