WO2021164429A1 - Image processing method, image processing device and equipment
- Publication number
- WO2021164429A1 (PCT/CN2020/140781)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- feature maps
- image
- network
- convolution
- sampling
- Prior art date
Classifications
- G06V10/7715—Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
- G06T7/194—Segmentation; Edge detection involving foreground-background segmentation
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06F18/253—Fusion techniques of extracted features
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
- G06T3/4046—Scaling of whole images or parts thereof, e.g. expanding or contracting, using neural networks
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/82—Arrangements for image or video recognition or understanding using neural networks
- G06T2207/10024—Color image
- G06T2207/20016—Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
- G06T2207/20081—Training; Learning
- G06T2207/20084—Artificial neural networks [ANN]
- G06T2207/30196—Human being; Person
Definitions
- the present disclosure relates to an image processing method, image processing device and equipment.
- Image matting is a research direction in the field of image processing and computer vision.
- the foreground and background in the image can be separated by matting.
- the result of matting can have multiple applications, such as background replacement, ID photo generation, virtual group photo generation, virtual scenery, background blur, etc.
- the present disclosure provides an image processing method, image processing device and equipment.
- the present disclosure provides an image processing method, including: acquiring an input image; performing down-sampling and feature extraction on the input image through an encoder network to obtain multiple feature maps; and performing up-sampling and feature extraction on the multiple feature maps through a decoder network to obtain a target segmentation image. The encoder network and the decoder network each include multiple processing levels. The multiple feature maps processed by the L-th processing level in the encoder network and the multiple feature maps processed by the J-th processing level in the decoder network are fused and then input to the (J+1)-th processing level in the decoder network; these two groups of feature maps have the same resolution, and L and J are both positive integers. At least one of the multiple processing levels of the encoder network includes a dense calculation block, and at least one of the multiple processing levels of the decoder network includes a dense calculation block. The M-th dense calculation block in the encoder network and the decoder network includes N convolution modules, where the input of the i-th of the N convolution modules includes the outputs of the i-1 convolution modules before it, and at least one of the N convolution modules includes at least one set of asymmetric convolution kernels. Here i, N, and M are all integers; M is greater than or equal to 1 and less than or equal to the total number of dense calculation blocks in the encoder network and the decoder network; N is greater than or equal to 3; and i is greater than or equal to 3 and less than or equal to N.
- the present disclosure provides an image processing device, including: an image acquisition module configured to acquire an input image; and an image processing module configured to perform down-sampling and feature extraction on the input image through an encoder network to obtain multiple feature maps, and to perform up-sampling and feature extraction on the multiple feature maps through a decoder network to obtain a target segmentation image. The encoder network and the decoder network each include multiple processing levels. The multiple feature maps processed by the L-th processing level in the encoder network and the multiple feature maps processed by the J-th processing level in the decoder network are fused and then input to the (J+1)-th processing level in the decoder network; these two groups of feature maps have the same resolution, and L and J are both positive integers. At least one of the multiple processing levels of the encoder network includes a dense calculation block, and at least one of the multiple processing levels of the decoder network includes a dense calculation block. The M-th dense calculation block in the encoder network and the decoder network includes N convolution modules, where the input of the i-th of the N convolution modules includes the outputs of the i-1 convolution modules before it, and at least one of the N convolution modules includes at least one set of asymmetric convolution kernels. Here i, N, and M are all integers; M is greater than or equal to 1 and less than or equal to the total number of dense calculation blocks in the encoder network and the decoder network; N is greater than or equal to 3; and i is greater than or equal to 3 and less than or equal to N.
- the present disclosure provides an image processing device including a memory and a processor, where the memory is configured to store program instructions and the processor is configured to execute the program instructions to implement the steps of the image processing method described above.
- the present disclosure provides a computer-readable storage medium that stores program instructions, and when the program instructions are executed, the steps of the image processing method described above are implemented.
- FIG. 1 is a flowchart of an image processing method provided by an embodiment of the present disclosure
- FIG. 2 is an example diagram of an input image and a generated target segmented image according to an embodiment of the present disclosure
- FIG. 3 is an example diagram of the application effect of the target segmented image shown in FIG. 2;
- FIG. 4 is an example diagram of an encoder network and a decoder network provided by an embodiment of the present disclosure
- FIG. 5 is an exemplary diagram of the first dense calculation block provided by an embodiment of the present disclosure.
- FIG. 6 is an example diagram of the first convolution module provided by an embodiment of the present disclosure.
- FIG. 7 is an example diagram of a training process of a convolutional neural network provided by an embodiment of the disclosure.
- FIG. 8 is a schematic diagram of training a convolutional neural network provided by an embodiment of the present disclosure.
- FIG. 9 is a schematic diagram of an image processing device provided by an embodiment of the disclosure.
- FIG. 10 is a schematic diagram of an image processing device provided by an embodiment of the disclosure.
- the specification may have presented the method or process as a specific sequence of steps. However, to the extent that the method or process does not depend on the specific order of the steps described herein, it should not be limited to that order. As those of ordinary skill in the art will understand, other sequences of steps are also possible. Therefore, the specific order of the steps set forth in the specification should not be construed as a limitation on the claims. In addition, the claims for the method or process should not be limited to performing their steps in the written order; those skilled in the art can easily understand that these orders can be changed while still remaining within the spirit and scope of the embodiments of the present disclosure.
- A convolutional neural network (CNN) is a neural network structure that uses, for example, images as input and output, and replaces scalar weights with filters (convolution kernels).
- the convolution process can be seen as using a trainable filter to convolve an input image or convolution feature map, and output a convolution feature plane.
- the convolution feature plane can also be called a feature map.
- the convolutional layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network. In the convolutional layer of a convolutional neural network, a neuron is only connected to some of the neurons in the adjacent layer.
- the convolutional layer can apply several convolution kernels to the input image to extract multiple types of features of the input image. Each convolution kernel can extract one type of feature.
- the convolution kernel is generally initialized as a matrix of random values. During the training process of the convolutional neural network, the convolution kernel learns reasonable weights. In the same convolutional layer, multiple convolution kernels can be used to extract different image information.
- the embodiments of the present disclosure provide an image processing method, an image processing device, and equipment, which can use a convolutional neural network to process an input image to automatically generate a target segmentation image.
- the convolutional neural network provided by the embodiments of the present disclosure combines dense calculation blocks with asymmetric convolution kernels and an encoder-decoder network with skip connections, which can improve the matting effect and processing speed, reduce the computation time, support real-time automatic matting of the input image, and offer better and broader application prospects.
- Fig. 1 is a flowchart of an image processing method provided by an embodiment of the present disclosure. As shown in FIG. 1, the image processing method provided by the embodiment of the present disclosure includes the following steps:
- Step 101: Obtain an input image.
- Step 102: Perform down-sampling and feature extraction on the input image through the encoder network to obtain multiple feature maps.
- Step 103: Perform up-sampling and feature extraction on the multiple feature maps through the decoder network to obtain the target segmented image.
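- To make the three steps concrete, the following is a minimal PyTorch-style sketch of the overall pipeline. It is an illustrative skeleton only, not the patented implementation: the encoder and decoder arguments are placeholders for the networks detailed below, and all names are assumptions.

```python
import torch
import torch.nn as nn

class MattingNet(nn.Module):
    """Illustrative skeleton of steps 101-103 (names are assumptions)."""
    def __init__(self, encoder: nn.Module, decoder: nn.Module):
        super().__init__()
        self.encoder = encoder  # down-sampling + feature extraction
        self.decoder = decoder  # up-sampling + feature extraction

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        feature_maps = self.encoder(image)  # step 102
        return self.decoder(feature_maps)   # step 103
```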
- the image processing method provided in this embodiment is used to separate the target object in the input image from the background, and the target segmented image may be a matting mask of the target object.
- the target object may be a portrait in the input image, or may be a preset detection object (for example, an animal, a building, etc.).
- this disclosure is not limited to this.
- the input image may include an image of a person.
- the input image may be a person image captured by an image acquisition device such as a digital camera or a mobile phone; however, the present disclosure is not limited to this.
- Fig. 2 is an example diagram of an input image and a generated target segmented image according to an embodiment of the present disclosure.
- the convolutional neural network includes an encoder network and a decoder network.
- the target segmented image is a matting mask of the target person in the input image.
- this disclosure is not limited to this.
- the target segmented image may be an image of the target person extracted from the input image.
- the input image may be a grayscale image, or a color image such as an RGB image.
- FIG. 3 is an example diagram of the application effect of the target segmented image shown in FIG. 2.
- the target segmented image (the matting mask of the target person shown in FIG. 2) obtained by the image processing method provided by the embodiment of the present disclosure can be used to cut out the human body area in the input image (the first image in the first row of FIG. 3) and then composite it into other natural scenes that do not contain a human body, realizing background replacement; for example, the fourth image in the first row of FIG. 3 is an example of the effect obtained after background replacement of the input image.
- the target segmented image (the matting mask shown in FIG. 2) can also be used to cut out the human body area in the input image (the first image in the first row of FIG. 3) and composite it into other images containing human bodies and natural scenes to achieve a virtual group photo; in FIG. 3, the remaining images (all except the first and fourth images in the first row) are examples of the effect of taking a virtual group photo with the input image.
- the convolutional neural network includes an encoder network and a decoder network, where the encoder network is configured to perform down-sampling and feature extraction on an input image to obtain multiple feature maps, and the decoder network is configured to perform up-sampling and feature extraction on the multiple feature maps to obtain the target segmentation image.
- the encoder network and the decoder network each include multiple processing levels. The multiple feature maps processed by the L-th processing level in the encoder network and the multiple feature maps processed by the J-th processing level in the decoder network are fused and then input to the (J+1)-th processing level in the decoder network; these two groups of feature maps have the same resolution, and L and J are both positive integers.
- in the encoder network, each down-sampling operation and each feature extraction operation can be regarded as a processing level; in the decoder network, each up-sampling operation and each feature extraction operation can likewise be regarded as a processing level.
- the value of L may be one or more, and the value of J may be one or more.
- for example, L and J may each take three values. If L is 1 and J is 5, the multiple feature maps processed by the first processing level in the encoder network and the multiple feature maps processed by the fifth processing level in the decoder network have the same resolution, and the feature maps from these two processing levels are fused and input to the sixth processing level in the decoder network. If L is 2 and J is 1, the multiple feature maps processed by the second processing level in the encoder network and the multiple feature maps processed by the first processing level in the decoder network have the same resolution, and the feature maps from these two processing levels are fused and input to the second processing level in the decoder network. If L is 3 and J is 3, the multiple feature maps processed by the third processing level in the encoder network and the multiple feature maps processed by the third processing level in the decoder network have the same resolution, and the feature maps from these two processing levels are fused and input to the fourth processing level in the decoder network.
- fusing the multiple feature maps processed by the L-th processing level in the encoder network with the multiple feature maps processed by the J-th processing level in the decoder network and then inputting the result to the (J+1)-th processing level in the decoder network realizes a skip connection between the encoder network and the decoder network: processing levels in the two networks that produce feature maps of the same resolution are connected, and the feature maps obtained by the two processing levels are fused and input to the next processing level in the decoder network. The skip connection between the encoder network and the decoder network improves the decoder network's preservation of image details, thereby improving the accuracy of the matting result.
- the fusing and inputting may include: splicing the multiple feature maps processed by the L-th processing level in the encoder network and the multiple feature maps processed by the J-th processing level in the decoder network in the channel dimension, and inputting the result to the (J+1)-th processing level in the decoder network.
- the multiple feature maps processed by the L-th processing level in the encoder network and the multiple feature maps processed by the J-th processing level in the decoder network are merged through a concat operation.
- this disclosure is not limited to this.
- the multiple feature maps processed by the Lth processing level in the encoder network and the multiple feature maps processed by the Jth processing level in the decoder network can be processed through operations such as Add or multiply.
- Feature maps are fused.
- by fusing feature maps of the same size and resolution obtained by the encoder network and the decoder network and inputting the result into the decoder network, the image details and information lost during the encoder network's down-sampling process can be transferred to the decoder network. The decoder network can use this information to generate a more accurate target segmentation image, thereby improving the matting effect.
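- As a minimal sketch of this fusion step (assuming PyTorch and NCHW tensors; the text itself only specifies a channel-dimension concat, and the channel counts here are illustrative):

```python
import torch

# Feature maps from the two skip-connected processing levels; they must
# share the same spatial resolution (here 64x64) for the concat to work.
enc_feats = torch.randn(1, 60, 64, 64)  # from encoder level L
dec_feats = torch.randn(1, 60, 64, 64)  # from decoder level J

# Channel-dimension splice (concat along dim=1); the fused group with
# 60 + 60 = 120 channels is then fed to decoder level J+1.
fused = torch.cat([enc_feats, dec_feats], dim=1)
print(fused.shape)  # torch.Size([1, 120, 64, 64])
```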
- exemplarily, the L-th processing level in the encoder network and the J-th processing level in the decoder network perform corresponding processing. If the multiple feature maps processed by the L-th processing level in the encoder network and the multiple feature maps processed by the J-th processing level in the decoder network have the same resolution, the feature maps from these two processing levels can be fused and input to the (J+1)-th processing level in the decoder network. The corresponding processing may be, for example: the L-th processing level in the encoder network performs down-sampling and the J-th processing level in the decoder network performs up-sampling; or the L-th processing level in the encoder network performs multi-level feature extraction and the J-th processing level in the decoder network also performs multi-level feature extraction.
- this disclosure is not limited to this.
- fusing feature maps of the same resolution obtained by corresponding processing levels in the encoder network and the decoder network before inputting them to the decoder network improves how well the fused feature maps preserve image details, and improves the accuracy of the target segmentation image the decoder network obtains from the fused feature maps, thereby improving the matting result.
- At least one of the multiple processing levels of the encoder network includes a dense calculation block, and at least one of the multiple processing levels of the decoder network includes a dense calculation block. The M-th dense calculation block in the encoder network and the decoder network includes N convolution modules, and the input of the i-th of the N convolution modules includes the outputs of the i-1 convolution modules before it. At least one of the N convolution modules includes at least one set of asymmetric convolution kernels. Here i, N, and M are all integers; M is greater than or equal to 1 and less than or equal to the total number of dense calculation blocks in the encoder network and the decoder network; N is greater than or equal to 3; and i is greater than or equal to 3 and less than or equal to N.
- all the convolution modules in the N convolution modules include at least one set of asymmetric convolution kernels, or only part of the convolution modules in the N convolution modules include at least one set of asymmetric convolution kernels.
- this disclosure is not limited to this.
- any dense calculation block may include N convolution modules, and the number of convolution modules included in different dense calculation blocks may be the same or different.
- the first dense calculation block may include five convolution modules, the second dense calculation block may include eight convolution modules, and the third dense calculation block may include five convolution modules.
- any dense calculation block is configured to perform multi-level feature extraction, and one dense calculation block corresponds to one processing level.
- the order of the multiple dense calculation blocks can be determined according to the order of the processing level of the encoder network and the processing level of the decoder network.
- for example, the encoder network includes two dense calculation blocks (corresponding to the third and fifth processing levels in the encoder network), and the decoder network includes one dense calculation block (corresponding to the third processing level in the decoder network). In this case, the dense calculation block corresponding to the third processing level in the encoder network may be marked as the first dense calculation block, the dense calculation block corresponding to the fifth processing level as the second dense calculation block, and the dense calculation block corresponding to the third processing level in the decoder network as the third dense calculation block.
- this disclosure is not limited to this.
- the dense calculation block may be an Effective Dense Asymmetric (EDA) block, i.e., an efficient dense calculation block with asymmetric convolution.
- a dense calculation block includes multiple convolution modules; except for the first convolution module, the input of each convolution module includes the outputs of all convolution modules before it, so that the convolution modules in the dense calculation block form dense connections.
- Using dense calculation blocks for feature extraction can greatly reduce the number of parameters and the amount of calculation, increase the processing speed, and better resist over-fitting.
- At least one convolution module in the dense calculation block of this embodiment includes one or more sets of asymmetric convolution kernels. Performing feature extraction with asymmetric convolution kernels greatly reduces the amount of calculation, thereby increasing the processing speed.
- among the N convolution modules, a convolution module that includes at least one set of asymmetric convolution kernels may include an asymmetric convolution kernel obtained by dilating at least one set of asymmetric convolution kernels. For example, a convolution module may include two sets of asymmetric convolution kernels, where the second set is obtained by dilating the first set; the first set may be a 3×1 convolution kernel and a 1×3 convolution kernel. A dilated asymmetric convolution kernel is obtained by applying a dilation operation to an asymmetric convolution kernel. Using dilated asymmetric convolution kernels not only enlarges the receptive field but also reduces the loss of spatial information during image processing, and allows the densely connected convolution modules to generate feature maps of consistent resolution.
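- A minimal sketch of one such pair of asymmetric convolution groups (assuming PyTorch; the 3×1/1×3 kernel sizes follow the example above, while the channel count and dilation value are illustrative). Note the saving: a 3×3 kernel has 9 weights per input-output channel pair, while the factorized 3×1 + 1×3 pair has 6.

```python
import torch.nn as nn

channels = 40  # illustrative channel count
d = 2          # illustrative dilation coefficient for the second group

# First group: plain asymmetric pair (3x1 followed by 1x3); padding keeps
# the spatial resolution unchanged.
first_group = nn.Sequential(
    nn.Conv2d(channels, channels, kernel_size=(3, 1), padding=(1, 0)),
    nn.Conv2d(channels, channels, kernel_size=(1, 3), padding=(0, 1)),
)

# Second group: the same pair dilated by d; padding grows with the dilation
# so densely connected modules keep producing same-resolution feature maps.
second_group = nn.Sequential(
    nn.Conv2d(channels, channels, kernel_size=(3, 1),
              padding=(d, 0), dilation=(d, 1)),
    nn.Conv2d(channels, channels, kernel_size=(1, 3),
              padding=(0, d), dilation=(1, d)),
)
```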
- step 102 may include: down-sampling the input image to obtain multiple first down-sampled feature maps with a first resolution; down-sampling the multiple first down-sampled feature maps to obtain multiple second down-sampled feature maps with a second resolution; performing multi-level feature extraction on the multiple second down-sampled feature maps through the first dense calculation block to obtain multiple first densely-calculated feature maps with the second resolution; down-sampling the multiple first densely-calculated feature maps to obtain multiple third down-sampled feature maps with a third resolution; and performing multi-level feature extraction on the multiple third down-sampled feature maps through the second dense calculation block to obtain multiple second densely-calculated feature maps with the third resolution.
- step 103 may include: up-sampling the multiple second densely-calculated feature maps to obtain multiple first up-sampled feature maps with the second resolution; splicing the multiple first up-sampled feature maps and the multiple second down-sampled feature maps in the channel dimension to obtain a first fusion feature map group; performing feature extraction on the first fusion feature map group to obtain multiple first intermediate feature maps with the second resolution; performing multi-level feature extraction on the multiple first intermediate feature maps through the third dense calculation block to obtain multiple third densely-calculated feature maps with the second resolution; splicing the multiple third densely-calculated feature maps and the multiple first densely-calculated feature maps in the channel dimension to obtain a second fusion feature map group; performing feature extraction on the second fusion feature map group to obtain multiple second intermediate feature maps with the second resolution; up-sampling the multiple second intermediate feature maps to obtain multiple second up-sampled feature maps with the first resolution; splicing the multiple second up-sampled feature maps and the multiple first down-sampled feature maps in the channel dimension to obtain a third fusion feature map group; performing feature extraction on the third fusion feature map group to obtain multiple third intermediate feature maps; and up-sampling the multiple third intermediate feature maps to obtain the target segmented image with the same resolution as the input image.
- the first dense calculation block may include five convolution modules, the second dense calculation block may include eight convolution modules, and the third dense calculation block may include five convolution modules.
- each convolution module in the first dense calculation block, the second dense calculation block, and the third dense calculation block includes a 1×1 convolution kernel and two groups of asymmetric convolution kernels: the first group is a 3×1 convolution kernel and a 1×3 convolution kernel, and the second group is obtained from the first group according to the corresponding dilation coefficient.
- Fig. 4 is an exemplary diagram of an encoder network and a decoder network provided by an embodiment of the disclosure.
- the convolutional neural network can matte a color person image (for example, the input image shown in FIG. 2) to obtain a black-and-white matting mask (for example, the target segmentation image shown in FIG. 2).
- the encoder network 201 includes: a first down-sampling block 301, a second down-sampling block 302, a first dense calculation block 303, a third down-sampling block 304, and a second dense calculation block 305.
- the second down-sampling block 302 is located between the first down-sampling block 301 and the first dense calculation block 303, and the third down-sampling block 304 is located between the first dense calculation block 303 and the second dense calculation block 305.
- the first down-sampling block 301 is configured to down-sample the input image to obtain multiple first down-sampled feature maps with a first resolution. The second down-sampling block 302 is configured to down-sample the multiple first down-sampled feature maps to obtain multiple second down-sampled feature maps with a second resolution. The first dense calculation block 303 is configured to perform multi-level feature extraction on the multiple second down-sampled feature maps to obtain multiple first densely-calculated feature maps with the second resolution. The third down-sampling block 304 is configured to down-sample the multiple first densely-calculated feature maps to obtain multiple third down-sampled feature maps with a third resolution. The second dense calculation block 305 is configured to perform multi-level feature extraction on the multiple third down-sampled feature maps to obtain multiple second densely-calculated feature maps with the third resolution.
- the first resolution is greater than the second resolution, the second resolution is greater than the third resolution, and the first resolution is less than the resolution of the input image.
- the encoder network 201 includes five processing levels, corresponding respectively to the three down-sampling blocks and the two dense calculation blocks.
- Using multiple down-sampling blocks to gradually reduce the spatial dimensions of the feature maps enlarges the receptive field, so that the encoder network can better extract local and global features at different scales; the down-sampling blocks also compress the extracted feature maps, saving calculation and memory and increasing the processing speed.
- the decoder network 202 includes: a first up-sampling block 306, a first convolution block 307, a third dense calculation block 308, a second convolution block 309, a second up-sampling block 310, a third convolution block 311, and a third up-sampling block 312.
- The first up-sampling block 306 is located between the second dense calculation block 305 and the first convolution block 307; the third dense calculation block 308 is located between the first convolution block 307 and the second convolution block 309; the second up-sampling block 310 is located between the second convolution block 309 and the third convolution block 311; and the third convolution block 311 is located between the second up-sampling block 310 and the third up-sampling block 312.
- the first up-sampling block 306 is configured to up-sample the multiple second densely-calculated feature maps output by the encoder network 201 to obtain multiple first up-sampled feature maps with the second resolution. The first convolution block 307 is configured to perform feature extraction on the first fusion feature map group obtained by splicing the multiple first up-sampled feature maps and the multiple second down-sampled feature maps in the channel dimension, to obtain multiple first intermediate feature maps with the second resolution; the multiple first up-sampled feature maps and the multiple second down-sampled feature maps have the same resolution. The third dense calculation block 308 is configured to perform multi-level feature extraction on the multiple first intermediate feature maps to obtain multiple third densely-calculated feature maps with the second resolution. The second convolution block 309 is configured to perform feature extraction on the second fusion feature map group obtained by splicing the multiple third densely-calculated feature maps and the multiple first densely-calculated feature maps in the channel dimension, to obtain multiple second intermediate feature maps with the second resolution; the multiple third densely-calculated feature maps and the multiple first densely-calculated feature maps have the same resolution. The second up-sampling block 310 is configured to up-sample the multiple second intermediate feature maps to obtain multiple second up-sampled feature maps with the first resolution. The third convolution block 311 is configured to perform feature extraction on the third fusion feature map group obtained by splicing the multiple second up-sampled feature maps and the multiple first down-sampled feature maps in the channel dimension, to obtain multiple third intermediate feature maps; the multiple second up-sampled feature maps and the multiple first down-sampled feature maps have the same resolution. The third up-sampling block 312 is configured to up-sample the multiple third intermediate feature maps to obtain the target segmented image with the same resolution as the input image.
- the decoder network 202 includes seven processing levels, corresponding to three up-sampling blocks, three convolution blocks, and one dense calculation block.
- The up-sampling blocks restore the spatial resolution of the multiple feature maps extracted by the encoder network 201 to match the input image; through the three up-sampling blocks, three convolution blocks, and one dense calculation block, the multiple feature maps extracted by the encoder network 201 are gradually transformed into the target segmented image of the input image.
- for example, the first resolution may be 1/2 of the input image resolution, the second resolution may be 1/4, and the third resolution may be 1/8. If the input image size is H×W, a feature map with the first resolution has size (H/2)×(W/2), a feature map with the second resolution has size (H/4)×(W/4), and a feature map with the third resolution has size (H/8)×(W/8).
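- As a quick worked check of these fractions (the 512×512 input size below is an assumption, not from the text):

```python
H, W = 512, 512  # illustrative input size
for name, factor in (("first", 2), ("second", 4), ("third", 8)):
    print(f"{name} resolution: {H // factor} x {W // factor}")
# first resolution: 256 x 256
# second resolution: 128 x 128
# third resolution: 64 x 64
```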
- a skip connection is established between the encoder network 201 and the decoder network 202, and the skip connection may use the concat method.
- the resolutions of the multiple first down-sampled feature maps obtained by the first down-sampling block 301 and of the multiple second up-sampled feature maps obtained by the second up-sampling block 310 are both the first resolution; the multiple first down-sampled feature maps obtained by the first down-sampling block 301 are not only input to the second down-sampling block 302 but also spliced in the channel dimension with the multiple second up-sampled feature maps obtained by the second up-sampling block 310 and then input to the third convolution block 311. The resolutions of the multiple second down-sampled feature maps obtained by the second down-sampling block 302 and of the multiple first up-sampled feature maps obtained by the first up-sampling block 306 are both the second resolution; the multiple second down-sampled feature maps obtained by the second down-sampling block 302 are not only input to the first dense calculation block 303 but also spliced in the channel dimension with the multiple first up-sampled feature maps obtained by the first up-sampling block 306 and then input to the first convolution block 307. The resolutions of the multiple first densely-calculated feature maps obtained by the first dense calculation block 303 and of the multiple third densely-calculated feature maps obtained by the third dense calculation block 308 are both the second resolution; the multiple first densely-calculated feature maps obtained by the first dense calculation block 303 are not only input to the third down-sampling block 304 but also spliced in the channel dimension with the multiple third densely-calculated feature maps obtained by the third dense calculation block 308 and then input to the second convolution block 309.
- splicing feature maps of the same resolution and size in the channel dimension increases the number of channels. For example, the feature maps output by the first dense calculation block 303 are Channel1×h×w, where h and w denote the height and width of the feature maps and Channel1 denotes the number of output channels of the first dense calculation block 303; the feature maps output by the third dense calculation block 308 are Channel2×h×w, where Channel2 denotes the number of output channels of the third dense calculation block 308; and the feature maps output by the two blocks have the same size and resolution. The second fusion feature map group obtained by splicing the feature maps output by the first dense calculation block 303 and the third dense calculation block 308 in the channel dimension is (Channel1+Channel2)×h×w.
- through these skip connections, the image details and information lost by the encoder network 201 during the multiple down-sampling processes can be transferred to the decoder network 202, and the decoder network 202 can use this information to generate a more accurate target segmentation image, thereby improving the matting effect.
- in an exemplary embodiment, the number of input channels of the first down-sampling block 301 is 3 and the number of output channels is 15; the number of input channels of the second down-sampling block 302 is 15 and the number of output channels is 60; the number of input channels of the first dense calculation block 303 is 60 and the number of output channels is 260; the number of input channels of the third down-sampling block 304 is 260 and the number of output channels is 130; and the number of input channels of the second dense calculation block 305 is 130 and the number of output channels is 450.
- The number of input channels of the first up-sampling block 306 is 450 and the number of output channels is 60; the number of input channels of the first convolution block 307 is 120 and the number of output channels is 60; the number of input channels of the third dense calculation block 308 is 60 and the number of output channels is 260; the number of input channels of the second convolution block 309 is 520 and the number of output channels is 260; the number of input channels of the second up-sampling block 310 is 260 and the number of output channels is 15; the number of input channels of the third convolution block 311 is 30 and the number of output channels is 15; and the number of input channels of the third up-sampling block 312 is 15 and the number of output channels is 1.
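- These counts are consistent with the three skip connections described above: each fused convolution block's input channel count is the sum of the two concatenated groups. A quick check using only the numbers stated in the text:

```python
# (encoder-side channels, decoder-side channels, stated fused input channels)
skips = [
    (60, 60, 120),    # block 302 output + block 306 output -> block 307 input
    (260, 260, 520),  # block 303 output + block 308 output -> block 309 input
    (15, 15, 30),     # block 301 output + block 310 output -> block 311 input
]
for enc_c, dec_c, fused_c in skips:
    assert enc_c + dec_c == fused_c
```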
- any convolutional block in the decoder network may include a convolutional layer and an activation layer, where the activation layer is located after the convolutional layer.
- the convolutional layer is configured to perform a convolution operation, and may include one or more convolution kernels.
- the structures and parameters of multiple convolutional blocks in the convolutional neural network of the present disclosure may be different from each other, or at least partially the same. However, this disclosure is not limited to this.
- any down-sampling block in the encoder network may include a convolutional layer, a pooling layer, and an activation layer.
- the convolutional layer is configured to perform a convolution operation, and may include one or more convolution kernels.
- the pooling layer is a form of down-sampling; it can reduce the size of the input image, simplify the computational complexity, and reduce over-fitting to a certain extent; the pooling layer can also perform feature compression to extract the main features of the input image.
- the structures and parameters of multiple downsampling blocks may be different from each other, or at least partially the same. However, this disclosure is not limited to this.
- any down-sampling block in the encoder network is configured to perform a down-sampling operation, which can reduce the size of the feature maps, perform feature compression, and extract main features, simplifying computational complexity and reducing over-fitting to a certain extent.
- down-sampling operations may include: max pooling, average pooling, stochastic pooling, decimation (for example, selecting fixed pixels), demultiplexing output (demuxout, splitting the input image into multiple smaller images), and the like.
- this disclosure is not limited to this.
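- A minimal sketch of one possible down-sampling block under these constraints (PyTorch; the text only lists a convolutional layer, a pooling layer, and an activation layer, so the ordering, kernel size, and choice of max pooling here are assumptions):

```python
import torch.nn as nn

class DownSamplingBlock(nn.Module):
    """Illustrative down-sampling block: conv -> 2x2 max pool -> ReLU."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels,
                              kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)  # halves H and W
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.pool(self.conv(x)))
```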
- any upsampling block in the decoder network may include an upsampling layer and an activation layer, where the upsampling layer may include a convolutional layer.
- the convolutional layer is configured to perform a convolution operation, and may include one or more convolution kernels.
- the structures and parameters of multiple upsampling blocks may be different from each other, or at least partially the same. However, this disclosure is not limited to this.
- any up-sampling layer in the decoder network is configured to perform an up-sampling operation, where the up-sampling operation may include: max unpooling, strided transposed convolution, interpolation (for example, bilinear interpolation, bicubic interpolation), and the like.
- the number of up-sampling blocks in the decoder network 202 is the same as the number of down-sampling blocks in the encoder network 201, so that the target segmented image and the input image have the same resolution, and it is ensured that the feature maps obtained by the two processing levels of a skip connection have the same resolution.
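- A minimal sketch of one possible up-sampling block (PyTorch; the text permits several up-sampling operations, and this sketch's choice of bilinear interpolation followed by a convolution is an assumption):

```python
import torch.nn as nn
import torch.nn.functional as F

class UpSamplingBlock(nn.Module):
    """Illustrative up-sampling block: 2x bilinear upsample -> conv -> ReLU."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels,
                              kernel_size=3, padding=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = F.interpolate(x, scale_factor=2, mode="bilinear",
                          align_corners=False)  # doubles H and W
        return self.act(self.conv(x))
```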
- the first dense calculation block 303 includes five convolution modules, and the input of any convolution module in the first dense calculation block 303 other than the first convolution module includes the outputs of all convolution modules before it.
- The second dense calculation block 305 includes eight convolution modules, and the input of any convolution module in the second dense calculation block 305 other than the first convolution module includes the outputs of all convolution modules before it.
- The third dense calculation block 308 includes five convolution modules, and the input of any convolution module in the third dense calculation block 308 other than the first convolution module includes the outputs of all convolution modules before it.
- The convolution modules in the first dense calculation block 303, the second dense calculation block 305, and the third dense calculation block 308 are connected in series to form dense connections.
- FIG. 5 is an exemplary diagram of the first dense calculation block 303 provided by an embodiment of the disclosure.
- the first dense calculation block 303 includes a first convolution module 315, a second convolution module 316, a third convolution module 317, a fourth convolution module 318, and a fifth convolution module 319 connected in series.
- the first convolution module 315 is configured to receive and process C1 feature maps to obtain K1 feature maps, and to splice the C1 feature maps and the K1 feature maps in the channel dimension to obtain C1+K1 feature maps. The second convolution module 316 is configured to receive and process C2 feature maps to obtain K2 feature maps, and to splice the C2 feature maps and the K2 feature maps in the channel dimension to obtain C2+K2 feature maps, where the C2 feature maps are the C1+K1 feature maps obtained by the first convolution module 315.
- In other words, the number of input channels of the first convolution module 315 is C1 and its number of output channels is C1+K1; the number of input channels of the second convolution module 316 is C1+K1 and its number of output channels is C1+K1+K2; the number of input channels of the third convolution module 317 is C1+K1+K2 and its number of output channels is C1+K1+K2+K3; the number of input channels of the fourth convolution module 318 is C1+K1+K2+K3 and its number of output channels is C1+K1+K2+K3+K4; and the number of input channels of the fifth convolution module 319 is C1+K1+K2+K3+K4 and its number of output channels is C1+K1+K2+K3+K4+K5.
- the input of the third convolution module 317 includes the outputs of the preceding first convolution module 315 and second convolution module 316; the input of the fourth convolution module 318 includes the outputs of the preceding first convolution module 315, second convolution module 316, and third convolution module 317; and the input of the fifth convolution module 319 includes the outputs of the preceding first convolution module 315, second convolution module 316, third convolution module 317, and fourth convolution module 318.
- in an exemplary embodiment, the growth rate coefficient of every convolution module is the same, where the growth rate coefficient of a convolution module is the number of channels by which its output channel count exceeds its input channel count; in other words, K1, K2, K3, K4, and K5 are all equal.
- each dense calculation block in FIG. 4 is drawn in terms of the number of feature maps each convolution module produces from its input (the i-th convolution module obtains Ki feature maps) and how those feature maps are transferred; FIG. 5 draws a dense calculation block as a plurality of convolution modules connected in series, which is equivalent to the transfer of the Ki feature maps of the i-th convolution module reflected in FIG. 4.
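- A minimal sketch of this dense-connection pattern (PyTorch; `make_module` is a placeholder that builds one convolution module, such as the FIG. 6 module sketched further below, and the constructor arguments are illustrative):

```python
import torch.nn as nn

class DenseBlock(nn.Module):
    """Illustrative dense calculation block: the modules are chained in
    series, and each module outputs its own input spliced with the K new
    feature maps it computes, so module i effectively receives the outputs
    of all i-1 modules before it."""
    def __init__(self, in_channels, growth_rate, num_modules, make_module):
        super().__init__()
        stages = []
        channels = in_channels
        for _ in range(num_modules):
            # make_module(c_in, k) -> module mapping c_in to c_in + k channels
            stages.append(make_module(channels, growth_rate))
            channels += growth_rate  # channel count grows by K per module
        self.stages = nn.Sequential(*stages)

    def forward(self, x):
        return self.stages(x)
```

With in_channels=60, growth_rate=40, and num_modules=5, the output has 60 + 5×40 = 260 channels, matching the stated 60-to-260 channel counts of the first dense calculation block 303.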
- FIG. 6 is an exemplary diagram of the first convolution module 315 provided by an embodiment of the disclosure.
- the first convolution module includes a convolution layer 401, an activation layer 402, a first asymmetric convolution network 41, a second asymmetric convolution network 42, and a dropout layer 409.
- the first asymmetric convolution network 41 includes a convolution layer 403, a convolution layer 404, and an activation layer 405 cascaded in sequence; the second asymmetric convolution network 42 includes a convolution layer 406, a convolution layer 407, and an activation layer 408 cascaded in sequence.
- the first convolution module 315 includes two sets of asymmetric convolution kernels. However, this disclosure is not limited to this.
- the number of input channels of the first convolution module is C1 and its number of output channels is C1+K1; the number of input channels of the convolution layer 401 is C1 and its number of output channels is K1; the numbers of input and output channels of the convolution layers 403 and 404 are both K1; and the numbers of input and output channels of the convolution layers 406 and 407 are both K1.
- the input of the first convolution module and the output of the dropout layer 409 are connected by concat to generate the output of the first convolution module; in other words, the output of the first convolution module is the splice, in the channel dimension, of the module's input feature maps and the feature maps generated by its convolution network. In this way, multiple convolution modules can be connected in series to form dense connections, that is, any convolution module in the dense calculation block receives and processes the outputs of all convolution modules before it.
- the convolution kernel of the convolution layer 401 is 1×1.
- the convolution layer 401 with the 1×1 convolution kernel can reduce the dimension when the convolution module performs feature extraction operations, reduce the number of feature maps and the amount of calculation, and increase the degree of non-linearity of the convolutional neural network.
- the convolution kernel of the convolution layer 403 is 3×1, and the convolution kernel of the convolution layer 404 is 1×3;
- the convolution kernel of the convolutional layer 406 is obtained by applying a dilation operation to a 3×1 kernel, and the convolution kernel of the convolutional layer 407 is obtained by applying a dilation operation to a 1×3 kernel.
- the dilation coefficient of the dilation operation may be denoted d. Different convolution modules in the same dense calculation block may use the same or different dilation coefficients, or only some convolution modules may share a dilation coefficient. However, this disclosure is not limited to this.
- applying the dilation operation to the asymmetric convolution kernels not only enlarges the receptive field but also reduces the loss of spatial information, and keeps the resolution of the feature maps output by the densely connected convolution modules consistent. The amount of calculation is thereby greatly reduced, which increases the processing speed.
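- As a one-line check of the saving (for a layer with C input and C output channels), replacing a 3×3 kernel by the 3×1 and 1×3 pair changes the per-pixel multiply-accumulate cost by

$$\frac{3C^2 + 3C^2}{9C^2} = \frac{2}{3},$$

i.e. one third fewer operations, while the dilation enlarges the receptive field at no extra cost.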
- the Dropout layer 409 can effectively prevent overfitting, and it is automatically disabled in the non-training phase; a sketch of the whole module follows below.
- this disclosure is not limited to this.
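- A hedged sketch of the FIG. 6 module just described (the sequential order is inferred from the listing of layers 401 to 409, the padding values are chosen to preserve resolution as stated, and the dropout probability is an assumed placeholder):

```python
import torch
import torch.nn as nn

class AsymmetricConvModule(nn.Module):
    """Sketch of the first convolution module 315: 1x1 bottleneck (401+402),
    asymmetric network 41 (403/404/405), dilated asymmetric network 42
    (406/407/408), Dropout (409), then channel-wise Concat with the input."""
    def __init__(self, in_channels: int, growth_rate: int,
                 dilation: int = 2, p_drop: float = 0.1):
        super().__init__()
        k = growth_rate
        self.reduce = nn.Sequential(  # convolution layer 401 + activation layer 402
            nn.Conv2d(in_channels, k, kernel_size=1), nn.ReLU(inplace=True))
        self.net41 = nn.Sequential(   # 3x1 conv 403, 1x3 conv 404, activation 405
            nn.Conv2d(k, k, (3, 1), padding=(1, 0)),
            nn.Conv2d(k, k, (1, 3), padding=(0, 1)),
            nn.ReLU(inplace=True))
        self.net42 = nn.Sequential(   # dilated 3x1 conv 406, dilated 1x3 conv 407, activation 408
            nn.Conv2d(k, k, (3, 1), padding=(dilation, 0), dilation=(dilation, 1)),
            nn.Conv2d(k, k, (1, 3), padding=(0, dilation), dilation=(1, dilation)),
            nn.ReLU(inplace=True))
        self.drop = nn.Dropout2d(p_drop)  # Dropout layer 409; inactive in eval mode

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.drop(self.net42(self.net41(self.reduce(x))))
        return torch.cat([x, y], dim=1)   # output: in_channels + K feature maps
```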
- the structures and parameters of the second convolution module 316, the third convolution module 317, the fourth convolution module 318, and the fifth convolution module 319 in FIG. 5 may be wholly or partly the same as those of the first convolution module 315. However, this disclosure is not limited to this.
- multiple convolution modules in one dense calculation block can select different growth rate coefficients and dilation coefficients.
- this disclosure is not limited to this.
- the growth rate coefficients of all convolution modules included in the three dense calculation blocks may all be 40; the dilation coefficients of the five convolution modules included in the first dense calculation block 303 may be (1, 1, 1, 2, 2); the dilation coefficients of the five convolution modules included in the third dense calculation block 308 may likewise be (1, 1, 1, 2, 2); and the dilation coefficients of the eight convolution modules included in the second dense calculation block may be (2, 2, 4, 4, 8, 8, 16, 16). A wiring sketch of this configuration follows below.
- the structures and parameters of the first dense calculation block 303 and the third dense calculation block 308 may be completely the same.
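- Continuing the sketch of FIG. 6 above, the three dense calculation blocks with these coefficients could be wired as follows (the input channel counts C1 and C2 are assumed placeholders, not values from the disclosure):

```python
import torch.nn as nn

def make_dense_block(in_channels, growth_rate, dilations):
    """Cascade AsymmetricConvModule instances; each module's Concat output
    feeds the next, so every module sees all previous outputs."""
    modules, channels = [], in_channels
    for d in dilations:
        modules.append(AsymmetricConvModule(channels, growth_rate, dilation=d))
        channels += growth_rate
    return nn.Sequential(*modules), channels  # the block and its output width

C1, C2 = 64, 128  # assumed placeholders for the blocks' input channel counts
block1, _ = make_dense_block(C1, 40, (1, 1, 1, 2, 2))             # first block 303
block2, _ = make_dense_block(C2, 40, (2, 2, 4, 4, 8, 8, 16, 16))  # second block
block3, _ = make_dense_block(C1, 40, (1, 1, 1, 2, 2))             # third block 308
```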
- the activation layer may include an activation function, and the activation function is used to introduce a non-linear factor to the convolutional neural network, so that the convolutional neural network can better solve more complicated problems.
- the activation function may include a rectified linear unit (ReLU) function, a sigmoid function (Sigmoid function), or a hyperbolic tangent function (tanh function).
- the ReLU function is a non-saturating nonlinear function, and the Sigmoid and tanh functions are saturating nonlinear functions.
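- For reference, these functions are:

$$\mathrm{ReLU}(x) = \max(0, x), \qquad \mathrm{Sigmoid}(x) = \frac{1}{1+e^{-x}}, \qquad \tanh(x) = \frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}.$$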
- the activation layer can be used as a layer of the convolutional neural network alone, or the activation layer can be included in the convolutional layer.
- the activation layer may include a normalization (Normalization) layer and an activation function.
- the activation layer 402 is configured to perform an activation operation on the output of the convolution layer 401;
- the activation layer 405 is configured to perform an activation operation on the output of the convolution layer 404;
- the activation layer 408 is configured to perform an activation operation on the output of the convolutional layer 407.
- the activation layers 402, 405, and 408 may each include a regularization layer and an activation function. Among them, the activation functions in different activation layers may be the same or different, and the regularization layers in different activation layers may be the same or different. However, this disclosure is not limited to this.
- the image processing method provided by this exemplary embodiment uses a convolutional neural network that combines dense calculation blocks having asymmetric convolution kernels with a skip-connected encoder-decoder network to automatically matte an input portrait image; the matting result can be obtained in real time, which improves both the processing speed and the accuracy of the matting result.
- the image processing method provided by the embodiment of the present disclosure further includes: training a convolutional neural network, and the convolutional neural network includes an encoder network and a decoder network. Before using the convolutional neural network for matting, it is necessary to train the convolutional neural network. After training, the parameters of the convolutional neural network remain unchanged during image processing. During the training process, the parameters of the convolutional neural network will be adjusted according to the training results to obtain an optimized convolutional neural network.
- the parameters of the convolutional neural network may include: convolution kernel and bias. Among them, the convolution kernel determines how to process the processed image, and the bias determines whether the output of the convolution kernel is input to the next layer.
- FIG. 7 is an example diagram of a training process of a convolutional neural network provided by an embodiment of the present disclosure.
- FIG. 8 is a schematic diagram of training a convolutional neural network provided by an embodiment of the disclosure.
- training a convolutional neural network may include the following steps 501 to 504.
- Step 501 Obtain training images.
- the training image can be selected from the two matting data sets aisegment matting human and Portrait Matting; alternatively, backgrounds can be replaced with images from the COCO (Common Objects in Context) data set that do not contain portraits, to achieve data augmentation.
- Step 502 Use a convolutional neural network to process the training image to generate a training segmentation image. This process is the same as the process of using the convolutional neural network to process the input image to generate the target segmentation image, and will not be repeated here.
- Step 503 According to the training segmentation image and the standard segmentation image corresponding to the training image, use the loss function to calculate the loss value of the convolutional neural network.
- Step 504 Optimize the parameters of the convolutional neural network according to the loss value.
- the loss function is an important equation used to measure the difference between the predicted value (training segmentation image) and the target value (standard segmentation image). For example, the higher the output value (loss) of the loss function, the greater the difference.
- whether the convolutional neural network has converged can be judged by at least one of the following methods: judging whether the number of times the parameters of the convolutional neural network have been updated reaches an iteration threshold; judging whether the loss value of the convolutional neural network is lower than a loss threshold.
- the iteration threshold may be a preset number of iterations. For example, if the number of times the parameters of the convolutional neural network have been updated exceeds the iteration threshold, the training ends.
- the loss threshold may be preset. For example, if the loss value calculated by the loss function is less than the loss threshold, the training is ended.
- the training module 60 may include a loss calculation unit 601 and an optimizer 602. The convolutional neural network 20 performs matting on the training image to generate a training segmentation image; the loss calculation unit 601 obtains the standard segmentation image corresponding to the training image from the training data set and uses the loss function to calculate the loss value according to the training segmentation image and the standard segmentation image; the optimizer 602 adjusts the parameters of the convolutional neural network 20 according to the loss value calculated by the loss calculation unit 601.
- the optimizer may use stochastic gradient descent (SGD), and the learning rate adjustment strategy of the optimizer may use cosine annealing with warm restarts.
- this disclosure is not limited to this.
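- A hedged training-loop sketch covering steps 501 to 504 and both stopping criteria (PyTorch-style; the learning rate, momentum, restart period T_0, and the two thresholds are assumed placeholders, and loss_fn stands for the weighted loss described below):

```python
import torch

def train(network, loader, loss_fn, iter_threshold=100_000, loss_threshold=1e-3):
    """Steps 501-504: fetch training pairs, predict a training segmentation
    image, compute the loss, and update parameters until either the iteration
    threshold or the loss threshold is reached."""
    optimizer = torch.optim.SGD(network.parameters(), lr=0.01, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=10)
    iteration = 0
    while iteration < iter_threshold:
        for image, standard_seg in loader:          # step 501: training image + label
            training_seg = network(image)           # step 502: training segmentation image
            loss = loss_fn(training_seg, standard_seg, image)  # step 503: loss value
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                        # step 504: optimize parameters
            scheduler.step()
            iteration += 1
            if loss.item() < loss_threshold or iteration >= iter_threshold:
                return network
    return network
```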
- the loss function may be obtained by weighted addition of an edge loss function, a matting mask loss function, and a foreground loss function. That is, the loss function can be expressed as:
- L = w_1·L_edge + w_2·L_alpha + w_3·L_foreground;
- where L_edge is the edge loss function, L_alpha is the matting mask loss function, L_foreground is the foreground loss function, and w_1, w_2, and w_3 are weights.
- w_1, w_2, and w_3 can be determined according to actual conditions or based on empirical values, which is not limited in the present disclosure.
- G_x(A_out) = K_x ⊗ A_out, G_y(A_out) = K_y ⊗ A_out;
- G_x(A_gt) = K_x ⊗ A_gt, G_y(A_gt) = K_y ⊗ A_gt;
- where K_x and K_y are edge detection operators, ⊗ denotes convolution, A_out is the training segmentation image, and A_gt is the standard segmentation image corresponding to the training image.
- the edge detection operator can use operators such as Sobel, Prewitt, Scharr, etc.
- the present disclosure is not limited to this.
- for example, the Scharr operator may be used as the edge detection operator.
- the edge loss function is designed using the edge detection operator, which adds a constraint on the edges of the subject to be extracted in the matting result, so as to obtain a better matting effect.
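- The standard Scharr kernels are shown in the sketch below; the L1 comparison of gradient maps is our assumption, since the exact form of L_edge is not reproduced here:

```python
import torch
import torch.nn.functional as F

# standard Scharr operators (the x/y orientation convention may vary between sources)
K_x = torch.tensor([[ 3.,  0.,  -3.],
                    [10.,  0., -10.],
                    [ 3.,  0.,  -3.]]).view(1, 1, 3, 3)
K_y = torch.tensor([[ 3.,  10.,  3.],
                    [ 0.,   0.,  0.],
                    [-3., -10., -3.]]).view(1, 1, 3, 3)

def edge_loss(a_out: torch.Tensor, a_gt: torch.Tensor) -> torch.Tensor:
    """Assumed L1 form of L_edge over Scharr gradients of the training
    segmentation image A_out and the standard segmentation image A_gt
    (tensors of shape [batch, 1, H, W])."""
    gx = F.conv2d(a_out, K_x, padding=1) - F.conv2d(a_gt, K_x, padding=1)
    gy = F.conv2d(a_out, K_y, padding=1) - F.conv2d(a_gt, K_y, padding=1)
    return gx.abs().mean() + gy.abs().mean()
```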
- the foreground loss function can be expressed in terms of the following quantities:
- where A_out is the training segmentation image, A_gt is the standard segmentation image corresponding to the training image, I is the training image, A_out^i is the i-th pixel of the training segmentation image A_out, A_gt^i is the i-th pixel of the standard segmentation image A_gt, and I_ij is the j-th channel of the i-th pixel of the training image I.
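- A hedged sketch of the full weighted loss: the matting mask and foreground terms below are plausible L1 reconstructions rather than the disclosure's exact formulas, and the weights are placeholders:

```python
def matting_loss(a_out, a_gt, image, w1=1.0, w2=1.0, w3=1.0):
    """L = w1*L_edge + w2*L_alpha + w3*L_foreground (term forms assumed)."""
    l_edge = edge_loss(a_out, a_gt)                  # Scharr-based term above
    l_alpha = (a_out - a_gt).abs().mean()            # matting mask loss L_alpha
    # foreground loss: compare the masked foregrounds A_out^i * I_ij and A_gt^i * I_ij
    l_fg = (a_out * image - a_gt * image).abs().mean()
    return w1 * l_edge + w2 * l_alpha + w3 * l_fg
```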
- the image processing method provided by the embodiments of the present disclosure can use a convolutional neural network that combines dense calculation blocks with asymmetric convolution kernels and an encoder network and decoder network with skip connections to realize real-time automatic matting of an input image and improve the matting effect; that is, it not only outputs a high-quality image but also greatly increases the processing speed.
- the exemplary embodiment of the present disclosure adopts an edge loss function, a matting mask loss function, and a foreground loss function in the training process of the convolutional neural network, which can improve the matting effect.
- FIG. 9 is a schematic diagram of an image processing device provided by an embodiment of the present disclosure.
- the image processing device 70 provided in this embodiment includes: an image acquisition module 701 and an image processing module 702. These components are interconnected by a bus system or other form of connection mechanism (not shown).
- the components and structures of the image processing device shown in FIG. 9 are only exemplary, and not restrictive. The image processing device may also have other components and structures as required.
- the image acquisition module 701 is configured to acquire an input image.
- the image acquisition module 701 may include a memory in which the input image is stored; or, the image acquisition module 701 may include one or more cameras to acquire the input image.
- the image acquisition module 701 may be hardware, software, firmware, and any feasible combination thereof.
- the image processing module 702 is configured to perform down-sampling and feature extraction on the input image through the encoder network to obtain multiple feature maps, and to perform up-sampling and feature extraction on the multiple feature maps through the decoder network to obtain the target segmentation image.
- the encoder network and the decoder network each include multiple processing levels; after the multiple feature maps obtained at the L-th processing level of the encoder network and the multiple feature maps obtained at the J-th processing level of the decoder network are fused, they are input to the (J+1)-th processing level of the decoder network, where the feature maps obtained at the L-th processing level of the encoder network and those obtained at the J-th processing level of the decoder network have the same resolution; L and J are both positive integers.
- at least one of the multiple processing levels of the encoder network includes a dense calculation block, and at least one of the multiple processing levels of the decoder network includes a dense calculation block; the M-th dense calculation block in the encoder network and the decoder network includes N convolution modules, and the input of the i-th convolution module among the N convolution modules includes the outputs of the i-1 convolution modules before it; at least one of the N convolution modules includes at least one set of asymmetric convolution kernels; i, N, and M are all integers, M is greater than or equal to 1 and less than or equal to the total number of dense calculation blocks in the encoder network and the decoder network, N is greater than or equal to 3, and i is greater than or equal to 3 and less than or equal to N.
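- A minimal sketch of the fusion between encoder level L and decoder level J (names ours; the surrounding up-sampling and feature-extraction layers are omitted):

```python
import torch

def fuse(encoder_maps: torch.Tensor, decoder_maps: torch.Tensor) -> torch.Tensor:
    """Feature maps from encoder level L and decoder level J share a resolution
    and are concatenated in the channel dimension before decoder level J+1."""
    assert encoder_maps.shape[-2:] == decoder_maps.shape[-2:], "same resolution required"
    return torch.cat([decoder_maps, encoder_maps], dim=1)
```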
- the image processing module 702 in the image processing device 70 provided in this embodiment includes a convolutional neural network that has the same structure and function as the convolutional neural network in the embodiment of the above-mentioned image processing method, so the details will not be repeated here.
- the image processing device may further include a training module configured to train a convolutional neural network.
- the training module may include: a loss calculation unit and an optimizer. The process of using the training module to train the convolutional neural network can refer to the relevant description in the embodiment of the above-mentioned image processing method, so it will not be repeated here.
- FIG. 10 is a schematic diagram of an image processing device provided by an embodiment of the disclosure.
- the image processing device 80 includes: a processor 801 and a memory 802; the memory 802 is configured to store program instructions; the processor 801 implements the steps of the image processing method in any of the foregoing embodiments when the program instructions are executed.
- the components of the image processing device 80 shown in FIG. 10 are only exemplary and not restrictive. According to actual application requirements, the image processing device 80 may also have other components.
- the processor 801 and the memory 802 may directly or indirectly communicate with each other.
- the network may include a wireless network, a wired network, or any combination of a wired network and a wireless network.
- the network may include a local area network, the Internet, a telecommunications network, the Internet of Things based on the Internet, the Internet of Things based on a telecommunications network, and any combination of the above networks.
- Wired networks can, for example, use twisted pair, coaxial cable, or optical fiber transmission for communication, and wireless networks can, for example, use 3G, 4G, 5G mobile communication networks, Bluetooth, or WIFI, and other communication methods.
- the present disclosure does not limit the types and functions of the network here.
- the processor 801 may control other components in the image processing apparatus to perform desired functions.
- the processor 801 may be a central processing unit (CPU, Central Processing Unit), a tensor processing unit (TPU, Tensor Processing Unit), a graphics processing unit (GPU, Graphics Processing Unit), or another device with data processing capability or program execution capability.
- the GPU can be directly integrated on the motherboard alone, or built into the north bridge chip of the motherboard; or, the GPU can be built into the CPU.
- the memory 802 may include any combination of one or more computer program products, and the computer program products may include various forms of computer-readable storage media, such as volatile memory and non-volatile memory.
- Volatile memory may include, for example, random access memory (RAM, Random Access Memory), cache memory (Cache), and the like.
- Non-volatile memory may include, for example, read-only memory (ROM, Read Only Memory), hard disks, erasable programmable read-only memory (EPROM, Erasable Programmable Read Only Memory), compact disc read-only memory (CD-ROM), universal serial bus (USB, Universal Serial Bus) memory, flash memory, etc.
- the computer-readable storage medium may also store one or more application programs and one or more types of data, for example, input images, and one or more types of data used or generated by the application programs.
- one or more computer-readable codes or program instructions may be stored in the memory 802, and the processor may execute the program instructions to execute the above-mentioned image processing method.
- For details of the image processing method, reference may be made to the relevant description in the embodiment of the above image processing method, so it will not be repeated here.
- At least one embodiment of the present disclosure also provides a computer-readable storage medium that stores program instructions, and when the program instructions are executed, the foregoing image processing method can be implemented.
- Such software may be distributed on a computer-readable medium, and the computer-readable medium may include a computer storage medium (or a non-transitory medium) and a communication medium (or a transitory medium).
- the term computer storage medium includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data).
- Computer storage media include but are not limited to RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tapes, magnetic disk storage or other magnetic storage devices, or Any other medium used to store desired information and that can be accessed by a computer.
- communication media usually contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transmission mechanism, and may include any information delivery media.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Multimedia (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Biodiversity & Conservation Biology (AREA)
- Image Analysis (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
Description
Claims (14)
- An image processing method, comprising: acquiring an input image; performing down-sampling and feature extraction on the input image through an encoder network to obtain multiple feature maps; and performing up-sampling and feature extraction on the multiple feature maps through a decoder network to obtain a target segmentation image; wherein the encoder network and the decoder network each comprise multiple processing levels; after the multiple feature maps obtained at the L-th processing level of the encoder network and the multiple feature maps obtained at the J-th processing level of the decoder network are fused, they are input to the (J+1)-th processing level of the decoder network, wherein the multiple feature maps obtained at the L-th processing level of the encoder network and the multiple feature maps obtained at the J-th processing level of the decoder network have the same resolution, and L and J are both positive integers; wherein at least one of the multiple processing levels of the encoder network comprises a dense calculation block, and at least one of the multiple processing levels of the decoder network comprises a dense calculation block; the M-th dense calculation block in the encoder network and the decoder network comprises N convolution modules, and the input of the i-th convolution module among the N convolution modules comprises the outputs of the i-1 convolution modules before the i-th convolution module; at least one of the N convolution modules comprises at least one set of asymmetric convolution kernels; i, N, and M are all integers, M is greater than or equal to 1 and less than or equal to the total number of dense calculation blocks in the encoder network and the decoder network, N is greater than or equal to 3, and i is greater than or equal to 3 and less than or equal to N.
- The image processing method according to claim 1, wherein the at least one of the N convolution modules further comprises: asymmetric convolution kernels obtained by dilating the at least one set of asymmetric convolution kernels.
- The image processing method according to claim 1, wherein fusing the multiple feature maps obtained at the L-th processing level of the encoder network with the multiple feature maps obtained at the J-th processing level of the decoder network and inputting them to the (J+1)-th processing level of the decoder network comprises: after the multiple feature maps obtained at the L-th processing level of the encoder network and the multiple feature maps obtained at the J-th processing level of the decoder network are concatenated in the channel dimension, inputting them to the (J+1)-th processing level of the decoder network.
- The image processing method according to claim 1, wherein performing down-sampling and feature extraction on the input image through the encoder network to obtain multiple feature maps comprises: down-sampling the input image to obtain multiple first down-sampling feature maps having a first resolution; down-sampling the multiple first down-sampling feature maps to obtain multiple second down-sampling feature maps having a second resolution; performing multi-level feature extraction on the multiple second down-sampling feature maps through a first dense calculation block to obtain multiple first dense calculation feature maps having the second resolution; down-sampling the multiple first dense calculation feature maps to obtain multiple third down-sampling feature maps having a third resolution; and performing multi-level feature extraction on the multiple third down-sampling feature maps through a second dense calculation block to obtain multiple second dense calculation feature maps having the third resolution; the multiple feature maps comprise the multiple second dense calculation feature maps.
- The image processing method according to claim 4, wherein performing up-sampling and feature extraction on the multiple feature maps through the decoder network to obtain the target segmentation image comprises: up-sampling the multiple second dense calculation feature maps to obtain multiple first up-sampling feature maps having the second resolution; concatenating the multiple first up-sampling feature maps and the multiple second down-sampling feature maps in the channel dimension to obtain a first fused feature map group; performing feature extraction on the first fused feature map group to obtain multiple first intermediate feature maps having the second resolution; performing multi-level feature extraction on the multiple first intermediate feature maps through a third dense calculation block to obtain multiple third dense calculation feature maps having the second resolution; concatenating the multiple third dense calculation feature maps and the multiple first dense calculation feature maps in the channel dimension to obtain a second fused feature map group; performing feature extraction on the second fused feature map group to obtain multiple second intermediate feature maps having the second resolution; up-sampling the multiple second intermediate feature maps to obtain multiple second up-sampling feature maps having the first resolution; concatenating the multiple second up-sampling feature maps and the multiple first down-sampling feature maps in the channel dimension to obtain a third fused feature map group; performing feature extraction on the third fused feature map group to obtain multiple third intermediate feature maps having the second resolution; and up-sampling the multiple third intermediate feature maps to obtain the target segmentation image having the same resolution as the input image.
- The image processing method according to claim 5, wherein the first dense calculation block comprises five convolution modules, the second dense calculation block comprises eight convolution modules, and the third dense calculation block comprises five convolution modules; wherein each convolution module in the first dense calculation block, the second dense calculation block, and the third dense calculation block comprises a 1×1 convolution kernel and two sets of asymmetric convolution kernels, the first set of asymmetric convolution kernels being a 3×1 convolution kernel and a 1×3 convolution kernel, and the second set of asymmetric convolution kernels being obtained from the first set of asymmetric convolution kernels and a corresponding dilation coefficient.
- The image processing method according to claim 1, wherein the image processing method further comprises: acquiring a training image; processing the training image using a convolutional neural network to obtain a training segmentation image of the training image, wherein the convolutional neural network comprises the encoder network and the decoder network; calculating a loss value of the convolutional neural network using a loss function according to the training segmentation image of the training image and a standard segmentation image corresponding to the training image; and optimizing parameters of the convolutional neural network according to the loss value.
- The image processing method according to claim 7, wherein the loss function is expressed as: L = w_1·L_edge + w_2·L_alpha + w_3·L_foreground; wherein L_edge is an edge loss function, L_alpha is a matting mask loss function, L_foreground is a foreground loss function, and w_1, w_2, and w_3 are weights.
- The image processing method according to any one of claims 1 to 10, wherein the input image is a portrait image, and the target segmentation image is a matting mask of the target person in the portrait image.
- An image processing apparatus, comprising: an image acquisition module configured to acquire an input image; and an image processing module configured to perform down-sampling and feature extraction on the input image through an encoder network to obtain multiple feature maps, and to perform up-sampling and feature extraction on the multiple feature maps through a decoder network to obtain a target segmentation image; wherein the encoder network and the decoder network each comprise multiple processing levels; after the multiple feature maps obtained at the L-th processing level of the encoder network and the multiple feature maps obtained at the J-th processing level of the decoder network are fused, they are input to the (J+1)-th processing level of the decoder network, wherein the multiple feature maps obtained at the L-th processing level of the encoder network and the multiple feature maps obtained at the J-th processing level of the decoder network have the same resolution, and L and J are both positive integers; wherein at least one of the multiple processing levels of the encoder network comprises a dense calculation block, and at least one of the multiple processing levels of the decoder network comprises a dense calculation block; the M-th dense calculation block in the encoder network and the decoder network comprises N convolution modules, and the input of the i-th convolution module among the N convolution modules comprises the outputs of the i-1 convolution modules before the i-th convolution module; at least one of the N convolution modules comprises at least one set of asymmetric convolution kernels; i, N, and M are all integers, M is greater than or equal to 1 and less than or equal to the total number of dense calculation blocks in the encoder network and the decoder network, N is greater than or equal to 3, and i is greater than or equal to 3 and less than or equal to N.
- An image processing device, comprising: a memory and a processor, wherein the memory is configured to store program instructions, and the processor, when executing the program instructions, implements the steps of the image processing method according to any one of claims 1 to 11.
- A computer-readable storage medium storing program instructions which, when executed, implement the image processing method according to any one of claims 1 to 11.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/419,726 US20220319155A1 (en) | 2020-02-21 | 2020-12-29 | Image Processing Method, Image Processing Apparatus, and Device |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010110386.7 | 2020-02-21 | ||
CN202010110386.7A CN111311629B (zh) | 2020-02-21 | 2020-02-21 | 图像处理方法、图像处理装置及设备 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021164429A1 true WO2021164429A1 (zh) | 2021-08-26 |
Family
ID=71160080
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/140781 WO2021164429A1 (zh) | 2020-02-21 | 2020-12-29 | 图像处理方法、图像处理装置及设备 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20220319155A1 (zh) |
CN (1) | CN111311629B (zh) |
WO (1) | WO2021164429A1 (zh) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113705548A (zh) * | 2021-10-29 | 2021-11-26 | 北京世纪好未来教育科技有限公司 | 题目类型识别方法和装置 |
CN113887470A (zh) * | 2021-10-15 | 2022-01-04 | 浙江大学 | 基于多任务注意力机制的高分辨率遥感图像地物提取方法 |
CN114022496A (zh) * | 2021-09-26 | 2022-02-08 | 天翼爱音乐文化科技有限公司 | 图像处理方法、系统、装置及存储介质 |
CN114596580A (zh) * | 2022-02-14 | 2022-06-07 | 南方科技大学 | 一种多人体目标识别方法、系统、设备及介质 |
CN114612683A (zh) * | 2022-03-21 | 2022-06-10 | 安徽理工大学 | 基于密集多尺度推理网络的显著性目标检测算法 |
CN114677392A (zh) * | 2022-05-27 | 2022-06-28 | 珠海视熙科技有限公司 | 抠图方法、摄像设备、装置、会议系统、电子设备及介质 |
CN114723760A (zh) * | 2022-05-19 | 2022-07-08 | 北京世纪好未来教育科技有限公司 | 人像分割模型的训练方法、装置及人像分割方法、装置 |
CN114782708A (zh) * | 2022-05-12 | 2022-07-22 | 北京百度网讯科技有限公司 | 图像生成方法、图像生成模型的训练方法、装置和设备 |
CN114897700A (zh) * | 2022-05-25 | 2022-08-12 | 马上消费金融股份有限公司 | 图像优化方法、校正模型的训练方法及装置 |
CN116228763A (zh) * | 2023-05-08 | 2023-06-06 | 成都睿瞳科技有限责任公司 | 用于眼镜打印的图像处理方法及系统 |
CN116342675A (zh) * | 2023-05-29 | 2023-06-27 | 南昌航空大学 | 一种实时单目深度估计方法、系统、电子设备及存储介质 |
WO2023159746A1 (zh) * | 2022-02-23 | 2023-08-31 | 平安科技(深圳)有限公司 | 基于图像分割的图像抠图方法、装置、计算机设备及介质 |
CN117474925A (zh) * | 2023-12-28 | 2024-01-30 | 山东润通齿轮集团有限公司 | 一种基于机器视觉的齿轮点蚀检测方法及系统 |
Families Citing this family (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111311629B (zh) * | 2020-02-21 | 2023-12-01 | 京东方科技集团股份有限公司 | 图像处理方法、图像处理装置及设备 |
JP7446903B2 (ja) * | 2020-04-23 | 2024-03-11 | 株式会社日立製作所 | 画像処理装置、画像処理方法及び画像処理システム |
CN111914997B (zh) * | 2020-06-30 | 2024-04-02 | 华为技术有限公司 | 训练神经网络的方法、图像处理方法及装置 |
CN112053363B (zh) * | 2020-08-19 | 2023-12-15 | 苏州超云生命智能产业研究院有限公司 | 视网膜血管分割方法、装置及模型构建方法 |
CN111968122B (zh) * | 2020-08-27 | 2023-07-28 | 广东工业大学 | 一种基于卷积神经网络的纺织材料ct图像分割方法和装置 |
CN112056993B (zh) * | 2020-09-07 | 2022-05-17 | 上海高仙自动化科技发展有限公司 | 一种清洁方法、装置、电子设备及计算机可读存储介质 |
CN112215243A (zh) * | 2020-10-30 | 2021-01-12 | 百度(中国)有限公司 | 图像特征提取方法、装置、设备及存储介质 |
CN112560864B (zh) * | 2020-12-22 | 2024-06-18 | 苏州超云生命智能产业研究院有限公司 | 图像语义分割方法、装置及图像语义分割模型的训练方法 |
CN112866694B (zh) * | 2020-12-31 | 2023-07-14 | 杭州电子科技大学 | 联合非对称卷积块和条件上下文的智能图像压缩优化方法 |
CN112767502B (zh) * | 2021-01-08 | 2023-04-07 | 广东中科天机医疗装备有限公司 | 基于医学影像模型的影像处理方法及装置 |
CN112784897B (zh) * | 2021-01-20 | 2024-03-26 | 北京百度网讯科技有限公司 | 图像处理方法、装置、设备和存储介质 |
CN112949651A (zh) * | 2021-01-29 | 2021-06-11 | Oppo广东移动通信有限公司 | 特征提取方法、装置、存储介质及电子设备 |
CN112561802B (zh) * | 2021-02-20 | 2021-05-25 | 杭州太美星程医药科技有限公司 | 连续序列图像的插值方法、插值模型训练方法及其系统 |
WO2022205685A1 (zh) * | 2021-03-29 | 2022-10-06 | 泉州装备制造研究所 | 一种基于轻量化网络的交通标志识别方法 |
CN115221102B (zh) * | 2021-04-16 | 2024-01-19 | 中科寒武纪科技股份有限公司 | 用于优化片上系统的卷积运算操作的方法和相关产品 |
CN113034648A (zh) * | 2021-04-30 | 2021-06-25 | 北京字节跳动网络技术有限公司 | 图像处理方法、装置、设备和存储介质 |
CN113723480B (zh) * | 2021-08-18 | 2024-03-05 | 北京达佳互联信息技术有限公司 | 一种图像处理方法、装置、电子设备和存储介质 |
CN113920358A (zh) * | 2021-09-18 | 2022-01-11 | 广州番禺职业技术学院 | 一种小麦病害的自动分类方法、装置及系统 |
CN114040140B (zh) * | 2021-11-15 | 2024-04-12 | 北京医百科技有限公司 | 一种视频抠图方法、装置、系统及存储介质 |
CN116188479B (zh) * | 2023-02-21 | 2024-04-02 | 北京长木谷医疗科技股份有限公司 | 基于深度学习的髋关节图像分割方法及系统 |
CN115953394B (zh) * | 2023-03-10 | 2023-06-23 | 中国石油大学(华东) | 基于目标分割的海洋中尺度涡检测方法及系统 |
CN116363364B (zh) * | 2023-03-27 | 2023-09-26 | 南通大学 | 一种基于改进DSD-LinkNet的电力安全带分割方法 |
CN116206114B (zh) * | 2023-04-28 | 2023-08-01 | 成都云栈科技有限公司 | 一种复杂背景下人像提取方法及装置 |
CN116912257B (zh) * | 2023-09-14 | 2023-12-29 | 东莞理工学院 | 基于深度学习的混凝土路面裂缝识别方法及存储介质 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109448006A (zh) * | 2018-11-01 | 2019-03-08 | 江西理工大学 | 一种注意力机制u型密集连接视网膜血管分割方法 |
CN109727253A (zh) * | 2018-11-14 | 2019-05-07 | 西安大数据与人工智能研究院 | 基于深度卷积神经网络自动分割肺结节的辅助检测方法 |
US20190221011A1 (en) * | 2018-01-12 | 2019-07-18 | Korea Advanced Institute Of Science And Technology | Method for processing x-ray computed tomography image using neural network and apparatus therefor |
CN110197516A (zh) * | 2019-05-29 | 2019-09-03 | 浙江明峰智能医疗科技有限公司 | 一种基于深度学习的tof-pet散射校正方法 |
CN110689544A (zh) * | 2019-09-06 | 2020-01-14 | 哈尔滨工程大学 | 一种遥感图像细弱目标分割方法 |
CN111311629A (zh) * | 2020-02-21 | 2020-06-19 | 京东方科技集团股份有限公司 | 图像处理方法、图像处理装置及设备 |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110188760B (zh) * | 2019-04-01 | 2021-10-22 | 上海卫莎网络科技有限公司 | 一种图像处理模型训练方法、图像处理方法及电子设备 |
CN111369581B (zh) * | 2020-02-18 | 2023-08-08 | Oppo广东移动通信有限公司 | 图像处理方法、装置、设备及存储介质 |
-
2020
- 2020-02-21 CN CN202010110386.7A patent/CN111311629B/zh active Active
- 2020-12-29 US US17/419,726 patent/US20220319155A1/en active Pending
- 2020-12-29 WO PCT/CN2020/140781 patent/WO2021164429A1/zh active Application Filing
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190221011A1 (en) * | 2018-01-12 | 2019-07-18 | Korea Advanced Institute Of Science And Technology | Method for processing x-ray computed tomography image using neural network and apparatus therefor |
CN109448006A (zh) * | 2018-11-01 | 2019-03-08 | 江西理工大学 | 一种注意力机制u型密集连接视网膜血管分割方法 |
CN109727253A (zh) * | 2018-11-14 | 2019-05-07 | 西安大数据与人工智能研究院 | 基于深度卷积神经网络自动分割肺结节的辅助检测方法 |
CN110197516A (zh) * | 2019-05-29 | 2019-09-03 | 浙江明峰智能医疗科技有限公司 | 一种基于深度学习的tof-pet散射校正方法 |
CN110689544A (zh) * | 2019-09-06 | 2020-01-14 | 哈尔滨工程大学 | 一种遥感图像细弱目标分割方法 |
CN111311629A (zh) * | 2020-02-21 | 2020-06-19 | 京东方科技集团股份有限公司 | 图像处理方法、图像处理装置及设备 |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114022496A (zh) * | 2021-09-26 | 2022-02-08 | 天翼爱音乐文化科技有限公司 | 图像处理方法、系统、装置及存储介质 |
CN113887470A (zh) * | 2021-10-15 | 2022-01-04 | 浙江大学 | 基于多任务注意力机制的高分辨率遥感图像地物提取方法 |
CN113705548A (zh) * | 2021-10-29 | 2021-11-26 | 北京世纪好未来教育科技有限公司 | 题目类型识别方法和装置 |
CN113705548B (zh) * | 2021-10-29 | 2022-02-08 | 北京世纪好未来教育科技有限公司 | 题目类型识别方法和装置 |
CN114596580A (zh) * | 2022-02-14 | 2022-06-07 | 南方科技大学 | 一种多人体目标识别方法、系统、设备及介质 |
CN114596580B (zh) * | 2022-02-14 | 2024-05-14 | 南方科技大学 | 一种多人体目标识别方法、系统、设备及介质 |
WO2023159746A1 (zh) * | 2022-02-23 | 2023-08-31 | 平安科技(深圳)有限公司 | 基于图像分割的图像抠图方法、装置、计算机设备及介质 |
CN114612683A (zh) * | 2022-03-21 | 2022-06-10 | 安徽理工大学 | 基于密集多尺度推理网络的显著性目标检测算法 |
CN114782708A (zh) * | 2022-05-12 | 2022-07-22 | 北京百度网讯科技有限公司 | 图像生成方法、图像生成模型的训练方法、装置和设备 |
CN114782708B (zh) * | 2022-05-12 | 2024-04-16 | 北京百度网讯科技有限公司 | 图像生成方法、图像生成模型的训练方法、装置和设备 |
CN114723760A (zh) * | 2022-05-19 | 2022-07-08 | 北京世纪好未来教育科技有限公司 | 人像分割模型的训练方法、装置及人像分割方法、装置 |
CN114897700A (zh) * | 2022-05-25 | 2022-08-12 | 马上消费金融股份有限公司 | 图像优化方法、校正模型的训练方法及装置 |
CN114677392A (zh) * | 2022-05-27 | 2022-06-28 | 珠海视熙科技有限公司 | 抠图方法、摄像设备、装置、会议系统、电子设备及介质 |
CN116228763A (zh) * | 2023-05-08 | 2023-06-06 | 成都睿瞳科技有限责任公司 | 用于眼镜打印的图像处理方法及系统 |
CN116228763B (zh) * | 2023-05-08 | 2023-07-21 | 成都睿瞳科技有限责任公司 | 用于眼镜打印的图像处理方法及系统 |
CN116342675A (zh) * | 2023-05-29 | 2023-06-27 | 南昌航空大学 | 一种实时单目深度估计方法、系统、电子设备及存储介质 |
CN116342675B (zh) * | 2023-05-29 | 2023-08-11 | 南昌航空大学 | 一种实时单目深度估计方法、系统、电子设备及存储介质 |
CN117474925A (zh) * | 2023-12-28 | 2024-01-30 | 山东润通齿轮集团有限公司 | 一种基于机器视觉的齿轮点蚀检测方法及系统 |
CN117474925B (zh) * | 2023-12-28 | 2024-03-15 | 山东润通齿轮集团有限公司 | 一种基于机器视觉的齿轮点蚀检测方法及系统 |
Also Published As
Publication number | Publication date |
---|---|
US20220319155A1 (en) | 2022-10-06 |
CN111311629B (zh) | 2023-12-01 |
CN111311629A (zh) | 2020-06-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021164429A1 (zh) | 图像处理方法、图像处理装置及设备 | |
WO2020177651A1 (zh) | 图像分割方法和图像处理装置 | |
WO2021073493A1 (zh) | 图像处理方法及装置、神经网络的训练方法、合并神经网络模型的图像处理方法、合并神经网络模型的构建方法、神经网络处理器及存储介质 | |
WO2021043273A1 (zh) | 图像增强方法和装置 | |
EP3923233A1 (en) | Image denoising method and apparatus | |
CN112308200B (zh) | 神经网络的搜索方法及装置 | |
WO2021164731A1 (zh) | 图像增强方法以及图像增强装置 | |
WO2021164234A1 (zh) | 图像处理方法以及图像处理装置 | |
US20190244362A1 (en) | Differentiable Jaccard Loss Approximation for Training an Artificial Neural Network | |
WO2022134971A1 (zh) | 一种降噪模型的训练方法及相关装置 | |
CN111914997B (zh) | 训练神经网络的方法、图像处理方法及装置 | |
CN112446380A (zh) | 图像处理方法和装置 | |
CN113095470B (zh) | 神经网络的训练方法、图像处理方法及装置、存储介质 | |
CN112446835B (zh) | 图像恢复方法、图像恢复网络训练方法、装置和存储介质 | |
US20220157046A1 (en) | Image Classification Method And Apparatus | |
WO2022100490A1 (en) | Methods and systems for deblurring blurry images | |
CN113450290A (zh) | 基于图像修补技术的低照度图像增强方法及系统 | |
CN113096023B (zh) | 神经网络的训练方法、图像处理方法及装置、存储介质 | |
CN113284055A (zh) | 一种图像处理的方法以及装置 | |
CN114049314A (zh) | 一种基于特征重排和门控轴向注意力的医学图像分割方法 | |
CN113076966B (zh) | 图像处理方法及装置、神经网络的训练方法、存储介质 | |
CN111724309B (zh) | 图像处理方法及装置、神经网络的训练方法、存储介质 | |
Shi et al. | LCA-Net: A Context-Aware Light-Weight Network For Low-Illumination Image Enhancement | |
CN114862685A (zh) | 一种图像降噪方法、及图像降噪模组 | |
Xu et al. | Underwater image enhancement method based on a cross attention mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20919527 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20919527 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 30.03.2023) |
|