WO2024153156A1 - Image processing method, device, equipment and medium - Google Patents
Image processing method, device, equipment and medium
- Publication number: WO2024153156A1 (PCT/CN2024/072880)
- Authority: WIPO (PCT)
Classifications
- G06T3/4076—Scaling of whole images or parts thereof based on super-resolution (output resolution higher than sensor resolution), using the original low-resolution images to iteratively correct the high-resolution images
- G06N3/08—Computing arrangements based on biological models; neural networks; learning methods
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
- G06V10/467—Extraction of image or video features; encoded features or binary features, e.g. local binary patterns [LBP]
- G06T2207/20221—Image fusion; image merging
Definitions
- the present invention relates to the field of image processing technology, and in particular to an image processing method, device, equipment and medium.
- thermal images usually have low resolution and lack detail.
- Super-resolution reconstruction can improve the resolution and quality of the image and alleviate the problems of low resolution and lack of detail.
- the present invention provides an image processing method, device, equipment and medium, which are used to solve the prior-art problem of black-and-white edge artifacts appearing after super-resolution reconstruction, caused by inaccurate feature extraction in the blurred detail areas of thermal images.
- the present invention provides an image processing method, the method comprising:
- acquiring a target image to be processed, and performing convolution processing on the input target image based on the convolution layer of a pre-trained hybrid attention super-resolution network model to obtain a shallow feature map; based on a set number of series-connected attention residual layers in the hybrid attention super-resolution network model, sequentially passing the shallow feature map through each attention residual layer for perception processing to obtain a deep feature map output by the last attention residual layer; based on the target residual addition processing layer of the hybrid attention super-resolution network model, adding the pixel values of the pixel points in the corresponding rows and columns of the shallow feature map and the deep feature map to obtain a processed target feature map; and based on the upsampling layer in the hybrid attention super-resolution network model, inputting the target feature map into the upsampling layer to obtain an output high-resolution image with enhanced details.
- the set number of series-connected attention residual layers in the hybrid attention super-resolution network model is used to sequentially pass the shallow feature map through each attention residual layer for perception processing, and the deep feature map output by the last attention residual layer is obtained, including:
- for each attention residual layer in the hybrid attention super-resolution network model, if the attention residual layer is the first attention residual layer, the shallow feature map is input into the attention residual layer as the first input feature map; if the attention residual layer is not the first attention residual layer, the output feature map of the previous attention residual layer is input into the attention residual layer as the first input feature map.
- the attention layer of the attention residual layer performs perception processing on the first input feature map to obtain a target attention feature map output by the attention layer after the perception processing; based on the residual addition processing layer of the attention residual layer, the pixel values of the pixel points of the corresponding rows and columns of the target attention feature map and the first input feature map are added to obtain an output feature map of the attention residual layer, until a deep feature map output by the last attention residual layer is obtained.
- the attention layer based on the attention residual layer performs perception processing on the first input feature map
- the target attention feature map output by the attention layer after the perception processing includes:
- based on the processing unit of the attention layer of the attention residual layer, the pixel value of each pixel point in the first input feature map is input into the local binary pattern LBP sampling function pre-stored in the attention layer, and the output LBP eigenvalue matrix is obtained and saved in the attention layer;
- the first input feature map is input into the first perceptron unit of the attention layer for perceptual processing to obtain a brightness-based attention feature map
- the LBP eigenvalue matrix is perceptually processed through the second perceptron unit of the attention layer to obtain a gradient-based attention feature map
- the two attention feature maps are input into the fusion layer unit of the attention layer for fusion processing to obtain a fused attention feature map
- the fused attention feature map is dot-multiplied with the first input feature map to obtain an output target attention feature map.
- each attention residual layer further includes a first convolution layer, a first activation layer, a second convolution layer, and a second activation layer.
- the method further includes:
- the first input feature map is input into the first convolution layer of the attention residual layer for convolution processing, activated through the first activation layer, convolved through the second convolution layer, and activated through the second activation layer to obtain the second input feature map, and the second input feature map is input into the attention layer of the attention residual layer for subsequent processing.
- the present invention provides an image processing device, the device comprising:
- An acquisition module used for acquiring a target image to be processed
- a processing module is used to perform convolution processing on the input target image based on the convolution layer of the pre-trained hybrid attention super-resolution network model to obtain a shallow feature map; based on a set number of series-connected attention residual layers in the hybrid attention super-resolution network model, the shallow feature map is sequentially passed through each attention residual layer for perception processing to obtain a deep feature map output by the last attention residual layer; based on the target residual addition processing layer of the hybrid attention super-resolution network model, the pixel values of the corresponding rows and columns of the shallow feature map and the deep feature map are added to obtain a processed target feature map; based on the upsampling layer in the hybrid attention super-resolution network model, the target feature map is input into the upsampling layer to obtain an output high-resolution image with enhanced details.
- the processing module is specifically used to, for each attention residual layer in the hybrid attention super-resolution network model, if the attention residual layer is the first attention residual layer, input the shallow feature map into the attention residual layer as the first input feature map; if the attention residual layer is not the first attention residual layer, use the output feature map of the previous attention residual layer as the first input feature map of the attention residual layer; perform perception processing on the first input feature map based on the attention layer of the attention residual layer to obtain the target attention feature map output by the attention layer after the perception processing; and, based on the residual addition processing layer of the attention residual layer, add the pixel values of the pixel points in the corresponding rows and columns of the target attention feature map and the first input feature map to obtain the output feature map of the attention residual layer, until the deep feature map output by the last attention residual layer is obtained.
- the processing module is specifically used to, based on the processing unit of the attention layer of the attention residual layer, input the pixel value of each pixel in the first input feature map into the local binary pattern LBP sampling function pre-saved in the attention layer to obtain the output LBP eigenvalue matrix and save it in the attention layer;
- the first input feature map is input into the first perceptron unit of the attention layer for perceptual processing to obtain a brightness-based attention feature map
- the LBP eigenvalue matrix is perceptually processed by the second perceptron unit of the attention layer to obtain a gradient-based attention feature map
- the two attention feature maps are input into the fusion layer unit of the attention layer for fusion processing to obtain a fused attention feature map
- the dot product processing unit based on the attention layer performs dot product processing on the fused attention feature map and the first input feature map to obtain the output target attention feature map.
- each attention residual layer also includes a first convolution layer, a first activation layer, a second convolution layer and a second activation layer.
- the processing module is specifically used to input the first input feature map into the first convolution layer of the attention residual layer for convolution processing, perform activation processing through the first activation layer, perform convolution processing through the second convolution layer, perform activation processing through the second activation layer to obtain a second input feature map, and input the second input feature map into the attention layer of the attention residual layer for subsequent processing.
- the present invention provides an electronic device, comprising: a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other via the communication bus;
- a computer program is stored in the memory, and when the program is executed by the processor, the processor implements the steps of any one of the above-mentioned image processing methods.
- the present invention provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of any one of the above-mentioned image processing methods.
- the present invention provides an image processing method, device, equipment and medium.
- a target image to be processed is obtained, and based on the convolution layer of a pre-trained hybrid attention super-resolution network model, a convolution process is performed on the input target image to obtain a shallow feature map.
- the shallow feature map is sequentially passed through each attention residual layer for perception processing. Since a set number of series-connected attention residual layers construct a deeper network, the model pays more attention to the detail area in the image. The deeper network can accurately extract the deep feature map of the detail feature.
- the pixel values of the pixel points of the corresponding rows and columns of the deep feature map and the shallow feature map are added and input into the upsampling layer to obtain an output high-resolution image after detail enhancement, thereby solving the problem of inaccurate feature extraction of fuzzy detail areas in thermal imaging images.
- FIG1 is a schematic diagram of a process of an image processing method provided by an embodiment of the present invention.
- FIG2 is a schematic diagram of an image processing process provided by an embodiment of the present invention.
- FIG3 is a schematic diagram of a process of perceiving and processing a first input feature map by an attention layer of a hybrid attention neural network model provided by an embodiment of the present invention
- FIG4 is a schematic diagram of a process of extracting an output feature map of an attention residual layer provided by an embodiment of the present invention.
- FIG5 is a schematic diagram of the structure of an image processing device provided by an embodiment of the present invention.
- FIG6 is a schematic diagram of the structure of an electronic device provided by an embodiment of the present invention.
- embodiments of the present invention provide an image processing method, device, equipment and medium.
- Embodiment 1:
- FIG1 is a schematic diagram of a process of an image processing method provided by an embodiment of the present invention, the process comprising the following steps:
- S101 Obtain a target image to be processed, and based on the convolution layer of a pre-trained hybrid attention super-resolution network model, perform convolution processing on the input target image to obtain a shallow feature map.
- an image processing method provided in an embodiment of the present invention is applied to an electronic device, which can be a host, a tablet computer, a smart terminal device such as a smart phone, or a server, wherein the server can be a local server or a cloud server, and the embodiment of the present invention does not limit this.
- the electronic device obtains a target image to be processed, where the target image refers to an image to be processed.
- the target image may be a thermal image, or a low-resolution image with blurred detail areas, such as an infrared image, a visible light image, etc.
- the electronic device may obtain the target image to be processed in a variety of ways. For example, the electronic device may specifically receive a target image sent by an electronic device (such as a thermal imager) connected to the electronic device, or may obtain a target image stored by the electronic device itself.
- the hybrid attention super-resolution network model is used to reconstruct the image for super-resolution based on the attention mechanism.
- the attention mechanism is a technology that imitates cognitive attention in artificial neural networks.
- the attention mechanism can enhance the weights of certain parts of the neural network input data while weakening the weights of other parts, so as to focus the network's attention on the most important small part of the data.
- the attention mechanism can be implemented by adding an attention function to the model structure or introducing other structures that implement the attention mechanism.
- the input of the hybrid attention super-resolution network model can include the target image
- the output of the hybrid attention super-resolution network model can include a high-resolution image with enhanced details.
- S102 Based on a set number of series-connected attention residual layers in the hybrid attention super-resolution network model, the shallow feature map is sequentially passed through each attention residual layer for perception processing to obtain a deep feature map output by the last attention residual layer; based on the target residual addition processing layer of the hybrid attention super-resolution network model, the pixel values of the pixel points in the corresponding rows and columns of the shallow feature map and the deep feature map are added to obtain a processed target feature map; based on the upsampling layer in the hybrid attention super-resolution network model, the target feature map is input into the upsampling layer to obtain an output high-resolution image with enhanced details.
- the structure of the hybrid attention super-resolution network model is as follows: the hybrid attention super-resolution network model includes a convolutional layer, an attention residual layer, a target residual addition processing layer and an upsampling layer, the output of the convolutional layer is used as the input of the attention residual layer, the output of the convolutional layer and the output of the attention residual layer are used as the input of the target residual addition processing layer, the output of the target residual addition processing layer is used as the input of the upsampling layer, and the output of the upsampling layer is used as the final output of the hybrid attention super-resolution network model.
- the convolution layer is used to extract shallow features from the target image and obtain a shallow feature map.
- the input of the convolution layer includes the target image, and the output of the convolution layer includes the shallow feature map.
- the model type of the convolution layer can be a convolutional neural network (CNN), etc.
- the attention residual layer is used to extract deep features of the target image.
- the input of the attention residual layer includes the first input feature map, which can be a shallow feature map output by the convolution layer or an output feature map output by the previous attention residual layer.
- the output of the attention residual layer includes an output feature map, and the output feature map output by the last attention residual layer can be called a deep feature map.
- the model type of the attention residual layer can be a residual attention network (RAN), etc.
- the shallow feature map is sequentially passed through each attention residual layer for perception processing to obtain a deep feature map output by the last attention residual layer, including: for each attention residual layer in the hybrid attention super-resolution network model, if the attention residual layer is the first attention residual layer, the shallow feature map output by the convolution layer is used as the first input feature map of the attention residual layer, and is input into the attention residual layer; if the attention residual layer is not the first attention residual layer, the output feature map output by the previous attention residual layer is used as the first input feature map of the attention residual layer, and is input into the attention residual layer.
- the target residual addition processing layer is used to process the pixel values of the pixel points of the corresponding rows and columns of the at least two input images to obtain the target feature map.
- the target residual addition processing layer adds the pixel values of the pixel points of the corresponding rows and columns of the at least two input images.
- the input of the target residual addition processing layer includes a shallow feature map and a deep feature map
- the output of the target residual addition processing layer includes a target feature map
- the model type of the target residual addition processing layer can be a residual neural network Residual Neural Network (ResNet) and the like.
- the upsampling layer is used to increase the resolution of the image.
- the input of the upsampling layer includes the target feature map, and the output of the upsampling layer includes a high-resolution image with enhanced details.
- the model type of the upsampling layer can be Fully Convolutional Networks (FCN), Convolutional Networks for Biomedical Image Segmentation (U-Net), etc.
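To make the overall structure concrete, the following is a minimal, non-authoritative PyTorch sketch of the pipeline described above (convolution layer → series-connected attention residual layers → target residual addition → upsampling). The channel width, kernel sizes, upscale factor, and the plain convolutional stand-in for the attention residual layer are illustrative assumptions, not values from the patent; a sub-pixel (PixelShuffle) layer is used as one common realization of the upsampling layer.

```python
import torch
import torch.nn as nn

class PlainResidualLayer(nn.Module):
    """Stand-in for one attention residual layer so this sketch runs on its
    own; the attention internals are sketched under Embodiments 3 and 4."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        # residual addition: pixel values of corresponding rows/columns are added
        return x + self.body(x)

class HybridAttentionSR(nn.Module):
    def __init__(self, in_channels=1, channels=64, n_layers=64, scale=2):
        super().__init__()
        # convolution layer: extracts the shallow feature map from the target image
        self.shallow_conv = nn.Conv2d(in_channels, channels, 3, padding=1)
        # set number of series-connected residual layers (n_layers, e.g. 64)
        self.residual_layers = nn.Sequential(
            *[PlainResidualLayer(channels) for _ in range(n_layers)])
        # upsampling layer: sub-pixel convolution as one common realization
        self.upsample = nn.Sequential(
            nn.Conv2d(channels, channels * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
            nn.Conv2d(channels, in_channels, 3, padding=1),
        )

    def forward(self, x):
        shallow = self.shallow_conv(x)         # shallow feature map
        deep = self.residual_layers(shallow)   # deep feature map (last layer's output)
        target = shallow + deep                # target residual addition processing layer
        return self.upsample(target)           # detail-enhanced high-resolution image
```

Under these assumptions, `HybridAttentionSR()(torch.randn(1, 1, 64, 64))` yields a (1, 1, 128, 128) output.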
- the hybrid attention super-resolution network model can be trained based on a large number of training samples with labels. Specifically, the training samples with labels are input into the hybrid attention super-resolution network model, and the parameters of the hybrid attention super-resolution network model are updated through training.
- the training sample may be a sample target image.
- the identifier (label) may be the actual detail-enhanced high-resolution image corresponding to the sample target image.
- the identifier may be obtained manually, or through interpolation algorithms, image reconstruction and other image super-resolution technologies; the interpolation algorithm may be nearest-neighbor interpolation, bilinear interpolation, bicubic interpolation, etc.; the image reconstruction may be based on wavelet transform, etc.
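As a hedged illustration of the interpolation option just listed, a label could be produced by bicubic upscaling of the sample target image; the ×2 scale factor and the use of `torch.nn.functional.interpolate` are assumptions for the sketch, not the patent's prescribed method.

```python
import torch
import torch.nn.functional as F

def bicubic_label(sample_lr: torch.Tensor, scale: int = 2) -> torch.Tensor:
    """Hypothetical label generation: bicubic interpolation of a
    (N, C, H, W) low-resolution sample target image tensor."""
    return F.interpolate(sample_lr, scale_factor=scale,
                         mode="bicubic", align_corners=False)
```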
- the convolution layer, the attention residual layer, the target residual addition processing layer and the upsampling layer may be jointly trained and obtained.
- the initial convolution layer, initial attention residual layer, initial target residual addition processing layer and initial upsampling layer can be trained based on a large number of training samples with training labels; specifically, the sample target image is input into the initial convolution layer to obtain a sample shallow feature map; the sample shallow feature map is input into the initial attention residual layer to obtain a sample deep feature map; the sample shallow feature map and the sample deep feature map are input into the initial target residual addition processing layer to obtain a sample target feature map; the sample target feature map is input into the initial upsampling layer to obtain a high-resolution image after detail enhancement corresponding to the sample target image, a loss function is constructed based on the sample target image and its corresponding high-resolution image after detail enhancement, and the parameters of the initial convolution layer, the initial attention residual layer, the initial target residual addition processing layer and the initial upsampling layer are simultaneously updated based on the loss function until the training meets the preset conditions, and the trained convolution layer, the attention residual layer, the target residual addition processing layer and the upsampling layer are obtained.
- the preset conditions can be that the loss function is less than a threshold, converges, or the training cycle reaches a threshold.
- training can be performed by various methods based on training samples. For example, training can be performed based on a gradient descent method.
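A minimal sketch of the joint training described above, assuming an L1 reconstruction loss and the Adam optimizer (a gradient-descent variant); the loss choice, learning rate, and stopping thresholds are assumptions. All sub-layers are updated together from one loss, and training stops when the loss falls below a threshold or the training-cycle cap is reached.

```python
import torch
import torch.nn as nn

def train_jointly(model, loader, epochs=100, loss_threshold=1e-3, lr=1e-4):
    criterion = nn.L1Loss()                                  # loss between output and label
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # gradient-descent variant
    for _ in range(epochs):                                  # training-cycle threshold
        total = 0.0
        for sample_lr, label_hr in loader:                   # sample image and its label
            optimizer.zero_grad()
            loss = criterion(model(sample_lr), label_hr)
            loss.backward()                                  # gradients for all layers at once
            optimizer.step()                                 # joint parameter update
            total += loss.item()
        if total / len(loader) < loss_threshold:             # preset condition met
            break
    return model
```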
- the hybrid attention super-resolution network model includes a set number of series-connected attention residual layers after the convolutional layer, where the set number can be 60, 64, 62, 65, etc. Preferably, the set number is 64.
- the electronic device inputs the shallow feature map output by the convolution layer into the first attention residual layer, performs perception processing through each attention residual layer in sequence, uses the output feature map of the previous attention residual layer as the input feature map of the next attention residual layer, and obtains the deep feature map output by the last attention residual layer.
- FIG. 2 is a schematic diagram of an image processing process provided by an embodiment of the present invention.
- as shown in FIG2, the target image (a low-resolution image) is input into the convolution layer of the hybrid attention neural network model, and the shallow feature map output by the convolution layer is input into the first attention residual layer.
- after perception processing by n attention residual layers (n can be 60, 62, 64, 65, etc.; preferably, n is 64), the deep feature map output by the last attention residual layer is obtained.
- the deep feature map and the shallow feature map are input into the target residual addition processing layer to obtain the target feature map output by the target residual addition processing layer and input into the upsampling layer to obtain a high-resolution image output by the upsampling layer.
- the shallow feature map and the deep feature map are input into the target residual addition processing layer, and the pixel values of the pixels in each corresponding row and column in the shallow feature map and the deep feature map are added to obtain the target feature map after residual addition processing.
- the target feature map is input into the upsampling layer, and the target image is upsampled to obtain a high-resolution image with enhanced details output by the upsampling layer.
- a target image to be processed is obtained, and based on the convolution layer of the pre-trained hybrid attention super-resolution network model, a convolution process is performed on the input target image to obtain a shallow feature map.
- the shallow feature map is sequentially passed through each attention residual layer for perception processing. Since a set number of series-connected attention residual layers construct a deeper network, the model pays more attention to the detail area in the image, and the deeper network can accurately extract the deep feature map of the detail feature.
- the pixel values of the pixel points in the corresponding rows and columns of the deep feature map and the shallow feature map are added and input into the upsampling layer to obtain a high-resolution image with enhanced details, thereby solving the problem of inaccurate feature extraction in the blurred detail areas of thermal images.
- Embodiment 2:
- the hybrid attention super-resolution network model may include a set number of serially connected attention residual layers, where the serial connection means that the output of the previous attention residual layer serves as the input of the next attention residual layer.
- the set number (i.e., the aforementioned n value) can be 64
- the hybrid attention super-resolution network model includes 64 series-connected attention residual layers, recorded as attention residual layer 1, attention residual layer 2, ..., attention residual layer 64, wherein the shallow feature map output by the convolution layer is input to attention residual layer 1 (i.e., the first attention residual layer), the output of attention residual layer 1 is used as the input of attention residual layer 2, ..., the output of attention residual layer 63 is used as the input of attention residual layer 64, and the output of attention residual layer 64 is the deep feature map.
- the set number of series-connected attention residual layers in the hybrid attention super-resolution network model is used to sequentially pass the shallow feature map through each attention residual layer for perception processing, and the deep feature map output by the last attention residual layer is obtained, which includes:
- for each attention residual layer in the hybrid attention super-resolution network model, if the attention residual layer is the first attention residual layer, the shallow feature map is input into the attention residual layer as the first input feature map; if the attention residual layer is not the first attention residual layer, the output feature map of the previous attention residual layer is used as the first input feature map of the attention residual layer; the attention layer of the attention residual layer performs perception processing on the first input feature map to obtain the target attention feature map output by the attention layer after the perception processing; based on the residual addition processing layer of the attention residual layer, the pixel values of the pixel points of the corresponding rows and columns of the target attention feature map and the first input feature map are added to obtain the output feature map of the attention residual layer, until the deep feature map output by the last attention residual layer is obtained.
- for each attention residual layer of the hybrid attention super-resolution network model, if the attention residual layer is the first attention residual layer, the shallow feature map output by the convolution layer is input into the attention residual layer as the first input feature map; if the attention residual layer is not the first attention residual layer, the output feature map of the previous attention residual layer is obtained and used as the first input feature map of the attention residual layer.
- the attention residual layer includes an attention layer and a residual addition processing layer.
- the first input feature map is input into the attention layer of the attention residual layer.
- the first input feature map is sensed based on the attention layer to obtain a target attention feature map output after the attention layer performs sense processing.
- the target attention feature map and the first input feature map are input into the residual addition processing layer of the attention residual layer.
- the pixel values of the pixels at each position in the target attention feature map and the first input feature map are determined.
- the pixel values of the pixels in the corresponding rows and columns of the target attention feature map and the first input feature map are added to obtain an output feature map of the attention residual layer. If the attention residual layer is not the last attention residual layer, the output feature map is used as the input feature map of the next attention residual layer. If the attention residual layer is the last attention residual layer, the output feature map is used as the deep feature map output by the last attention residual layer.
- the attention residual layer includes an attention layer and a residual addition processing layer.
- the attention layer is used to process the first input feature map to obtain a target attention feature map.
- the residual addition processing layer is used to add the pixel values of the corresponding rows and columns of the input first input feature map and the target attention feature map to obtain an output feature map.
- the model type of the attention layer can be a Transformer model, etc.
- the model type of the residual addition processing layer can be a residual neural network ResNet, etc.
- the attention residual layer can be divided into a first attention residual layer and a non-first attention residual layer.
- the first attention residual layer refers to the first attention residual layer that processes the shallow feature map
- the non-first attention residual layer refers to the remaining attention residual layers except the first attention residual layer.
- the input of the attention layer of the first attention residual layer includes the shallow feature map (also referred to as the first input feature map of the current attention residual layer), and the output of the attention layer of the first attention residual layer includes the target attention feature map of the first attention residual layer.
- the input of the residual addition processing layer of the first attention residual layer includes the shallow feature map and the target attention feature map of the first attention residual layer, and the output of the residual addition processing layer of the first attention residual layer includes the output feature map of the first attention residual layer.
- the input of the attention layer of the non-first attention residual layer includes the output feature map of the previous attention residual layer (also called the first input feature map of the current attention residual layer), and the output of the attention layer of the non-first attention residual layer includes the target attention feature map of the current attention residual layer.
- the input of the residual addition processing layer of the non-first attention residual layer includes the target attention feature map of the current attention residual layer and the output feature map of the previous attention residual layer (also called the first input feature map of the current attention residual layer), and the output of the residual addition processing layer of the non-first attention residual layer includes the output feature map of the current attention residual layer.
- the output feature map output by the last attention residual layer can be input into the target residual addition processing layer as a deep feature map, wherein the last attention residual layer refers to the attention residual layer whose output feature map is to be further processed by the target residual addition processing layer.
- the model by setting multiple series-connected attention residual layers in a hybrid attention super-resolution network model, it is possible to further extract deep features of the target image based on the shallow feature map extracted by the convolution layer and the results obtained by processing each attention residual layer, so that the model can pay more attention to the details in the target image, thereby accurately extracting detailed features, constructing a deeper network through the residual results, effectively extracting deep features, and ultimately improving the model's enhancement effect on the details of the target image.
- Embodiment 3:
- the attention layer based on the attention residual layer performs perceptual processing on the first input feature map
- the target attention feature map output by the attention layer after the perceptual processing is obtained includes:
- based on the processing unit of the attention layer of the attention residual layer, the pixel value of each pixel point in the first input feature map is input into the local binary pattern (LBP) sampling function pre-stored in the attention layer, and the output LBP feature value matrix is obtained and stored in the attention layer;
- the first input feature map is input into the first perceptron unit of the attention layer for perceptual processing to obtain a brightness-based attention feature map
- the LBP eigenvalue matrix is perceptually processed through the second perceptron unit of the attention layer to obtain a gradient-based attention feature map
- the two attention feature maps are input into the fusion layer unit of the attention layer for fusion processing to obtain a fused attention feature map
- the fused attention feature map is dot-multiplied with the first input feature map to obtain an output target attention feature map.
- the first input feature map is input into the attention residual layer.
- the first input feature map is first input into the processing unit of the attention layer, and the pixel value of each pixel in the first input feature map is input into the pre-saved LBP sampling function to obtain the LBP eigenvalue corresponding to each pixel; according to the LBP eigenvalue corresponding to each pixel and the row and column where each pixel is located in the first input feature map, the LBP eigenvalue corresponding to each pixel is used as the element value of the element point in the corresponding row and column of the LBP eigenvalue matrix, so as to obtain the LBP eigenvalue matrix, which is saved in the attention layer.
- the first input feature map is input into the first perceptron unit of the attention layer, and the first input feature map is perceptually processed to obtain a brightness-based attention feature map;
- the LBP eigenvalue matrix is input into the second perceptron unit of the attention layer, and the LBP eigenvalue matrix is perceptually processed to obtain a gradient-based attention feature map, wherein the first perceptron unit and the second perceptron unit are multi-layered and independent of each other.
- the brightness-based attention feature map and the gradient-based attention feature map are input into the fusion layer unit of the attention layer, and the brightness-based attention feature map and the gradient-based attention feature map are fused to obtain the fused attention feature map.
- the fused attention feature map and the first input feature map are input into the point multiplication processing unit of the attention layer, and the fused attention feature map and the first input feature map are point multiplied with each other, that is, according to the pixel values of the pixels at each position in the fused attention feature map and the first input feature map, the pixel values of the pixels in each row and column of the fused attention feature map are multiplied with the pixel values of the pixels in the corresponding rows and columns of the first input feature map to obtain the pixel values of the pixels in each row and column of the target attention feature map.
- the attention layer includes a processing unit, a first perceptron unit, a second perceptron unit, a fusion layer unit, and a dot product processing unit.
- the processing unit is used to extract feature values from the first input feature map to obtain an LBP eigenvalue matrix.
- the input of the processing unit includes the first input feature map, and the output of the processing unit includes the LBP eigenvalue matrix.
- the processing unit may include a pre-saved LBP sampling function, the LBP sampling function receives the pixel value of each pixel point in the first input feature map as input, and outputs the LBP eigenvalue corresponding to each pixel point in the first input feature map.
- the processing unit uses the LBP eigenvalue corresponding to each pixel point in the first input feature map as the element value of the element point of the corresponding row and column in the LBP eigenvalue matrix according to the LBP eigenvalue corresponding to each pixel point in the first input feature map and the row and column where each pixel point in the first input feature map is located in the first input feature map, thereby obtaining the LBP eigenvalue matrix.
- the LBP sampling function can use a variety of modes, such as the original LBP feature, the circular LBP feature, the uniform pattern (equivalent mode), etc.
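For concreteness, here is a hedged sketch of the original 3×3 LBP mode named above: each pixel is compared with its eight neighbors, and the comparison bits are packed into an 8-bit code, giving an LBP eigenvalue per pixel in the same rows and columns as the input. Replicate padding at the border and the clockwise bit ordering are assumptions.

```python
import torch
import torch.nn.functional as F

def lbp_eigenvalue_matrix(feat: torch.Tensor) -> torch.Tensor:
    """Original 3x3 LBP over a single-channel (H, W) feature map:
    each pixel's value is compared with its eight neighbors, and the
    bits are packed into an 8-bit code per pixel."""
    # replicate-pad so border pixels also have eight neighbors (assumption)
    p = F.pad(feat.unsqueeze(0).unsqueeze(0), (1, 1, 1, 1), mode="replicate")[0, 0]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]  # eight neighbors, clockwise
    h, w = feat.shape
    code = torch.zeros_like(feat)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = p[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        # bit is 1 where the neighbor is >= the center pixel value
        code += (neighbor >= feat).to(feat.dtype) * (2 ** bit)
    return code  # LBP eigenvalue matrix, same rows/columns as the input
```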
- the first perceptron unit is used to extract a brightness-based attention feature map from the first input feature map.
- the input of the first perceptron unit includes the first input feature map, and the output of the first perceptron unit includes the brightness-based attention feature map.
- the model type of the first perceptron unit can be a self-attention network Non-local Networks (NLNet), etc.
- the second perceptron unit is used to extract a gradient-based attention feature map from the first input feature map.
- the input of the second perceptron unit includes an LBP eigenvalue matrix, and the output of the second perceptron unit includes a gradient-based attention feature map.
- the model type of the second perceptron unit can be Gradient-weighted Class Activation Mapping (Grad-CAM) and the like.
- the first perceptron unit and the second perceptron unit are multi-layered and independent of each other.
- the fusion layer unit is used to fuse the brightness-based attention feature map and the gradient-based attention feature map.
- the input of the fusion layer unit includes the brightness-based attention feature map and the gradient-based attention feature map.
- the output of the fusion layer unit includes the fused attention feature map.
- the model type of the fusion layer unit can be a Transformer model, etc.
- the dot multiplication processing unit is used to perform dot multiplication processing on the fused attention feature map and the first input feature map, that is, according to the pixel values of the pixels at each position in the fused attention feature map and the first input feature map, the pixel values of the pixels in each row and column of the fused attention feature map are multiplied with the pixel values of the pixels in the corresponding rows and columns of the first input feature map to obtain the pixel values of the pixels in each row and column of the target attention feature map.
- the input of the dot multiplication processing unit includes the first input feature map and the fused attention feature map, and the output of the dot multiplication processing unit includes the target attention feature map.
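The following is a non-authoritative sketch of the attention layer's data flow as described by the units above: a first perceptron unit on the first input feature map (brightness branch), a second perceptron unit on the LBP eigenvalue matrix (gradient branch), a fusion layer unit, and a dot product with the first input feature map. Realizing the perceptron units as 1×1-convolution MLPs and gating the fused map with a sigmoid are assumptions; the patent names NLNet, Grad-CAM, and Transformer models as possible unit types instead.

```python
import torch
import torch.nn as nn

class AttentionLayer(nn.Module):
    def __init__(self, channels):
        super().__init__()
        def perceptron():  # multi-layer perceptron unit as 1x1 convs (assumption)
            return nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, kernel_size=1),
            )
        self.first_perceptron = perceptron()   # brightness-based attention branch
        self.second_perceptron = perceptron()  # gradient-based branch on the LBP matrix
        self.fusion = nn.Conv2d(2 * channels, channels, kernel_size=1)  # fusion layer unit

    def forward(self, x, lbp):
        # x: first input feature map; lbp: LBP eigenvalue matrix, assumed
        # stacked/broadcast to the same (N, C, H, W) shape as x
        brightness = self.first_perceptron(x)
        gradient = self.second_perceptron(lbp)
        fused = torch.sigmoid(self.fusion(torch.cat([brightness, gradient], dim=1)))
        return fused * x  # dot-product processing with the first input feature map
```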
- the pixel value of each pixel point in the first input feature map is input into the LBP sampling function to obtain an output LBP eigenvalue matrix and save it in the attention layer; the first input feature map is then input into the first perceptron unit of the attention layer for perceptual processing to obtain a brightness-based attention feature map, the LBP eigenvalue matrix is perceptually processed by the second perceptron unit of the attention layer to obtain a gradient-based attention feature map, and then the brightness-based attention feature map and the gradient-based attention feature map are fused, and the brightness and gradient are considered simultaneously when extracting the target attention feature map, so that the feature information extracted from the target attention feature map is more accurate and comprehensive.
- Figure 3 is a schematic diagram of the process of perceiving and processing the first input feature map by the attention layer of a hybrid attention neural network model provided by an embodiment of the present invention.
- as shown in FIG3, the first input feature map is input into the fusion layer after passing through the first multi-layer perceptron; the first input feature map is also passed through the processing unit to obtain the LBP eigenvalue matrix, which is passed through the second multi-layer perceptron and input into the fusion layer; the fused attention feature map output by the fusion layer is dot-multiplied with the first input feature map to obtain the output target attention feature map.
- the electronic device may obtain the output LBP eigenvalue matrix according to the target image and a pre-saved LBP sampling function and save it in the attention layer of each attention residual layer.
- Embodiment 4:
- each attention residual layer further includes a first convolution layer, a first activation layer, a second convolution layer and a second activation layer.
- the method further includes:
- the first input feature map is input into the first convolution layer of the attention residual layer for convolution processing, activated through the first activation layer, convolved through the second convolution layer, and activated through the second activation layer to obtain the second input feature map, and the second input feature map is input into the attention layer of the attention residual layer for subsequent processing.
- each attention residual layer can also include a first convolution layer, a first activation layer, a second convolution layer and a second activation layer.
- the first input feature map is convolved through the first convolution layer, the first input feature map after the convolution is input into the first activation layer for activation, the first input feature map after the activation is input into the second convolution layer for convolution, and the first input feature map after the convolution is input into the second activation layer for activation to obtain the second input feature map output by the second activation layer.
- the second input feature map is input into the attention layer of the attention residual layer for perception processing to obtain the target attention feature map output by the attention layer after the perception processing; based on the residual addition processing layer of the attention residual layer, the pixel values of the pixel points of the corresponding rows and columns of the target attention feature map and the first input feature map are added to obtain the output feature map of the attention residual layer.
- the first convolution layer is used to perform convolution processing on the first input feature map.
- the input of the first convolution layer includes the first input feature map, and the output of the first convolution layer includes the first convolution feature map.
- the model type of the first convolution layer can be a convolutional neural network CNN, etc.
- the first activation layer is used to activate the first convolutional feature map.
- the input of the first activation layer includes the first convolutional feature map, and the output of the first activation layer includes the first activation feature map.
- the first activation layer can be implemented by various activation functions, such as Sigmoid activation function, hyperbolic tangent activation function, ReLU activation function, etc.
- the second convolution layer is used to perform convolution processing on the first activation feature map.
- the input of the second convolution layer includes the first activation feature map, and the output of the second convolution layer includes the second convolution feature map.
- the model type of the second convolution layer can be a convolutional neural network CNN, etc.
- the second activation layer is used to activate the second convolutional feature map.
- the input of the second activation layer includes the second convolutional feature map, and the output of the second activation layer includes the second input feature map.
- the second activation layer can be implemented by various activation functions, such as Sigmoid activation function, hyperbolic tangent activation function, ReLU activation function, etc.
- in this way, the extracted target attention feature map and output feature map pay more attention to the details in the image, thereby improving the model's ability to enhance the details.
- Figure 4 is a schematic diagram of a process of extracting an output feature map by an attention residual layer provided by an embodiment of the present invention.
- the first input feature map is passed through a convolution layer, an activation layer, a convolution layer, and an activation layer to obtain a second input feature map.
- the second input feature map is input into the attention layer to obtain a target attention feature map.
- the target attention feature map and the first input feature map are subjected to residual addition processing to obtain an output feature map.
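Putting Embodiments 3 and 4 together, here is a hedged sketch of one attention residual layer: conv → activation → conv → activation yields the second input feature map, the attention layer (reusing the `AttentionLayer` sketch from Embodiment 3 above) yields the target attention feature map, and the residual addition layer adds it to the first input feature map. Kernel size 3 and ReLU activations are assumptions.

```python
import torch.nn as nn

class AttentionResidualLayer(nn.Module):
    """One attention residual layer; depends on the AttentionLayer sketch
    from Embodiment 3 and an LBP eigenvalue matrix computed as in the
    earlier LBP sketch."""
    def __init__(self, channels):
        super().__init__()
        self.first_conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.first_act = nn.ReLU(inplace=True)
        self.second_conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.second_act = nn.ReLU(inplace=True)
        self.attention = AttentionLayer(channels)  # Embodiment 3 sketch

    def forward(self, first_input, lbp):
        # conv -> activation -> conv -> activation: the second input feature map
        second_input = self.second_act(self.second_conv(
            self.first_act(self.first_conv(first_input))))
        # the attention layer outputs the target attention feature map
        target_attention = self.attention(second_input, lbp)
        # residual addition: pixel values of corresponding rows/columns are added
        return target_attention + first_input
```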
- Embodiment 5:
- FIG5 is a schematic diagram of the structure of an image processing device provided by an embodiment of the present invention. As shown in FIG5 , the device includes:
- An acquisition module 501 is used to acquire a target image to be processed
- Processing module 502 is used to perform convolution processing on the input target image based on the convolution layer of the pre-trained hybrid attention super-resolution network model to obtain a shallow feature map; based on a set number of series-connected attention residual layers in the hybrid attention super-resolution network model, the shallow feature map is sequentially passed through each attention residual layer for perception processing to obtain a deep feature map output by the last attention residual layer; based on the target residual addition processing layer of the hybrid attention super-resolution network model, the pixel values of the corresponding rows and columns of the shallow feature map and the deep feature map are added to obtain a processed target feature map; based on the upsampling layer in the hybrid attention super-resolution network model, the target feature map is input into the upsampling layer to obtain an output high-resolution image with enhanced details.
- the processing module 502 is specifically used to, for each attention residual layer in the hybrid attention super-resolution network model, if the attention residual layer is the first attention residual layer, input the shallow feature map into the attention residual layer as the first input feature map; if the attention residual layer is not the first attention residual layer, use the output feature map of the previous attention residual layer as the first input feature map of the attention residual layer; based on the attention layer of the attention residual layer, perform perception processing on the first input feature map to obtain a target attention feature map output by the attention layer after the perception processing; based on the residual addition processing layer of the attention residual layer, add the pixel values of the pixel points of the corresponding rows and columns of the target attention feature map and the first input feature map to obtain the output feature map of the attention residual layer, until the deep feature map output by the last attention residual layer is obtained.
- the processing module 502 is specifically used for the processing unit of the attention layer based on the attention residual layer, according to the pixel value of each pixel in the first input feature map and the local binary pattern LBP sampling function pre-saved in the attention layer, the pixel value of each pixel is input into the LBP sampling function to obtain the output LBP eigenvalue matrix and save it in the attention layer;
- the first input feature map is input into the first perceptron unit of the attention layer for perceptual processing to obtain a brightness-based attention feature map
- the LBP eigenvalue matrix is perceptually processed by the second perceptron unit of the attention layer to obtain a gradient-based attention feature map
- the two attention feature maps are input into the fusion layer unit of the attention layer for fusion processing to obtain a fused attention feature map, and based on the dot product processing unit of the attention layer, the fused attention feature map is dot-multiplied with the first input feature map to obtain the output target attention feature map.
- each attention residual layer also includes a first convolution layer, a first activation layer, a second convolution layer and a second activation layer.
- the processing module 502 is specifically used to input the first input feature map into the first convolution layer of the attention residual layer for convolution processing, perform activation processing through the first activation layer, perform convolution processing through the second convolution layer, perform activation processing through the second activation layer to obtain a second input feature map, and input the second input feature map into the attention layer of the attention residual layer for subsequent processing.
- Embodiment 6:
- Figure 6 is a schematic diagram of the structure of an electronic device provided by an embodiment of the present invention. Based on the above embodiments, the present application also provides an electronic device, as shown in Figure 6, including: a processor 601, a communication interface 602, a memory 603 and a communication bus 604, wherein the processor 601, the communication interface 602, and the memory 603 communicate with each other through the communication bus 604.
- the memory 603 stores a computer program. When the program is executed by the processor 601, the processor 601 performs the following steps:
- a target image to be processed is obtained, and based on the convolution layer of a pre-trained hybrid attention super-resolution network model, convolution processing is performed on the input target image to obtain a shallow feature map
- based on a set number of series-connected attention residual layers in the hybrid attention super-resolution network model, the shallow feature map is sequentially passed through each attention residual layer for perception processing to obtain a deep feature map output by the last attention residual layer; based on the target residual addition processing layer of the hybrid attention super-resolution network model, the pixel values of the pixel points in the corresponding rows and columns of the shallow feature map and the deep feature map are added to obtain a processed target feature map; based on the upsampling layer in the hybrid attention super-resolution network model, the target feature map is input into the upsampling layer to obtain an output high-resolution image with enhanced details.
- the set number of series-connected attention residual layers in the hybrid attention super-resolution network model is used to sequentially pass the shallow feature map through each attention residual layer for perception processing, and the deep feature map output by the last attention residual layer is obtained, including:
- for each attention residual layer in the hybrid attention super-resolution network model, if the attention residual layer is the first attention residual layer, the shallow feature map is input into the attention residual layer as the first input feature map; if the attention residual layer is not the first attention residual layer, the output feature map of the previous attention residual layer is used as the first input feature map of the attention residual layer; the attention layer of the attention residual layer performs perception processing on the first input feature map to obtain the target attention feature map output by the attention layer after the perception processing; based on the residual addition processing layer of the attention residual layer, the pixel values of the pixel points of the corresponding rows and columns of the target attention feature map and the first input feature map are added to obtain the output feature map of the attention residual layer, until the deep feature map output by the last attention residual layer is obtained.
- performing, by the attention layer of the attention residual layer, perception processing on the first input feature map to obtain the target attention feature map output by the attention layer includes:
- based on the processing unit of the attention layer of the attention residual layer, and according to the pixel value of each pixel in the first input feature map and the local binary pattern (LBP) sampling function pre-stored in the attention layer, the pixel value of each pixel is input into the LBP sampling function to obtain an output LBP eigenvalue matrix, which is saved in the attention layer;
- the first input feature map is input into the first perceptron unit of the attention layer for perceptual processing to obtain a brightness-based attention feature map
- the LBP eigenvalue matrix is perceptually processed through the second perceptron unit of the attention layer to obtain a gradient-based attention feature map
- the two attention feature maps are input into the fusion layer unit of the attention layer for fusion processing to obtain a fused attention feature map
- the fused attention feature map is dot-multiplied with the first input feature map to obtain an output target attention feature map.
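The attention layer just described can be sketched as follows; this is a minimal reading, not the patented design: the two "perceptron units" are modelled as per-pixel multilayer perceptrons (1x1 convolutions), the fusion layer unit as a 1x1 convolution over the sum of the two branches, and a sigmoid (our addition) keeps the fused attention map in a weight-like range. The `lbp_eigenvalues` helper is an assumed function, sketched under Embodiment 3 further below.

```python
import torch
import torch.nn as nn

class HybridAttentionLayer(nn.Module):
    """Sketch of the attention layer: a brightness branch on the input feature
    map, a gradient branch on its LBP eigenvalue matrix, fusion, then an
    element-wise (dot) product with the input feature map."""
    def __init__(self, channels=64):
        super().__init__()
        # the two "perceptron units", modelled as per-pixel MLPs (1x1 convolutions)
        self.mlp_luma = nn.Sequential(
            nn.Conv2d(channels, channels, 1), nn.ReLU(),
            nn.Conv2d(channels, channels, 1))
        self.mlp_lbp = nn.Sequential(
            nn.Conv2d(channels, channels, 1), nn.ReLU(),
            nn.Conv2d(channels, channels, 1))
        self.fuse = nn.Conv2d(channels, channels, 1)  # fusion layer unit

    def forward(self, feat):
        lbp = lbp_eigenvalues(feat)         # LBP eigenvalue matrix (helper sketched later)
        att_luma = self.mlp_luma(feat)      # brightness-based attention feature map
        att_lbp = self.mlp_lbp(lbp)         # gradient-based attention feature map
        att = torch.sigmoid(self.fuse(att_luma + att_lbp))  # fused attention feature map
        return feat * att                   # dot product with the input feature map
```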
- each attention residual layer further includes a first convolution layer, a first activation layer, a second convolution layer, and a second activation layer.
- the method further includes:
- the first input feature map is input into the first convolution layer of the attention residual layer for convolution processing, activated through the first activation layer, convolved through the second convolution layer, and activated through the second activation layer to obtain the second input feature map, which is input into the attention layer of the attention residual layer for subsequent processing.
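One way to assemble these pieces into a complete attention residual layer — again a hedged sketch, with assumed kernel sizes and ReLU activations — is:

```python
import torch.nn as nn

class AttentionResidualLayer(nn.Module):
    """Sketch of one attention residual layer: two convolution/activation pairs
    produce the second input feature map, the attention layer applies attention
    to it, and the first input feature map is added back residually."""
    def __init__(self, channels=64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)  # first convolution layer
        self.act1 = nn.ReLU()                                     # first activation layer
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)  # second convolution layer
        self.act2 = nn.ReLU()                                     # second activation layer
        self.attention = HybridAttentionLayer(channels)           # sketched above

    def forward(self, x):
        y = self.act2(self.conv2(self.act1(self.conv1(x))))  # second input feature map
        y = self.attention(y)                                 # target attention feature map
        return x + y                                          # residual addition with x
```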
- the communication bus mentioned in connection with the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
- the communication bus can be divided into address bus, data bus, control bus, etc. For ease of representation, only one thick line is used in the figure, but it does not mean that there is only one bus or one type of bus.
- the communication interface 602 is used for communication between the electronic device and other devices.
- the memory may include random access memory (RAM) or non-volatile memory (NVM), for example at least one disk memory.
- the memory may alternatively be at least one storage device located remotely from the aforementioned processor.
- the processor may be a general-purpose processor, including a central processing unit, a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
- Embodiment 7:
- the present application further provides a computer-readable storage medium storing a computer program executable by a processor; when the program runs on the processor, it causes the processor to perform the following steps:
- a target image to be processed is acquired, and based on the convolution layer of the pre-trained hybrid attention super-resolution network model, convolution processing is performed on the input target image to obtain a shallow feature map;
- based on the set number of series-connected attention residual layers in the hybrid attention super-resolution network model, the shallow feature map is passed sequentially through each attention residual layer for perception processing to obtain the deep feature map output by the last attention residual layer; based on the target residual addition processing layer of the hybrid attention super-resolution network model, the pixel values of the pixels in corresponding rows and columns of the shallow feature map and the deep feature map are added to obtain a processed target feature map; and based on the upsampling layer in the hybrid attention super-resolution network model, the target feature map is input into the upsampling layer to obtain an output detail-enhanced high-resolution image.
- passing the shallow feature map sequentially through each of the set number of series-connected attention residual layers in the hybrid attention super-resolution network model for perception processing to obtain the deep feature map output by the last attention residual layer includes:
- for each attention residual layer in the hybrid attention super-resolution network model: if the attention residual layer is the first attention residual layer, the shallow feature map is input into it as the first input feature map; if it is not the first attention residual layer, the output feature map of the previous attention residual layer is used as its first input feature map; the attention layer of the attention residual layer performs perception processing on the first input feature map to obtain the target attention feature map output by the attention layer; and the residual addition processing layer of the attention residual layer adds the pixel values of the pixels in corresponding rows and columns of the target attention feature map and the first input feature map to obtain the output feature map of the attention residual layer, until the deep feature map output by the last attention residual layer is obtained.
- performing, by the attention layer of the attention residual layer, perception processing on the first input feature map to obtain the target attention feature map output by the attention layer includes:
- based on the processing unit of the attention layer of the attention residual layer, and according to the pixel value of each pixel in the first input feature map and the local binary pattern (LBP) sampling function pre-stored in the attention layer, the pixel value of each pixel is input into the LBP sampling function to obtain an output LBP eigenvalue matrix, which is saved in the attention layer;
- the first input feature map is input into the first perceptron unit of the attention layer for perceptual processing to obtain a brightness-based attention feature map
- the LBP eigenvalue matrix is perceptually processed through the second perceptron unit of the attention layer to obtain a gradient-based attention feature map
- the two attention feature maps are input into the fusion layer unit of the attention layer for fusion processing to obtain a fused attention feature map
- the fused attention feature map is dot-multiplied with the first input feature map to obtain an output target attention feature map.
- each attention residual layer further includes a first convolution layer, a first activation layer, a second convolution layer, and a second activation layer.
- the method further includes:
- the first input feature map is input into the first convolution layer of the attention residual layer for convolution processing, activated through the first activation layer, convolved through the second convolution layer, and activated through the second activation layer to obtain the second input feature map, which is input into the attention layer of the attention residual layer for subsequent processing.
- those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
- the present application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the present application. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
- These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
- These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are executed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The present invention discloses an image processing method, apparatus, device, and medium. A target image to be processed is acquired, and the convolution layer of a pre-trained hybrid attention super-resolution network model performs convolution processing on the input target image to obtain a shallow feature map; the shallow feature map then passes sequentially through each of a set number of series-connected attention residual layers in the model for perception processing. Because the series-connected attention residual layers construct a deeper network, the model attends more closely to the detail regions of the image, and the deeper network can accurately extract a deep feature map of the detail features. The pixel values of pixels in corresponding rows and columns of the deep feature map and the shallow feature map are added and the result is input into an upsampling layer to obtain an output detail-enhanced high-resolution image, thereby solving the problem of inaccurate feature extraction in the blurred detail regions of thermal images.
Description
Cross-reference
This application claims priority to Chinese application No. 202310090705.6, filed on January 17, 2023, the entire contents of which are incorporated herein by reference.
The present invention relates to the technical field of image processing, and in particular to an image processing method, apparatus, device, and medium.
Owing to hardware and cost constraints, thermal images usually have low resolution and insufficiently prominent detail. Super-resolution reconstruction can raise the resolution and quality of such images, alleviating both problems.
However, because detail in a thermal image is blurrier than in a visible-light image, detail regions are difficult to distinguish from flat regions, and when the temperature of a target differs greatly from that of its surroundings, feature extraction in the detail regions becomes even less accurate. After a thermal image undergoes super-resolution reconstruction by a super-resolution network model, black-and-white fringing appears in the detail regions.
How to eliminate the black-and-white fringing after super-resolution reconstruction caused by inaccurate feature extraction in the blurred detail regions of thermal images has therefore become an urgent technical problem.
Summary of the Invention
The present invention provides an image processing method, apparatus, device, and medium to solve the prior-art problem of black-and-white fringing after super-resolution reconstruction caused by inaccurate feature extraction in the blurred detail regions of thermal images.
The present invention provides an image processing method, the method comprising:
acquiring a target image to be processed, and performing convolution processing on the input target image based on a convolution layer of a pre-trained hybrid attention super-resolution network model to obtain a shallow feature map;
passing the shallow feature map sequentially through each of a set number of series-connected attention residual layers in the hybrid attention super-resolution network model for perception processing to obtain a deep feature map output by the last attention residual layer; adding, based on a target residual addition processing layer of the hybrid attention super-resolution network model, the pixel values of pixels in corresponding rows and columns of the shallow feature map and the deep feature map to obtain a processed target feature map; and inputting, based on an upsampling layer in the hybrid attention super-resolution network model, the target feature map into the upsampling layer to obtain an output detail-enhanced high-resolution image.
Further, passing the shallow feature map sequentially through each of the set number of series-connected attention residual layers in the hybrid attention super-resolution network model for perception processing to obtain the deep feature map output by the last attention residual layer comprises:
for each attention residual layer in the hybrid attention super-resolution network model: if the attention residual layer is the first attention residual layer, inputting the shallow feature map into it as a first input feature map; if it is not the first attention residual layer, using the output feature map of the previous attention residual layer as its first input feature map; performing, by the attention layer of the attention residual layer, perception processing on the first input feature map to obtain a target attention feature map output by the attention layer; and adding, by the residual addition processing layer of the attention residual layer, the pixel values of pixels in corresponding rows and columns of the target attention feature map and the first input feature map to obtain the output feature map of the attention residual layer, until the deep feature map output by the last attention residual layer is obtained.
Further, performing, by the attention layer of the attention residual layer, perception processing on the first input feature map to obtain the target attention feature map output by the attention layer comprises:
inputting, based on a processing unit of the attention layer of the attention residual layer and according to the pixel value of each pixel in the first input feature map and a local binary pattern (LBP) sampling function pre-stored in the attention layer, the pixel value of each pixel into the LBP sampling function to obtain an output LBP eigenvalue matrix, which is saved in the attention layer;
inputting the first input feature map into a first perceptron unit of the attention layer for perception processing to obtain a brightness-based attention feature map; passing the LBP eigenvalue matrix through a second perceptron unit of the attention layer to obtain a gradient-based attention feature map; inputting the two attention feature maps into a fusion layer unit of the attention layer for fusion processing to obtain a fused attention feature map; and dot-multiplying, based on a dot-product processing unit of the attention layer, the fused attention feature map with the first input feature map to obtain the output target attention feature map.
Further, each attention residual layer further includes a first convolution layer, a first activation layer, a second convolution layer, and a second activation layer, and before the attention layer of the attention residual layer performs perception processing on the first input feature map to obtain the target attention feature map output by the attention layer, the method further comprises:
inputting the first input feature map into the first convolution layer of the attention residual layer for convolution processing, performing activation through the first activation layer, convolution through the second convolution layer, and activation through the second activation layer to obtain a second input feature map, and inputting the second input feature map into the attention layer of the attention residual layer for subsequent processing.
Accordingly, the present invention provides an image processing apparatus, the apparatus comprising:
an acquisition module, configured to acquire a target image to be processed;
a processing module, configured to perform convolution processing on the input target image based on the convolution layer of the pre-trained hybrid attention super-resolution network model to obtain a shallow feature map; pass the shallow feature map sequentially through each of the set number of series-connected attention residual layers in the model for perception processing to obtain the deep feature map output by the last attention residual layer; add, based on the target residual addition processing layer of the model, the pixel values of pixels in corresponding rows and columns of the shallow feature map and the deep feature map to obtain the processed target feature map; and input, based on the upsampling layer in the model, the target feature map into the upsampling layer to obtain the output detail-enhanced high-resolution image.
Further, the processing module is specifically configured to: for each attention residual layer in the hybrid attention super-resolution network model, if the attention residual layer is the first attention residual layer, input the shallow feature map into it as the first input feature map, and otherwise use the output feature map of the previous attention residual layer as its first input feature map; perform, by the attention layer of the attention residual layer, perception processing on the first input feature map to obtain the target attention feature map output by the attention layer; and add, by the residual addition processing layer of the attention residual layer, the pixel values of pixels in corresponding rows and columns of the target attention feature map and the first input feature map to obtain the output feature map of the attention residual layer, until the deep feature map output by the last attention residual layer is obtained.
Further, the processing module is specifically configured to: based on the processing unit of the attention layer of the attention residual layer, input, according to the pixel value of each pixel in the first input feature map and the LBP sampling function pre-stored in the attention layer, the pixel value of each pixel into the LBP sampling function to obtain the output LBP eigenvalue matrix and save it in the attention layer; input the first input feature map into the first perceptron unit of the attention layer for perception processing to obtain the brightness-based attention feature map; pass the LBP eigenvalue matrix through the second perceptron unit of the attention layer to obtain the gradient-based attention feature map; input the two attention feature maps into the fusion layer unit of the attention layer for fusion processing to obtain the fused attention feature map; and, based on the dot-product processing unit of the attention layer, dot-multiply the fused attention feature map with the first input feature map to obtain the output target attention feature map.
Further, each attention residual layer further includes a first convolution layer, a first activation layer, a second convolution layer, and a second activation layer, and the processing module is specifically configured to input the first input feature map into the first convolution layer of the attention residual layer for convolution processing, perform activation through the first activation layer, convolution through the second convolution layer, and activation through the second activation layer to obtain the second input feature map, and input the second input feature map into the attention layer of the attention residual layer for subsequent processing.
Accordingly, the present invention provides an electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory stores a computer program which, when executed by the processor, causes the processor to carry out the steps of any one of the image processing methods described above.
Accordingly, the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any one of the image processing methods described above.
The present invention provides an image processing method, apparatus, device, and medium. A target image to be processed is acquired, and the convolution layer of a pre-trained hybrid attention super-resolution network model performs convolution processing on the input target image to obtain a shallow feature map; the shallow feature map then passes sequentially through each of a set number of series-connected attention residual layers in the model for perception processing. Because the series-connected attention residual layers construct a deeper network, the model attends more closely to the detail regions of the image, and the deeper network can accurately extract a deep feature map of the detail features. The pixel values of pixels in corresponding rows and columns of the deep feature map and the shallow feature map are added and the result is input into the upsampling layer, yielding an output detail-enhanced high-resolution image, thereby solving the problem of inaccurate feature extraction in the blurred detail regions of thermal images.
To explain the technical solutions in the embodiments of the present invention more clearly, the drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art may derive other drawings from them without creative effort.
Figure 1 is a schematic diagram of the process of an image processing method provided by an embodiment of the present invention;
Figure 2 is a schematic diagram of an image processing procedure provided by an embodiment of the present invention;
Figure 3 is a schematic diagram of the process by which the attention layer of a hybrid attention neural network model provided by an embodiment of the present invention performs perception processing on a first input feature map;
Figure 4 is a schematic diagram of the process by which an attention residual layer provided by an embodiment of the present invention extracts an output feature map;
Figure 5 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present invention;
Figure 6 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
To make the purpose, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
To solve the problem of black-and-white fringing after super-resolution reconstruction caused by inaccurate feature extraction in the blurred detail regions of thermal images, the embodiments of the present invention provide an image processing method, apparatus, device, and medium.
Embodiment 1:
Figure 1 is a schematic diagram of the process of an image processing method provided by an embodiment of the present invention; the process includes the following steps:
S101: Acquire a target image to be processed, and perform convolution processing on the input target image based on the convolution layer of a pre-trained hybrid attention super-resolution network model to obtain a shallow feature map.
To solve the problem of black-and-white fringing after super-resolution reconstruction caused by inaccurate feature extraction in the blurred detail regions of thermal images, the image processing method provided by the embodiments of the present invention is applied to an electronic device. The electronic device may be an intelligent terminal such as a host computer, a tablet, or a smartphone, or it may be a server, which may be a local server or a cloud server; the embodiments of the present invention impose no limitation on this.
The electronic device acquires the target image to be processed, i.e., the image that needs to be processed. The target image may be a thermal image, or any low-resolution image with blurred detail regions, such as an infrared image or a visible-light image. The electronic device may obtain the target image in various ways; for example, it may receive a target image sent by a device connected to it (such as a thermal imager), or retrieve a target image stored on the electronic device itself.
To extract the shallow feature map of the target image, the electronic device stores a pre-trained hybrid attention super-resolution network model for super-resolution reconstruction of low-resolution images, where the loss function of the model is L = MSE(lr, hr); L denotes the loss value, lr the low-resolution image, hr the high-resolution image, and MSE the mean squared error. Based on the convolution layer of the hybrid attention super-resolution network model, convolution processing is performed on the target image to obtain its shallow feature map. In some embodiments, the hybrid attention super-resolution network model performs super-resolution reconstruction of images based on an attention mechanism. An attention mechanism is a technique in artificial neural networks that imitates cognitive attention: it strengthens the weights of some parts of the network input while weakening others, focusing the network on the small portion of the data that matters most. It may be implemented by adding an attention function to the model structure or by introducing other structures that realise attention. In some embodiments, the input of the hybrid attention super-resolution network model includes the target image, and its output includes the detail-enhanced high-resolution image.
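A minimal sketch of this loss, assuming the intended reading is the mean squared error between the model's reconstruction of the low-resolution image and the high-resolution reference:

```python
import torch.nn as nn

mse = nn.MSELoss()

def sr_loss(model, lr_img, hr_img):
    """L = MSE(lr, hr) from the text, read here as the MSE between the
    super-resolved low-resolution input and the high-resolution reference
    (an assumption about the intended reading)."""
    sr = model(lr_img)      # detail-enhanced high-resolution prediction
    return mse(sr, hr_img)  # pixel-wise mean squared error
```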
S102: Based on the set number of series-connected attention residual layers in the hybrid attention super-resolution network model, pass the shallow feature map sequentially through each attention residual layer for perception processing to obtain the deep feature map output by the last attention residual layer; based on the target residual addition processing layer of the model, add the pixel values of pixels in corresponding rows and columns of the shallow feature map and the deep feature map to obtain a processed target feature map; based on the upsampling layer in the model, input the target feature map into the upsampling layer to obtain an output detail-enhanced high-resolution image.
In some embodiments, the structure of the hybrid attention super-resolution network model is as follows: the model includes a convolution layer, attention residual layers, a target residual addition processing layer, and an upsampling layer. The output of the convolution layer serves as the input of the attention residual layers; the outputs of the convolution layer and of the attention residual layers serve as the inputs of the target residual addition processing layer; the output of the target residual addition processing layer serves as the input of the upsampling layer; and the output of the upsampling layer is the final output of the model.
The convolution layer extracts shallow features from the target image to obtain the shallow feature map. Its input includes the target image and its output includes the shallow feature map; its model type may be a convolutional neural network (CNN) or the like.
The attention residual layers extract deep features from the target image. The input of an attention residual layer includes a first input feature map, which may be the shallow feature map output by the convolution layer or the output feature map of the previous attention residual layer; its output includes an output feature map, and the output feature map of the last attention residual layer may be called the deep feature map. Its model type may be a residual attention network (RAN) or the like.
In some embodiments, passing the shallow feature map sequentially through each of the set number of series-connected attention residual layers to obtain the deep feature map output by the last attention residual layer includes: for each attention residual layer in the model, if the attention residual layer is the first one, taking the shallow feature map output by the convolution layer as its first input feature map and inputting it into that layer; otherwise taking the output feature map of the previous attention residual layer as its first input feature map and inputting it into that layer.
The target residual addition processing layer processes the pixel values of pixels in corresponding rows and columns of at least two input images to obtain the target feature map; in some embodiments, it adds those pixel values. Its input includes the shallow feature map and the deep feature map, its output includes the target feature map, and its model type may be a residual neural network (ResNet) or the like.
The upsampling layer raises the resolution of the image. Its input includes the target feature map and its output includes the detail-enhanced high-resolution image; its model type may be a fully convolutional network (FCN), a U-Net (Convolutional Networks for Biomedical Image Segmentation), or the like.
In some embodiments, the hybrid attention super-resolution network model may be trained on a large number of labelled training samples: the labelled samples are input into the model, and training updates the model's parameters.
In some embodiments, a training sample may be a sample target image, and its label may be the actual detail-enhanced high-resolution image corresponding to that sample. The label may be obtained manually through image super-resolution techniques such as interpolation algorithms or image reconstruction; the interpolation algorithm may be nearest-neighbour, bilinear, or bicubic interpolation, and image reconstruction may use wavelet transforms or the like. In some embodiments, the convolution layer, attention residual layers, target residual addition processing layer, and upsampling layer may be trained jointly. Based on a large number of labelled training samples, an initial convolution layer, initial attention residual layers, an initial target residual addition processing layer, and an initial upsampling layer are trained: a sample target image is input into the initial convolution layer to obtain a sample shallow feature map; the sample shallow feature map is input into the initial attention residual layers to obtain a sample deep feature map; both maps are input into the initial target residual addition processing layer to obtain a sample target feature map; and the sample target feature map is input into the initial upsampling layer to obtain the detail-enhanced high-resolution image corresponding to the sample target image. A loss function is constructed from the sample target image and its corresponding detail-enhanced high-resolution image, and the parameters of all four initial layers are updated simultaneously based on the loss function until training satisfies a preset condition, yielding the trained convolution layer, attention residual layers, target residual addition processing layer, and upsampling layer. The preset condition may be that the loss function falls below a threshold, that it converges, or that the number of training epochs reaches a threshold. In some embodiments, training may proceed by various methods based on the training samples, for example by gradient descent.
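A minimal sketch of such a joint training loop, assuming the `sr_loss` helper above and using Adam as the gradient-descent variant (the optimiser choice, epoch count, and learning rate are illustrative, not from the source):

```python
import torch

def train(model, loader, epochs=100, step_size=1e-4, device="cuda"):
    """One optimiser updates the convolution, attention residual, target
    residual addition and upsampling parameters simultaneously; training
    stops after a fixed number of epochs (one of the preset conditions)."""
    model.to(device).train()
    optimiser = torch.optim.Adam(model.parameters(), lr=step_size)
    for _ in range(epochs):
        for lr_img, hr_img in loader:  # (sample target image, HR label) pairs
            lr_img, hr_img = lr_img.to(device), hr_img.to(device)
            loss = sr_loss(model, lr_img, hr_img)  # loss on the reconstruction
            optimiser.zero_grad()
            loss.backward()            # backpropagate through all four sub-layers
            optimiser.step()           # simultaneous parameter update
```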
In some embodiments, the loss function during training is L = MSE(lr, hr), where L denotes the loss value, lr the sample target image (a low-resolution image), hr the detail-enhanced high-resolution image corresponding to the sample target image, and MSE the mean squared error.
To extract the deep feature map of the target image, the hybrid attention super-resolution network model includes, after the convolution layer, a set number of series-connected attention residual layers; the set number may be 60, 62, 64, 65, or another value, preferably 64.
The electronic device inputs the shallow feature map output by the convolution layer into the first attention residual layer and passes it sequentially through each attention residual layer for perception processing, taking the output feature map of each attention residual layer as the input feature map of the next, until the deep feature map output by the last attention residual layer is obtained.
Figure 2 is a schematic diagram of an image processing procedure provided by an embodiment of the present invention. As shown in Figure 2, the target image (a low-resolution image) is input into the convolution layer of the hybrid attention neural network model, and the shallow feature map output by the convolution layer is input into the first attention residual layer. After perception processing by n attention residual layers (n may be 60, 62, 64, 65, etc.; preferably 64), the deep feature map output by the last attention residual layer is obtained. The deep feature map and the shallow feature map are input into the target residual addition processing layer, whose output target feature map is input into the upsampling layer to obtain the high-resolution image output by the upsampling layer.
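For concreteness, a hypothetical invocation of the `HybridAttentionSRNet` sketch given earlier (the 3-channel input and frame size are assumptions; a thermal frame may well be single-channel):

```python
import torch

model = HybridAttentionSRNet(channels=64, num_layers=64, scale=2)
lr_img = torch.rand(1, 3, 120, 160)   # a low-resolution frame (assumed shape)
hr_img = model(lr_img)                # -> torch.Size([1, 3, 240, 320])
```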
Based on the target residual addition processing layer of the hybrid attention super-resolution network model, the shallow feature map and the deep feature map are input into the target residual addition processing layer; according to the pixel values of the pixels at each corresponding row and column of the two maps, the pixel values of corresponding rows and columns are added to obtain the residual-added target feature map. Based on the upsampling layer of the model, the target feature map is input into the upsampling layer for upsampling, yielding the detail-enhanced high-resolution image output by the upsampling layer.
In the embodiments of the present invention, a target image to be processed is acquired; based on the convolution layer of the pre-trained hybrid attention super-resolution network model, convolution processing is performed on the input target image to obtain a shallow feature map, which is passed sequentially through each of the set number of series-connected attention residual layers in the model for perception processing. Because the series-connected attention residual layers construct a deeper network, the model attends more closely to the detail regions of the image, and the deeper network can accurately extract a deep feature map of the detail features. The pixel values of pixels in corresponding rows and columns of the deep feature map and the shallow feature map are added and the result is input into the upsampling layer to obtain the output detail-enhanced high-resolution image, thereby solving the problem of inaccurate feature extraction in the blurred detail regions of thermal images.
Embodiment 2:
In some embodiments, the hybrid attention super-resolution network model may include a set number of series-connected attention residual layers, where series connection means that the output of one attention residual layer serves as the input of the next.
For example, if the set number (the value of n above) is 64, the model includes 64 series-connected attention residual layers, denoted attention residual layer 1, attention residual layer 2, ..., attention residual layer 64. The shallow feature map output by the convolution layer is input into attention residual layer 1 (the first attention residual layer); the output of attention residual layer 1 serves as the input of attention residual layer 2; ...; the output of attention residual layer 63 serves as the input of attention residual layer 64; and the output of attention residual layer 64 is the deep feature map.
To obtain the deep feature map, on the basis of the above embodiments, in the embodiments of the present invention, passing the shallow feature map sequentially through each of the set number of series-connected attention residual layers in the hybrid attention super-resolution network model for perception processing to obtain the deep feature map output by the last attention residual layer includes:
for each attention residual layer in the hybrid attention super-resolution network model: if the attention residual layer is the first attention residual layer, inputting the shallow feature map into it as the first input feature map; if it is not the first attention residual layer, using the output feature map of the previous attention residual layer as its first input feature map; performing, by the attention layer of the attention residual layer, perception processing on the first input feature map to obtain the target attention feature map output by the attention layer; and adding, by the residual addition processing layer of the attention residual layer, the pixel values of pixels in corresponding rows and columns of the target attention feature map and the first input feature map to obtain the output feature map of the attention residual layer, until the deep feature map output by the last attention residual layer is obtained.
For each attention residual layer of the hybrid attention super-resolution network model, if the attention residual layer is the first one, the shallow feature map output by the convolution layer is input into it as the first input feature map; otherwise, the output feature map of the previous attention residual layer is obtained and used as the first input feature map of this attention residual layer.
The attention residual layer includes an attention layer and a residual addition processing layer. The first input feature map is input into the attention layer, which performs perception processing on it and outputs the target attention feature map. The target attention feature map and the first input feature map are then input into the residual addition processing layer of the attention residual layer; the pixel value of each pixel at each position in the two maps is determined, and the pixel values at corresponding rows and columns are added to obtain the output feature map of the attention residual layer. If this attention residual layer is not the last one, its output feature map serves as the input feature map of the next attention residual layer; if it is the last one, its output feature map is the deep feature map output by the last attention residual layer.
In some embodiments, an attention residual layer includes an attention layer and a residual addition processing layer. The attention layer processes the first input feature map to obtain the target attention feature map. The residual addition processing layer adds the pixel values of pixels in corresponding rows and columns of the input first input feature map and the target attention feature map to obtain the output feature map. The model type of the attention layer may be a Transformer model or the like, and that of the residual addition processing layer may be a residual neural network such as ResNet.
In some embodiments, attention residual layers may be divided into the first attention residual layer and non-first attention residual layers. The first attention residual layer is the first one to process the shallow feature map; the non-first attention residual layers are all the others.
The input of the attention layer of the first attention residual layer includes the shallow feature map (also called the first input feature map of the current attention residual layer), and its output includes the target attention feature map of the first attention residual layer. The input of the residual addition processing layer of the first attention residual layer includes the shallow feature map and the target attention feature map of the first attention residual layer, and its output includes the output feature map of the first attention residual layer.
The input of the attention layer of a non-first attention residual layer includes the output feature map of the previous attention residual layer (also called the first input feature map of the current attention residual layer), and its output includes the target attention feature map of the current attention residual layer. The input of the residual addition processing layer of a non-first attention residual layer includes the target attention feature map of the current attention residual layer and the output feature map of the previous attention residual layer (also called the first input feature map of the current attention residual layer), and its output includes the output feature map of the current attention residual layer.
In some embodiments, the output feature map of the last attention residual layer may be input, as the deep feature map, into the target residual addition processing layer, where the last attention residual layer is the one whose output feature map proceeds to the target residual addition processing layer for further processing.
In some embodiments of this specification, by arranging multiple series-connected attention residual layers in the hybrid attention super-resolution network model, deep features are further extracted from the target image on the basis of the shallow feature map extracted by the convolution layer, combined with the result produced by each attention residual layer. This lets the model attend more closely to the detail portions of the target image and thus extract detail features accurately; the residual structure constructs a deeper network that effectively extracts deep features, ultimately improving the model's enhancement of the target image's details.
Embodiment 3:
To obtain the target attention feature map, on the basis of the above embodiments, in the embodiments of the present invention, performing, by the attention layer of the attention residual layer, perception processing on the first input feature map to obtain the target attention feature map output by the attention layer includes:
inputting, based on the processing unit of the attention layer of the attention residual layer and according to the pixel value of each pixel in the first input feature map and the local binary pattern (LBP) sampling function pre-stored in the attention layer, the pixel value of each pixel into the LBP sampling function to obtain the output LBP eigenvalue matrix, which is saved in the attention layer;
inputting the first input feature map into the first perceptron unit of the attention layer for perception processing to obtain a brightness-based attention feature map; passing the LBP eigenvalue matrix through the second perceptron unit of the attention layer to obtain a gradient-based attention feature map; inputting the two attention feature maps into the fusion layer unit of the attention layer for fusion processing to obtain a fused attention feature map; and dot-multiplying, based on the dot-product processing unit of the attention layer, the fused attention feature map with the first input feature map to obtain the output target attention feature map.
To obtain the target attention feature map, after the first input feature map is input into the attention layer of the attention residual layer, it is first input into the processing unit of the attention layer. According to the pixel value of each pixel in the first input feature map and the pre-stored LBP sampling function, the pixel value of each pixel is input into the LBP sampling function to obtain the LBP eigenvalue corresponding to each pixel. According to each pixel's LBP eigenvalue and its row and column in the first input feature map, the LBP eigenvalue of each pixel is taken as the element value of the corresponding row and column of the LBP eigenvalue matrix, which is thereby obtained and saved in the attention layer.
The first input feature map is input into the first perceptron unit of the attention layer, which performs perception processing on it to obtain the brightness-based attention feature map; the LBP eigenvalue matrix is input into the second perceptron unit of the attention layer, which performs perception processing on it to obtain the gradient-based attention feature map; the first and second perceptron units are multilayer and mutually independent.
The brightness-based and gradient-based attention feature maps are input into the fusion layer unit of the attention layer and fused to obtain the fused attention feature map. The fused attention feature map and the first input feature map are input into the dot-product processing unit of the attention layer, which dot-multiplies them: according to the pixel value of each pixel at each position in the fused attention feature map and the first input feature map, the pixel value at each row and column of the fused attention feature map is multiplied by the pixel value at the corresponding row and column of the first input feature map, giving the pixel value at each row and column of the target attention feature map.
In some embodiments, the attention layer includes a processing unit, a first perceptron unit, a second perceptron unit, a fusion layer unit, and a dot-product processing unit.
The processing unit extracts eigenvalues from the first input feature map to obtain the LBP eigenvalue matrix. Its input includes the first input feature map and its output includes the LBP eigenvalue matrix. In some embodiments, the processing unit may consist of the pre-stored LBP sampling function, which receives the pixel value of each pixel in the first input feature map as input and outputs the corresponding LBP eigenvalue for each pixel; the processing unit then takes each pixel's LBP eigenvalue, according to the pixel's row and column in the first input feature map, as the element value of the corresponding row and column of the LBP eigenvalue matrix. In some embodiments, the LBP sampling function may follow any of several variants, such as the original LBP feature, the circular LBP feature, or uniform patterns.
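A minimal sketch of the original (8-neighbour) LBP variant named above, applied channel-wise to a feature map; the trailing normalisation is our own choice so that the eigenvalue matrix lives on a scale comparable to the features:

```python
import torch
import torch.nn.functional as F

def lbp_eigenvalues(x: torch.Tensor) -> torch.Tensor:
    """Compare each pixel of a (B, C, H, W) map with its 8 neighbours and
    weight the comparison bits by powers of two, yielding an LBP eigenvalue
    matrix of the same spatial size."""
    h, w = x.shape[-2:]
    padded = F.pad(x, (1, 1, 1, 1), mode="replicate")
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]  # clockwise 8-neighbourhood
    lbp = torch.zeros_like(x)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = padded[..., 1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        lbp = lbp + (neighbour >= x).float() * (2 ** bit)  # threshold at the centre pixel
    return lbp / 255.0  # normalisation choice, not from the source
```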
The first perceptron unit extracts the brightness-based attention feature map from the first input feature map. Its input includes the first input feature map and its output includes the brightness-based attention feature map; its model type may be a self-attention network such as Non-local Networks (NLNet).
The second perceptron unit extracts the gradient-based attention feature map. Its input includes the LBP eigenvalue matrix and its output includes the gradient-based attention feature map; its model type may be Gradient-weighted Class Activation Mapping (Grad-CAM) or the like.
In some embodiments, the first and second perceptron units are multilayer and mutually independent.
The fusion layer unit fuses the brightness-based and gradient-based attention feature maps. Its input includes the two attention feature maps and its output includes the fused attention feature map; its model type may be a Transformer model or the like.
The dot-product processing unit dot-multiplies the fused attention feature map with the first input feature map: according to the pixel value of each pixel at each position in the two maps, the pixel value at each row and column of the fused attention feature map is multiplied by the pixel value at the corresponding row and column of the first input feature map to give the pixel value at each row and column of the target attention feature map. Its input includes the first input feature map and the fused attention feature map, and its output includes the target attention feature map.
In some embodiments of this specification, the pixel value of each pixel in the first input feature map is input into the LBP sampling function to obtain the output LBP eigenvalue matrix, which is saved in the attention layer; the first input feature map is then input into the first perceptron unit of the attention layer to obtain the brightness-based attention feature map, the LBP eigenvalue matrix is passed through the second perceptron unit to obtain the gradient-based attention feature map, and the two maps are fused. Extracting the target attention feature map thus takes both brightness and gradient into account, making the feature information it captures more accurate and complete.
Figure 3 is a schematic diagram of the process by which the attention layer of a hybrid attention neural network model provided by an embodiment of the present invention performs perception processing on the first input feature map. As shown in Figure 3, the first input feature map passes through the first multilayer perceptron and into the fusion layer; the first input feature map also passes through the processing unit to give the LBP eigenvalue matrix, which passes through the second multilayer perceptron and into the fusion layer; and the fused feature map output by the fusion layer is dot-multiplied with the first input feature map to give the output target attention feature map.
As a possible implementation, in the embodiments of the present invention, after acquiring the target image, the electronic device may instead obtain the output LBP eigenvalue matrix from the target image and the pre-stored LBP sampling function, and save it in the attention layer of every attention residual layer.
The process by which the attention layer of the present invention performs perception processing on the first input feature map is illustrated below with a concrete example. From the target image I and the LBP sampling function Fs, the pixel value of each pixel of the target image is input into the LBP sampling function in the processing unit and assembled into the LBP eigenvalue matrix FeaLbp, where FeaLbp = Fs(I). The input first input feature map and the LBP eigenvalue matrix are fed into two independent multilayer perceptrons (the first and second perceptron units), which output the brightness-based attention feature map AttLuma and the gradient-based attention feature map AttLbp, where AttLuma = MLPLuma(FeaLuma) and AttLbp = MLPLbpa(FeaLbp); MLPLuma and MLPLbpa denote the two independent perceptrons and FeaLuma denotes the first input feature map.
The two attention feature maps are fused in the fusion layer to give the fused attention feature map AttFinal, where AttFinal = FFus(AttLbp + AttLuma) and FFus denotes the fusion layer; the fused attention feature map is dot-multiplied with the first input feature map to give the target attention feature map with attention added.
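Collecting the formulas of this example in one place (the last line, for the dot-product output, is stated only in prose in the original, so its equation here is a reconstruction):

```latex
\begin{aligned}
Fea_{Lbp}   &= F_{s}(I) \\
Att_{Luma}  &= MLP_{Luma}(Fea_{Luma}) \\
Att_{Lbp}   &= MLP_{Lbpa}(Fea_{Lbp}) \\
Att_{Final} &= F_{Fus}(Att_{Lbp} + Att_{Luma}) \\
Fea_{Out}   &= Att_{Final} \odot Fea_{Luma}
\end{aligned}
```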
Embodiment 4:
To improve the accuracy of feature extraction, on the basis of the above embodiments, in the embodiments of the present invention each attention residual layer further includes a first convolution layer, a first activation layer, a second convolution layer, and a second activation layer, and before the attention layer of the attention residual layer performs perception processing on the first input feature map to obtain the target attention feature map output by the attention layer, the method further includes:
inputting the first input feature map into the first convolution layer of the attention residual layer for convolution processing, performing activation through the first activation layer, convolution through the second convolution layer, and activation through the second activation layer to obtain the second input feature map, and inputting the second input feature map into the attention layer of the attention residual layer for subsequent processing.
To improve the accuracy of feature extraction, each attention residual layer may further include a first convolution layer, a first activation layer, a second convolution layer, and a second activation layer. After the first input feature map is input into the attention residual layer, it undergoes convolution in the first convolution layer, activation in the first activation layer, convolution in the second convolution layer, and activation in the second activation layer, producing the second input feature map output by the second activation layer.
The second input feature map is input into the attention layer of the attention residual layer for perception processing to obtain the target attention feature map output by the attention layer; based on the residual addition processing layer of the attention residual layer, the pixel values of pixels in corresponding rows and columns of the target attention feature map and the first input feature map are added to obtain the output feature map of the attention residual layer.
The first convolution layer performs convolution on the first input feature map; its input includes the first input feature map and its output includes a first convolution feature map; its model type may be a CNN or the like.
The first activation layer activates the first convolution feature map; its input includes the first convolution feature map and its output includes a first activation feature map. It may be implemented with various activation functions, such as the Sigmoid, hyperbolic tangent, or ReLU activation function.
The second convolution layer performs convolution on the first activation feature map; its input includes the first activation feature map and its output includes a second convolution feature map; its model type may be a CNN or the like.
The second activation layer activates the second convolution feature map; its input includes the second convolution feature map and its output includes the second input feature map. It may likewise be implemented with various activation functions, such as the Sigmoid, hyperbolic tangent, or ReLU activation function.
In some embodiments of this specification, the second input feature map, extracted by passing the first input feature map through two convolution-activation pairs, is delivered to the attention layer, where attention is computed and applied; adding attention lets the target attention feature map and the output feature map focus more on the details in the image, improving the capability for detail enhancement.
Figure 4 is a schematic diagram of the process by which an attention residual layer provided by an embodiment of the present invention extracts an output feature map. As shown in Figure 4, the first input feature map passes through a convolution layer, an activation layer, a convolution layer, and an activation layer to give the second input feature map, which is input into the attention layer to give the target attention feature map; the target attention feature map and the first input feature map undergo residual addition to give the output feature map.
Embodiment 5:
Figure 5 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present invention. As shown in Figure 5, the apparatus includes:
an acquisition module 501, configured to acquire a target image to be processed;
a processing module 502, configured to perform convolution processing on the input target image based on the convolution layer of the pre-trained hybrid attention super-resolution network model to obtain a shallow feature map; pass the shallow feature map sequentially through each of the set number of series-connected attention residual layers in the model for perception processing to obtain the deep feature map output by the last attention residual layer; add, based on the target residual addition processing layer of the model, the pixel values of pixels in corresponding rows and columns of the shallow feature map and the deep feature map to obtain the processed target feature map; and input, based on the upsampling layer in the model, the target feature map into the upsampling layer to obtain the output detail-enhanced high-resolution image.
Further, the processing module 502 is specifically configured to: for each attention residual layer in the hybrid attention super-resolution network model, if the attention residual layer is the first attention residual layer, input the shallow feature map into it as the first input feature map, and otherwise use the output feature map of the previous attention residual layer as its first input feature map; perform, by the attention layer of the attention residual layer, perception processing on the first input feature map to obtain the target attention feature map output by the attention layer; and add, by the residual addition processing layer of the attention residual layer, the pixel values of pixels in corresponding rows and columns of the target attention feature map and the first input feature map to obtain the output feature map of the attention residual layer, until the deep feature map output by the last attention residual layer is obtained.
Further, the processing module 502 is specifically configured to: based on the processing unit of the attention layer of the attention residual layer, input, according to the pixel value of each pixel in the first input feature map and the LBP sampling function pre-stored in the attention layer, the pixel value of each pixel into the LBP sampling function to obtain the output LBP eigenvalue matrix and save it in the attention layer; input the first input feature map into the first perceptron unit of the attention layer for perception processing to obtain the brightness-based attention feature map; pass the LBP eigenvalue matrix through the second perceptron unit of the attention layer to obtain the gradient-based attention feature map; input the two attention feature maps into the fusion layer unit of the attention layer for fusion processing to obtain the fused attention feature map; and, based on the dot-product processing unit of the attention layer, dot-multiply the fused attention feature map with the first input feature map to obtain the output target attention feature map.
Further, each attention residual layer further includes a first convolution layer, a first activation layer, a second convolution layer, and a second activation layer, and the processing module 502 is specifically configured to input the first input feature map into the first convolution layer of the attention residual layer for convolution processing, perform activation through the first activation layer, convolution through the second convolution layer, and activation through the second activation layer to obtain the second input feature map, and input the second input feature map into the attention layer of the attention residual layer for subsequent processing.
Embodiment 6:
Figure 6 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention. On the basis of the above embodiments, the present application further provides an electronic device, as shown in Figure 6, including a processor 601, a communication interface 602, a memory 603, and a communication bus 604, wherein the processor 601, the communication interface 602, and the memory 603 communicate with one another through the communication bus 604.
The memory 603 stores a computer program which, when executed by the processor 601, causes the processor 601 to perform the following steps:
acquiring a target image to be processed, and performing convolution processing on the input target image based on the convolution layer of the pre-trained hybrid attention super-resolution network model to obtain a shallow feature map;
passing the shallow feature map sequentially through each of the set number of series-connected attention residual layers in the hybrid attention super-resolution network model for perception processing to obtain the deep feature map output by the last attention residual layer; adding, based on the target residual addition processing layer of the model, the pixel values of pixels in corresponding rows and columns of the shallow feature map and the deep feature map to obtain the processed target feature map; and inputting, based on the upsampling layer in the model, the target feature map into the upsampling layer to obtain the output detail-enhanced high-resolution image.
Further, passing the shallow feature map sequentially through each of the set number of series-connected attention residual layers in the hybrid attention super-resolution network model for perception processing to obtain the deep feature map output by the last attention residual layer includes:
for each attention residual layer in the hybrid attention super-resolution network model: if the attention residual layer is the first attention residual layer, inputting the shallow feature map into it as the first input feature map; if it is not the first attention residual layer, using the output feature map of the previous attention residual layer as its first input feature map; performing, by the attention layer of the attention residual layer, perception processing on the first input feature map to obtain the target attention feature map output by the attention layer; and adding, by the residual addition processing layer of the attention residual layer, the pixel values of pixels in corresponding rows and columns of the target attention feature map and the first input feature map to obtain the output feature map of the attention residual layer, until the deep feature map output by the last attention residual layer is obtained.
Further, performing, by the attention layer of the attention residual layer, perception processing on the first input feature map to obtain the target attention feature map output by the attention layer includes:
inputting, based on the processing unit of the attention layer of the attention residual layer and according to the pixel value of each pixel in the first input feature map and the LBP sampling function pre-stored in the attention layer, the pixel value of each pixel into the LBP sampling function to obtain the output LBP eigenvalue matrix, which is saved in the attention layer;
inputting the first input feature map into the first perceptron unit of the attention layer for perception processing to obtain the brightness-based attention feature map; passing the LBP eigenvalue matrix through the second perceptron unit of the attention layer to obtain the gradient-based attention feature map; inputting the two attention feature maps into the fusion layer unit of the attention layer for fusion processing to obtain the fused attention feature map; and dot-multiplying, based on the dot-product processing unit of the attention layer, the fused attention feature map with the first input feature map to obtain the output target attention feature map.
Further, each attention residual layer further includes a first convolution layer, a first activation layer, a second convolution layer, and a second activation layer, and before the attention layer of the attention residual layer performs perception processing on the first input feature map to obtain the target attention feature map output by the attention layer, the method further includes:
inputting the first input feature map into the first convolution layer of the attention residual layer for convolution processing, performing activation through the first activation layer, convolution through the second convolution layer, and activation through the second activation layer to obtain the second input feature map, and inputting the second input feature map into the attention layer of the attention residual layer for subsequent processing.
The communication bus mentioned in connection with the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is drawn in the figure, but this does not mean there is only one bus or one type of bus.
The communication interface 602 is used for communication between the above electronic device and other devices.
The memory may include random access memory (RAM) or non-volatile memory (NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit, a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
Embodiment 7:
On the basis of the above embodiments, the present application further provides a computer-readable storage medium storing a computer program executable by a processor; when the program runs on the processor, it causes the processor to perform the following steps:
acquiring a target image to be processed, and performing convolution processing on the input target image based on the convolution layer of the pre-trained hybrid attention super-resolution network model to obtain a shallow feature map;
passing the shallow feature map sequentially through each of the set number of series-connected attention residual layers in the hybrid attention super-resolution network model for perception processing to obtain the deep feature map output by the last attention residual layer; adding, based on the target residual addition processing layer of the model, the pixel values of pixels in corresponding rows and columns of the shallow feature map and the deep feature map to obtain the processed target feature map; and inputting, based on the upsampling layer in the model, the target feature map into the upsampling layer to obtain the output detail-enhanced high-resolution image.
Further, passing the shallow feature map sequentially through each of the set number of series-connected attention residual layers in the hybrid attention super-resolution network model for perception processing to obtain the deep feature map output by the last attention residual layer includes:
for each attention residual layer in the hybrid attention super-resolution network model: if the attention residual layer is the first attention residual layer, inputting the shallow feature map into it as the first input feature map; if it is not the first attention residual layer, using the output feature map of the previous attention residual layer as its first input feature map; performing, by the attention layer of the attention residual layer, perception processing on the first input feature map to obtain the target attention feature map output by the attention layer; and adding, by the residual addition processing layer of the attention residual layer, the pixel values of pixels in corresponding rows and columns of the target attention feature map and the first input feature map to obtain the output feature map of the attention residual layer, until the deep feature map output by the last attention residual layer is obtained.
Further, performing, by the attention layer of the attention residual layer, perception processing on the first input feature map to obtain the target attention feature map output by the attention layer includes:
inputting, based on the processing unit of the attention layer of the attention residual layer and according to the pixel value of each pixel in the first input feature map and the LBP sampling function pre-stored in the attention layer, the pixel value of each pixel into the LBP sampling function to obtain the output LBP eigenvalue matrix, which is saved in the attention layer;
inputting the first input feature map into the first perceptron unit of the attention layer for perception processing to obtain the brightness-based attention feature map; passing the LBP eigenvalue matrix through the second perceptron unit of the attention layer to obtain the gradient-based attention feature map; inputting the two attention feature maps into the fusion layer unit of the attention layer for fusion processing to obtain the fused attention feature map; and dot-multiplying, based on the dot-product processing unit of the attention layer, the fused attention feature map with the first input feature map to obtain the output target attention feature map.
Further, each attention residual layer further includes a first convolution layer, a first activation layer, a second convolution layer, and a second activation layer, and before the attention layer of the attention residual layer performs perception processing on the first input feature map to obtain the target attention feature map output by the attention layer, the method further includes:
inputting the first input feature map into the first convolution layer of the attention residual layer for convolution processing, performing activation through the first activation layer, convolution through the second convolution layer, and activation through the second activation layer to obtain the second input feature map, and inputting the second input feature map into the attention layer of the attention residual layer for subsequent processing.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the present application. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are executed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
Obviously, those skilled in the art can make various changes and modifications to the present application without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the present application and their technical equivalents, the present application is also intended to include them.
Claims (10)
- An image processing method, characterized in that the method comprises: acquiring a target image to be processed, and performing convolution processing on the input target image based on a convolution layer of a pre-trained hybrid attention super-resolution network model to obtain a shallow feature map; passing the shallow feature map sequentially through each of a set number of series-connected attention residual layers in the hybrid attention super-resolution network model for perception processing to obtain a deep feature map output by the last attention residual layer; adding, based on a target residual addition processing layer of the hybrid attention super-resolution network model, the pixel values of pixels in corresponding rows and columns of the shallow feature map and the deep feature map to obtain a processed target feature map; and inputting, based on an upsampling layer in the hybrid attention super-resolution network model, the target feature map into the upsampling layer to obtain an output detail-enhanced high-resolution image.
- The method according to claim 1, characterized in that passing the shallow feature map sequentially through each of the set number of series-connected attention residual layers in the hybrid attention super-resolution network model for perception processing to obtain the deep feature map output by the last attention residual layer comprises: for each attention residual layer in the hybrid attention super-resolution network model, if the attention residual layer is the first attention residual layer, inputting the shallow feature map into it as a first input feature map; if it is not the first attention residual layer, using the output feature map of the previous attention residual layer as its first input feature map; performing, by the attention layer of the attention residual layer, perception processing on the first input feature map to obtain a target attention feature map output by the attention layer; and adding, by the residual addition processing layer of the attention residual layer, the pixel values of pixels in corresponding rows and columns of the target attention feature map and the first input feature map to obtain the output feature map of the attention residual layer, until the deep feature map output by the last attention residual layer is obtained.
- The method according to claim 2, characterized in that performing, by the attention layer of the attention residual layer, perception processing on the first input feature map to obtain the target attention feature map output by the attention layer comprises: inputting, based on a processing unit of the attention layer of the attention residual layer and according to the pixel value of each pixel in the first input feature map and a local binary pattern (LBP) sampling function pre-stored in the attention layer, the pixel value of each pixel into the LBP sampling function to obtain an output LBP eigenvalue matrix, which is saved in the attention layer; inputting the first input feature map into a first perceptron unit of the attention layer for perception processing to obtain a brightness-based attention feature map; passing the LBP eigenvalue matrix through a second perceptron unit of the attention layer to obtain a gradient-based attention feature map; inputting the two attention feature maps into a fusion layer unit of the attention layer for fusion processing to obtain a fused attention feature map; and dot-multiplying, based on a dot-product processing unit of the attention layer, the fused attention feature map with the first input feature map to obtain the output target attention feature map.
- The method according to claim 2, characterized in that each attention residual layer further includes a first convolution layer, a first activation layer, a second convolution layer, and a second activation layer, and that before the attention layer of the attention residual layer performs perception processing on the first input feature map to obtain the target attention feature map output by the attention layer, the method further comprises: inputting the first input feature map into the first convolution layer of the attention residual layer for convolution processing, performing activation through the first activation layer, convolution through the second convolution layer, and activation through the second activation layer to obtain a second input feature map, and inputting the second input feature map into the attention layer of the attention residual layer for subsequent processing.
- An image processing apparatus, characterized in that the apparatus comprises: an acquisition module, configured to acquire a target image to be processed; and a processing module, configured to perform convolution processing on the input target image based on a convolution layer of a pre-trained hybrid attention super-resolution network model to obtain a shallow feature map; pass the shallow feature map sequentially through each of a set number of series-connected attention residual layers in the hybrid attention super-resolution network model for perception processing to obtain a deep feature map output by the last attention residual layer; add, based on a target residual addition processing layer of the model, the pixel values of pixels in corresponding rows and columns of the shallow feature map and the deep feature map to obtain a processed target feature map; and input, based on an upsampling layer in the model, the target feature map into the upsampling layer to obtain an output detail-enhanced high-resolution image.
- The apparatus according to claim 5, characterized in that the processing module is specifically configured to: for each attention residual layer in the hybrid attention super-resolution network model, if the attention residual layer is the first attention residual layer, input the shallow feature map into it as the first input feature map, and otherwise use the output feature map of the previous attention residual layer as its first input feature map; perform, by the attention layer of the attention residual layer, perception processing on the first input feature map to obtain the target attention feature map output by the attention layer; and add, by the residual addition processing layer of the attention residual layer, the pixel values of pixels in corresponding rows and columns of the target attention feature map and the first input feature map to obtain the output feature map of the attention residual layer, until the deep feature map output by the last attention residual layer is obtained.
- The apparatus according to claim 6, characterized in that the processing module is specifically configured to: based on the processing unit of the attention layer of the attention residual layer, input, according to the pixel value of each pixel in the first input feature map and the LBP sampling function pre-stored in the attention layer, the pixel value of each pixel into the LBP sampling function to obtain the output LBP eigenvalue matrix and save it in the attention layer; input the first input feature map into the first perceptron unit of the attention layer for perception processing to obtain the brightness-based attention feature map; pass the LBP eigenvalue matrix through the second perceptron unit of the attention layer to obtain the gradient-based attention feature map; input the two attention feature maps into the fusion layer unit of the attention layer for fusion processing to obtain the fused attention feature map; and, based on the dot-product processing unit of the attention layer, dot-multiply the fused attention feature map with the first input feature map to obtain the output target attention feature map.
- The apparatus according to claim 6, characterized in that each attention residual layer further includes a first convolution layer, a first activation layer, a second convolution layer, and a second activation layer, and that the processing module is specifically configured to input the first input feature map into the first convolution layer of the attention residual layer for convolution processing, perform activation through the first activation layer, convolution through the second convolution layer, and activation through the second activation layer to obtain the second input feature map, and input the second input feature map into the attention layer of the attention residual layer for subsequent processing.
- An electronic device, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus; the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the image processing method according to any one of claims 1-4.
- A computer-readable storage medium, characterized in that it stores a computer program executable by a processor, and when the program runs on the processor, it causes the processor to perform the steps of the image processing method according to any one of claims 1-4.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310090705.6A CN116051378A (zh) | 2023-01-17 | 2023-01-17 | 一种图像处理方法、装置、设备和介质 |
CN202310090705.6 | 2023-01-17 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024153156A1 true WO2024153156A1 (zh) | 2024-07-25 |
Family
ID=86129443
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2024/072880 WO2024153156A1 (zh) | 2023-01-17 | 2024-01-17 | 一种图像处理方法、装置、设备和介质 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN116051378A (zh) |
WO (1) | WO2024153156A1 (zh) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116051378A (zh) * | 2023-01-17 | 2023-05-02 | 浙江华感科技有限公司 | 一种图像处理方法、装置、设备和介质 |
-
2023
- 2023-01-17 CN CN202310090705.6A patent/CN116051378A/zh active Pending
-
2024
- 2024-01-17 WO PCT/CN2024/072880 patent/WO2024153156A1/zh unknown
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111429355A (zh) * | 2020-03-30 | 2020-07-17 | 新疆大学 | 一种基于生成对抗网络的图像超分辨率重建方法 |
US20220201242A1 (en) * | 2020-05-29 | 2022-06-23 | Boe Technology Group Co., Ltd. | Method, device and computer readable storage medium for video frame interpolation |
CN115496651A (zh) * | 2021-06-02 | 2022-12-20 | 武汉Tcl集团工业研究院有限公司 | 特征处理方法、装置、计算机可读存储介质及电子设备 |
CN113592718A (zh) * | 2021-08-12 | 2021-11-02 | 中国矿业大学 | 基于多尺度残差网络的矿井图像超分辨率重建方法及系统 |
CN114140353A (zh) * | 2021-11-25 | 2022-03-04 | 苏州大学 | 一种基于通道注意力的Swin-Transformer图像去噪方法及系统 |
CN115393186A (zh) * | 2022-07-22 | 2022-11-25 | 武汉工程大学 | 一种人脸图像超分辨率重建方法、系统、设备及介质 |
CN116051378A (zh) * | 2023-01-17 | 2023-05-02 | 浙江华感科技有限公司 | 一种图像处理方法、装置、设备和介质 |
Also Published As
Publication number | Publication date |
---|---|
CN116051378A (zh) | 2023-05-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20210183022A1 (en) | Image inpainting method and apparatus, computer device, and storage medium | |
CN112446383B (zh) | 车牌识别方法及装置、存储介质、终端 | |
EP3923233A1 (en) | Image denoising method and apparatus | |
WO2024153156A1 (zh) | 一种图像处理方法、装置、设备和介质 | |
CN110598714A (zh) | 一种软骨图像分割方法、装置、可读存储介质及终端设备 | |
CN111476719A (zh) | 图像处理方法、装置、计算机设备及存储介质 | |
CN110781980B (zh) | 目标检测模型的训练方法、目标检测方法及装置 | |
US11244426B2 (en) | Method for image super resolution imitating optical zoom implemented on a resource-constrained mobile device, and a mobile device implementing the same | |
CN111986202B (zh) | 青光眼辅助诊断装置、方法及存储介质 | |
CN111091604B (zh) | 快速成像模型的训练方法、装置及服务器 | |
CN115761258A (zh) | 一种基于多尺度融合与注意力机制的图像方向预测方法 | |
CN112700460A (zh) | 图像分割方法及系统 | |
CN112418243A (zh) | 特征提取方法、装置及电子设备 | |
CN110717864B (zh) | 一种图像增强方法、装置、终端设备及计算机可读介质 | |
CN113744280B (zh) | 图像处理方法、装置、设备及介质 | |
Li et al. | Underwater Imaging Formation Model‐Embedded Multiscale Deep Neural Network for Underwater Image Enhancement | |
CN112132753B (zh) | 多尺度结构引导图像的红外图像超分辨率方法及系统 | |
CN116704206B (zh) | 图像处理方法、装置、计算机设备和存储介质 | |
CN111899263B (zh) | 图像分割方法、装置、计算机设备及存储介质 | |
CN115937358A (zh) | 图像处理方法及其装置、电子设备和存储介质 | |
CN111524072B (zh) | 超分辨重构网络训练方法和装置、电子设备及存储介质 | |
CN112365515A (zh) | 一种基于密集感知网络的边缘检测方法、装置及设备 | |
US10832076B2 (en) | Method and image processing entity for applying a convolutional neural network to an image | |
CN111126568A (zh) | 图像处理方法及装置、电子设备及计算机可读存储介质 | |
CN114648468B (zh) | 图像处理方法、装置、终端设备及计算机可读存储介质 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 24744330 Country of ref document: EP Kind code of ref document: A1 |