WO2021155675A1 - Image processing method, apparatus, computer-readable storage medium and computer device - Google Patents

Image processing method, apparatus, computer-readable storage medium and computer device

Info

Publication number
WO2021155675A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature map
image
codec
processing unit
size
Prior art date
Application number
PCT/CN2020/122160
Other languages
English (en)
French (fr)
Inventor
高宏运
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2021155675A1 publication Critical patent/WO2021155675A1/zh
Priority to US17/727,042 priority Critical patent/US20220253974A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038Image mosaicing, e.g. composing plane images from plane sub-images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4046Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks

Definitions

  • This application relates to the field of computer technology, in particular to an image processing method, device, computer-readable storage medium, and computer equipment.
  • Image restoration is a common problem in daily life, and its goal is to restore images that are irreversibly and complexly damaged during the imaging process.
  • When the user is in a dark or dynamic scene, the captured picture usually has varying degrees of noise or blur.
  • the image restoration algorithm can be used to reconstruct the detailed information lost due to blur or noise.
  • traditional image restoration methods can only address one specific type of problem. For example, for image deblurring, traditional image restoration methods can only remove image blur caused by a specific translational or rotational movement. Image denoising has the same limitation.
  • Traditional image restoration methods are aimed at a certain type of specific noise, such as removing Gaussian noise, Poisson noise, etc. to achieve image feature reconstruction.
  • However, the imaging scene of a real degraded image is very complicated and includes problems such as unclear images caused by camera movement, movement of objects in the scene, and varying degrees of noise.
  • As a result, traditional image restoration methods have a poor processing effect on images with such noise points and blurred points, and it is difficult to obtain sharp images.
  • An image processing method executed by a computer device, including:
  • acquiring an image to be processed, and scaling the image to be processed from a first size to a second size; performing feature reconstruction on the to-be-processed image of the second size through a first codec to obtain a first feature map; enlarging the first feature map to the first size, and performing splicing processing on the to-be-processed image of the first size and the first feature map of the first size; and performing feature reconstruction on the spliced feature map through a target codec to obtain a target image, where the definition of the target image is higher than the definition of the image to be processed; the target codec includes a first preset number of paired codecs, and the first codec is at least one pair of the first preset number of paired codecs.
  • An image processing device comprising:
  • An acquiring module configured to acquire an image to be processed, and scale the image to be processed from a first size to a second size
  • the first reconstruction module is configured to perform feature reconstruction on the to-be-processed image of the second size through the first codec to obtain the first feature map;
  • a splicing module configured to enlarge the first feature map to a first size, and perform splicing processing on the image to be processed of the first size and the first feature map of the first size;
  • the target reconstruction module is configured to perform feature reconstruction on the spliced feature maps through a target codec to obtain a target image, where the definition of the target image is higher than the definition of the image to be processed; the target codec includes a first preset number of paired codecs, and the first codec is at least one pair of the first preset number of paired codecs.
  • a computer-readable storage medium that stores a computer program, and when the computer program is executed by a processor, the processor executes the following steps:
  • performing feature reconstruction on the spliced feature map through a target codec to obtain a target image, where the definition of the target image is higher than the definition of the image to be processed; the target codec includes a first preset number of paired codecs, and the first codec is at least one pair of the first preset number of paired codecs.
  • a computer device includes a memory and a processor, the memory stores a computer program, and when the computer program is executed by the processor, the processor executes the following steps:
  • performing feature reconstruction on the spliced feature map through a target codec to obtain a target image, where the definition of the target image is higher than the definition of the image to be processed; the target codec includes a first preset number of paired codecs, and the first codec is at least one pair of the first preset number of paired codecs.
  • Fig. 1 is an application environment diagram of an image processing method in an embodiment
  • FIG. 2 is a schematic flowchart of an image processing method in an embodiment
  • Fig. 3 is a schematic flow chart of the steps of reconstructing the mid-scale features of the image to be processed in an embodiment
  • Figure 4 is a diagram of the internal structure of an encoder and a decoder in an embodiment
  • Figure 5 is a diagram of the internal structure of an encoder and a decoder in another embodiment
  • Fig. 6 is a schematic flow chart of the steps in which the processing unit of the first codec outputs a feature map in an embodiment
  • Figure 7 is an internal structure diagram of a processing unit according to an embodiment
  • FIG. 8 is a schematic flowchart of the steps of outputting a feature map by a processing unit of the second codec in an embodiment
  • Figure 9(a) is a network structure diagram of an image processing method in an embodiment
  • Figure 9(b) is a network structure diagram of an image processing method in another embodiment
  • Figure 10(a) shows the deblurring results of the image processing method of this solution and of multiple traditional image processing methods in an embodiment;
  • Figure 10(b) shows the denoising results of the image processing method of this solution and of multiple traditional image processing methods in an embodiment
  • FIG. 11 is a schematic flowchart of an image processing method applied to video processing in an embodiment
  • Figure 12 is a structural block diagram of an image processing device in another embodiment
  • Fig. 13 is a structural block diagram of a computer device in an embodiment.
  • Fig. 1 is an application environment diagram of an image processing method in an embodiment. As shown in FIG. 1, the image processing method is applied to an image processing system.
  • the image processing system includes a terminal 110 and a server 120.
  • the terminal 110 and the server 120 are connected through a network.
  • the terminal 110 may specifically be a desktop terminal or a mobile terminal, and the mobile terminal may specifically be at least one of a mobile phone, a tablet computer, and a notebook computer.
  • the server 120 may be implemented as an independent server or a server cluster composed of multiple servers.
  • the terminal 110 can independently implement the image processing method.
  • the terminal 110 can also implement the image processing method through interaction with the server 120.
  • the terminal 110 may obtain the image to be processed, and send the image to be processed to the server 120.
  • the server 120 receives the image to be processed, and scales the image to be processed from the first size to the second size.
  • the server 120 performs feature reconstruction on the to-be-processed image of the second size through the first codec to obtain the first feature map.
  • the server 120 enlarges the first feature map to a first size, and performs stitching processing on the image to be processed of the first size and the first feature map of the first size.
  • the server 120 performs feature reconstruction on the spliced feature maps through the target codec to obtain a target image.
  • the definition of the target image is higher than the definition of the image to be processed; the target codec includes a first preset number of paired codecs, and the first codec is at least one pair of the first preset number of paired codecs.
  • the server 120 returns the target image to the terminal 110.
  • the terminal obtains a face image with a size of 300*400 in which the face area is blurred, and performs feature extraction and feature reconstruction on the face image through the image processing method in order to obtain a face image with a clear face area.
  • the terminal scales the 300*400 face image by 1/4 size to obtain a 75*100 face image.
  • the 75*100 face image shows the coarse-scale features of the face area.
  • the terminal reconstructs the 75*100 face image features through a pair of codecs to obtain a 75*100 feature map, thereby reconstructing the coarse-scale features of the face region in the face image.
  • the terminal enlarges the 75*100 feature map twice in size to obtain a 150*200 feature map.
  • the 300*400 face image is scaled to 1/2 size to obtain a 150*200 face image.
  • the 150*200 face image and the 150*200 feature map show the mid-scale features of the face area.
  • the terminal stitches the 150*200 feature map and the 150*200 face image.
  • the stitched feature maps are reconstructed to obtain a reconstructed feature map of 150*200, thereby reconstructing the mid-scale features of the face region in the face image.
  • One of the two pairs of codecs used in the mid-scale feature reconstruction is the same as the codec used in the coarse-scale feature reconstruction.
  • a face image with a size of 300*400 displays the fine-scale features of the face area.
  • the terminal enlarges the 150*200 feature map twice in size to obtain a 300*400 feature map.
  • the feature map with a size of 300*400 also shows the fine-scale features of the face area.
  • the terminal stitches the 300*400 feature map and the 300*400 face image.
  • Three pairs of codecs are used to perform feature reconstruction on the stitched feature maps to obtain a 300*400 target image, thereby reconstructing the fine-scale features of the face region in the face image.
  • Two of the three pairs of codecs are the same as the two pairs of codecs used in the mid-scale feature reconstruction.
  • In this way, the blurred face region is gradually sharpened. By increasing the number of codecs as the feature scale becomes finer, the difficulty of restoring the fine feature scales is reduced and the accuracy of feature reconstruction is ensured, thereby obtaining a clear face image.
  • an image processing method is provided.
  • the method is mainly applied to the terminal 110 (or server 120) in FIG. 1 as an example.
  • the image processing method specifically includes the following steps:
  • Step 202 Obtain an image to be processed, and scale the image to be processed from a first size to a second size.
  • the image to be processed refers to an image with low definition, such as a blurred image or an image with noise.
  • the first size is the original size of the acquired image to be processed
  • the second size is the size of the image to be processed after scaling
  • the second size can be, for example, 1/4, 1/5, or 1/6 of the first size of the image to be processed, and so on.
  • After the image is scaled down, it shows fewer detailed features and more coarse-scale features, and the degree of blur in the image is significantly reduced.
  • the difficulty of reconstructing the features of the blurred image of small size will be lower than that of the original size.
  • the terminal can acquire the image to be processed, and scale the image to be processed to the second size, so that the image shows coarse-scale features, so as to first reconstruct the fuzzy regions existing in the coarse-scale features.
  • the terminal obtains the image to be processed, determines the original size of the obtained image to be processed, and uses the original size as the first size. For example, if the size of the image to be processed obtained by the terminal is 330*400, then 330*400 is used as the first size of the image to be processed. The size of the image to be processed obtained by the terminal is 450*600, and 450*600 is used as the first size of the image to be processed. Then, the terminal scales the to-be-processed image of the first size to the second size to obtain the to-be-processed image of the second size.
  • Step 204 Perform feature reconstruction on the to-be-processed image of the second size through the first codec to obtain a first feature map.
  • the first codec refers to a pair of encoder and decoder; that is, the first codec includes an encoder and a decoder corresponding to that encoder.
  • the first codec includes paired codecs, and the number of pairs can be set according to requirements, for example, 1 pair, 2 pairs, and so on.
  • the terminal may input the to-be-processed image of the second size into a first network
  • the first network is a network that reconstructs the coarse-scale features of the image.
  • the first network includes a first codec, and the number of the first codec can be set according to requirements.
  • the encoder in the first codec in the first network performs feature extraction on the image to be processed of the second size, and the encoder inputs the extracted feature map into the decoder corresponding to the encoder for decoding, so as to obtain the feature map output by the decoder.
  • the feature map output by the decoder is the first feature map.
  • the to-be-processed image of the second size is encoded by the first encoder, and the feature map output by the previous encoder is encoded by the next encoder, until the feature map output by the last encoder in the first codec is obtained.
  • the feature map output by the last encoder is input to the corresponding decoder, and the feature map is decoded by the decoder to obtain the feature map output by the last decoder.
  • the feature map output by the last decoder is the first feature map.
  • the second size may be 1/4 of the first size
  • the first codec obtains an intermediate feature map after performing feature reconstruction on the 1/4-size to-be-processed image.
  • the size of the intermediate feature map is also 1/4 of the first size.
  • the terminal enlarges the 1/4-size intermediate feature map twice to obtain a first feature map, the size of the first feature map is 1/2 of the first size.
  • Step 206 Enlarge the first feature map to a first size, and perform stitching processing on the image to be processed in the first size and the first feature map of the first size.
  • the splicing processing refers to the matrix splicing of images or the parallel connection of feature maps in the channel dimension.
  • the terminal may enlarge the first feature map to the same size as the acquired image to be processed, that is, the first size, so as to first reconstruct the fuzzy region existing in the fine-scale feature. Then, the terminal performs stitching processing on the to-be-processed image of the first size and the first feature map of the first size.
  • the terminal may determine the matrix corresponding to the image to be processed of the first size, and the matrix corresponding to the first feature map of the first size, and stitch the two matrices.
  • the terminal may determine the channel dimension of the image to be processed of the first size and the channel dimension of the first feature map of the first size, and connect the image to be processed of the first size and the first feature map of the first size in parallel according to the channel dimensions.
  • For example, the terminal connects the R, G, and B channels of the to-be-processed image of the first size and the R, G, and B channels of the first feature map of the first size in parallel to obtain a feature map with 6 channels.
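  • As an illustrative sketch only (not part of the patent text), the parallel connection in the channel dimension described above can be expressed as a tensor concatenation; the tensor names and sizes below are hypothetical:

```python
import torch

# Hypothetical tensors: an RGB image to be processed and an RGB first feature
# map, both already at the first size, in (batch, channels, height, width) layout.
image_first_size = torch.randn(1, 3, 400, 300)
feature_map_first_size = torch.randn(1, 3, 400, 300)

# Parallel connection in the channel dimension: the R, G, B channels of both
# tensors are concatenated, yielding a feature map with 6 channels.
spliced = torch.cat([image_first_size, feature_map_first_size], dim=1)
print(spliced.shape)  # torch.Size([1, 6, 400, 300])
```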
  • Step 208 Perform feature reconstruction on the spliced feature maps through the target codec to obtain a target image.
  • the definition of the target image is higher than that of the image to be processed; the target codec includes a first preset number of paired codecs, and the first codec is at least one pair of the first preset number of paired codecs.
  • the target codec refers to a pair of encoders and decoders, and the first codec is a paired codec in the target codec and is a component of the target codec.
  • the target codec includes a first preset number of paired codecs; when the first codec includes one pair of codecs, the first codec serves as one pair of the first preset number of paired codecs; when the first codec includes two pairs of codecs, the first codec serves as two pairs of the first preset number of paired codecs, and so on.
  • the terminal may input the spliced feature map into a target network, and the target network is a network of fine-scale features of the reconstructed image.
  • the target network includes a target codec, the target codec includes a pair of codecs, and the number of paired codecs in the target codec is the first preset number. And the first codec is part of the target codec.
  • the first encoder in the target codec in the target network performs feature extraction on the spliced feature map, and the feature map output by the previous encoder is used as the input of the next encoder, until the feature map output by the last encoder in the target codec is obtained.
  • the feature map output by the last encoder is input to the corresponding decoder, and the feature map is decoded by the decoder.
  • the feature map output by the previous decoder is used as the input of the next decoder, and the target image output by the last decoder in the target codec is obtained.
  • the image to be processed is acquired, the image to be processed is scaled from the first size to the second size, and the image to be processed of the second size is reconstructed by the first codec to obtain the first feature map, which completes the reconstruction of the coarse-scale features of the image to be processed.
  • the first feature map is enlarged to the same first size as the image to be processed, and the image to be processed of the first size and the first feature map of the first size are spliced to reconstruct the fine-scale features of the image to be processed. Perform feature reconstruction on the spliced feature map through the target codec to obtain the target image.
  • the target codec includes a first preset number of paired codecs, and the first codec is at least one pair of the first preset number of paired codecs, so that the low-definition image to be processed is reconstructed into a high-definition target image.
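  • As a hedged illustration of steps 202 to 208, the coarse-to-fine flow can be sketched as follows in PyTorch-style code; first_codec and target_codec are hypothetical stand-ins for the paired codecs described above, and the target codec is assumed to accept the 6-channel spliced input:

```python
import torch
import torch.nn.functional as F

def restore_image(image, first_codec, target_codec, scale=0.25):
    """Sketch of steps 202-208: coarse-scale reconstruction at a reduced size,
    followed by splicing with the original image and fine-scale reconstruction.
    first_codec and target_codec are assumed to map a tensor to a tensor of the
    same spatial size."""
    b, c, h, w = image.shape                                   # first size (step 202)
    small = F.interpolate(image, scale_factor=scale,
                          mode='bilinear', align_corners=False)
    first_feature = first_codec(small)                         # step 204
    first_feature = F.interpolate(first_feature, size=(h, w),
                                  mode='bilinear', align_corners=False)
    spliced = torch.cat([image, first_feature], dim=1)         # step 206
    return target_codec(spliced)                               # step 208
```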
  • Step 302 Scale the to-be-processed image of the first size to the same size as the first feature map, and perform stitching processing on the first feature map and the to-be-processed image of the same size to obtain a second feature map.
  • the first feature map is a feature map that reconstructs the coarse-scale features of the image to be processed, and the terminal may further reconstruct the intermediate-scale features of the image to be processed.
  • the mid-scale feature is the feature between the coarse-scale feature and the fine-scale feature.
  • the terminal may determine the size of the first feature map, and scale the image to be processed of the first size to the same size as the first feature map.
  • the scale of the first feature map is 1/2 of the first size of the image to be processed, and the terminal scales the image to be processed of the first size to 1/2 of the first size to obtain the same size as the first feature map.
  • the terminal may perform matrix stitching between the first feature map and the scaled image to be processed of the same size to obtain a second feature map.
  • the terminal performs parallel processing on the channel dimension of the first feature map and the scaled image to be processed of the same size to obtain the second feature map.
  • Step 304 Perform feature reconstruction on the second feature map by the second codec to obtain the reconstructed first feature map;
  • the second codec includes a second preset number of paired codecs, and the first codec is at least one pair of the second preset number of paired codecs;
  • the first preset number of paired codecs includes the second preset number of paired codecs.
  • the second codec refers to a pair of encoder and decoder.
  • the first codec as a paired codec in the second codec, is a component of the second codec.
  • the second codec includes a second preset number of pairs of codecs.
  • When the first codec includes one pair of codecs, the first codec serves as one pair of the second preset number of paired codecs; when the first codec includes two pairs of codecs, the first codec serves as two pairs of the second preset number of paired codecs, and so on.
  • the target codec includes a first preset number of paired codecs, and the first preset number of paired codecs includes a second preset number of paired codecs.
  • the terminal may input the second feature map into a second network, which is a network of intermediate-scale features of the reconstructed image.
  • the second network includes a second codec, and the second codec includes a second preset number of paired codecs, and the second preset number can be set according to requirements.
  • the first codec is used as a part of the second codec.
  • the first encoder in the second codec performs feature extraction on the second feature map, and the feature map output by the previous encoder is used as the input of the next encoder, until the feature map output by the last encoder in the second codec is obtained.
  • the feature map output by the last encoder is input into the corresponding decoder, and the number of encoders and decoders in the second codec is the same.
  • the feature map output by the previous decoder is used as the input of the next decoder, and the feature map is decoded by the decoder to obtain the feature map output by the last decoder in the second codec.
  • the feature map output by the last decoder in the second codec is the reconstructed first feature map.
  • Enlarging the first feature map to a first size, and performing stitching processing on the image to be processed of the first size and the first feature map of the first size includes:
  • Step 306 Enlarge the reconstructed first feature map to a first size, and perform stitching processing on the image to be processed of the first size and the reconstructed first feature map of the first size.
  • the reconstructed first feature map is a feature map that reconstructs the mid-scale features of the image to be processed, and the terminal may further reconstruct the fine-scale features of the image to be processed.
  • the fine-scale features are more detailed and specific features than the intermediate-scale features.
  • the terminal may determine the size of the acquired image to be processed, that is, the first size, and enlarge the reconstructed first feature map to the same first size as the image to be processed. Further, if the size of the reconstructed first feature map is 1/2 of the first size, the terminal enlarges the reconstructed first feature map twice to obtain the reconstructed first feature map of the first size.
  • the terminal may perform matrix stitching on the image to be processed of the first size and the reconstructed first feature map of the first size to obtain the stitched feature map.
  • the terminal performs parallel processing on the channel dimension of the to-be-processed image of the first size and the reconstructed first feature map of the first size to obtain a spliced feature map.
  • the terminal can input the spliced feature map into the target network, and perform feature reconstruction on the spliced feature map through the target encoder in the target network to reconstruct the fine-scale features of the image to be processed, thereby obtaining a high-definition Target image.
  • the size of the target image is the first size.
  • the image to be processed in the first size is scaled to the same size as the first feature map, and the first feature map and the image to be processed in the same size are stitched together to obtain the second feature map.
  • Feature reconstruction is performed on the second feature map by the second codec to reconstruct the mid-scale features of the image to be processed and obtain the reconstructed first feature map;
  • the second codec includes a second preset number of paired codecs,
  • the first codec is at least one pair of the second preset number of paired codecs; and the first preset number of paired codecs includes the second preset number of paired codecs.
  • the reconstructed first feature map is enlarged to a first size, and the image to be processed of the first size and the reconstructed first feature map of the first size are stitched together to reconstruct the fine-scale features of the image to be processed, thereby Reconstruct low-definition images into high-definition images.
  • the splicing process of the first feature map and the image to be processed of the same size to obtain the second feature map includes: paralleling the first feature map and the image to be processed of the same size in the channel dimension Process to obtain the second feature map.
  • the channel dimension refers to the number of channels of the image.
  • the number of channels of an RGB image is 3, and the number of channels of a black-and-white image is 1.
  • the terminal determines the channel dimension of the first feature map, and determines the same channel dimension in the image to be processed with the same size as the first feature map. Then, the terminal may stitch the first feature map and the image to be processed of the same size in the same channel dimension to obtain the second feature map.
  • the first feature map and the image to be processed of the same size are both RGB images.
  • the number of channels of the RGB color image is 3.
  • the terminal connects the R, G, and B channels of the first feature map and the R, G, and B channels of the image to be processed of the same size in parallel to obtain a second feature map with a channel number of 6.
  • the first feature map and the image to be processed of the same size are processed in parallel in the channel dimension, so that the features of the first feature map are associated with the features of the image to be processed, the obtained second feature map carries more feature information, and the reconstruction of the features of the image to be processed is more accurate.
  • performing feature reconstruction on the image to be processed of the second size through the first codec to obtain the first feature map includes:
  • one encoder includes at least two processing units, and one decoder also includes at least two processing units.
  • the internal structure of each processing unit is the same.
  • the terminal inputs the to-be-processed image of the second size into the first codec, and the first processing unit in the first codec performs feature extraction on the to-be-processed image of the second size to obtain the output feature map of the first processing unit.
  • the first processing unit is the first processing unit in the first codec.
  • the terminal performs splicing processing on the output feature map of the first processing unit and the input feature map of the first processing unit, and inputs the spliced feature map into the second processing unit to obtain the output feature map of the second processing unit.
  • the terminal stitches the output feature map of the previous processing unit in the first codec and the input feature map of the first unit (that is, the second-size image to be processed), and uses the stitched feature map as the input of the next processing unit, until the output feature map of the last processing unit of the first codec is obtained.
  • the output feature map of the last processing unit of the first codec is the first feature map output by the first codec.
  • FIG. 4 shows the internal structure of the encoder and decoder in an embodiment.
  • Each encoder and decoder contains multiple processing units, such as M1, M2, M3 and M4.
  • M1 is the first processing unit of the encoder in the first codec
  • the terminal inputs the to-be-processed image of the second size into the first processing unit to obtain an output feature map.
  • Starting from M2, the output feature map of M1 and the input feature of M1 are spliced, and the spliced feature map is used as the input of M2.
  • the output feature map of M2 and the input feature of M1 are spliced, and the spliced feature map is used as the input of M3.
  • the output feature map of M3 and the input feature of M1 are spliced, and the spliced feature map is used as the input of M4 to obtain the output feature map of M4.
  • In this embodiment, the second-size to-be-processed image is input to the processing units of the first codec, and the output feature map of the previous processing unit in the first codec and the second-size to-be-processed image are spliced.
  • the spliced feature map is used as the input of the next processing unit, so that part of the features restored by the previous processing unit can be associated with the information of the original image, and the next processing unit can perform feature reconstruction more quickly.
  • While the effective feature information of each processing unit is increased, the amount of calculation in the feature reconstruction process is reduced and the problem of high integration difficulty is avoided. A minimal sketch of this connection mode is given below.
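  • The following is an illustrative sketch of this first skip-connection mode, assuming each processing unit is simplified to a single 3×3 convolution with activation; the class name and channel count are hypothetical:

```python
import torch
import torch.nn as nn

class GuidedChainA(nn.Module):
    """Sketch of the first guided-connection mode (Fig. 4): every processing
    unit after the first receives the previous unit's output spliced with the
    original input of the codec."""
    def __init__(self, channels=32, num_units=4):
        super().__init__()
        self.first = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        self.units = nn.ModuleList([
            nn.Sequential(nn.Conv2d(2 * channels, channels, 3, padding=1), nn.ReLU())
            for _ in range(num_units - 1)])

    def forward(self, x):
        out = self.first(x)                       # M1 operates on the codec input
        for unit in self.units:                   # M2..Mn take [previous output, x]
            out = unit(torch.cat([out, x], dim=1))
        return out
```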
  • performing feature reconstruction on the image to be processed of the second size through the first codec to obtain the first feature map includes:
  • the fusion processing refers to adding two feature maps, for example, adding weight matrices corresponding to the two feature maps.
  • the terminal inputs the to-be-processed image of the second size into the first codec, and the first processing unit in the first codec performs feature extraction on the to-be-processed image of the second size to obtain the output of the first processing unit The output feature map.
  • the terminal performs fusion processing on the output feature map of the first processing unit and the input feature map of the first processing unit, that is, the output feature map of the first processing unit and the second-size image to be processed are fused, The fused feature map is input to the second processing unit, and the output feature map of the second processing unit is obtained.
  • the terminal performs fusion processing on the output feature map of the second processing unit and the input feature map of the second processing unit, and uses the fused feature map as the input of the next processing unit of the second processing unit.
  • the terminal merges the output feature map of the previous processing unit and the input feature map of the previous processing unit, and uses the fused feature map as the input of the next processing unit, until the output feature map of the last processing unit of the first codec is obtained.
  • the output feature map of the last processing unit of the first codec is the first feature map output by the first codec.
  • Each encoder and decoder contains multiple processing units, such as M1, M2, M3 and M4. Starting from the second processing unit, the output feature map of the previous processing unit and the input feature of the previous unit are spliced, and the spliced feature map is used as the input of the next processing unit.
  • M1 is the first processing unit of the encoder in the first codec, and the terminal inputs the to-be-processed image of the second size into the first processing unit to obtain an output feature map.
  • the output feature map of M1 and the input feature of M1 are spliced, and the spliced feature map is used as the input of M2.
  • the output feature map of M2 and the input feature of M2 are spliced, and the spliced feature map is used as the input of M3.
  • the output feature map of M3 and the input feature of M3 are spliced, and the spliced feature map is used as the input of M4 to obtain the output feature map of M4.
  • the feature map output by M4 and the input feature of M4 are spliced together.
  • In this embodiment, the to-be-processed image of the second size is input to the processing units of the first codec, and the output feature map of the previous processing unit in the first codec is fused with the input feature map of that previous processing unit, which fuses part of the features restored by the previous processing unit with the feature information that has not yet been restored by that unit.
  • the fused feature map is used as the input of the next processing unit, so that the next processing unit can perform feature reconstruction based on more feature information, until the last processing unit of the first codec outputs the first feature map. While the effective feature information of each processing unit is increased, the amount of calculation in the feature reconstruction process is reduced and the problem of high integration difficulty is avoided. A minimal sketch of this connection mode is given below.
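  • The following is an illustrative sketch of this second connection mode, again with each processing unit simplified to a single convolution; the fusion is implemented as element-wise addition of a unit's output with its own input, which is one reading of the fusion processing described above:

```python
import torch.nn as nn

class GuidedChainB(nn.Module):
    """Sketch of the second guided-connection mode (Fig. 5): the output of each
    processing unit is fused with that unit's own input, and the fused map is
    used as the input of the next processing unit."""
    def __init__(self, channels=32, num_units=4):
        super().__init__()
        self.units = nn.ModuleList([
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
            for _ in range(num_units)])

    def forward(self, x):
        feat = x
        for unit in self.units:
            out = unit(feat)          # output of the current processing unit
            feat = out + feat         # fusion of output and input of the same unit
        return feat
```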
  • the processing units of the codec adopt a guided connection mode, and the guided connection provides two skip connection modes, as shown in FIG. 4 and FIG. 5.
  • the first is to connect the input image to be processed of the second size to each intermediate processing unit as a part of the input.
  • the second type connects the input and output of each processing unit.
  • In this way, each processing unit can maintain basically the same input characteristics without increasing the difficulty of feature extraction for subsequent processing units, and the features of each processing unit are gradually optimized without excessive deviation between the features of different processing units.
  • the formula for the guided connection can be expressed as $\hat{m}_i = L_i(m_i)$, where $L_i$ represents the i-th processing unit, and $m_i$ and $\hat{m}_i$ are the input and output of the i-th processing unit, respectively.
  • In the first mode, the input $m_i$ refers to the feature map obtained by connecting the output $\hat{m}_{i-1}$ of the previous processing unit and the features of the to-be-processed image of the second size in parallel.
  • In the second mode, the input $m_i$ refers to the feature map obtained by connecting the output $\hat{m}_{i-1}$ of the previous processing unit and the input $m_{i-1}$ of the previous processing unit in parallel.
  • the input of the second-size to-be-processed image into the processing unit of the first codec to obtain the output characteristic map includes:
  • Step 602 Input the to-be-processed image of the second size into the convolutional layer of the processing unit in the first codec.
  • one processing unit includes at least two convolutional layers.
  • the size of each convolutional layer may be the same or different.
  • the terminal inputs the to-be-processed image of the second size into the first processing unit in the first codec, and the first convolutional layer in the first processing unit in the first codec performs feature extraction on the to-be-processed image of the second size to obtain the output feature map of the first convolutional layer.
  • the first convolutional layer is the first convolutional layer of the first processing unit in the first codec.
  • the input feature map of the first processing unit is the to-be-processed image of the second size.
  • the terminal uses the output feature map of the first convolutional layer of the first processing unit and the input feature map of the processing unit where the first convolutional layer is located as the input of the second convolutional layer; that is, in the first processing unit
  • the output feature map of the first convolutional layer and the input feature map of the first processing unit are input to the second convolutional layer to obtain the output feature map of the second convolutional layer.
  • the terminal uses the output feature map of the previous convolutional layer in the first processing unit and the input feature map of the first processing unit (that is, the second-size image to be processed) as the input of the next convolutional layer, until the output feature map of the last convolutional layer of the first processing unit is obtained.
  • After obtaining the output feature maps of each convolutional layer in the first processing unit in the first codec, the terminal performs splicing processing on the output feature maps of each convolutional layer in the first processing unit to obtain the output feature map of the first processing unit.
  • Step 604 Use the output feature map of the previous convolutional layer in the processing unit of the first codec and the input feature map of the processing unit as the input of the next convolutional layer, until the output feature map of the last convolutional layer in the processing unit is obtained.
  • the output feature map of the previous processing unit and the input feature map of the first processing unit are spliced together to serve as the input feature of the next processing unit.
  • the output feature map of the previous processing unit and the input feature map of the previous processing unit are merged as the input feature of the next processing unit.
  • the output feature map of the previous convolutional layer and the input feature map of the second processing unit are used as the input of the next convolutional layer.
  • the output feature map of the previous convolutional layer in the processing unit and the input feature map of the processing unit are used as the input of the next convolutional layer in the processing unit, until the output feature map of the last convolutional layer in the processing unit is obtained.
  • Step 606 Perform splicing processing on the input feature map of the processing unit and the output feature map of each convolutional layer in the processing unit to obtain an output feature map of the processing unit.
  • After obtaining the output feature map of each convolutional layer in the processing unit in the first codec, the terminal performs splicing processing on the output feature maps of each convolutional layer in the processing unit to obtain the output feature map of the processing unit.
  • In this embodiment, the second-size image to be processed is input to the convolutional layers of the processing unit in the first codec, the output feature map of the previous convolutional layer in the processing unit and the input feature map of the processing unit are used as the input of the next convolutional layer until the output feature map of the last convolutional layer in the processing unit is obtained, and the input feature map of the processing unit and the output feature maps of the convolutional layers in the processing unit are spliced to obtain the output feature map of the processing unit.
  • Through this multi-level feature fusion, the detailed content of the image can be better distinguished from the destructive features in the image, which in turn helps to reconstruct the real structural information and textures of the image at the coarse scale.
  • Multi-level feature fusion can make the neural network better learn to distinguish the image content and the damaged features of the image, and then better use it to reconstruct the real structure information and texture of the image.
  • the formula for multi-level feature fusion can be expressed as follows, where $f_{n-1}$ and $f_n$ are the output of the previous unit and the output of the current unit, respectively, $\sigma$ refers to the nonlinear activation function, $b$ refers to the bias term and $W$ refers to the weight of the corresponding convolutional layer, and $[\cdot,\cdot]$ denotes parallel connection in the channel dimension:

    $f_n^1 = \sigma(W_n^1 * f_{n-1} + b_n^1)$
    $f_n^2 = \sigma(W_n^2 * [f_n^1, f_{n-1}] + b_n^2)$
    $f_n^3 = \sigma(W_n^3 * [f_n^2, f_{n-1}] + b_n^3)$
    $f_n^4 = W_n^4 * [f_n^1, f_n^2, f_n^3] + b_n^4$
    $f_n = f_n^4 + f_{n-1}$

  • the middle convolutional layers learn the feature residuals.
  • the first layer of convolution processes the output of the previous unit: $f_{n-1}$ is convolved by a 3×3 convolution and the result is corrected through the activation layer to obtain $f_n^1$.
  • the second layer of convolution takes the output feature map $f_n^1$ of the first layer connected in parallel with the output feature map $f_{n-1}$ of the previous unit as its input, and obtains $f_n^2$.
  • the third layer of convolution takes the output feature map $f_n^2$ of the second layer connected in parallel with the output feature map $f_{n-1}$ of the previous unit as its input, and obtains $f_n^3$.
  • Each of the first three layers uses a 3×3 convolution, and after each convolutional layer the output feature map is corrected by the activation layer to obtain the output feature map of that layer.
  • the fourth layer of convolution performs feature fusion on the output feature maps of the first three layers to obtain $f_n^4$, and $f_n^4$ is fused with the output feature map $f_{n-1}$ of the previous unit to obtain the output feature map $f_n$ of the current processing unit.
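  • The following is a sketch of a processing unit implementing this multi-level feature fusion; the layer widths and the choice of ReLU as the activation $\sigma$ are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ProcessingUnit(nn.Module):
    """Sketch of the multi-level feature fusion described above: three 3x3
    convolutions whose inputs are spliced with the unit input f_{n-1}, a fourth
    convolution fusing the three intermediate maps, and a residual connection
    back to f_{n-1}."""
    def __init__(self, channels=32):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.conv3 = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.conv4 = nn.Conv2d(3 * channels, channels, 3, padding=1)
        self.act = nn.ReLU()

    def forward(self, f_prev):
        f1 = self.act(self.conv1(f_prev))                           # f_n^1
        f2 = self.act(self.conv2(torch.cat([f1, f_prev], dim=1)))   # f_n^2
        f3 = self.act(self.conv3(torch.cat([f2, f_prev], dim=1)))   # f_n^3
        f4 = self.conv4(torch.cat([f1, f2, f3], dim=1))             # f_n^4
        return f4 + f_prev                                          # f_n
```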
  • performing feature reconstruction on the second feature map through the second codec to obtain the reconstructed first feature map includes:
  • one encoder in the second codec includes at least two processing units, and one decoder also includes at least two processing units.
  • the internal structure of each processing unit is the same.
  • the terminal inputs the second feature map to the second codec, and the first processing unit in the second codec performs feature extraction on the second feature map to obtain the output feature map output by the first processing unit.
  • the first processing unit is the first processing unit in the second codec.
  • the terminal performs splicing processing on the output feature map of the first processing unit and the input feature map of the first processing unit, and inputs the spliced feature map into the second processing unit to obtain the output feature map of the second processing unit.
  • the terminal splices the output feature map of the previous processing unit in the second codec and the input feature map of the first unit (that is, the second feature map), and uses the spliced feature map as the input of the next processing unit, until the output feature map of the last processing unit of the second codec is obtained.
  • the output feature map of the last processing unit of the second codec is the first feature map output by the second codec.
  • In this embodiment, the second feature map is input to the processing units of the second codec to obtain output feature maps, and the output feature map of the previous processing unit in the second codec and the second feature map are spliced.
  • this splicing associates part of the features restored by the previous processing unit with the previously reconstructed feature map, so that the next processing unit can perform feature reconstruction more quickly.
  • the spliced feature map is used as the input of the next processing unit until the last processing unit of the second codec outputs the reconstructed first feature map, thereby reconstructing the intermediate-scale feature information of the image to be processed.
  • performing feature reconstruction on the second feature map through the second codec to obtain the reconstructed first feature map includes:
  • the terminal inputs the second feature map to the second codec, and the first processing unit in the second codec performs feature extraction on the second feature map to obtain the output feature map output by the first processing unit.
  • the terminal performs fusion processing on the output feature map of the first processing unit and the input feature map of the first processing unit, that is, the output feature map of the first processing unit and the second feature map are fused; the fused feature map is input to the second processing unit to obtain the output feature map of the second processing unit.
  • the terminal performs fusion processing on the output feature map of the second processing unit and the input feature map of the second processing unit, and uses the fused feature map as the input of the next processing unit of the second processing unit.
  • the terminal merges the output feature map of the previous processing unit and the input feature map of the previous processing unit, and uses the fused feature map as the input of the next processing unit, until the output feature map of the last processing unit of the second codec is obtained.
  • the output feature map of the last processing unit of the second codec is the first feature map output by the second codec.
  • In this embodiment, the second feature map is input to the processing units of the second codec, and the output features and input features of the same processing unit are fused, which reduces the amount of calculation while ensuring the correlation between features and making the differences between features in the feature map more obvious, so that the intermediate-scale features can be better reconstructed.
  • inputting the second feature map to the processing unit of the second codec to obtain an output feature map includes:
  • Step 802 Input the second feature map to the convolutional layer of the processing unit in the second codec to obtain an output feature map.
  • the terminal inputs the second feature map to the first processing unit in the second codec, and the first convolutional layer in the first processing unit in the second codec performs feature extraction on the second feature map to obtain the output feature map of the first convolutional layer.
  • the first convolutional layer is the first convolutional layer of the first processing unit in the second codec.
  • the input feature map of the first processing unit is the second feature map.
  • the terminal uses the output feature map of the first convolutional layer of the first processing unit and the input feature map of the processing unit where the first convolutional layer is located as the input of the second convolutional layer; that is, the output feature map of the first convolutional layer and the input feature map of the first processing unit (that is, the second feature map) are input to the second convolutional layer to obtain the output feature map of the second convolutional layer in the first processing unit of the second codec.
  • the terminal uses the output feature map of the previous convolutional layer in the first processing unit and the input feature map of the first processing unit (that is, the second feature map) as the input of the next convolutional layer, until the output feature map of the last convolutional layer of the first processing unit is obtained.
  • After obtaining the output feature maps of each convolutional layer in the first processing unit in the second codec, the terminal performs stitching processing on the output feature maps of each convolutional layer in the first processing unit to obtain the output feature map of the first processing unit.
  • Step 804 Use the output feature map of the previous convolutional layer in the processing unit of the second codec and the input feature map of the processing unit as the input of the next convolutional layer, until the output feature map of the last convolutional layer in the processing unit of the second codec is obtained.
  • the output feature map of the previous processing unit and the input feature map of the first processing unit are spliced together as the input feature of the next processing unit.
  • the output feature map of the previous processing unit and the input feature map of the previous processing unit are merged as the input feature of the next processing unit.
  • the output feature map of the previous convolutional layer and the input feature map of the second processing unit are used as the input of the next convolutional layer.
  • the output feature map of the previous convolutional layer in the processing unit and the input feature map of the processing unit are used as the input of the next convolutional layer in the processing unit, until the output feature map of the last convolutional layer in the processing unit is obtained.
  • Step 806 Perform splicing processing on the input feature map of the processing unit and the output feature map of each convolutional layer in the processing unit of the second codec to obtain the output feature map of the processing unit in the second codec .
  • After obtaining the output feature map of each convolutional layer in the processing unit in the second codec, the terminal performs splicing processing on the output feature maps of each convolutional layer in the processing unit to obtain the output feature map of the processing unit.
  • In this embodiment, the second feature map is input to the convolutional layers of the processing unit in the second codec, the output feature map of the previous convolutional layer in the processing unit of the second codec and the input feature map of the processing unit are used as the input of the next convolutional layer until the output feature map of the last convolutional layer in the processing unit of the second codec is obtained, and the input feature map of the processing unit and the output feature maps of the convolutional layers in the processing unit are spliced to obtain the output feature map of the processing unit in the second codec.
  • Through this multi-level feature fusion, the detailed content of the image can be better distinguished from the destructive features in the image, which in turn helps to reconstruct the real structural information and textures of the image at the intermediate scale.
  • the input feature map of the processing unit and the output feature map of each convolution layer in the processing unit are spliced to obtain the output feature map of the processing unit in the second codec, including :
  • the output feature maps of the convolutional layers in the processing unit are spliced; the spliced feature map and the input feature map of the processing unit are fused to obtain the output feature map of the processing unit in the second codec.
  • the terminal may perform splicing processing on the output feature maps of each convolutional layer. Further, the terminal may determine the weight matrix corresponding to the output feature map of each convolutional layer, and add the weight matrix corresponding to the output feature map of each convolutional layer to obtain the spliced feature map. Then, the terminal performs fusion processing on the spliced feature map and the input feature map of the processing unit. The fusion processing may be to connect the spliced feature map and the output feature map of the processing unit in parallel in the channel dimension, so as to obtain the output feature map of the processing unit.
  • the terminal may perform splicing processing on the output feature maps of each convolutional layer. Then, the terminal performs fusion processing on the spliced feature map and the input feature map of the first processing unit.
  • the input of the first processing unit in the second codec is the second feature map; that is, the spliced feature map and the second feature map are fused to obtain the output feature map of the first processing unit.
  • the output feature maps of the convolutional layers in the second processing unit are spliced.
  • the spliced feature map and the input feature map of the second processing unit are fused to obtain the output feature map of the second processing unit.
  • In this embodiment, the output feature maps of the convolutional layers in the processing unit are spliced, and the spliced feature map and the input feature map of the processing unit are fused to obtain the output feature map of the processing unit.
  • the second preset number of paired codecs includes the first codec and a third codec, and the third codec is located between the encoder and the decoder of the first codec.
  • the first preset number of paired codecs includes the second codec and a fourth codec, and the fourth codec is located between the encoder and the decoder of the second codec.
  • the first network includes a first codec
  • the second network includes a second codec.
  • the second codec includes a second preset number of paired codecs, and the remaining paired codecs of the second preset number of paired codecs, other than the first codec, are called the third codec.
  • the third codec is located between the encoder and the decoder in the first codec.
  • the feature map output by the last encoder in the first codec is used as the input of the first encoder in the third codec, and the output of the last encoder in the third codec is obtained.
  • the output of the last encoder in the third codec is used as the input of the corresponding decoder in the third codec to obtain the output of the last decoder in the third codec.
  • the output of the last decoder in the third codec is used as the input of the corresponding decoder in the first codec, so that the output feature map of the last decoder in the first codec is obtained, that is, the first feature map reconstructed by the second network.
  • the third network includes a target codec, and the target codec includes a first preset number of paired codecs. Among the first preset number of paired codecs, the remaining paired codecs except the second codec are called fourth codecs.
  • the fourth codec is between the encoder and the decoder of the second codec.
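  • The nesting of codecs described above can be sketched as follows; the module names are illustrative assumptions, and each encoder/decoder is assumed to be a module that maps a tensor to a tensor:

```python
import torch.nn as nn

class NestedCodec(nn.Module):
    """Sketch of the nested arrangement: an inner codec (e.g. the third or
    fourth codec) is placed between the encoder and the decoder of the outer
    codec, so the outer encoder's output feeds the inner codec and the inner
    codec's output feeds the outer decoder."""
    def __init__(self, encoder, decoder, inner=None):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder
        self.inner = inner if inner is not None else nn.Identity()

    def forward(self, x):
        feat = self.encoder(x)     # output of the last encoder of the outer codec
        feat = self.inner(feat)    # processed by the codec placed in the middle
        return self.decoder(feat)  # reconstructed by the outer decoder

# Hypothetical wiring, with E1..E3 / D1..D3 standing for encoder/decoder modules:
# first_network  = NestedCodec(E1, D1)
# second_network = NestedCodec(E1, D1, inner=NestedCodec(E2, D2))
# target_network = NestedCodec(E1, D1, inner=NestedCodec(E2, D2, inner=NestedCodec(E3, D3)))
```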
  • FIG. 9(a) is a network structure diagram of the image processing method in an embodiment.
  • the image processing method in Figure 9(a) has three networks.
  • the first network is a coarse-scale feature reconstruction network
  • the second network is an intermediate-scale feature reconstruction network
  • the third network is a fine-scale feature reconstruction network.
  • the network of each scale is a codec network, but each network is composed of a different number of pairs of encoders/decoders.
  • At the coarse scale, the original size of the image to be processed is reduced to 1/4 size for processing, and the coarse-scale network is composed of a pair of encoder E1/decoder D1.
  • x1 and y1 are the input image of the first network (that is, the 1/4-size image to be processed) and the output feature map of the first network, respectively.
  • the output feature map y1 of the first network is enlarged by a factor of two to obtain a feature map of 1/2 of the original size of the image to be processed (that is, the first feature map); the image to be processed is then reduced to 1/2 of its original size to obtain x2, and the 1/2-size x2 and the 1/2-size feature map are connected in parallel in the channel dimension as the input of the second network (that is, the second feature map).
  • because the recovery difficulty increases at this stage, the second network uses two pairs of encoders/decoders to perform the intermediate-scale recovery.
  • the encoder E 2 /decoder D 2 is placed in the middle of the encoder E 1 /decoder D 1 , and the purpose is to process more complex feature extraction and reconstruction at the intermediate scale. After the intermediate scale processing, the result y 2 (that is, the reconstructed first feature map) is obtained.
  • the encoder E 3 /decoder D 3 is placed in the middle of the encoder E 2 /decoder D 2 to obtain a fine-scale network structure.
  • the fine-scale input is obtained by enlarging the intermediate-scale result y2 twice and then connecting it in parallel with the original-size image to be processed x3 in the channel dimension. After fine-scale processing, the target image y3 is obtained.
  • the whole structure of this image processing consists of three networks, which deal with coarse, medium and fine scale images respectively.
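To make this coarse-to-fine data flow concrete, the following is a minimal sketch in PyTorch. It treats each scale's encoder/decoder stack as an opaque module; the names restore, net1, net2, and net3 are illustrative assumptions and do not appear in the patent.

```python
import torch
import torch.nn.functional as F

def restore(x3, net1, net2, net3):
    """Coarse-to-fine restoration sketch following the three-scale structure above.

    x3   : image to be processed at its original (first) size, shape (N, 3, H, W)
    net1 : coarse-scale codec        (one pair  E1/D1)
    net2 : intermediate-scale codec  (two pairs, E2/D2 nested inside E1/D1)
    net3 : fine-scale codec          (three pairs, E3/D3 nested inside E2/D2)
    """
    # Coarse scale: shrink the original image to 1/4 size and reconstruct.
    x1 = F.interpolate(x3, scale_factor=0.25, mode="bilinear", align_corners=False)
    y1 = net1(x1)

    # Intermediate scale: enlarge y1 twice and connect it in parallel (channel
    # dimension) with the 1/2-size image to form the second feature map.
    x2 = F.interpolate(x3, scale_factor=0.5, mode="bilinear", align_corners=False)
    y1_up = F.interpolate(y1, scale_factor=2.0, mode="bilinear", align_corners=False)
    y2 = net2(torch.cat([x2, y1_up], dim=1))

    # Fine scale: enlarge y2 twice, connect it with the original-size image, and
    # reconstruct the target image y3.
    y2_up = F.interpolate(y2, scale_factor=2.0, mode="bilinear", align_corners=False)
    y3 = net3(torch.cat([x3, y2_up], dim=1))
    return y3
```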
  • the entire network is trained through N paired sets of broken/clear images.
  • the following objective function can be used for training (reconstructed here from the variable definitions given in the description): L = Σ_{k=1}^{N} Σ_{i=1}^{S} (1/T_i) ‖F_i(x_k^i; θ_i) − y_k^i‖², where L is the optimized objective function; N is the number of image pairs; S is the number of scales of the image; k is the number of the image; x_k^i and y_k^i are the damaged and clear images at scale i; θ_i are the parameters of the sub-network at scale i; and T_i is the number of pixels of the image at scale i, which is used for normalization here.
  • FIG. 9(b) it is a network structure diagram of the image processing method in an embodiment.
  • the image processing method in Figure 9(b) has three networks.
  • the first network is a coarse-scale feature reconstruction network, which includes a pair of encoder E1 and decoder D1.
  • the second network is an intermediate-scale feature reconstruction network, which includes a pair of encoder E1 and decoder D1, and a pair of encoder E2 and decoder D2.
  • the third network is a fine-scale feature reconstruction network, including a pair of encoder E1 and decoder D1, a pair of encoder E2 and decoder D2, and a pair of encoder E3 and decoder D3.
  • the internal structures of the encoder E1, the encoder E2, and the encoder E3 are the same, and all include processing units M1, M2, M3, and M4.
  • the internal structure of the decoder D1, the decoder D2, and the decoder D3 are the same, and they all include processing units M1, M2, M3, and M4.
  • the image x 1 to be processed is input to the processing unit M1 in the encoder E1, and the output feature map of M1 is obtained. Then, starting from M2, the output feature map of M1 and the input feature of M1 are spliced, and the spliced feature map is used as the input of M2.
  • the output feature map of M2 and the input feature of M2 are spliced, and the spliced feature map is used as the input of M3.
  • the output feature map of M3 and the input feature of M3 are spliced, and the spliced feature map is used as the input of M4 to obtain the output feature map of M4.
  • the feature map output by M4 and the input feature of M4 are spliced to obtain the feature map output by the encoder E1.
  • the feature map output by the encoder E1 is input to the processing unit M1 in the decoder D1 to obtain the output feature map of M1.
  • the output feature map of M1 and the input feature of M1 are spliced, and the spliced feature map is used as the input of M2.
  • the output feature map of M2 and the input feature of M2 are spliced, and the spliced feature map is used as the input of M3.
  • the output feature map of M3 and the input feature of M3 are spliced, and the spliced feature map is used as the input of M4 to obtain the output feature map of M4.
  • the feature map output by M4 and the input feature of M4 are spliced to obtain the feature map y 1 output by the decoder D1.
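The chaining of processing units just described, in which each unit's output is spliced with that unit's input before feeding the next unit, can be sketched as follows, again in PyTorch. ProcessingUnit is a placeholder for the block detailed below, and the 1×1 channel-reduction convolutions are an added assumption used only to keep the channel count fixed after each splice.

```python
import torch
import torch.nn as nn

class CodecStage(nn.Module):
    """Sketch of an encoder (or decoder) stage built from four processing units M1..M4.

    From unit to unit, the output feature map and that unit's input feature map are
    spliced in the channel dimension and passed on, as described for E1/D1 above.
    (An alternative wiring splices each unit's output with the stage input instead.)
    """
    def __init__(self, unit_cls, channels, num_units=4):
        super().__init__()
        self.units = nn.ModuleList([unit_cls(channels) for _ in range(num_units)])
        # 1x1 convolutions (assumption) to bring the spliced features back to `channels`
        self.reduce = nn.ModuleList([nn.Conv2d(2 * channels, channels, kernel_size=1)
                                     for _ in range(num_units)])

    def forward(self, x):
        feat = x
        for unit, reduce in zip(self.units, self.reduce):
            out = unit(feat)                              # output feature map of this unit
            feat = reduce(torch.cat([out, feat], dim=1))  # splice output with the unit's input
        return feat
```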
  • the internal structure of each processing unit in the encoder and each processing unit in the decoder is the same; all of them have the structure of the processing unit M1 shown in FIG. 9(b).
  • f n-1 is the output feature map of the previous unit.
  • for the first processing unit M1 of the encoder E1, f n-1 is the image to be processed x 1 .
  • the image x 1 to be processed is input to the first processing unit M1 of the encoder E1.
  • the processing unit M1 performs convolution processing on f n-1 through the first 3×3 convolutional layer to obtain an output feature map, and the activation layer corrects this output feature map.
  • the processing procedure is the same as the processing procedure in FIG. 4 described above, and will not be repeated here.
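Consistent with that processing procedure, one processing unit's multi-level feature fusion can be sketched as follows; the channel width, ReLU activation, and padding are illustrative assumptions, not values specified by the patent.

```python
import torch
import torch.nn as nn

class ProcessingUnit(nn.Module):
    """One processing unit: three 3x3 convolutional layers whose outputs are fused by a
    fourth layer, plus a skip connection from the unit input f_{n-1} (a sketch)."""
    def __init__(self, channels):
        super().__init__()
        self.act = nn.ReLU(inplace=True)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        # layers 2 and 3 take the previous layer's output spliced with the unit input
        self.conv2 = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.conv3 = nn.Conv2d(2 * channels, channels, 3, padding=1)
        # layer 4 fuses the outputs of the first three layers
        self.conv4 = nn.Conv2d(3 * channels, channels, 3, padding=1)

    def forward(self, f_prev):
        f1 = self.act(self.conv1(f_prev))
        f2 = self.act(self.conv2(torch.cat([f1, f_prev], dim=1)))
        f3 = self.act(self.conv3(torch.cat([f2, f_prev], dim=1)))
        fused = self.conv4(torch.cat([f1, f2, f3], dim=1))
        return fused + f_prev  # fuse with the unit input to obtain f_n
```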
  • three networks are used to reconstruct the coarse, medium, and fine-scale features of the image to be processed, so as to obtain a clear target image, such as the target image y 3 as shown in FIG. 9(b).
  • FIG. 10(a) shows the processing results of the image processing method in this embodiment and several conventional image processing methods in terms of deblurring.
  • the first column in the figure is the image to be processed, and the second, third, and fourth columns are the target images obtained after feature reconstruction of the image to be processed by traditional image processing methods.
  • the last column is the target image obtained after feature reconstruction of the image to be processed by the image processing method in this embodiment. It can be seen from Figure 10(a) that the target image in the last column is clearer than the target image in the second, third, and fourth columns. In other words, the image processing method in this embodiment has a better effect on image deblurring than the traditional processing method.
  • FIG. 10(b) shows the processing results of the image processing method in this embodiment and several conventional image processing methods in terms of noise removal.
  • the first column in the figure is the image to be processed, and the second, third, and fourth columns are the target images obtained after feature reconstruction of the image to be processed by traditional image processing methods.
  • the last column is the target image obtained after feature reconstruction of the image to be processed by the image processing method in this embodiment. It can be seen from Figure 10(b) that the target image in the last column is clearer than the target image in the second, third, and fourth columns. That is to say, the image processing method in this embodiment has a better effect on image denoising than the traditional processing method.
  • the image processing method is applied to video processing;
  • the to-be-processed image is each frame of the to-be-processed image in the to-be-processed video;
  • the acquiring the image to be processed and scaling the image to be processed from a first size to a second size includes:
  • Step 1102 Obtain each frame of the to-be-processed image in the to-be-processed video, and scale the frame of the to-be-processed image from a first size to a second size.
  • the video to be processed is a low-definition video, such as a blurry video or a video with noise.
  • the terminal may obtain each frame of the to-be-processed image in the to-be-processed video, and use the obtained size of each frame of the to-be-processed image as the first size. Then, the terminal scales each frame of the image to be processed from the first size to the second size, so that the image shows the coarse-scale feature, so as to reconstruct the fuzzy region existing in the coarse-scale feature in the video to be processed first.
  • the feature reconstruction of the image to be processed of the second size by the first codec to obtain the first feature map includes:
  • Step 1104 Perform feature reconstruction on the to-be-processed image of each frame of the second size through the first codec, to obtain a first feature map corresponding to each frame of the to-be-processed image.
  • the terminal may input the to-be-processed image of the second size of each frame into the first network
  • the first network is a network that reconstructs the coarse-scale features of the image.
  • the first network includes a first codec, and the number of the first codec can be set according to requirements.
  • the encoder in the first codec in the first network extracts features of the images to be processed in the second size of each frame, and the encoder inputs the extracted feature maps of each frame to the decoder corresponding to the encoder for decoding, and obtains The feature map of each frame output by the decoder.
  • the feature map of each frame output by the decoder is the first feature map corresponding to the image to be processed in each frame.
  • when there are at least two pairs of codecs in the first codec, the first encoder is used to encode each frame of the to-be-processed image of the second size, and the next encoder is used to encode the feature maps of each frame output by the previous encoder, until the feature maps of each frame output by the last encoder in the first codec are obtained. Then, each frame feature map output by the last encoder is input to a decoder, and each frame feature map is decoded by the decoder to obtain each frame feature map output by the last decoder. The feature map of each frame output by the last decoder is the first feature map corresponding to the image to be processed in each frame.
  • Enlarging the first feature map to a first size, and performing stitching processing on the image to be processed of the first size and the first feature map of the first size includes:
  • Step 1106 Enlarge each first feature map to a first size, and perform splicing processing on the image to be processed of the first size of each frame and the corresponding first feature map of the first size to obtain the corresponding splicing of the image to be processed in each frame Feature map.
  • the terminal can enlarge the first feature map of each frame to the same first size as the corresponding image to be processed, so as to first reconstruct the fuzzy area existing in the fine-scale feature . Then, the terminal performs stitching processing on the to-be-processed image of each frame of the first size and the corresponding first feature map of each frame of the first size.
  • the terminal may determine the matrix corresponding to the image to be processed in the first size of each frame, and the matrix corresponding to the first feature map of the first size in each frame, and combine the matrix of the image to be processed in the first size with The corresponding matrices corresponding to the first feature map of the first size are spliced to obtain the spliced feature map corresponding to each frame of the image to be processed.
  • the terminal can determine the channel dimension of the image to be processed in the first size of each frame and the channel dimension of the corresponding first feature map of the first size, and divide the image to be processed in the first size of each frame according to the channel dimension. Parallel connection with the respective first feature maps of the first size to obtain the stitching feature map corresponding to each frame of the image to be processed.
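As a concrete illustration of the channel-dimension parallel connection described here, a minimal sketch assuming PyTorch tensors in NCHW layout and an arbitrary example frame size:

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes: one RGB frame at the first (original) size and the
# corresponding first feature map at half that size.
frame = torch.randn(1, 3, 300, 400)          # frame of the video to be processed
first_feature = torch.randn(1, 3, 150, 200)  # first feature map for this frame

# Enlarge the first feature map back to the first size (bilinear upsampling is one choice).
first_feature_up = F.interpolate(first_feature, size=frame.shape[-2:],
                                 mode="bilinear", align_corners=False)

# Parallel connection in the channel dimension: concatenate along dim=1 (channels).
spliced = torch.cat([frame, first_feature_up], dim=1)
print(spliced.shape)  # torch.Size([1, 6, 300, 400]) -- 3 + 3 channels
```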
  • performing feature reconstruction on the spliced feature map through the target codec to obtain the target image, where the definition of the target image is higher than the definition of the image to be processed, includes:
  • Step 1108 Perform feature reconstruction on the stitched feature map corresponding to each frame of the image to be processed by the target codec to obtain the target image corresponding to each frame of the image to be processed.
  • the terminal may input the stitched feature map corresponding to each frame of the image to be processed into a target network
  • the target network is a network of fine-scale features of the reconstructed image.
  • the target network includes a target codec
  • the target codec includes a first codec
  • the number of the target codec can be set according to requirements.
  • the first encoder in the target codec in the target network performs feature extraction on the spliced feature map corresponding to each frame of the image to be processed, and uses the frame feature map output by the previous encoder as the input of the next encoder , Until the feature map of each frame output by the last encoder in the target codec is obtained.
  • each frame feature map output by the last encoder is input to a corresponding decoder, and each frame feature map is decoded by the decoder.
  • the feature map of each frame output by the previous decoder is used as the input of the next decoder, and the last decoder in the target codec is obtained to output each frame of target image corresponding to each frame of the image to be processed.
  • the size of the target image of each frame is the first size.
  • Step 1110 Generate a target video according to each frame of the target image, the definition of the target video is higher than the definition of the to-be-processed video.
  • the terminal may generate a target video for each frame of target image according to the time point of each frame of the to-be-processed image in the to-be-processed video, and the resolution of the obtained target video is higher than that of the to-be-processed video.
  • the image processing method is applied to video processing, and the to-be-processed image is each frame of the to-be-processed image in the to-be-processed video.
  • Applying this image processing method to video processing can realize feature reconstruction of multiple blurred or noisy images at the same time, and can improve the efficiency of image feature reconstruction.
  • the target video is generated according to each frame of target image, so that a low-definition video can be reconstructed into a high-definition video.
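A usage sketch of the per-frame video path described above, reusing the hypothetical restore function from the earlier sketch; frame decoding/encoding and timestamps are omitted for brevity.

```python
import torch

def restore_video(frames, net1, net2, net3):
    """Reconstruct every frame of a low-definition video, as a sketch.

    frames  : iterable of tensors, each (3, H, W), the frames of the video to be processed
    returns : list of restored target frames in their original order; assembling them at
              the original timestamps yields the target video
    """
    target_frames = []
    with torch.no_grad():
        for frame in frames:
            x = frame.unsqueeze(0)            # add a batch dimension -> (1, 3, H, W)
            y = restore(x, net1, net2, net3)  # coarse -> intermediate -> fine reconstruction
            target_frames.append(y.squeeze(0))
    return target_frames
```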
  • an image processing method including:
  • the terminal obtains the image to be processed, and scales the image to be processed from the first size to the second size.
  • the terminal inputs the to-be-processed image of the second size into the convolutional layer of the processing unit in the first codec.
  • the terminal uses the output feature map of the previous convolutional layer in the processing unit of the first codec and the input feature map of the processing unit as the input of the next convolutional layer, until the output feature map of the last convolutional layer in the processing unit is obtained.
  • the terminal performs splicing processing on the input feature map of the processing unit and the output feature map of each convolutional layer in the processing unit to obtain the output feature map of the processing unit.
  • the terminal splices the output feature map of the previous processing unit in the first codec and the to-be-processed image of the second size, and uses the spliced feature map as the input of the next processing unit, until the last processing unit of the first codec outputs the first feature map.
  • the terminal scales the to-be-processed image of the first size to the same size as the first feature map, and performs parallel processing on the channel dimension of the first feature map and the to-be-processed image of the same size to obtain a second feature map.
  • the terminal inputs the second feature map to the convolutional layer of the processing unit in the second codec to obtain the output feature map; the second codec includes a second preset number of paired codecs, and the first codec is at least one pair of the second preset number of paired codecs.
  • the terminal uses the output feature map of the previous convolutional layer in the processing unit of the second codec and the input feature map of the processing unit as the input of the next convolutional layer, until the output feature map of the last convolutional layer in the processing unit of the second codec is obtained.
  • the terminal performs splicing processing on the output feature maps of each convolutional layer in the processing unit, and performs fusion processing on the spliced feature map and the input feature map of the processing unit to obtain the output feature map of the processing unit in the second codec.
  • the output feature map of the previous processing unit in the second codec and the second feature map are spliced to obtain the input of the next processing unit, until the last processing unit of the second codec outputs the reconstructed first feature map.
  • the terminal enlarges the reconstructed first feature map to a first size, and performs stitching processing on the image to be processed of the first size and the reconstructed first feature map of the first size.
  • the terminal performs feature reconstruction on the spliced feature maps through the target codec to obtain the target image.
  • the definition of the target image is higher than the definition of the image to be processed, and the target codec contains a first preset number of paired codecs; the first preset number of paired codecs includes the second preset number of paired codecs.
  • FIGS. 2 to 11 are schematic flowcharts of an image processing method in an embodiment. It should be understood that although the various steps in the flowcharts of FIGS. 2 to 11 are displayed in sequence as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless specifically stated herein, the execution of these steps is not strictly limited in order, and they can be executed in other orders. Moreover, at least part of the steps in FIGS. 2 to 11 may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily executed at the same time, but can be executed at different times, and their execution order is not necessarily sequential; they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
  • an image processing apparatus is provided, which includes: an acquisition module 1202, a first reconstruction module 1204, a splicing module 1206, and a target reconstruction module 1208, wherein:
  • the acquiring module 1202 is configured to acquire an image to be processed, and scale the image to be processed from a first size to a second size.
  • the first reconstruction module 1204 is configured to perform feature reconstruction on the to-be-processed image of the second size through the first codec to obtain a first feature map.
  • the stitching module 1206 is configured to enlarge the first feature map to a first size, and perform stitching processing on the image to be processed of the first size and the first feature map of the first size.
  • the target reconstruction module 1208 is used to perform feature reconstruction on the spliced feature maps through the target codec to obtain a target image, the definition of the target image is higher than the definition of the image to be processed; the target codec contains the first A preset number of paired codecs, and the first codec is at least one pair of the first preset number of paired codecs.
  • the above-mentioned image processing device acquires the image to be processed, scales the image to be processed from the first size to the second size, and performs feature reconstruction on the image to be processed of the second size through the first codec to obtain the first feature map, thereby completing the reconstruction of the coarse-scale features of the image to be processed.
  • the first feature map is enlarged to the same first size as the image to be processed, and the image to be processed of the first size and the first feature map of the first size are spliced, so as to reconstruct the fine-scale features of the image to be processed. Feature reconstruction is performed on the spliced feature map through the target codec to obtain the target image.
  • the target codec contains a first preset number of paired codecs, and the first codec is at least one pair of the first preset number of paired codecs, so that the low-definition image to be processed is reconstructed into a high-definition target image.
  • the device further includes: a second reconstruction module.
  • the second reconstruction module is configured to: scale the to-be-processed image of the first size to the same size as the first feature map, and perform stitching processing on the first feature map and the to-be-processed image of the same size to obtain a second feature map; and perform feature reconstruction on the second feature map through the second codec to obtain the reconstructed first feature map; the second codec includes a second preset number of paired codecs, and the first codec is at least one pair of the second preset number of paired codecs; the first preset number of paired codecs includes the second preset number of paired codecs.
  • the splicing module is also used to enlarge the reconstructed first feature map to a first size, and perform splicing processing on the image to be processed of the first size and the reconstructed first feature map of the first size.
  • the image to be processed in the first size is scaled to the same size as the first feature map, and the first feature map and the image to be processed in the same size are stitched together to obtain the second feature map.
  • the second codec performs feature reconstruction on the second feature map to reconstruct the intermediate-scale features of the image to be processed, obtaining the reconstructed first feature map.
  • the reconstructed first feature map is enlarged to a first size, and the image to be processed of the first size and the reconstructed first feature map of the first size are stitched together to reconstruct the fine-scale features of the image to be processed, thereby Reconstruct low-definition images into high-definition images.
  • the second reconstruction module is further configured to: perform parallel processing on the channel dimension of the first feature map and the image to be processed of the same size to obtain a second feature map.
  • the first feature map and the image to be processed of the same size are processed in parallel in the channel dimension, so the features of the first feature map can be associated with the features of the image to be processed, giving the obtained second feature map more feature information and making the reconstruction of the features of the image to be processed more accurate.
  • the first reconstruction module 1204 is further configured to: input the to-be-processed image of the second size into the processing unit of the first codec to obtain an output feature map;
  • the output feature map of the previous processing unit and the second-size image to be processed are spliced, and the spliced feature map is used as the input of the next processing unit, until the last processing unit of the first codec outputs the first feature map.
  • in this way, the second-size to-be-processed image is input to the processing unit of the first codec, the output feature map of the previous processing unit in the first codec and the second-size to-be-processed image are spliced, and the spliced feature map is used as the input of the next processing unit, so that part of the features restored by the previous processing unit can be associated with the information of the original image, allowing the next processing unit to perform feature reconstruction more quickly.
  • while increasing the effective feature information of each processing unit, this reduces the amount of calculation in the feature reconstruction process and avoids the problem of high fusion difficulty.
  • the first reconstruction module 1204 is further configured to: input the to-be-processed image of the second size into the processing unit of the first codec to obtain an output feature map;
  • the output feature map of the previous processing unit and the input feature map of the previous processing unit are fused, and the fused feature map is used as the input of the next processing unit, until the last processing unit of the first codec outputs the first feature map.
  • in this way, the to-be-processed image of the second size is input to the processing unit of the first codec, and the output feature map of the previous processing unit in the first codec is fused with the input feature map of that previous processing unit, which can fuse part of the features restored by the previous processing unit with the feature information that the processing unit has not yet restored.
  • the fused feature map is used as the input of the next processing unit, so that the next processing unit can perform feature reconstruction based on more feature information until the last processing unit of the first codec outputs the first feature map. While increasing the effective feature information of each processing unit, the amount of calculation in the feature reconstruction process is reduced, and the problem of high integration difficulty is avoided.
  • the first reconstruction module 1204 is further configured to: input the to-be-processed image of the second size into the convolutional layer of the processing unit in the first codec; The output feature map of the previous convolution layer in the unit and the input feature map of the processing unit are used as the input of the next convolution layer until the output feature map of the last convolution layer in the processing unit is obtained; the processing unit The input feature map of the processing unit and the output feature map of each convolutional layer in the processing unit are spliced to obtain the output feature map of the processing unit.
  • in this way, the second-size image to be processed is input to the convolutional layer of the processing unit in the first codec; the output feature map of the previous convolutional layer in the processing unit of the first codec and the input feature map of the processing unit serve as the input of the next convolutional layer until the output feature map of the last convolutional layer in the processing unit is obtained; and the input feature map of the processing unit and the output feature maps of the convolutional layers in the processing unit are spliced to obtain the output feature map of the processing unit. In this way, the detailed content of the image can be better distinguished from the destructive features in the image through multi-level feature fusion, and the real structural information and textures of the image at the coarse scale can be further reconstructed.
  • the second reconstruction module is further configured to: input the second feature map to the processing unit of the second codec to obtain an output feature map; and the previous processing unit in the second codec The output feature map of and the second feature map are spliced together to obtain the input of the next processing unit until the last processing unit of the second codec outputs the reconstructed first feature map.
  • in this way, the second feature map is input to the processing unit of the second codec to obtain an output feature map, and splicing the output feature map of the previous processing unit in the second codec with the second feature map can associate part of the features restored by the previous processing unit with the feature map reconstructed by the previous processing unit, so that the next processing unit can perform feature reconstruction more quickly.
  • the spliced feature map is used as the input of the next processing unit until the last processing unit of the second codec outputs the reconstructed first feature map, thereby reconstructing the feature information of the intermediate scale of the image to be processed.
  • the second reconstruction module is further configured to: input the second feature map to the processing unit of the second codec to obtain an output feature map; and the previous processing unit in the second codec The output feature map of and the input feature map of the previous processing unit are fused to obtain the input of the next processing unit until the last processing unit of the second codec outputs the reconstructed first feature map.
  • the second feature map is input to the processing unit of the second codec, and the output features and input features of the same processing unit are merged, which can reduce the amount of calculation and ensure the correlation between features , which makes the difference between the features in the feature map more obvious, so as to better reconstruct the features of the intermediate scale.
  • the second reconstruction module is further configured to: input the second feature map into the convolutional layer of the processing unit in the second codec to obtain an output feature map;
  • the output feature map of the previous convolutional layer in the processing unit and the input feature map of the processing unit are used as the input of the next convolutional layer, until the output feature map of the last convolutional layer in the processing unit of the second codec is obtained; the input feature map of the processing unit and the output feature map of each convolutional layer in the processing unit are spliced to obtain the output feature map of the processing unit in the second codec.
  • in this way, the second feature map is input to the convolutional layer of the processing unit in the second codec; the output feature map of the previous convolutional layer in the processing unit of the second codec and the input feature map of the processing unit are used as the input of the next convolutional layer until the output feature map of the last convolutional layer in the processing unit of the second codec is obtained; and the input feature map of the processing unit and the output feature maps of the convolutional layers in the processing unit are spliced to obtain the output feature map of the processing unit in the second codec. In this way, the detailed content of the image can be better distinguished from the destructive features in the image through multi-level feature fusion, and the real structural information and textures of the image at the intermediate scale can be further reconstructed.
  • the second reconstruction module is further used to: perform splicing processing on the output feature maps of each convolutional layer in the processing unit of the second codec; and splice the output feature maps of each convolutional layer The processed feature map and the second feature map are fused to obtain the output feature map of the processing unit in the second codec.
  • the output feature map of each convolutional layer in the processing unit is spliced, and the output feature map of each convolutional layer is spliced and the input feature map of the processing unit is merged.
  • the second preset number of paired codecs in the device include the first codec and the third codec, and the third codec is located between the encoder and the decoder of the first codec.
  • the first preset number of paired codecs include the second codec and the fourth codec, and the fourth codec is located between the encoder and the decoder of the second codec.
  • the acquiring module 1202 is further configured to acquire each frame of the to-be-processed image in the to-be-processed video, and scale the frame of the to-be-processed image from the first size to the second size.
  • the first reconstruction module 1204 is further configured to: perform feature reconstruction on the to-be-processed image of each frame of the second size through the first codec to obtain the first feature map corresponding to each frame of the to-be-processed image;
  • the splicing module 1206 is also used to enlarge each first feature map to a first size, and perform splicing processing on the image to be processed in the first size of each frame and the corresponding first feature map of the first size to obtain the to-be-processed image of each frame. Process the mosaic feature map corresponding to the image;
  • the target reconstruction module 1208 is also used to: perform feature reconstruction on the spliced feature map corresponding to each frame of the image to be processed by the target codec to obtain the target image corresponding to each frame of the image to be processed; generate a target video according to each frame of the target image , The definition of the target video is higher than the definition of the to-be-processed video.
  • the image processing method is applied to video processing, and the to-be-processed image is each frame of the to-be-processed image in the to-be-processed video.
  • each frame of the to-be-processed image is scaled from the first size to the second size; through the first codec, each frame of the second-size to-be-processed image is reconstructed to obtain the first feature map corresponding to each frame of the to-be-processed image; each first feature map is enlarged to the same first size as the corresponding to-be-processed image, and the to-be-processed image of the first size of each frame and the corresponding first feature map of the first size are spliced to obtain the spliced feature map corresponding to each frame of the to-be-processed image; and the target codec is used to perform feature reconstruction on the spliced feature map corresponding to each frame of the to-be-processed image to obtain the target image corresponding to each frame of the to-be-processed image.
  • Applying this image processing method to video processing can realize feature reconstruction of multiple blurred or noisy images at the same time, and can improve the efficiency of image feature reconstruction.
  • the target video is generated according to each frame of target image, so that a low-definition video can be reconstructed into a high-definition video.
  • Fig. 13 shows an internal structure diagram of a computer device in an embodiment.
  • the computer device may specifically be the terminal 110 in FIG. 1.
  • the computer device includes a processor, a memory, a network interface, an input device, and a display screen connected through a system bus.
  • the memory includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium of the computer device stores an operating system and may also store a computer program.
  • when the computer program is executed by the processor, the processor can implement the image processing method.
  • a computer program may also be stored in the internal memory, and when the computer program is executed by the processor, the processor can execute the image processing method.
  • the display screen of the computer device can be a liquid crystal display screen or an electronic ink display screen; the input device can be an external keyboard, touchpad, or mouse.
  • FIG. 13 is only a block diagram of part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied.
  • the specific computer device may include more or fewer components than shown in the figure, or combine some components, or have a different arrangement of components.
  • the image processing apparatus provided in this application may be implemented in the form of a computer program, and the computer program may run on the computer device as shown in FIG. 13.
  • the memory of the computer device can store various program modules that make up the image processing apparatus, for example, the acquisition module 1202, the first reconstruction module 1204, the splicing module 1206, and the target reconstruction module 1208 shown in FIG. 12.
  • the computer program composed of each program module causes the processor to execute the steps in the image processing method of each embodiment of the present application described in this specification.
  • the computer device shown in FIG. 13 may execute the steps of acquiring the image to be processed through the acquiring module 1202 in the image processing apparatus shown in FIG. 12, and scaling the image to be processed from a first size to a second size.
  • the computer device may perform the feature reconstruction of the second-size image to be processed through the first codec through the first reconstruction module 1204 to obtain the first feature map.
  • the computer device may perform the steps of enlarging the first feature map to a first size through the stitching module 1206, and performing stitching processing on the image to be processed of the first size and the first feature map of the first size.
  • the computer device can, through the target reconstruction module 1208, perform the step of carrying out feature reconstruction on the stitched feature map through the target codec to obtain the target image, where the definition of the target image is higher than the definition of the image to be processed; the target codec contains a first preset number of paired codecs, and the first codec is at least one pair of the first preset number of paired codecs.
  • a computer device including a memory and a processor, the memory stores a computer program, and when the computer program is executed by the processor, the processor executes the steps of the above-mentioned image processing method.
  • the steps of the image processing method may be the steps in the image processing method of each of the foregoing embodiments.
  • a computer-readable storage medium which stores a computer program, and when the computer program is executed by a processor, the processor causes the processor to execute the steps of the above-mentioned image processing method.
  • the steps of the image processing method may be the steps in the image processing method of each of the foregoing embodiments.
  • a computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the computer device executes the steps in the foregoing method embodiments.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), SyncLink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

一种图像处理方法,包括:获取待处理图像,将所述待处理图像从第一尺寸缩放到第二尺寸;通过第一编解码器对第二尺寸的待处理图像进行特征重建,得到第一特征图;将所述第一特征图放大到第一尺寸,将第一尺寸的待处理图像和第一尺寸的第一特征图进行拼接处理;及通过目标编解码器对拼接后的特征图进行特征重建,得到目标图像,所述目标图像的清晰度高于所述待处理图像的清晰度;所述目标编解码器中包含第一预设数量的成对编解码器,所述第一编解码器为所述第一预设数量的成对编解码器中的至少一对。

Description

图像处理方法、装置、计算机可读存储介质和计算机设备
本申请要求于2020年02月07日提交中国专利局,申请号为2020100824740、发明名称为“图像处理方法、装置、计算机可读存储介质和计算机设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及计算机技术领域,特别是涉及一种图像处理方法、装置、计算机可读存储介质和计算机设备。
背景技术
图像恢复是一种在日常生活中很常见的问题,其目标是恢复在成像过程中遇到不可逆并且复杂破坏的图片。当用户处于光线较暗或者动态的场景中,图片通常会出现不同程度的噪声或者模糊。图片恢复算法便可以用来重建由于模糊或者噪声而损失的细节信息。
然而,传统的图像恢复方式都只能针对某一类特定问题。例如,对于图像去模糊来说,传统的图像恢复方式只能去除平移或旋转中的某一种运动过程中产生的图像模糊的问题。图像去噪也存在同样的问题,传统的图像恢复方式都是针对某类特定噪声,例如通过去除高斯噪声,泊松噪声等来实现图像的特征重建。但真实的破坏图像成像场景十分复杂,还包括相机的运动、场景中物体的运动和不同程度的噪声所导致的图像不清晰等问题,而传统的图像恢复方式对这些存在噪声和模糊点的低清晰度图像的处理效果较差。
发明内容
基于此,有必要针对图像不清晰的技术问题,提供一种图像处理方法、 装置、计算机可读存储介质和计算机设备。
一种图像处理方法,由计算机设备执行,包括:
获取待处理图像,将所述待处理图像从第一尺寸缩放到第二尺寸;
通过第一编解码器对第二尺寸的待处理图像进行特征重建,得到第一特征图;
将所述第一特征图放大到第一尺寸,将第一尺寸的待处理图像和第一尺寸的第一特征图进行拼接处理;
通过目标编解码器对拼接后的特征图进行特征重建,得到目标图像,所述目标图像的清晰度高于所述待处理图像的清晰度;所述目标编解码器中包含第一预设数量的成对编解码器,所述第一编解码器为所述第一预设数量的成对编解码器中的至少一对。
一种图像处理装置,所述装置包括:
获取模块,用于获取待处理图像,将所述待处理图像从第一尺寸缩放到第二尺寸;
第一重建模块,用于通过第一编解码器对第二尺寸的待处理图像进行特征重建,得到第一特征图;
拼接模块,用于将所述第一特征图放大到第一尺寸,将第一尺寸的待处理图像和第一尺寸的第一特征图进行拼接处理;
目标重建模块,用于通过目标编解码器对拼接后的特征图进行特征重建,得到目标图像,所述目标图像的清晰度高于所述待处理图像的清晰度;所述目标编解码器中包含第一预设数量的成对编解码器,所述第一编解码器为所述第一预设数量的成对编解码器中的至少一对。
一种计算机可读存储介质,存储有计算机程序,所述计算机程序被处理器执行时,使得所述处理器执行如下步骤:
获取待处理图像,将所述待处理图像从第一尺寸缩放到第二尺寸;
通过第一编解码器对第二尺寸的待处理图像进行特征重建,得到第一特征图;
将所述第一特征图放大到第一尺寸,将第一尺寸的待处理图像和第一尺寸的第一特征图进行拼接处理;
通过目标编解码器对拼接后的特征图进行特征重建,得到目标图像,所述目标图像的清晰度高于所述待处理图像的清晰度;所述目标编解码器中包含第一预设数量的成对编解码器,所述第一编解码器为所述第一预设数量的成对编解码器中的至少一对。
一种计算机设备,包括存储器和处理器,所述存储器存储有计算机程序,所述计算机程序被所述处理器执行时,使得所述处理器执行如下步骤:
获取待处理图像,将所述待处理图像从第一尺寸缩放到第二尺寸;
通过第一编解码器对第二尺寸的待处理图像进行特征重建,得到第一特征图;
将所述第一特征图放大到第一尺寸,将第一尺寸的待处理图像和第一尺寸的第一特征图进行拼接处理;
通过目标编解码器对拼接后的特征图进行特征重建,得到目标图像,所述目标图像的清晰度高于所述待处理图像的清晰度;所述目标编解码器中包含第一预设数量的成对编解码器,所述第一编解码器为所述第一预设数量的成对编解码器中的至少一对。
附图说明
为了更清楚地说明本申请实施例或相关技术中的技术方案,下面将对实施例或相关技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他实施例的附图。
图1为一个实施例中图像处理方法的应用环境图;
图2为一个实施例中图像处理方法的流程示意图;
图3为一个实施例中重建待处理图像的中间尺度特征的步骤的流程示意 图;
图4为一个实施例中编码器和解码器的内部结构图;
图5为另一个实施例中编码器和解码器的内部结构图;
图6为一个实施例中第一编解码器的处理单元输出特征图的步骤的流程示意图;
图7为一个实施例处理单元的内部结构图;
图8为一个实施例中第二编解码器的处理单元输出特征图的步骤的流程示意图;
图9(a)为一个实施例中图像处理方法的网络结构图;
图9(b)为另一个实施例中图像处理方法的网络结构图;
图10(a)为一个实施例中本方案的图像处理方法和传统的多种图像处理方法在去模糊方面的处理结果;
图10(b)为一个实施例中本方案的图像处理方法和传统的多种图像处理方法在去噪声方面的处理结果;
图11为一个实施例中图像处理方法应用于视频处理的流程示意图;
图12为另一个实施例中图像处理装置的结构框图;
图13为一个实施例中计算机设备的结构框图。
具体实施方式
为了便于理解本申请,下面将参照相关附图对本申请进行更全面的描述。附图中给出了本申请的较佳实施例。但是,本申请可以以许多不同的形式来实现,并不限于本文所描述的实施例。相反地,提供这些实施例的目的是使对本申请的公开内容的理解更加透彻全面。应当理解,此处所描述的具体实施例仅仅用以解释本申请,并不用于限定本申请。
图1为一个实施例中图像处理方法的应用环境图。参照图1,该图像处理方法应用于图像处理系统。该图像处理系统包括终端110和服务器120。终端110和服务器120通过网络连接。终端110具体可以是台式终端或移动终端,移 动终端具体可以手机、平板电脑、笔记本电脑等中的至少一种。服务器120可以用独立的服务器或者是多个服务器组成的服务器集群来实现。本实施例中,终端110可独立实现该图像处理方法。该终端110也可通过与服务器120的交互配合实现该图像处理方法。
终端110可获取待处理图像,将该待处理图像发送给服务器120。服务器120接收该待处理图像,将该待处理图像从第一尺寸缩放到第二尺寸。服务器120通过第一编解码器对第二尺寸的待处理图像进行特征重建,得到第一特征图。服务器120将该第一特征图放大到第一尺寸,将该第一尺寸的待处理图像和第一尺寸的第一特征图进行拼接处理。服务器120通过目标编解码器对拼接后的特征图进行特征重建,得到目标图像,该目标图像的清晰度高于该待处理图像的清晰度;该目标编解码器中包含第一预设数量的成对编解码器,该第一编解码器为该第一预设数量的成对编解码器中的至少一对。服务器120将该目标图像返回给终端110。通过本实施里中的种图像处理方法,能够将模糊的、存在噪点的待处理图像重建为清晰图像。
在一个实施例中,终端获取到一张尺寸为300*400的人脸图像,该人脸图像中的人脸区域模糊,则终端通过该处理方法对该人脸图像进行特征提取和特征重建,以得到人脸区域清晰的人脸图像。
具体地,终端将该300*400的人脸图像缩放1/4尺寸,得到75*100的人脸图像。75*100的人脸图像中显示人脸区域的粗尺度特征。终端通过一对编解码器对该75*100的人脸图像特征重建,得到75*100的特征图,从而重建了人脸图像中人脸区域的粗尺度特征。
接着,终端将75*100的特征图放大两倍尺寸,得到150*200的特征图。并将300*400的人脸图像缩放1/2尺寸,得到150*200的人脸图像。150*200的人脸图像和150*200的特征图中显示人脸区域的中间尺度特征。接着,终端将150*200的特征图和150*200的人脸图像进行拼接。并通过两对编解码器对拼接后的特征图进行特征重建,得到重建后的150*200的特征图,从而重建了人脸图像中人脸区域的中间尺度特征。该两对编码器中有一对编码器与粗尺度 特征重建中所使用的编解码器相同。
接着,尺寸为300*400的人脸图像中显示人脸区域的细尺度特征。终端将150*200的特征图放大两倍尺寸,得到300*400的特征图。该尺寸为300*400的特征图中同样显示人脸区域的细尺度特征。终端将300*400的特征图和300*400的人脸图像进行拼接。通过三对编解码器对拼接后的特征图进行特征重建,得到300*400的目标图像,从而重建了人脸图像中人脸区域的细尺度特征。该三对编码器中有两对编码器与中间尺度特征重建中所使用的两对编解码器相同。
通过依次重建人脸图像中的人脸区域的粗尺度特征、中间尺度特征和细尺度特征,能够将模糊的人脸区域逐步变清晰。并且通过在增加编解码器的数量处理越来越细化的特征尺度,以降低细化的特征尺度的恢复难度,保证特征重建的准确性,从而得到清晰的人脸图像。
如图2所示,在一个实施例中,提供了一种图像处理方法。本实施例主要以该方法应用于上述图1中的终端110(或服务器120)来举例说明。参照图2,该图像处理方法具体包括如下步骤:
步骤202,获取待处理图像,将待处理图像从第一尺寸缩放到第二尺寸。
其中,待处理图像是指清晰度低的图像,例如模糊图像或存在噪点的图像。第一尺寸为获取的待处理图像的原始尺寸,第二尺寸为缩放后的待处理图像的尺寸,第二尺寸可以是待处理图像的第一尺寸的1/4、1/5、1/6等。
具体地,图像越小,图像的粗尺度特征越明显;图像越大,图像的细节特征越明显,即细尺度特征越明显。把模糊图像缩放到较小尺寸时,图像显示的细节特征较少,更多的是显示粗尺度的特征,则图像的模糊程度会明显减小。在这种情况下,对小尺寸的模糊图像的特征重建的难度会比原始尺寸的难度低。则终端可获取待处理图像,将该待处理图像缩放到第二尺寸,使得图像显示出粗尺度特征,以先对粗尺度特征中存在的模糊区域进行重建。
在本实施例中,终端获取待处理图像,并确定所获取的待处理图像的原始尺寸,将该原始尺寸作为第一尺寸。例如,终端所获取的待处理图像的尺 寸为330*400,则将330*400作为待处理图像的第一尺寸。终端所获取的待处理图像的尺寸为450*600,则将450*600作为该待处理图像的第一尺寸。接着,终端将第一尺寸的待处理图像缩放到第二尺寸,得到第二尺寸的待处理图像。
步骤204,通过第一编解码器对第二尺寸的待处理图像进行特征重建,得到第一特征图。
其中,第一编解码器是指成对的编码器和解码器。即第一编码器中包括编码器和该编码器对应的解码器。第一编解码器中包含成对编解码器,该成对编解码器的数量可根据需求设置,例如1对、2对等。
具体地,终端可将该第二尺寸的待处理图像输入第一网络,该第一网络为重建图像的粗尺度特征的网络。该第一网络中包含第一编解码器,第一编解码器的数量可根据需求设置。第一网络中的第一编解码器中的编码器对该第二尺寸的待处理图像进行特征提取,编码器将所提取的特征图输入该编码器对应的解码器进行解码,得到该解码器输出的特征图。该解码器输出的特征图即为第一特征图。
在本实施例中,当该第一编解码器中存在至少两对编解码器时,通过第一个编码器对第二尺寸的待处理图像进行编码,并通过下一个编码器对上一个编码器输出的特征图进行编码,直到得到第一编解码器中的最后一个编码器输出的特征图。接着,最后一个编码器输出的特征图输入对应的解码器中,通过解码器对特征图进行解码,得到最后一个解码器输出的特征图。该最后一个解码器输出的特征图即为第一特征图。
在本实施例中,第二尺寸可为第一尺寸的1/4,则第一编解码器对该1/4尺寸的待处理图像进行特征重建后,得到中间特征图。该中间特征图的尺寸同样为第一尺寸的1/4。接着,终端将该1/4尺寸的中间特征图放大两倍,得到第一特征图,该第一特征图的尺寸为第一尺寸的1/2。
步骤206,将第一特征图放大到第一尺寸,将第一尺寸的待处理图像和第一尺寸的第一特征图进行拼接处理。
其中,拼接处理是指图像的矩阵拼接或者特征图在通道维度上的并联。
具体地,得到第一特征图之后,终端可将该第一特征图放大到与所获取的待处理图像相同的尺寸,即第一尺寸,以先对细尺度特征中存在的模糊区域进行重建。接着,终端将该第一尺寸的待处理图像和该第一尺寸的第一特征图进行拼接处理。
在本实施例中,终端可确定第一尺寸的待处理图像对应的矩阵,以及第一尺寸的第一特征图对应的矩阵,将该两个矩阵进行拼接。终端可确定第一尺寸的待处理图像的通道维度和第一尺寸的第一特征图的通道维度,按照通道维度将第一尺寸的待处理图像和第一尺寸的第一特征图进行并联。
例如,第一尺寸的待处理图像和第一尺寸的第一特征图均为RGB彩色图像,RGB彩色图像的通道数为3,则终端将该第一尺寸的待处理图像的R、G、B通道和该第一尺寸的第一特征图的R、G、B通道进行并联,得到通道数为6的特征图。
步骤208,通过目标编解码器对拼接后的特征图进行特征重建,得到目标图像,该目标图像的清晰度高于待处理图像的清晰度;该目标编解码器中包含第一预设数量的成对编解码器,该第一编解码器为该第一预设数量的成对编解码器中的至少一对。
其中,目标编解码器是指成对的编码器和解码器,该第一编解码器作为目标编解码器中的成对编解码器,是目标编解码器中的组成部分。目标编解码器中包含第一预设数量的成对编解码器,当第一编解码器中包含一对编解码器时,该第一编解码器作为第一预设数量的成对编解码器中的一对。当第一编解码器中包含两对编解码器时,该第一编解码器作为第一预设数量的成对编解码器中的两对,以此类推。
具体地,终端可将该拼接后的特征图输入目标网络,该目标网络为重建图像的细尺度特征的网络。该目标网络中包含目标编解码器,该目标编解码器中包含成对编解码器,该目标编解码器中的成对编解码器的数量为第一预设数量。并且第一编解码器作为目标编解码器中的一部分。目标网络中的目标编解码器中的第一个编码器对该拼接后的特征图进行特征提取,并将上一 编码器输出的特征图作为下一编码器的输入,直到得到目标编解码器中的最后一个编码器输出的特征图。接着,最后一个编码器输出的特征图输入对应的解码器中,通过解码器对特征图进行解码。将上一解码器输出的特征图作为下一解码器的输入,得到目标编解码器中的最后一个解码器输出的目标图像。
上述图像处理方法,获取待处理图像,将待处理图像从第一尺寸缩放到第二尺寸,通过第一编解码器对第二尺寸的待处理图像进行特征重建,得到第一特征图,从而可以完成待处理图像的粗尺度特征的重建。将第一特征图放大到与待处理图像相同的第一尺寸,将第一尺寸的待处理图像和第一尺寸的第一特征图进行拼接处理,以对待处理图像的细尺度特征进行重建。通过目标编解码器对拼接后的特征图进行特征重建,得到目标图像,该目标编解码器中包含第一预设数量的成对编解码器,该第一编解码器为该第一预设数量的成对编解码器中的至少一对,从而将低清晰度的待处理图像重建为高清晰度的目标图像。通过对待处理图像的粗尺度特征和细尺度特征进行重建,能够将模糊的、存在噪点的待处理图像重建为清晰图像。
在一个实施例中,如图3所示,在该将该第一特征图放大到第一尺寸,将该第一尺寸的待处理图像和第一尺寸的第一特征图进行拼接处理之前,还包括:
步骤302,将第一尺寸的待处理图像缩放到与该第一特征图相同的尺寸,将第一特征图和相同尺寸的待处理图像进行拼接处理,得到第二特征图。
具体地,第一特征图为重建了待处理图像的粗尺度特征的特征图,则终端可进一步重建该待处理图像的中间尺度的特征。该中间尺度的特征为粗尺度特征和细尺度特征之间的特征。终端可确定该第一特征图的尺寸,并将该第一尺寸的待处理图像缩放到与该第一特征图相同的尺寸。进一步地,该第一特征图的尺度为该待处理图像的第一尺寸的1/2,则终端将第一尺寸的待处理图像缩放为第一尺寸的1/2,得到与第一特征图相同尺寸的待处理图像。接着,终端可将第一特征图和缩放后的相同尺寸的待处理图像进行矩阵拼接, 得到第二特征图。或者,终端将第一特征图和缩放后的相同尺寸的待处理图像在通道维度上进行并联处理,得到第二特征图。
步骤304,通过第二编解码器对第二特征图进行特征重建,得到重建后的第一特征图;该第二编解码器中包含第二预设数量的成对编解码器,该第一编解码器为该第二预设数量的成对编解码器中的至少一对;该第一预设数量的成对编解码器中包含该第二预设数量的成对编解码器。
其中,第二编解码器是指成对的编码器和解码器。第一编解码器作为第二编解码中的成对编解码器,是第二编解码器的组成部分。第二编解码器中包含第二预设数量的成对编解码器,当第一编解码器中包含一对编解码器时,该第一编解码器作为第二预设数量的成对编解码器中的一对。当第一编解码器中包含两对编解码器时,该第一编解码器作为第二预设数量的成对编解码器中的两对,以此类推。该目标编解码器中包含第一预设数量的成对编解码器,该第一预设数量的成对编解码器中包含了第二预设数量的成对编解码器。
具体地,终端可将该第二特征图输入第二网络,该第二网络为重建图像的中间尺度特征的网络。该第二网络中包含第二编解码器,该第二编解码器中包含第二预设数量的成对编解码器,该第二预设数量可根据需求设置。并且,在第二网络中,第一编解码器作为第二编解码器的一部分。该第二编解码器中的第一个编码器对该第二特征图进行特征提取,并将上一编码器输出的特征图作为下一编码器的输入,直到得到第二编解码器中的最后一个编码器输出的特征图。
接着,最后一个编码器输出的特征图输入对应的解码器中,第二编解码器中编码器和解码器的数量相同。并将上一解码器输出的特征图作为下一解码器的输入,通过解码器对特征图进行解码,得到第二编解码器中的最后一个解码器输出的特征图。该第二编解码器中的最后一个解码器输出的特征图即为重建后的第一特征图。
该将第一特征图放大到第一尺寸,将第一尺寸的待处理图像和第一尺寸的第一特征图进行拼接处理,包括:
步骤306,将重建后的第一特征图放大到第一尺寸,将第一尺寸的待处理图像和第一尺寸的重建后的第一特征图进行拼接处理。
具体地,重建后的第一特征图为重建了待处理图像的中间尺度特征的特征图,则终端可进一步重建该待处理图像的细尺度的特征。该细尺度的特征为比中间尺度特征更细节更具体的特征。终端可确定所获取的待处理图像的尺寸,即第一尺寸,并将重建后的第一特征图放大到与待处理图像相同的第一尺寸。进一步地,该重建后的第一特征图的尺寸为第一尺寸的1/2,则终端将重建后的第一特征图放大两倍,得到第一尺寸的重建后的第一特征图。接着,终端可将第一尺寸的待处理图像和第一尺寸的重建后的第一特征图进行矩阵拼接,得到拼接后的特征图。或者,终端将第一尺寸的待处理图像和第一尺寸的重建后的第一特征图在通道维度上进行并联处理,得到拼接后的特征图。
接着,终端可将该拼接后的特征图输入目标网络,通过目标网络中的目标编码器对该拼接后的特征图进行特征重建,以重建待处理图像的细尺度特征,从而得到高清晰度的目标图像。该目标图像的尺寸为第一尺寸。
本实施例中,将第一尺寸的待处理图像缩放到与第一特征图相同的尺寸,将第一特征图和相同尺寸的待处理图像进行拼接处理,得到第二特征图,通过第二编解码器对第二特征图进行特征重建,以对待处理图像的中间尺度特征进行重建,得到重建后的第一特征图;该第二编解码器中包含第二预设数量的成对编解码器,该第一编解码器为该第二预设数量的成对编解码器中的至少一对;该第一预设数量的成对编解码器中包含该第二预设数量的成对编解码器。将重建后的第一特征图放大到第一尺寸,将第一尺寸的待处理图像和第一尺寸的重建后的第一特征图进行拼接处理,以对待处理图像的细尺度特征进行重建,从而将低清晰度图像重建为高清晰度图像。
在一个实施例中,该将该第一特征图和相同尺寸的待处理图像进行拼接处理,得到第二特征图,包括:将第一特征图和相同尺寸的待处理图像在通道维度上进行并联处理,得到第二特征图。
其中,通道维度是指图像的通道数。例如RGB图像的通道数为3,黑白图像的通道数为1。
具体地,终端确定第一特征图的通道维度,并确定与第一特征图相同尺寸的待处理图像中的相同通道维度。接着,终端可将第一特征图和相同尺寸的待处理图像在相同的通道维度上进行拼接,得到第二特征图。
例如,第一特征图和相同尺寸的待处理图像均为RGB图像。RGB彩色图像的通道数为3,终端将第一特征图的R、G、B通道和相同尺寸的待处理图像的R、G、B通道进行并联,得到通道数为6的第二特征图。
本实施例中,将第一特征图和相同尺寸的待处理图像在通道维度上进行并联处理,能够将第一特征图的特征和待处理图像的特征关联起来,使得得到的第二特征图具备更多的特征信息,从而对待处理图像特征的重建更准确。
在一个实施例中,该通过第一编解码器对第二尺寸的待处理图像进行特征重建,得到第一特征图,包括:
将第二尺寸的待处理图像输入第一编解码器的处理单元,得到输出特征图;将第一编解码器中的上一处理单元的输出特征图和该第二尺寸的待处理图像进行拼接处理,将拼接后的特征图作为下一处理单元的输入,直到该第一编解码器的最后一处理单元输出第一特征图。
具体地,一个编码器中包含至少两个处理单元,一个解码器中也包含至少两个处理单元。每个处理单元的内部结构相同。终端将第二尺寸的待处理图像输入第一编解码器,由第一编解码器中的第一处理单元对该第二尺寸的待处理图像进行特征提取,得到第一处理单元输出的输出特征图。该第一处理单元为第一编解码器中的第一个处理单元。接着,终端将该第一处理单元的输出特征图和该第一处理单元的输入特征图进行拼接处理,将拼接后的特征图输入第二处理单元,得到第二处理单元的输出特征图。
类似地,终端将第一编解码器中的上一处理单元的输出特征图和第一单元的输入特征图(即第二尺寸的待处理图像)进行拼接处理,将拼接后的特征图作为下一处理单元的输入,直到得到第一编解码器的最后一处理单元的 输出特征图。该第一编解码器的最后一处理单元的输出特征图即为第一编解码器输出的第一特征图。
如图4所示,为一个实施例中编码器和解码器的内部结构。每个编码器和解码器中均包含多个处理单元,如M1,M2,M3和M4。从第二处理单元开始,将上一处理单元的输出特征图和第一单元的输入特征进行拼接处理,将拼接后的特征图作为下一处理单元的输入。例如,M1为第一编解码器中的编码器的第一处理单元,终端将第二尺寸的待处理图像输入第一处理单元,得到输出特征图。接着,从M2开始,将M1的输出特征图和M1的输入特征进行拼接处理,将拼接后的特征图作为M2的输入。将M2的输出特征图和M1的输入特征进行拼接处理,将拼接后的特征图作为M3的输入。将M3的输出特征图和M1的输入特征进行拼接处理,将拼接后的特征图作为M4的输入,得到M4输出的特征图。
本实施例中,将第二尺寸的待处理图像输入第一编解码器的处理单元,并将第一编解码器中的上一处理单元的输出特征图和该第二尺寸的待处理图像进行拼接处理,将拼接后的特征图作为下一处理单元的输入,使得能够将上一处理单元恢复的部分特征和原始图像的信息进行关联,使得下一处理单元更迅速地进行特征重建。并且在增加每个处理单元的有效特征信息的同时,减少了特征重建过程中计算量,避免了融合难度高的问题。
在一个实施例中,该通过第一编解码器对第二尺寸的待处理图像进行特征重建,得到第一特征图,包括:
将该第二尺寸的待处理图像输入该第一编解码器的处理单元,得到输出特征图;将该第一编解码器中的上一处理单元的输出特征图和该上一处理单元的输入特征图进行融合处理,将融合后的特征图作为下一处理单元的输入,直到该第一编解码器的最后一处理单元输出第一特征图。
其中,融合处理是指将两个特征图相加,例如,将两个特征图对应的权重矩阵相加。
具体地,终端将第二尺寸的待处理图像输入第一编解码器,由第一编解 码器中的第一处理单元对该第二尺寸的待处理图像进行特征提取,得到第一处理单元输出的输出特征图。接着,终端将该第一处理单元的输出特征图和该第一处理单元的输入特征图进行融合处理,即将该第一处理单元的输出特征图和该第二尺寸的待处理图像进行融合处理,融合后的特征图输入第二处理单元,得到第二处理单元的输出特征图。
接着,终端将该第二处理单元的输出特征图和该第二处理单元的输入特征图进行融合处理,将融合处理后的特征图作为第二处理单元的下一处理单元的输入。
类似地,从该第一编解码器中的第二处理单元开始,终端将上一处理单元的输出特征图和上一处理单元的输入特征图进行融合处理,将融合后的特征图作为下一处理单元的输入,直到得到第一编解码器的最后一处理单元的输出特征图。该第一编解码器的最后一处理单元的输出特征图即为第一编解码器输出的第一特征图。
如图5所示,为一个实施例中编码器和解码器的内部连接结构。每个编码器和解码器中均包含多个处理单元,如M1,M2,M3和M4。从第二处理单元开始,将上一处理单元的输出特征图和该上一单元的输入特征进行拼接处理,将拼接后的特征图作为下一处理单元的输入。例如,M1为第一编解码器中的编码器的第一处理单元,终端将第二尺寸的待处理图像输入第一处理单元,得到输出特征图。接着,从M2开始,将M1的输出特征图和M1的输入特征进行拼接处理,将拼接后的特征图作为M2的输入。将M2的输出特征图和M2的输入特征进行拼接处理,将拼接后的特征图作为M3的输入。将M3的输出特征图和M3的输入特征进行拼接处理,将拼接后的特征图作为M4的输入,得到M4输出的特征图。将M4输出的特征图和M4的输入特征进行拼接处理。
本实施例中,将该第二尺寸的待处理图像输入该第一编解码器的处理单元,将该第一编解码器中的上一处理单元的输出特征图和该上一处理单元的输入特征图进行融合处理,能够将上一处理单元恢复的部分特征和该处理单元未恢复的特征信息进行融合。将融合后的特征图作为下一处理单元的输入, 使得下一处理单元能够根据更多的特征信息进行特征重建,直到该第一编解码器的最后一处理单元输出第一特征图。在增加每个处理单元的有效特征信息的同时,减少了特征重建过程中计算量,避免了融合难度高的问题。
本实施例中编解码器的各个处理单元之间采用引导连接的方式,该引导连接提供了两种跳跃连接方式,如图4和图5所示。第一种是将输入的第二尺寸的待处理图像连接到每一个中间处理单元中作为输入的一部分。第二种将每个处理单元的输入和输出连接起来。按照这两种处理方式,每个处理单元都能够保持基本相同的输入特征,并不会增加后面处理单元的特征提取难度。并且保持每个处理单元是逐步优化特征,而不会导致处理单元之间的特征偏离太多。引导连接的公式可以表示为
m_i = L_i(m̂_i)
其中,L_i代表第i个处理单元;m̂_i和m_i分别是第i个处理单元的输入和输出。其中,输入m̂_i在第一种方式中是指并联了前一个处理单元的输出m_{i-1}和第二尺寸的待处理图像x而得到的特征图,即m̂_i = [m_{i-1}, x];在第二种方式中是指并联了前一个处理单元的输出m_{i-1}和前一个处理单元的输入m̂_{i-1}而得到的特征图,即m̂_i = [m_{i-1}, m̂_{i-1}]。
在一个实施例中,如图6所示,该将第二尺寸的待处理图像输入该第一编解码器的处理单元,得到输出特征图,包括:
步骤602,将第二尺寸的待处理图像输入该第一编解码器中的处理单元的卷积层。
具体地,一个处理单元中包含至少两个卷积层。每个卷积层的尺寸可相同也可不相同。终端将第二尺寸的待处理图像输入第一编解码器中的第一处理单元,由第一编解码器中的第一处理单元中的第一卷积层对该第二尺寸的待处理图像进行特征提取,得到第一卷积层输出的输出特征图。该第一卷积层为第一编解码器中的第一处理单元的第一个卷积层。该第一处理单元的输入特征图即为第二尺寸的待处理图像。
接着,终端将该第一处理单元的第一卷积层的输出特征图和该第一卷积层所在的处理单元的输入特征图作为第二卷积层的输入;即将该第一处理单元中的第一卷积层的输出特征图和该第一处理单元的输入特征图(即第二尺寸的待处理图像)输入第二卷积层,得到第二卷积层的输出特征图。
类似地,从第一处理单元中的第二卷积层开始,终端将第一单元中的上一卷积层的输出特征图和第一单元的输入特征图(即第二尺寸的待处理图像)作为下一卷积层的输入,直到得到第一单元的最后一卷积层的输出特征图。
接着,终端得到第一编解码器中第一处理单元中的各个卷积层的输出特征图后,将第一处理单元中的各个卷积层的输出特征图进行拼接处理,得到该第一处理单元的输出特征图。
步骤604,将该第一编解码器的处理单元中的上一卷积层的输出特征图和该处理单元的输入特征图作为下一卷积层的输入,直到得到该处理单元中的最后一卷积层的输出特征图。
具体地,从第二处理单元开始,将上一处理单元的输出特征图和该第一处理单元的输入特征图进行拼接,作为下一处理单元的输入特征。或者,将上一处理单元的输出特征图和该上一处理单元的输入特征图进行融合,作为下一处理单元的输入特征。
接着,对于第一编解码器中的第二处理单元中的各个卷积层,将上一卷积层的输出特征图和第二处理单元的输入特征图作为下一卷积层的输入。
类似地,从第一编解码器中的第二处理单元开始,对于各个处理单元中的各个卷积层,将处理单元中的上一卷积层的输出特征图和该处理单元的输入特征图作为该处理单元中的下一卷积层的输入,直到得到该处理单元中的最后一卷积层的输出特征图。
步骤606,将该处理单元的输入特征图和该处理单元中各卷积层的输出特征图进行拼接处理,得到该处理单元的输出特征图。
具体地,终端得到第一编解码器中的处理单元中的各个卷积层的输出特征图后,将该处理单元中的各个卷积层的输出特征图进行拼接处理,得到该 处理单元的输出特征图。
本实施例中,将第二尺寸的待处理图像输入该第一编解码器中的处理单元的卷积层,将该第一编解码器的处理单元中的上一卷积层的输出特征图和该处理单元的输入特征图作为下一卷积层的输入,直到得到该处理单元中的最后一卷积层的输出特征图,将该处理单元的输入特征图和该处理单元中各卷积层的输出特征图进行拼接处理,得到该处理单元的输出特征图,使得能够通过多级的特征融合更好地区分图像细节内容和图像中的破坏特征,然后更进一步地重建图像的粗尺度的真实结构信息和纹理。
一个处理单元的内部结构如图7所示,多级的特征融合可以使神经网络更好地学习区分图像内容和图像被破坏的特征,然后更好地用来重建图像的真实结构信息和纹理。多级特征融合的公式如下:
f_n^1 = σ(W_1 * f_{n-1} + b_1)
f_n^2 = σ(W_2 * [f_n^1, f_{n-1}] + b_2)
f_n^3 = σ(W_3 * [f_n^2, f_{n-1}] + b_3)
f_n^4 = W_4 * [f_n^1, f_n^2, f_n^3] + b_4
f_n = f_n^4 + f_{n-1}
其中f_{n-1}和f_n分别是上一个单元的输出和当前单元的输出,σ是指非线性激活函数;b_1是指第一个卷积层的偏移项,W_1是指第一个卷积层的权重项(b_2、W_2等依此类推);[·]表示特征图在通道维度上的并联。通过一个跳跃连接使得中间的卷积层来学习特征残差。第一层卷积处理上一个单元的输出,即通过3×3的第一层卷积对f_{n-1}进行卷积处理,得到输出特征图,并通过激活层对输出特征图进行修正,得到f_n^1。第二层卷积将第一层的输出特征图f_n^1和上一个单元的输出特征图f_{n-1}并联作为输入,然后得到f_n^2。第三层卷积将第二层的输出特征图f_n^2和上一个单元的输出特征图f_{n-1}并联作为输入,然后得到f_n^3。前三层经过3×3的卷积处理,并在每个卷积层之后通过激活层对输出特征图进行修正,得到各层的输出特征图。第四层卷积对前三层的输出特征图进行了特征融合,得到f_n^4。将f_n^4和上一个单元的输出特征图f_{n-1}进行融合处理,得到当前处理单元的输出特征图f_n。
在一个实施例中,该通过第二编解码器对该第二特征图进行特征重建,得到重建后的第一特征图,包括:
将该第二特征图输入该第二编解码器的处理单元,得到输出特征图;将该第二编解码器中的上一处理单元的输出特征图和该第二特征图进行拼接处理,得到下一处理单元的输入,直到该第二编解码器的最后一处理单元输出重建后的第一特征图。
具体地,第二编解码器中一个编码器中包含至少两个处理单元,一个解码器中也包含至少两个处理单元。每个处理单元的内部结构相同。终端将第二特征图输入第二编解码器,由第二编解码器中的第一处理单元对该第二特征图进行特征提取,得到第一处理单元输出的输出特征图。该第一处理单元为第二编解码器中的第一个处理单元。接着,终端将该第一处理单元的输出特征图和该第一处理单元的输入特征图进行拼接处理,将拼接后的特征图输入第二处理单元,得到第二处理单元的输出特征图。
类似地,终端将第二编解码器中的上一处理单元的输出特征图和第一单元的输入特征图(即第二特征图)进行拼接处理,将拼接后的特征图作为下一处理单元的输入,直到得到第二编解码器的最后一处理单元的输出特征图。该第二编解码器的最后一处理单元的输出特征图即为第二编解码器输出的第一特征图。
本实施例中,将该第二特征图输入该第二编解码器的处理单元,得到输出特征图,将该第二编解码器中的上一处理单元的输出特征图和该第二特征图进行拼接处理,能够将上一处理单元恢复的部分特征和上一处理单元重建的特征图进行关联,使得下一处理单元更迅速地进行特征重建。将拼接后的特征图作为下一处理单元的输入,直到该第二编解码器的最后一处理单元输出重建后的第一特征图,从而重建了待处理图像的中间尺度的特征信息。
在一个实施例中,该通过第二编解码器对该第二特征图进行特征重建,得到重建后的第一特征图,包括:
将该第二特征图输入该第二编解码器的处理单元,得到输出特征图;将该第二编解码器中的上一处理单元的输出特征图和该上一处理单元的输入特征图进行融合处理,得到下一处理单元的输入,直到该第二编解码器的最后一处理单元输出重建后的第一特征图。
具体地,终端将第二特征图输入第二编解码器,由第二编解码器中的第一处理单元对该第二特征图进行特征提取,得到第一处理单元输出的输出特征图。接着,终端将该第一处理单元的输出特征图和该第一处理单元的输入特征图进行融合处理,即将该第一处理单元的输出特征图和该第二特征图进行融合处理,融合后的特征图输入第二处理单元,得到第二处理单元的输出特征图。
接着,终端将该第二处理单元的输出特征图和该第二处理单元的输入特征图进行融合处理,将融合处理后的特征图作为第二处理单元的下一处理单元的输入。
类似地,从该第二编解码器中的第二处理单元开始,终端将上一处理单元的输出特征图和上一处理单元的输入特征图进行融合处理,将融合后的特征图作为下一处理单元的输入,直到得到第二编解码器的最后一处理单元的输出特征图。该第二编解码器的最后一处理单元的输出特征图即为第二编解码器输出的第一特征图。
本实施例中,将该第二特征图输入该第二编解码器的处理单元,并将同一个处理单元的输出特征和输入特征进行融合,能够减少计算量并且保证了特征之间的相关性,使得特征图中的各特征之间的差异更明显,以更好地对中间尺度的特征进行重建。
在一个实施例中,如图8所示,该将该第二特征图输入该第二编解码器的处理单元,得到输出特征图,包括:
步骤802,将该第二特征图输入该第二编解码器中的处理单元的卷积层,得到输出特征图。
具体地,终端将第二特征图输入第二编解码器中的第一处理单元,由第二编解码器中的第一处理单元中的第一卷积层对该第二尺寸的待处理图像进行特征提取,得到第一卷积层输出的输出特征图。该第一卷积层为第二编解码器中的第一处理单元的第一个卷积层。该第一处理单元的输入特征图即为第二尺寸的待处理图像。
接着,终端将该第一处理单元的第一卷积层的输出特征图和该第一卷积层所在的处理单元的输入特征图作为第二卷积层的输入;即将该第一处理单元中的第一卷积层的输出特征图和该第一处理单元的输入特征图(即第二尺寸的待处理图像)输入第二卷积层,得到第二编解码器中的第一处理单元的第二卷积层的输出特征图。
类似地,从第一处理单元中的第二卷积层开始,终端将第一单元中的上一卷积层的输出特征图和第一单元的输入特征图(即第二特征图)作为下一卷积层的输入,直到得到第一单元的最后一卷积层的输出特征图。
接着,终端得到第二编解码器中第一处理单元中的各个卷积层的输出特征图后,将第一处理单元中的各个卷积层的输出特征图进行拼接处理,得到该第一处理单元的输出特征图。
步骤804,将该第二编解码器的处理单元中的上一卷积层的输出特征图和该处理单元的输入特征图作为下一卷积层的输入,直到得到该第二编解码器的该处理单元中的最后一卷积层的输出特征图。
具体地,从第二编解码器中的第二处理单元开始,将上一处理单元的输出特征图和该第一处理单元的输入特征图进行拼接,作为下一处理单元的输入特征。或者,将上一处理单元的输出特征图和该上一处理单元的输入特征图进行融合,作为下一处理单元的输入特征。
接着,对于第二编解码器中的第二处理单元中的各个卷积层,将上一卷积层的输出特征图和第二处理单元的输入特征图作为下一卷积层的输入。
类似地,从第二编解码器中的第二处理单元开始,对于各个处理单元中的各个卷积层,将处理单元中的上一卷积层的输出特征图和该处理单元的输 入特征图作为该处理单元中的下一卷积层的输入,直到得到该处理单元中的最后一卷积层的输出特征图。
步骤806,将该处理单元的输入特征图和第二编解码器的该处理单元中的各卷积层的输出特征图进行拼接处理,得到第二编解码器中的该处理单元的输出特征图。
具体地,终端得到第二编解码器中的处理单元中的各个卷积层的输出特征图后,将该处理单元中的各个卷积层的输出特征图进行拼接处理,得到该处理单元的输出特征图。
本实施例中,将第二特征图输入第二编解码器中的处理单元的卷积层,将第二编解码器的处理单元中的上一卷积层的输出特征图和处理单元的输入特征图作为下一卷积层的输入,直到得到第二编解码器的处理单元中的最后一卷积层的输出特征图,将处理单元的输入特征图和处理单元中的各卷积层的输出特征图进行拼接处理,得到第二编解码器中的处理单元的输出特征图,使得能够通过多级特征融合更好地区分图像细节内容和图像中的破坏特征,然后更进一步地重建图像的中间尺度的真实结构信息和纹理。
在一个实施例中,该将该处理单元的输入特征图和该处理单元中的各卷积层的输出特征图进行拼接处理,得到第二编解码器中的该处理单元的输出特征图,包括:
将该处理单元中的各卷积层的输出特征图进行拼接处理;将各卷积层的输出特征图拼接处理后的特征图和该处理单元的输入特征图进行融合处理,得到第二编解码器中的该处理单元的输出特征图。
具体地,终端得到第二编解码器中的处理单元中的各卷积层的输出特征图之后,可将各卷积层的输出特征图进行拼接处理。进一步地,终端可确定各卷积层的输出特征图对应的权重矩阵,将各卷积层的输出特征图对应的权重矩阵相加,得到拼接后的特征图。接着,终端将拼接后的特征图和该处理单元的输入特征图进行融合处理。该融合处理可以是将拼接后的特征图和该 处理单元的输出特征图在通道维度上进行并联,从而得到该处理单元的输出特征图。
例如,终端得到第二编解码器中第一处理单元中的各卷积层的输出特征图之后,可将各卷积层的输出特征图进行拼接处理。接着,终端将拼接后的特征图和该第一处理单元的输入特征图进行融合处理,该第二编解码器中第一处理单元的输入为第二特征图,即将拼接后的特征图和该第二特征图进行融合处理,得到该第一处理单元的输出特征图。
对于第二编解码器中的第二处理单元,将第二处理单元中的各卷积层的输出特征图进行拼接处理。将拼接后的特征图和该第二处理单元的输入特征图进行融合处理,得到该第二处理单元的输出特征图。
本实施例中,将该处理单元中的各卷积层的输出特征图进行拼接处理,将各卷积层的输出特征图拼接处理后的特征图和该处理单元的输入特征图进行融合处理,得到第二编解码器中的该处理单元的输出特征图,从而能够对多级特征进行融合,使得能够根据更多的特征信息重建中间尺度的特征,也能够对细尺度特征的重建提供更多参考信息,减少细尺度特征重建的难度。
在一个实施例中,该第二预设数量的成对编解码器中包含第一编解码器和第三编解码器,该第三编解码器处于第一编解码器中的编码器和解码器之间;该第一预设数量的成对编解码器中包含第二编解码器和第四编解码器,该第四编解码器处于第二编解码器中的编码器和解码器之间。
具体地,第一网络中包含第一编解码器,第二网络中包含第二编解码器。第二编解码器中包含第二预设数量的成对编解码器,该第二预设数量的成对编解码器中除该第一编解码器以外的其余成对编解码器称为第三编解码器。第三编解码器处于该第一编解码器中的编码器和解码器之间,当第一网络输出的特征图和待处理图像拼接输入第二网络后,第二编码器中所包含的第一编码器的编码器对拼接图进行处理,然后将该编码器的输出作为下一个编码器的输出,直到第一编码器的最后一个编码器输出特征图。接着,第一编码器的最后一个编码器输出特征图作为第三编解码器中的第一个编码器的输 入,得到第三编解码器中的最后一个编码器的输出。该第三编解码器中的最后一个编码器的输出作为第三编解码器中的对应的解码器的输入,得到第三编解码器中的最后一个解码器的输出。第三编解码器中的最后一个解码器的输出作为第一编解码器中的对应的解码器的输入,从而得到第一编解码器中的最后一个解码器的输出特征图,即得到第二网络重建后的第一特征图。
接着,第三网络中包含目标编解码器,该目标编解码器中包含第一预设数量的成对编解码器。该第一预设数量的成对编解码器中除该第二编解码器以外的其余成对编解码器称为第四编解码器。第四编解码器处于该第二编解码器的编码器和解码器之间。将第二网络输出的特征图放大到第一尺寸,并和第一尺寸的待处理图像拼接输入第三网络后,第三网络中的处理方式与第二网络中的处理方式原理相同,从而得到第三网络输出的目标图像。
本实施例中,通过三个网络中依次增加编解码器的数量以分别对待处理图像的粗尺度特征、中间尺度特征和细尺度特征进行重建,能够从明显特征的重建一步步深入到细节特征的重建,使得重建的特征更准确,从而得到清晰的图像,实现图像的去模糊去噪处理。
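下面用一个极简的草图示意三级网络中成对编解码器的嵌套关系（假设采用PyTorch，编码器/解码器内部仅以“卷积+激活”代替，通道数与是否下采样均为示例性假设）：

```python
import torch
import torch.nn as nn

def block(ch):
    # 示例性的编码器/解码器：实际由若干处理单元组成
    return nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True))

class NestedCodec(nn.Module):
    """depth=1/2/3 分别对应粗、中、细尺度网络：
    编码按 E1 -> E2 -> E3 顺序进行，解码按 D3 -> D2 -> D1 顺序进行，
    即 E3/D3 位于 E2/D2 之间，E2/D2 位于 E1/D1 之间。"""
    def __init__(self, ch=32, depth=3):
        super().__init__()
        self.encoders = nn.ModuleList([block(ch) for _ in range(depth)])  # E1, E2, E3
        self.decoders = nn.ModuleList([block(ch) for _ in range(depth)])  # D1, D2, D3

    def forward(self, x):
        for enc in self.encoders:            # 由外层编码器到内层编码器
            x = enc(x)
        for dec in reversed(self.decoders):  # 由内层解码器到外层解码器
            x = dec(x)
        return x

# 用法示意
out = NestedCodec(depth=3)(torch.randn(1, 32, 64, 64))
```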
如图9(a)所示，为一个实施例中图像处理方法的网络结构图。图9(a)中的图像处理方法存在三个网络，第一网络为粗尺度特征重建网络，第二网络为中间尺度特征重建网络，第三网络为细尺度特征重建网络。每个尺度的网络都是一个编解码网络，但是每个网络由不同数量的成对编码器/解码器组成。对于粗尺度来说，待处理图像的原始尺寸被缩小到1/4尺寸来处理，而粗尺度的网络由一对编码器E1/解码器D1构成。
$y_1 = D_1\left(E_1\left(x_1\right)\right)$
其中x1和y1分别是第一网络的输入图像（即1/4尺寸待处理图像）和第一网络的输出特征图。接下来第一网络的输出特征图y1被放大两倍，得到待处理图像的原始尺寸的1/2尺寸的特征图（即第一特征图），再将待处理图像缩小到原始尺寸的1/2，将1/2尺寸的x2和1/2尺寸的特征图在通道维度上并联之后作为第二网络的输入
$\mathrm{concat}\left(x_2,\ \mathrm{up}_{\times 2}\left(y_1\right)\right)$
（即第二特征图）。
对于第二网络，因为恢复难度的增加，第二网络设计了两对编码器/解码器来进行中间尺度的恢复。其中，编码器E2/解码器D2被放入编码器E1/解码器D1中间，目的是处理中间尺度更复杂的特征提取和重建。中间尺度处理之后得到结果y2（即重建后的第一特征图）。
$y_2 = D_1\left(D_2\left(E_2\left(E_1\left(\mathrm{concat}\left(x_2,\ \mathrm{up}_{\times 2}\left(y_1\right)\right)\right)\right)\right)\right)$
同理，在第三网络中，编码器E3/解码器D3被放入编码器E2/解码器D2中间，从而得到细尺度的网络结构。细尺度的输入
$\mathrm{concat}\left(x_3,\ \mathrm{up}_{\times 2}\left(y_2\right)\right)$
是由中间尺度的结果y2被放大两倍再与原始尺寸的待处理图像x3在通道维度上并联之后得到。细尺度处理之后得到目标图像y3：
$y_3 = D_1\left(D_2\left(D_3\left(E_3\left(E_2\left(E_1\left(\mathrm{concat}\left(x_3,\ \mathrm{up}_{\times 2}\left(y_2\right)\right)\right)\right)\right)\right)\right)\right)$
该图像处理的整个结构由三个网络组成,分别处理粗、中、细尺度的图像。通过N个配对的破坏/清晰图像组来训练该整个网络。可通过以下目标函数进行训练:
$$L=\sum_{k=1}^{N}\sum_{i=1}^{S}\frac{1}{2NT_i}\left\|F_i\left(x_i^{k};\theta_i\right)-\bar{x}_i^{k}\right\|_2^{2}$$
其中$x_i^{k}$和$\bar{x}_i^{k}$分别是在尺度i下破坏图像和清晰图像；$F_i$表示尺度i的子网络；$\theta_i$是尺度i的子网络参数。$T_i$是尺度i下图像的像素个数，这里是用来归一化。L是优化的目标函数；N是图像配对的数量；S是图像的尺度数量；k是指图像的编号。
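下面给出按照上述目标函数计算训练损失的一个示意性实现（PyTorch草图；假设已获得各尺度子网络的输出 outputs[i] 与对应尺度的清晰图像 targets[i]，且 T_i 按 C×H×W 计，这些均为示例性假设）：

```python
import torch
import torch.nn.functional as F

def multiscale_loss(outputs, targets):
    """outputs[i] / targets[i]: 尺度 i 下子网络的输出与清晰图像，形状 (N, C, H_i, W_i)。"""
    n = outputs[0].shape[0]                       # 图像配对数量 N（按批次计）
    total = outputs[0].new_zeros(())
    for out_i, gt_i in zip(outputs, targets):     # 对 S 个尺度求和
        t_i = out_i[0].numel()                    # 尺度 i 下图像的像素个数 T_i
        total = total + F.mse_loss(out_i, gt_i, reduction='sum') / (2.0 * n * t_i)
    return total

# 用法示意：三个尺度，批大小为 2
outs = [torch.randn(2, 3, s, s) for s in (64, 128, 256)]
gts = [torch.randn(2, 3, s, s) for s in (64, 128, 256)]
loss = multiscale_loss(outs, gts)
```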
如图9(b)所示,为一个实施例中图像处理方法的网络结构图。图9(b)中的图像处理方法存在三个网络,第一网络为粗尺度特征重建网络,包含成对的编码器E1和解码器D1。第二网络为中间尺度特征重建网络,包含成对的编码器E1和解码器D1,成对的编码器E2和解码器D2。第三网络为细尺度特征重建网络,包含成对的编码器E1和解码器D1、成对的编码器E2和解码器D2,以及成对的编码器E3和解码器D3。
如图9(b)所示，编码器E1、编码器E2和编码器E3的内部结构相同，均包含处理单元M1、M2、M3和M4。解码器D1、解码器D2和解码器D3的内部结构相同，均包含处理单元M1、M2、M3和M4。
第一网络中，待处理图像x1输入编码器E1中的处理单元M1，得到M1的输出特征图。接着，从M2开始，将M1的输出特征图和M1的输入特征进行拼接处理，将拼接后的特征图作为M2的输入。将M2的输出特征图和M2的输入特征进行拼接处理，将拼接后的特征图作为M3的输入。将M3的输出特征图和M3的输入特征进行拼接处理，将拼接后的特征图作为M4的输入，得到M4输出的特征图。将M4输出的特征图和M4的输入特征进行拼接处理，得到编码器E1输出的特征图。
接着，将编码器E1输出的特征图输入解码器D1中的处理单元M1，得到M1的输出特征图。接着，从解码器D1中的处理单元M2开始，将M1的输出特征图和M1的输入特征进行拼接处理，将拼接后的特征图作为M2的输入。将M2的输出特征图和M2的输入特征进行拼接处理，将拼接后的特征图作为M3的输入。将M3的输出特征图和M3的输入特征进行拼接处理，将拼接后的特征图作为M4的输入，得到M4输出的特征图。将M4输出的特征图和M4的输入特征进行拼接处理，得到解码器D1输出的特征图y1。
编码器中的各处理单元和解码器中的各处理单元的内部结构相同,均如图9(b)中所示的处理单元M1的结构。
$f_{n-1}$是上一个单元的输出特征图。对于第一网络中的编码器E1的第一个处理单元M1而言，$f_{n-1}$即为待处理图像x1。待处理图像x1输入编码器E1的第一个处理单元M1中，处理单元M1中通过3×3的第一层卷积对$f_{n-1}$进行卷积处理，得到输出特征图，并通过激活层对输出特征图进行修正，得到
$f_{n}^{1}=\sigma\left(\mathrm{Conv}_{3\times 3}\left(f_{n-1}\right)\right)$
其中$\sigma$表示激活层。
该处理过程与上述图4中的处理过程相同,在此不再赘述。
本实施例中通过三个网络，分别重建待处理图像的粗、中、细尺度特征，从而得到清晰的目标图像，如图9(b)中所示的目标图像y3。
如图10(a)所示，为本实施例中的图像处理方法和传统的多种图像处理方法在去模糊方面的处理结果。图中的第一列为待处理图像，第二列、第三列和第四列为传统的图像处理方法对待处理图像进行特征重建后得到的目标图像。最后一列为本实施例中的图像处理方法对待处理图像进行特征重建后得到的目标图像。从图10(a)中可看出，最后一列的目标图像相对于第二列、第三列和第四列的目标图像更为清晰。也就是说，本实施例中的图像处理方法在图像去模糊方面的效果比传统的处理方式的效果更好。
如图10(b)所示,为本实施例中的图像处理方法和传统的多种图像处理方法在去噪声方面的处理结果。图中的第一列为待处理图像,第二列、第三列和第四列为传统的图像处理方法对待处理图像进行特征重建后得到的目标图像。最后一列为本实施例中的图像处理方法对待处理图像进行特征重建后得到的目标图像。从图10(b)中可看出,最后一列的目标图像相对于第二列、第三列和第四列的目标图像更为清晰。也就是说,本实施例中的图像处理方法在图像去噪声方面的效果比传统的处理方式的效果更好。
在一个实施例中,如图11所示,该图像处理方法应用于视频处理;该待处理图像为待处理视频中的每帧待处理图像;
该获取待处理图像,将该待处理图像从第一尺寸缩放到第二尺寸,包括:
步骤1102,获取待处理视频中的每帧待处理图像,将该每帧待处理图像从第一尺寸缩放到第二尺寸。
其中,待处理视频为清晰度低的视频,例如存在模糊的视频或存在噪点的视频。
具体地,图像越小,图像的粗尺度特征越明显;图像越大,图像的细节特征越明显,即细尺度特征越明显。把待处理视频中的每帧模糊图像缩放到较小尺寸时,图像显示的细节特征较少,更多的是显示粗尺度的特征,则图像的模糊程度会明显减小。在这种情况下,对小尺寸的模糊图像的特征重建的难度会比原始尺寸的难度低。则终端可获取待处理视频中的各帧待处理图像,将所获取的各帧待处理图像的尺寸作为第一尺寸。接着,终端将该各帧待处理图像从第一尺寸缩放到第二尺寸,使得图像显示出粗尺度特征,以先对待处理视频中粗尺度特征中存在的模糊区域进行重建。
该通过第一编解码器对第二尺寸的待处理图像进行特征重建，得到第一特征图，包括：
步骤1104,通过第一编解码器对每帧第二尺寸的待处理图像进行特征重建,得到该每帧待处理图像对应的第一特征图。
具体地,终端可将各帧第二尺寸的待处理图像输入第一网络,该第一网络为重建图像的粗尺度特征的网络。该第一网络中包含第一编解码器,第一编解码器的数量可根据需求设置。第一网络中的第一编解码器中的编码器对各帧第二尺寸的待处理图像进行特征提取,编码器将所提取的各帧特征图输入该编码器对应的解码器进行解码,得到该解码器输出的各帧特征图。该解码器输出的各帧特征图即为各帧待处理图像对应的第一特征图。
在本实施例中,当该第一编解码器中存在至少两对编解码器时,通过第一个编码器对各帧第二尺寸的待处理图像进行编码,并通过下一个编码器对上一个编码器输出的各帧特征图进行编码,直到得到第一编解码器中的最后一个编码器输出的各帧特征图。接着,最后一个编码器输出的各帧特征图输入解码器中,通过解码器对各帧特征图进行解码,得到最后一个解码器输出的各帧特征图。该最后一个解码器输出的各帧特征图即为各帧待处理图像对应的第一特征图。
该将该第一特征图放大到第一尺寸,将第一尺寸的待处理图像和第一尺寸的第一特征图进行拼接处理,包括:
步骤1106，将各第一特征图放大到第一尺寸，将该每帧第一尺寸的待处理图像和对应的第一尺寸的第一特征图进行拼接处理，得到该每帧待处理图像对应的拼接特征图。
具体地,得到各帧第一特征图之后,终端可将该各帧第一特征图放大到与各自对应的待处理图像相同的第一尺寸,以先对细尺度特征中存在的模糊区域进行重建。接着,终端将各帧第一尺寸的待处理图像和对应的第一尺寸的各帧第一特征图进行拼接处理。
在本实施例中，终端可确定各帧第一尺寸的待处理图像分别对应的矩阵，以及各帧第一尺寸的第一特征图分别对应的矩阵，将第一尺寸的待处理图像的矩阵和对应的第一尺寸的第一特征图对应的矩阵进行拼接，得到该每帧待处理图像对应的拼接特征图。
在本实施例中,终端可确定各帧第一尺寸的待处理图像的通道维度和各自对应的第一尺寸的第一特征图的通道维度,按照通道维度将各帧第一尺寸的待处理图像和各自对应的第一尺寸的第一特征图进行并联,得到该每帧待处理图像对应的拼接特征图。
该通过目标编解码器对拼接后的特征图进行特征重建,得到目标图像,该目标图像的清晰度高于该待处理图像的清晰度,包括:
步骤1108,通过目标编解码器对该每帧待处理图像对应的拼接特征图进行特征重建,得到该每帧待处理图像对应的目标图像。
具体地，终端可将该每帧待处理图像对应的拼接特征图输入目标网络，该目标网络为重建图像的细尺度特征的网络。该目标网络中包含目标编解码器，该目标编解码器中包含了第一编解码器，该目标编解码器的数量可根据需求设置。目标网络中的目标编解码器中的第一个编码器对该每帧待处理图像对应的拼接特征图进行特征提取，并将上一编码器输出的各帧特征图作为下一编码器的输入，直到得到目标编解码器中的最后一个编码器输出的各帧特征图。接着，最后一个编码器输出的各帧特征图输入对应的解码器中，通过解码器对各帧特征图进行解码。将上一解码器输出的各帧特征图作为下一解码器的输入，直到目标编解码器中的最后一个解码器输出每帧待处理图像对应的目标图像。该各帧目标图像的尺寸为第一尺寸。
步骤1110,根据每帧目标图像生成目标视频,该目标视频的清晰度高于该待处理视频的清晰度。
具体地,终端可按照各帧待处理图像在该待处理视频中的时间点将每帧目标图像生成目标视频,得到的目标视频的清晰度高于该待处理视频的清晰度。
本实施例中，将图像处理方法应用于视频处理，待处理图像为待处理视频中的每帧待处理图像。通过获取待处理视频中的每帧待处理图像，将每帧待处理图像从第一尺寸缩放到第二尺寸；通过第一编解码器对每帧第二尺寸的待处理图像进行特征重建，得到每帧待处理图像对应的第一特征图；将各第一特征图放大到与对应的待处理图像相同的第一尺寸，将每帧第一尺寸的待处理图像和对应的第一尺寸的第一特征图进行拼接处理，得到每帧待处理图像对应的拼接特征图；通过目标编解码器对每帧待处理图像对应的拼接特征图进行特征重建，得到每帧待处理图像对应的目标图像，将该图像处理方法应用于视频处理，可实现同时对多张模糊或存在噪点的图像进行特征重建，能够提高图像特征重建的效率。根据每帧目标图像生成目标视频，从而能够将低清晰度的视频重建为高清晰度的视频。
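下面给出将上述方法应用于视频处理时逐帧处理与重组的一个示意性流程（草图；假设使用OpenCV读写视频，restore_frame为代表上述三级网络重建过程的示例性回调函数，并非本申请限定的实现）：

```python
import cv2

def restore_video(src_path, dst_path, restore_frame):
    """逐帧读取待处理视频，对每帧进行特征重建后按原时间顺序写出目标视频。"""
    cap = cv2.VideoCapture(src_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        writer.write(restore_frame(frame))   # 每帧目标图像按原时间点写入目标视频
    cap.release()
    writer.release()
```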
在一个实施例中,提供了一种图像处理方法,包括:
终端获取待处理图像,将待处理图像从第一尺寸缩放到第二尺寸。
接着,终端将第二尺寸的待处理图像输入第一编解码器中的处理单元的卷积层。
接着,终端将第一编解码器的处理单元中的上一卷积层的输出特征图和该处理单元的输入特征图作为下一卷积层的输入,直到得到该处理单元中的最后一卷积层的输出特征图。
进一步地,终端将该处理单元的输入特征图和该处理单元中各卷积层的输出特征图进行拼接处理,得到该处理单元的输出特征图。
接着,终端将第一编解码器中的上一处理单元的输出特征图和第二尺寸的待处理图像进行拼接处理,将拼接后的特征图作为下一处理单元的输入,直到第一编解码器的最后一处理单元输出第一特征图。
接着,终端将第一尺寸的待处理图像缩放到与第一特征图相同的尺寸,将第一特征图和相同尺寸的待处理图像在通道维度上进行并联处理,得到第二特征图。
进一步地,终端将第二特征图输入第二编解码器中的处理单元的卷积层,得到输出特征图;该第二编解码器中包含第二预设数量的成对编解码器,该第一编解码器为该第二预设数量的成对编解码器中的至少一对。
接着,终端将第二编解码器的处理单元中的上一卷积层的输出特征图和处理单元的输入特征图作为下一卷积层的输入,直到得到第二编解码器的处理单元中的最后一卷积层的输出特征图。
进一步地,终端将处理单元中的各卷积层的输出特征图进行拼接处理;将各卷积层的输出特征图拼接处理后的特征图和处理单元的输入特征图进行融合处理,得到第二编解码器中的处理单元的输出特征图。将第二编解码器中的上一处理单元的输出特征图和第二特征图进行拼接处理,得到下一处理单元的输入,直到第二编解码器的最后一处理单元输出重建后的第一特征图。
接着,终端将重建后的第一特征图放大到第一尺寸,将第一尺寸的待处理图像和第一尺寸的重建后的第一特征图进行拼接处理。
进一步地,终端通过目标编解码器对拼接后的特征图进行特征重建,得到目标图像,目标图像的清晰度高于待处理图像的清晰度,该目标编解码器中包含第一预设数量的成对编解码器,该第一预设数量的成对编解码器中包含该第二预设数量的成对编解码器。
本实施例中，通过对待处理图像的粗尺度特征的重建、中间尺度特征的重建以及细尺度特征的重建，并结合多级的特征融合更好地区分图像细节内容和图像中的破坏特征，然后更进一步地重建图像的真实结构信息和纹理，从而能够将低清晰度的待处理图像重建为高清晰度的目标图像。
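作为上述整体流程的补充，下面给出三级网络推理过程的一个示意性草图（假设采用PyTorch，net1/net2/net3为三个尺度子网络的占位，缩放倍数沿用上文1/4、1/2的示例，均为示例性假设）：

```python
import torch
import torch.nn.functional as F

def restore_image(x, net1, net2, net3):
    """x: 第一尺寸的待处理图像，形状 (N, C, H, W)。"""
    x1 = F.interpolate(x, scale_factor=0.25, mode='bilinear', align_corners=False)  # 第二尺寸
    y1 = net1(x1)                                                                   # 第一特征图
    x2 = F.interpolate(x, scale_factor=0.5, mode='bilinear', align_corners=False)
    y1_up = F.interpolate(y1, size=x2.shape[-2:], mode='bilinear', align_corners=False)
    y2 = net2(torch.cat([x2, y1_up], dim=1))                                        # 重建后的第一特征图
    y2_up = F.interpolate(y2, size=x.shape[-2:], mode='bilinear', align_corners=False)
    y3 = net3(torch.cat([x, y2_up], dim=1))                                         # 目标图像
    return y3
```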
图2-图11为一个实施例中图像处理方法的流程示意图。应该理解的是,虽然图2-图11的流程图中的各个步骤按照箭头的指示依次显示,但是这些步骤并不是必然按照箭头指示的顺序依次执行。除非本文中有明确的说明,这些步骤的执行并没有严格的顺序限制,这些步骤可以以其它的顺序执行。而且,图2-图11中的至少一部分步骤可以包括多个子步骤或者多个阶段,这些子步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,这些子步骤或者阶段的执行顺序也不必然是依次进行,而是可以与其它步骤或者其它步骤的子步骤或者阶段的至少一部分轮流或者交替地执行。
在一个实施例中,如图12所示,提供了一种图像处理装置,该装置包括: 获取模块1202、第一重建模块1204、拼接模块1206和目标重建模块1208。其中,
获取模块1202,用于获取待处理图像,将该待处理图像从第一尺寸缩放到第二尺寸。
第一重建模块1204,用于通过第一编解码器对第二尺寸的待处理图像进行特征重建,得到第一特征图。
拼接模块1206,用于将该第一特征图放大到第一尺寸,将第一尺寸的待处理图像和第一尺寸的第一特征图进行拼接处理。
目标重建模块1208,用于通过目标编解码器对拼接后的特征图进行特征重建,得到目标图像,该目标图像的清晰度高于该待处理图像的清晰度;该目标编解码器中包含第一预设数量的成对编解码器,该第一编解码器为该第一预设数量的成对编解码器中的至少一对。
上述图像处理装置,获取待处理图像,将待处理图像从第一尺寸缩放到第二尺寸,通过第一编解码器对第二尺寸的待处理图像进行特征重建,得到第一特征图,从而可以完成待处理图像的粗尺度特征的重建。将第一特征图放大到与待处理图像相同的第一尺寸,将第一尺寸的待处理图像和第一尺寸的第一特征图进行拼接处理,以对待处理图像的细尺度特征进行重建。通过目标编解码器对拼接后的特征图进行特征重建,得到目标图像,该目标编解码器中包含第一预设数量的成对编解码器,该第一编解码器为该第一预设数量的成对编解码器中的至少一对,从而将低清晰度的待处理图像重建为高清晰度的目标图像。通过对待处理图像的粗尺度特征和细尺度特征进行重建,能够将模糊的、存在噪点的待处理图像重建为清晰图像。
在一个实施例中，该装置还包括：第二重建模块。该第二重建模块用于：将第一尺寸的待处理图像缩放到与该第一特征图相同的尺寸，将该第一特征图和相同尺寸的待处理图像进行拼接处理，得到第二特征图；通过第二编解码器对该第二特征图进行特征重建，得到重建后的第一特征图；该第二编解码器中包含第二预设数量的成对编解码器，该第一编解码器为该第二预设数量的成对编解码器中的至少一对；该第一预设数量的成对编解码器中包含该第二预设数量的成对编解码器。
该拼接模块还用于将该重建后的第一特征图放大到第一尺寸,将第一尺寸的待处理图像和第一尺寸的重建后的第一特征图进行拼接处理。
本实施例中,将第一尺寸的待处理图像缩放到与第一特征图相同的尺寸,将第一特征图和相同尺寸的待处理图像进行拼接处理,得到第二特征图,通过第二编解码器对第二特征图进行特征重建,以对待处理图像的中间尺度特征进行重建,得到重建后的第一特征图。将重建后的第一特征图放大到第一尺寸,将第一尺寸的待处理图像和第一尺寸的重建后的第一特征图进行拼接处理,以对待处理图像的细尺度特征进行重建,从而将低清晰度图像重建为高清晰度图像。
在一个实施例中,该第二重建模块还用于:将该第一特征图和相同尺寸的待处理图像在通道维度上进行并联处理,得到第二特征图。
本实施例中,将第一特征图和相同尺寸的待处理图像在通道维度上进行并联处理,能够将第一特征图的特征和待处理图像的特征关联起来,使得得到的第二特征图具备更多的特征信息,从而对待处理图像特征的重建更准确。
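以单张特征图为例，通道维度上的并联处理可按如下方式示意（PyTorch草图，图像尺寸与通道数均为示例性假设）：

```python
import torch

image = torch.randn(1, 3, 128, 128)            # 缩放到与第一特征图相同尺寸的待处理图像（示例）
feat = torch.randn(1, 32, 128, 128)            # 第一特征图（示例）
second_feat = torch.cat([image, feat], dim=1)  # 在通道维度上并联，得到第二特征图（通道数 3+32）
```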
在一个实施例中,该第一重建模块1204还用于:将第二尺寸的待处理图像输入该第一编解码器的处理单元,得到输出特征图;将该第一编解码器中的上一处理单元的输出特征图和该第二尺寸的待处理图像进行拼接处理,将拼接后的特征图作为下一处理单元的输入,直到该第一编解码器的最后一处理单元输出第一特征图。
本实施例中,将第二尺寸的待处理图像输入第一编解码器的处理单元,并将第一编解码器中的上一处理单元的输出特征图和该第二尺寸的待处理图像进行拼接处理,将拼接后的特征图作为下一处理单元的输入,使得能够将上一处理单元恢复的部分特征和原始图像的信息进行关联,使得下一处理单元更迅速地进行特征重建。并且在增加每个处理单元的有效特征信息的同时,减少了特征重建过程中计算量,避免了融合难度高的问题。
在一个实施例中,该第一重建模块1204还用于:将该第二尺寸的待处理图像输入该第一编解码器的处理单元,得到输出特征图;将该第一编解码器中的上一处理单元的输出特征图和该上一处理单元的输入特征图进行融合处理,将融合后的特征图作为下一处理单元的输入,直到该第一编解码器的最后一处理单元输出第一特征图。
本实施例中,将该第二尺寸的待处理图像输入该第一编解码器的处理单元,将该第一编解码器中的上一处理单元的输出特征图和该上一处理单元的输入特征图进行融合处理,能够将上一处理单元恢复的部分特征和该处理单元未恢复的特征信息进行融合。将融合后的特征图作为下一处理单元的输入,使得下一处理单元能够根据更多的特征信息进行特征重建,直到该第一编解码器的最后一处理单元输出第一特征图。在增加每个处理单元的有效特征信息的同时,减少了特征重建过程中计算量,避免了融合难度高的问题。
在一个实施例中,该第一重建模块1204还用于:将该第二尺寸的待处理图像输入该第一编解码器中的处理单元的卷积层;将该第一编解码器的处理单元中的上一卷积层的输出特征图和该处理单元的输入特征图作为下一卷积层的输入,直到得到该处理单元中的最后一卷积层的输出特征图;将该处理单元的输入特征图和该处理单元中各卷积层的输出特征图进行拼接处理,得到该处理单元的输出特征图。
本实施例中,将第二尺寸的待处理图像输入该第一编解码器中的处理单元的卷积层,将该第一编解码器的处理单元中的上一卷积层的输出特征图和该处理单元的输入特征图作为下一卷积层的输入,直到得到该处理单元中的最后一卷积层的输出特征图,将该处理单元的输入特征图和该处理单元中各卷积层的输出特征图进行拼接处理,得到该处理单元的输出特征图,使得能够通过多级的特征融合更好地区分图像细节内容和图像中的破坏特征,然后更进一步地重建图像的粗尺度的真实结构信息和纹理。
在一个实施例中，该第二重建模块还用于：将该第二特征图输入该第二编解码器的处理单元，得到输出特征图；将该第二编解码器中的上一处理单元的输出特征图和该第二特征图进行拼接处理，得到下一处理单元的输入，直到该第二编解码器的最后一处理单元输出重建后的第一特征图。
本实施例中,将该第二特征图输入该第二编解码器的处理单元,得到输出特征图,将该第二编解码器中的上一处理单元的输出特征图和该第二特征图进行拼接处理,能够将上一处理单元恢复的部分特征和上一处理单元重建的特征图进行关联,使得下一处理单元更迅速地进行特征重建。将拼接后的特征图作为下一处理单元的输入,直到该第二编解码器的最后一处理单元输出重建后的第一特征图,从而重建了待处理图像的中间尺度的特征信息。
在一个实施例中,该第二重建模块还用于:将该第二特征图输入该第二编解码器的处理单元,得到输出特征图;将该第二编解码器中的上一处理单元的输出特征图和该上一处理单元的输入特征图进行融合处理,得到下一处理单元的输入,直到该第二编解码器的最后一处理单元输出重建后的第一特征图。
本实施例中,将该第二特征图输入该第二编解码器的处理单元,并将同一个处理单元的输出特征和输入特征进行融合,能够减少计算量并且保证了特征之间的相关性,使得特征图中的各特征之间的差异更明显,以更好地对中间尺度的特征进行重建。
在一个实施例中,该第二重建模块还用于:将该第二特征图输入该第二编解码器中的处理单元的卷积层,得到输出特征图;将该第二编解码器的处理单元中的上一卷积层的输出特征图和该处理单元的输入特征图作为下一卷积层的输入,直到得到该第二编解码器的该处理单元中的最后一卷积层的输出特征图;将该处理单元的输入特征图和该处理单元中的各卷积层的输出特征图进行拼接处理,得到该第二编解码器中的该处理单元的输出特征图。
本实施例中，将第二特征图输入第二编解码器中的处理单元的卷积层，将第二编解码器的处理单元中的上一卷积层的输出特征图和处理单元的输入特征图作为下一卷积层的输入，直到得到第二编解码器的处理单元中的最后一卷积层的输出特征图，将处理单元的输入特征图和处理单元中的各卷积层的输出特征图进行拼接处理，得到第二编解码器中的处理单元的输出特征图，使得能够通过多级特征融合更好地区分图像细节内容和图像中的破坏特征，然后更进一步地重建图像的中间尺度的真实结构信息和纹理。
在一个实施例中,该第二重建模块还用于:将该第二编解码器的处理单元中的各卷积层的输出特征图进行拼接处理;将该各卷积层的输出特征图拼接处理后的特征图和该第二特征图进行融合处理,得到该第二编解码器中的处理单元的输出特征图。
本实施例中,将该处理单元中的各卷积层的输出特征图进行拼接处理,将各卷积层的输出特征图拼接处理后的特征图和该处理单元的输入特征图进行融合处理,得到第二编解码器中的该处理单元的输出特征图,从而能够对多级特征进行融合,使得能够根据更多的特征信息重建中间尺度的特征,也能够对细尺度特征的重建提供更多参考信息,减少细尺度特征重建的难度。
在一个实施例中，该装置中的第二预设数量的成对编解码器中包含该第一编解码器和第三编解码器，第三编解码器处于第一编解码器中的编码器和解码器之间；该第一预设数量的成对编解码器中包含第二编解码器和第四编解码器，该第四编解码器处于第二编解码器中的编码器和解码器之间。
本实施例中,通过依次增加编解码器的数量以分别对待处理图像的粗尺度特征、中间尺度特征和细尺度特征进行重建,能够从明显特征的重建一步步深入到细节特征的重建,使得重建的特征更准确,从而得到清晰的图像,实现图像的去模糊去噪处理。
在一个实施例中,该获取模块1202还用于:获取待处理视频中的每帧待处理图像,将该每帧待处理图像从第一尺寸缩放到第二尺寸。
第一重建模块1204还用于:通过第一编解码器对每帧第二尺寸的待处理图像进行特征重建,得到该每帧待处理图像对应的第一特征图;
该拼接模块1206还用于:将各第一特征图放大到第一尺寸,将每帧第一尺寸的待处理图像和对应的第一尺寸的第一特征图进行拼接处理,得到该每帧待处理图像对应的拼接特征图;
该目标重建模块1208还用于:通过目标编解码器对该每帧待处理图像对应的拼接特征图进行特征重建,得到该每帧待处理图像对应的目标图像;根据每帧目标图像生成目标视频,该目标视频的清晰度高于该待处理视频的清晰度。
本实施例中,将图像处理方法应用于视频处理,待处理图像为待处理视频中的每帧待处理图像。通过获取待处理视频中的每帧待处理图像,将每帧待处理图像从第一尺寸缩放到第二尺寸;通过第一编解码器对每帧第二尺寸的待处理图像进行特征重建,得到每帧待处理图像对应的第一特征图;将各第一特征图放大到与对应的待处理图像相同的第一尺寸,将每帧第一尺寸的待处理图像和对应的第一尺寸的第一特征图进行拼接处理,得到每帧待处理图像对应的拼接特征图;通过目标编解码器对每帧待处理图像对应的拼接特征图进行特征重建,得到每帧待处理图像对应的目标图像,将该图像处理方法应用于视频处理,可实现同时对多张模糊或存在噪点的图像进行特征重建,能够提高图像特征重建的效率。根据每帧目标图像生成目标视频,从而能够将低清晰度的视频重建为高清晰度的视频。
图13示出了一个实施例中计算机设备的内部结构图。该计算机设备具体可以是图1中的终端110。如图13所示，该计算机设备包括通过系统总线连接的处理器、存储器、网络接口、输入装置和显示屏。其中，存储器包括非易失性存储介质和内存储器。该计算机设备的非易失性存储介质存储有操作系统，还可存储有计算机程序，该计算机程序被处理器执行时，可使得处理器实现图像处理方法。该内存储器中也可储存有计算机程序，该计算机程序被处理器执行时，可使得处理器执行图像处理方法。计算机设备的显示屏可以是液晶显示屏或者电子墨水显示屏，计算机设备的输入装置可以是显示屏上覆盖的触摸层，也可以是计算机设备外壳上设置的按键、轨迹球或触控板，还可以是外接的键盘、触控板或鼠标等。
本领域技术人员可以理解，图13中示出的结构，仅仅是与本申请方案相关的部分结构的框图，并不构成对本申请方案所应用于其上的计算机设备的限定，具体的计算机设备可以包括比图中所示更多或更少的部件，或者组合某些部件，或者具有不同的部件布置。
在一个实施例中,本申请提供的图像处理装置可以实现为一种计算机程序的形式,计算机程序可在如图13所示的计算机设备上运行。计算机设备的存储器中可存储组成该图像处理装置的各个程序模块,比如,图12所示的获取模块1202、第一重建模块1204、拼接模块1206和目标重建模块1208。各个程序模块构成的计算机程序使得处理器执行本说明书中描述的本申请各个实施例的图像处理方法中的步骤。
例如,图13所示的计算机设备可以通过如图12所示的图像处理装置中的获取模块1202执行获取待处理图像,将该待处理图像从第一尺寸缩放到第二尺寸的步骤。计算机设备可通过第一重建模块1204执行通过第一编解码器对第二尺寸的待处理图像进行特征重建,得到第一特征图的步骤。计算机设备可通过拼接模块1206执行将该第一特征图放大到第一尺寸,将该第一尺寸的待处理图像和第一尺寸的第一特征图进行拼接处理的步骤。计算机设备可通过目标重建模块1208执行通过目标编解码器对拼接后的特征图进行特征重建,得到目标图像,该目标图像的清晰度高于该待处理图像的清晰度;目标编解码器中包含第一预设数量的成对编解码器,第一编解码器为该第一预设数量的成对编解码器中的至少一对的步骤。
在一个实施例中,提供了一种计算机设备,包括存储器和处理器,存储器存储有计算机程序,计算机程序被处理器执行时,使得处理器执行上述图像处理方法的步骤。此处图像处理方法的步骤可以是上述各个实施例的图像处理方法中的步骤。
在一个实施例中,提供了一种计算机可读存储介质,存储有计算机程序,计算机程序被处理器执行时,使得处理器执行上述图像处理方法的步骤。此处图像处理方法的步骤可以是上述各个实施例的图像处理方法中的步骤。
在一个实施例中，提供了一种计算机程序产品或计算机程序，该计算机程序产品或计算机程序包括计算机指令，该计算机指令存储在计算机可读存储介质中。计算机设备的处理器从计算机可读存储介质读取该计算机指令，处理器执行该计算机指令，使得该计算机设备执行上述各方法实施例中的步骤。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,所述的程序可存储于一非易失性计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施例的流程。其中,本申请所提供的各实施例中所使用的对存储器、存储、数据库或其它介质的任何引用,均可包括非易失性和/或易失性存储器。非易失性存储器可包括只读存储器(ROM)、可编程ROM(PROM)、电可编程ROM(EPROM)、电可擦除可编程ROM(EEPROM)或闪存。易失性存储器可包括随机存取存储器(RAM)或者外部高速缓冲存储器。作为说明而非局限,RAM以多种形式可得,诸如静态RAM(SRAM)、动态RAM(DRAM)、同步DRAM(SDRAM)、双数据率SDRAM(DDRSDRAM)、增强型SDRAM(ESDRAM)、同步链路(Synchlink)DRAM(SLDRAM)、存储器总线(Rambus)直接RAM(RDRAM)、直接存储器总线动态RAM(DRDRAM)、以及存储器总线动态RAM(RDRAM)等。
以上实施例的各技术特征可以进行任意的组合,为使描述简洁,未对上述实施例中的各个技术特征所有可能的组合都进行描述,然而,只要这些技术特征的组合不存在矛盾,都应当认为是本说明书记载的范围。
以上所述实施例仅表达了本申请的几种实施方式,其描述较为具体和详细,但并不能因此而理解为对本申请专利范围的限制。应当指出的是,对于本领域的普通技术人员来说,在不脱离本申请构思的前提下,还可以做出若干变形和改进,这些都属于本申请的保护范围。因此,本申请专利的保护范围应以所附权利要求为准。

Claims (20)

  1. 一种图像处理方法,由计算机设备执行,包括:
    获取待处理图像,将所述待处理图像从第一尺寸缩放到第二尺寸;
    通过第一编解码器对第二尺寸的待处理图像进行特征重建,得到第一特征图;
    将所述第一特征图放大到第一尺寸,将第一尺寸的待处理图像和第一尺寸的第一特征图进行拼接处理;及
    通过目标编解码器对拼接后的特征图进行特征重建,得到目标图像,所述目标图像的清晰度高于所述待处理图像的清晰度;所述目标编解码器中包含第一预设数量的成对编解码器,所述第一编解码器为所述第一预设数量的成对编解码器中的至少一对。
  2. 根据权利要求1所述的方法,其特征在于,在所述将所述第一特征图放大到第一尺寸,将第一尺寸的待处理图像和第一尺寸的第一特征图进行拼接处理之前,还包括:
    将第一尺寸的待处理图像缩放到与所述第一特征图相同的尺寸,将所述第一特征图和相同尺寸的待处理图像进行拼接处理,得到第二特征图;
    通过第二编解码器对所述第二特征图进行特征重建,得到重建后的第一特征图;所述第二编解码器中包含第二预设数量的成对编解码器,所述第一编解码器为所述第二预设数量的成对编解码器中的至少一对;所述第一预设数量的成对编解码器中包含所述第二预设数量的成对编解码器;及
    所述将所述第一特征图放大到第一尺寸,将第一尺寸的待处理图像和第一尺寸的第一特征图进行拼接处理,包括:
    将所述重建后的第一特征图放大到第一尺寸,将第一尺寸的待处理图像和第一尺寸的重建后的第一特征图进行拼接处理。
  3. 根据权利要求2所述的方法,其特征在于,所述将所述第一特征图和相同尺寸的待处理图像进行拼接处理,得到第二特征图,包括:
    将所述第一特征图和相同尺寸的待处理图像在通道维度上进行并联处理，得到第二特征图。
  4. 根据权利要求1至3中任一项所述的方法,其特征在于,所述通过第一编解码器对第二尺寸的待处理图像进行特征重建,得到第一特征图,包括:
    将第二尺寸的待处理图像输入所述第一编解码器的处理单元,得到输出特征图;及
    将所述第一编解码器中的上一处理单元的输出特征图和所述第二尺寸的待处理图像进行拼接处理,将拼接后的特征图作为下一处理单元的输入,直到所述第一编解码器的最后一处理单元输出第一特征图。
  5. 根据权利要求1至3中任一项所述的方法,其特征在于,所述通过第一编解码器对第二尺寸的待处理图像进行特征重建,得到第一特征图,包括:
    将第二尺寸的待处理图像输入所述第一编解码器的处理单元,得到输出特征图;及
    将所述第一编解码器中的上一处理单元的输出特征图和所述上一处理单元的输入特征图进行融合处理,将融合后的特征图作为下一处理单元的输入,直到所述第一编解码器的最后一处理单元输出第一特征图。
  6. 根据权利要求5所述的方法,其特征在于,所述将第二尺寸的待处理图像输入所述第一编解码器的处理单元,得到输出特征图,包括:
    将第二尺寸的待处理图像输入所述第一编解码器中的处理单元的卷积层;
    将所述第一编解码器的处理单元中的上一卷积层的输出特征图和所述处理单元的输入特征图作为下一卷积层的输入,直到得到所述处理单元中的最后一卷积层的输出特征图;及
    将所述处理单元的输入特征图和所述处理单元中各卷积层的输出特征图进行拼接处理,得到所述处理单元的输出特征图。
  7. 根据权利要求2所述的方法,其特征在于,所述通过第二编解码器对所述第二特征图进行特征重建,得到重建后的第一特征图,包括:
    将所述第二特征图输入所述第二编解码器的处理单元，得到输出特征图；及
    将所述第二编解码器中的上一处理单元的输出特征图和所述第二特征图进行拼接处理,得到下一处理单元的输入,直到所述第二编解码器的最后一处理单元输出重建后的第一特征图。
  8. 根据权利要求2所述的方法,其特征在于,所述通过第二编解码器对所述第二特征图进行特征重建,得到重建后的第一特征图,包括:
    将所述第二特征图输入所述第二编解码器的处理单元,得到输出特征图;及
    将所述第二编解码器中的上一处理单元的输出特征图和所述上一处理单元的输入特征图进行融合处理,得到下一处理单元的输入,直到所述第二编解码器的最后一处理单元输出重建后的第一特征图。
  9. 根据权利要求7或8所述的方法,其特征在于,所述将所述第二特征图输入所述第二编解码器的处理单元,得到输出特征图,包括:
    将所述第二特征图输入所述第二编解码器中的处理单元的卷积层,得到输出特征图;
    将所述第二编解码器的处理单元中的上一卷积层的输出特征图和所述处理单元的输入特征图作为下一卷积层的输入,直到得到所述第二编解码器的所述处理单元中的最后一卷积层的输出特征图;及
    将所述处理单元的输入特征图和所述处理单元中的各卷积层的输出特征图进行拼接处理,得到所述第二编解码器中的所述处理单元的输出特征图。
  10. 根据权利要求9所述的方法,其特征在于,所述将所述处理单元的输入特征图和所述处理单元中的各卷积层的输出特征图进行拼接处理,得到所述第二编解码器中的所述处理单元的输出特征图,包括:
    将所述处理单元中的各卷积层的输出特征图进行拼接处理;及
    将所述各卷积层的输出特征图拼接处理后的特征图和所述处理单元的输入特征图进行融合处理,得到所述第二编解码器中的所述处理单元的输出特征图。
  11. 根据权利要求2所述的方法,其特征在于,所述第二预设数量的成对编解码器中包含所述第一编解码器和第三编解码器,所述第三编解码器处于所述第一编解码器中的编码器和解码器之间;所述第一预设数量的成对编解码器中包含所述第二编解码器和第四编解码器,所述第四编解码器处于所述第二编解码器中的编码器和解码器之间。
  12. 根据权利要求1所述的方法,其特征在于,所述图像处理方法应用于视频处理;所述待处理图像为待处理视频中的每帧待处理图像;
    所述获取待处理图像,将所述待处理图像从第一尺寸缩放到第二尺寸,包括:
    获取待处理视频中的每帧待处理图像,将所述每帧待处理图像从第一尺寸缩放到第二尺寸;
    所述通过第一编解码器对第二尺寸的待处理图像进行特征重建,得到第一特征图,包括:
    通过第一编解码器对每帧第二尺寸的待处理图像进行特征重建,得到所述每帧待处理图像对应的第一特征图;
    所述将所述第一特征图放大到第一尺寸,将第一尺寸的待处理图像和第一尺寸的第一特征图进行拼接处理,包括:
    将各第一特征图放大到第一尺寸,将每帧第一尺寸的待处理图像和对应的第一尺寸的第一特征图进行拼接处理,得到所述每帧待处理图像对应的拼接特征图;
    所述通过目标编解码器对拼接后的特征图进行特征重建,得到目标图像,所述目标图像的清晰度高于所述待处理图像的清晰度,包括:
    通过目标编解码器对所述每帧待处理图像对应的拼接特征图进行特征重建,得到所述每帧待处理图像对应的目标图像;及
    根据每帧目标图像生成目标视频,所述目标视频的清晰度高于所述待处理视频的清晰度。
  13. 一种图像处理装置,其特征在于,所述装置包括:
    获取模块,用于获取待处理图像,将所述待处理图像从第一尺寸缩放到第二尺寸;
    第一重建模块,用于通过第一编解码器对第二尺寸的待处理图像进行特征重建,得到第一特征图;
    拼接模块,用于将所述第一特征图放大到第一尺寸,将第一尺寸的待处理图像和第一尺寸的第一特征图进行拼接处理;及
    目标重建模块,用于通过目标编解码器对拼接后的特征图进行特征重建,得到目标图像,所述目标图像的清晰度高于所述待处理图像的清晰度;所述目标编解码器中包含第一预设数量的成对编解码器,所述第一编解码器为所述第一预设数量的成对编解码器中的至少一对。
  14. 根据权利要求13所述的装置,其特征在于,所述装置还包括:
    第二重建模块,用于将第一尺寸的待处理图像缩放到与所述第一特征图相同的尺寸,将所述第一特征图和相同尺寸的待处理图像进行拼接处理,得到第二特征图;通过第二编解码器对所述第二特征图进行特征重建,得到重建后的第一特征图;所述第二编解码器中包含第二预设数量的成对编解码器,所述第一编解码器为所述第二预设数量的成对编解码器中的至少一对;所述第一预设数量的成对编解码器中包含所述第二预设数量的成对编解码器;及
    所述拼接模块,还用于将所述重建后的第一特征图放大到第一尺寸,将第一尺寸的待处理图像和第一尺寸的重建后的第一特征图进行拼接处理。
  15. 根据权利要求14所述的装置,其特征在于,所述第二重建模块还用于将所述第一特征图和相同尺寸的待处理图像在通道维度上进行并联处理,得到第二特征图。
  16. 根据权利要求13至15中任一项所述的装置,其特征在于,所述第一重建模块还用于将第二尺寸的待处理图像输入所述第一编解码器的处理单元,得到输出特征图;将所述第一编解码器中的上一处理单元的输出特征图和所述第二尺寸的待处理图像进行拼接处理,将拼接后的特征图作为下一处理单元的输入,直到所述第一编解码器的最后一处理单元输出第一特征图。
  17. 根据权利要求13至15中任一项所述的装置,其特征在于,所述第一重建模块还用于将第二尺寸的待处理图像输入所述第一编解码器的处理单元,得到输出特征图;将所述第一编解码器中的上一处理单元的输出特征图和所述上一处理单元的输入特征图进行融合处理,将融合后的特征图作为下一处理单元的输入,直到所述第一编解码器的最后一处理单元输出第一特征图。
  18. 根据权利要求17所述的装置,其特征在于,所述第一重建模块还用于将第二尺寸的待处理图像输入所述第一编解码器中的处理单元的卷积层;将所述第一编解码器的处理单元中的上一卷积层的输出特征图和所述处理单元的输入特征图作为下一卷积层的输入,直到得到所述处理单元中的最后一卷积层的输出特征图;将所述处理单元的输入特征图和所述处理单元中各卷积层的输出特征图进行拼接处理,得到所述处理单元的输出特征图。
  19. 一种计算机可读存储介质,存储有计算机程序,所述计算机程序被处理器执行时,使得所述处理器执行如权利要求1至12中任一项所述的方法的步骤。
  20. 一种计算机设备,包括存储器和处理器,所述存储器存储有计算机程序,所述计算机程序被所述处理器执行时,使得所述处理器执行如权利要求1至12中任一项所述的方法的步骤。
PCT/CN2020/122160 2020-02-07 2020-10-20 图像处理方法、装置、计算机可读存储介质和计算机设备 WO2021155675A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/727,042 US20220253974A1 (en) 2020-02-07 2022-04-22 Image processing method and apparatus, computer readable storage medium, and computer device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010082474.0A CN111340694B (zh) 2020-02-07 2020-02-07 图像处理方法、装置、计算机可读存储介质和计算机设备
CN202010082474.0 2020-02-07

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/727,042 Continuation US20220253974A1 (en) 2020-02-07 2022-04-22 Image processing method and apparatus, computer readable storage medium, and computer device

Publications (1)

Publication Number Publication Date
WO2021155675A1 true WO2021155675A1 (zh) 2021-08-12

Family

ID=71183447

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/122160 WO2021155675A1 (zh) 2020-02-07 2020-10-20 图像处理方法、装置、计算机可读存储介质和计算机设备

Country Status (3)

Country Link
US (1) US20220253974A1 (zh)
CN (1) CN111340694B (zh)
WO (1) WO2021155675A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111340694B (zh) * 2020-02-07 2023-10-27 腾讯科技(深圳)有限公司 图像处理方法、装置、计算机可读存储介质和计算机设备
CN111294512A (zh) * 2020-02-10 2020-06-16 深圳市铂岩科技有限公司 图像处理方法、装置、存储介质及摄像装置
CN112286353A (zh) * 2020-10-28 2021-01-29 上海盈赞通信科技有限公司 一种vr眼镜的通用型图像处理方法及装置
CN112990171B (zh) * 2021-05-20 2021-08-06 腾讯科技(深圳)有限公司 图像处理方法、装置、计算机设备及存储介质
CN116468812A (zh) * 2023-05-16 2023-07-21 山东省计算中心(国家超级计算济南中心) 一种基于多分支和多尺度的图像压缩感知重构方法及系统

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110274370A1 (en) * 2010-05-10 2011-11-10 Yuhi Kondo Image processing apparatus, image processing method and image processing program
CN102915527A (zh) * 2012-10-15 2013-02-06 中山大学 基于形态学成分分析的人脸图像超分辨率重建方法
US20170365046A1 (en) * 2014-08-15 2017-12-21 Nikon Corporation Algorithm and device for image processing
CN108629743A (zh) * 2018-04-04 2018-10-09 腾讯科技(深圳)有限公司 图像的处理方法、装置、存储介质和电子装置
CN111340694A (zh) * 2020-02-07 2020-06-26 腾讯科技(深圳)有限公司 图像处理方法、装置、计算机可读存储介质和计算机设备
WO2020187042A1 (zh) * 2019-03-19 2020-09-24 京东方科技集团股份有限公司 图像处理方法、装置、设备以及计算机可读介质

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108876833A (zh) * 2018-03-29 2018-11-23 北京旷视科技有限公司 图像处理方法、图像处理装置和计算机可读存储介质
CN108510560B (zh) * 2018-04-11 2020-01-24 腾讯科技(深圳)有限公司 图像处理方法、装置、存储介质和计算机设备
CN110163048B (zh) * 2018-07-10 2023-06-02 腾讯科技(深圳)有限公司 手部关键点的识别模型训练方法、识别方法及设备
CN109345449B (zh) * 2018-07-17 2020-11-10 西安交通大学 一种基于融合网络的图像超分辨率及去非均匀模糊方法
CN109254766B (zh) * 2018-07-25 2020-02-07 中建八局第一建设有限公司 基于移动端的可视化编程平台及二维图纸三维可视化方法
CN110276267A (zh) * 2019-05-28 2019-09-24 江苏金海星导航科技有限公司 基于Spatial-LargeFOV深度学习网络的车道线检测方法
CN110580704A (zh) * 2019-07-24 2019-12-17 中国科学院计算技术研究所 基于卷积神经网络的et细胞图像自动分割方法及系统
CN110544213B (zh) * 2019-08-06 2023-06-13 天津大学 一种基于全局和局部特征融合的图像去雾方法
CN110738605B (zh) * 2019-08-30 2023-04-28 山东大学 基于迁移学习的图像去噪方法、系统、设备及介质

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110274370A1 (en) * 2010-05-10 2011-11-10 Yuhi Kondo Image processing apparatus, image processing method and image processing program
CN102915527A (zh) * 2012-10-15 2013-02-06 中山大学 基于形态学成分分析的人脸图像超分辨率重建方法
US20170365046A1 (en) * 2014-08-15 2017-12-21 Nikon Corporation Algorithm and device for image processing
CN108629743A (zh) * 2018-04-04 2018-10-09 腾讯科技(深圳)有限公司 图像的处理方法、装置、存储介质和电子装置
WO2020187042A1 (zh) * 2019-03-19 2020-09-24 京东方科技集团股份有限公司 图像处理方法、装置、设备以及计算机可读介质
CN111340694A (zh) * 2020-02-07 2020-06-26 腾讯科技(深圳)有限公司 图像处理方法、装置、计算机可读存储介质和计算机设备

Also Published As

Publication number Publication date
CN111340694B (zh) 2023-10-27
US20220253974A1 (en) 2022-08-11
CN111340694A (zh) 2020-06-26

Similar Documents

Publication Publication Date Title
WO2021155675A1 (zh) 图像处理方法、装置、计算机可读存储介质和计算机设备
CN108022212B (zh) 高分辨率图片生成方法、生成装置及存储介质
US11688070B2 (en) Video frame segmentation using reduced resolution neural network and masks from previous frames
WO2021233006A1 (zh) 图像处理模型的训练方法、图像处理方法、装置及设备
JP7112595B2 (ja) 画像処理方法及びその装置、コンピュータ機器並びにコンピュータプログラム
KR102086711B1 (ko) Simd형 초병렬 연산 처리 장치용 초해상 처리 방법, 장치, 프로그램 및 기억 매체
Shi et al. DDet: Dual-path dynamic enhancement network for real-world image super-resolution
WO2011092696A1 (en) Method and system for generating an output image of increased pixel resolution from an input image
US9697584B1 (en) Multi-stage image super-resolution with reference merging using personalized dictionaries
US11367163B2 (en) Enhanced image processing techniques for deep neural networks
WO2021164269A1 (zh) 基于注意力机制的视差图获取方法和装置
CN111402139A (zh) 图像处理方法、装置、电子设备和计算机可读存储介质
US10789769B2 (en) Systems and methods for image style transfer utilizing image mask pre-processing
Huang et al. Learning deformable and attentive network for image restoration
CN113891027B (zh) 视频插帧模型训练方法、装置、计算机设备和存储介质
Kumar et al. Low-light robust face super resolution via morphological transformation based locality-constrained representation
CN113793259A (zh) 图像变焦方法、计算机设备和存储介质
Zeng et al. Real-time video super resolution network using recurrent multi-branch dilated convolutions
Wen et al. Progressive representation recalibration for lightweight super-resolution
US20230098437A1 (en) Reference-Based Super-Resolution for Image and Video Enhancement
CN115797194A (zh) 图像降噪方法、装置、电子设备、存储介质和程序产品
CN115937121A (zh) 基于多维度特征融合的无参考图像质量评价方法及系统
WO2021057464A1 (zh) 视频处理方法和装置、存储介质和电子装置
CN111401477B (zh) 图像处理方法、装置、电子设备和计算机可读存储介质
Jia et al. Learning Rich Information for Quad Bayer Remosaicing and Denoising

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20917553

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20917553

Country of ref document: EP

Kind code of ref document: A1