WO2022057837A1 - Image processing method and apparatus, portrait super-resolution reconstruction method and apparatus, and portrait super-resolution reconstruction model training method and apparatus, electronic device, and storage medium


Info

Publication number
WO2022057837A1
Authority
WO
WIPO (PCT)
Prior art keywords
image, training, resolution, super, processing
Application number
PCT/CN2021/118591
Other languages
French (fr)
Chinese (zh)
Inventor
侯剑堃
Original Assignee
广州虎牙科技有限公司
Priority claimed from CN202010977254.4A (published as CN114266697A)
Priority claimed from CN202011000670.5A (published as CN114298901A)
Application filed by 广州虎牙科技有限公司
Publication of WO2022057837A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof

Definitions

  • the present application relates to the technical field of computer vision, and in particular, to a method, apparatus, electronic device, and storage medium for image processing, super-resolution reconstruction of portraits, and model training.
  • Image super-resolution reconstruction, or image super-resolution restoration, refers to the process of restoring a given low-resolution image or image sequence to a corresponding high-resolution image through specific processing. It is widely used in fields where the quality of videos or images needs to be improved, such as video image processing, medical imaging, remote sensing imaging, and video surveillance.
  • Image super-resolution reconstruction technology is also widely used in many fields such as face recognition, big data analysis, and security, where it is of great help for portrait restoration, portrait recognition, and matching.
  • In image super-resolution reconstruction, for example in the super-resolution reconstruction of a human portrait, the method usually adopted is to reconstruct the entire image. Because this method does not focus on the information that is more important to human visual perception, the reconstructed image often fails to meet actual needs.
  • Embodiments of the present application provide an image processing and model training method, apparatus, electronic device, and storage medium, so as to improve the processing speed while ensuring the reconstruction effect.
  • Embodiments of the present application also provide a portrait super-resolution reconstruction method, a model training method, an apparatus, an electronic device, and a readable storage medium, which can improve the recognition of the obtained super-resolution image and meet user requirements.
  • Some embodiments of the present application provide an image processing method, the method may include:
  • the reconstructed feature map is enlarged by using the sub-pixel convolution layer of the image reconstruction model to obtain a reconstructed image.
  • the feature extraction network may include a convolutional layer, a plurality of cascaded blocks, and a plurality of first convolutional layers, with the cascaded blocks and the first convolutional layers arranged alternately; the feature extraction network can adopt a global cascade structure;
  • the step of using the feature extraction network of the image reconstruction model to perform multi-scale feature extraction on the to-be-processed image to obtain a reconstructed feature map may include:
  • the output of the last first convolutional layer is used as the reconstructed feature map.
  • the number of the cascaded blocks may be 3 to 5, and the number of the first convolutional layers may be 3 to 5.
  • the cascaded block may include a plurality of residual blocks and a plurality of second convolutional layers, with the residual blocks and the second convolutional layers arranged alternately; the cascaded block can adopt a local cascade structure;
  • the step of using the cascaded blocks to perform multi-scale feature extraction and outputting an intermediate feature map may include:
  • using the residual block to learn residual features and obtain a residual feature map;
  • channel-stacking the input of the cascaded block and the output of each of the residual blocks before the Nth second convolutional layer, and inputting the stacked result to the Nth second convolutional layer for convolution processing;
  • the output of the last second convolutional layer is used as the intermediate feature map.
  • the number of the residual blocks may be 3 to 5, and the number of the second convolutional layers may be 3 to 5.
  • the residual block may include a grouped convolutional layer, a third convolutional layer, and a fourth convolutional layer; the grouped convolutional layer adopts a ReLU activation function, the grouped convolutional layer and the third convolutional layer are connected to form a residual path, and the residual block can adopt a local skip connection structure;
  • the step of using the residual block to learn residual features to obtain a residual feature map may include:
  • the input of the residual block is used as the input of the grouped convolution layer, and features are extracted through the residual path;
  • performing feature fusion between the input of the residual block and the output of the third convolutional layer, inputting the fused result to the fourth convolutional layer for convolution processing, and outputting the residual feature map.
  • the step of using the sub-pixel convolution layer of the image reconstruction model to amplify the reconstructed feature map to obtain a reconstructed image may include:
  • Other embodiments of the present application also provide an image reconstruction model training method, and the method may include:
  • training samples include low-resolution images and high-resolution images, and the low-resolution images are obtained by down-sampling the high-resolution images;
  • the image reconstruction model includes a feature extraction network and a sub-pixel convolution layer;
  • Back-propagation training is performed on the image reconstruction model based on the training reconstructed image, the high-resolution image and the preset objective function to obtain a trained image reconstruction model.
  • the objective function may be an L2 loss function
  • performing back-propagation training on the image reconstruction model based on the training reconstructed image, the high-resolution image, and the L2 loss function to adjust the parameters of the image reconstruction model until a preset training completion condition is reached, so as to obtain the trained image reconstruction model.
  • the image reconstruction model training method may further include:
  • the trained image reconstruction model is pruned to preserve long-line cascades and delete short-line cascades.
  • the method may further include:
  • performing self-subtracting mean processing on the low-resolution image to highlight the texture details of the low-resolution image.
  • the method may further include:
  • the step of inputting the low-resolution image into a pre-built image reconstruction model may include:
  • the step of using the feature extraction network to perform multi-scale feature extraction on the low-resolution image to obtain a training feature map may include:
  • Still other embodiments of the present application also provide an image processing apparatus, and the apparatus may include:
  • an image acquisition module which can be configured to acquire an image to be processed
  • the first execution module can be configured to input the image to be processed into an image reconstruction model, and use the feature extraction network of the image reconstruction model to perform multi-scale feature extraction on the to-be-processed image and expand image channels to obtain a reconstruction feature map;
  • the second execution module may be configured to use the sub-pixel convolution layer of the image reconstruction model to amplify the reconstructed feature map to obtain a reconstructed image.
  • Still other embodiments of the present application also provide an image reconstruction model training apparatus, the apparatus may include:
  • a sample acquisition module which can be configured to acquire training samples, where the training samples include low-resolution images and high-resolution images, and the low-resolution images are obtained by down-sampling the high-resolution images;
  • a first processing module which can be configured to input the low-resolution image into a pre-built image reconstruction model, the image reconstruction model including a feature extraction network and a sub-pixel convolution layer;
  • the second processing module can be configured to use the feature extraction network to perform multi-scale feature extraction on the low-resolution image and expand image channels to obtain a training feature map;
  • a third processing module may be configured to use the sub-pixel convolutional layer to amplify the training feature map to obtain a training reconstructed image
  • the fourth processing module may be configured to perform back-propagation training on the image reconstruction model based on the training reconstructed image, the high-resolution image and a preset objective function to obtain a trained image reconstruction model.
  • In the image processing and model training methods, apparatuses, electronic device, and storage medium provided by the embodiments of the present application, an image to be processed is acquired and input into an image reconstruction model.
  • the image reconstruction model includes a feature extraction network and a sub-pixel convolution layer.
  • the feature extraction network is used to extract multi-scale features of the image to be processed and expand the image channels to obtain a reconstructed feature map, and the sub-pixel convolution layer is then used to enlarge the reconstructed feature map to obtain a reconstructed image. Since the feature extraction network can extract multi-scale features and expand image channels, a better reconstruction effect can be obtained without increasing the depth of the network, and the amount of computation and the number of parameters are greatly reduced, thereby improving the processing speed while ensuring the reconstruction effect.
  • Some embodiments of the present application provide a method for super-resolution reconstruction of a portrait, the method may include:
  • the super-resolution reconstruction process is performed using the image processing method described above.
  • the key point detection, super-resolution reconstruction processing, and restoration processing may include multiple rounds of iterative processing, and the image to be processed is an unprocessed image, or the super-resolution image obtained after the key point detection, super-resolution reconstruction processing, and restoration processing of a previous round of iteration.
  • the face key points may include a plurality of face key points, and the step of performing restoration processing on the image to be processed by using the high-frequency information of the image to obtain a super-resolution image corresponding to the image to be processed may include:
  • restoration processing is performed on the to-be-processed image to obtain a super-resolution image corresponding to the to-be-processed image.
  • the step of performing restoration processing on the image to be processed based on the position information of each of the face key points and the high-frequency information of the image to obtain a super-resolution image corresponding to the image to be processed may include:
  • restoration processing is performed on the corresponding face key points in the to-be-processed image.
  • the reconstructed model may include a discriminator and a generation network, and the generation network is obtained after training with training samples under the supervision of the trained discriminator.
  • the face key points may include the contours of the left eye, the right eye, the nose, the mouth and the chin.
  • Other embodiments of the present application provide a method for training a portrait super-resolution reconstruction model, and the method may include:
  • the training continues until a reconstructed model is obtained when a first preset condition is satisfied.
  • the step of comparing the output image and the target sample, performing network parameter adjustment on the generation network based on the comparison result, and continuing training until the reconstructed model is obtained when a first preset condition is satisfied may include:
  • the reconstruction model further includes a discriminator, and the discriminator is used to supervise the training of the generation network, and the method may further include:
  • the parameters of the discriminator are adjusted until the trained discriminator is obtained when the second preset condition is satisfied.
  • the step of comparing the output image and the target sample, performing network parameter adjustment on the generation network based on the comparison result, and continuing training until the reconstructed model is obtained when a first preset condition is satisfied may include:
  • the step of performing network parameter adjustment on the generation network according to the discrimination information and the comparison result, and continuing to train until the reconstructed model is obtained when a first preset condition is satisfied, may include:
  • constructing a third loss function based on the discrimination information of the discriminator about the output image, and constructing a fourth loss function based on the image difference between the output image and the target sample obtained by the pre-built portrait cognitive model;
  • the reconstructed model is obtained when the function value satisfies the first preset condition.
  • the device may include:
  • the detection module can be configured to use the pre-built reconstruction model to perform key point detection on the image to be processed to obtain face key points;
  • a processing module which can be configured to perform super-resolution reconstruction processing according to the face key points and image features obtained based on the to-be-processed image to obtain high-frequency image information
  • the restoration module may be configured to perform restoration processing on the to-be-processed image by using the high-frequency information of the image to obtain a super-resolution image corresponding to the to-be-processed image.
  • the apparatus for super-resolution reconstruction of a portrait may further include the above-described image processing apparatus, and the image processing apparatus may be configured to perform the super-resolution reconstruction processing.
  • Still other embodiments of the present application provide a portrait super-resolution reconstruction model training device, and the model training device may include:
  • an acquisition module which can be configured to acquire training samples and target samples corresponding to the training samples
  • a key point obtaining module which can be configured to perform key point detection on the training sample by using the constructed generation network to obtain training key points
  • an output image obtaining module which can be configured to perform super-resolution reconstruction processing and restoration processing based on the training key points and the training samples to obtain an output image
  • a training module which can be configured to compare the output image and the target sample, and adjust the network parameters of the generation network based on the comparison result and continue training until a reconstructed model is obtained when the first preset condition is satisfied .
  • the electronic device may include: one or more processors; and one or more storage media storing one or more machine-executable instructions that, when executed by the one or more processors, cause the one or more processors to implement the image processing method according to some embodiments, or the image reconstruction model training method according to other embodiments, or the portrait super-resolution reconstruction method according to still other embodiments, or the portrait super-resolution reconstruction model training method according to still other embodiments.
  • Still other embodiments of the present application provide a computer-readable storage medium storing machine-executable instructions that, when executed, implement the image processing method according to some embodiments, or the image reconstruction model training method according to other embodiments, or the portrait super-resolution reconstruction method according to still other embodiments, or the portrait super-resolution reconstruction model training method according to still other embodiments.
  • In the portrait super-resolution reconstruction method, key point detection is first performed on the image to be processed by using the pre-built reconstruction model to obtain the face key points, and super-resolution reconstruction processing is then performed according to the face key points and the image features obtained based on the image to be processed, so as to obtain the high-frequency information of the image.
  • the super-resolution reconstruction of the image is realized by combining face key point detection and face restoration, which improves the recognizability of the obtained super-resolution image and meets the needs of users in practical applications.
  • FIG. 1 shows an application scenario diagram of the image processing method provided by the embodiment of the present application.
  • FIG. 2 shows a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • FIG. 3 shows an example diagram of an image reconstruction model provided by an embodiment of the present application.
  • FIG. 4 shows an example diagram of a cascaded block provided by an embodiment of the present application.
  • FIG. 5 shows an example diagram of a residual block provided by an embodiment of the present application.
  • FIG. 6 shows another example diagram of an image reconstruction model provided by an embodiment of the present application.
  • FIG. 7 shows an image processing result presentation diagram provided by an embodiment of the present application.
  • FIG. 8 is a flowchart of a method for super-resolution reconstruction of a portrait provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a processing flow of a method for super-resolution reconstruction of a portrait provided by an embodiment of the present application.
  • FIG. 10 is another schematic diagram of a processing flow of the method for super-resolution reconstruction of a portrait provided by an embodiment of the present application.
  • FIG. 11 is a flowchart of a method for obtaining a super-resolution image in the method for super-resolution reconstruction of a portrait provided by an embodiment of the present application.
  • FIG. 12 is another schematic diagram of a processing flow of the method for super-resolution reconstruction of a portrait provided by an embodiment of the present application.
  • FIG. 13 shows a schematic flowchart of an image reconstruction model training method provided by an embodiment of the present application.
  • FIG. 14 is a flowchart of a method for training a portrait super-resolution reconstruction model provided by an embodiment of the present application.
  • FIG. 15 is one of the flowcharts of a method for obtaining a reconstructed model in the method for training a super-resolution reconstruction model of a portrait provided by an embodiment of the present application.
  • FIG. 16 is the second flowchart of a method for obtaining a reconstructed model in the method for training a super-resolution reconstruction model of a portrait provided by an embodiment of the present application.
  • FIGS. 17(a) to 17(c) are schematic diagrams of output images obtained by the interpolation processing method, the method without the discriminator, and the method with the discriminator, respectively.
  • FIG. 18 shows a schematic block diagram of an image processing apparatus provided by an embodiment of the present application.
  • FIG. 19 shows a schematic block diagram of an apparatus for training an image reconstruction model provided by an embodiment of the present application.
  • FIG. 20 shows a schematic block diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 21 is a structural block diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 22 is a block diagram of functional modules of an apparatus for super-resolution reconstruction of a portrait provided by an embodiment of the present application.
  • FIG. 23 is a block diagram of functional modules of an apparatus for training a super-resolution reconstruction model of a portrait provided by an embodiment of the present application.
  • Icons: 10-electronic equipment; 11-processor; 12-memory; 13-bus; 20-first terminal; 30-second terminal; 40-network; 50-server; 100-image processing device; 110-image acquisition module; 120-first execution module; 130-second execution module; 200-model training device; 210-sample acquisition module; 220-first processing module; 230-second processing module; 240-third processing module; 250-fourth processing module.
  • 2110-storage medium; 2120-processor; 2130-machine-executable instructions; 131-portrait super-resolution reconstruction device; 1311-detection module; 1312-processing module; 1313-restoration module; 132-model training device; 1321-acquisition module; 1322-key point acquisition module; 1323-output image acquisition module; 1324-training module; 140-communication interface.
  • FIG. 1 shows an application scenario diagram of the image processing method provided by the embodiment of the present application, including a first terminal 20, a second terminal 30, a network 40, and a server 50; the first terminal 20 and the second terminal 30 are each connected to the server 50 through the network 40.
  • the first terminal 20 and the second terminal 30 may be mobile terminals, and various application programs (Application, App) may be installed on the mobile terminals, for example, a video playing App, an instant messaging App, a video/image capturing App, a shopping App, and so on.
  • the network 40 may be a wide area network or a local area network, or a combination of the two, using a wireless link for data transmission.
  • the first terminal 20 and the second terminal 30 may be any mobile terminals having a screen display function, for example, a smart phone, a notebook computer, a tablet computer, a desktop computer, a smart TV, and the like.
  • the first terminal 20 may upload the video file or picture to the server 50, and the server 50 may store the video file or picture after receiving the video file or picture uploaded by the first terminal 20.
  • the second terminal 30 can request the video file or picture from the server 50 , and the server 50 can return the video file or picture to the second terminal 30 .
  • the video file or picture will typically have been compressed, so the resolution of the video file or picture is relatively low.
  • After receiving the video file or picture, the second terminal 30 can process it in real time by using the image processing method provided in the embodiments of the present application to obtain a high-resolution video or picture and display it in the display interface of the second terminal 30, thereby improving the user's picture quality experience.
  • the image processing method provided by the embodiment of the present application may be integrated into a video playback App or a gallery App of the second terminal 30 as a functional plug-in.
  • the first terminal 20 may be the mobile terminal of the host, and the second terminal 30 may be the mobile terminal of the viewer.
  • the first terminal 20 can upload the live video to the server 50, and the server 50 can store the live video.
  • When the second terminal 30 requests it, the server 50 can return the live video to the second terminal 30.
  • the second terminal 30 can process the live video in real time by using the image processing method provided in the embodiments of the present application to obtain a high-resolution live video and display it, so that the audience can watch the live video clearly.
  • the image processing method provided in this embodiment of the present application can be applied to a mobile terminal.
  • the above description takes application to the second terminal 30 as an example; it should be understood that the image processing method can also be applied to the first terminal 20.
  • the specific value can be determined according to the actual application scenario, which is not limited here.
  • FIG. 2 shows a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • the image processing method may include the following steps:
  • the image to be processed may be a picture displayed on the mobile terminal that needs super-resolution reconstruction to improve image quality, or a video frame in a video stream, for example, a video frame of the low-resolution video obtained by the second terminal 30 from the server 50.
  • the mobile terminal can perform super-resolution reconstruction directly when receiving a low-resolution picture or low-resolution video file; alternatively, it can first display the low-resolution picture or video file in the display interface and perform super-resolution reconstruction only after the user performs a resolution switching operation. For example, when a low-resolution video is received, it is played first, and super-resolution reconstruction is performed when the user switches the resolution from "standard definition" to "ultra high definition".
  • S102: Input the image to be processed into an image reconstruction model, and use a feature extraction network of the image reconstruction model to perform multi-scale feature extraction on the image to be processed and expand image channels to obtain a reconstructed feature map.
  • the to-be-processed image is input into the image reconstruction model for super-resolution reconstruction.
  • the image reconstruction model includes a feature extraction network and a sub-pixel convolution layer.
  • the feature extraction network is used to extract multi-scale features of the image to be processed and expand the image channels, and the sub-pixel convolution layer is used to enlarge the reconstructed feature map output by the feature extraction network.
  • Multi-scale feature extraction refers to extracting feature information at different levels by means of global cascade and local cascade.
  • feature extraction can be performed step by step from the bottom layer to the high layer, or the bottom layer information can be directly transferred to the high layer.
  • An image channel refers to one or more color channels after an image is divided according to color components.
  • according to the number of image channels, an image can be classified as a single-channel image, a three-channel image, or a four-channel image.
  • a single-channel image means that each pixel in the image is represented by only one value, such as a grayscale image; a three-channel image means that each pixel is represented by three values, such as an RGB color image; and a four-channel image is a three-channel image plus a transparency (alpha) channel.
  • Expanding image channels means increasing the number of channels of an image without changing the size of the image.
  • For the feature extraction network, the input is an image of H × W × C, where H × W is the size of the input image and C is the number of channels of the input image; the output is a feature map of H × W × r²C, where H × W is the size of the output feature map and r²C is its number of channels.
  • the sub-pixel convolution layer, also known as PixelShuffle, is a convolutional layer that can efficiently produce high-resolution feature maps. Compared with handcrafted upsampling filters such as bilinear or bicubic samplers, the sub-pixel convolution layer can be trained to learn more complex upsampling operations, while the overall computation time is reduced.
  • the main function of the sub-pixel convolution layer is to recombine the feature maps of the r² channels into a new upsampling result of (r × H) × (r × W) × C; that is, an output image of rH × rW × C is obtained, completing the r-fold enlargement from the input feature map to the output image.
  • the working process of the sub-pixel convolution layer can be as follows: first, each original low-resolution pixel is divided into r × r small grids; then, according to certain rules, these small grids are filled with the values at the corresponding positions of the r² input feature maps; the recombination is completed by filling the small grids of every low-resolution pixel in the same way.
  • a sub-pixel convolutional layer may be used to adjust pixel positions in the reconstructed feature map to obtain a reconstructed image.
  • For example, if the reconstructed feature map output by the feature extraction network is H × W × r²C, the sub-pixel convolution layer is used to adjust the pixel positions to obtain a reconstructed image of rH × rW × C, thereby completing the r-fold enlargement.
  • the sub-pixel convolutional layer can support multiple magnification sizes.
  • For example, a 4-times magnification operation can be accomplished with a combination of two 2-times sub-pixel convolution layers, and a combination of a 2-times and a 3-times sub-pixel convolution layer accomplishes a 6-times magnification operation.
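  • For illustration, the following is a minimal PyTorch sketch of the pixel rearrangement described above, assuming an upscale factor r and C output channels (the tensor shapes and variable names are illustrative, not taken from the patent):

```python
import torch
import torch.nn as nn

r, C = 2, 3                     # upscale factor and output channels (illustrative)
shuffle = nn.PixelShuffle(r)    # rearranges (r^2 * C, H, W) into (C, r*H, r*W)

x = torch.randn(1, r * r * C, 64, 64)   # feature map with r^2 * C channels
y = shuffle(x)
print(y.shape)                  # torch.Size([1, 3, 128, 128]) -- a 2x enlargement

# Chaining shuffles realizes larger factors: two 2x stages give a 4x enlargement,
# provided the input carries 16*C channels for the two stages to consume.
up4 = nn.Sequential(nn.PixelShuffle(2), nn.PixelShuffle(2))
print(up4(torch.randn(1, 16 * C, 64, 64)).shape)   # torch.Size([1, 3, 256, 256])
```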
  • Many existing super-resolution reconstruction algorithms first interpolate to high resolution and then make corrections, whereas the image reconstruction model in the embodiments of the present application places the enlarging sub-pixel convolution layer at the end, which ensures that the feature extraction network at the front of the model processes small-sized images, greatly reducing the amount of computation and the number of parameters.
  • Step S102 will be described in detail below.
  • the feature extraction network includes a convolutional layer, multiple cascaded blocks, and multiple first convolutional layers; the cascaded blocks and the first convolutional layers are arranged alternately, and the feature extraction network adopts a global cascade structure.
  • the global cascade structure refers to the left shortcut channels and the right shortcut channels in FIG. 3.
  • through the left shortcut channels, the output of each cascaded block can be directly sent to every first convolutional layer after that cascaded block, and through the right shortcut channels, the output of the convolutional layer can be directly sent to every first convolutional layer.
  • the transmission here refers to the superposition of channels, not the addition of data.
  • the feature extraction network of the image reconstruction model is used to perform multi-scale feature extraction on the image to be processed and expand the image channel to obtain the reconstructed feature map, which may include:
  • performing multi-scale feature extraction by using the cascaded blocks, and outputting intermediate feature maps;
  • the convolutional layer and the first convolutional layers can expand the image channels, and the convolutional layer, the cascaded blocks, and the first convolutional layers can all extract features.
  • the channel stacking of the initial feature map and the intermediate feature map refers to combining the channels of the initial feature map and the channels of the intermediate feature map.
  • Assume the initial feature map has 4 channels and the intermediate feature map has 8 channels; the feature map after stacking then has 12 channels. In other words, each pixel in the initial feature map is represented by 4 values, each pixel in the intermediate feature map is represented by 8 values, and each pixel in the feature map after channel stacking is represented by 12 values.
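  • In PyTorch terms, this channel stacking is concatenation along the channel dimension; the following minimal sketch mirrors the 4-channel and 8-channel example above (all shapes are illustrative):

```python
import torch

initial = torch.randn(1, 4, 32, 32)        # initial feature map: 4 channels
intermediate = torch.randn(1, 8, 32, 32)   # intermediate feature map: 8 channels

# channels are combined side by side; pixel values are not added together
stacked = torch.cat([initial, intermediate], dim=1)
print(stacked.shape)   # torch.Size([1, 12, 32, 32]): each pixel now has 12 values
```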
  • the structure of the cascaded block is shown in FIG. 4; the cascaded block includes multiple residual blocks and multiple second convolutional layers, the residual blocks and the second convolutional layers are arranged alternately, and the cascaded block adopts a local cascade structure.
  • the local cascade structure refers to the left shortcut channels and the right shortcut channels in FIG. 4.
  • through the left shortcut channels, the output of each residual block can be directly sent to every second convolutional layer after that residual block, and through the right shortcut channels, the input of the cascaded block can be directly sent to every second convolutional layer.
  • again, the transmission here refers to the superposition of channels, not the addition of data.
  • the manner of performing multi-scale feature extraction by using the cascaded block and outputting the intermediate feature map may include:
  • channel-stacking the input of the cascaded block and the output of each residual block before the Nth second convolutional layer, and inputting the stacked result into the Nth second convolutional layer for convolution processing;
  • the second convolutional layers can expand the image channels, and the residual blocks and the second convolutional layers can extract features.
  • the process of channel-stacking the input of the cascaded block and the outputs of the residual blocks is similar to the above-described process of channel-stacking the initial feature map and the intermediate feature map, and will not be repeated here.
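  • As a hedged sketch of the local cascade just described (a reading of FIG. 4 that assumes three residual blocks and 1 × 1 second convolutional layers; the module names and channel counts are assumptions, and a plain convolution stands in for the residual block, for which a sketch is given further below):

```python
import torch
import torch.nn as nn

class CascadeBlock(nn.Module):
    """Local cascade: the Nth 1x1 "second conv" fuses the block input
    channel-stacked with the outputs of all preceding residual blocks."""
    def __init__(self, channels=48, num_residual=3, make_residual=None):
        super().__init__()
        default = lambda: nn.Conv2d(channels, channels, 3, padding=1)
        self.residuals = nn.ModuleList(
            [(make_residual or default)() for _ in range(num_residual)])
        # the Nth second conv sees the block input plus N residual outputs
        self.second_convs = nn.ModuleList(
            [nn.Conv2d((i + 2) * channels, channels, 1)
             for i in range(num_residual)])

    def forward(self, x):
        stacked = [x]                # the block input joins every channel stack
        out = x
        for res, conv in zip(self.residuals, self.second_convs):
            out = res(out)
            stacked.append(out)
            out = conv(torch.cat(stacked, dim=1))  # superpose channels, then convolve
        return out
```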
  • the structure of the residual block is shown in FIG. 5.
  • the residual block may include a grouped convolutional layer, a third convolutional layer, and a fourth convolutional layer.
  • the grouped convolutional layer adopts a ReLU activation function; the grouped convolutional layer and the third convolutional layer are connected to form a residual path, and the residual block adopts a local skip connection structure.
  • the local skip connection structure refers to the fusion of the input of the residual block and the output of the residual path to learn residual features.
  • the manner of using the residual block to learn residual features and obtain the residual feature map may include:
  • performing feature fusion on the input of the residual block and the output of the third convolutional layer, inputting the fused result to the fourth convolutional layer for convolution processing, and outputting the residual feature map.
  • the third convolutional layer and the fourth convolutional layer can expand the image channel, and the grouped convolutional layer can extract features.
  • the grouped convolution (Group Convolution) layer groups the input feature maps, and each group is then convolved separately. Compared with regular convolution, grouped convolution reduces the model parameters, thereby increasing the processing speed of the model.
  • the number of grouped convolutional layers and the number of groups into which each grouped convolutional layer divides the input feature maps can be flexibly selected according to actual needs; for example, the number of grouped convolutional layers may be 2 and the number of groups may be 3.
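  • A hedged sketch of such a residual block, assuming two grouped convolutional layers with 3 groups and 1 × 1 third and fourth convolutions (the channel count of 48 is only chosen so that it divides evenly into 3 groups):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual path = grouped convolutions (with ReLU) plus a third conv;
    the block input is fused with the path output, then a fourth conv follows."""
    def __init__(self, channels=48, groups=3):  # channels must be divisible by groups
        super().__init__()
        self.residual_path = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=groups),  # grouped conv
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, groups=groups),  # grouped conv
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 1),     # "third" conv closing the path
        )
        self.fourth_conv = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        fused = x + self.residual_path(x)         # local skip connection (feature fusion)
        return self.fourth_conv(fused)

block = ResidualBlock()
print(block(torch.randn(1, 48, 32, 32)).shape)    # torch.Size([1, 48, 32, 32])
```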
  • the types of the convolutional layer, the first convolutional layer, the second convolutional layer, the third convolutional layer, and the fourth convolutional layer are not limited in this embodiment; they may be, for example, 1 × 1 pointwise convolutions or depthwise convolutions, and can be flexibly adjusted according to actual needs.
  • the expressiveness of the image reconstruction model increases with the complexity of the global cascade or the local cascade; that is, the greater the number of cascaded blocks and first convolutional layers in the feature extraction network, or the greater the number of residual blocks and second convolutional layers in the cascaded blocks, the more expressive the image reconstruction model will be.
  • However, the more complex the network structure, the slower the computation. Therefore, in order to improve the processing speed while ensuring the reconstruction effect, the number of each module should not be too large.
  • the number of cascaded blocks and the number of first convolutional layers in the feature extraction network can both be 3 to 5, the number of residual blocks and the number of second convolutional layers in the cascaded block can both be 3 to 5, and the number of grouped convolutional layers in the residual block can be 2 to 4.
  • For example, the feature extraction network can be set to include 3 cascaded blocks and 3 first convolutional layers, each cascaded block includes 3 residual blocks and 3 second convolutional layers, and each residual block includes 2 grouped convolutional layers.
  • In addition, the modules within the cascaded blocks can be set to share parameters, that is, the multiple residual blocks share parameters and the multiple second convolutional layers share parameters, so that the image reconstruction model can be made even more lightweight and the processing speed further improved.
  • In FIG. 7, the left picture and the middle picture are reconstructed images obtained by the image processing method provided by the embodiments of the present application, where the left picture does not share parameters and the middle picture shares parameters; the right picture is the reconstructed image obtained by the bicubic interpolation (Bicubic) algorithm.
  • FIG. 8 shows a schematic flowchart of a method for super-resolution reconstruction of a portrait.
  • the detailed steps of the portrait super-resolution reconstruction method are introduced as follows.
  • Step S110: Use a pre-built reconstruction model to perform key point detection on the image to be processed to obtain face key points.
  • Step S120: Perform super-resolution reconstruction processing according to the face key points and the image features obtained based on the image to be processed to obtain image high-frequency information.
  • Step S130: Perform restoration processing on the image to be processed by using the high-frequency information of the image to obtain a super-resolution image corresponding to the image to be processed.
  • the image to be processed may be an image with low definition; in such images the face is often unclear, which creates obstacles for tasks such as image recognition and image matching.
  • the image to be processed may be a face image collected by a monitoring device, or a face image obtained by taking a screenshot of a web page, or a host's face image collected during a live broadcast, and so on.
  • the constructed reconstruction model may be used to first perform key point detection on the image to be processed to obtain the face key points.
  • the obtained face key points may include the left eye, the right eye, the nose, the mouth, and the chin contour. Based on these face key points, the outline of the face can be roughly traced, and they cover the parts of the face that are most important to human visual cognition.
  • the key points of the face are obtained by means of key point detection, and super-resolution reconstruction processing is then performed by combining the face key points with the obtained image features, so as to obtain the high-frequency information of the image.
  • the high-frequency information of an image mainly embodies the information at the edges and contours in the image, while the slowly changing grayscale within the contours is the low-frequency information.
  • the high-frequency information of the image can reflect the information of the relative change area, so it is very important for the reconstruction of the image.
  • the image processing method according to the present application described in conjunction with FIG. 2 may be used to perform super-resolution reconstruction processing of the image, so as to obtain the high-frequency information of the image.
  • the image high-frequency information obtained in the exemplary embodiments of the present application is local information in the face image, so the image high-frequency information needs to be restored to the image to be processed, thereby performing restoration processing on the image to be processed and obtaining the super-resolution image corresponding to it.
  • In the portrait super-resolution reconstruction method described above, the key points of the face are detected, the image high-frequency information is obtained by using the face key points and the image features, and the image to be processed is then restored by using the image high-frequency information; in this way, the recognizability of the obtained super-resolution image can be improved, meeting the needs of users in practical applications.
  • the above-mentioned key point detection, super-resolution reconstruction processing and restoration processing may include multiple rounds of iterative processing.
  • the above image to be processed may be an unprocessed image to be processed, or a super-resolution image obtained after the key point detection, super-resolution reconstruction processing and restoration processing in the previous iteration.
  • In the first round, the above-mentioned key point detection, super-resolution reconstruction processing, and restoration processing yield the super-resolution image SRFace (Super Resolution Face) of this round. Then, on the basis of the obtained super-resolution image, the above-mentioned key point detection, super-resolution reconstruction processing, and restoration processing are performed again to obtain the super-resolution image after the second round of iteration. Following this processing logic, the final super-resolution image is obtained after multiple iterations, once certain requirements are met.
  • the image high-frequency information can be obtained, and the first-round super-resolution image Face SR1 is then obtained according to the image high-frequency information and the input image (Input).
  • key points are detected for Face SR1, and the corresponding face key points Face Points 1 are obtained.
  • the image high-frequency information is obtained, and the second-round super-resolution image Face SR2 is then obtained according to the image high-frequency information and Face SR1.
  • the final super-resolution image Face SR N can be obtained after N iterations (where N is the preset number of iterations at which processing stops, or the image obtained after N iterations meets the preset requirements).
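  • The recursion can be summarized by the following sketch, in which detect_keypoints, reconstruct_hf, and restore are hypothetical placeholders for the three stages described above (none of these names come from the patent):

```python
def portrait_super_resolution(lr_face, model, num_iters=3):
    """Recursive portrait SR: each round detects key points on the previous
    round's output, rebuilds high-frequency detail, and restores the image."""
    sr = lr_face                                        # round 0: the unprocessed LR face
    for _ in range(num_iters):                          # parameters shared across rounds
        key_points = model.detect_keypoints(sr)         # Face Points i
        hf_info = model.reconstruct_hf(key_points, sr)  # image high-frequency information
        sr = model.restore(sr, hf_info)                 # Face SR(i+1)
    return sr                                           # Face SR N
```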
  • the image obtained by the previous round of processing is used as the detection object to perform multiple loop processing in a recursive manner, which can continuously improve the quality of the obtained super-resolution image.
  • model parameters in multiple loops can be shared, thereby making the model more lightweight and providing support for applying the model to devices with weak processing capabilities, such as mobile terminals.
  • Within a certain range, the network width, that is, the number of feature extraction channels, can be increased preferentially instead of the network depth, that is, the number of network layers; combined with the recursive processing method, this can improve the recognition accuracy of the model.
  • the detected face key points include a plurality of face key points, and the image high-frequency information obtained based on the face key points and the image features is used to restore the image to be processed.
  • the above-mentioned restoration processing can be implemented by the following steps:
  • Step S131: Process the image to be processed by using a pre-built portrait cognitive model, and output the position information of each of the face key points.
  • Step S132: Perform restoration processing on the corresponding face key points in the image to be processed according to each face key point, its corresponding position information, and the image high-frequency information, to obtain the super-resolution image corresponding to the image to be processed.
  • a neural network model can be constructed, and the neural network model can be, for example, a convolutional neural network model (Convolutional Neural Networks, CNN) or the like.
  • Multiple training samples can be collected, where each training sample contains a face image and the face key points in each face image carry position information; the position information can be the position of each face key point in the face area.
  • the face area can also be mapped into the coordinate system, and the coordinate value of the key point of the face in the coordinate system is used as its position information.
  • the constructed neural network model is trained with the training samples to obtain a portrait cognitive model that meets the requirements.
  • Using the trained portrait cognitive model, the face key points in the image to be processed (LR Face), such as the left eye, right eye, nose, mouth, and chin contour, can be identified and obtained, and restoration processing is then performed based on the position information of the obtained face key points and the high-frequency information of the corresponding face key points contained in the image high-frequency information, so as to obtain the final super-resolution image SR Face.
  • Obtaining the position information of each face key point with the portrait cognitive model allows the restoration to be performed accurately at the corresponding position in the image based on the position of each face key point, avoiding displacement of the restored face key points.
  • In restoration, the specific restoration requirements of different face key points often differ. For example, for the eyes, it is hoped that the restored eyes will be brighter, while for the chin contour, it may be desired that the restored chin contour be more clearly defined.
  • the restoration attribute corresponding to each face key point may be obtained first, where the restoration attribute is information describing the different requirements of the restoration processing described above. Then, restoration processing is performed on the image to be processed according to the position information of each face key point, the restoration attribute, and the image high-frequency information, so as to obtain the corresponding super-resolution image.
  • By distinguishing each face key point and restoring it independently based on its corresponding position information and restoration attributes, not only can the specific restoration requirements of different face key points be satisfied, but the reconstruction model can also process the key points synchronously based on grouped convolution, which can greatly reduce the processing time.
  • the super-resolution reconstruction process is implemented by using a reconstruction model constructed and trained in advance.
  • the model training method provided in the embodiments of the present application can be applied to any electronic device with an image processing function, for example, a server, a mobile terminal, a general-purpose computer, or a special-purpose computer.
  • FIG. 13 shows a schematic flowchart of an image reconstruction model training method provided by an embodiment of the present application.
  • the model training method may include the following steps:
  • training samples include low-resolution images and high-resolution images, and the low-resolution images are obtained by down-sampling the high-resolution images.
  • the training samples here form a dataset: a large number of high-resolution images (for example, images whose resolution is higher than a certain preset value) can be obtained as original samples, and these high-resolution images can be various types of pictures or video frames, for example, frames of high-definition live video in a live streaming scenario.
  • down-sampling is then performed on the original samples, that is, each high-resolution image is down-sampled in the same way to obtain the training samples.
  • the way of downsampling processing can be bicubic interpolation or the like.
  • S202: Input the low-resolution image into a pre-built image reconstruction model, where the image reconstruction model includes a feature extraction network and a sub-pixel convolution layer.
  • steps S203-S204 are similar to the processing procedures of steps S102-S103, and are not repeated here.
  • the objective function may be an L2 loss function, also called the mean square error (Mean Square Error, MSE) function, which is a type of regression loss function.
  • the curve of the L2 loss function is smooth, continuous, and differentiable everywhere, which is convenient for the gradient descent algorithm; and as the error decreases, the gradient also decreases, which is conducive to convergence, so that even with a fixed learning rate the function can converge to the minimum value quickly.
  • back-propagation training can be performed on the image reconstruction model based on the training reconstructed image, the high-resolution image, and the L2 loss function, so as to adjust the parameters of the image reconstruction model until a preset training completion condition is reached, thereby obtaining the trained image reconstruction model.
  • the training completion condition can be that the number of iterations reaches a set value (for example, 2000 times), or the L2 loss function converges to a minimum value, etc., which is not limited here and can be set according to actual needs.
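  • A minimal sketch of one such back-propagation step with PyTorch's built-in MSE (L2) loss, assuming the model and optimizer have already been constructed (all names are illustrative):

```python
import torch.nn as nn

def train_step(model, optimizer, lr_image, hr_image):
    """One back-propagation step with the L2 (MSE) objective."""
    criterion = nn.MSELoss()
    optimizer.zero_grad()
    sr_image = model(lr_image)            # training reconstructed image
    loss = criterion(sr_image, hr_image)  # L2 distance to the high-resolution target
    loss.backward()                       # back-propagate gradients
    optimizer.step()                      # adjust the model parameters
    return loss.item()

# a possible completion condition: a fixed iteration budget, e.g.
# for step in range(2000):
#     train_step(model, optimizer, lr_batch, hr_batch)
```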
  • After training, the image reconstruction model can be pruned according to the requirements and test results, retaining the long-line cascades and deleting the short-line cascades, thereby reducing excessive jumps in the middle and making the model more lightweight.
  • the low-resolution image can also be preprocessed first and then input into the image reconstruction model, and the preprocessing can be a self-subtracting mean operation on the image. Therefore, before step S202, the model training method may further include:
  • performing self-subtracting mean processing on the low-resolution image to highlight the texture details of the low-resolution image.
  • the self-subtracting mean processing can leave the foreground in the image unprocessed while subtracting the mean pixel value of the background image from each pixel in the background, thereby enhancing the contrast between the background part and the foreground part and highlighting the texture details.
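  • A minimal sketch of this self-subtracting mean preprocessing, assuming a boolean background_mask with the same shape as the image is available (the mask and all names are assumptions for illustration):

```python
import torch

def subtract_background_mean(image, background_mask):
    """Subtract the background's mean pixel value from background pixels only,
    leaving the foreground untouched, to raise contrast and highlight texture."""
    out = image.clone()
    bg = image[background_mask]            # background pixel values
    out[background_mask] = bg - bg.mean()  # self-subtracting mean on the background
    return out
```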
  • In order for the feature extraction network to extract more features, the preprocessing can also apply flip/rotation symmetry operations to the image before inputting it into the model, then apply the reverse flip/rotation symmetry to the model outputs and compute their average, thereby reducing the deviation of some feature layers or parameters caused by anisotropy. Therefore, before step S202, the model training method may further include:
  • inputting at least one processed low-resolution image into the image reconstruction model, and using the feature extraction network to perform multi-scale feature extraction on the at least one processed low-resolution image to obtain at least one auxiliary feature map; performing reverse flip-symmetry processing on the auxiliary feature maps, and averaging the values after the reverse flip-symmetry processing to obtain the training feature map.
  • For example, for a low-resolution image of n × n, rotate it 3 times in the clockwise direction, 90° each time, so that 4 images of n × n are obtained; the 4 images of n × n are then input into the image reconstruction model, and the feature extraction network outputs 4 auxiliary feature maps; the corresponding 3 auxiliary feature maps are then rotated by 90°, 180°, and 270° in the counterclockwise direction; finally, pixel averaging is performed on the 4 processed auxiliary feature maps to obtain the final training feature map.
  • When both preprocessing operations are used, the low-resolution image can be subjected to the self-subtracting mean processing first and then flipped symmetrically, or flipped symmetrically first and then subjected to the self-subtracting mean processing; this can be set flexibly according to actual needs and is not limited here.
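  • A hedged sketch of this rotate-and-average procedure using torch.rot90 (the rotation direction does not affect the result, as long as the inverse rotation is applied to the outputs; all names are illustrative):

```python
import torch

def rotation_ensemble(model, lr_image):
    """Run the model on the image and its three 90-degree rotations, rotate the
    outputs back, and average them to reduce anisotropy-induced bias."""
    outputs = []
    for k in range(4):                                    # 0, 90, 180, 270 degrees
        rotated = torch.rot90(lr_image, k, dims=(2, 3))   # NCHW spatial rotation
        restored = torch.rot90(model(rotated), -k, dims=(2, 3))  # rotate back
        outputs.append(restored)
    return torch.stack(outputs).mean(dim=0)               # pixel-wise average
```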
  • A new model can also be trained on the basis of an already trained model. For example, when training 3-times and 4-times magnification models, assuming that the 2-times magnification model has been trained, the parameters of the 2-times magnification model can be used as the initial parameters of the 3-times and 4-times magnification models, and training continues on this basis.
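  • One way to realize this initialization in PyTorch, assuming the 2-times and 3-times models share their feature extraction layers (the file name is hypothetical; strict=False simply skips parameters whose names or shapes differ, such as the scale-specific upsampling stage):

```python
import torch

def init_from_2x(model_3x, path_2x="model_x2.pth"):
    """Initialize a 3x (or 4x) model from trained 2x weights and train from there."""
    state = torch.load(path_2x, map_location="cpu")
    model_3x.load_state_dict(state, strict=False)  # load only matching parameters
    return model_3x
```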
  • FIG. 14 shows a schematic flowchart of a method for training a portrait super-resolution reconstruction model provided in an embodiment of the present application.
  • the method for training a portrait super-resolution reconstruction model includes:
  • Step S2100: Acquire training samples and target samples corresponding to the training samples;
  • Step S2200: Use the constructed generation network to perform key point detection on the training samples to obtain training key points;
  • Step S2300: Perform super-resolution reconstruction processing and restoration processing based on the training key points and the training samples to obtain an output image;
  • Step S2400: Compare the output image with the target sample, adjust the network parameters of the generation network based on the comparison result, and continue training until a reconstructed model is obtained when a first preset condition is satisfied.
  • In this way, the reconstruction accuracy of the obtained reconstruction model can be improved.
  • a plurality of training samples are collected in advance, and each training sample may be a sample image including a face image with lower definition.
  • the target sample corresponding to a training sample is the sample that meets the requirements, that is, the high-definition sample expected to be obtained after the training sample is processed.
  • the pre-built generation network may be a recursive loop network, and for the process of using the generation network to perform key point detection, super-resolution reconstruction processing, and restoration processing on the training samples, reference can be made to the above description.
  • After processing, the generation network can output the output images corresponding to the training samples.
  • the target sample is used as a comparison standard for the processing quality of the generation network.
  • the generation network can be continuously trained according to the comparison result, so that the difference between the output image and the target sample is reduced to meet the requirements.
  • the reconstructed model is obtained.
  • the samples may be preprocessed, for example by the self-subtracting mean method, so as to bring out the image texture details and improve the effect of subsequent processing and recognition.
• the preprocessed samples can also be flipped symmetrically and then input to the generation network.
• correspondingly, the output results of the relevant network layers of the generation network can be reverse-flipped and averaged. In this way, the deviation of some network layers or parameters caused by anisotropy can be reduced.
  • the network can be pruned according to the requirements and the results of the test to retain several previous cycles that have a greater impact on the results.
• these measures improve the reconstruction accuracy of the resulting generative network, and the peak signal-to-noise ratio and structural similarity of the subsequently processed images can also be greatly improved.
• a loss function may be constructed to supervise the training of the generative network.
• step S2400 of the method for training a portrait super-resolution reconstruction model of the present application can be implemented in the following way:
• Step S2410: constructing a first loss function based on the difference between the pixel information of the output image and the pixel information of the target sample;
• Step S2420: constructing a second loss function based on the differences between each face key point in the output image and the corresponding face key point in the target sample;
• Step S2430: comparing the output image and the target sample, adjusting the network parameters of the generation network based on the comparison result, and continuing training until the reconstructed model is obtained when the weighted function value of the first loss function and the second loss function satisfies the first preset condition.
  • the first loss function and the second loss function may be constructed to comprehensively evaluate the training of the generative network.
• the first loss function evaluates the training from the perspective of pixel differences between images.
• on this basis, a second loss function constructed from the difference information between face key points is added.
  • the first loss function represents the overall pixel-level Euclidean distance between the output image of the generation network and the target sample (that is, the desired output effect)
• the second loss function represents the Euclidean distance between the face key points detected by the generation network in the output image and the corresponding face key points in the target sample (the desired output effect).
• the above-mentioned first loss function and second loss function are weighted and combined to jointly serve as the loss function of the generation network, as sketched below.
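• a hedged sketch of this weighted combination; the weights w_pix and w_kpt are illustrative values, not values fixed by this application:

    import torch

    def generator_loss(output, target, out_kpts, tgt_kpts, w_pix=1.0, w_kpt=0.1):
        # First loss: overall pixel-level Euclidean distance between the
        # generator output and the target sample.
        loss_pix = torch.norm(output - target, p=2)
        # Second loss: Euclidean distance between the detected face key points
        # and the corresponding key points in the target sample.
        loss_kpt = torch.norm(out_kpts - tgt_kpts, p=2)
        return w_pix * loss_pix + w_kpt * loss_kpt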
  • the function value of the comprehensive loss function including the first loss function and the second loss function is calculated by comparing the output image and the target sample.
  • the reconstructed model is obtained when the obtained function value satisfies the first preset condition.
  • the first preset condition may be that the value of the loss function no longer decreases to achieve convergence, or that the value of the loss function is lower than a preset value.
  • the training can be stopped to obtain a reconstructed model.
• in this embodiment, the first loss function constructed based on the difference in pixel information and the second loss function based on the differences between face key points are used to supervise and evaluate the training of the reconstruction model, which can improve the performance of the reconstruction model in subsequent applications.
• by applying the reconstruction model obtained by training the above generative network to the reconstruction of the to-be-processed image described above, the recognizability of the obtained super-resolution image can be improved.
• the reconstruction model in the method for training a portrait super-resolution reconstruction model includes a generation network, which is a pre-trained model constructed to process low-resolution images and output corresponding super-resolution images.
  • the reconstruction model may further include a discriminator, and the discriminator may be used to supervise the training of the generation network. Therefore, in this embodiment, the generation network is a generation network obtained after training with training samples under the supervision of the trained discriminator.
  • the method for training a portrait super-resolution reconstruction model according to the present application further comprises the following steps:
• a discriminator is constructed and used to perform discrimination processing on the output image and the target sample corresponding to the output image; the parameters of the discriminator are then adjusted according to the obtained discrimination results until the trained discriminator is obtained when a second preset condition is satisfied.
• the main principle of the discriminator is to discriminate a real image (that is, a high-resolution image that meets the requirements) as real as far as possible (for example, outputting a discrimination result of 1), and to judge the output image of the generation network as false as far as possible (for example, outputting a discrimination result of 0). In this way, the generation network can be supervised for continuous training until the discriminator finally judges the output image of the generation network as true. That is, the discriminator acts as the supervisor of the generation network, continuously optimizing its training.
• a loss function of the discriminator may be pre-built, and this loss function may be composed of the discriminator's discrimination information on the output image of the generation network and its discrimination information on the target sample.
  • the training process of the discriminator is the process of minimizing the above-mentioned loss function.
• when the value of the above loss function no longer decreases and convergence is achieved, it can be determined that the training of the discriminator satisfies the second preset condition, and the trained discriminator is obtained.
• during subsequent training of the generation network, the parameters of the discriminator can be fixed. A sketch of one discriminator update step follows.
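• one possible discriminator update step, sketched with a binary cross-entropy objective; the application describes the 1/0 discrimination targets but does not fix the exact loss form:

    import torch
    import torch.nn.functional as F

    def discriminator_step(disc, fake_img, real_img, optimizer):
        pred_real = disc(real_img)           # should be judged real (toward 1)
        pred_fake = disc(fake_img.detach())  # generator output, judged fake (toward 0)
        loss = (F.binary_cross_entropy(pred_real, torch.ones_like(pred_real))
                + F.binary_cross_entropy(pred_fake, torch.zeros_like(pred_fake)))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Second preset condition: training stops once this value converges.
        return loss.item()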
  • a discriminator is added to the reconstruction model to form an adversarial network including the discriminator and the generation network, which can further improve the reconstruction effect of the obtained reconstruction model.
• the training and adjustment of the generation network may take the discrimination information of the discriminator into account.
• step S2400 in the training method of the portrait super-resolution reconstruction model of the present application may include the following sub-steps:
• Step S2410': inputting the output image to the trained discriminator to obtain discrimination information;
• Step S2420': comparing the output image and the target sample to obtain a comparison result;
• Step S2430': after adjusting the network parameters of the generation network according to the discrimination information and the comparison result, continuing training until a reconstructed model is obtained when the first preset condition is satisfied.
  • the difference between the output image and the target sample and the discriminator's discriminant information on the output image can be combined to adjust the training of the generation network.
  • the construction of the loss function may be performed in the following manner, and the reconstruction model training is performed by using the constructed loss function:
• a third loss function is constructed based on the discriminator's discrimination information on the output image, and a fourth loss function is constructed based on the image difference between the output image and the target sample obtained by the pre-built portrait cognition model; training then continues until the reconstructed model is obtained when the weighted function value of the loss functions satisfies the first preset condition.
  • the influence of the difference between the output image and the target sample on the adjustment of the generation network can be represented by the first loss function and the second loss function.
  • the influence of the discriminator's discriminant information on the output image on the training adjustment of the generation network can be represented by the third loss function.
  • a fourth loss function constructed from the image difference between the output image obtained by the portrait cognition model and the target sample can also be added.
• the above-mentioned first loss function is constructed based on the difference between the pixel information of the output image and the pixel information of the corresponding target sample, and the second loss function is constructed from the differences between each face key point in the output image and the corresponding face key point in the target sample. Since the purpose of constructing the discriminator to supervise the training of the generation network is for the output image of the generation network to finally be judged as true by the discriminator, the third loss function is constructed from the discriminator's discrimination information on the output image. The fourth loss function is constructed from the difference in facial features between the output image and the target sample, as extracted by the portrait cognition model.
• the final loss function of the generation network can be obtained by a weighted combination of the above-mentioned first, second, third and fourth loss functions.
  • the network parameters can be adjusted for the generation network according to the discrimination information of the discriminator and the comparison result between the output image and the target sample, and then the training can be continued.
• training is adjusted according to the calculated function value of the combined loss function, until the function value weighted from the first loss function, the second loss function, the third loss function and the fourth loss function satisfies the first preset condition, at which point the trained reconstruction model is obtained. A sketch of the combined loss follows.
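• a hedged sketch of the four-term weighted loss; `cognition` stands in for the pre-built portrait cognition model used as a feature extractor, and all weights are illustrative:

    import torch
    import torch.nn.functional as F

    def total_generator_loss(output, target, out_kpts, tgt_kpts, disc, cognition,
                             w1=1.0, w2=0.1, w3=1e-3, w4=1e-2):
        l1 = torch.norm(output - target, p=2)        # first loss: pixel difference
        l2 = torch.norm(out_kpts - tgt_kpts, p=2)    # second loss: key-point difference
        pred = disc(output)                          # third loss: be judged true
        l3 = F.binary_cross_entropy(pred, torch.ones_like(pred))
        l4 = F.mse_loss(cognition(output), cognition(target))  # fourth: facial features
        return w1 * l1 + w2 * l2 + w3 * l3 + w4 * l4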
• referring to FIGS. 17(a) to 17(c): FIG. 17(a) is an image obtained after conventional interpolation processing, FIG. 17(b) is an image obtained by an embodiment of the present application without a discriminator, and FIG. 17(c) is an image obtained by an embodiment of the present application with a discriminator added.
  • the image obtained under the solution of the present application has significantly higher definition and better effect than the conventional interpolation processing method.
• the image obtained with the discriminator added appears clearer to human perception than the image obtained without the discriminator.
  • FIG. 18 shows a schematic block diagram of an image processing apparatus 100 provided by an embodiment of the present application.
  • the image processing apparatus 100 is applied to a mobile terminal, and includes an image acquisition module 110 , a first execution module 120 and a second execution module 130 .
  • the image acquisition module 110 may be configured to acquire images to be processed.
  • the first execution module 120 may be configured to input the image to be processed into the image reconstruction model, and use the feature extraction network of the image reconstruction model to perform multi-scale feature extraction on the image to be processed and expand image channels to obtain a reconstructed feature map.
• the feature extraction network includes a convolutional layer, a plurality of concatenated blocks and a plurality of first convolutional layers; the plurality of concatenated blocks and the plurality of first convolutional layers are alternately arranged, and the feature extraction network adopts a global cascade structure;
• the first execution module 120 may be specifically configured to: input the image to be processed into the convolutional layer for convolution processing to obtain an initial feature map; use the initial feature map as the input of the first concatenated block and the output of the (N-1)-th first convolutional layer as the input of the N-th concatenated block, perform multi-scale feature extraction with the concatenated blocks, and output intermediate feature maps; channel-stack the initial feature map and the intermediate feature maps output by each concatenated block before the N-th first convolutional layer, and after stacking, input them to the N-th first convolutional layer for convolution processing; and use the output of the last first convolutional layer as the reconstructed feature map. A sketch of this global cascade follows.
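• a minimal sketch of this global cascade structure, assuming 3-channel inputs, 64 feature channels and 1×1 convolutions as the "first convolutional layers"; `block_fn` builds a concatenated block (a concrete sketch of that block follows the residual-block description below):

    import torch
    import torch.nn as nn

    class FeatureExtractionNet(nn.Module):
        def __init__(self, block_fn, channels=64, n_blocks=3):
            super().__init__()
            self.entry = nn.Conv2d(3, channels, 3, padding=1)   # initial convolution
            self.blocks = nn.ModuleList(block_fn(channels) for _ in range(n_blocks))
            # The N-th first convolutional layer fuses the initial feature map
            # plus the intermediate maps of all preceding concatenated blocks.
            self.first_convs = nn.ModuleList(
                nn.Conv2d(channels * (i + 2), channels, 1) for i in range(n_blocks))

        def forward(self, x):
            feat = self.entry(x)                      # initial feature map
            cascade, out = [feat], feat
            for block, conv in zip(self.blocks, self.first_convs):
                cascade.append(block(out))            # intermediate feature map
                out = conv(torch.cat(cascade, dim=1)) # channel stacking + fusion
            return out                                # reconstructed feature map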
• the concatenated block includes multiple residual blocks and multiple second convolutional layers; the multiple residual blocks and multiple second convolutional layers are alternately arranged, and the concatenated block adopts a local cascade structure;
• the first execution module 120 may perform multi-scale feature extraction using the concatenated blocks and output an intermediate feature map by: taking the input of the concatenated block as the input of the first residual block and the output of the (N-1)-th second convolutional layer as the input of the N-th residual block, and learning residual features with the residual blocks to obtain residual feature maps;
• channel-stacking the input of the concatenated block and the output of each residual block before the N-th second convolutional layer, and after stacking, inputting them to the N-th second convolutional layer for convolution processing; and using the output of the last second convolutional layer as the intermediate feature map.
• the residual block includes a grouped convolutional layer, a third convolutional layer and a fourth convolutional layer; the grouped convolutional layer adopts a ReLU activation function, the grouped convolutional layer and the third convolutional layer are connected to form a residual path, and the residual block adopts a local skip connection structure;
• the first execution module 120 may learn residual features with the residual block to obtain the residual feature map by: using the input of the residual block as the input of the grouped convolutional layer and extracting features through the residual path;
• performing feature fusion of the input of the residual block and the output of the third convolutional layer, inputting the fused result to the fourth convolutional layer for convolution processing, and outputting the residual feature map. A sketch of the residual and concatenated blocks follows.
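• a matching sketch of the residual block and the concatenated block under the same assumptions; the number of groups in the grouped convolution is an illustrative choice, and the feature fusion on the skip path is realized here as element-wise addition:

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        def __init__(self, channels, groups=4):  # groups=4 is illustrative
            super().__init__()
            self.grouped = nn.Conv2d(channels, channels, 3, padding=1, groups=groups)
            self.conv3 = nn.Conv2d(channels, channels, 3, padding=1)
            self.conv4 = nn.Conv2d(channels, channels, 3, padding=1)

        def forward(self, x):
            res = self.conv3(torch.relu(self.grouped(x)))  # residual path with ReLU
            return self.conv4(x + res)  # local skip connection, then 4th convolution

    class CascadeBlock(nn.Module):
        # Local cascade: residual blocks alternating with 1x1 "second" conv layers.
        def __init__(self, channels, n_res=3):
            super().__init__()
            self.res_blocks = nn.ModuleList(ResidualBlock(channels) for _ in range(n_res))
            self.second_convs = nn.ModuleList(
                nn.Conv2d(channels * (i + 2), channels, 1) for i in range(n_res))

        def forward(self, x):
            cascade, out = [x], x
            for res, conv in zip(self.res_blocks, self.second_convs):
                cascade.append(res(out))              # residual feature map
                out = conv(torch.cat(cascade, dim=1)) # channel stacking + fusion
            return out                                # intermediate feature map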
  • the second execution module 130 may be configured to use the sub-pixel convolution layer of the image reconstruction model to amplify the reconstructed feature map to obtain a reconstructed image.
  • the second execution module 130 may be specifically configured to: use a sub-pixel convolution layer to adjust the pixel positions in the reconstructed feature map to obtain a reconstructed image.
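• a minimal sketch of this sub-pixel convolution stage, assuming a 3-channel output image and PyTorch's PixelShuffle as the pixel-rearrangement operation:

    import torch.nn as nn

    class SubPixelUpsampler(nn.Module):
        def __init__(self, channels, scale, out_channels=3):
            super().__init__()
            # A convolution expands the channel count by scale^2, then PixelShuffle
            # rearranges those channels into a scale-times larger spatial grid.
            self.conv = nn.Conv2d(channels, out_channels * scale ** 2, 3, padding=1)
            self.shuffle = nn.PixelShuffle(scale)

        def forward(self, feat):
            return self.shuffle(self.conv(feat))  # (B, 3, scale*H, scale*W) image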
  • FIG. 19 shows a schematic block diagram of an image reconstruction model training apparatus 200 provided by an embodiment of the present application.
  • the model training apparatus 200 is applied to any electronic device with image processing function, and may include: a sample acquisition module 210 , a first processing module 220 , a second processing module 230 , a third processing module 240 and a fourth processing module 250 .
  • the sample acquisition module 210 may be configured to acquire training samples, where the training samples include low-resolution images and high-resolution images, and the low-resolution images are obtained by down-sampling the high-resolution images.
  • the first processing module 220 may be configured to input the low-resolution image into a pre-built image reconstruction model, where the image reconstruction model includes a feature extraction network and a sub-pixel convolutional layer.
  • the second processing module 230 may be configured to use a feature extraction network to perform multi-scale feature extraction on the low-resolution image and expand image channels to obtain a training feature map.
  • the third processing module 240 may be configured to use a sub-pixel convolutional layer to amplify the training feature map to obtain a training reconstructed image.
  • the fourth processing module 250 may be configured to perform back-propagation training on the image reconstruction model based on the training reconstructed image, the high-resolution image and the preset objective function to obtain a trained image reconstruction model.
• the objective function is an L2 loss function;
• the fourth processing module 250 may be specifically configured to: perform back-propagation training on the image reconstruction model based on the training reconstructed image, the high-resolution image and the L2 loss function, so as to adjust the parameters of the image reconstruction model until a preset training completion condition is reached, thereby obtaining the trained image reconstruction model. A sketch of one such training step follows.
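• a hedged sketch of one back-propagation step under the L2 objective, taken here as the mean squared error between the training reconstruction and the high-resolution ground truth:

    import torch.nn.functional as F

    def l2_train_step(model, lr_img, hr_img, optimizer):
        sr = model(lr_img)             # training reconstructed image
        loss = F.mse_loss(sr, hr_img)  # L2 objective against the high-resolution image
        optimizer.zero_grad()
        loss.backward()                # back-propagation adjusts model parameters
        optimizer.step()
        return loss.item()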
  • the first processing module 220 may also be configured to: prune the trained image reconstruction model, so as to retain long-line cascades and delete short-line cascades.
  • the first processing module 220 may also be configured to: perform flip symmetry processing on the low-resolution image to obtain at least one processed low-resolution image.
  • the second processing module 230 may be specifically configured to: input at least one processed low-resolution image into the image reconstruction model.
• the third processing module 240 may be specifically configured to: use the feature extraction network to perform multi-scale feature extraction on the at least one processed low-resolution image to obtain at least one auxiliary feature map; perform reverse flip-symmetry processing on the at least one auxiliary feature map; and average the results after the reverse flip-symmetry processing to obtain the training feature map.
  • FIG. 20 shows a schematic block diagram of an electronic device 10 provided by an embodiment of the present application.
  • the electronic device 10 may be a mobile terminal that executes the above image processing method, or may be any electronic device having an image processing function that executes the above model training method.
  • the electronic device 10 includes a processor 11 , a memory 12 and a bus 13 , and the processor 11 is connected to the memory 12 through the bus 13 .
  • the memory 12 is used to store programs, such as the image processing apparatus 100 shown in FIG. 18 or the model training apparatus 200 shown in FIG. 19 .
  • the image processing apparatus 100 includes at least one software function module that can be stored in the memory 12 in the form of software or firmware.
• the processor 11 executes the program to implement the image processing method disclosed in the above embodiments.
  • the memory 12 may include a high-speed random access memory (Random Access Memory, RAM), and may also include a non-volatile memory (non-volatile memory, NVM).
  • the processor 11 may be an integrated circuit chip with signal processing capability. In the implementation process, each step of the above-mentioned method can be completed by an integrated logic circuit of hardware in the processor 11 or an instruction in the form of software.
• the above-mentioned processor 11 can be a general-purpose processor, including a central processing unit (CPU), a microcontroller unit (MCU), a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), an embedded ARM, and other chips.
• Embodiments of the present application further provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by the processor 11, it implements the image processing method or the model training method disclosed in the foregoing embodiments.
• to sum up, in the image processing and model training methods, apparatuses, electronic devices and storage media provided by the embodiments of the present application, an image to be processed is acquired and input into an image reconstruction model that includes a feature extraction network and a sub-pixel convolution layer; the feature extraction network performs multi-scale feature extraction on the image to be processed and expands the image channels to obtain a reconstructed feature map, and the sub-pixel convolution layer then enlarges the reconstructed feature map to obtain the reconstructed image.
• in this way, the processing speed can be improved while the reconstruction effect is ensured.
  • FIG. 21 is a schematic diagram of an exemplary component of an electronic device provided in an embodiment of the present application.
• the electronic device may include a storage medium 2110, a processor 2120, machine-executable instructions 2130 (the machine-executable instructions 2130 may be the portrait super-resolution reconstruction apparatus 131 or the portrait super-resolution reconstruction model training apparatus 132 according to the present application) and a communication interface 140.
  • the storage medium 2110 and the processor 2120 are both located in the electronic device and are provided separately.
  • the storage medium 2110 may also be independent of the electronic device, and may be accessed by the processor 2120 through a bus interface.
  • the storage medium 2110 may also be integrated into the processor 2120, for example, may be a cache and/or a general purpose register.
• the machine-executable instructions 2130 can be understood as part of the electronic device described in FIG. 21 or of the processor 2120 of the electronic device, and can also be understood as a software function module that, independently of the electronic device or the processor 2120 described in FIG. 21, implements the above-mentioned portrait super-resolution reconstruction method or portrait super-resolution reconstruction model training method under the control of the electronic device.
  • the above-mentioned human portrait super-resolution reconstruction apparatus 131 may include a detection module 1311 , a processing module 1312 and a restoration module 1313 .
  • the functions of each functional module of the portrait super-resolution reconstruction apparatus 131 will be described in detail below.
  • the detection module 1311 can be configured to use a pre-built reconstruction model to perform key point detection on the image to be processed to obtain face key points;
  • the detection module 1311 may be configured to perform the above step S110, and for the detailed implementation of the detection module 1311, please refer to the above-mentioned content related to the step S110.
  • the processing module 1312 can be configured to perform super-resolution reconstruction processing according to the face key points and the image features obtained based on the to-be-processed image to obtain high-frequency image information;
• the processing module 1312 may be configured to execute the above-mentioned step S120, and for the detailed implementation of the processing module 1312, please refer to the above-mentioned content related to step S120.
  • the restoration module 1313 may be configured to perform restoration processing on the to-be-processed image by using the high-frequency information of the image to obtain a super-resolution image corresponding to the to-be-processed image.
  • the restoration module 1313 may be configured to perform the above step S130, and for the detailed implementation of the restoration module 1313, please refer to the above-mentioned content related to the step S130.
  • the portrait super-resolution reconstruction apparatus may further include: the image processing apparatus according to FIG. 18 , the image processing apparatus being configured to perform super-resolution reconstruction processing.
• the key point detection, super-resolution reconstruction processing and restoration processing include multiple rounds of iterative processing, and the to-be-processed image is either an unprocessed to-be-processed image or the super-resolution image obtained after the key point detection, super-resolution reconstruction processing and restoration processing of a previous round of iteration.
• there may be a plurality of face key points, and the above-mentioned restoration module 1313 can be configured to obtain a super-resolution image in the following manner: processing the to-be-processed image with a pre-built portrait cognition model to output the position information of each face key point; and, based on the position information of each face key point and the high-frequency information of the image, performing restoration processing on the to-be-processed image to obtain a super-resolution image corresponding to the to-be-processed image.
• the restoration module 1313 may be configured to obtain the super-resolution image based on the position information of each face key point and the high-frequency information of the image in the following manner: acquiring the restoration attribute corresponding to each face key point; and, according to each face key point and its corresponding position information, image high-frequency information and restoration attribute, performing restoration processing on the corresponding face key points in the to-be-processed image.
  • the reconstructed model includes a discriminator and a generation network, and the generation network is obtained after training with training samples under the supervision of the trained discriminator.
  • the face key points include left eye, right eye, nose, mouth and chin contours.
  • the above-mentioned portrait super-resolution reconstruction model training device 132 may include an acquisition module 1321 , a key point acquisition module 1322 , an output image acquisition module 1323 and a training module 1324 .
  • the functions of each functional module of the portrait super-resolution reconstruction model training device 132 will be described in detail below.
  • an acquisition module 1321 which can be configured to acquire training samples and target samples corresponding to the training samples
  • the acquisition module 1321 may be configured to perform the above step S2100, and for the detailed implementation of the acquisition module 1321, please refer to the above-mentioned content related to the step S2100.
  • the key point obtaining module 1322 can be configured to perform key point detection on the training sample by using the constructed generating network to obtain training key points;
  • the key point obtaining module 1322 may be configured to perform the above step S2200, and for the detailed implementation of the key point obtaining module 1322, reference may be made to the above-mentioned content related to step S2200.
  • the output image obtaining module 1323 can be configured to perform super-resolution reconstruction processing and restoration processing based on the training key points and the training samples to obtain an output image;
  • the output image obtaining module 1323 may be configured to perform the above-mentioned step S2300, and for the detailed implementation of the output image obtaining module 1323, reference may be made to the above-mentioned content related to the step S2300.
• the training module 1324 can be configured to compare the output image and the target sample, adjust the network parameters of the generation network based on the comparison result, and continue training until the reconstruction model is obtained when the first preset condition is met.
  • training module 1324 may be configured to perform the above-mentioned step S2400, and for the detailed implementation of the training module 1324, reference may be made to the above-mentioned content related to the step S2400.
• the training module 1324 may be configured to obtain the reconstruction model based on the comparison result between the output image and the target sample in the following manner: constructing a first loss function based on the difference between the pixel information of the output image and the pixel information of the target sample; constructing a second loss function based on the differences between each face key point in the output image and the corresponding face key point in the target sample; and continuing training after network parameter adjustment until the weighted function value of the first loss function and the second loss function satisfies the first preset condition.
  • the reconstruction model further includes a discriminator, and the discriminator is used to supervise the training of the generation network
• the portrait super-resolution reconstruction model training device 132 further includes a construction module, and the construction module is configured to: construct a discriminator, and use the discriminator to perform discrimination processing on the output image and the target sample corresponding to the output image; and adjust the parameters of the discriminator according to the obtained discrimination results until the trained discriminator is obtained when the second preset condition is satisfied.
• the training module 1324 can obtain the reconstruction model in the following manner: inputting the output image to the trained discriminator to obtain discrimination information; comparing the output image and the target sample to obtain a comparison result; and, after adjusting the network parameters of the generation network according to the discrimination information and the comparison result, continuing training until the reconstruction model is obtained when the first preset condition is satisfied.
• the training module 1324 may be configured to obtain the reconstruction model based on the discrimination information and the comparison result in the following manner: constructing a third loss function based on the discriminator's discrimination information on the output image, and constructing a fourth loss function based on the image difference between the output image and the target sample obtained by the pre-built portrait cognition model; and continuing training until the reconstruction model is obtained when the weighted function value satisfies the first preset condition.
• embodiments of the present application further provide a computer-readable storage medium that stores machine-executable instructions 130; when the machine-executable instructions 130 are executed, the portrait super-resolution reconstruction method or the portrait super-resolution reconstruction model training method provided by the above embodiments is implemented.
• the computer-readable storage medium can be a general-purpose storage medium, such as a removable disk or a hard disk; when the computer program on the computer-readable storage medium is run, the above-mentioned portrait super-resolution reconstruction method or portrait super-resolution reconstruction model training method can be executed.
• for the processes involved when the executable instructions on the computer-readable storage medium are executed, reference may be made to the relevant descriptions in the foregoing method embodiments, which will not be repeated here.
• to sum up, in the portrait super-resolution reconstruction method, the portrait super-resolution reconstruction model training method, the apparatuses, the electronic device and the readable storage medium provided by the embodiments of the present application, key point detection is performed on the image to be processed by using the pre-built reconstruction model to obtain face key points; super-resolution reconstruction processing is then performed according to the face key points and the image features obtained from the image to be processed, to obtain high-frequency image information; and the high-frequency image information is used to perform restoration processing on the image to be processed, to obtain the super-resolution image corresponding to the image to be processed.
• in this way, the super-resolution reconstruction of the image is realized by combining face key point detection with face restoration, and the recognizability of the obtained super-resolution image is improved, meeting the needs of users in practical applications.
  • the present application provides an image processing method, a portrait super-resolution reconstruction method, an image reconstruction model training method, a portrait super-resolution reconstruction model training method, and related devices, electronic equipment and storage media.
• the image reconstruction model includes a feature extraction network and a sub-pixel convolution layer.
• the feature extraction network is used to extract multi-scale features of the image to be processed and expand the image channels to obtain a reconstructed feature map, and the sub-pixel convolution layer is then used to enlarge the reconstructed feature map to obtain the reconstructed image. Since the feature extraction network can extract multi-scale features and expand image channels, a good reconstruction effect can be obtained without increasing the network depth; moreover, because image enlargement is performed at the end of the model by the sub-pixel convolution layer, the feature extraction network processes small-sized images, which greatly reduces the amount of calculation and the number of parameters, thereby improving the processing speed while ensuring the reconstruction effect.
• the image processing method, the portrait super-resolution reconstruction method, the image reconstruction model training method, the portrait super-resolution reconstruction model training method, and the related apparatuses, electronic devices and storage media are reproducible and can be used in a variety of industrial applications.
• the image processing method, the portrait super-resolution reconstruction method, the image reconstruction model training method, the portrait super-resolution reconstruction model training method, and the related apparatuses, electronic devices and storage media of the present application can be used in any apparatus that performs super-resolution reconstruction of a low-resolution image or image sequence.

Abstract

The embodiments of the present application relate to the technical field of computer vision, and provide an image processing and model training method and apparatus, an electronic device, and a storage medium. Said method comprises: acquiring an image to be processed and inputting same into an image reconstruction model, the image reconstruction model comprising a feature extraction network and a sub-pixel convolution layer, using the feature extraction network to perform multi-scale feature extraction and image channel extension on said image, so as to obtain a reconstructed feature map, and then using the sub-pixel convolution layer to enlarge the reconstructed feature map, so as to obtain a reconstructed image. As the feature extraction network can extract multi-scale features and extend an image channel, a good reconstruction effect can be obtained without increasing the network depth; moreover, the sub-pixel convolution layer is used at the end of the model to perform image enlargement, and the feature extraction network processes a small-sized image, greatly reducing the amount of calculation and parameters, thereby increasing the processing speed while ensuring the reconstruction effect. In addition, the embodiments of the present application further provide a portrait super-resolution reconstruction method and apparatus, a model training method and apparatus, an electronic device, and a readable storage medium.

Description

Image processing and portrait super-resolution reconstruction and model training method, apparatus, electronic device and storage medium
CROSS-REFERENCE TO RELATED APPLICATIONS
This disclosure claims priority to the Chinese patent application with application number 202010977254.4, entitled "Image Processing and Model Training Method, Apparatus, Electronic Device, and Storage Medium", filed with the State Intellectual Property Office of China on September 16, 2020, and to the Chinese patent application with application number 202011000670.5, entitled "Portrait Super-resolution Reconstruction Method, Model Training Method, Apparatus, Electronic Device and Readable Storage Medium", filed with the State Intellectual Property Office of China on September 22, 2020, the entire contents of which are incorporated into this disclosure by reference.
TECHNICAL FIELD
The present application relates to the technical field of computer vision, and in particular, to a method, apparatus, electronic device, and storage medium for image processing, super-resolution reconstruction of portraits, and model training.
BACKGROUND
Image super-resolution reconstruction, or image super-resolution restoration, refers to the process of restoring a given low-resolution image or image sequence into a corresponding high-resolution image through specific processing. It is widely used in fields where video or image quality needs to be improved, such as video image processing, medical imaging, remote sensing imaging, and video surveillance.
At present, when super-resolution reconstruction is performed by deep learning algorithms, a network with a sufficient depth of layers needs to be used to obtain a better reconstruction effect. Therefore, the network structure is usually complicated, and the amount of calculation is large, which affects the processing speed.
In addition, image super-resolution reconstruction technology is also widely used in many fields such as face recognition, big data analysis, and security, and is of great help in achieving portrait restoration, portrait recognition, and matching. However, in the current image super-resolution reconstruction process, for example when performing super-resolution reconstruction of a human portrait, the method usually adopted is to reconstruct the entire image; because this method does not focus on the information that is more important to human visual perception, the reconstructed image often fails to meet actual needs.
SUMMARY OF THE INVENTION
Embodiments of the present application provide an image processing and model training method, apparatus, electronic device, and storage medium, so as to improve the processing speed while ensuring the reconstruction effect.
Embodiments of the present application also provide a portrait super-resolution reconstruction method, a model training method, an apparatus, an electronic device, and a readable storage medium, which can improve the recognition of the obtained super-resolution image and meet user requirements.
In order to achieve the above purpose, the technical solutions adopted in the embodiments of the present application are as follows:
Some embodiments of the present application provide an image processing method, and the method may include:
acquiring an image to be processed;
inputting the to-be-processed image into an image reconstruction model, and using the feature extraction network of the image reconstruction model to perform multi-scale feature extraction on the to-be-processed image and expand image channels to obtain a reconstructed feature map;
enlarging the reconstructed feature map by using the sub-pixel convolution layer of the image reconstruction model to obtain a reconstructed image.
In an optional embodiment, the feature extraction network may include a convolutional layer, a plurality of concatenated blocks and a plurality of first convolutional layers, the plurality of concatenated blocks and the plurality of first convolutional layers being alternately arranged, and the feature extraction network may adopt a global cascade structure;
the step of using the feature extraction network of the image reconstruction model to perform multi-scale feature extraction on the to-be-processed image to obtain a reconstructed feature map may include:
inputting the image to be processed into the convolutional layer for convolution processing to obtain an initial feature map;
using the initial feature map as the input of the first concatenated block and the output of the (N-1)-th first convolutional layer as the input of the N-th concatenated block, performing multi-scale feature extraction with the concatenated blocks, and outputting intermediate feature maps;
performing channel stacking on the initial feature map and the intermediate feature map output by each concatenated block before the N-th first convolutional layer, and inputting the stacked result to the N-th first convolutional layer for convolution processing;
using the output of the last first convolutional layer as the reconstructed feature map.
In an optional implementation manner, the number of the concatenated blocks may be 3 to 5, and the number of the first convolutional layers may be 3 to 5.
In an optional embodiment, the concatenated block may include a plurality of residual blocks and a plurality of second convolutional layers, the plurality of residual blocks and the plurality of second convolutional layers being alternately arranged, and the concatenated block may adopt a local cascade structure;
the step of using the concatenated block to perform multi-scale feature extraction and output an intermediate feature map may include:
taking the input of the concatenated block as the input of the first residual block and the output of the (N-1)-th second convolutional layer as the input of the N-th residual block, and learning residual features with the residual blocks to obtain residual feature maps;
performing channel stacking on the input of the concatenated block and the output of each residual block before the N-th second convolutional layer, and inputting the stacked result to the N-th second convolutional layer for convolution processing;
using the output of the last second convolutional layer as the intermediate feature map.
In an optional implementation manner, the number of the residual blocks may be 3 to 5, and the number of the second convolutional layers may be 3 to 5.
In an optional embodiment, the residual block may include a grouped convolutional layer, a third convolutional layer and a fourth convolutional layer, the grouped convolutional layer adopting a ReLU activation function, the grouped convolutional layer and the third convolutional layer being connected to form a residual path, and the residual block adopting a local skip connection structure;
the step of learning residual features with the residual block to obtain a residual feature map may include:
using the input of the residual block as the input of the grouped convolutional layer, and extracting features through the residual path;
performing feature fusion of the input of the residual block and the output of the third convolutional layer, inputting the fused result to the fourth convolutional layer for convolution processing, and outputting the residual feature map.
In an optional embodiment, the step of using the sub-pixel convolution layer of the image reconstruction model to enlarge the reconstructed feature map to obtain a reconstructed image may include:
using the sub-pixel convolution layer to adjust the pixel positions in the reconstructed feature map to obtain the reconstructed image.
Other embodiments of the present application further provide an image reconstruction model training method, and the method may include:
acquiring training samples, where the training samples include low-resolution images and high-resolution images, and the low-resolution images are obtained by down-sampling the high-resolution images;
inputting the low-resolution image into a pre-built image reconstruction model, where the image reconstruction model includes a feature extraction network and a sub-pixel convolution layer;
using the feature extraction network to perform multi-scale feature extraction on the low-resolution image and expand image channels to obtain a training feature map;
using the sub-pixel convolution layer to enlarge the training feature map to obtain a training reconstructed image;
performing back-propagation training on the image reconstruction model based on the training reconstructed image, the high-resolution image and a preset objective function to obtain a trained image reconstruction model.
In an optional embodiment, the objective function may be an L2 loss function;
the step of performing back-propagation training on the image reconstruction model based on the training reconstructed image, the high-resolution image and the preset objective function to obtain the trained image reconstruction model may include:
performing back-propagation training on the image reconstruction model based on the training reconstructed image, the high-resolution image and the L2 loss function, so as to adjust the parameters of the image reconstruction model until a preset training completion condition is reached, thereby obtaining the trained image reconstruction model.
In an optional embodiment, the image reconstruction model training method may further include:
pruning the trained image reconstruction model, so as to retain long-line cascades and delete short-line cascades.
In an optional embodiment, before the step of inputting the low-resolution image into a pre-built image reconstruction model, the method may further include:
performing mean-subtraction ("self-reduction") processing on the low-resolution image to highlight the texture details of the low-resolution image.
In an optional embodiment, before the step of inputting the low-resolution image into a pre-built image reconstruction model, the method may further include:
performing flip-symmetry processing on the low-resolution image to obtain at least one processed low-resolution image;
the step of inputting the low-resolution image into a pre-built image reconstruction model may include:
inputting the at least one processed low-resolution image into the image reconstruction model;
the step of using the feature extraction network to perform multi-scale feature extraction on the low-resolution image to obtain a training feature map may include:
using the feature extraction network to perform multi-scale feature extraction on the at least one processed low-resolution image to obtain at least one auxiliary feature map;
performing reverse flip-symmetry processing on the at least one auxiliary feature map, and averaging the results after the reverse flip-symmetry processing to obtain the training feature map.
Still other embodiments of the present application further provide an image processing apparatus, and the apparatus may include:
an image acquisition module, which can be configured to acquire an image to be processed;
a first execution module, which can be configured to input the image to be processed into an image reconstruction model, and use the feature extraction network of the image reconstruction model to perform multi-scale feature extraction on the to-be-processed image and expand image channels to obtain a reconstructed feature map;
a second execution module, which can be configured to use the sub-pixel convolution layer of the image reconstruction model to enlarge the reconstructed feature map to obtain a reconstructed image.
Still other embodiments of the present application further provide an image reconstruction model training apparatus, and the apparatus may include:
a sample acquisition module, which can be configured to acquire training samples, where the training samples include low-resolution images and high-resolution images, and the low-resolution images are obtained by down-sampling the high-resolution images;
a first processing module, which can be configured to input the low-resolution image into a pre-built image reconstruction model, the image reconstruction model including a feature extraction network and a sub-pixel convolution layer;
a second processing module, which can be configured to use the feature extraction network to perform multi-scale feature extraction on the low-resolution image and expand image channels to obtain a training feature map;
a third processing module, which can be configured to use the sub-pixel convolutional layer to enlarge the training feature map to obtain a training reconstructed image;
a fourth processing module, which can be configured to perform back-propagation training on the image reconstruction model based on the training reconstructed image, the high-resolution image and a preset objective function to obtain a trained image reconstruction model.
Compared with the prior art, in the image processing and model training method, apparatus, electronic device and storage medium provided by the embodiments of the present application, an image to be processed is acquired and input into an image reconstruction model that includes a feature extraction network and a sub-pixel convolution layer; the feature extraction network is first used to perform multi-scale feature extraction on the image to be processed and expand the image channels to obtain a reconstructed feature map, and the sub-pixel convolution layer is then used to enlarge the reconstructed feature map to obtain a reconstructed image. Since the feature extraction network can extract multi-scale features and expand image channels, a good reconstruction effect can be obtained without increasing the network depth; meanwhile, because the sub-pixel convolution layer at the end of the model performs the image enlargement, the feature extraction network processes small-sized images, which greatly reduces the amount of calculation and the number of parameters, thereby improving the processing speed while ensuring the reconstruction effect.
Some embodiments of the present application provide a portrait super-resolution reconstruction method, and the method may include:
using an image reconstruction model to perform key point detection on an image to be processed to obtain face key points;
performing super-resolution reconstruction processing according to the face key points and the image features obtained based on the to-be-processed image to obtain high-frequency image information;
performing restoration processing on the to-be-processed image by using the high-frequency image information to obtain a super-resolution image corresponding to the to-be-processed image.
In an optional embodiment, the image processing method described above is used to perform the super-resolution reconstruction processing.
In an optional implementation manner, the key point detection, super-resolution reconstruction processing and restoration processing may include multiple rounds of iterative processing, and the to-be-processed image is an unprocessed to-be-processed image, or the super-resolution image obtained after the key point detection, super-resolution reconstruction processing and restoration processing of a previous round of iteration.
In an optional implementation manner, there may be a plurality of face key points, and the step of performing restoration processing on the to-be-processed image by using the high-frequency image information to obtain a super-resolution image corresponding to the to-be-processed image may include:
processing the to-be-processed image by using a pre-built portrait cognition model, and outputting the position information of each face key point;
performing restoration processing on the to-be-processed image based on the position information of each face key point and the high-frequency image information to obtain a super-resolution image corresponding to the to-be-processed image.
In an optional implementation manner, the step of performing restoration processing on the to-be-processed image based on the position information of each face key point and the high-frequency image information to obtain a super-resolution image corresponding to the to-be-processed image may include:
acquiring the restoration attribute corresponding to each face key point;
performing restoration processing on the corresponding face key points in the to-be-processed image according to each face key point and its corresponding position information, high-frequency image information and restoration attribute.
In an optional embodiment, the reconstruction model may include a discriminator and a generation network, and the generation network is obtained after training with training samples under the supervision of the trained discriminator.
In an optional embodiment, the face key points may include the contours of the left eye, the right eye, the nose, the mouth and the chin.
Other embodiments of the present application provide a method for training a portrait super-resolution reconstruction model, and the method may include:
acquiring a training sample and a target sample corresponding to the training sample;
performing key point detection on the training sample by using the constructed generation network to obtain training key points;
performing super-resolution reconstruction processing and restoration processing based on the training key points and the training sample to obtain an output image;
comparing the output image and the target sample, adjusting the network parameters of the generation network based on the comparison result, and continuing training until a reconstructed model is obtained when a first preset condition is satisfied.
In an optional implementation, the step of comparing the output image with the target sample, adjusting network parameters of the generation network based on the comparison result, and continuing training until the first preset condition is satisfied to obtain the reconstruction model may include the following (a sketch of the resulting weighted objective is given after these steps):
constructing a first loss function based on the difference between pixel information of the output image and pixel information of the target sample;
constructing a second loss function based on the difference between each face key point in the output image and the corresponding face key point in the target sample;
comparing the output image with the target sample, adjusting network parameters of the generation network based on the comparison result, and continuing training until the weighted value of the first loss function and the second loss function satisfies the first preset condition, so as to obtain the reconstruction model.
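As an illustration only, the following is a minimal PyTorch-style sketch of the weighted two-term objective described above; the use of L1 distances and the weights `w_pix` and `w_kpt` are assumptions, since the patent does not prescribe a particular distance measure or framework.

```python
import torch
import torch.nn.functional as F

def reconstruction_loss(output_img, target_img,
                        output_kpts, target_kpts,
                        w_pix=1.0, w_kpt=0.1):
    """Weighted sum of a pixel loss and a face-key-point loss (illustrative).

    output_img/target_img: (N, C, H, W) image tensors.
    output_kpts/target_kpts: (N, K, 2) key-point coordinates.
    """
    # First loss: difference between pixel information of the two images.
    l_pix = F.l1_loss(output_img, target_img)
    # Second loss: difference between corresponding face key points.
    l_kpt = F.l1_loss(output_kpts, target_kpts)
    return w_pix * l_pix + w_kpt * l_kpt
```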
In an optional implementation, the reconstruction model further includes a discriminator for supervising the training of the generation network, and the method may further include the following (see the sketch after these steps):
constructing a discriminator, and using the discriminator to discriminate the output image and the target sample corresponding to the output image;
adjusting parameters of the discriminator according to the obtained discrimination result, until a second preset condition is satisfied, so as to obtain a trained discriminator.
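A minimal sketch of one discriminator update under the supervision scheme described above, assuming a binary real/fake objective; the patent states only that the discriminator's parameters are adjusted according to the discrimination result until a second preset condition is met.

```python
import torch
import torch.nn.functional as F

def train_discriminator_step(discriminator, optimizer, output_img, target_img):
    """One illustrative parameter update of the discriminator."""
    optimizer.zero_grad()
    # Discriminate the target sample (real) and the output image (fake).
    real_logits = discriminator(target_img)
    fake_logits = discriminator(output_img.detach())
    loss = (F.binary_cross_entropy_with_logits(real_logits,
                                               torch.ones_like(real_logits))
            + F.binary_cross_entropy_with_logits(fake_logits,
                                                 torch.zeros_like(fake_logits)))
    loss.backward()
    optimizer.step()
    return loss.item()
```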
In an optional implementation, the step of comparing the output image with the target sample, adjusting network parameters of the generation network based on the comparison result, and continuing training until the first preset condition is satisfied to obtain the reconstruction model may include:
inputting the output image into the trained discriminator to obtain discrimination information;
comparing the output image with the target sample to obtain a comparison result;
adjusting network parameters of the generation network according to the discrimination information and the comparison result, and continuing training until the first preset condition is satisfied, so as to obtain the reconstruction model.
In an optional implementation, the step of adjusting network parameters of the generation network according to the discrimination information and the comparison result and continuing training until the first preset condition is satisfied to obtain the reconstruction model may include the following (a sketch of the four-term objective follows these steps):
constructing a first loss function based on the difference between pixel information of the output image and pixel information of the target sample;
constructing a second loss function based on the difference between each face key point in the output image and the corresponding face key point in the target sample;
constructing a third loss function based on the discrimination information of the output image given by the discriminator, and constructing a fourth loss function based on the image difference between the output image and the target sample obtained by a pre-built portrait cognition model;
adjusting network parameters of the generation network according to the discrimination information and the comparison result, and continuing training until the weighted value of the first loss function, the second loss function, the third loss function, and the fourth loss function satisfies the first preset condition, so as to obtain the reconstruction model.
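Extending the earlier two-term sketch, the four-term weighted objective might be combined as follows; the adversarial form of the third loss, the L1 form of the fourth loss, and all weights are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def generator_total_loss(l_pix, l_kpt, fake_logits, cog_out, cog_tgt,
                         w=(1.0, 0.1, 0.01, 0.05)):
    """Weighted sum of the four loss terms (illustrative weights).

    l_pix / l_kpt: first and second losses from the earlier sketch.
    fake_logits: discriminator output on the generated image (third loss).
    cog_out / cog_tgt: portrait-cognition-model outputs for the generated
    image and the target sample (fourth loss).
    """
    l_adv = F.binary_cross_entropy_with_logits(
        fake_logits, torch.ones_like(fake_logits))
    l_cog = F.l1_loss(cog_out, cog_tgt)
    return w[0] * l_pix + w[1] * l_kpt + w[2] * l_adv + w[3] * l_cog
```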
Still other embodiments of the present application provide a portrait super-resolution reconstruction apparatus. The apparatus may include:
a detection module, which may be configured to perform key point detection on an image to be processed by using a pre-built reconstruction model, to obtain face key points;
a processing module, which may be configured to perform super-resolution reconstruction according to the face key points and image features obtained based on the image to be processed, to obtain high-frequency image information;
a restoration module, which may be configured to restore the image to be processed by using the high-frequency image information, to obtain a super-resolution image corresponding to the image to be processed.
In an optional implementation, the portrait super-resolution reconstruction apparatus may further include the image processing apparatus described above, where the image processing apparatus may be configured to perform the super-resolution reconstruction processing.
Still other embodiments of the present application provide a portrait super-resolution reconstruction model training apparatus. The training apparatus may include:
an obtaining module, which may be configured to obtain a training sample and a target sample corresponding to the training sample;
a key point obtaining module, which may be configured to perform key point detection on the training sample by using a constructed generation network, to obtain training key points;
an output image obtaining module, which may be configured to perform super-resolution reconstruction and restoration based on the training key points and the training sample, to obtain an output image;
a training module, which may be configured to compare the output image with the target sample, adjust network parameters of the generation network based on the comparison result, and continue training until a first preset condition is satisfied, so as to obtain a reconstruction model.
Still other embodiments of the present application provide an electronic device. The electronic device may include one or more processors and one or more storage media storing one or more machine-executable instructions that, when executed by the one or more processors, cause the one or more processors to implement the image processing method according to some embodiments, the image reconstruction model training method according to other embodiments, the portrait super-resolution reconstruction method according to still other embodiments, or the portrait super-resolution reconstruction model training method according to still other embodiments.
Still other embodiments of the present application provide a computer-readable storage medium storing machine-executable instructions that, when executed, implement the image processing method according to some embodiments, the image reconstruction model training method according to other embodiments, the portrait super-resolution reconstruction method according to still other embodiments, or the portrait super-resolution reconstruction model training method according to still other embodiments.
The beneficial effects of the embodiments of the present application include, for example:
In the portrait super-resolution reconstruction method, model training method, apparatus, electronic device, and readable storage medium provided by the present application, key point detection is performed on an image to be processed by using a pre-built reconstruction model to obtain face key points; super-resolution reconstruction is then performed according to the face key points and image features obtained based on the image to be processed, to obtain high-frequency image information; and the image to be processed is restored by using the high-frequency image information, to obtain a super-resolution image corresponding to the image to be processed. In the present application, super-resolution reconstruction of the image is achieved by combining face key point detection with face restoration, improving the recognizability of the resulting super-resolution image and meeting user needs in practical applications.
Description of Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present application and should therefore not be regarded as limiting the scope; those of ordinary skill in the art can derive other related drawings from these drawings without creative effort.
FIG. 1 shows an application scenario diagram of the image processing method provided by an embodiment of the present application.
FIG. 2 shows a schematic flowchart of the image processing method provided by an embodiment of the present application.
FIG. 3 shows an example diagram of the image reconstruction model provided by an embodiment of the present application.
FIG. 4 shows an example diagram of a cascade block provided by an embodiment of the present application.
FIG. 5 shows an example diagram of a residual block provided by an embodiment of the present application.
FIG. 6 shows another example diagram of the image reconstruction model provided by an embodiment of the present application.
FIG. 7 shows a presentation of image processing results provided by an embodiment of the present application.
FIG. 8 is a flowchart of the portrait super-resolution reconstruction method provided by an embodiment of the present application.
FIG. 9 is a schematic diagram of the processing flow of the portrait super-resolution reconstruction method provided by an embodiment of the present application.
FIG. 10 is another schematic diagram of the processing flow of the portrait super-resolution reconstruction method provided by an embodiment of the present application.
FIG. 11 is a flowchart of a method for obtaining a super-resolution image in the portrait super-resolution reconstruction method provided by an embodiment of the present application.
FIG. 12 is yet another schematic diagram of the processing flow of the portrait super-resolution reconstruction method provided by an embodiment of the present application.
FIG. 13 shows a schematic flowchart of the image reconstruction model training method provided by an embodiment of the present application.
FIG. 14 is a flowchart of the portrait super-resolution reconstruction model training method provided by an embodiment of the present application.
FIG. 15 is a first flowchart of a method for obtaining a reconstruction model in the portrait super-resolution reconstruction model training method provided by an embodiment of the present application.
FIG. 16 is a second flowchart of a method for obtaining a reconstruction model in the portrait super-resolution reconstruction model training method provided by an embodiment of the present application.
FIGS. 17(a) to 17(c) are schematic diagrams of output images obtained by an interpolation method, a method without a discriminator, and a method with a discriminator, respectively.
FIG. 18 shows a schematic block diagram of the image processing apparatus provided by an embodiment of the present application.
FIG. 19 shows a schematic block diagram of the image reconstruction model training apparatus provided by an embodiment of the present application.
FIG. 20 shows a schematic block diagram of an electronic device provided by an embodiment of the present application.
FIG. 21 is a structural block diagram of an electronic device provided by an embodiment of the present application.
FIG. 22 is a block diagram of the functional modules of the portrait super-resolution reconstruction apparatus provided by an embodiment of the present application.
FIG. 23 is a block diagram of the functional modules of the portrait super-resolution reconstruction model training apparatus provided by an embodiment of the present application.
Reference numerals: 10 - electronic device; 11 - processor; 12 - memory; 13 - bus; 20 - first terminal; 30 - second terminal; 40 - network; 50 - server; 100 - image processing apparatus; 110 - image acquisition module; 120 - first execution module; 130 - second execution module; 200 - model training apparatus; 210 - sample acquisition module; 220 - first processing module; 230 - second processing module; 240 - third processing module; 250 - fourth processing module.
2110 - storage medium; 2120 - processor; 2130 - machine-executable instructions; 131 - portrait super-resolution reconstruction apparatus; 1311 - detection module; 1312 - processing module; 1313 - restoration module; 132 - model training apparatus; 1321 - obtaining module; 1322 - key point obtaining module; 1323 - output image obtaining module; 1324 - training module; 140 - communication interface.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. The components of the embodiments of the present application, as generally described and illustrated in the drawings herein, may be arranged and designed in a variety of different configurations.
Therefore, the following detailed description of the embodiments of the present application provided in the accompanying drawings is not intended to limit the scope of the claimed application, but merely represents selected embodiments of the present application. Based on the embodiments of the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it does not need to be further defined or explained in subsequent drawings.
In the description of the present application, it should be noted that terms such as "first" and "second", if present, are used only to distinguish between descriptions and should not be construed as indicating or implying relative importance. It should also be noted that the features of the embodiments of the present application may be combined with one another in the absence of conflict.
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings of the embodiments of the present application.
Referring to FIG. 1, FIG. 1 shows an application scenario diagram of the image processing method provided by an embodiment of the present application, including a first terminal 20, a second terminal 30, a network 40, and a server 50. The first terminal 20 and the second terminal 30 are both connected to the server 50 through the network 40. The first terminal 20 and the second terminal 30 may be mobile terminals on which various applications (Apps) may be installed, for example, a video playback App, an instant messaging App, a video/image capture App, or a shopping App. The network 40 may be a wide area network, a local area network, or a combination of the two, and uses wireless links for data transmission.
The first terminal 20 and the second terminal 30 may be any mobile terminals with a screen display function, for example, a smartphone, a laptop computer, a tablet computer, a desktop computer, or a smart TV.
The first terminal 20 may upload a video file or picture to the server 50, and the server 50 may store the video file or picture after receiving it. When a user watches a video or views a picture through the second terminal 30, the second terminal 30 may request the video file or picture from the server 50, and the server 50 may return it to the second terminal 30. Usually, to increase transmission speed, the video file or picture is compressed, so its resolution is low.
After receiving the video file or picture, the second terminal 30 may process it in real time by using the image processing method provided by the embodiments of the present application to obtain a high-resolution video or picture and display it on the display interface of the second terminal 30, thereby improving the user's picture-quality experience. The image processing method provided by the embodiments of the present application may be integrated as a functional plug-in into a video playback App or a gallery App of the second terminal 30.
Taking a live video scenario as an example, the first terminal 20 may be the host's mobile terminal, and the second terminal 30 may be a viewer's mobile terminal. During a live broadcast, the first terminal 20 may upload the live video to the server 50, and the server 50 may store it. When a viewer watches the live broadcast through the second terminal 30, the server 50 may return the live video to the second terminal 30. After receiving the live video, the second terminal 30 may process it in real time by using the image processing method provided by the embodiments of the present application to obtain and display a high-resolution live video, so that the viewer can watch a clear live video.
It should be pointed out that the image processing method provided by the embodiments of the present application can be applied to mobile terminals. Although the above description takes application to the second terminal 30 as an example, it should be understood that the image processing method can also be applied to the first terminal 20; the specific application can be determined according to the actual application scenario and is not limited here.
The image processing method provided by the embodiments of the present application is described in detail below.
On the basis of the application scenario shown in FIG. 1, please refer to FIG. 2, which shows a schematic flowchart of the image processing method provided by an embodiment of the present application. The image processing method may include the following steps:
S101: Acquire an image to be processed.
The image to be processed may be a picture displayed on a mobile terminal, or a video frame in a video stream, that requires super-resolution reconstruction to improve image quality; for example, it may be a video frame in a low-resolution video obtained by the second terminal 30 from the server 50.
In this embodiment, the mobile terminal may perform super-resolution reconstruction directly upon receiving a low-resolution picture or video file; alternatively, it may first display the received picture or video and perform super-resolution reconstruction only when the user performs a resolution-switching operation. For example, a received low-resolution video may be played first, and super-resolution reconstruction may be performed when the user switches the definition from "standard definition" to "ultra high definition".
S102: Input the image to be processed into an image reconstruction model, and use a feature extraction network of the image reconstruction model to perform multi-scale feature extraction on the image to be processed and expand image channels, to obtain a reconstructed feature map.
After the image to be processed is acquired, it is input into the image reconstruction model for super-resolution reconstruction. Referring to FIG. 3, the image reconstruction model includes a feature extraction network and a sub-pixel convolution layer. The feature extraction network is used to extract multi-scale features of the image to be processed and expand image channels, and the sub-pixel convolution layer is used to upscale the reconstructed feature map output by the feature extraction network.
Multi-scale feature extraction refers to extracting feature information at different levels by means of global and local cascading; for example, features can be extracted step by step from lower layers to higher layers, or low-layer information can be passed directly to higher layers.
An image channel refers to one or more color channels obtained by dividing an image according to its color components. Images can usually be classified into single-channel, three-channel, and four-channel images. In a single-channel image, each pixel is represented by a single value, for example, a grayscale image; in a three-channel image, each pixel is represented by three values, for example, an RGB color image; a four-channel image adds transparency (an alpha channel) to a three-channel image.
Expanding image channels means increasing the number of channels of an image without changing its size. For example, the input is an image of H×W×C, where H×W is the size of the input image and C is its number of channels; the output is an image of H×W×r²C, where H×W is the size of the output image and r²C is its number of channels.
S103: Use the sub-pixel convolution layer of the image reconstruction model to upscale the reconstructed feature map, to obtain a reconstructed image.
A sub-pixel convolution layer, also known as pixel shuffle (PixelShuffle), is a convolution layer that can be computed efficiently. Its main function is to turn low-resolution feature maps into a high-resolution feature map through convolution and reorganization across multiple channels. Compared with hand-crafted upsampling filters such as bilinear or bicubic samplers, a sub-pixel convolution layer can learn more complex upsampling operations through training, while the overall computation time is reduced.
For example, if the input feature map is H×W×r²C, the main function of the sub-pixel convolution layer is to combine the feature maps of the r² channels into a new upsampled result of r×H by r×W, that is, (r×H)×(r×W)×C, obtaining an output image of rH×rW×C and completing an r-fold upscaling from the input feature map to the output image.
The working process of the sub-pixel convolution layer may be as follows: first, each original low-resolution pixel is divided into r×r small cells; then, according to certain rules, these small cells are filled with the values at the corresponding positions of the r×r input feature maps; the reorganization is completed once the small cells divided from every low-resolution pixel are filled in the same way.
In one embodiment, the sub-pixel convolution layer may be used to rearrange the pixel positions in the reconstructed feature map, to obtain the reconstructed image.
For example, if the reconstructed feature map output by the feature extraction network is H×W×r²C, the sub-pixel convolution layer rearranges the pixel positions to obtain a reconstructed image of rH×rW×C, thereby completing the r-fold upscaling.
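As an illustration of the channel expansion and pixel rearrangement just described, the following is a minimal PyTorch-style sketch, assuming an input with C channels and an upscaling factor r; the layer sizes are assumptions, and the patent does not prescribe a specific framework.

```python
import torch
import torch.nn as nn

r, C = 2, 3  # upscaling factor and number of output channels (illustrative)

# Expand channels without changing spatial size: H×W×C -> H×W×r²C.
expand = nn.Conv2d(C, C * r * r, kernel_size=3, padding=1)
# Rearrange the r² channel groups into an r-fold larger image: H×W×r²C -> rH×rW×C.
shuffle = nn.PixelShuffle(r)

x = torch.randn(1, C, 64, 64)   # a 64×64 low-resolution input
y = shuffle(expand(x))
print(y.shape)                  # torch.Size([1, 3, 128, 128])
```

Stacking two such 2× stages would likewise realize the 4× case mentioned next.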
In this embodiment, the sub-pixel convolution layer can support multiple upscaling factors; for example, two 2× sub-pixel convolution layers can be combined to perform a 4× upscaling operation, or a 2× and a 3× sub-pixel convolution layer can be combined to perform a 6× upscaling operation.
Moreover, existing super-resolution reconstruction algorithms first interpolate to high resolution and then make corrections, whereas the image reconstruction model in the embodiments of the present application places the sub-pixel convolution layer at the end for upscaling. This ensures that the feature extraction network in the front of the model processes small-sized images, greatly reducing the amount of computation and the number of parameters.
Step S102 is described in detail below.
Referring again to FIG. 3, the feature extraction network includes a convolution layer, multiple cascade blocks, and multiple first convolution layers, with the cascade blocks and the first convolution layers arranged alternately; the feature extraction network adopts a global cascade structure. The global cascade structure refers to the left and right fast paths in FIG. 3: through the left fast path, the output of a cascade block can be fed directly to each first convolution layer after that cascade block, and through the right fast path, the output of the convolution layer can be fed directly to each first convolution layer. Here, "fed" refers to channel stacking, not data addition.
On the basis of the feature extraction network shown in FIG. 3, in step S102, the manner of using the feature extraction network of the image reconstruction model to perform multi-scale feature extraction on the image to be processed and expand image channels to obtain the reconstructed feature map may include the following steps (an illustrative sketch is given after the channel-stacking explanation below):
inputting the image to be processed into the convolution layer for convolution processing, to obtain an initial feature map;
taking the initial feature map as the input of the first cascade block and the output of the (N-1)-th first convolution layer as the input of the N-th cascade block, and performing multi-scale feature extraction with the cascade blocks to output intermediate feature maps;
channel-stacking the initial feature map with the intermediate feature map output by each cascade block before the N-th first convolution layer, and inputting the stacked result into the N-th first convolution layer for convolution processing;
taking the output of the last first convolution layer as the reconstructed feature map.
The convolution layer and the first convolution layers can expand image channels, and the convolution layer, the cascade blocks, and the first convolution layers can all extract features.
Channel-stacking the initial feature map and an intermediate feature map means merging the channels of the initial feature map with those of the intermediate feature map. For example, if the initial feature map has 4 channels and the intermediate feature map has 8 channels, the stacked feature map has 12 channels; in other words, each pixel in the initial feature map is represented by 4 values, each pixel in the intermediate feature map by 8 values, and each pixel in the stacked feature map by 12 values.
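The following is a minimal sketch, assuming a PyTorch implementation, of the global cascade forward pass described above; `block_cls` stands in for the cascade block described next, and the channel counts are illustrative assumptions rather than values fixed by the patent.

```python
import torch
import torch.nn as nn

class FeatureExtractionNet(nn.Module):
    """Global cascade: each 1x1 'first convolution layer' sees the initial
    feature map stacked with the outputs of all preceding cascade blocks."""
    def __init__(self, block_cls, channels=64, num_blocks=3):
        super().__init__()
        self.entry = nn.Conv2d(3, channels, 3, padding=1)
        self.blocks = nn.ModuleList(
            [block_cls(channels) for _ in range(num_blocks)])
        # The i-th first convolution layer fuses (i + 2) stacked feature maps.
        self.fuse = nn.ModuleList(
            [nn.Conv2d(channels * (i + 2), channels, 1)
             for i in range(num_blocks)])

    def forward(self, x):
        feat = self.entry(x)            # initial feature map
        stacked = [feat]
        out = feat
        for block, fuse in zip(self.blocks, self.fuse):
            stacked.append(block(out))  # intermediate feature map
            out = fuse(torch.cat(stacked, dim=1))  # channel stacking + conv
        return out                      # reconstructed feature map
```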
In one embodiment, the structure of a cascade block is shown in FIG. 4. A cascade block includes multiple residual blocks and multiple second convolution layers, with the residual blocks and the second convolution layers arranged alternately; the cascade block adopts a local cascade structure. The local cascade structure refers to the left and right fast paths in FIG. 4: through the left fast path, the output of a residual block can be fed directly to each second convolution layer after that residual block, and through the right fast path, the input of the cascade block can be fed directly to each second convolution layer. As above, "fed" here refers to channel stacking, not data addition.
On the basis of the cascade block shown in FIG. 4, the manner of performing multi-scale feature extraction with a cascade block and outputting an intermediate feature map may include:
taking the input of the cascade block as the input of the first residual block and the output of the (N-1)-th second convolution layer as the input of the N-th residual block, and learning residual features with the residual blocks to obtain residual feature maps;
channel-stacking the input of the cascade block with the output of each residual block before the N-th second convolution layer, and inputting the stacked result into the N-th second convolution layer for convolution processing;
taking the output of the last second convolution layer as the intermediate feature map.
The second convolution layers can expand image channels, and the residual blocks and the second convolution layers can extract features.
The process of channel-stacking the input of the cascade block with the outputs of the residual blocks is similar to the above process of channel-stacking the initial feature map with the intermediate feature maps, and is not repeated here.
In one embodiment, the structure of a residual block is shown in FIG. 5. A residual block may include grouped convolution layers, a third convolution layer, and a fourth convolution layer. The grouped convolution layers use the ReLU activation function, and the grouped convolution layers and the third convolution layer are connected to form a residual path; the residual block adopts a local skip connection structure. The local skip connection structure means fusing the input of the residual block with the output of the residual path to learn residual features.
On the basis of the residual block shown in FIG. 5, the manner of learning residual features with a residual block to obtain a residual feature map may include the following (see the sketch after the grouped-convolution notes below):
taking the input of the residual block as the input of the grouped convolution layers, and extracting features through the residual path;
fusing the input of the residual block with the output of the third convolution layer, and inputting the fused result into the fourth convolution layer for convolution processing, to output the residual feature map.
The third and fourth convolution layers can expand image channels, and the grouped convolution layers can extract features.
A grouped convolution layer (group convolution layer) divides the input feature map into groups, and each group is then convolved separately. Compared with regular convolution, grouped convolution reduces the number of model parameters, thereby increasing the processing speed of the model.
In this embodiment, the number of grouped convolution layers, as well as the number of groups each grouped convolution layer divides the input feature map into, can be flexibly chosen by the user according to actual needs; for example, there may be 2 grouped convolution layers with 3 groups each.
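Continuing the sketch above and under the same assumptions, a residual block with two ReLU-activated grouped convolution layers might look as follows; the kernel sizes and the default number of groups are illustrative (the group count must divide the channel count).

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual path: grouped convolutions (ReLU) plus a third convolution;
    the block input is fused with the path output, then a fourth convolution."""
    def __init__(self, channels, groups=2):
        super().__init__()
        self.path = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=groups),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, groups=groups),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 1),    # third convolution layer
        )
        self.out = nn.Conv2d(channels, channels, 1)  # fourth convolution layer

    def forward(self, x):
        # Local skip connection: fuse input with the residual path output.
        return self.out(x + self.path(x))
```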
The types of the convolution layer and the first, second, third, and fourth convolution layers in this embodiment are not limited; for example, they may be regular convolution (Conv), 1×1 pointwise convolution, or depthwise convolution, and can be flexibly adjusted according to actual needs.
Generally, the expressiveness of the image reconstruction model increases with the complexity of the global or local cascading; that is, the more cascade blocks and first convolution layers in the feature extraction network, or the more residual blocks and second convolution layers in a cascade block, the more expressive the image reconstruction model will be. However, the more complex the network structure, the slower the computation. Therefore, to improve processing speed while ensuring the reconstruction effect, the number of each module should not be too large.
In one implementation, the numbers of cascade blocks and first convolution layers in the feature extraction network may each be 3 to 5, the numbers of residual blocks and second convolution layers in a cascade block may each be 3 to 5, and the number of grouped convolution layers in a residual block may be 2 to 4. For example, referring to FIG. 6, the feature extraction network may be set to include 3 cascade blocks and 3 first convolution layers, each cascade block may include 3 residual blocks and 3 second convolution layers, and each residual block may include 2 grouped convolution layers.
In addition, the modules within a cascade block can be set to share parameters, i.e., the multiple residual blocks share parameters and the multiple second convolution layers share parameters, making the image reconstruction model even more lightweight and improving processing speed; however, sharing parameters incurs some loss of effect.
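One simple way to realize this sharing, sketched below under the same assumptions, is to reuse a single module instance at every position of the cascade block, so only one set of weights is stored and trained. Note that this sketch simplifies the local cascade (it stacks only the block input with the latest residual output, so the shared fusion layer can keep a fixed input width), which is an assumption rather than the structure of FIG. 4.

```python
import torch
import torch.nn as nn

class SharedCascadeBlock(nn.Module):
    """All residual-block positions reuse one ResidualBlock instance, and
    all second-convolution positions reuse one 1x1 convolution (simplified)."""
    def __init__(self, channels, num_res=3):
        super().__init__()
        self.num_res = num_res
        self.res = ResidualBlock(channels)                 # shared parameters
        self.fuse = nn.Conv2d(channels * 2, channels, 1)   # shared parameters

    def forward(self, x):
        out = x
        for _ in range(self.num_res):
            out = self.fuse(torch.cat([x, self.res(out)], dim=1))
        return out
```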
For example, referring to FIG. 7, the left and middle images are reconstructed images obtained with the image processing method provided by the embodiments of the present application, where the left image does not share parameters and the middle image shares parameters; the right image is a reconstructed image obtained with the bicubic interpolation (Bicubic) algorithm. As can be seen from the figure, the left and middle images are clearly sharper than the right image.
According to an exemplary embodiment of the present application, FIG. 8 shows a schematic flowchart of a portrait super-resolution reconstruction method. The detailed steps of the portrait super-resolution reconstruction method are described below.
Step S110: Perform key point detection on an image to be processed by using a pre-built reconstruction model, to obtain face key points.
Step S120: Perform super-resolution reconstruction according to the face key points and image features obtained based on the image to be processed, to obtain high-frequency image information.
Step S130: Restore the image to be processed by using the high-frequency image information, to obtain a super-resolution image corresponding to the image to be processed.
It should be understood that, in other embodiments of the present application, the order of some steps of the portrait super-resolution reconstruction method of this embodiment may be interchanged according to actual needs, or some steps may be omitted or deleted.
Super-resolution processing of a face image is processing that improves the definition of the face image. In the portrait super-resolution reconstruction method according to the embodiments of the present application, the image to be processed may be an image with low definition; in such images, the face is often of low definition, which hinders tasks such as image recognition and image matching. For example, the image to be processed may be a face image captured by a surveillance device, a face image obtained from a webpage screenshot, or a host's face image captured during a live broadcast.
The face key points are often the information most important to human visual cognition. If the definition of the face key points is effectively improved, the resulting face image will better meet the requirements of portrait reconstruction. Therefore, in the portrait super-resolution reconstruction method according to the embodiments of the present application, key point detection may first be performed on the image to be processed by using the constructed reconstruction model, to obtain the face key points.
The obtained face key points may include the left eye, the right eye, the nose, the mouth, and the chin contour. Based on these face key points, the outline of the face can be roughly sketched, and they include the eyes, which are the most important part of the face for human visual cognition.
Although face key points are very important for portrait super-resolution reconstruction, the reconstruction of other regions of the face image also needs to be considered in the reconstruction process. In this way, a super-resolution image that meets the requirements can be obtained, with local regions emphasized and the whole image also processed.
Therefore, in the portrait super-resolution reconstruction method according to the embodiments of the present application, on the one hand, face key points are obtained by key point detection; on the other hand, image feature extraction may be performed on the image to be processed at the same time, to obtain image features. Super-resolution reconstruction is then performed by combining the face key points with the obtained image features, to obtain the high-frequency image information.
High-frequency image information mainly reflects information at edges and contours in an image, whereas the parts within contours where the grayscale changes slowly constitute low-frequency information. High-frequency image information reflects regions of relatively rapid change and is therefore very important for image reconstruction.
In some embodiments of the present application, the image processing method according to the present application described in conjunction with FIG. 2 may be used to perform the super-resolution reconstruction of the image, so as to obtain the high-frequency image information.
The high-frequency image information obtained in the exemplary embodiments of the present application is local information of the face image. It needs to be restored into the image to be processed so that the image to be processed can be restored, obtaining the super-resolution image corresponding to the image to be processed.
In the portrait super-resolution reconstruction method provided by this embodiment, face key points are detected, high-frequency image information is obtained by using the face key points and the image features of the image, and the image to be processed is then restored by using the high-frequency image information. This improves the recognizability of the resulting super-resolution image and meets user needs in practical applications.
In practical applications, since the definition of the image to be processed is often low, the face key points obtained by key point detection on a low-definition image are not ideal, which in turn leads to a poor subsequent super-resolution image. Therefore, according to the embodiments of the present application, the above key point detection, super-resolution reconstruction, and restoration may include multiple rounds of iterative processing, and the above image to be processed may be an unprocessed image to be processed, or a super-resolution image obtained after the key point detection, super-resolution reconstruction, and restoration of the previous iteration.
Specifically, referring to FIG. 9, for an unprocessed image to be processed, LR Face (Low Resolution Face), a super-resolution image SR Face (Super Resolution Face) after one round of iterative processing can be obtained through the above key point detection, super-resolution reconstruction, and restoration. Then, on the basis of the obtained super-resolution image, the above key point detection, super-resolution reconstruction, and restoration are performed again, to obtain the super-resolution image after the second iteration. Following this logic, after multiple iterations, the final super-resolution image is obtained once certain requirements are met.
Referring also to FIG. 10, for the input image to be processed, Input, key point detection may first be performed on it to obtain the corresponding face key points Face Points 0; high-frequency image information is obtained based on Face Points 0 combined with the image features of Input, and the first-round super-resolution image Face SR1 is then obtained from the high-frequency image information and Input. On this basis, key point detection is performed on Face SR1 to obtain the corresponding face key points Face Points 1; high-frequency image information is obtained based on Face Points 1 combined with the image features of Face SR1, and the second-round super-resolution image Face SR2 is obtained from the high-frequency image information and Face SR1. Following this logic, after N iterations (where N is a preset number of iterations at which to stop, or the number of iterations after which the obtained image meets preset requirements), the final super-resolution image Face SR N can be obtained.
In the portrait super-resolution reconstruction method according to the embodiments of the present application, a recurrent approach is adopted, in which the image obtained in the previous round of processing serves as the detection object for the next round; multiple rounds of such processing can continuously improve the quality of the resulting super-resolution image.
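As a sketch only, the iterative scheme of FIG. 10 could be expressed as the following loop; `detect_keypoints`, `extract_features`, `reconstruct_hf`, and `restore` are hypothetical function names standing in for the model components described above.

```python
def iterative_super_resolution(model, image, num_iters=3):
    """Recurrent refinement: each round re-detects key points on the
    previous round's output and restores a sharper image (illustrative)."""
    sr = image  # round 0 operates on the unprocessed image
    for _ in range(num_iters):
        keypoints = model.detect_keypoints(sr)
        features = model.extract_features(sr)
        hf_info = model.reconstruct_hf(keypoints, features)
        sr = model.restore(sr, hf_info)
    return sr
```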
Correspondingly, the model parameters can be shared across the multiple rounds, making the model more lightweight and thereby supporting deployment on devices with weaker processing capability, such as mobile terminals. In addition, in the portrait super-resolution reconstruction method according to the embodiments of the present application, when processing resources are limited, besides sharing parameters, the network width, i.e., the number of feature extraction channels, may be preferentially increased within a certain range, rather than focusing on the network depth, i.e., the number of network layers; combined with the recurrent processing described above, this improves the recognition accuracy of the model.
As can be seen from the above, in the portrait super-resolution reconstruction method according to the embodiments of the present application, a plurality of face key points are detected. When restoring the image to be processed with the high-frequency image information obtained from the face key points and the image features, each face key point needs to be mapped to its accurate position in the image to be processed, so as to avoid key point offsets. Therefore, referring to FIG. 11, in the portrait super-resolution reconstruction method according to the embodiments of the present application, the above restoration may be implemented through the following steps:
Step S131: Process the image to be processed by using a pre-built portrait cognition model, and output position information of each of the face key points.
Step S132: Restore each corresponding face key point in the image to be processed according to the face key point and its corresponding position information and high-frequency image information, to obtain the super-resolution image corresponding to the image to be processed.
In some embodiments, a neural network model may be constructed, for example, a convolutional neural network (CNN) model. Multiple training samples may be collected, each containing a face image in which the face key points carry position information. The position information may be the relative position of each face key point within the face region, or the face region may be mapped into a coordinate system, with the coordinate values of the face key points in that coordinate system serving as their position information.
The constructed neural network model is trained with the training samples to obtain a portrait cognition model that meets the requirements. The position information of each face key point in the image to be processed can then be identified by using this portrait cognition model.
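A minimal sketch of one such training step, assuming a CNN that regresses K key-point coordinates under an MSE objective; the architecture, the number of key points, and the loss are assumptions, since the patent does not fix them.

```python
import torch
import torch.nn as nn

class LandmarkNet(nn.Module):
    """Lightweight CNN that regresses K face-key-point (x, y) positions."""
    def __init__(self, num_points=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1))
        self.head = nn.Linear(32, num_points * 2)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

def train_step(net, optimizer, images, kpt_coords):
    """One update: regress key-point coordinates against the labels."""
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(net(images), kpt_coords.flatten(1))
    loss.backward()
    optimizer.step()
    return loss.item()
```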
如此,在进行复原处理时,结合参阅图12,对于待处理图像LR Face中的人脸关键点,如左眼、右眼、鼻子、嘴巴、以及下巴轮廓,则可以基于根据人像认知模型所得到的人脸关键点的位置信息以及图像高频信息包含的对应人脸关键点的高频信息,对待处理图像中该人脸关键点进行复原处理,以得到最终的超分辨率图像SR Face。In this way, when performing the restoration process, referring to FIG. 12 , the key points of the face in the LR Face of the image to be processed, such as the left eye, right eye, nose, mouth, and chin contour, can be The position information of the obtained face key points and the high-frequency information of the corresponding face key points contained in the high-frequency information of the image are restored and processed to obtain the final super-resolution image SR Face.
在根据本申请实施例的人像超分辨率重建方法中,由于只需利用人像认知模型识别各个人脸关键点的位置信息,所需分析、处理的数据信息较少,因此,该人像认知模型可基于轻量级的网络模型所构建,以避免网络模型构建以及运行对处理资源的不必要的过多占用。In the portrait super-resolution reconstruction method according to the embodiment of the present application, since it is only necessary to use the portrait cognitive model to identify the position information of each key point of the human face, there is less data information to be analyzed and processed. Models can be built based on lightweight network models to avoid unnecessary excessive consumption of processing resources by network model building and running.
在根据本申请实施例的人像超分辨率重建方法中,采用人像认知模型得到各人脸关键点的位置信息的方式,可在进行复原时,准确地基于各人脸关键点的位置对待处理图像中的对应位置进行复原处理,避免出现对应人脸关键点的复原移位的现象出现。In the portrait super-resolution reconstruction method according to the embodiment of the present application, the method of obtaining the position information of each face key point by adopting the portrait cognitive model can accurately process the processing based on the position of each face key point during restoration. The corresponding position in the image is restored to avoid the phenomenon of restoration and displacement of the corresponding key points of the face.
此外,不同的人脸关键点在复原时其具体的复原要求往往不同,例如,对于眼睛而言,希望复原处理后的眼睛可以亮度较高,而对于下巴轮廓而言,可能希望复原处理后的下巴轮廓线条更清晰。In addition, the specific restoration requirements of different face key points are often different during restoration. For example, for the eyes, it is hoped that the restored eyes can be brighter, while for the chin contour, it may be desirable to restore the processed eyes. The chin contour is more defined.
因此,在根据本申请实施例的人像超分辨率重建方法中,基于上述考虑,在进行复原处理时,可首先获取各个人脸关键点对应的复原属性,该复原属性即为上述的复原处理的不同要求信息。再根据各人脸关键点的位置信息、复原属性以及图像高频信息,对待处理图像进行复原处理,得到对应的超分辨率图像。Therefore, in the portrait super-resolution reconstruction method according to the embodiment of the present application, based on the above considerations, when performing the restoration process, the restoration attribute corresponding to each face key point may be obtained first, and the restoration attribute is the value of the restoration process described above. Different request information. Then, according to the position information of each face key point, the restoration attribute and the high frequency information of the image, the restoration processing is performed on the image to be processed, and the corresponding super-resolution image is obtained.
在根据本申请实施例的人像超分辨率重建方法中,通过上述方式,可按区分各人脸关键点并基于其对应的位置信息、复原属性进行人脸关键点的独立恢复,不仅可以满足不同人脸关键点的复原的针对性需求,且重建模型在处理时,也可基于分组卷积的方式进行同步处理,可大幅减少处理的时间。In the portrait super-resolution reconstruction method according to the embodiment of the present application, through the above method, the face key points can be independently restored by distinguishing each face key point and based on its corresponding position information and restoration attributes, which can not only satisfy different needs The specific requirements for the restoration of key points of the face, and the reconstruction model can also be processed synchronously based on the group convolution method, which can greatly reduce the processing time.
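The following is a minimal sketch of restoring several key-point regions in one pass with a grouped convolution. The class name, channel counts, and the five-group layout (left eye, right eye, nose, mouth, chin contour) are illustrative assumptions, not the patent's exact architecture.

```python
import torch
import torch.nn as nn

class GroupedKeypointRestorer(nn.Module):
    def __init__(self, channels_per_group: int = 16, num_groups: int = 5):
        super().__init__()
        total = channels_per_group * num_groups
        # groups=num_groups gives each key point its own filter bank,
        # so all key points are restored in one pass instead of K passes.
        self.restore = nn.Conv2d(total, total, kernel_size=3,
                                 padding=1, groups=num_groups)

    def forward(self, keypoint_feats: torch.Tensor) -> torch.Tensor:
        # keypoint_feats: (B, num_groups * channels_per_group, H, W),
        # channel blocks ordered by key point.
        return self.restore(keypoint_feats)

feats = torch.randn(1, 80, 32, 32)       # 5 groups x 16 channels each
out = GroupedKeypointRestorer()(feats)   # same shape, restored per group
```

Because the groups share no weights, each key point effectively gets its own small restoration branch while the whole batch is processed synchronously.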
In the portrait super-resolution reconstruction method according to the above exemplary embodiments of the present application, the super-resolution reconstruction is implemented with a reconstruction model constructed and trained in advance.
Next, the training process of the image reconstruction model according to an embodiment of the present application is described in detail with reference to FIG. 13.
The model training method provided in the embodiments of the present application can be applied to any electronic device with image processing capability, for example a server, a mobile terminal, a general-purpose computer, or a special-purpose computer.
Referring to FIG. 13, which shows a schematic flowchart of the image reconstruction model training method provided by an embodiment of the present application, the model training method may include the following steps:
S201: acquire training samples, where the training samples include low-resolution images and high-resolution images, the low-resolution images being obtained by down-sampling the high-resolution images.
The training samples here form a dataset. A large number of high-resolution images (for example, images whose resolution exceeds a preset value) can be collected as original samples; these may be pictures of various types or video frames, for example frames of a high-definition stream in a live-video scenario.
After the original samples are obtained, each high-resolution image is down-sampled in the same way to produce the training samples. The down-sampling may use bicubic interpolation or the like.
In addition, if denoising is to be performed together with super-resolution reconstruction, noise can be added to the low-resolution images in the training samples before they are fed into the model for training; the trained model will then perform both super-resolution reconstruction and denoising. A sketch of building such a training pair is shown below.
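The following is a minimal sketch of producing an (LR, HR) training pair as just described: bicubic down-sampling, plus optional Gaussian noise when the model should also learn denoising. The scale factor and noise level are assumed example values.

```python
import torch
import torch.nn.functional as F

def make_training_pair(hr: torch.Tensor, scale: int = 4,
                       noise_sigma: float = 0.0):
    # hr: (B, C, H, W) high-resolution image with values in [0, 1]
    lr = F.interpolate(hr, scale_factor=1.0 / scale,
                       mode="bicubic", align_corners=False)
    if noise_sigma > 0:
        # Optional noise so the trained model also denoises.
        lr = (lr + noise_sigma * torch.randn_like(lr)).clamp(0.0, 1.0)
    return lr, hr

hr = torch.rand(1, 3, 256, 256)
lr, hr = make_training_pair(hr, scale=4, noise_sigma=0.01)
```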
S202: input the low-resolution image into a pre-built image reconstruction model, where the image reconstruction model includes a feature extraction network and a sub-pixel convolution layer.
S203: use the feature extraction network to perform multi-scale feature extraction on the low-resolution image and expand the image channels, obtaining a training feature map.
S204: use the sub-pixel convolution layer to enlarge the training feature map, obtaining a training reconstructed image. A sketch of this sub-pixel upscaling step follows.
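The following is a minimal sketch of the sub-pixel convolution step: a convolution expands the channels by r², and PixelShuffle rearranges those channels into an r-times-larger spatial grid. The channel counts are illustrative assumptions.

```python
import torch
import torch.nn as nn

r = 2  # assumed upscale factor
upsample = nn.Sequential(
    nn.Conv2d(64, 3 * r * r, kernel_size=3, padding=1),
    nn.PixelShuffle(r),  # (B, 3*r*r, H, W) -> (B, 3, H*r, W*r)
)

feat = torch.randn(1, 64, 60, 60)  # training feature map
sr = upsample(feat)                # (1, 3, 120, 120) training reconstruction
```

The enlargement thus comes purely from rearranging learned channel values into pixel positions, which is why it is described as adjusting pixel positions rather than interpolating.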
It should be noted that the processing in steps S203 to S204 is similar to that in steps S102 to S103 and is not repeated here.
S205: perform back-propagation training on the image reconstruction model based on the training reconstructed image, the high-resolution image, and a preset objective function, obtaining the trained image reconstruction model.
In this embodiment, the objective function may be the L2 loss function, also called the mean square error (MSE) function, a type of regression loss. The L2 loss curve is smooth, continuous, and differentiable everywhere, which suits the gradient descent algorithm; moreover, the gradient shrinks as the error shrinks, which aids convergence, so the loss converges to its minimum quickly even with a fixed learning rate.
In this embodiment, back-propagation training can be performed on the image reconstruction model based on the training reconstructed image, the high-resolution image, and the L2 loss, adjusting the model parameters until a preset training-completion condition is reached and the trained image reconstruction model is obtained.
The training-completion condition may be that the number of iterations reaches a set value (for example, 2000), that the L2 loss converges to its minimum, or the like; it is not limited here and can be set according to actual needs. A minimal training-loop sketch is given below.
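The following is a minimal back-propagation training sketch for step S205. The `model` and the `pairs` iterable of (lr, hr) batches are assumed inputs; the learning rate is illustrative, and the 2000-iteration stop mirrors the example value above.

```python
import torch
import torch.nn as nn

def train(model: nn.Module, pairs, max_iters: int = 2000, lr: float = 1e-4):
    criterion = nn.MSELoss()  # the L2 objective described above
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for step, (low_res, high_res) in enumerate(pairs):
        sr = model(low_res)                 # training reconstructed image
        loss = criterion(sr, high_res)
        optimizer.zero_grad()
        loss.backward()                     # back-propagation
        optimizer.step()                    # parameter adjustment
        if step + 1 >= max_iters:           # preset completion condition
            break
    return model
```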
Generally, the deeper the stage of the feature extraction network, the fewer features it extracts. Therefore, after training, the trained image reconstruction model can be pruned according to requirements and test results, retaining the long cascades and removing the short ones, thereby reducing excessive intermediate skips and making the model more lightweight.
In one embodiment, the low-resolution image can be preprocessed before being input into the image reconstruction model; the preprocessing can be subtracting the image's mean value from the image itself. Accordingly, before step S202, the model training method may further include:
performing mean-subtraction on the low-resolution image to highlight its texture details.
The mean-subtraction may leave the foreground of the image untouched while subtracting the average pixel value of the background from each background pixel, thereby enhancing the contrast between background and foreground and highlighting texture details. A sketch of this preprocessing follows.
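The following is a minimal sketch of this mean-subtraction preprocessing. The foreground mask `fg_mask` (1 = foreground, 0 = background) is an assumed input; how the foreground is segmented is outside this sketch.

```python
import torch

def subtract_background_mean(img: torch.Tensor,
                             fg_mask: torch.Tensor) -> torch.Tensor:
    # img and fg_mask share spatial shape; fg_mask is 0/1 valued.
    bg = img * (1 - fg_mask)
    n_bg = (1 - fg_mask).sum().clamp(min=1)
    bg_mean = bg.sum() / n_bg              # mean over background pixels only
    # Shift only the background; foreground pixels pass through untouched,
    # which raises the foreground/background contrast.
    return img - (1 - fg_mask) * bg_mean
```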
In another embodiment, to let the feature extraction network extract more features, the preprocessing can also flip the image symmetrically before it is input into the model, then inversely flip the model outputs and average them, thereby reducing the bias of certain feature layers or parameters caused by anisotropy. Accordingly, before step S202, the model training method may further include:
performing flip-symmetry processing on the low-resolution image to obtain at least one processed low-resolution image.
The at least one processed low-resolution image is then input into the image reconstruction model, and the feature extraction network performs multi-scale feature extraction on it to obtain at least one auxiliary feature map; the auxiliary feature maps are then inversely flipped and averaged after the inverse flipping, yielding the training feature map.
For example, an n×n image is rotated clockwise three times, 90° each time, giving four n×n images. The four images are input into the image reconstruction model, and the feature extraction network outputs four auxiliary feature maps; the corresponding three maps are then rotated back counterclockwise by 90°, 180°, and 270°; finally, the four processed maps are averaged pixel-wise to obtain the final training feature map. A sketch of this procedure is shown below.
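The following is a minimal sketch of the rotation-symmetry trick above: rotate the input 0/90/180/270 degrees, run the feature extractor on each copy, rotate each output back, and average the four auxiliary maps. The `extract` callable stands in for the feature extraction network.

```python
import torch

def symmetric_features(extract, img: torch.Tensor) -> torch.Tensor:
    # img: (B, C, H, W); extract: feature extraction network (assumed).
    maps = []
    for k in range(4):
        rotated = torch.rot90(img, k=-k, dims=(-2, -1))  # k quarter-turns CW
        feat = extract(rotated)                          # auxiliary feature map
        # Undo the rotation so all maps are spatially aligned again.
        maps.append(torch.rot90(feat, k=k, dims=(-2, -1)))
    return torch.stack(maps).mean(dim=0)                 # pixel-wise average
```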
It should be pointed out that the mean-subtraction may be applied to the low-resolution image before the flip-symmetry processing, or the flip-symmetry processing may come first and the mean-subtraction second. This can be set flexibly according to actual needs and is not limited here.
In addition, in practical applications, to improve processing speed, a new model can be trained on the basis of a model that has already been trained. For example, when training 3x and 4x upscaling models, if a 2x upscaling model has already been trained, its parameters can serve as the initial parameters of the 3x and 4x models, and training proceeds from there. A sketch of this warm start follows.
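The following is a minimal sketch of warm-starting a 4x model from a trained 2x model. The two tiny models are placeholders sharing a backbone; only the sub-pixel head depends on the scale factor, so only the weights whose names and shapes match are copied.

```python
import torch
import torch.nn as nn

def make_model(scale: int) -> nn.Module:
    return nn.Sequential(
        nn.Conv2d(3, 64, 3, padding=1),                  # shared backbone
        nn.Conv2d(64, 3 * scale * scale, 3, padding=1),  # scale-specific head
        nn.PixelShuffle(scale),
    )

model_2x, model_4x = make_model(2), make_model(4)
# ... assume model_2x has already been trained to convergence here ...
src, dst = model_2x.state_dict(), model_4x.state_dict()
# Copy every parameter whose name and shape match; the mismatched
# scale-specific head keeps its fresh initialization.
dst.update({k: v for k, v in src.items()
            if k in dst and dst[k].shape == v.shape})
model_4x.load_state_dict(dst)
```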
According to an exemplary embodiment of the present application, a portrait super-resolution reconstruction model training method is also provided, for training the reconstruction model used in the portrait super-resolution reconstruction method of the foregoing exemplary embodiments. FIG. 14 shows a schematic flowchart of the portrait super-resolution reconstruction model training method provided by an embodiment of the present application.
As shown in the figure, the portrait super-resolution reconstruction model training method according to the present application includes:
Step S2100: acquire training samples and the target samples corresponding to the training samples;
Step S2200: perform key point detection on the training samples using a constructed generation network to obtain training key points;
Step S2300: perform super-resolution reconstruction and restoration based on the training key points and the training samples to obtain an output image;
Step S2400: compare the output image with the target sample, adjust the network parameters of the generation network based on the comparison result, and continue training until a first preset condition is satisfied, obtaining the reconstruction model.
In the portrait super-resolution reconstruction model training method provided by the embodiments of the present application, performing key point detection on the training samples and training the model on the combination of the training key points and the image features of the training samples improves the reconstruction accuracy of the resulting reconstruction model.
In some embodiments, multiple training samples are collected in advance, each of which may be a sample image containing a low-definition face image. The target sample corresponding to a training sample is the one that meets the requirements, that is, the high-definition sample one hopes to obtain after processing the training sample.
In some embodiments, the pre-built generation network may be a recurrent network; the process of using the generation network to perform key point detection, super-resolution reconstruction, and restoration on the training samples is as described above. After processing, the generation network outputs the output image corresponding to each training sample.
The target sample serves as the benchmark for the processing quality of the generation network. By comparing the difference between the output image and the target sample and continually training the generation network according to the comparison result, the reconstruction model is obtained once the difference between the output image and the target sample falls to a level that meets a certain requirement.
In some embodiments, the samples input to the generation network can be preprocessed, for example by mean-subtraction, to bring out the details of the image texture and thereby improve subsequent processing and recognition.
On this basis, the preprocessed samples can also be flipped symmetrically before being input into the generation network; the outputs of each network layer can then be inversely flipped and averaged, which reduces the bias of certain network layers or parameters caused by anisotropy.
During training and testing of the generation network, the network can be pruned according to requirements and test results so as to keep the earlier iterations that influence the result most, and training then continues on that basis. This improves the reconstruction accuracy of the resulting generation network, and the peak signal-to-noise ratio and structural similarity of the subsequently processed images can also improve considerably.
In some embodiments of the present application, a loss function may be constructed to supervise the training of the generation network.
Referring to FIG. 15, the above step S2400 of the portrait super-resolution reconstruction model training method according to the present application can be implemented as follows:
Step S2410: construct a first loss function based on the difference between the pixel information of the output image and that of the target sample;
Step S2420: construct a second loss function based on the difference between each face key point in the output image and the corresponding face key point in the target sample;
Step S2430: compare the output image with the target sample, adjust the network parameters of the generation network based on the comparison result, and continue training until the weighted value of the first loss function and the second loss function satisfies the first preset condition, obtaining the reconstruction model.
In the embodiments according to the present application, the first and second loss functions can be constructed to evaluate the training of the generation network comprehensively. The first loss function evaluates from the perspective of pixel differences between images. In addition, since the images in this embodiment have undergone face key point detection, and face key points are especially important for portrait reconstruction, a second loss function built from the difference information between face key points is added.
The first loss function characterizes the overall pixel-level Euclidean distance between the output image of the generation network and the target sample (i.e., the desired output), while the second loss function characterizes the Euclidean distance between each face key point detected by the generation network and the corresponding key point in the target sample.
The first and second loss functions are combined with weights to jointly serve as the loss function of the generation network. During training, comparing the output image with the target sample amounts to evaluating this combined loss; the reconstruction model is obtained when the resulting value satisfies the first preset condition. The first preset condition may be that the loss value no longer decreases and has converged, or that it falls below a preset value; alternatively, training may stop and the reconstruction model be taken once the number of iterations reaches a preset maximum. A sketch of the weighted combination is given below.
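The following is a minimal sketch of the combined objective: a pixel-level term plus a key-point term, weighted and summed. The weights and the key-point tensor layout are illustrative assumptions, and mean squared error stands in for the squared Euclidean distance.

```python
import torch
import torch.nn.functional as F

def generator_loss(sr, hr, kp_sr, kp_hr, w_pixel=1.0, w_kp=0.1):
    # sr, hr: output image and target sample, (B, C, H, W)
    # kp_sr, kp_hr: detected / target key-point coordinates, (B, K, 2)
    pixel_term = F.mse_loss(sr, hr)      # first loss: pixel-level distance
    kp_term = F.mse_loss(kp_sr, kp_hr)   # second loss: key-point distance
    return w_pixel * pixel_term + w_kp * kp_term
```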
According to the embodiments of the present application, supervising the training of the reconstruction model with a first loss function built from pixel-information differences and a second loss function built from the differences between face key points improves the recognizability of the super-resolution images obtained when the reconstruction model is subsequently applied.
According to the embodiments of the present application, applying the reconstruction model pre-built from the generation network as described above to the reconstruction of the to-be-processed image improves the recognizability of the resulting super-resolution image.
As can be seen from the above, the reconstruction model in the portrait super-resolution reconstruction model training method according to the embodiments of the present application contains a generation network, which is built through pre-training and can process low-definition images to output the corresponding super-resolution images.
In a possible implementation according to the present application, to further improve the reconstruction effect of the resulting reconstruction model, the reconstruction model may also include a discriminator, which can be used to supervise the training of the generation network. In this implementation, the generation network is therefore a generation network obtained by training on the training samples under the supervision of a trained discriminator.
In some implementations, the portrait super-resolution reconstruction model training method according to the present application further includes the following steps:
constructing a discriminator and using the discriminator to discriminate the output image;
adjusting the parameters of the discriminator according to the obtained discrimination results until a second preset condition is satisfied, obtaining the trained discriminator.
In the embodiments according to the present application, the main principle of the discriminator is to judge real images (i.e., high-resolution images that meet the requirements) as real as far as possible (for example, outputting a discrimination result of 1) and to judge the output images of the generation network as fake as far as possible (for example, outputting a discrimination result of 0). In this way, the generation network is supervised and continually trained until, eventually, the discriminator judges the generation network's output images as real. That is, the discriminator acts as a supervisor that keeps optimizing the training of the generation network.
When using the discriminator as a supervisor to optimize the generation network, the discriminator must first be trained and optimized so that it can judge accurately. In this embodiment, a loss function for the discriminator can be built in advance from the discriminator's judgment of the output images of the generation network and its judgment of the target samples.
Training the discriminator is the process of minimizing this loss function. When the loss value no longer decreases and has converged, the training of the discriminator is deemed to satisfy the second preset condition, the trained discriminator is obtained, and the discriminator can then be fixed. A sketch of such a discriminator objective follows.
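The following is a minimal sketch of the discriminator objective described above: push real high-resolution samples toward label 1 and generated outputs toward label 0. The `disc` callable is an assumed discriminator returning raw logits.

```python
import torch
import torch.nn.functional as F

def discriminator_loss(disc, real_hr, fake_sr):
    real_logits = disc(real_hr)
    fake_logits = disc(fake_sr.detach())  # don't backprop into the generator
    loss_real = F.binary_cross_entropy_with_logits(
        real_logits, torch.ones_like(real_logits))   # real images -> 1
    loss_fake = F.binary_cross_entropy_with_logits(
        fake_logits, torch.zeros_like(fake_logits))  # generated images -> 0
    return loss_real + loss_fake
```

Minimizing this sum until it converges corresponds to the second preset condition, after which the discriminator is fixed and used to supervise the generation network.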
In the embodiments according to the present application, adding a discriminator to the reconstruction model so as to form an adversarial network composed of the discriminator and the generation network can further improve the reconstruction effect of the resulting reconstruction model.
In a possible implementation where a discriminator is added to the reconstruction model to form an adversarial network, the discriminator's judgments can be incorporated into the training and adjustment of the generation network.
Referring to FIG. 16, the above step S2400 in the portrait super-resolution reconstruction model training method according to the present application may include the following sub-steps:
Step S2410': input the output image into the trained discriminator to obtain discrimination information;
Step S2420': compare the output image with the target sample to obtain a comparison result;
Step S2430': adjust the network parameters of the generation network according to the discrimination information and the comparison result, and continue training until the first preset condition is satisfied, obtaining the reconstruction model.
According to the above embodiment, when a discriminator is added, the difference between the output image and the target sample can be combined with the discriminator's judgment of the output image to train and adjust the generation network.
In some embodiments according to the present application, the loss function can be constructed, and the reconstruction model trained with it, as follows:
constructing a first loss function based on the difference between the pixel information of the output image and that of the target sample;
constructing a second loss function based on the difference between each face key point in the output image and the corresponding face key point in the target sample;
constructing a third loss function based on the discriminator's judgment of the output image, and constructing a fourth loss function based on the image difference between the output image and the target sample as obtained by the pre-built portrait cognitive model;
adjusting the network parameters of the generation network according to the discrimination information and the comparison result and continuing training until the weighted value of the first, second, third, and fourth loss functions satisfies the first preset condition, obtaining the reconstruction model.
Here, the influence of the difference between the output image and the target sample on the adjustment of the generation network is captured by the first and second loss functions, and the influence of the discriminator's judgment of the output image on the training adjustment is captured by the third loss function. In addition, to strengthen the human-eye recognizability of the resulting super-resolution images, a fourth loss function built from the image difference between the output image and the target sample as seen by the portrait cognitive model can be added.
In this embodiment, the first loss function is built from the difference between the pixel information of the output image and that of the corresponding target sample, and the second loss function from the differences between the face key points of the output image and the corresponding key points of the target sample. Since the purpose of constructing the discriminator to supervise the generation network is precisely that the generation network's output images should eventually be judged real by the discriminator, the third loss function is built from the discriminator's judgment of the output image. The fourth loss function is built from the facial-feature differences between the output image and the target sample as obtained by the portrait cognitive model.
The final loss function of the generation network is obtained as a weighted combination of the first, second, third, and fourth loss functions, sketched below.
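The following is a minimal sketch of the four-term weighted generator objective. The `disc` and `perceive` callables (the discriminator and the portrait cognitive model used as a feature extractor) are assumed inputs, and all weights are illustrative.

```python
import torch
import torch.nn.functional as F

def full_generator_loss(sr, hr, kp_sr, kp_hr, disc, perceive,
                        w1=1.0, w2=0.1, w3=1e-3, w4=0.01):
    l1 = F.mse_loss(sr, hr)                      # 1st: pixel difference
    l2 = F.mse_loss(kp_sr, kp_hr)                # 2nd: key-point difference
    logits = disc(sr)                            # 3rd: adversarial term; the
    l3 = F.binary_cross_entropy_with_logits(     # generator wants "real"
        logits, torch.ones_like(logits))
    l4 = F.mse_loss(perceive(sr), perceive(hr))  # 4th: cognitive-model features
    return w1 * l1 + w2 * l2 + w3 * l3 + w4 * l4
```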
Therefore, in the portrait super-resolution reconstruction model training method according to the embodiments of the present application, the network parameters of the generation network can be adjusted according to the discriminator's judgment and the comparison result between the output image and the target sample, and training then continues. In essence, training consists of adjusting the network and evaluating the combined loss until the value obtained by weighting the first, second, third, and fourth loss functions satisfies the first preset condition, at which point the trained reconstruction model is obtained.
In this embodiment, adding the discriminator to supervise the training of the generation network improves the human-eye recognizability of the output images, yielding sharper results. See FIGS. 17(a) to 17(c): FIG. 17(a) shows an image obtained by conventional interpolation, FIG. 17(b) an image obtained by the implementation of this application without the discriminator, and FIG. 17(c) an image obtained by the implementation with the discriminator. As can be seen, the images produced by the solution of this application are clearly sharper and better than those of conventional interpolation, and among them, the image produced with the discriminator is, to the human eye, clearer than the one produced without it.
Now refer to FIG. 18, which shows a schematic block diagram of the image processing apparatus 100 provided by an embodiment of the present application. The image processing apparatus 100 is applied to a mobile terminal and includes an image acquisition module 110, a first execution module 120, and a second execution module 130.
The image acquisition module 110 may be configured to acquire the image to be processed.
The first execution module 120 may be configured to input the to-be-processed image into the image reconstruction model, and to use the feature extraction network of the image reconstruction model to perform multi-scale feature extraction on the image and expand the image channels, obtaining a reconstructed feature map.
In an optional implementation, the feature extraction network includes a convolution layer, multiple cascade blocks, and multiple first convolution layers, the cascade blocks and first convolution layers being arranged alternately; the feature extraction network adopts a global cascade structure;
the first execution module 120 may specifically be configured to: input the to-be-processed image into the convolution layer for convolution to obtain an initial feature map; use the initial feature map as the input of the first cascade block and the output of the (N-1)-th first convolution layer as the input of the N-th cascade block, performing multi-scale feature extraction with the cascade blocks and outputting intermediate feature maps; channel-concatenate the initial feature map with the intermediate feature maps output by every cascade block preceding the N-th first convolution layer, and input the concatenation into the N-th first convolution layer for convolution; and take the output of the last first convolution layer as the reconstructed feature map.
In an optional implementation, a cascade block includes multiple residual blocks and multiple second convolution layers, the residual blocks and second convolution layers being arranged alternately; the cascade block adopts a local cascade structure;
the first execution module 120 may perform multi-scale feature extraction with the cascade block and output an intermediate feature map by: using the input of the cascade block as the input of the first residual block and the output of the (N-1)-th second convolution layer as the input of the N-th residual block, learning residual features with the residual blocks to obtain residual feature maps; channel-concatenating the input of the cascade block with the outputs of every residual block preceding the N-th second convolution layer and inputting the concatenation into the N-th second convolution layer for convolution; and taking the output of the last second convolution layer as the intermediate feature map.
In an optional implementation, a residual block includes a grouped convolution layer, a third convolution layer, and a fourth convolution layer; the grouped convolution layer uses the ReLU activation function; the grouped convolution layer and the third convolution layer are connected to form a residual path; and the residual block adopts a local skip-connection structure;
the first execution module 120 may learn residual features with the residual block and obtain a residual feature map by: using the input of the residual block as the input of the grouped convolution layer and extracting features through the residual path; and fusing the input of the residual block with the output of the third convolution layer, inputting the fusion into the fourth convolution layer for convolution, and outputting the residual feature map. A sketch of such a residual block follows.
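The following is a minimal sketch of the residual block just described: a grouped convolution with ReLU and a third convolution form the residual path, a local skip connection fuses the block input back in by addition, and a fourth convolution produces the residual feature map. The channel count, group count, and additive fusion are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int = 64, groups: int = 4):
        super().__init__()
        self.grouped = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=groups),
            nn.ReLU(inplace=True),       # grouped conv layer with ReLU
        )
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv4 = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = self.conv3(self.grouped(x))  # the residual path
        fused = x + residual                    # local skip connection
        return self.conv4(fused)                # residual feature map

out = ResidualBlock()(torch.randn(1, 64, 48, 48))  # same shape as input
```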
The second execution module 130 may be configured to enlarge the reconstructed feature map with the sub-pixel convolution layer of the image reconstruction model, obtaining the reconstructed image.
In an optional implementation, the second execution module 130 may specifically be configured to: adjust the pixel positions in the reconstructed feature map with the sub-pixel convolution layer, obtaining the reconstructed image.
Referring to FIG. 19, which shows a schematic block diagram of the image reconstruction model training apparatus 200 provided by an embodiment of the present application: the model training apparatus 200 is applied to any electronic device with image processing capability and may include a sample acquisition module 210, a first processing module 220, a second processing module 230, a third processing module 240, and a fourth processing module 250.
The sample acquisition module 210 may be configured to acquire training samples, the training samples including low-resolution images and high-resolution images, the low-resolution images being obtained by down-sampling the high-resolution images.
The first processing module 220 may be configured to input the low-resolution image into the pre-built image reconstruction model, which includes a feature extraction network and a sub-pixel convolution layer.
The second processing module 230 may be configured to perform multi-scale feature extraction on the low-resolution image with the feature extraction network and expand the image channels, obtaining a training feature map.
The third processing module 240 may be configured to enlarge the training feature map with the sub-pixel convolution layer, obtaining a training reconstructed image.
The fourth processing module 250 may be configured to perform back-propagation training on the image reconstruction model based on the training reconstructed image, the high-resolution image, and the preset objective function, obtaining the trained image reconstruction model.
In an optional implementation, the objective function is the L2 loss function;
the fourth processing module 250 may specifically be configured to: perform back-propagation training on the image reconstruction model based on the training reconstructed image, the high-resolution image, and the L2 loss, adjusting the model parameters until the preset training-completion condition is reached, obtaining the trained image reconstruction model.
In an optional implementation, the first processing module 220 may also be configured to: prune the trained image reconstruction model so as to retain the long cascades and remove the short ones.
In an optional implementation, the first processing module 220 may also be configured to: perform flip-symmetry processing on the low-resolution image to obtain at least one processed low-resolution image.
The second processing module 230 may specifically be configured to: input the at least one processed low-resolution image into the image reconstruction model.
The third processing module 240 may specifically be configured to: perform multi-scale feature extraction on the at least one processed low-resolution image with the feature extraction network to obtain at least one auxiliary feature map; and inversely flip the at least one auxiliary feature map and average the results after the inverse flipping, obtaining the training feature map.
Those skilled in the art will clearly appreciate that, for convenience and brevity of description, the specific working processes of the image processing apparatus 100 and the model training apparatus 200 described above can refer to the corresponding processes in the foregoing method embodiments and are not repeated here. Referring to FIG. 20, which shows a schematic block diagram of the electronic device 10 provided by an embodiment of the present application: the electronic device 10 may be a mobile terminal that executes the above image processing method, or any electronic device with image processing capability that executes the above model training method. The electronic device 10 includes a processor 11, a memory 12, and a bus 13, the processor 11 being connected to the memory 12 through the bus 13.
The memory 12 is used to store a program, for example the image processing apparatus 100 shown in FIG. 18 or the model training apparatus 200 shown in FIG. 19. Taking the image processing apparatus 100 as an example, it includes at least one software functional module that can be stored in the memory 12 in the form of software or firmware; after receiving an execution instruction, the processor 11 executes the program to implement the image processing method disclosed in the above embodiments.
The memory 12 may include high-speed random access memory (RAM) and may also include non-volatile memory (NVM).
The processor 11 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the above methods can be completed by integrated logic circuits of the hardware in the processor 11 or by instructions in the form of software. The processor 11 may be a general-purpose processor, including a central processing unit (CPU), a microcontroller unit (MCU), a complex programmable logic device (CPLD), a field programmable gate array (FPGA), an embedded ARM chip, or the like.
Embodiments of the present application further provide a computer-readable storage medium on which a computer program is stored; when executed by the processor 11, the computer program implements the image processing method or the model training method disclosed in the above embodiments.
In summary, in the image processing and model training methods, apparatuses, electronic device, and storage medium provided by the embodiments of the present application, the image to be processed is acquired and input into an image reconstruction model comprising a feature extraction network and a sub-pixel convolution layer; the feature extraction network first performs multi-scale feature extraction on the image and expands the image channels to obtain a reconstructed feature map, and the sub-pixel convolution layer then enlarges the reconstructed feature map to obtain the reconstructed image. This improves processing speed while preserving the reconstruction effect.
Please refer to FIG. 21, a schematic diagram of exemplary components of an electronic device provided by an embodiment of the present application. The electronic device may include a storage medium 2110, a processor 2120, machine-executable instructions 2130 (which may be the portrait super-resolution reconstruction apparatus 131 or the portrait super-resolution reconstruction model training apparatus 132 according to the present application), and a communication interface 140. In this embodiment, the storage medium 2110 and the processor 2120 are both located in the electronic device and arranged separately. It should be understood, however, that the storage medium 2110 may also be independent of the electronic device and accessed by the processor 2120 through a bus interface. Alternatively, the storage medium 2110 may be integrated into the processor 2120, for example as a cache and/or general-purpose registers.
The machine-executable instructions 2130 can be understood as the electronic device described in FIG. 21 or its processor 2120, or as software functional modules that, independently of the electronic device or processor 2120 of FIG. 21, implement the above portrait super-resolution reconstruction method or portrait super-resolution reconstruction model training method under the control of the electronic device.
As shown in FIG. 22, the above portrait super-resolution reconstruction apparatus 131 may include a detection module 1311, a processing module 1312, and a restoration module 1313. The functions of each functional module of the portrait super-resolution reconstruction apparatus 131 are elaborated below.
The detection module 1311 may be configured to perform key point detection on the image to be processed with the pre-built reconstruction model, obtaining the face key points;
it can be understood that the detection module 1311 may be configured to perform the above step S110, and for its detailed implementation reference can be made to the content related to step S110 above.
The processing module 1312 may be configured to perform super-resolution reconstruction according to the face key points and the image features obtained from the to-be-processed image, obtaining the image high-frequency information;
it can be understood that the processing module 1312 may be configured to perform the above step S120, and for its detailed implementation reference can be made to the content related to step S120 above.
The restoration module 1313 may be configured to restore the to-be-processed image with the image high-frequency information, obtaining the super-resolution image corresponding to the to-be-processed image.
It can be understood that the restoration module 1313 may be configured to perform the above step S130, and for its detailed implementation reference can be made to the content related to step S130 above.
The portrait super-resolution reconstruction apparatus may further include the image processing apparatus described with reference to FIG. 18, the image processing apparatus being configured to perform the super-resolution reconstruction.
In a possible implementation, the key point detection, super-resolution reconstruction, and restoration comprise multiple rounds of iterative processing, and the image to be processed is either an unprocessed input image or the super-resolution image obtained after the key point detection, super-resolution reconstruction, and restoration of the previous round.
In a possible implementation, there are multiple face key points, and the restoration module 1313 can obtain the super-resolution image as follows:
processing the to-be-processed image with the pre-built portrait cognitive model and outputting the position information of each face key point;
restoring the to-be-processed image based on the position information of each face key point and the image high-frequency information, obtaining the super-resolution image corresponding to the to-be-processed image.
In a possible implementation, the restoration module 1313 can obtain the super-resolution image based on the position information of each face key point and the image high-frequency information as follows:
acquiring the restoration attribute corresponding to each face key point;
restoring the corresponding face key points in the to-be-processed image according to each face key point together with its position information, the image high-frequency information, and the restoration attribute.
In a possible implementation, the reconstruction model includes a discriminator and a generation network, the generation network being obtained by training on the training samples under the supervision of the trained discriminator.
In a possible implementation, the face key points include the left eye, right eye, nose, mouth, and chin contour.
For descriptions of the processing flow of each module in the apparatus and the interaction flow between modules, reference can be made to the relevant descriptions in the above method embodiments, which are not detailed here. As shown in FIG. 23, the above portrait super-resolution reconstruction model training apparatus 132 may include an acquisition module 1321, a key point obtaining module 1322, an output image obtaining module 1323, and a training module 1324. The functions of each functional module of the portrait super-resolution reconstruction model training apparatus 132 are elaborated below.
The acquisition module 1321 may be configured to acquire training samples and the target samples corresponding to the training samples;
it can be understood that the acquisition module 1321 may be configured to perform the above step S2100, and for its detailed implementation reference can be made to the content related to step S2100 above.
The key point obtaining module 1322 may be configured to perform key point detection on the training samples with the constructed generation network, obtaining the training key points;
it can be understood that the key point obtaining module 1322 may be configured to perform the above step S2200, and for its detailed implementation reference can be made to the content related to step S2200 above.
The output image obtaining module 1323 may be configured to perform super-resolution reconstruction and restoration based on the training key points and the training samples, obtaining the output image;
it can be understood that the output image obtaining module 1323 may be configured to perform the above step S2300, and for its detailed implementation reference can be made to the content related to step S2300 above.
The training module 1324 may be configured to compare the output image with the target sample, adjust the network parameters of the generation network based on the comparison result, and continue training until the first preset condition is satisfied, obtaining the reconstruction model.
It can be understood that the training module 1324 may be configured to perform the above step S2400, and for its detailed implementation reference can be made to the content related to step S2400 above.
In a possible implementation, the training module 1324 may be configured to obtain the reconstruction model based on the comparison result between the output image and the target sample as follows:
constructing a first loss function based on the difference between the pixel information of the output image and that of the target sample;
constructing a second loss function based on the difference between each face key point in the output image and the corresponding face key point in the target sample;
comparing the output image with the target sample, adjusting the network parameters of the generation network based on the comparison result, and continuing training until the weighted value of the first loss function and the second loss function satisfies the first preset condition, obtaining the reconstruction model.
在一种可能的实现方式中,所述重建模型还包括判别器,所述判别器用于监督所述生成网络的训练,人像超分辨率重建模型训练装置132还包括构建模块,该构建模块用于:In a possible implementation manner, the reconstruction model further includes a discriminator, and the discriminator is used to supervise the training of the generation network, and the portrait super-resolution reconstruction model training device 132 further includes a building module, and the building module is used for :
构建判别器,利用所述判别器对所述输出图像以及所述输出图像对应的目标样本进行判别处理;constructing a discriminator, and using the discriminator to discriminate the output image and the target sample corresponding to the output image;
根据得到的判别结果对所述判别器进行参数调整,直至满足第二预设条件时得到训练好的判别器。According to the obtained discrimination result, the parameters of the discriminator are adjusted until the trained discriminator is obtained when the second preset condition is satisfied.
In a possible implementation, the training module 1324 may obtain the reconstruction model in the following manner:
inputting the output image into the trained discriminator to obtain discrimination information;
comparing the output image with the target sample to obtain a comparison result; and
adjusting the network parameters of the generation network according to the discrimination information and the comparison result, and continuing training until the first preset condition is met, thereby obtaining the reconstruction model.
In a possible implementation, the training module 1324 may be configured to construct the reconstruction model based on the discrimination information and the comparison result in the following manner:
constructing a first loss function based on the difference between the pixel information of the output image and the pixel information of the target sample;
constructing a second loss function based on the difference between each face key point in the output image and the corresponding face key point in the target sample;
constructing a third loss function based on the discrimination information of the discriminator for the output image, and constructing a fourth loss function based on the image difference between the output image and the target sample obtained by a pre-built portrait cognition model; and
adjusting the network parameters of the generation network according to the discrimination information and the comparison result, and continuing training until the weighted function value of the first, second, third, and fourth loss functions satisfies the first preset condition, thereby obtaining the reconstruction model. A sketch of this four-term generator objective follows.
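The following is a minimal sketch of the four weighted loss terms described above. It treats the third loss as a standard adversarial loss and the fourth as a perceptual loss computed in the feature space of the portrait cognition model; those interpretations, and all weight values, are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def generator_loss(output_img, target_img, output_lm, target_lm,
                   discriminator, perceptual_net,
                   w1=1.0, w2=0.1, w3=0.01, w4=0.05):
    l1 = F.l1_loss(output_img, target_img)                  # first loss: pixels
    l2 = F.mse_loss(output_lm, target_lm)                   # second loss: key points
    fake_logits = discriminator(output_img)                 # discrimination information
    l3 = F.binary_cross_entropy_with_logits(                # third loss: adversarial
        fake_logits, torch.ones_like(fake_logits))
    l4 = F.l1_loss(perceptual_net(output_img),              # fourth loss: perceptual
                   perceptual_net(target_img))
    return w1 * l1 + w2 * l2 + w3 * l3 + w4 * l4
```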
For a description of the processing flow of each module in the apparatus and the interaction flow between the modules, reference may be made to the relevant descriptions in the foregoing method embodiments, which will not be repeated here.
Further, an embodiment of the present application also provides a computer-readable storage medium storing machine-executable instructions 130. When the machine-executable instructions 130 are executed, the portrait super-resolution reconstruction method or the portrait super-resolution reconstruction model training method provided by the foregoing embodiments is implemented.
Specifically, the computer-readable storage medium may be a general-purpose storage medium, such as a removable disk or a hard disk. When the computer program on the computer-readable storage medium is run, the above portrait super-resolution reconstruction method or portrait super-resolution reconstruction model training method can be executed. For the processes involved when the machine-executable instructions in the computer-readable storage medium are executed, reference may be made to the relevant descriptions in the foregoing method embodiments, which will not be repeated here.
In summary, in the portrait super-resolution reconstruction method, the portrait super-resolution reconstruction model training method, the apparatus, the electronic device, and the readable storage medium provided by the embodiments of the present application, key point detection is performed on the image to be processed by using a pre-built reconstruction model to obtain face key points; super-resolution reconstruction processing is then performed according to the face key points and the image features obtained based on the image to be processed, to obtain image high-frequency information; and the image high-frequency information is used to perform restoration processing on the image to be processed, to obtain a super-resolution image corresponding to the image to be processed. In the present application, face key point detection is combined with face restoration to realize super-resolution reconstruction of an image, which improves the recognizability of the resulting super-resolution image and meets user needs in practical applications.
The above are only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any variation or replacement that can readily occur to those skilled in the art within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Industrial Applicability
The present application provides an image processing method, a portrait super-resolution reconstruction method, an image reconstruction model training method, a portrait super-resolution reconstruction model training method, and related apparatuses, electronic devices, and storage media. An image to be processed is acquired and input into an image reconstruction model comprising a feature extraction network and a sub-pixel convolution layer. The feature extraction network first performs multi-scale feature extraction on the image to be processed and expands the image channels to obtain a reconstruction feature map, and the sub-pixel convolution layer then enlarges the reconstruction feature map to obtain a reconstructed image. Since the feature extraction network can extract multi-scale features and expand image channels, a good reconstruction effect can be obtained without increasing the network depth; meanwhile, because the sub-pixel convolution layer at the end of the model performs the image enlargement, the feature extraction network processes small-sized images, which greatly reduces the amount of computation and the number of parameters, thereby improving the processing speed while ensuring the reconstruction effect.
In addition, it can be understood that the image processing method, the portrait super-resolution reconstruction method, the image reconstruction model training method, the portrait super-resolution reconstruction model training method, and the related apparatuses, electronic devices, and storage media according to the present application are reproducible and can be used in a variety of industrial applications. For example, they can be used in any apparatus that needs to perform image super-resolution reconstruction on low-resolution images or image sequences.

Claims (31)

  1. An image processing method, characterized in that the image processing method comprises:
    acquiring an image to be processed;
    inputting the image to be processed into an image reconstruction model, and performing multi-scale feature extraction and image-channel expansion on the image to be processed by using a feature extraction network of the image reconstruction model to obtain a reconstruction feature map; and
    enlarging the reconstruction feature map by using a sub-pixel convolution layer of the image reconstruction model to obtain a reconstructed image.
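The following is a minimal PyTorch-style sketch of the two-stage pipeline in claim 1: a feature extraction network that widens the channels, followed by a sub-pixel convolution layer (PixelShuffle) that enlarges the feature map. The feature extractor shown here is a placeholder; the cascade structure of claims 2, 4, and 6 is sketched after those claims.

```python
import torch.nn as nn

class ImageReconstructionModel(nn.Module):
    def __init__(self, scale=2, channels=64):
        super().__init__()
        # Placeholder feature extraction network: expands 3 input channels to
        # 3 * scale**2 channels so the sub-pixel layer can rearrange them.
        self.features = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 3 * scale ** 2, 3, padding=1),
        )
        # Sub-pixel convolution layer: moves channel values into pixel
        # positions, enlarging height and width by `scale`.
        self.pixel_shuffle = nn.PixelShuffle(scale)

    def forward(self, x):
        return self.pixel_shuffle(self.features(x))  # reconstructed image
```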
  2. The image processing method according to claim 1, characterized in that the feature extraction network comprises a convolution layer, a plurality of cascade blocks, and a plurality of first convolution layers, the plurality of cascade blocks and the plurality of first convolution layers are arranged alternately, and the feature extraction network adopts a global cascade structure;
    the step of performing multi-scale feature extraction on the image to be processed by using the feature extraction network of the image reconstruction model to obtain the reconstruction feature map comprises:
    inputting the image to be processed into the convolution layer for convolution processing to obtain an initial feature map;
    taking the initial feature map as the input of the first cascade block and the output of the (N-1)-th first convolution layer as the input of the N-th cascade block, and performing multi-scale feature extraction by using the cascade blocks to output intermediate feature maps;
    channel-concatenating the initial feature map with the intermediate feature map output by each cascade block preceding the N-th first convolution layer, and inputting the concatenated result into the N-th first convolution layer for convolution processing; and
    taking the output of the last first convolution layer as the reconstruction feature map.
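Below is a minimal sketch of the global cascade structure of claim 2 with three cascade blocks. `CascadeBlock` is defined in the sketch after claim 4; the 1x1 fusion convolutions and the omission of the final channel expansion for the sub-pixel layer are assumptions about details the claim leaves open.

```python
import torch
import torch.nn as nn

class FeatureExtractionNetwork(nn.Module):
    def __init__(self, channels=64, n_blocks=3):
        super().__init__()
        self.entry = nn.Conv2d(3, channels, 3, padding=1)  # initial feature map
        self.blocks = nn.ModuleList(CascadeBlock(channels) for _ in range(n_blocks))
        # The N-th first convolution fuses the initial feature map plus the
        # intermediate maps of the first N cascade blocks: (N + 1) * channels in.
        self.first_convs = nn.ModuleList(
            nn.Conv2d((i + 2) * channels, channels, 1) for i in range(n_blocks))

    def forward(self, x):
        feat = self.entry(x)
        kept = [feat]              # global cascade: keep every earlier output
        out = feat                 # the first cascade block takes the initial map
        for block, conv in zip(self.blocks, self.first_convs):
            kept.append(block(out))             # intermediate feature map
            out = conv(torch.cat(kept, dim=1))  # channel concat + convolution
        return out                 # output of the last first convolution layer
```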
  3. The image processing method according to claim 2, characterized in that the number of the cascade blocks is 3 to 5, and the number of the first convolution layers is 3 to 5.
  4. The image processing method according to claim 2 or 3, characterized in that the cascade block comprises a plurality of residual blocks and a plurality of second convolution layers, the plurality of residual blocks and the plurality of second convolution layers are arranged alternately, and the cascade block adopts a local cascade structure;
    the step of performing multi-scale feature extraction by using the cascade block and outputting the intermediate feature map comprises:
    taking the input of the cascade block as the input of the first residual block and the output of the (N-1)-th second convolution layer as the input of the N-th residual block, and learning residual features by using the residual blocks to obtain residual feature maps;
    channel-concatenating the input of the cascade block with the output of each residual block preceding the N-th second convolution layer, and inputting the concatenated result into the N-th second convolution layer for convolution processing; and
    taking the output of the last second convolution layer as the intermediate feature map.
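A minimal sketch of the local cascade structure of claim 4 follows; it mirrors the global structure at a smaller scale, with `ResidualBlock` defined in the sketch after claim 6. The 1x1 fusion convolutions are again an assumption.

```python
import torch
import torch.nn as nn

class CascadeBlock(nn.Module):
    def __init__(self, channels=64, n_res=3):
        super().__init__()
        self.res_blocks = nn.ModuleList(ResidualBlock(channels) for _ in range(n_res))
        self.second_convs = nn.ModuleList(
            nn.Conv2d((i + 2) * channels, channels, 1) for i in range(n_res))

    def forward(self, x):
        kept = [x]                 # local cascade: keep the block input as well
        out = x
        for res, conv in zip(self.res_blocks, self.second_convs):
            kept.append(res(out))               # residual feature map
            out = conv(torch.cat(kept, dim=1))  # channel concat + convolution
        return out                 # intermediate feature map
```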
  5. The image processing method according to claim 4, characterized in that the number of the residual blocks is 3 to 5, and the number of the second convolution layers is 3 to 5.
  6. The image processing method according to claim 4 or 5, characterized in that the residual block comprises a grouped convolution layer, a third convolution layer, and a fourth convolution layer, the grouped convolution layer adopts a ReLU activation function, the grouped convolution layer and the third convolution layer are connected to form a residual path, and the residual block adopts a local skip connection structure;
    the step of learning residual features by using the residual block to obtain the residual feature map comprises:
    taking the input of the residual block as the input of the grouped convolution layer, and extracting features through the residual path; and
    fusing the input of the residual block with the output of the third convolution layer, inputting the fused result into the fourth convolution layer for convolution processing, and outputting the residual feature map.
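Below is a minimal sketch of the residual block of claim 6. The group count is an assumption; the claim specifies only a grouped convolution with ReLU and a third convolution forming the residual path, a local skip connection, and a fourth convolution after fusion.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels=64, groups=4):
        super().__init__()
        self.grouped = nn.Conv2d(channels, channels, 3, padding=1, groups=groups)
        self.relu = nn.ReLU(inplace=True)       # activation of the grouped conv
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv4 = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        residual = self.conv3(self.relu(self.grouped(x)))  # residual path
        return self.conv4(x + residual)  # local skip connection, then fusion conv
```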
  7. The image processing method according to any one of claims 1 to 6, characterized in that the step of enlarging the reconstruction feature map by using the sub-pixel convolution layer of the image reconstruction model to obtain the reconstructed image comprises:
    adjusting pixel positions in the reconstruction feature map by using the sub-pixel convolution layer to obtain the reconstructed image.
  8. An image reconstruction model training method, characterized in that the image reconstruction model training method comprises:
    acquiring training samples, the training samples comprising low-resolution images and high-resolution images, the low-resolution images being obtained by down-sampling the high-resolution images;
    inputting the low-resolution image into a pre-built image reconstruction model, the image reconstruction model comprising a feature extraction network and a sub-pixel convolution layer;
    performing multi-scale feature extraction and image-channel expansion on the low-resolution image by using the feature extraction network to obtain a training feature map;
    enlarging the training feature map by using the sub-pixel convolution layer to obtain a training reconstructed image; and
    performing back-propagation training on the image reconstruction model based on the training reconstructed image, the high-resolution image, and a preset objective function to obtain a trained image reconstruction model.
  9. The image reconstruction model training method according to claim 8, characterized in that the objective function is an L2 loss function;
    the step of performing back-propagation training on the image reconstruction model based on the training reconstructed image, the high-resolution image, and the preset objective function to obtain the trained image reconstruction model comprises:
    performing back-propagation training on the image reconstruction model based on the training reconstructed image, the high-resolution image, and the L2 loss function, so as to adjust the parameters of the image reconstruction model until a preset training completion condition is reached, to obtain the trained image reconstruction model.
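A minimal sketch of the back-propagation training of claims 8 and 9 with an L2 (mean-squared-error) objective is given below; the optimizer choice and the concrete training completion condition (here, a fixed number of epochs) are assumptions.

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=100, lr=1e-4):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    l2 = nn.MSELoss()
    for _ in range(epochs):                      # assumed completion condition
        for low_res, high_res in loader:         # low_res: downsampled high_res
            optimizer.zero_grad()
            loss = l2(model(low_res), high_res)  # L2 loss against the target
            loss.backward()                      # back-propagation
            optimizer.step()                     # adjust model parameters
    return model
```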
  10. The image reconstruction model training method according to claim 8 or 9, characterized in that the image reconstruction model training method further comprises:
    pruning the trained image reconstruction model so as to retain long-line cascade connections and delete short-line cascade connections.
  11. The image reconstruction model training method according to any one of claims 8 to 10, characterized in that before the step of inputting the low-resolution image into the pre-built image reconstruction model, the image reconstruction model training method further comprises:
    performing self-mean-subtraction processing on the low-resolution image to highlight the texture details of the low-resolution image.
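A minimal sketch of this preprocessing follows: the image's own mean is subtracted so that low-frequency brightness is removed and texture detail stands out. Per-channel subtraction is an assumption.

```python
def subtract_self_mean(img):
    # img: tensor of shape (C, H, W); subtract each channel's own mean.
    return img - img.mean(dim=(1, 2), keepdim=True)
```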
  12. The image reconstruction model training method according to any one of claims 8 to 11, characterized in that before the step of inputting the low-resolution image into the pre-built image reconstruction model, the image reconstruction model training method further comprises:
    performing flip-symmetry processing on the low-resolution image to obtain at least one processed low-resolution image;
    the step of inputting the low-resolution image into the pre-built image reconstruction model comprises:
    inputting the at least one processed low-resolution image into the image reconstruction model; and
    the step of performing multi-scale feature extraction on the low-resolution image by using the feature extraction network to obtain the training feature map comprises:
    performing multi-scale feature extraction on the at least one processed low-resolution image by using the feature extraction network to obtain at least one auxiliary feature map; and
    performing inverse flip-symmetry processing on the at least one auxiliary feature map, and averaging the results after the inverse flip-symmetry processing to obtain the training feature map.
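Below is a minimal sketch of the flip-symmetry processing of claim 12, assuming horizontal and vertical flips. Because each flip is its own inverse, applying the same flip to the extracted features performs the inverse flip-symmetry processing before averaging.

```python
import torch

def flip_averaged_features(extract, img):
    transforms = [
        lambda t: t,                         # identity
        lambda t: torch.flip(t, dims=[-1]),  # horizontal flip
        lambda t: torch.flip(t, dims=[-2]),  # vertical flip
    ]
    # Flip the image, extract an auxiliary feature map, then un-flip the map.
    feats = [f(extract(f(img))) for f in transforms]
    return torch.stack(feats).mean(dim=0)    # training feature map
```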
  13. An image processing apparatus, characterized in that the image processing apparatus comprises:
    an image acquisition module, configured to acquire an image to be processed;
    a first execution module, configured to input the image to be processed into an image reconstruction model, and perform multi-scale feature extraction and image-channel expansion on the image to be processed by using a feature extraction network of the image reconstruction model to obtain a reconstruction feature map; and
    a second execution module, configured to enlarge the reconstruction feature map by using a sub-pixel convolution layer of the image reconstruction model to obtain a reconstructed image.
  14. An image reconstruction model training apparatus, characterized in that the image reconstruction model training apparatus comprises:
    a sample acquisition module, configured to acquire training samples, the training samples comprising low-resolution images and high-resolution images, the low-resolution images being obtained by down-sampling the high-resolution images;
    a first processing module, configured to input the low-resolution image into a pre-built image reconstruction model, the image reconstruction model comprising a feature extraction network and a sub-pixel convolution layer;
    a second processing module, configured to perform multi-scale feature extraction and image-channel expansion on the low-resolution image by using the feature extraction network to obtain a training feature map;
    a third processing module, configured to enlarge the training feature map by using the sub-pixel convolution layer to obtain a training reconstructed image; and
    a fourth processing module, configured to perform back-propagation training on the image reconstruction model based on the training reconstructed image, the high-resolution image, and a preset objective function to obtain a trained image reconstruction model.
  15. A portrait super-resolution reconstruction method, characterized in that the portrait super-resolution reconstruction method comprises:
    performing key point detection on an image to be processed by using an image reconstruction model to obtain face key points;
    performing super-resolution reconstruction processing according to the face key points and image features obtained based on the image to be processed, to obtain image high-frequency information; and
    performing restoration processing on the image to be processed by using the image high-frequency information to obtain a super-resolution image corresponding to the image to be processed.
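The following sketch shows the three-stage flow of claim 15 as an interface only; the method names (`detect_keypoints`, `extract_features`, `reconstruct`, `restore`) are hypothetical stand-ins for the networks the disclosure describes, not API from the disclosure itself.

```python
def portrait_super_resolution(model, image):
    landmarks = model.detect_keypoints(image)           # face key points
    features = model.extract_features(image)            # image features
    high_freq = model.reconstruct(landmarks, features)  # image high-frequency info
    return model.restore(image, high_freq)              # super-resolution image
```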
  16. The portrait super-resolution reconstruction method according to claim 15, characterized in that the super-resolution reconstruction processing is performed by using the image processing method according to any one of claims 1 to 7.
  17. The portrait super-resolution reconstruction method according to claim 15 or 16, characterized in that the key point detection, the super-resolution reconstruction processing, and the restoration processing comprise multiple rounds of iterative processing, and the image to be processed is an unprocessed image to be processed, or a super-resolution image obtained after the key point detection, the super-resolution reconstruction processing, and the restoration processing in the previous round of iteration.
  18. The portrait super-resolution reconstruction method according to any one of claims 15 to 17, characterized in that the face key points comprise a plurality of key points, and the step of performing restoration processing on the image to be processed by using the image high-frequency information to obtain the super-resolution image corresponding to the image to be processed comprises:
    processing the image to be processed by using a pre-built portrait cognition model, and outputting position information of each of the face key points; and
    performing restoration processing on the image to be processed based on the position information of each of the face key points and the image high-frequency information, to obtain the super-resolution image corresponding to the image to be processed.
  19. The portrait super-resolution reconstruction method according to claim 18, characterized in that the step of performing restoration processing on the image to be processed based on the position information of each of the face key points and the image high-frequency information to obtain the super-resolution image corresponding to the image to be processed comprises:
    acquiring a restoration attribute corresponding to each of the face key points; and
    performing restoration processing on the corresponding face key points in the image to be processed according to each of the face key points and its corresponding position information, image high-frequency information, and restoration attribute.
  20. The portrait super-resolution reconstruction method according to any one of claims 15 to 19, characterized in that the reconstruction model comprises a discriminator and a generation network, and the generation network is obtained by training with training samples under the supervision of a trained discriminator.
  21. The portrait super-resolution reconstruction method according to any one of claims 15 to 20, characterized in that the face key points comprise the left eye, the right eye, the nose, the mouth, and the chin contour.
  22. A portrait super-resolution reconstruction model training method, characterized in that the portrait super-resolution reconstruction model training method comprises:
    acquiring a training sample and a target sample corresponding to the training sample;
    performing key point detection on the training sample by using a constructed generation network to obtain training key points;
    performing super-resolution reconstruction processing and restoration processing based on the training key points and the training sample to obtain an output image; and
    comparing the output image with the target sample, adjusting the network parameters of the generation network based on the comparison result, and continuing training until a first preset condition is met, to obtain a reconstruction model.
  23. The portrait super-resolution reconstruction model training method according to claim 22, characterized in that the step of comparing the output image with the target sample, adjusting the network parameters of the generation network based on the comparison result, and continuing training until the first preset condition is met to obtain the reconstruction model comprises:
    constructing a first loss function based on the difference between the pixel information of the output image and the pixel information of the target sample;
    constructing a second loss function based on the difference between each face key point in the output image and the corresponding face key point in the target sample; and
    comparing the output image with the target sample, adjusting the network parameters of the generation network based on the comparison result, and continuing training until the weighted function value of the first loss function and the second loss function satisfies the first preset condition, to obtain the reconstruction model.
  24. The portrait super-resolution reconstruction model training method according to claim 22 or 23, characterized in that the reconstruction model further comprises a discriminator for supervising the training of the generation network, and the portrait super-resolution reconstruction model training method comprises:
    constructing a discriminator, and performing discrimination processing on the output image and the target sample corresponding to the output image by using the discriminator; and
    adjusting the parameters of the discriminator according to the obtained discrimination result until a second preset condition is met, to obtain a trained discriminator.
  25. The portrait super-resolution reconstruction model training method according to any one of claims 22 to 24, characterized in that the step of comparing the output image with the target sample, adjusting the network parameters of the generation network based on the comparison result, and continuing training until the first preset condition is met to obtain the reconstruction model comprises:
    inputting the output image into the trained discriminator to obtain discrimination information;
    comparing the output image with the target sample to obtain a comparison result; and
    adjusting the network parameters of the generation network according to the discrimination information and the comparison result, and continuing training until the first preset condition is met, to obtain the reconstruction model.
  26. The portrait super-resolution reconstruction model training method according to claim 25, characterized in that the step of adjusting the network parameters of the generation network according to the discrimination information and the comparison result, and continuing training until the first preset condition is met to obtain the reconstruction model comprises:
    constructing a first loss function based on the difference between the pixel information of the output image and the pixel information of the target sample;
    constructing a second loss function based on the difference between each face key point in the output image and the corresponding face key point in the target sample;
    constructing a third loss function based on the discrimination information of the discriminator for the output image, and constructing a fourth loss function based on the image difference between the output image and the target sample obtained by a pre-built portrait cognition model; and
    adjusting the network parameters of the generation network according to the discrimination information and the comparison result, and continuing training until the weighted function value of the first loss function, the second loss function, the third loss function, and the fourth loss function satisfies the first preset condition, to obtain the reconstruction model.
  27. A portrait super-resolution reconstruction apparatus, characterized in that the portrait super-resolution reconstruction apparatus comprises:
    a detection module, configured to perform key point detection on an image to be processed by using a pre-built reconstruction model to obtain face key points;
    a processing module, configured to perform super-resolution reconstruction processing according to the face key points and image features obtained based on the image to be processed, to obtain image high-frequency information; and
    a restoration module, configured to perform restoration processing on the image to be processed by using the image high-frequency information to obtain a super-resolution image corresponding to the image to be processed.
  28. The portrait super-resolution reconstruction apparatus according to claim 27, characterized in that the processing module comprises the image processing apparatus according to claim 13 for performing the super-resolution reconstruction processing.
  29. A portrait super-resolution reconstruction model training apparatus, characterized in that the portrait super-resolution reconstruction model training apparatus comprises:
    an acquisition module, configured to acquire a training sample and a target sample corresponding to the training sample;
    a key point obtaining module, configured to perform key point detection on the training sample by using a constructed generation network to obtain training key points;
    an output image obtaining module, configured to perform super-resolution reconstruction processing and restoration processing based on the training key points and the training sample to obtain an output image; and
    a training module, configured to compare the output image with the target sample, adjust the network parameters of the generation network based on the comparison result, and continue training until a first preset condition is met, to obtain a reconstruction model.
  30. An electronic device, characterized in that the electronic device comprises:
    one or more processors; and
    one or more storage media for storing one or more machine-executable instructions that, when executed by the one or more processors, cause the one or more processors to implement the image processing method according to any one of claims 1 to 7, the image reconstruction model training method according to any one of claims 8 to 12, the portrait super-resolution reconstruction method according to any one of claims 15 to 21, or the portrait super-resolution reconstruction model training method according to any one of claims 22 to 26.
  31. A computer-readable storage medium, characterized in that the computer-readable storage medium stores machine-executable instructions which, when executed, implement the image processing method according to any one of claims 1 to 7, the image reconstruction model training method according to any one of claims 8 to 12, the portrait super-resolution reconstruction method according to any one of claims 15 to 21, or the portrait super-resolution reconstruction model training method according to any one of claims 22 to 26.
PCT/CN2021/118591 2020-09-16 2021-09-15 Image processing method and apparatus, portrait super-resolution reconstruction method and apparatus, and portrait super-resolution reconstruction model training method and apparatus, electronic device, and storage medium WO2022057837A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202010977254.4A CN114266697A (en) 2020-09-16 2020-09-16 Image processing and model training method and device, electronic equipment and storage medium
CN202010977254.4 2020-09-16
CN202011000670.5A CN114298901A (en) 2020-09-22 2020-09-22 Portrait super-resolution reconstruction method, model training method, device, electronic equipment and readable storage medium
CN202011000670.5 2020-09-22

Publications (1)

Publication Number Publication Date
WO2022057837A1 true WO2022057837A1 (en) 2022-03-24

Family

ID=80776497

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/118591 WO2022057837A1 (en) 2020-09-16 2021-09-15 Image processing method and apparatus, portrait super-resolution reconstruction method and apparatus, and portrait super-resolution reconstruction model training method and apparatus, electronic device, and storage medium

Country Status (1)

Country Link
WO (1) WO2022057837A1 (en)



Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0959433A2 (en) * 1998-05-20 1999-11-24 Itt Manufacturing Enterprises, Inc. Super resolution apparatus and methods for electro-optical systems
CN107154023A (en) * 2017-05-17 2017-09-12 电子科技大学 Face super-resolution reconstruction method based on generation confrontation network and sub-pix convolution
CN109903219A (en) * 2019-02-28 2019-06-18 深圳市商汤科技有限公司 Image processing method and device, electronic equipment, computer readable storage medium
CN111488779A (en) * 2019-07-19 2020-08-04 同观科技(深圳)有限公司 Video image super-resolution reconstruction method, device, server and storage medium
CN110782395A (en) * 2019-10-28 2020-02-11 西安电子科技大学 Image processing method and device, electronic equipment and computer readable storage medium
CN110992265A (en) * 2019-12-02 2020-04-10 北京数码视讯科技股份有限公司 Image processing method and model, model training method and electronic equipment
CN111192200A (en) * 2020-01-02 2020-05-22 南京邮电大学 Image super-resolution reconstruction method based on fusion attention mechanism residual error network
CN111461983A (en) * 2020-03-31 2020-07-28 华中科技大学鄂州工业技术研究院 Image super-resolution reconstruction model and method based on different frequency information

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LI SUMEI; LEI GUOQING; FAN RU: "Depth Map Super-Resolution Based on Two-Channel Convolutional Neural Network", Acta Optica Sinica, vol. 38, no. 10, 31 October 2018, pages 136-142, ISSN: 0253-2239, DOI: 10.3788/AOS201838.1010002 *
LI WEI; XUDONG ZHANG: "Depth image super-resolution reconstruction based on convolution neural network", Journal of Electronic Measurement and Instrument, vol. 31, no. 12, 31 December 2017, pages 1918-1928, ISSN: 1000-7105, DOI: 10.13382/j.jemi.2017.12.006 *

Cited By (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114049254A (en) * 2021-10-29 2022-02-15 华南农业大学 Low-pixel ox-head image reconstruction and identification method, system, equipment and storage medium
CN114663288A (en) * 2022-04-11 2022-06-24 桂林电子科技大学 Single-axial head MRI (magnetic resonance imaging) super-resolution reconstruction method
CN114841961A (en) * 2022-05-05 2022-08-02 扬州大学 Wheat scab detection method based on image enhancement and improvement of YOLOv5
CN114841961B (en) * 2022-05-05 2024-04-05 扬州大学 Wheat scab detection method based on image enhancement and improved YOLOv5
CN114943639A (en) * 2022-05-24 2022-08-26 北京瑞莱智慧科技有限公司 Image acquisition method, related device and storage medium
CN114972041A (en) * 2022-07-28 2022-08-30 中国人民解放军国防科技大学 Polarization radar image super-resolution reconstruction method and device based on residual error network
CN115331077A (en) * 2022-08-22 2022-11-11 北京百度网讯科技有限公司 Training method of feature extraction model, target classification method, device and equipment
CN115331077B (en) * 2022-08-22 2024-04-26 北京百度网讯科技有限公司 Training method of feature extraction model, target classification method, device and equipment
WO2024078403A1 (en) * 2022-10-13 2024-04-18 维沃移动通信有限公司 Image processing method and apparatus, and device
CN115409716A (en) * 2022-11-01 2022-11-29 杭州网易智企科技有限公司 Video processing method, device, storage medium and equipment
CN115409755A (en) * 2022-11-03 2022-11-29 腾讯科技(深圳)有限公司 Map processing method and device, storage medium and electronic equipment
CN115409755B (en) * 2022-11-03 2023-03-03 腾讯科技(深圳)有限公司 Map processing method and device, storage medium and electronic equipment
CN115546030A (en) * 2022-11-30 2022-12-30 武汉大学 Compressed video super-resolution method and system based on twin super-resolution network
CN115953296A (en) * 2022-12-09 2023-04-11 中山大学·深圳 Transform and convolutional neural network combined based face super-resolution reconstruction method and system
CN115953296B (en) * 2022-12-09 2024-04-05 中山大学·深圳 Face super-resolution reconstruction method and system based on combination of transducer and convolutional neural network
CN115908142A (en) * 2023-01-06 2023-04-04 诺比侃人工智能科技(成都)股份有限公司 Contact net tiny part damage testing method based on visual recognition
CN115937794B (en) * 2023-03-08 2023-08-15 成都须弥云图建筑设计有限公司 Small target object detection method and device, electronic equipment and storage medium
CN115937794A (en) * 2023-03-08 2023-04-07 北京龙智数科科技服务有限公司 Small target object detection method and device, electronic equipment and storage medium
CN116091712B (en) * 2023-04-12 2023-06-27 安徽大学 Multi-view three-dimensional reconstruction method and system for computing resource limited equipment
CN116091712A (en) * 2023-04-12 2023-05-09 安徽大学 Multi-view three-dimensional reconstruction method and system for computing resource limited equipment
CN116309591B (en) * 2023-05-19 2023-08-25 杭州健培科技有限公司 Medical image 3D key point detection method, model training method and device
CN116452424B (en) * 2023-05-19 2023-10-10 山东大学 Face super-resolution reconstruction method and system based on double generalized distillation
CN116452424A (en) * 2023-05-19 2023-07-18 山东大学 Face super-resolution reconstruction method and system based on double generalized distillation
CN116309591A (en) * 2023-05-19 2023-06-23 杭州健培科技有限公司 Medical image 3D key point detection method, model training method and device
CN116385318A (en) * 2023-06-06 2023-07-04 湖南纵骏信息科技有限公司 Image quality enhancement method and system based on cloud desktop
CN116385318B (en) * 2023-06-06 2023-10-10 湖南纵骏信息科技有限公司 Image quality enhancement method and system based on cloud desktop
CN117097876A (en) * 2023-07-07 2023-11-21 天津大学 Event camera image reconstruction method based on neural network
CN117097876B (en) * 2023-07-07 2024-03-08 天津大学 Event camera image reconstruction method based on neural network
CN117196947B (en) * 2023-09-06 2024-03-22 南通大学 High-efficiency compression reconstruction model construction method for high-resolution image
CN117196947A (en) * 2023-09-06 2023-12-08 南通大学 High-efficiency compression reconstruction model construction method for high-resolution image
CN117238020A (en) * 2023-11-10 2023-12-15 杭州启源视觉科技有限公司 Face recognition method, device and computer equipment
CN117238020B (en) * 2023-11-10 2024-04-26 杭州启源视觉科技有限公司 Face recognition method, device and computer equipment
CN117425013B (en) * 2023-12-19 2024-04-02 杭州靖安防务科技有限公司 Video transmission method and system based on reversible architecture
CN117425013A (en) * 2023-12-19 2024-01-19 杭州靖安防务科技有限公司 Video transmission method and system based on reversible architecture
CN117575916A (en) * 2024-01-19 2024-02-20 青岛漫斯特数字科技有限公司 Image quality optimization method, system, equipment and medium based on deep learning
CN117575916B (en) * 2024-01-19 2024-04-30 青岛漫斯特数字科技有限公司 Image quality optimization method, system, equipment and medium based on deep learning
CN117612017A (en) * 2024-01-23 2024-02-27 江西啄木蜂科技有限公司 Environment-adaptive remote sensing image change detection method

Similar Documents

Publication Publication Date Title
WO2022057837A1 (en) Image processing method and apparatus, portrait super-resolution reconstruction method and apparatus, and portrait super-resolution reconstruction model training method and apparatus, electronic device, and storage medium
TWI728465B (en) Method, device and electronic apparatus for image processing and storage medium thereof
US11688070B2 (en) Video frame segmentation using reduced resolution neural network and masks from previous frames
RU2697928C1 (en) Superresolution of an image imitating high detail based on an optical system, performed on a mobile device having limited resources, and a mobile device which implements
US10848746B2 (en) Apparatus including multiple cameras and image processing method
JP2018537748A (en) Light field rendering of images with variable computational complexity
WO2023284401A1 (en) Image beautification processing method and apparatus, storage medium, and electronic device
US20190114833A1 (en) Surface reconstruction for interactive augmented reality
CN112991171B (en) Image processing method, device, electronic equipment and storage medium
US11862053B2 (en) Display method based on pulse signals, apparatus, electronic device and medium
WO2023103378A1 (en) Video frame interpolation model training method and apparatus, and computer device and storage medium
CN113034358A (en) Super-resolution image processing method and related device
US11127111B2 (en) Selective allocation of processing resources for processing image data
Tang et al. Structure-embedded ghosting artifact suppression network for high dynamic range image reconstruction
CN113822803A (en) Image super-resolution processing method, device, equipment and computer readable storage medium
US11570384B2 (en) Image sensor employing varied intra-frame analog binning
WO2024032331A9 (en) Image processing method and apparatus, electronic device, and storage medium
WO2020259123A1 (en) Method and device for adjusting image quality, and readable storage medium
WO2023280266A1 (en) Fisheye image compression method, fisheye video stream compression method and panoramic video generation method
WO2023131111A1 (en) Image processing method, apparatus and system, and storage medium
CN112261296B (en) Image enhancement method, image enhancement device and mobile terminal
CN114266697A (en) Image processing and model training method and device, electronic equipment and storage medium
Fang et al. Artificial Intelligence: Second CAAI International Conference, CICAI 2022, Beijing, China, August 27–28, 2022, Revised Selected Papers, Part I
US20150229848A1 (en) Method and system for generating an image including optically zoomed and digitally zoomed regions
US20240144429A1 (en) Image processing method, apparatus and system, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21868659

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21868659

Country of ref document: EP

Kind code of ref document: A1