CN113554058A - Method, system, device and storage medium for enhancing resolution of visual target image

Info

Publication number: CN113554058A
Authority: CN (China)
Prior art keywords: resolution, face image, samples, face, low
Application number: CN202110698882.3A
Other languages: Chinese (zh)
Inventors: 金龙存, 卢盛林
Current assignee: Guangdong OPT Machine Vision Co Ltd
Original assignee: Guangdong OPT Machine Vision Co Ltd
Application filed by Guangdong OPT Machine Vision Co Ltd
Priority to CN202110698882.3A
Publication of CN113554058A
Legal status: Pending (assumed status; not a legal conclusion)

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N3/00 Computing arrangements based on biological models
                    • G06N3/02 Neural networks
                        • G06N3/04 Architecture, e.g. interconnection topology
                            • G06N3/045 Combinations of networks
                        • G06N3/08 Learning methods
                            • G06N3/084 Backpropagation, e.g. using gradient descent
            • G06F ELECTRIC DIGITAL DATA PROCESSING
                • G06F18/00 Pattern recognition
                    • G06F18/20 Analysing
                        • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
                            • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
            • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
                • G06T3/00 Geometric image transformations in the plane of the image
                    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
                        • G06T3/4023 Scaling of whole images or parts thereof based on decimating pixels or lines of pixels, or on inserting pixels or lines of pixels
                        • G06T3/4053 Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution

Abstract

The invention discloses a method, a system, a device and a storage medium for enhancing the resolution of a visual target image. The method processes a low-resolution face image to be processed, together with its corresponding face attributes, using a pre-trained face image super-resolution model, and outputs a high-resolution face image. The model is trained as follows: acquire training samples comprising high-resolution face image samples, low-resolution face image samples, and a preset number of face attribute samples corresponding to both; then, from the training samples, establish the face image super-resolution model based on a preset loss function and the high-resolution face image samples. The trained model uses the face attributes as a prior to guide the enhancement of the low-resolution face image and recovers its high-frequency information, so that the output high-resolution face image contains more facial structure detail and is sharper.

Description

Method, system, device and storage medium for enhancing resolution of visual target image
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a method, a system, a device and a storage medium for enhancing the resolution of a visual target image.
Background
In recent years, as demand for image and video quality has grown, improving that quality has become an increasingly important problem. Image super-resolution aims to restore a low-resolution image so that it contains more detail and appears sharper. The technology has real practical value: in security surveillance, for example, cost constraints mean that capture devices often record video frames lacking useful information, while effective surveillance depends heavily on high-resolution images with clearly legible content. Image super-resolution can restore detail to such video frames, and the recovered information can provide usable evidence for fighting crime. Used as a pre-processing step, image super-resolution can also effectively improve the accuracy of downstream security tasks such as object detection, face recognition, and anomaly warning.
Earlier image super-resolution methods were either interpolation-based or reconstruction-based. Interpolation-based super-resolution was the first approach applied in the field: a fixed polynomial computes the pixel value at each interpolated position from the existing pixel values, as in bilinear interpolation, bicubic interpolation, and Lanczos resampling. Reconstruction-based methods impose strict prior knowledge as a constraint and search the constrained space for a suitable reconstruction function that rebuilds a high-resolution image with detail information. Both families tend to produce overly smooth images and fail to recover fine texture detail.
More recently, with the development of deep learning and convolutional neural networks, image super-resolution has made great strides, and face image super-resolution in particular has drawn growing attention from researchers. However, existing work that incorporates face attributes typically concatenates the attribute information onto shallow features, even though a face image is a special, highly structured and symmetric image whose structural information is better represented by deep features. Moreover, existing models are built from structures such as hourglass networks, residual networks, or autoencoders, which are not sufficient to support deep model construction and remain limited to simple feature-extraction architectures.
Disclosure of Invention
The invention provides a method, a system, a device and a storage medium for enhancing the resolution of a visual target image. By further combining attribute information with deep facial structure features to construct a deeper network, the network can produce a high-definition face image guided by a specific face-attribute prior, thereby solving, or at least partially solving, the technical problems above.
In order to achieve the purpose, the invention adopts the following technical scheme:
in a first aspect, a method for enhancing the resolution of an image of a visual target is provided, which includes:
processing the low-resolution face image to be processed and the corresponding face attribute by adopting a pre-trained face image super-resolution model, and outputting a high-resolution face image; the training method of the face image super-resolution model comprises the following steps:
acquiring training samples, wherein the training samples comprise high-resolution face image samples, low-resolution face image samples and a preset number of face attribute samples corresponding to the high-resolution face image samples and the low-resolution face image samples;
and establishing a face image super-resolution model based on a preset loss function and the high-resolution face image sample according to the training sample.
Optionally, the acquiring a training sample, where the training sample includes a high-resolution face image sample, a low-resolution face image sample, and a preset number of face attribute samples corresponding to the high-resolution face image sample, includes:
collecting high-resolution face image samples from the face image data set CelebA and backing them up;
adopting an image scaling algorithm to carry out down-sampling on the high-resolution face image sample to generate a low-resolution face image sample;
and selecting, as the corresponding face attribute samples, a preset number of attributes in the face image data set CelebA that benefit face image super-resolution.
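The paired-sample generation above can be sketched as follows. The patent does not name the scaling algorithm, so a simple block-average down-sampler (a NumPy stand-in for a bicubic-style kernel; the helper name `downsample` is ours) illustrates how low-resolution samples are derived from the backed-up high-resolution ones:

```python
import numpy as np

def downsample(hr: np.ndarray, scale: int = 4) -> np.ndarray:
    """Down-sample an HxWxC image by block averaging, a simple stand-in
    for the unspecified image scaling algorithm in the patent."""
    h, w, c = hr.shape
    assert h % scale == 0 and w % scale == 0
    return hr.reshape(h // scale, scale, w // scale, scale, c).mean(axis=(1, 3))

hr = np.random.rand(128, 128, 3)   # a backed-up high-resolution sample
lr = downsample(hr)                # the paired low-resolution sample
print(lr.shape)                    # (32, 32, 3)
```

Each high-resolution sample, its down-sampled counterpart, and the selected attribute vector together form one training sample.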
Optionally, the establishing a face image super-resolution model based on a preset loss function and a high-resolution face image sample according to the training sample includes:
acquiring low-resolution face image samples and a corresponding preset number of face attribute samples;
extracting shallow features of the low-resolution face image based on a convolutional neural network to generate low-resolution face image features;
obtaining the structural features of the face image by encoding (compressing) and decoding (restoring) the low-resolution face image features with an autoencoder;
extracting deep features from the serial connection of the structural features of the face images and the corresponding preset number of face attributes by adopting a double residual dense connection network;
enlarging the feature scale by pixel-rearrangement (pixel shuffle) upsampling and reconstructing a high-resolution face image;
and reversely converging the reconstructed high-resolution face image and the backed-up high-resolution face image sample based on a preset loss function, and establishing a face image super-resolution model.
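The final step converges the reconstruction toward the backed-up sample under a preset loss function; the drawings distinguish a reconstruction-loss model (Fig. 3) from an adversarial-loss model (Fig. 4). As one plausible instance, assuming an L1 reconstruction loss (the patent does not fix the choice), the minimized quantity would be:

```python
import numpy as np

def l1_reconstruction_loss(sr: np.ndarray, hr: np.ndarray) -> float:
    """Mean absolute error between the reconstructed image and the
    backed-up high-resolution sample; an assumed instance of the
    preset loss driving back-propagation."""
    return float(np.abs(sr - hr).mean())

hr = np.full((128, 128, 3), 0.8)   # backed-up high-resolution sample
sr = np.full((128, 128, 3), 0.5)   # reconstructed network output
loss = l1_reconstruction_loss(sr, hr)
print(round(loss, 3))  # 0.3
```

In training, this scalar would be back-propagated through the network until the model converges.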
Optionally, the obtaining, in combination with an automatic encoder, the structural features of the face image by performing operations of encoding compression and decoding restoration on the low-resolution face image features includes:
the encoder downsamples with a max-pooling operation and expands or compresses the number of feature channels with 2 convolutional layers of kernel size 3;
the decoder upsamples with deconvolution (transposed convolution) and likewise expands or compresses the number of feature channels with 2 convolutional layers of kernel size 3.
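A shape-level sketch of the two halves (NumPy; max pooling for the encoder's down-sampling, nearest-neighbour repetition standing in for the decoder's deconvolution, and the kernel-size-3 convolutions omitted for brevity):

```python
import numpy as np

def max_pool2(x: np.ndarray) -> np.ndarray:
    """Encoder down-sampling: 2x2 max pooling over a (C, H, W) feature map."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample2(x: np.ndarray) -> np.ndarray:
    """Decoder up-sampling: nearest-neighbour repeat, a stand-in for the
    learned deconvolution layer specified in the patent."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

feat = np.random.rand(64, 32, 32)  # shallow features of the LR face image
code = max_pool2(feat)             # encoded: (64, 16, 16)
rec = upsample2(code)              # decoded back to (64, 32, 32)
```

The round trip through this bottleneck is what forces the features to capture the global structure of the face.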
Optionally, the extracting deep features of the series connection of the structural features of the face image and the corresponding preset number of face attributes by using a dual residual dense connection network includes:
carrying out size transformation on feature maps input by a preset number of face attribute variables;
connecting the structural features of the face image and the face attribute variables in series in a channel dimension;
compressing the number of channels of the characteristic graphs connected in series through a convolution layer with the convolution kernel size of 3;
and extracting deep features from the features with the compressed channel number through a double residual dense connection network.
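The concatenation and channel compression in the first three steps can be sketched as follows (NumPy; the helper `fuse_attributes` is hypothetical, and a random 1x1 channel mix stands in for the learned kernel-size-3 compression convolution):

```python
import numpy as np

def fuse_attributes(feat: np.ndarray, attrs: np.ndarray,
                    out_ch: int = 64, seed: int = 0) -> np.ndarray:
    """Tile the attribute vector to the feature-map size, concatenate on
    the channel axis, then mix the channels back down to out_ch."""
    c, h, w = feat.shape
    # size transformation: one constant map per attribute variable
    attr_maps = np.broadcast_to(attrs[:, None, None], (attrs.size, h, w))
    stacked = np.concatenate([feat, attr_maps], axis=0)  # (c + n_attr, h, w)
    rng = np.random.default_rng(seed)
    weights = rng.standard_normal((out_ch, stacked.shape[0]))
    return np.tensordot(weights, stacked, axes=1)        # (out_ch, h, w)

feat = np.random.rand(64, 32, 32)                        # face structural features
attrs = np.array([1., 0, 1, 0, 0, 1, 0, 0, 1, 0])        # ten binary face attributes
fused = fuse_attributes(feat, attrs)
print(fused.shape)  # (64, 32, 32)
```

The fused features would then pass into the dual residual dense connection network for deep feature extraction.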
Optionally, the reconstructing a high-resolution face image by enlarging a feature scale through pixel rearrangement and upsampling includes:
based on pixel rearrangement, one convolution layer first expands the number of channels to 4 times the original; the channels are then rearranged by position, doubling the feature scale and reducing the channel count back to that of the input. In the model implementation, for the ×4 magnification model, two such pixel-rearrangement upsampling modules are cascaded to obtain output features enlarged 4 times;
and reconstructing a high-resolution face image by adopting 2 convolution layers with convolution kernel size of 3.
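The pixel rearrangement itself is a deterministic depth-to-space shuffle, sketched here in NumPy (for the ×4 model, two such modules, each preceded by its own channel-expanding convolution, would be cascaded):

```python
import numpy as np

def pixel_shuffle(x: np.ndarray, r: int = 2) -> np.ndarray:
    """Rearrange a (C*r*r, H, W) tensor into (C, H*r, W*r): each group of
    r*r channels fills an r-by-r spatial block at its pixel position."""
    c2, h, w = x.shape
    c = c2 // (r * r)
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)   # (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)

feat = np.random.rand(64 * 4, 16, 16)  # convolution has expanded channels 4x
up1 = pixel_shuffle(feat)              # (64, 32, 32): feature scale doubled
```

Because the shuffle only moves values, all the information produced by the preceding convolution is preserved in the enlarged feature map.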
Optionally, the processing the low-resolution face image to be processed and the face attribute corresponding to the low-resolution face image by using the pre-trained face image super-resolution model and outputting the high-resolution face image includes:
inputting the low-resolution face image into a face structural feature extraction model to obtain face structural features;
and connecting the human face structural features and the human face attribute variables in series, inputting the human face structural features and the human face attribute variables into a subsequent human face image super-resolution model, and outputting a high-resolution human face image.
In a second aspect, a system for visual target image resolution enhancement is provided, comprising:
the output module is used for processing the low-resolution face image to be processed and the face attribute corresponding to the low-resolution face image to be processed by adopting a pre-trained face image super-resolution model and outputting a high-resolution face image; the training module of the face image super-resolution model comprises:
the acquisition submodule is used for acquiring training samples, and the training samples comprise high-resolution face image samples, low-resolution face image samples and a preset number of face attribute samples corresponding to the high-resolution face image samples;
and the model establishing submodule is used for establishing a face image super-resolution model based on a preset loss function and a high-resolution face image sample according to the training sample.
Optionally, the acquisition sub-module comprises:
the acquisition unit is used for acquiring a high-resolution face image sample, and obtaining and backing up the high-resolution face image sample by adopting a public large-scale face image data set CelebA;
the sampling unit is used for carrying out downsampling on the high-resolution face image sample by adopting an image scaling algorithm to generate a low-resolution face image sample;
and the selecting unit is used for selecting a preset number of face attribute samples, taking the attributes in the face image data set CelebA that benefit face image super-resolution as the corresponding face attribute samples; the preset number may be ten.
Optionally, the model building submodule includes:
the acquisition unit is used for acquiring low-resolution face image samples and corresponding preset number of face attribute samples;
the first extraction unit is used for extracting shallow features from the low-resolution face image based on a convolutional neural network to generate low-resolution face image features;
the coding and decoding unit is used for combining an automatic coder and obtaining the structural characteristics of the human face image by the operations of coding compression and decoding reduction on the low-resolution human face image characteristics;
the second extraction unit is used for extracting deep features from the serial connection of the structural features of the face images and the corresponding preset number of face attributes by adopting a double residual dense connection network;
the reconstruction unit is used for enlarging the characteristic scale by adopting pixel rearrangement upsampling and reconstructing a high-resolution face image;
and the model establishing unit is used for reversely converging the reconstructed high-resolution face image and the backed-up high-resolution face image sample based on a preset loss function and establishing a face image super-resolution model.
Optionally, the automatic encoder is provided with an encoder and a decoder, and the encoding and decoding unit includes:
the coding subunit is used for performing down-sampling by using the maximum pooling operation after shallow feature extraction, and then expanding the number of feature channels by using 2 convolutional layers with the convolutional kernel size of 3;
and the decoding subunit is used for performing up-sampling by using deconvolution before the human face attribute variable is input, and then expanding the number of characteristic channels by using 2 convolution layers with convolution kernel size of 3.
Optionally, the second extraction unit includes:
the transformation subunit is used for carrying out size transformation on the feature maps input by the preset number of human face attribute variables;
the concatenation subunit is used for concatenating the structural features of the face image and the face attribute variables in a channel dimension;
the compression subunit is used for compressing the channel number of the characteristic graphs which are connected in series through a convolution layer with the convolution kernel size of 3;
and the extraction subunit is used for extracting the deep features from the features with the compressed channel number through a double residual dense connection network.
Optionally, the reconstruction unit comprises:
the rearrangement subunit is used for expanding the number of channels to 4 times the original with one convolution layer based on pixel rearrangement, then rearranging the channels by position to obtain features whose scale is doubled and whose channel count is reduced back to that of the input; in the model implementation, for the ×4 magnification model, two pixel-rearrangement upsampling modules are cascaded to obtain output features enlarged 4 times;
and the reconstruction subunit is used for reconstructing a high-resolution face image by adopting 2 convolution layers with the convolution kernel size of 3.
Optionally, the output module includes:
the extraction submodule is used for inputting the low-resolution face image into the face structure feature extraction model to obtain the face structure feature;
and the output sub-module is used for connecting the face structure characteristics with the face attribute variables in series, inputting the follow-up face image super-resolution model and outputting a high-resolution face image.
In a third aspect, an apparatus is provided that includes a memory for storing at least one program and a processor for loading the at least one program to perform the method for visual target image resolution enhancement as described above.
In a fourth aspect, a storage medium is provided that stores a processor-executable program which, when executed by a processor, is configured to perform the method of visual target image resolution enhancement as described above.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
the embodiment of the invention provides a method, a system, a device and a storage medium for enhancing the resolution of a visual target image, which adopt training samples comprising high-resolution face image samples, low-resolution face image samples and a preset number of corresponding face attribute samples, and accurately and efficiently realize the effect of restoring a low-resolution face image into a high-resolution face image by performing resolution processing on the acquired low-resolution face image to be processed based on a face image super-resolution model established by a preset loss function, and can acquire the face image with higher definition based on a specific face attribute prior.
Drawings
To illustrate the embodiments of the invention or the prior-art solutions more clearly, the drawings needed for their description are briefly introduced below. The drawings described here are clearly only some embodiments of the invention; those skilled in the art can derive other drawings from them without creative effort.
The structures, ratios, and sizes shown in this specification are provided only to accompany the disclosed content so that those skilled in the art can understand and read it; they do not limit the conditions under which the invention can be implemented and therefore carry no limiting technical significance. Any structural modification, change of proportion, or adjustment of size that does not affect the functions and purposes of the invention still falls within the scope covered by the disclosed content.
FIG. 1 is a flowchart illustrating steps of a method for enhancing the resolution of an image of a visual target according to an embodiment of the present invention;
FIG. 2 is a block diagram of a system for enhancing the resolution of an image of a visual target according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of a super-resolution model of a face image based on reconstruction loss in an embodiment of the present invention;
FIG. 4 is a schematic flow chart of a face image super-resolution model based on countermeasure loss in the embodiment of the invention;
FIG. 5 is a schematic diagram illustrating the details of the operation of the rearrangement sub-unit in the embodiment of the present invention;
fig. 6 shows the screened attributes that contribute to super-resolution of face images in the embodiment of the present invention.
Detailed Description
To make the objects, features, and advantages of the invention more apparent and understandable, the technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The embodiments described are obviously only some, not all, of the embodiments of the invention; all other embodiments that a person skilled in the art can derive from them without creative effort fall within the protection scope of the invention.
Example 1
As shown in fig. 1, the present embodiment provides a method for enhancing the resolution of an image of a visual target, the method comprising the following steps:
s3, processing the low-resolution face image to be processed and the face attribute corresponding to the low-resolution face image to be processed by adopting a pre-trained face image super-resolution model, and outputting a high-resolution face image; the training method of the face image super-resolution model comprises the following steps:
s1, collecting training samples, wherein the training samples comprise high-resolution face image samples, low-resolution face image samples and a preset number of face attribute samples corresponding to the high-resolution face image samples and the low-resolution face image samples; optionally, the preset number is ten;
and S2, establishing a face image super-resolution model based on the preset loss function and the high-resolution face image sample according to the training sample.
As an alternative implementation manner of this embodiment, step S1 includes:
s11, collecting high-resolution face image samples, and obtaining and backing up the high-resolution face image samples by adopting a face image data set CelebA;
s12, down-sampling the high-resolution face image sample by adopting an image scaling algorithm to generate a low-resolution face image sample;
and S13, selecting the attributes which are beneficial to super resolution of the face image in the face image data set CelebA as corresponding face attribute samples, and selecting a preset number of face attribute samples.
Therefore, training samples can be established according to the high-resolution face image samples, the low-resolution face image samples and the corresponding preset number of face attribute samples.
As an alternative implementation manner of this embodiment, step S2 includes:
s21, acquiring low-resolution face image samples and corresponding preset number of face attribute samples;
s22, extracting shallow features of the low-resolution face image based on the convolutional neural network to generate low-resolution face image features;
s23, combining with an automatic encoder, obtaining the structural characteristics of the face image by the operations of encoding compression and decoding reduction of the low-resolution face image characteristics;
s24, extracting deep features of the structural features of the face images and the corresponding series connection of the preset number of face attributes by adopting a double residual dense connection network;
s25, enlarging the characteristic scale by adopting pixel rearrangement and up-sampling to reconstruct a high-resolution face image;
and S26, reversely converging the reconstructed high-resolution face image and the backed high-resolution face image sample based on a preset loss function, and establishing a face image super-resolution model.
As an alternative implementation manner of this embodiment, the automatic encoder is provided with an encoder and a decoder, and step S23 includes:
s231, the encoder performs downsampling by using the maximum pooling operation, and expands or compresses the number of characteristic channels by using 2 convolutional layers with the convolutional kernel size of 3;
and S232, the decoder performs up-sampling by using deconvolution, and expands or compresses the number of characteristic channels by using 2 convolutional layers with the convolutional kernel size of 3.
As an alternative implementation manner of this embodiment, step S24 includes:
s241, carrying out size transformation on feature graphs input by the preset number of face attribute variables;
s242, connecting the structural features of the face image and the face attribute variables in series in a channel dimension;
s243, compressing the channel number of the characteristic images which are connected in series through a convolution layer with the convolution kernel size of 3;
and S244, extracting deep features from the features with the compressed channel number through a double residual dense connection network.
As an alternative implementation manner of this embodiment, step S25 includes:
S251, based on pixel rearrangement, expanding the number of channels to 4 times the original with one convolution layer, then rearranging the channels by position to obtain features whose scale is doubled and whose channel count is reduced back to that of the input; in the model implementation, for the ×4 magnification model, two pixel-rearrangement upsampling modules are cascaded to obtain output features enlarged 4 times;
and S252, reconstructing a high-resolution face image by adopting 2 convolution layers with convolution kernel size of 3.
As an alternative implementation manner of this embodiment, step S3 includes:
s31, inputting the low-resolution face image into a face structure feature extraction model to obtain face structure features;
and S32, connecting the face structure characteristics with the face attribute variables in series, inputting a subsequent face image super-resolution model, and outputting a high-resolution face image.
Therefore, the method for enhancing the resolution of a visual target image provided by this embodiment uses training samples comprising high-resolution face image samples, low-resolution face image samples, and a corresponding preset number of face attribute samples, and applies a face image super-resolution model established on a preset loss function to the acquired low-resolution face image to be processed. It thereby accurately and efficiently restores the low-resolution face image to a high-resolution one, and can produce a sharper face image guided by a specific face-attribute prior.
Example 2
As shown in fig. 2, the present embodiment provides a system for enhancing the resolution of an image of a visual target, which can be used to implement the method provided in embodiment 1, and the system includes:
the output module 20 is configured to process the low-resolution face image to be processed and the face attribute corresponding to the low-resolution face image to be processed by using a pre-trained face image super-resolution model, and output a high-resolution face image; the training module 10 of the face image super-resolution model comprises:
the acquisition submodule 11 is configured to acquire a training sample, where the training sample includes a high-resolution face image sample, a low-resolution face image sample, and a preset number of face attribute samples corresponding to the high-resolution face image sample and the low-resolution face image sample;
and the model establishing submodule 12 is used for establishing a face image super-resolution model based on a preset loss function and the high-resolution face image sample according to the training sample.
As an optional implementation manner of this embodiment, the acquisition sub-module 11 includes:
the acquisition unit is used for acquiring a high-resolution face image sample, and obtaining and backing up the high-resolution face image sample by adopting a public large-scale face image data set CelebA;
the sampling unit is used for carrying out down-sampling on the high-resolution face image sample by adopting an image scaling algorithm to generate a low-resolution face image sample;
and the selecting unit is used for selecting, as the corresponding face attribute samples, a preset number of attributes in the face image data set CelebA that are beneficial to face image super-resolution. The preset number may be ten.
As an optional implementation manner of this embodiment, the model building sub-module 12 includes:
the acquisition unit is used for acquiring low-resolution face image samples and corresponding preset number of face attribute samples;
the first extraction unit is used for extracting shallow features from the low-resolution face image based on a convolutional neural network to generate low-resolution face image features;
the encoding and decoding unit is used for acquiring the structural features of the face image, in combination with an autoencoder, by performing encoding compression and decoding restoration on the low-resolution face image features;
the second extraction unit is used for extracting deep features from the serial connection of the structural features of the face image and the corresponding preset number of face attributes by adopting a double residual dense connection network;
the reconstruction unit is used for enlarging the characteristic scale by adopting pixel rearrangement upsampling and reconstructing a high-resolution face image;
and the model establishing unit is used for reversely converging the reconstructed high-resolution face image and the backed-up high-resolution face image sample based on a preset loss function and establishing a face image super-resolution model.
As an alternative implementation of this embodiment, the automatic encoder is provided with an encoder and a decoder, and the encoding and decoding unit includes:
the coding subunit is used for performing down-sampling by using the maximum pooling operation after shallow feature extraction, and then expanding the number of feature channels by using 2 convolutional layers with the convolutional kernel size of 3;
and the decoding subunit is used for performing up-sampling by using deconvolution before the human face attribute variable is input, and expanding the number of characteristic channels by using 2 convolution layers with convolution kernel size of 3.
As an optional implementation manner of this embodiment, the second extraction unit includes:
the transformation subunit is used for carrying out size transformation on the feature maps input by the preset number of human face attribute variables;
the series subunit is used for connecting the structural features of the face image and the face attribute variables in series in a channel dimension;
the compression subunit is used for compressing the channel number of the characteristic graphs which are connected in series through a convolution layer with the convolution kernel size of 3;
and the extraction subunit is used for extracting the deep features from the features with the compressed channel number through a double residual dense connection network.
As an optional implementation manner of this embodiment, the reconstruction unit includes:
the rearrangement subunit is used for expanding the number of channels to 4 times the original number through one convolution layer based on pixel rearrangement, and then rearranging the channels to their specific spatial positions to obtain a result whose feature scale is enlarged 2 times and whose channel number is reduced back to that of the input; in the implementation of the model, for the ×4 magnification model, two pixel-rearrangement upsampling modules are cascaded to obtain output features magnified 4 times;
and the reconstruction subunit is used for reconstructing a high-resolution face image by adopting 2 convolution layers with the convolution kernel size of 3.
As an optional implementation manner of this embodiment, the output module 20 includes:
the extraction submodule is used for inputting the low-resolution face image into the face structure feature extraction model to obtain the face structure feature;
and the output sub-module is used for connecting the face structure characteristics with the face attribute variables in series, inputting the follow-up face image super-resolution model and outputting a high-resolution face image.
Therefore, in the system for enhancing the resolution of a visual target image provided by this embodiment, training samples comprising high-resolution face image samples, low-resolution face image samples, and a corresponding preset number of face attribute samples are adopted, and a face image super-resolution model established based on a preset loss function performs resolution processing on the acquired low-resolution face image to be processed. In this way the low-resolution face image can be accurately and efficiently restored to a high-resolution face image, and a face image of higher definition can be obtained based on the prior provided by specific face attributes.
Example 3
The present embodiment provides an apparatus, comprising:
at least one processor;
at least one memory for storing at least one program;
when the at least one program is executed by the at least one processor, the at least one processor may be configured to perform the steps of a method for visual target image resolution enhancement as described in embodiment 1 above.
Example 4
A storage medium having stored therein a program executable by a processor, the program executable by the processor being for performing the method steps of visual target image resolution enhancement as in embodiment 1.
Example 5
Referring to fig. 3 to 6, a flow chart of a method for enhancing the resolution of an image of a visual target is provided, which specifically includes the following steps:
A. collecting training samples, wherein the training samples comprise high-resolution face image samples, low-resolution face image samples and ten face attribute samples corresponding to the high-resolution face image samples and the low-resolution face image samples;
B. establishing a face image super-resolution model according to the acquired training samples;
C. acquiring a low-resolution face image to be processed and a face attribute corresponding to the low-resolution face image;
D. and processing the low-resolution face image to be processed and the corresponding face attribute thereof through the face image super-resolution model, and outputting the high-resolution face image.
Wherein, the specific implementation scheme of the step A is as follows:
a1, acquiring a public large-scale face image data set CelebA as a training data set. The data set is composed of 202599 human face images, the data set comprises an aligned subdata set and a non-aligned subdata set, the data sets respectively contain artificial labeling information, the aligned subset comprises 5 feature points including eyes, noses and mouths, and each piece of data comprises 40 attribute labels. And selecting the aligned face data subsets as a training set and a test set, and selecting 10 attributes shown in the figure 4 by screening the attributes which are beneficial to super-resolution of the face image. Because a large number of background areas exist in the original face pictures, in order to better extract the feature information of the face, firstly, the face images with the length and width of 120 are cut out from the pictures in a center cutting mode to be used as target high-resolution images.
A2. MATLAB's "imresize" function is used to perform 4× bicubic down-sampling on the high-resolution face images, yielding corresponding low-resolution face images of size 30 × 30 and forming triples {I_HR, I_LR, var}. Horizontal or vertical flipping, 90° rotation, and random cropping of image blocks are adopted for data enhancement.
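The data preparation of steps A1–A2 can be sketched as follows. This is a minimal illustration: block averaging stands in for MATLAB's bicubic `imresize`, and the function names, the per-sample augmentation choice, and the random attribute vector are hypothetical.

```python
import numpy as np

def downsample_x4(hr):
    """4x down-sampling by block averaging -- a simple stand-in for the
    bicubic `imresize` used in the patent (not the same interpolation)."""
    h, w, c = hr.shape
    return hr.reshape(h // 4, 4, w // 4, 4, c).mean(axis=(1, 3))

def make_triple(hr, attrs, rng):
    """Form one {I_HR, I_LR, var} training triple with random flip /
    rotation augmentation as described in step A2."""
    k = rng.integers(4)
    if k == 1:
        hr = hr[:, ::-1]        # horizontal flip
    elif k == 2:
        hr = hr[::-1, :]        # vertical flip
    elif k == 3:
        hr = np.rot90(hr)       # 90-degree rotation
    return hr, downsample_x4(hr), attrs

rng = np.random.default_rng(0)
hr = rng.random((120, 120, 3))        # center-cropped 120x120 HR face
attrs = rng.integers(0, 2, size=10)   # ten binary face attributes
i_hr, i_lr, var = make_triple(hr, attrs, rng)
print(i_hr.shape, i_lr.shape, var.shape)
```

Random cropping of image blocks, also mentioned in A2, is omitted here for brevity.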
The specific embodiment of the step B is as follows:
B1. The low-resolution face image is taken as the network input and denoted I_LR.
B2. A shallow feature extraction module Net_fea performs shallow feature extraction on the input image; the resulting features have 64 channels and the same spatial size as the input. The shallow feature extraction module consists of a 3 × 3 convolutional layer and an activation function, and its output can be expressed as:
F_LR = Net_fea(I_LR)
B3. A face structure feature extraction module inputs the shallow feature F_LR into an autoencoder and obtains the face image structural feature F_stc through encoding compression and decoding restoration. The encoder down-samples with max pooling, the decoder up-samples with deconvolution, and both use 2 convolutional layers with kernel size 3 to expand or compress the number of feature channels. This module can be expressed as:
F_stc = Net_De(Net_En(F_LR))
Here Net_En and Net_De are the encoder and decoder networks, whose structures mirror each other for consistency.
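The encoder–decoder of step B3 can be sketched in PyTorch as below. Only the max-pooling down-sampling, the deconvolution up-sampling, and the 2 kernel-3 convolutions on each side are stated in the text; the intermediate channel width and the ReLU activations are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FaceStructureAutoencoder(nn.Module):
    """Sketch of Net_De(Net_En(.)): encode-compress then decode-restore."""
    def __init__(self, ch=64):
        super().__init__()
        # Encoder: max-pooling down-sampling, then 2 conv layers (kernel 3)
        self.encoder = nn.Sequential(
            nn.MaxPool2d(2),
            nn.Conv2d(ch, ch * 2, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch * 2, ch * 2, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Decoder mirrors the encoder: deconvolution up-sampling,
        # then 2 conv layers (kernel 3) compressing channels back
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(ch * 2, ch * 2, 2, stride=2),
            nn.Conv2d(ch * 2, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, f_lr):
        return self.decoder(self.encoder(f_lr))

f_lr = torch.randn(1, 64, 30, 30)   # shallow features of a 30x30 LR face
f_stc = FaceStructureAutoencoder()(f_lr)
print(f_stc.shape)
```

The mirrored structure keeps F_stc at the same 64-channel, 30 × 30 size as F_LR, as the text requires.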
B4. A deep feature extraction module processes the guiding attribute variable var and the face image structural feature F_stc concatenated along the channel dimension to obtain the deep feature F_deep. The deep feature extraction module consists of one 1 × 1 convolutional layer and a dual residual dense connection network. The 1 × 1 convolutional layer compresses the number of feature map channels. The dual residual dense connection network comprises a plurality of cascaded residual dense blocks, with skip connections added around each dense block and around the whole cascade. The residual dense blocks use multi-level feature information to improve network performance, allow a deep network to be constructed, and avoid the vanishing-gradient problem. The convolution kernels used in the deep feature extraction module (RRDB) are all 3 × 3, with a default channel number of 64. This process can be expressed as:
F_deep = Net_RRDB(conv(F_stc, var))
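A minimal PyTorch sketch of this B4 structure follows. The growth rate, the number of layers per dense block, the number of cascaded blocks, and the LeakyReLU slope are illustrative assumptions; the text fixes only the 3 × 3 kernels, the 64-channel default, the 1 × 1 compression convolution, and the skip-connection pattern.

```python
import torch
import torch.nn as nn

class ResidualDenseBlock(nn.Module):
    """One residual dense block: densely connected 3x3 convs
    plus a local skip connection."""
    def __init__(self, ch=64, growth=32, n_layers=4):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(ch + i * growth, growth, 3, padding=1)
            for i in range(n_layers))
        self.fuse = nn.Conv2d(ch + n_layers * growth, ch, 1)
        self.act = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            feats.append(self.act(conv(torch.cat(feats, dim=1))))
        return x + self.fuse(torch.cat(feats, dim=1))  # local residual

class DualResidualDenseNet(nn.Module):
    """Cascaded residual dense blocks with an extra outer skip
    connection around the whole cascade ('dual' residual)."""
    def __init__(self, ch=64, n_blocks=3):
        super().__init__()
        self.blocks = nn.Sequential(*[ResidualDenseBlock(ch)
                                      for _ in range(n_blocks)])

    def forward(self, x):
        return x + self.blocks(x)  # outer residual over the cascade

# The 1x1 conv compresses the (F_stc, var) concatenation back to 64 channels
compress = nn.Conv2d(64 + 10, 64, 1)
f_stc = torch.randn(1, 64, 30, 30)
var = torch.ones(1, 10, 30, 30)   # attribute maps after size transformation
f_deep = DualResidualDenseNet()(compress(torch.cat([f_stc, var], dim=1)))
print(f_deep.shape)
```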
B5. An upsampling module reconstructs the enlarged feature F_SR. After the deep feature F_deep is obtained, one convolution layer expands the number of channels to 4 times the original; the channels are then rearranged to their specific spatial positions, yielding a result whose feature scale is enlarged 2 times and whose channel number is reduced back to that of the input. In the implementation of the model, for the ×4 magnification model, two pixel-rearrangement upsampling modules are cascaded to obtain output features magnified 4 times. This module can be expressed as:
F_SR = Net_up(F_deep)
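The pixel-rearrangement step of B5 can be written directly in NumPy; the channel-expanding convolutions that precede each rearrangement are simulated here by simply replicating channels.

```python
import numpy as np

def pixel_shuffle(x, r=2):
    """Rearrange an (r*r*C, H, W) feature map into (C, r*H, r*W) by
    moving channel groups to their specific spatial positions, as the
    upsampling module does after its channel-expanding convolution."""
    c, h, w = x.shape
    oc = c // (r * r)
    x = x.reshape(oc, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)       # (oc, h, r, w, r)
    return x.reshape(oc, h * r, w * r)

# A convolution first expands channels 4x (64 -> 256, simulated here),
# then rearrangement gives 2x scale with channels back to 64.
f = np.random.default_rng(0).random((256, 30, 30))
up1 = pixel_shuffle(f)                               # (64, 60, 60)
# For the x4 model, two such modules are cascaded (conv omitted):
up2 = pixel_shuffle(np.concatenate([up1] * 4, axis=0))  # (64, 120, 120)
print(up1.shape, up2.shape)
```

Each rearrangement doubles the feature scale and divides the channel count by 4, so the cascade of two modules yields the ×4 output.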
B6. The amplified feature F_SR is reconstructed by a reconstruction module into the output super-resolution image I_SR with three RGB channels. The reconstruction module consists of two 3 × 3 convolutional layers and an activation function, and can be expressed as:
I_SR = Net_rec(F_SR)
B7. Using a loss function, the reconstructed high-resolution face image I_SR and the backed-up high-resolution face image samples are converged through back-propagation to establish the face image super-resolution model. Two cases are considered: a model based on reconstruction loss and a model based on adversarial loss.
For the model based on reconstruction loss, the loss function is the L1 loss, which computes the error between the network-generated high-resolution face image I_SR and the real high-resolution face image I_HR in the sample. This loss constrains the generated image to be closer to the real image. The L1 loss is:
L_1 = (1/N) Σ_{i=1}^{N} |I_SR(i) − I_HR(i)|
Here N denotes the total number of pixels; since the data set uses the sRGB color space, N = H × W × C, where W, H, and C are the width, height, and number of channels of the high-resolution face image. A learning rate is set, the gradient is back-propagated by minimizing this loss function error, network parameters are updated, and iteration continues until the network converges.
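As a numeric check of the L1 loss described above, with every pixel of a generated 120 × 120 × 3 image off by 0.5 the loss is exactly 0.5:

```python
import numpy as np

def l1_loss(i_sr, i_hr):
    """L1 loss: sum of absolute errors divided by N = H * W * C."""
    n = i_sr.size                       # N = H * W * C total pixel values
    return np.abs(i_sr - i_hr).sum() / n

i_hr = np.zeros((120, 120, 3))
i_sr = np.full((120, 120, 3), 0.5)     # every pixel value off by 0.5
print(l1_loss(i_sr, i_hr))
```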
For the model based on adversarial loss, a discriminator structure is added on top of the reconstruction-loss model. The generated super-resolution image and the real image are each fed into a discriminator network of VGG structure. Its feature extraction block consists of a convolutional layer with kernel size 3 and stride 1, a batch normalization layer, and a LeakyReLU activation layer; its scale compression block consists of a convolutional layer with kernel size 4 and stride 2, a batch normalization layer, and a LeakyReLU activation layer. After passing through the feature extraction and scale compression blocks several times, features of size 4 × 4 × c are obtained, where c is a hyperparameter setting the channel number. The corresponding attribute variables are then concatenated with the features extracted by the discriminator, and a binary real/fake judgment is output after a fully connected layer, expressed as:
{V_real, V_fake} = Net_D({I_SR, I_HR}, var)
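A sketch of this conditional discriminator follows. The channel count c, the block depth, and the assumption that inputs are resized to 128 × 128 (so that repeated stride-2 compression lands exactly on 4 × 4) are all illustrative; the text fixes only the two block types and the attribute concatenation before the fully connected layer.

```python
import torch
import torch.nn as nn

def feat_block(cin, cout):
    # feature extraction block: conv k3 s1 + BN + LeakyReLU
    return nn.Sequential(nn.Conv2d(cin, cout, 3, 1, 1),
                         nn.BatchNorm2d(cout), nn.LeakyReLU(0.2))

def compress_block(ch):
    # scale compression block: conv k4 s2 + BN + LeakyReLU
    return nn.Sequential(nn.Conv2d(ch, ch, 4, 2, 1),
                         nn.BatchNorm2d(ch), nn.LeakyReLU(0.2))

class AttributeDiscriminator(nn.Module):
    """VGG-style conditional discriminator sketch: image features are
    compressed to 4 x 4 x c, concatenated with the attribute vector,
    and judged real/fake through a fully connected layer."""
    def __init__(self, c=64, n_attr=10, in_size=128):
        super().__init__()
        layers, size = [feat_block(3, c)], in_size
        while size > 4:                  # compress until 4 x 4 spatial size
            layers += [feat_block(c, c), compress_block(c)]
            size //= 2
        self.features = nn.Sequential(*layers)
        self.fc = nn.Linear(4 * 4 * c + n_attr, 1)

    def forward(self, img, var):
        f = self.features(img).flatten(1)
        return torch.sigmoid(self.fc(torch.cat([f, var], dim=1)))

d = AttributeDiscriminator()
v = d(torch.randn(1, 3, 128, 128), torch.ones(1, 10))
print(v.shape)
```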
after joining the arbiter network, the model becomes an attribute-guided conditional generator arbiter network whose generator loss is made up of three parts:
LG=λ1*Lrec2*LVGG3*Ladv
λ1, λ2, and λ3 are the weights of the three losses. To ensure that the reconstructed image is as similar as possible to the real image in content, the image space is constrained pixel by pixel through the reconstruction loss, which uses the L1 loss:
L_rec = (1/N) Σ_{i=1}^{N} |I_SR(i) − I_HR(i)|
where N = H × W × C is the total number of pixels, and W, H, and C are the width, height, and number of channels of the high-resolution face image.
Meanwhile, to enrich the texture information of the image, the feature information extracted from the reconstructed image by a fixed classification network (VGG) should be similar to that of the real image; the perceptual loss used to constrain the reconstructed image is defined as:
L_VGG = (1/M) Σ_{i=1}^{M} |φ(I_SR)(i) − φ(I_HR)(i)|
where φ denotes the fixed VGG feature extractor and M = H × W × C is the size of the chosen feature map.
The adversarial loss aims to make the reconstructed image and the real image as close as possible in distribution, and is defined as:
L_adv = log(1 − Net_D(Net_G(I_LR, var)))
where var represents the attribute information of the face.
The discriminator loss is the matching loss after adding the attribute variables, constrained using the positive-sample loss:
L_D = log(1 − Net_D(I_HR, var)) + log(Net_D(Net_G(I_LR, var), var))
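The generator and discriminator losses can be evaluated numerically as below. The discriminator outputs, the reconstruction/perceptual loss values, and the weights λ1, λ2, λ3 are all placeholder numbers, not values given by the patent.

```python
import math

# Hypothetical discriminator outputs (probabilities) for one sample
d_real = 0.9      # Net_D(I_HR, var)
d_fake = 0.2      # Net_D(Net_G(I_LR, var), var)
l_rec, l_vgg = 0.05, 0.10              # placeholder loss values

# Generator loss: weighted sum of reconstruction, perceptual, and
# adversarial terms (weights are illustrative only)
lam1, lam2, lam3 = 1.0, 0.1, 0.01
l_adv = math.log(1 - d_fake)
l_g = lam1 * l_rec + lam2 * l_vgg + lam3 * l_adv

# Discriminator matching loss with attribute variables
l_d = math.log(1 - d_real) + math.log(d_fake)
print(l_g, l_d)
```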
and setting a learning rate, reversely propagating the gradient by minimizing the loss function error, updating network parameters, and continuously iterating until the network is trained to be convergent.
In the back-propagation training, the batch size is set to 16 and the initial learning rate to 10^-4. During iterative training, according to the convergence of the network, on the model based on reconstruction loss the learning rate is halved when the total number of training iterations reaches {2×10^5, 4×10^5, 6×10^5, 8×10^5}; on the model based on adversarial loss it is halved when the total reaches {5×10^4, 1×10^5, 2×10^5, 3×10^5}. This example uses an ADAM optimizer for gradient back-propagation, with parameters β1 = 0.9, β2 = 0.999, and ε = 10^-8. The model based on reconstruction loss uses the L1 loss to compute the error between the network-generated high-resolution face image I_SR and the original high-resolution face image I_HR; the network parameters are updated by back-propagation to minimize this error, iterating until the network converges. The model based on adversarial loss uses the L1 loss to keep the reconstructed image as similar as possible to the real image in content, the perceptual loss to keep the image texture information as similar as possible to the real image, and the adversarial loss to bring the distributions of the reconstructed and real images as close as possible; the coefficients of these loss functions are set accordingly, the network parameters are updated by back-propagating to minimize the sum of these errors, and iteration continues until the network converges.
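The step-decay schedule for the reconstruction-loss model can be expressed as a small helper (the function name is illustrative; the milestones and initial rate are those stated above):

```python
def lr_at(iteration, base=1e-4, milestones=(2e5, 4e5, 6e5, 8e5)):
    """Learning rate at a given iteration: the base rate of 1e-4 is
    halved at each milestone the iteration count has reached."""
    halvings = sum(iteration >= m for m in milestones)
    return base / (2 ** halvings)

print(lr_at(0), lr_at(2 * 10**5), lr_at(9 * 10**5))
```

For the adversarial-loss model the same schedule applies with milestones {5×10^4, 1×10^5, 2×10^5, 3×10^5}.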
The scheme of the step C is specifically as follows:
and acquiring a pre-divided CelebA test data set, wherein the data set comprises various low-resolution face images and face attribute variables corresponding to the low-resolution face images.
The scheme of the step D is specifically as follows:
inputting the low-resolution face image of the CelebA test data set to be restored into a trained face image super-resolution model, performing the implementation scheme of the step B on the face image of the CelebA test data set through the face image super-resolution model, connecting the extracted structural features of the face image and the face attribute variables in series before a deep feature extraction module, and outputting the high-resolution face image through subsequent network processing.
In summary, the method, system, apparatus, and storage medium for enhancing the resolution of a visual target image provided by the embodiments of the present invention adopt training samples comprising high-resolution face image samples, low-resolution face image samples, and a corresponding preset number of face attribute samples, and perform resolution processing on the acquired low-resolution face image to be processed through a face image super-resolution model established based on a preset loss function. The low-resolution face image can thus be accurately and efficiently restored to a high-resolution face image, and a face image of higher definition can be obtained based on the prior provided by specific face attributes.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for resolution enhancement of an image of a visual target, comprising:
processing the low-resolution face image to be processed and the corresponding face attribute by adopting a pre-trained face image super-resolution model, and outputting a high-resolution face image; the training method of the face image super-resolution model comprises the following steps:
acquiring training samples, wherein the training samples comprise high-resolution face image samples, low-resolution face image samples and a preset number of face attribute samples corresponding to the high-resolution face image samples and the low-resolution face image samples;
and establishing a face image super-resolution model based on a preset loss function and the high-resolution face image sample according to the training sample.
2. The method of claim 1, wherein the acquiring training samples comprising high resolution face image samples, low resolution face image samples and a predetermined number of face attribute samples comprises:
collecting a high-resolution face image sample, and obtaining the high-resolution face image sample by adopting a face image data set CelebA and backing up the high-resolution face image sample;
adopting an image scaling algorithm to carry out down-sampling on the high-resolution face image sample to generate a low-resolution face image sample;
and selecting attributes which are beneficial to super-resolution of the face image in the face image data set CelebA as corresponding face attribute samples, and selecting a preset number of face attribute samples.
3. The method for resolution enhancement of visual target images according to claim 2, wherein said building a super-resolution model of facial images based on a pre-determined loss function and high resolution facial image samples from said training samples comprises:
acquiring low-resolution face image samples and a corresponding preset number of face attribute samples;
extracting shallow features of the low-resolution face image based on a convolutional neural network to generate low-resolution face image features;
combining with an automatic encoder, and acquiring the structural characteristics of the face image by carrying out operations of encoding compression and decoding reduction on the low-resolution face image characteristics;
extracting deep features from the serial connection of the structural features of the face images and the corresponding preset number of face attributes by adopting a double residual dense connection network;
pixel rearrangement upsampling is adopted to enlarge the characteristic dimension and reconstruct a high-resolution face image;
and reversely converging the reconstructed high-resolution face image and the backed-up high-resolution face image sample based on a preset loss function, and establishing a face image super-resolution model.
4. The method for resolution enhancement of visual target images according to claim 3, wherein said combining with an automatic encoder, obtaining structural features of the facial image by operations of encoding compression and decoding restoration on said low resolution facial image features, comprises:
the encoder downsamples using a maximum pooling operation, expands or compresses the number of characteristic channels using 2 convolutional layers with a convolutional kernel size of 3;
the decoder performs upsampling using deconvolution, expanding or compressing the number of feature channels using 2 convolutional layers of convolution kernel size 3.
5. The method of claim 3, wherein the extracting deep features from the concatenation of the structural features of the face image and the corresponding predetermined number of face attributes using a dual residual dense connectivity network comprises:
carrying out size transformation on feature maps input by a preset number of face attribute variables;
connecting the structural features of the face image and the face attribute variables in series in a channel dimension;
compressing the number of channels of the characteristic graphs connected in series through a convolution layer with the convolution kernel size of 3;
and extracting deep features from the features with the compressed channel number through a double residual dense connection network.
6. The method of enhancing the resolution of a visual target image according to claim 3, wherein said reconstructing a high resolution face image by upscaling the feature size using pixel rebinning upsampling comprises:
based on pixel rearrangement, the number of channels is expanded to 4 times of the original number by one layer of convolution operation, and then after rearrangement is carried out by utilizing specific positions of the channels, a result that the characteristic scale is expanded by 2 times and the number of the channels is reduced to be consistent with the input is obtained; in the implementation of the model, aiming at a multiplied by 4 amplification factor model, two pixel rearrangement upsampling modules are cascaded to obtain output characteristics amplified by 4 times;
and reconstructing a high-resolution face image by adopting 2 convolution layers with convolution kernel size of 3.
7. The method for enhancing the resolution of a visual target image according to claim 1, wherein the processing the low-resolution face image to be processed and the corresponding face attribute thereof by using the pre-trained super-resolution model of the face image and outputting the high-resolution face image comprises:
inputting the low-resolution face image into a face structural feature extraction model to obtain face structural features;
and connecting the human face structural features and the human face attribute variables in series, inputting the human face structural features and the human face attribute variables into a subsequent human face image super-resolution model, and outputting a high-resolution human face image.
8. A system for visual target image resolution enhancement, comprising:
the output module is used for processing the low-resolution face image to be processed and the face attribute corresponding to the low-resolution face image to be processed by adopting a pre-trained face image super-resolution model and outputting a high-resolution face image; the training module of the face image super-resolution model comprises:
the acquisition submodule is used for acquiring training samples, and the training samples comprise high-resolution face image samples, low-resolution face image samples and a preset number of face attribute samples corresponding to the high-resolution face image samples;
and the model establishing submodule is used for establishing a face image super-resolution model based on a preset loss function and a high-resolution face image sample according to the training sample.
9. An apparatus comprising a memory for storing at least one program and a processor for loading the at least one program to perform the method of any one of claims 1-7.
10. A storage medium storing a program executable by a processor, the program executable by the processor being configured to perform the method of any one of claims 1-7 when executed by the processor.
CN202110698882.3A 2021-06-23 2021-06-23 Method, system, device and storage medium for enhancing resolution of visual target image Pending CN113554058A (en)

Publications (1)

Publication Number Publication Date
CN113554058A true CN113554058A (en) 2021-10-26
