CN113240584B - Multitasking gesture picture super-resolution method based on picture edge information - Google Patents

Multitasking gesture picture super-resolution method based on picture edge information

Info

Publication number
CN113240584B
Authority
CN
China
Prior art keywords
resolution
super
picture
edge information
key point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110508733.6A
Other languages
Chinese (zh)
Other versions
CN113240584A (en)
Inventor
方昱春
冉启材
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Shanghai for Science and Technology filed Critical University of Shanghai for Science and Technology
Priority to CN202110508733.6A priority Critical patent/CN113240584B/en
Publication of CN113240584A publication Critical patent/CN113240584A/en
Application granted granted Critical
Publication of CN113240584B publication Critical patent/CN113240584B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053 Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4046 Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multitasking gesture picture super-resolution method based on picture edge information, which specifically comprises the following steps: inputting a plurality of groups of "low-resolution picture - skeleton key point" pairs into a super-resolution model; the low-resolution pictures respectively undergo a picture super-resolution task and an edge detection task through a super-resolution unit and an edge information unit; the features obtained by the super-resolution unit and the edge information unit are sent to a designed multi-scale feature fusion module for feature fusion at 3 different scales; the parameters of the model are iteratively updated through the designed edge loss function and content loss function until convergence; and the low-resolution picture to be super-resolved and the corresponding hand skeleton key point information are input into the model, obtaining the finally generated super-resolution picture through a single forward pass. The method is effective, generates super-resolution pictures that better conform to real scenes, and is highly extensible: the latest super-resolution networks and edge detection networks can be incorporated to improve the performance of the model.

Description

Multitasking gesture picture super-resolution method based on picture edge information
Technical Field
The invention relates to the field of computer vision, mainly relates to a super-resolution method of a single picture, in particular to a multi-task gesture picture super-resolution method based on picture edge information.
Background
The single-image super-resolution task is a typical inverse problem in computer vision, whose goal is to reconstruct a High Resolution (HR) image from a Low Resolution (LR) input image. Image super-resolution technology is widely applied in the real world, for example in medical image processing and in monitoring and security. Besides improving the quality of images and videos, it also serves as an upstream task for other high-level computer vision tasks (such as image segmentation, image recognition and action localization) to improve their performance.
The existing general super-resolution technology has two main problems. First, methods based on Convolutional Neural Networks (CNNs) typically use the Mean Square Error (MSE) as the objective function of the network; by computing only the pixel-wise distance between the generated picture and the ground-truth HR picture, MSE ignores a large amount of high-frequency picture information, so the structure of the finally generated picture is blurred. Second, methods based on Generative Adversarial Networks (GANs) retain the high-frequency information of the picture but often introduce distortion. These two types of problems severely limit the application of image super-resolution technology in real life. Therefore, finding a technique that reduces image deformation while retaining high-frequency image information is a problem to be solved in the current image super-resolution task.
Disclosure of Invention
The invention aims to provide a multitasking gesture picture super-resolution method based on picture edge information, so as to solve the problems in the prior art and make the method more suitable for processing gesture pictures.
In order to achieve the above object, the present invention provides the following solutions:
the invention provides a multitasking gesture picture super-resolution method based on picture edge information, which comprises the following steps:
information acquisition, namely acquiring a plurality of high-resolution gesture pictures, and preprocessing data of the high-resolution gesture pictures to acquire low-resolution gesture pictures and skeletal key point information of hands;
information processing, namely constructing a super-resolution model based on the low-resolution gesture picture, wherein the super-resolution model comprises an edge information unit and a super-resolution unit, and performing edge information detection on the low-resolution gesture picture based on the edge information unit to obtain edge features and a first edge information picture; the hand skeleton key point information is subjected to feature extraction by a convolution block and then merged into the super-resolution unit, and super-resolution processing is performed on the low-resolution gesture picture based on the super-resolution unit to obtain picture features;
determining a loss function, inputting the edge features and the picture features into a multi-scale feature fusion module, and carrying out feature fusion of three different scales to obtain a super-resolution picture; obtaining a second edge information graph based on a high-resolution picture, obtaining an edge loss value based on the first edge information graph and the second edge information graph, obtaining a content loss value based on the high-resolution picture and the super-resolution picture, and obtaining a loss function by carrying out weighted summation based on the edge loss value and the content loss value;
and training and utilizing the model, carrying out one-time back propagation based on the loss function, updating parameters of the super-resolution model, repeating iteration until the parameters are converged, finishing the training of the super-resolution model, inputting a low-resolution gesture picture needing super resolution and corresponding hand skeleton key point information into the super-resolution model, and obtaining a finally generated super-resolution gesture picture through one-time forward propagation, thereby finishing super-resolution of the low-resolution gesture picture.
Further, the data preprocessing method comprises the following steps: and 4 times of downsampling is carried out on the high-resolution gesture image through a bilinear interpolation algorithm, so that a low-resolution gesture image is obtained.
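The 4x bilinear downsampling can be sketched in plain Python as below; `bilinear_downsample` is an illustrative name, and a production pipeline would normally rely on a library routine such as `cv2.resize` with bilinear interpolation instead.

```python
# Minimal sketch of 4x bilinear downsampling on a single-channel image
# stored as nested lists; sampling positions follow the common
# pixel-center (align_corners=False) convention.

def bilinear_downsample(img, scale=4):
    h, w = len(img), len(img[0])
    oh, ow = h // scale, w // scale
    out = []
    for i in range(oh):
        row = []
        for j in range(ow):
            # Sample the source at the output pixel's center position.
            y = (i + 0.5) * scale - 0.5
            x = (j + 0.5) * scale - 0.5
            y0, x0 = int(y), int(x)
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            dy, dx = y - y0, x - x0
            val = (img[y0][x0] * (1 - dy) * (1 - dx)
                   + img[y0][x1] * (1 - dy) * dx
                   + img[y1][x0] * dy * (1 - dx)
                   + img[y1][x1] * dy * dx)
            row.append(val)
        out.append(row)
    return out

# A 512x512 HR picture becomes a 128x128 LR picture.
hr = [[float(r + c) for c in range(512)] for r in range(512)]
lr = bilinear_downsample(hr, 4)
```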
Further, the hand skeleton key point information includes, but is not limited to, skeleton key point coordinates, which are obtained by the following steps:
and acquiring skeleton key point coordinates of the high-resolution gesture picture and the low-resolution gesture picture by using an OpenPose tool, wherein positions where the skeleton key point coordinates cannot be acquired are represented by (-1, -1) coordinates.
Further, the above acquisition method is used to acquire 21 skeleton key point coordinates; the distance between each skeleton key point coordinate and each pixel coordinate on the corresponding low-resolution picture is calculated to obtain the heat map data corresponding to each skeleton key point coordinate, and the heat map data corresponding to the 21 skeleton key point coordinates are stacked along the picture depth to obtain the skeleton key point input data.
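A minimal sketch of building the 21 stacked heat maps: the document only states that pixel-to-keypoint distances are computed, so turning each distance into a Gaussian response (and the `sigma` value) is an assumed, illustrative choice; keypoints at (-1, -1) mark failed detections and yield all-zero maps.

```python
import math

def keypoint_heatmaps(keypoints, size=128, sigma=3.0):
    """Build one heat map per keypoint and stack them depth-wise."""
    maps = []
    for (kx, ky) in keypoints:
        if (kx, ky) == (-1, -1):           # point that could not be estimated
            maps.append([[0.0] * size for _ in range(size)])
            continue
        # Gaussian of the pixel-to-keypoint distance (assumed weighting).
        hm = [[math.exp(-((x - kx) ** 2 + (y - ky) ** 2) / (2 * sigma ** 2))
               for x in range(size)]
              for y in range(size)]
        maps.append(hm)
    return maps                            # shape: 21 x size x size

kps = [(64, 64)] + [(-1, -1)] * 20         # one detected joint, 20 missing
heat = keypoint_heatmaps(kps)
```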
Further, the first edge information map is an edge information map with the same number of pixels as the high-resolution picture.
Further, in the information processing, feature extraction is performed on the skeleton key point information using three 3×3 convolution blocks, and the features acquired by each convolution block are sent to the super-resolution unit to be fused with the picture features of each layer; the super-resolution unit adopts two-dimensional convolution with a 3×3 convolution kernel and comprises four 3×3 convolution blocks; the edge information unit comprises four layers, wherein the first layer adopts 3×3 convolution and the remaining layers adopt multi-scale residual blocks.
Further, the multi-scale residual block includes, but is not limited to: two convolution blocks with a convolution kernel size of 3, two convolution blocks with a convolution kernel size of 5, and a 1x1 dimension-reducing convolution layer.
Further, the super-resolution unit and the edge information unit also form a dynamic multi-task structure, which runs the super-resolution unit and the edge information unit synchronously, uses the edge features acquired by the edge information unit to assist and enhance the super-resolution unit, and adds the skeleton key point coordinates as auxiliary features.
Further, in the model construction, when the features of three different scales are fused, residual groups each containing 16 residual modules are first used to extract the features at the three different scales, and then the fusion is carried out.
Further, in the model construction, performing one back propagation specifically includes:
and carrying out gradient back propagation on the loss function by using an Adam optimizer, updating parameters of the super-resolution model until the loss of the final model is no longer changed, and completing model training.
The invention discloses the following technical effects:
the system generates the high-resolution picture corresponding to the low-resolution picture through one-time operation, has high efficiency, performs special processing on the gesture picture, is more suitable for processing the gesture picture, can bear preprocessing work of tasks such as sign language recognition and gesture recognition, and improves quality and efficiency of related tasks of the gesture. The method is more focused on the super-resolution task of the gesture picture, and a clearer hand super-resolution result is obtained.
According to the invention, a dynamic multitasking method is used for linking the edge information detection task and the super-resolution task, so that the model learns the contribution degree of the image edge information to the super-resolution task from the data, and the better image characteristic representation is obtained.
The super-resolution unit and the edge information unit are plug-and-play unit structures, which means that either can be replaced by any network that performs the same task with better performance, further improving the performance of the whole network.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a super-resolution model according to the present invention;
fig. 2 is a schematic structural diagram of a multi-scale feature fusion module.
Detailed Description
Various exemplary embodiments of the invention will now be described in detail, which should not be considered as limiting the invention, but rather as more detailed descriptions of certain aspects, features and embodiments of the invention.
It is to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. In addition, for numerical ranges in this disclosure, it is understood that each intermediate value between the upper and lower limits of the ranges is also specifically disclosed. Every smaller range between any stated value or stated range, and any other stated value or intermediate value within the stated range, is also encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included or excluded in the range.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although only preferred methods and materials are described herein, any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention. All documents mentioned in this specification are incorporated by reference for the purpose of disclosing and describing the methods and/or materials associated with the documents. In case of conflict with any incorporated document, the present specification will control.
It will be apparent to those skilled in the art that various modifications and variations can be made in the specific embodiments of the invention described herein without departing from the scope or spirit of the invention. Other embodiments will be apparent to those skilled in the art from consideration of the specification of the present invention. The specification and examples are exemplary only.
As used herein, the terms "comprising," "including," "having," "containing," and the like are intended to be inclusive and mean an inclusion, but not limited to.
The "parts" in the present invention are all parts by mass unless otherwise specified.
The invention relates to a gesture picture super-resolution method based on dynamic multitasking. A super-resolution task and an edge information detection task are constructed, the edge information is used to assist and enhance the super-resolution task, hand key point information is adopted to strengthen the model's attention around the hand, and the obtained features undergo feature fusion at different scales, thereby improving the performance of the model.
Example 1
According to the technical scheme, dynamic multitasking is adopted to construct a multi-task network that performs the super-resolution task and the edge detection task simultaneously; the features of the edge information unit assist in enhancing the features of the super-resolution unit, and hand skeleton key point information is added as an auxiliary feature so that the network's attention is focused on hand-related areas, finally generating a gesture super-resolution picture with sharper edges.
A multitasking gesture picture super-resolution method based on picture edge information comprises the following specific steps:
a. A plurality of low-resolution (LR) gesture pictures and the corresponding hand skeleton key point information are input into the network provided by the invention simultaneously. The gesture pictures undergo feature encoding, feature fusion and feature decoding in the super-resolution unit and the edge information unit; the hand skeleton key point information is fused into the super-resolution unit after each feature extraction through a convolution block; finally the super-resolution unit generates an image feature f_image and the edge information unit generates an edge feature f_edge. At the same time, a picture edge information map with the same size as the high-resolution picture is generated in the same step.
b. The two features generated in step a are sent together to the multi-scale feature fusion module, which finally generates a super-resolution output SR. The obtained SR and the real high-resolution picture HR are used to calculate the L1 loss, while the edge information map generated in step a and the real edge information map are used to calculate the edge loss. Both losses are then back-propagated once to update the network parameters. The iterative updating is repeated until the parameters in the network converge, obtaining the final model.
c. The low-resolution gesture picture to be super-resolved and the corresponding hand key point information (obtained through OpenPose) are input into the model obtained in step b, and the finally generated super-resolution gesture picture is obtained through a single forward propagation.
Preferably, the super-resolution unit in step a uses conventional 3x3 convolutions, i.e. two-dimensional convolutions with a 3x3 kernel. The edge information unit uses a 3x3 convolution in its first layer and multi-scale residual blocks in the remaining layers. The hand key point information is likewise processed with 3x3 convolution blocks for feature extraction.
The specific steps of the step a are as follows:
a-1. Data preprocessing: the existing high-resolution (HR) gesture picture data (picture size 512x512) is downsampled by a factor of 4 through a bilinear interpolation algorithm, obtaining the network's input LR picture data of size 128x128. Hand key point estimation is then performed on the HR and LR pictures using the OpenPose tool, where the positions of points that cannot be estimated are represented by (-1, -1). In total, 21 skeleton key points of the hand are obtained; heat map data for each key point is then obtained by calculating the distance between each skeleton key point coordinate and each pixel coordinate on the picture, and the skeleton key point input data is obtained by stacking the heat maps of the 21 key points along the picture depth. The overall structure of the network is shown in fig. 1.
a-2, model construction:
a-2-1. Hand key point features are extracted with three 3x3 convolution blocks, and the features after each convolution block are sent to the super-resolution unit to be fused with the picture features of each layer.
a-2-2. The super-resolution unit likewise performs feature extraction with four 3x3 convolution blocks; except for the last layer, the output of each layer is fused with the hand key point features, and the output of each layer is fused with the features obtained by the edge information unit through the dynamic multitasking method.
a-2-3. The first layer of the edge information unit is a 3x3 convolution block, and the following three layers are all multi-scale residual blocks with residual connections. Each multi-scale residual block consists of two convolution blocks with a convolution kernel size of 3, two convolution blocks with a convolution kernel size of 5, and a 1x1 dimension-reducing convolution layer.
The mathematical representation of process a-2 is:

f_K^i = Conv_K^i(f_K^{i-1}), i = 1, 2, 3    formula (1)
f_K^0 = I_K    formula (2)

where f_K^i represents the features obtained by the i-th convolution block; Conv_K^i represents the i-th convolution block of the hand key point information unit; and I_K represents the input hand key point information.

f_SR^1 = Conv_SR^1(I_LR)    formula (3)
f_SR^i = Conv_SR^i(f_SR^{i-1} + f_K^{i-1} + θ_i · f_E^{i-1}), i = 2, 3, 4    formula (4)
f_E^1 = Conv_E^1(I_LR)    formula (5)
f_E^i = MSRB_i(f_E^{i-1}), i = 2, 3, 4    formula (6)

where f_SR^i represents the features obtained by the i-th convolution block in the super-resolution unit; Conv_SR^i represents the i-th convolution block in the super-resolution unit; f_E^i represents the features obtained by the i-th layer of the edge information unit; Conv_E^1 represents the convolution block of the first layer of the edge information unit; MSRB_i(·) represents the i-th multi-scale residual block of the edge information unit; and θ_i represents the parameters of the dynamic multitasking.
The output feature of the fourth layer of the super-resolution unit is denoted f_image; at the same time, the outputs of the layers of the edge information unit are connected in a residual manner and then fused by a feature fusion block to obtain f_edge. The above process is mathematically expressed as:

f_image = f_SR^4    formula (7)
f_edge = Fusion(f_E^1 + f_E^2 + f_E^3 + f_E^4)    formula (8)

where Fusion(·) represents a feature fusion module consisting of a 1x1 convolution.
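Since the fusion module is a 1x1 convolution, it amounts to a per-pixel linear combination of the input channels. A minimal sketch with illustrative (not learned) weights:

```python
def conv1x1(feature_maps, weights, bias=0.0):
    """1x1 convolution over C input channels producing one output channel.

    feature_maps: list of C maps, each an HxW nested list.
    weights: C floats, one per input channel.
    """
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    return [[sum(wc * fm[y][x] for wc, fm in zip(weights, feature_maps)) + bias
             for x in range(w)]
            for y in range(h)]

# Fuse two 2x2 feature maps into a single output channel.
fused = conv1x1([[[1.0, 2.0], [3.0, 4.0]],
                 [[10.0, 20.0], [30.0, 40.0]]],
                weights=[0.5, 0.1])
```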
The specific steps of the step b are as follows:
b-1. The two features obtained from formulas (7) and (8) are sent to the multi-scale feature fusion module for feature fusion at three different scales; at each of the 3 scales the features are extracted by a Residual Group (RG) containing 16 residual modules, and each residual module contains two 3x3 convolutions. The multi-scale feature fusion module is shown in fig. 2.
b-2. In addition to generating f_edge, the feature fusion layer of the edge information unit in step a also generates an edge information map Edge' of the same size as the high-resolution (HR) picture. The edge loss, denoted l_E, is obtained through formula (9). Through the multi-scale feature fusion module, the network generates a Super-Resolution picture (SR). The L1 loss between the generated SR picture and the original HR picture is calculated according to formula (10) and denoted l_I. Based on the above, the final loss function of the model is:

l_E = ||Edge' - Edge||_1    formula (9)
l_I = ||I_SR - I_HR||_1    formula (10)
L_loss = l_I + λ · l_E    formula (11)

where I_SR represents the super-resolution picture generated by the model; I_HR represents the original high-resolution picture; λ is a hyper-parameter balancing the two losses, set to 0.5; and L_loss is the total loss function of the model.
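The weighted two-term loss above can be sketched directly; for brevity the inputs here are flattened pixel lists rather than full images.

```python
def l1(a, b):
    """Mean absolute (L1) difference between two equally sized pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def total_loss(sr, hr, edge_pred, edge_true, lam=0.5):
    l_i = l1(sr, hr)                 # content loss between SR and HR
    l_e = l1(edge_pred, edge_true)   # edge loss between Edge' and Edge
    return l_i + lam * l_e           # weighted sum with lambda = 0.5

loss = total_loss([0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.0, 0.0])
```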
Example 2
The invention relates to a gesture picture super-resolution method based on dynamic multitasking edge information and hand skeleton key point assistance. As shown in fig. 1, the specific implementation is as follows:
the network inputs a low resolution picture of RBG type, a picture size of 128x128, and three channels, so the input picture type is 3x128x128. The number of the hand key points corresponding to each picture is 21, the distance between each key point position and each pixel position of the picture is calculated to form a numpy distance matrix, and therefore the hand key point information corresponding to each picture is a numpy type three-dimensional matrix with the size of 21x128x 128.
Extracting features of the key point information of the hand by adopting 3 convolution blocks with the convolution kernel size of 3; 4 convolution blocks with the convolution kernel size of 3 are adopted to perform feature extraction on an input LR picture in a super-resolution unit; in the edge information unit, a convolution block with a convolution kernel size of 3 is adopted in the first layer, and then 3 multi-scale residual blocks are adopted, wherein each residual block comprises 2 convolutions with the convolution kernel size of 3 and two convolutions with the convolution kernel size of 5.
The outputs of the super-resolution unit and the edge information unit are up-sampled by a factor of 4 to obtain the features f_image and f_edge of size 512x512; at the same time, the edge information unit generates an edge information map Edge' of size 512x512. The edge loss is calculated by formula (9) from the obtained Edge' and the edge picture Edge corresponding to the existing high-resolution picture.
f_image and f_edge are fed simultaneously into the multi-scale feature fusion module, which fuses features at three different scales; features of the same scale are processed by the same number of residual group modules, and each residual group contains 16 residual blocks formed by 3x3 convolutions.
The multi-scale feature fusion module finally outputs the super-resolution picture SR, and the L1 loss of the picture is calculated through formula (10).
Multiple sets of "picture - hand key point information" pairs are fed into the network of the invention, and an Adam optimizer performs gradient back propagation on the final loss function to update the model parameters; when the loss of the model no longer changes, model training is considered complete.
The above embodiments are only illustrative of the preferred embodiments of the present invention and are not intended to limit the scope of the present invention, and various modifications and improvements made by those skilled in the art to the technical solutions of the present invention should fall within the protection scope defined by the claims of the present invention without departing from the design spirit of the present invention.

Claims (6)

1. A multitasking gesture picture super-resolution method based on picture edge information is characterized in that: the method comprises the following steps:
information acquisition, namely acquiring a plurality of high-resolution gesture pictures, and preprocessing data of the high-resolution gesture pictures to acquire low-resolution gesture pictures and skeletal key point information of hands;
information processing, namely constructing a super-resolution model based on the low-resolution gesture picture, wherein the super-resolution model comprises an edge information unit and a super-resolution unit, and performing edge information detection on the low-resolution gesture picture based on the edge information unit to obtain edge characteristics and a first edge information picture; extracting features of the hand skeleton key point information based on the convolution block to obtain hand skeleton key point features; performing super-resolution processing on the low-resolution gesture image based on the super-resolution unit to obtain image characteristics;
in the information processing, feature extraction is carried out on the hand skeleton key point information, and three are adopted
Figure QLYQS_1
The convolution blocks extract features, and the hand skeleton key point features acquired by each convolution block are fused with the output of each layer in the super-resolution unit;
the super-resolution unit adopts a fixed convolution kernel size and comprises four convolution blocks; the outputs of the first three layers of the super-resolution unit are respectively fused with the hand skeleton key point features;
the edge information unit comprises four layers, wherein the first layer adopts a convolution layer and the remaining layers adopt multi-scale residual blocks;
the multi-scale residual block includes, but is not limited to: two convolution blocks with a convolution kernel size of 3, two convolution blocks with a convolution kernel size of 5, and a dimension-reducing convolution layer with a convolution kernel size of 1×1;
the super-resolution unit and the edge information unit further form a dynamic multi-task structure, which runs the two units synchronously; the edge features acquired by the edge information unit assist and enhance the super-resolution unit, and the hand skeleton key point coordinates are added as auxiliary features;
in the model construction, when the three features of different scales are fused, 16 residual groups each containing residual modules first extract the three features of different scales, and fusion is then performed;
determining a loss function, namely inputting the edge features and the picture features into a multi-scale feature fusion module and performing feature fusion at three different scales to obtain a super-resolution picture; obtaining a second edge information graph from the high-resolution picture; computing an edge loss value from the first and second edge information graphs and a content loss value from the high-resolution picture and the super-resolution picture; and obtaining the loss function as a weighted sum of the edge loss value and the content loss value;
and training and using the model, namely performing one back propagation based on the loss function to update the parameters of the super-resolution model, and repeating the iteration until the parameters converge, thereby completing the training of the super-resolution model; then inputting a low-resolution gesture picture requiring super-resolution and the corresponding hand skeleton key point information into the super-resolution model, and obtaining the final super-resolution gesture picture through one forward propagation, thereby completing super-resolution of the low-resolution gesture picture.
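To make the multi-scale residual block of claim 1 concrete, a minimal numpy sketch follows. The exact wiring is an assumption: the two 3×3 convolutions and the two 5×5 convolutions are taken as two parallel branches whose outputs are concatenated along the channel axis and reduced by the 1×1 convolution, with a residual skip connection. The claim fixes only the constituent layers, and all weights here are placeholders.

```python
import numpy as np

def conv2d(x, w):
    """'Same' 2-D cross-correlation. x: (C_in, H, W), w: (C_out, C_in, k, k)."""
    c_out, c_in, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    # Windows of shape (C_in, H, W, k, k), contracted over C_in and the k x k patch.
    win = np.lib.stride_tricks.sliding_window_view(xp, (k, k), axis=(1, 2))
    return np.einsum('ihwkl,oikl->ohw', win, w)

def multi_scale_residual_block(x, w3a, w3b, w5a, w5b, w1):
    """Two stacked 3x3 convs and two stacked 5x5 convs in parallel branches,
    concatenated and reduced by a 1x1 conv, plus a residual skip."""
    relu = lambda t: np.maximum(t, 0.0)
    b3 = relu(conv2d(relu(conv2d(x, w3a)), w3b))   # 3x3 branch
    b5 = relu(conv2d(relu(conv2d(x, w5a)), w5b))   # 5x5 branch
    fused = np.concatenate([b3, b5], axis=0)       # stack along channels
    return conv2d(fused, w1) + x                   # 1x1 reduction + skip
```

With an input of shape (channels, height, width), the output keeps the same shape, which is what permits the residual addition.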
2. The picture edge information-based multitasking gesture picture super-resolution method of claim 1, characterized by: the data preprocessing method comprises: downsampling the high-resolution gesture picture by a factor of 4 through a bilinear interpolation algorithm to obtain the low-resolution gesture picture.
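The 4× bilinear downsampling of claim 2 can be sketched with a generic numpy resampler; the half-pixel-center coordinate convention used below is an assumption, since interpolation libraries differ on it.

```python
import numpy as np

def bilinear_resize(img, out_h, out_w):
    """Bilinear resampling of an (H, W) or (H, W, C) image,
    sampling at half-pixel centers (align_corners=False convention)."""
    h, w = img.shape[:2]
    ys = (np.arange(out_h) + 0.5) * h / out_h - 0.5
    xs = (np.arange(out_w) + 0.5) * w / out_w - 0.5
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 1)
    y1 = np.clip(y0 + 1, 0, h - 1)
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 1)
    x1 = np.clip(x0 + 1, 0, w - 1)
    wy = np.clip(ys - y0, 0.0, 1.0)[:, None]
    wx = np.clip(xs - x0, 0.0, 1.0)[None, :]
    if img.ndim == 3:          # broadcast the weights over the channel axis
        wy = wy[..., None]
        wx = wx[..., None]
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy
```

Downsampling a 64×64 high-resolution picture by a factor of 4 then corresponds to `bilinear_resize(img, 16, 16)`.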
3. The picture edge information-based multitasking gesture picture super-resolution method of claim 1, characterized by: the hand skeletal key point information includes, but is not limited to, skeleton key point coordinates, which are obtained as follows:
acquiring the skeleton key point coordinates of the high-resolution gesture picture and the low-resolution gesture picture with the OpenPose tool, wherein positions at which skeleton key point coordinates cannot be acquired are represented by the coordinates (-1, -1).
4. A multitasking gesture picture super-resolution method based on picture edge information as claimed in claim 3, characterized by: acquiring 21 skeleton key point coordinates by the acquisition method; calculating, for each skeleton key point coordinate, the distance to every pixel coordinate on the corresponding low-resolution picture to obtain heat map data for that key point; and stacking the heat maps of the 21 skeleton key point coordinates along the picture depth to obtain the input data of the skeleton key point coordinates.
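Claim 4's per-key-point heat maps can be sketched minimally in numpy. The claim specifies computing a distance from each key point to every pixel and stacking the 21 maps along the depth axis; the Gaussian mapping from distance to heat value and the `sigma` parameter are assumptions, as is representing an undetected (-1, -1) key point by an all-zero map.

```python
import numpy as np

def keypoint_heatmaps(keypoints, h, w, sigma=1.5):
    """One distance-based heat map per key point, stacked along depth.
    keypoints: (21, 2) array of (x, y); (-1, -1) marks an undetected point."""
    ys, xs = np.mgrid[0:h, 0:w]
    maps = np.zeros((h, w, len(keypoints)))
    for i, (kx, ky) in enumerate(keypoints):
        if kx < 0 or ky < 0:                       # detection failed: all-zero map
            continue
        d2 = (xs - kx) ** 2 + (ys - ky) ** 2       # squared distance to every pixel
        maps[..., i] = np.exp(-d2 / (2 * sigma ** 2))  # Gaussian of the distance
    return maps
```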
5. The picture edge information-based multitasking gesture picture super-resolution method of claim 1, characterized by: the first edge information graph is an edge information graph with the same pixel number as that of the high-resolution picture.
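The claims leave unspecified which operator produces an edge information graph from a picture; one common choice is the Sobel gradient magnitude, sketched here in numpy (the 3×3 Sobel kernels and zero padding are assumptions, not taken from the patent).

```python
import numpy as np

def sobel_edges(img):
    """Gradient-magnitude edge map of a grayscale (H, W) image via Sobel filters."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)  # horizontal gradient
    ky = kx.T                                                   # vertical gradient
    p = np.pad(img, 1)
    win = np.lib.stride_tricks.sliding_window_view(p, (3, 3))
    gx = np.einsum('hwkl,kl->hw', win, kx)
    gy = np.einsum('hwkl,kl->hw', win, ky)
    return np.hypot(gx, gy)   # per-pixel gradient magnitude, same size as img
```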
6. The picture edge information-based multitasking gesture picture super-resolution method of claim 1, characterized by: in the model construction, performing one back propagation specifically comprises:
carrying out gradient back propagation on the loss function with an Adam optimizer and updating the parameters of the super-resolution model until the model loss no longer changes, thereby completing model training.
CN202110508733.6A 2021-05-11 2021-05-11 Multitasking gesture picture super-resolution method based on picture edge information Active CN113240584B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110508733.6A CN113240584B (en) 2021-05-11 2021-05-11 Multitasking gesture picture super-resolution method based on picture edge information


Publications (2)

Publication Number Publication Date
CN113240584A CN113240584A (en) 2021-08-10
CN113240584B true CN113240584B (en) 2023-04-28

Family

ID=77131340


Country Status (1)

Country Link
CN (1) CN113240584B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115083016A (en) * 2022-06-09 2022-09-20 广州紫为云科技有限公司 Monocular camera-based small-target-oriented hand space interaction method and device
CN117037221B (en) * 2023-10-08 2023-12-29 腾讯科技(深圳)有限公司 Living body detection method, living body detection device, computer equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107610140A (en) * 2017-08-07 2018-01-19 中国科学院自动化研究所 Near edge detection method, device based on depth integration corrective networks
CN111062872A (en) * 2019-12-17 2020-04-24 暨南大学 Image super-resolution reconstruction method and system based on edge detection
CN112767427A (en) * 2021-01-19 2021-05-07 西安邮电大学 Low-resolution image recognition algorithm for compensating edge information

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI624804B (en) * 2016-11-07 2018-05-21 盾心科技股份有限公司 A method and system for providing high resolution image through super-resolution reconstruction
CN109345449B (en) * 2018-07-17 2020-11-10 西安交通大学 Image super-resolution and non-uniform blur removing method based on fusion network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107610140A (en) * 2017-08-07 2018-01-19 中国科学院自动化研究所 Near edge detection method, device based on depth integration corrective networks
CN111062872A (en) * 2019-12-17 2020-04-24 暨南大学 Image super-resolution reconstruction method and system based on edge detection
CN112767427A (en) * 2021-01-19 2021-05-07 西安邮电大学 Low-resolution image recognition algorithm for compensating edge information

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
Multi-Stage Feature Fusion Network for Video Super-Resolution; Huihui Song et al.; IEEE Transactions on Image Processing; 2021-02-09; vol. 30 *
A super-resolution reconstruction method based on edge-adaptive interpolation; Li Yifan; Fujian Computer; 2010-11-25 (No. 11) *
Joint implementation of image fusion and super-resolution via convolutional sparse representation; Yang Moyuan et al.; Optical Technique; 2020-03-15 (No. 02) *
Edge-preserving image interpolation algorithm based on dyadic wavelet transform; Ma Shexiang et al.; Journal of Optoelectronics·Laser; 2005-07-15 (No. 07) *
Image super-resolution reconstruction based on a multi-scale feature mapping network; Duan Ran et al.; Journal of Zhejiang University (Engineering Science); 2019-07-31; vol. 53 (No. 07) *
Super-resolution reconstruction method based on edge detection; Cai Qiurong et al.; Computer Engineering; 2011-06-05 (No. 11) *
Edge-corrected multi-scale convolutional neural network reconstruction algorithm; Cheng Deqiang et al.; Laser & Optoelectronics Progress; 2018-03-28 (No. 09) *

Also Published As

Publication number Publication date
CN113240584A (en) 2021-08-10

Similar Documents

Publication Publication Date Title
CN113240584B (en) Multitasking gesture picture super-resolution method based on picture edge information
CN111626927B (en) Binocular image super-resolution method, system and device adopting parallax constraint
Liu et al. Effective image super resolution via hierarchical convolutional neural network
CN114882524A (en) Monocular three-dimensional gesture estimation method based on full convolution neural network
CN112184547A (en) Super-resolution method of infrared image and computer readable storage medium
CN117788296A (en) Infrared remote sensing image super-resolution reconstruction method based on heterogeneous combined depth network
CN117612204A (en) Construction method and system of three-dimensional hand gesture estimator
CN111311732B (en) 3D human body grid acquisition method and device
Zoetgnande et al. Edge focused super-resolution of thermal images
Wang et al. Super-resolving face image by facial parsing information
CN117078518A (en) Three-dimensional point cloud superdivision method based on multi-mode iterative fusion
Sun et al. A rapid and accurate infrared image super-resolution method based on zoom mechanism
CN115578260B (en) Attention method and system for directional decoupling of image super-resolution
Bai et al. Restoration of turbulence-degraded images based on deep convolutional network
CN112598581B (en) Training method and image generation method of RDN super-resolution network
Li et al. V-ShadowGAN: generative adversarial networks for removing and generating shadows associated with vehicles based on unpaired data
Song et al. Spatial-aware dynamic lightweight self-supervised monocular depth estimation
Wen et al. Mrft: Multiscale recurrent fusion transformer based prior knowledge for bit-depth enhancement
Liu et al. Remote sensing image super-resolution via dilated convolution network with gradient prior
CN110706167A (en) Fine completion processing method and device for remote sensing image to-be-repaired area
CN115272083B (en) Image super-resolution method, device, equipment and medium
Shi et al. Dual-Branch Multiscale Channel Fusion Unfolding Network for Optical Remote Sensing Image Super-Resolution
Xu et al. Color Guided Depth Map Super-Resolution with Nonlocal Autoregressive Modeling
Wang et al. Unbiased feature position alignment for human pose estimation
CN117496091B (en) Single-view three-dimensional reconstruction method based on local texture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant