WO2020048484A1 - Super-resolution image reconstruction method and apparatus, and terminal and storage medium


Info

Publication number: WO2020048484A1
Application number: PCT/CN2019/104388
Authority: WIPO (PCT)
Other languages: French (fr), Chinese (zh)
Inventors: 方璐, 戴琼海, 李广涵
Original assignee: 清华-伯克利深圳学院筹备办公室 (Tsinghua-Berkeley Shenzhen Institute Preparatory Office)
Priority date: 2018-09-04 (priority claimed from Chinese application 201811027057.5)
Prior art keywords: image, dimensional, dimensional model, target object, target

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00 3D [Three Dimensional] image rendering
    • G06T 15/005 General purpose rendering architectures
    • G06T 15/10 Geometric effects
    • G06T 15/20 Perspective computation
    • G06T 15/205 Image-based rendering
    • G06T 3/00 Geometric image transformation in the plane of the image
    • G06T 3/40 Scaling the whole image or part thereof
    • G06T 3/4053 Super resolution, i.e. output image resolution higher than sensor resolution
    • G06T 2215/00 Indexing scheme for image rendering
    • G06T 2215/06 Curved planar reformation of 3D line structures

Definitions

  • The embodiments of the present application relate to the field of computer vision technology, for example, to a super-resolution image reconstruction method, apparatus, terminal, and storage medium.
  • The accuracy of computer vision algorithms depends on the imaging quality of the input image or video; therefore, the resolution of the input image or video needs to be improved.
  • The scene corresponding to an image or video usually contains a static part and a dynamic part, and the dynamic part in turn contains rigidly deforming objects and non-rigidly deforming objects.
  • Because the shape and pose of a rigid object do not change over time, any single high-definition frame can be used directly to improve its resolution. A non-rigid object, whose shape and pose do change over time, cannot be enhanced from an arbitrary high-definition frame. The difficulty in improving the accuracy of computer vision algorithms therefore lies in improving the resolution of non-rigidly deforming objects.
  • In the related art there are two main methods for improving the resolution of a specific target object (i.e., super-resolution reconstruction): single-image super-resolution algorithms and reference-image-based super-resolution algorithms.
  • When the input image is not similar to the training set, a single-image super-resolution algorithm cannot achieve good super-resolution reconstruction of low-resolution inputs with severe loss of detail; moreover, all high-frequency details generated by this method are synthesized from low-frequency information, so their fidelity is low.
  • A reference-image-based super-resolution algorithm requires the depth map of a high-definition image as input. Although it can complement high-frequency details better, high-definition depth images are difficult to obtain in practice, so the algorithm is not widely applicable.
  • The present application provides a super-resolution image reconstruction method, apparatus, terminal, and storage medium to improve the resolution of non-rigid target objects in a low-resolution global image sequence.
  • In a first aspect, an embodiment of the present application provides a super-resolution image reconstruction method.
  • The method includes: acquiring a first image of a target region within a current region at a first moment, and generating, from the first image, at least one three-dimensional model corresponding to at least one first target object in the target region, where the first target object is a first non-rigid target object; acquiring a second image of the current region at a second moment after the first moment, extracting a third image corresponding to the target region from the second image, and updating the at least one three-dimensional model based on the third image; and mapping the updated at least one three-dimensional model into at least one two-dimensional image and stitching the at least one two-dimensional image into the second image to obtain a target super-resolution image.
  • In a second aspect, an embodiment of the present application further provides a super-resolution image reconstruction apparatus. The apparatus includes: a three-dimensional model generation module, configured to acquire a first image of a target region within a current region at a first moment and generate, from the first image, at least one three-dimensional model corresponding to at least one first target object in the target region, where the first target object is a first non-rigid target object; and a three-dimensional model update module, configured to acquire a second image of the current region at a second moment after the first moment, extract a third image corresponding to the target region from the second image, and update the at least one three-dimensional model based on the third image.
  • The apparatus further includes a super-resolution image acquisition module, configured to map the updated at least one three-dimensional model into at least one two-dimensional image and stitch the at least one two-dimensional image into the second image to obtain a target super-resolution image.
  • In a third aspect, an embodiment of the present application further provides a super-resolution image reconstruction terminal.
  • The terminal includes: one or more processors; and a storage device configured to store one or more programs.
  • When the one or more programs are executed by the one or more processors, the one or more processors implement the super-resolution image reconstruction method according to any embodiment of the present application.
  • In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored.
  • When the program is executed by a processor, the super-resolution image reconstruction method according to any embodiment of the present application is implemented.
  • FIG. 1 is a flowchart of a super-resolution image reconstruction method according to an embodiment of the present application
  • FIG. 2 is a schematic structural diagram of a super-resolution image reconstruction device according to an embodiment of the present application
  • FIG. 3 is a schematic structural diagram of a super-resolution image reconstruction terminal in another embodiment of the present application.
  • FIG. 1 is a flowchart of a super-resolution image reconstruction method according to an embodiment of the present application. This embodiment is applicable to a case where the resolution of a non-rigid target object in a low-resolution global image sequence needs to be improved.
  • The method may be performed by a super-resolution image reconstruction apparatus. As shown in FIG. 1, the method in this embodiment includes steps S110 to S130.
  • In step S110, a first image of a target region within the current region at a first moment is acquired, and at least one three-dimensional model corresponding to at least one first target object in the target region is generated from the first image, where the first target object is a first non-rigid target object.
  • In this embodiment, the target region may be a region containing at least one first target object.
  • The first target object may be a first non-rigid target object, that is, an object whose shape and pose can change over time; for example, a non-rigid target object may be a pedestrian.
  • The first image acquired at the first moment is a local image corresponding to the target region of the current region. It can be acquired by a camera with a relatively small field of view, and correspondingly its sharpness is relatively high.
  • At least one two-dimensional image corresponding to the at least one first target object may be extracted from the first image. Based on the correspondence between two-dimensional images and three-dimensional models, the at least one two-dimensional image may be used to generate at least one three-dimensional model corresponding to the at least one first target object.
  • In step S120, a second image of the current region at a second moment after the first moment is acquired, a third image corresponding to the target region is extracted from the second image, and the at least one three-dimensional model is updated based on the third image.
  • The second image, acquired at a second moment after the first moment, is a global image corresponding to the current region. It can be acquired by a camera with a relatively large field of view (compared with the camera that acquired the first image); correspondingly, the sharpness of the second image is relatively low.
  • The resolution of the camera that acquires the first image is the same as that of the camera that acquires the second image; that is, the sizes of the first image and the second image are the same.
  • Since the second image is acquired after the first image, the shape and pose of the first target object in the second image will have changed relative to the first image, so the second image can be used to update the at least one three-dimensional model obtained from the first image.
  • A third image corresponding to the target region may be extracted from the second image, and the at least one three-dimensional model may be updated with at least one two-dimensional image, corresponding to the at least one first target object, extracted from the third image.
  • In step S130, the updated at least one three-dimensional model is mapped into at least one two-dimensional image, and the at least one two-dimensional image is stitched into the second image to obtain the target super-resolution image.
  • Super-resolution means increasing the resolution of an original image by means of hardware or software; a super-resolution image is the image obtained after the resolution has been increased.
  • There is a mapping relationship between a three-dimensional model and a two-dimensional image. After the updated at least one three-dimensional model is obtained, this mapping relationship can be used to map the at least one three-dimensional model into at least one two-dimensional image.
  • The sharpness of the at least one two-dimensional image obtained by the mapping is comparable to that of the first image and higher than that of the second image.
  • The at least one two-dimensional image obtained by the mapping is stitched into the second image by an image stitching method, so that the high-definition two-dimensional images replace the corresponding low-resolution portions of the second image, finally yielding the target super-resolution image.
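As a sketch of this replacement step (a hypothetical helper assuming axis-aligned integer box coordinates, not the patent's actual implementation):

```python
# A minimal sketch of the stitching step: the high-resolution rendering of a
# target replaces the corresponding low-resolution patch of the second image.
# Assumes `patch` is at least as large as the box it fills.
import numpy as np

def stitch_patch(global_image, patch, box):
    """Paste `patch` into a copy of `global_image` at box = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    out = global_image.copy()
    out[y1:y2, x1:x2] = patch[: y2 - y1, : x2 - x1]
    return out
```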
  • For the rest of the second image, the corresponding scene consists mainly of the static scene and rigidly deforming objects.
  • Rigid objects may move over time, but since their shape and pose do not change, the portions of the first image corresponding to the static scene and the rigid objects can be stitched directly into the corresponding positions of the second image to increase its overall resolution.
  • The at least one three-dimensional model generated from the first image includes both the shape and pose information and the texture information of the at least one first target object.
  • The system configured to acquire the first image and the second image may be a rotatable high-definition monitoring PTZ system.
  • The system may include three parts: a first-scale camera, a second-scale camera, and a rotatable gimbal.
  • The first-scale camera is mounted on the rotatable gimbal and rotates with it.
  • The first-scale camera may be a small-field-of-view camera configured to acquire first images of the target region within the current region; the second-scale camera may be a large-field-of-view camera configured to monitor the current region in real time and continuously acquire second images of the current region.
  • The resolution of the first-scale camera may be the same as that of the second-scale camera, and the size of the first image may be the same as that of the second image; accordingly, the sharpness of the first image is higher than that of the second image. Based on this, the first image acquired at the first moment can be used, following the scheme above, to improve the resolution of the second image acquired at the second moment.
  • In summary, the super-resolution image reconstruction method acquires a first image of a target region within a current region at a first moment and generates, from the first image, at least one three-dimensional model corresponding to at least one first target object in the target region, where the first target object is a first non-rigid target object; acquires a second image of the current region at a second moment after the first moment; extracts a third image corresponding to the target region from the second image and updates the at least one three-dimensional model based on the third image; and maps the updated at least one three-dimensional model into at least one two-dimensional image and stitches it into the second image to obtain the target super-resolution image. This improves the resolution of non-rigid target objects in a low-resolution global image sequence.
  • In an embodiment, generating at least one three-dimensional model corresponding to at least one first target object in the target region from the first image includes: performing target object detection on the first image using a preset target object detection method to obtain at least one first partial image corresponding one-to-one to the at least one first target object; performing two-dimensional pose point estimation on each first partial image using a preset two-dimensional pose point estimation method to obtain the first two-dimensional pose points corresponding to each first target object; for each first target object, optimizing an initial three-dimensional model using its first two-dimensional pose points to obtain a three-dimensional model corresponding to that first target object; and, for each three-dimensional model, rendering the model with the texture information in the corresponding first partial image to update the model.
  • First, at least one two-dimensional image corresponding to the at least one first target object may be acquired.
  • In an embodiment, the first target objects in the first image may be detected using a preset target object detection method to obtain at least one first partial image corresponding one-to-one to the at least one first target object.
  • One first target object corresponds to one first partial image, and each first partial image can be marked in the first image with a square region.
  • The preset target object detection method may be the Faster R-CNN detection algorithm, which has high detection accuracy and fast operation speed.
  • The Faster R-CNN detection algorithm uses deep learning and introduces the RPN (region proposal network) structure.
  • The convolutional neural network outputs two branches: one branch gives the parameters of each candidate region (the center coordinates x and y, and the width w and height h of the region); the other branch gives the probability that the candidate region contains a first target object. Based on these two branches, the position of the at least one first target object in the first image, and hence the position of the at least one first partial image, can be determined.
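As an illustrative sketch (not the patent's implementation), the detection step can be realized with a pretrained Faster R-CNN such as the one shipped in torchvision; here the COCO "person" class stands in for the non-rigid first target object:

```python
# A minimal detection sketch using torchvision's pretrained Faster R-CNN.
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def detect_people(image, score_thresh=0.8):
    """Return [x1, y1, x2, y2] boxes for likely people in an HxWx3 uint8 image."""
    tensor = torch.from_numpy(image).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        out = model([tensor])[0]
    # COCO label 1 is "person"; each kept box is one "first partial image".
    keep = (out["labels"] == 1) & (out["scores"] > score_thresh)
    return out["boxes"][keep].cpu().numpy()
```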
  • Next, the pose information of each first target object may be determined using each first partial image.
  • In an embodiment, a preset two-dimensional pose point estimation method may be used to perform two-dimensional pose point estimation on each first partial image separately, obtaining the first two-dimensional pose points corresponding to each first target object.
  • The preset two-dimensional pose point estimation method may be OpenPose, which uses deep learning to predict, for each first partial image, the two-dimensional pose points of all first target objects in that image, then groups the pose points according to the characteristics of each first target object, and finally determines the two-dimensional pose points belonging to each first target object.
  • The first two-dimensional pose points obtained by applying OpenPose to each first partial image are the first two-dimensional pose points corresponding to each first target object.
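OpenPose itself is a separate library; as a hedged stand-in, the following sketch uses torchvision's Keypoint R-CNN (17 COCO keypoints rather than OpenPose's keypoint set) to illustrate the interface described above:

```python
# A hedged sketch of 2D pose point estimation; Keypoint R-CNN stands in for
# OpenPose here, since OpenPose is not packaged in torchvision.
import torch
import torchvision

pose_model = torchvision.models.detection.keypointrcnn_resnet50_fpn(weights="DEFAULT")
pose_model.eval()

def estimate_pose_points(crop):
    """Return an (N, 17, 2) array of 2D pose points for people in a crop."""
    tensor = torch.from_numpy(crop).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        out = pose_model([tensor])[0]
    return out["keypoints"][..., :2].cpu().numpy()  # drop visibility column
```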
  • In an embodiment, an initial three-dimensional model can be constructed using initialization parameters.
  • Then, for each first target object, the initial three-dimensional model is optimized separately using its first two-dimensional pose points to obtain the three-dimensional model corresponding to that first target object.
  • The three-dimensional model obtained by the above method does not yet include the texture information of the first target object, so a two-dimensional image obtained by mapping this model would not include color information.
  • Therefore, each three-dimensional model can be rendered using the texture information in the corresponding first partial image to update the model, so that the updated three-dimensional model contains both the shape and pose information of the first target object and its color information.
  • In an embodiment, the initial three-dimensional model is optimized using each first two-dimensional pose point to obtain a three-dimensional model corresponding to the first target object.
  • The procedure is as follows. An initial three-dimensional model is constructed from the initial shape factor matrix β and the initial attitude angle vector θ.
  • The initial camera model parameter matrix K is used to map the initial three-dimensional model into two dimensions, obtaining the initial two-dimensional pose points corresponding to the initial three-dimensional model.
  • For each first target object, a shape factor matrix β1 and a first attitude angle vector θ1 that satisfy a preset condition are calculated, where the preset condition is that the sum of the differences between each pair of matching first and initial two-dimensional pose points is minimal and the shape factor matrix β1 is minimal.
  • The initial three-dimensional model is then optimized using the shape factor matrix β1 and the first attitude angle vector θ1 to obtain the three-dimensional model corresponding to the first target object.
  • A three-dimensional model consists of a dense point cloud in three-dimensional space.
  • In an embodiment, the initial three-dimensional model may be constructed based on a preset three-dimensional model construction method, an initial shape factor matrix β, and an initial attitude angle vector θ.
  • In an embodiment, the SMPLify algorithm may be used to construct the initial three-dimensional model. Taking a human body as the first target object, SMPLify constructs the three-dimensional model from the SMPL human body model, a shape factor matrix β, and an attitude angle vector θ.
  • The SMPL human body model includes 6890 three-dimensional points and 24 three-dimensional joint points; the 24 joint points control the positions of the entire three-dimensional point cloud and hence the pose of the model.
  • The shape factor matrix β controls characteristics of the model such as height and build, while the attitude angle vector θ controls the rotation angle of each joint's position in the model.
  • Each of the 6890 three-dimensional points in the model can be represented by a linearly weighted combination over the 24 attitude angle vectors.
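The linearly weighted combination in the last bullet is the standard linear blend skinning relation used by SMPL; as a sketch in SMPL's usual notation (the blend weights w_ik and joint transforms G_k are standard SMPL quantities, not symbols defined in the patent):

```latex
% Posed vertex i as a weighted combination of the 24 joint transforms:
% w_{ik} are per-vertex blend weights, G_k(\theta, J) is the world transform
% of joint k given pose \theta and joint locations J.
v_i' = \sum_{k=1}^{24} w_{ik} \, G_k(\theta, J) \, v_i
```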
  • The initial camera model parameter matrix K can be used to map the 24 three-dimensional joint points of the initial three-dimensional model into two dimensions, obtaining each initial two-dimensional pose point corresponding to the initial three-dimensional model.
  • the shape factor matrix ⁇ 1 and The initial attitude angle vector ⁇ 1 is optimized by using the shape factor matrix ⁇ 1 and the initial attitude angle vector ⁇ 1 to obtain a three-dimensional model corresponding to the first target object.
  • a shape factor matrix ⁇ 1 and a first attitude angle vector ⁇ 1 that satisfy a preset condition may be calculated, where the preset conditions are each first two-dimensional pose point and each The sum of the differences between the pair of matching points of the initial two-dimensional pose points is the smallest, and the shape factor matrix ⁇ 1 is the smallest; the shape factor matrix ⁇ 1 and the first attitude angle vector ⁇ 1 are used to optimize the initial three-dimensional model to obtain A three-dimensional model corresponding to the first target object.
  • In an embodiment, the method further includes: calculating a camera model parameter matrix K1 that satisfies the preset condition.
  • For each three-dimensional model, rendering the model with the texture information in the corresponding first partial image to update the model includes: using the camera model parameter matrix K1 to map the texture information in the corresponding first partial image onto the three-dimensional model to update the model.
  • The camera model parameter matrix may also be used to render texture information onto the three-dimensional models.
  • A camera model parameter matrix K1 satisfying the preset condition is calculated, where the preset condition is that the sum of the differences between each pair of matching first and initial two-dimensional pose points is minimal and the shape factor matrix β1 is minimal.
  • The texture information in the corresponding first partial image is then mapped onto the three-dimensional model using the camera model parameter matrix K1 in order to update the model.
  • In an embodiment, the method further includes: interpolating the texture information mapped onto the three-dimensional model using a preset interpolation algorithm to obtain the texture information of the complete three-dimensional model.
  • The first partial image provides the texture information for the three-dimensional model. Because the first partial image is two-dimensional, when its texture information is mapped onto the three-dimensional model there are necessarily some three-dimensional points for which no texture can be obtained, and some of these points lie within the camera's field of view. When the three-dimensional model is later mapped to a two-dimensional image, only the points within the field of view are needed. Therefore, texture interpolation can be performed on the points that are within the field of view but lack texture, so that complete texture information is available when the model is mapped to a two-dimensional image. In an embodiment, a bilinear interpolation algorithm may be used to interpolate the mapped texture information and obtain the complete texture of the three-dimensional model.
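As an illustrative sketch of the bilinear interpolation (the patent does not specify whether interpolation is done in image space or over the mesh; this assumes image-space sampling at a vertex's projected sub-pixel coordinate):

```python
# Bilinear texture sampling: given a vertex projected to a non-integer pixel
# coordinate (u, v), blend the four surrounding pixels of the source image.
import numpy as np

def bilinear_sample(image, u, v):
    """Sample an HxWx3 image at fractional pixel coordinates (u, v)."""
    h, w = image.shape[:2]
    x0 = int(np.clip(np.floor(u), 0, w - 1))
    y0 = int(np.clip(np.floor(v), 0, h - 1))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = u - x0, v - y0
    top = (1 - ax) * image[y0, x0] + ax * image[y0, x1]
    bottom = (1 - ax) * image[y1, x0] + ax * image[y1, x1]
    return (1 - ay) * top + ay * bottom
```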
  • In an embodiment, updating the at least one three-dimensional model based on the third image includes: performing target object detection on the third image using the preset target object detection method to obtain at least one second partial image corresponding one-to-one to at least one second target object in the target region, where the second target object is a second non-rigid target object; matching each first partial image with each second partial image to obtain matching pairs of first and second partial images, so as to determine the three-dimensional model corresponding to the second target object in each second partial image; performing two-dimensional pose point estimation on each second partial image using the preset two-dimensional pose point estimation method to obtain the second two-dimensional pose points corresponding to each second target object; and, for each second target object, updating the three-dimensional model corresponding to that second target object using its second two-dimensional pose points.
  • The second target object may be the first target object after its shape and pose have changed.
  • The method for obtaining the second partial images is the same as that for the first partial images and also uses the Faster R-CNN detection algorithm. After the at least one second partial image is obtained, an image matching algorithm is used to match each first partial image with each second partial image, yielding the second partial image that matches each first partial image. Since each first partial image corresponds to a three-dimensional model, the three-dimensional model corresponding to the second target object in each second partial image can be determined from the matching first partial image.
  • Each three-dimensional model associated with a second partial image in the foregoing step was determined from a first partial image; therefore, its pose information still corresponds to the pose of the first target object in that first partial image.
  • In an embodiment, the pose information of each three-dimensional model may therefore be updated using the pose information of each second target object.
  • A preset two-dimensional pose point estimation method may be used to perform two-dimensional pose point estimation on each second partial image separately, obtaining the second two-dimensional pose points corresponding to each second target object; then, for each second target object, its second two-dimensional pose points are used to update the corresponding three-dimensional model.
  • The preset two-dimensional pose point estimation method may be OpenPose; the process of obtaining the second two-dimensional pose points with OpenPose is the same as that of obtaining the first two-dimensional pose points.
  • In an embodiment, updating the three-dimensional model corresponding to the second target object using each second two-dimensional pose point includes, for each second target object: converting the second two-dimensional pose points into a second attitude angle vector θ2 using a preset deep learning algorithm; and updating the three-dimensional model corresponding to the second target object using the shape factor matrix β1 and the second attitude angle vector θ2 to obtain the updated three-dimensional model corresponding to the second target object.
  • Since the three-dimensional model corresponding to a first partial image was obtained by optimizing the shape factor matrix and the attitude angle vector, the model can likewise be updated by updating these two parameters; and because the target object is fixed, its shape factor matrix does not change, so the model can be updated with the updated attitude angle vector alone.
  • After the second two-dimensional pose points corresponding to each second target object are obtained, the preset deep learning algorithm converts them into a second attitude angle vector θ2, and the shape factor matrix β1 together with θ2 is used to update the three-dimensional model corresponding to the second target object, yielding the three-dimensional model corresponding to the second target object.
  • The deep learning method is based on a deep residual network, combining basic linear layers, ReLU activations, and suitably chosen network parameters to regress the second attitude angle vector θ2.
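A hedged sketch of such a network: a small residual MLP mapping flattened 2D pose points to the 24 axis-angle joint rotations of θ2. The layer sizes and depth are illustrative assumptions; the patent only names linear layers, ReLU activations, and a residual structure:

```python
# An illustrative pose-regression network (sizes are assumptions, not the
# patent's): flattened 2D pose points -> 24 axis-angle joint rotations.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.fc1, self.fc2 = nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.relu = nn.ReLU()
    def forward(self, x):
        return self.relu(x + self.fc2(self.relu(self.fc1(x))))

class PoseRegressor(nn.Module):
    def __init__(self, n_points=24, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_points * 2, hidden), nn.ReLU(),
            ResidualBlock(hidden), ResidualBlock(hidden),
            nn.Linear(hidden, 24 * 3),  # theta2: 24 joints x axis-angle
        )
    def forward(self, pose_points):  # (B, n_points, 2)
        return self.net(pose_points.flatten(1)).view(-1, 24, 3)
```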
  • In an embodiment, matching each first partial image with each second partial image to obtain matching pairs of first and second partial images includes: determining the center point of each first partial image and each second partial image; and, for each second partial image, calculating the Euclidean distance between its center point and the center point of each first partial image, and taking the first partial image with the smallest Euclidean distance as the match for that second partial image.
  • The image matching algorithm that matches first and second partial images may determine the center point of each partial image as the average of the horizontal and vertical coordinates of the four vertices of its square region.
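As a sketch of this nearest-center matching (box coordinates as (x1, y1, x2, y2); helper names are illustrative, not the patent's):

```python
# Nearest-center matching: each second-image box is reduced to its center and
# matched to the first-image box with the smallest Euclidean center distance.
import numpy as np

def box_center(box):
    """Center of an (x1, y1, x2, y2) box: the mean of its four vertices."""
    x1, y1, x2, y2 = box
    return np.array([(x1 + x2) / 2.0, (y1 + y2) / 2.0])

def match_boxes(first_boxes, second_boxes):
    """For each second box, the index of the first box with the nearest center."""
    first_centers = np.stack([box_center(b) for b in first_boxes])
    matches = []
    for b in second_boxes:
        d = np.linalg.norm(first_centers - box_center(b), axis=1)
        matches.append(int(np.argmin(d)))
    return matches
```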
  • In an embodiment, mapping the updated at least one three-dimensional model into at least one two-dimensional image includes: using the camera model parameter matrix K1 to map the updated at least one three-dimensional model into at least one two-dimensional image.
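A minimal sketch of this final mapping under a pinhole model (a per-point color splat; a production renderer would rasterize mesh faces with a z-buffer instead):

```python
# Project each textured model point through K1 and splat its color into an
# output image; this is a simplification of the patent's 3D-to-2D mapping.
import numpy as np

def render_model(points_3d, colors, K1, height, width):
    """Project (N, 3) points with (N, 3) colors into an HxW image."""
    image = np.zeros((height, width, 3), dtype=np.float32)
    proj = (K1 @ points_3d.T).T               # pinhole projection
    uv = proj[:, :2] / proj[:, 2:3]           # perspective divide
    for (u, v), c in zip(uv, colors):
        x, y = int(round(u)), int(round(v))
        if 0 <= x < width and 0 <= y < height:
            image[y, x] = c                   # nearest-pixel splat (no z-buffer)
    return image
```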
  • FIG. 2 is a schematic structural diagram of a super-resolution image reconstruction device according to an embodiment of the present application.
  • The super-resolution image reconstruction apparatus of this embodiment includes a three-dimensional model generation module 210, a three-dimensional model update module 220, and a super-resolution image acquisition module 230.
  • The three-dimensional model generation module 210 is configured to acquire a first image of a target region within the current region at a first moment and generate, from the first image, at least one three-dimensional model corresponding to at least one first target object in the target region, where the first target object is a first non-rigid target object.
  • The three-dimensional model update module 220 is configured to acquire a second image of the current region at a second moment after the first moment, extract a third image corresponding to the target region from the second image, and update the at least one three-dimensional model based on the third image.
  • The super-resolution image acquisition module 230 is configured to map the updated at least one three-dimensional model into at least one two-dimensional image and stitch the at least one two-dimensional image into the second image to obtain the target super-resolution image.
  • The super-resolution image reconstruction apparatus acquires, through the three-dimensional model generation module, a first image of a target region within a current region at a first moment and generates, from the first image, at least one three-dimensional model corresponding to at least one first target object in the target region, where the first target object is a first non-rigid target object; acquires, through the three-dimensional model update module, a second image of the current region at a second moment after the first moment, extracts a third image corresponding to the target region from the second image, and updates the at least one three-dimensional model based on the third image; and, through the super-resolution image acquisition module, maps the updated at least one three-dimensional model into at least one two-dimensional image and stitches it into the second image to obtain the target super-resolution image. This improves the resolution of non-rigid target objects in a low-resolution global image sequence.
  • In an embodiment, the three-dimensional model generation module 210 may include a first partial image acquisition sub-module, a first two-dimensional pose point acquisition sub-module, a three-dimensional model determination sub-module, and a texture information rendering sub-module.
  • The first partial image acquisition sub-module is configured to perform target object detection on the first image using a preset target object detection method to obtain at least one first partial image corresponding one-to-one to at least one first target object.
  • The first two-dimensional pose point acquisition sub-module is configured to perform two-dimensional pose point estimation on each first partial image separately using a preset two-dimensional pose point estimation method, obtaining each first two-dimensional pose point corresponding to each first target object.
  • The three-dimensional model determination sub-module is configured to optimize, for each first target object, the initial three-dimensional model with its first two-dimensional pose points to obtain a three-dimensional model corresponding to that first target object.
  • The texture information rendering sub-module is configured to render, for each three-dimensional model, the model with the texture information in the corresponding first partial image to update the model.
  • In an embodiment, the three-dimensional model determination sub-module may include an initial three-dimensional model construction unit, an initial two-dimensional pose point acquisition unit, a parameter acquisition unit, and a three-dimensional model acquisition unit.
  • The initial three-dimensional model construction unit is configured to construct an initial three-dimensional model based on a preset three-dimensional model construction method, an initial shape factor matrix β, and an initial attitude angle vector θ.
  • The initial two-dimensional pose point acquisition unit is configured to map the initial three-dimensional model into two dimensions using the initial camera model parameter matrix K, obtaining each initial two-dimensional pose point corresponding to the initial three-dimensional model.
  • The parameter acquisition unit is configured to calculate, for each first target object, a shape factor matrix β1 and a first attitude angle vector θ1 that satisfy a preset condition, where the preset condition is that the sum of the differences between each pair of matching first and initial two-dimensional pose points is minimal and the shape factor matrix β1 is minimal.
  • The three-dimensional model acquisition unit is configured to optimize the initial three-dimensional model using the shape factor matrix β1 and the first attitude angle vector θ1 to obtain the three-dimensional model corresponding to the first target object.
  • In an embodiment, the parameter acquisition unit may be further configured to calculate a camera model parameter matrix K1 that satisfies the preset condition, where the preset condition is that the sum of the differences between each pair of matching first and initial two-dimensional pose points is minimal and the shape factor matrix β1 is minimal.
  • In an embodiment, the texture information rendering sub-module may be configured to, for each three-dimensional model: use the camera model parameter matrix K1 to map the texture information in the corresponding first partial image onto the model to update it.
  • The texture information rendering sub-module may be further configured to, for each three-dimensional model: after the texture information in the corresponding first partial image has been mapped onto the model using K1, interpolate the mapped texture information using a preset interpolation algorithm to obtain the texture information of the complete three-dimensional model.
  • In an embodiment, the three-dimensional model update module 220 may include a second partial image acquisition sub-module, a partial image matching sub-module, a second two-dimensional pose point acquisition sub-module, and a three-dimensional model update sub-module.
  • The second partial image acquisition sub-module is configured to perform target object detection on the third image using a preset target object detection method to obtain at least one second partial image corresponding one-to-one to at least one second target object in the target region, where the second target object is a second non-rigid target object.
  • The partial image matching sub-module is configured to match each first partial image with each second partial image to obtain matching pairs of first and second partial images, so as to determine the three-dimensional model corresponding to the second target object in each second partial image.
  • The second two-dimensional pose point acquisition sub-module is configured to perform two-dimensional pose point estimation on each second partial image separately using a preset two-dimensional pose point estimation method, obtaining each second two-dimensional pose point corresponding to each second target object.
  • The three-dimensional model update sub-module is configured to update, for each second target object, the three-dimensional model corresponding to that second target object with its second two-dimensional pose points.
  • In an embodiment, the partial image matching sub-module may include an image center point determination unit, a Euclidean distance calculation unit, and a partial image matching pair determination unit.
  • The image center point determination unit is configured to determine the center point of each first partial image and each second partial image.
  • The Euclidean distance calculation unit is configured to calculate, for each second partial image, the Euclidean distance between the center point of that second partial image and the center point of each first partial image.
  • The partial image matching pair determination unit is configured to take the first partial image with the smallest Euclidean distance as the match for the second partial image.
  • In an embodiment, the three-dimensional model update sub-module may include, for each second target object: a second attitude angle vector determination unit, configured to convert each second two-dimensional pose point into a second attitude angle vector θ2 using a preset deep learning algorithm; and a three-dimensional model update unit, configured to update the three-dimensional model corresponding to the second target object using the shape factor matrix β1 and the second attitude angle vector θ2, obtaining the three-dimensional model corresponding to the second target object.
  • In an embodiment, the super-resolution image acquisition module 230 is configured to use the camera model parameter matrix K1 to map the updated at least one three-dimensional model into at least one two-dimensional image.
  • The super-resolution image reconstruction apparatus provided in the embodiments of the present application can execute the super-resolution image reconstruction method provided in any embodiment of the present application, and has the function modules corresponding to executing the method.
  • FIG. 3 is a schematic structural diagram of a super-resolution image reconstruction terminal according to another embodiment of the present application.
  • FIG. 3 shows a block diagram of an exemplary super-resolution image reconstruction terminal 312 suitable for implementing the embodiments of the present application.
  • The super-resolution image reconstruction terminal 312 shown in FIG. 3 is merely an example and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
  • The super-resolution image reconstruction terminal 312 takes the form of a general-purpose computing device.
  • The components of the super-resolution image reconstruction terminal 312 may include, but are not limited to, one or more processors 316, a memory 328, and a bus 318 connecting different system components (including the memory 328 and the processors 316).
  • The bus 318 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures.
  • These architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
  • The super-resolution image reconstruction terminal 312 typically includes a variety of computer-system-readable media. These media can be any available media accessible by the terminal 312, including volatile and non-volatile media, and removable and non-removable media.
  • The memory 328 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 330 and/or cache memory 332.
  • The super-resolution image reconstruction terminal 312 may include other removable/non-removable, volatile/non-volatile computer system storage media.
  • The storage device 334 may be configured to read and write a non-removable, non-volatile magnetic medium (not shown in FIG. 3, commonly referred to as a "hard drive").
  • Although not shown in FIG. 3, a disk drive for reading and writing a removable non-volatile magnetic disk (such as a "floppy disk") and an optical drive for reading a removable non-volatile optical disk (such as a CD-ROM) may be provided.
  • In these cases, each drive may be connected to the bus 318 through one or more data medium interfaces.
  • The memory 328 may include at least one program product having a set (e.g., at least one) of program modules configured to perform the functions of the embodiments of the present application.
  • A program/utility 340 having a set (at least one) of program modules 342 may be stored in, for example, the memory 328.
  • Such program modules 342 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment.
  • The program modules 342 generally perform the functions and/or methods of the embodiments described in this application.
  • The super-resolution image reconstruction terminal 312 may also communicate with one or more external devices 314 (such as a keyboard, a pointing device, and a display 324, where the display 324 may be configured or omitted according to actual needs), and with one or more devices that enable a user to interact with the terminal 312.
  • This communication can be performed through an input/output (I/O) interface 322.
  • The super-resolution image reconstruction terminal 312 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through the network adapter 320. As shown in FIG. 3, the network adapter 320 communicates with the other modules of the terminal 312 through the bus 318. It should be understood that, although not shown in FIG. 3, other hardware and/or software modules may be used in combination with the terminal 312, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage devices.
  • The processor 316 executes various functional applications and data processing by running programs stored in the memory 328, for example, implementing the super-resolution image reconstruction method provided by any embodiment of the present application.
  • An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored.
  • When the program is executed by a processor, the super-resolution image reconstruction method provided by the embodiments of the present application is implemented.
  • The method includes: acquiring a first image of a target region within a current region at a first moment, and generating, from the first image, at least one three-dimensional model corresponding to at least one first target object in the target region, where the first target object is a first non-rigid target object; acquiring a second image of the current region at a second moment after the first moment, extracting a third image corresponding to the target region from the second image, and updating the at least one three-dimensional model based on the third image; and mapping the updated at least one three-dimensional model into at least one two-dimensional image and stitching the at least one two-dimensional image into the second image to obtain a target super-resolution image.
  • The computer-readable storage medium provided in the embodiments of the present application is not limited to the method operations described above; the computer program stored on it may also perform related operations of the super-resolution image reconstruction method provided by any embodiment of the present application.
  • The computer storage medium of the embodiments of the present application may adopt any combination of one or more computer-readable media.
  • The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium.
  • The computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof.
  • A computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
  • A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing.
  • A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; it may send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
  • The program code contained on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, radio frequency (RF), or any suitable combination of the foregoing.
  • Computer program code for performing the operations of this application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as "C" or similar.
  • The program code can be executed entirely on the user's computer, partly on the user's computer, as an independent software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • The remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).

Abstract

Disclosed in embodiments of the present application are a super-resolution image reconstruction method and device, and a terminal and a storage medium. The super-resolution image reconstruction method comprises: obtaining a first image of a target region in a current region at a first time, and generating at least one three-dimensional model corresponding to at least one first target object in the target region according to the first image, wherein the first target object is a first non-rigid target object; obtaining a second image of the current region at a second time after the first time, extracting a third image corresponding to the target region from the second image, and updating at least one three-dimensional model based on the third image; and mapping at least one updated three-dimensional model into at least one two-dimensional image, and splicing the at least one two-dimensional image into the second image to obtain a target super-resolution image.

Description

Super-resolution image reconstruction method, apparatus, terminal, and storage medium
This application claims priority to a Chinese patent application filed with the Chinese Patent Office on September 4, 2018, with application number 201811027057.5, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the field of computer vision technology, for example, to a super-resolution image reconstruction method, apparatus, terminal, and storage medium.
Background
The accuracy of computer vision algorithms depends on the imaging quality of the input image or video; therefore, the resolution of the input image or video needs to be improved. The scene corresponding to an image or video usually contains a static part and a dynamic part, and the dynamic part in turn contains rigidly deforming objects and non-rigidly deforming objects. Because the shape and pose of a rigid object do not change over time, any single high-definition frame can be used directly to improve its resolution; a non-rigid object, whose shape and pose do change over time, cannot be enhanced from an arbitrary high-definition frame. The difficulty in improving the accuracy of computer vision algorithms therefore lies in improving the resolution of non-rigidly deforming objects.
In the related art there are two main methods for improving the resolution of a specific target object (i.e., super-resolution reconstruction): single-image super-resolution algorithms and reference-image-based super-resolution algorithms. When the input image is not similar to the training set, a single-image super-resolution algorithm cannot achieve good super-resolution reconstruction of low-resolution inputs with severe loss of detail; moreover, all high-frequency details generated by this method are synthesized from low-frequency information, so their fidelity is low. A reference-image-based super-resolution algorithm requires the depth map of a high-definition image as input. Although it can complement high-frequency details better, high-definition depth images are difficult to obtain in practice, so the algorithm is not widely applicable.
Summary of the Invention
The present application provides a super-resolution image reconstruction method, apparatus, terminal, and storage medium to improve the resolution of non-rigid target objects in a low-resolution global image sequence.
In a first aspect, an embodiment of the present application provides a super-resolution image reconstruction method. The method includes: acquiring a first image of a target region within a current region at a first moment, and generating, from the first image, at least one three-dimensional model corresponding to at least one first target object in the target region, where the first target object is a first non-rigid target object; acquiring a second image of the current region at a second moment after the first moment, extracting a third image corresponding to the target region from the second image, and updating the at least one three-dimensional model based on the third image; and mapping the updated at least one three-dimensional model into at least one two-dimensional image and stitching the at least one two-dimensional image into the second image to obtain a target super-resolution image.
In a second aspect, an embodiment of the present application further provides a super-resolution image reconstruction apparatus. The apparatus includes: a three-dimensional model generation module, configured to acquire a first image of a target region within a current region at a first moment and generate, from the first image, at least one three-dimensional model corresponding to at least one first target object in the target region, where the first target object is a first non-rigid target object; a three-dimensional model update module, configured to acquire a second image of the current region at a second moment after the first moment, extract a third image corresponding to the target region from the second image, and update the at least one three-dimensional model based on the third image; and a super-resolution image acquisition module, configured to map the updated at least one three-dimensional model into at least one two-dimensional image and stitch the at least one two-dimensional image into the second image to obtain a target super-resolution image.
In a third aspect, an embodiment of the present application further provides a super-resolution image reconstruction terminal. The terminal includes: one or more processors; and a storage device configured to store one or more programs. When the one or more programs are executed by the one or more processors, the one or more processors implement the super-resolution image reconstruction method according to any embodiment of the present application.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, the super-resolution image reconstruction method according to any embodiment of the present application is implemented.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a flowchart of a super-resolution image reconstruction method according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a super-resolution image reconstruction apparatus according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a super-resolution image reconstruction terminal according to another embodiment of the present application.
DETAILED DESCRIPTION
The present application is described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein merely serve to explain the present application and do not limit it. It should further be noted that, for ease of description, the drawings show only the parts related to the present application rather than the entire structure.
FIG. 1 is a flowchart of a super-resolution image reconstruction method according to an embodiment of the present application. This embodiment is applicable to cases where the resolution of non-rigid target objects in a low-definition global image sequence needs to be improved. The method may be performed by a super-resolution image reconstruction apparatus. As shown in FIG. 1, the method of this embodiment includes steps S110 to S130.
In step S110, a first image of a target region within the current region is acquired at a first moment, and at least one three-dimensional model corresponding to at least one first target object in the target region is generated according to the first image, where the first target object is a first non-rigid target object.
In this embodiment, the target region may be a region containing at least one first target object, and the first target object may be a first non-rigid target object. A non-rigid target object is an object whose own shape and posture can change over time, for example, a pedestrian. The first image acquired at the first moment is a local image corresponding to the target region of the current region. The first image may be captured by a camera with a relatively small field of view, and its definition is correspondingly relatively high.
In an embodiment, at least one two-dimensional image corresponding to the at least one first target object may be extracted from the first image and, based on the correspondence between two-dimensional images and three-dimensional models, the at least one two-dimensional image may be used to generate the at least one three-dimensional model corresponding to the at least one first target object.
In step S120, a second image of the current region is acquired at a second moment after the first moment, a third image corresponding to the target region is extracted from the second image, and the at least one three-dimensional model is updated based on the third image.
The second image acquired at the second moment after the first moment is a global image corresponding to the current region. The second image may be captured by a camera with a relatively large field of view (compared with the camera that captures the first image), and its definition is correspondingly relatively low. In an embodiment, the resolution of the camera that captures the first image is the same as the resolution of the camera that captures the second image, that is, the first image and the second image have the same size.
Since the second image is acquired after the first image, the shape and posture of the first target object in the second image will have changed relative to the target object in the first image, so the second image can be used to update the at least one three-dimensional model obtained from the first image. In an embodiment, a third image corresponding to the target region may be extracted from the second image, and the at least one three-dimensional model may be updated with at least one two-dimensional image, corresponding to the at least one first target object, extracted from the third image.
In step S130, the updated at least one three-dimensional model is mapped into at least one two-dimensional image, and the at least one two-dimensional image is stitched into the second image to obtain a target super-resolution image.
Here, super-resolution means increasing the resolution of an original image by hardware or software, and a super-resolution image is the image after the resolution has been increased. A mapping relationship exists between a three-dimensional model and a two-dimensional image; after the updated at least one three-dimensional model is obtained, this mapping relationship can be used to map the at least one three-dimensional model into at least one two-dimensional image. The definition of the at least one two-dimensional image obtained by this mapping is comparable to that of the first image and higher than that of the second image. An image stitching method is used to stitch the mapped at least one two-dimensional image into the second image, so that the high-definition two-dimensional image replaces the corresponding low-definition part of the second image, finally yielding the target super-resolution image.
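As a non-limiting illustration, the stitching step can be sketched as replacing the pixels of the detected region in the global frame with the rendered high-definition patch. The Python sketch below is illustrative only (the function and argument names are assumptions, not part of the described method); a practical system would additionally blend the seam between the patch and the background:

```python
import numpy as np

def stitch_patch(global_img, patch, box):
    """Paste a high-definition rendered patch over the matching
    low-definition region of the global (second) image."""
    x1, y1, x2, y2 = box        # region of the target object in the global image
    out = global_img.copy()
    out[y1:y2, x1:x2] = patch   # patch is assumed pre-resized to (y2 - y1, x2 - x1)
    return out
```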
It should be noted that the other parts of the second image that are not stitched with the at least one two-dimensional image correspond mainly to static scenes and rigidly deforming objects. Although a rigidly deforming object may move over time, its own shape and posture do not change over time. Therefore, to improve the overall resolution of the second image, the parts of the first image corresponding to the static scenes and rigidly deforming objects of the second image can be stitched directly to the corresponding positions in the second image. In addition, the at least one three-dimensional model generated from the first image contains both the shape and posture information and the texture information of the at least one first target object.
For example, the system configured to acquire the first image and the second image may be a rotatable high-definition surveillance pan-tilt system. In an embodiment, the system may include three parts: a first-scale camera, a second-scale camera, and a rotatable pan-tilt head, where the first-scale camera is mounted on the rotatable pan-tilt head and rotates with it. The first-scale camera may be a small-field-of-view camera configured to acquire the first image of the target region within the current region, and the second-scale camera may be a large-field-of-view camera configured to monitor the current region in real time and to continuously acquire second images of the current region. The resolution of the first-scale camera may be the same as that of the second-scale camera, and the first image may have the same size as the second image; accordingly, the definition of the first image is higher than that of the second image. On this basis, the first image acquired at the first moment can be used, according to the above scheme, to improve the resolution of the second image acquired at the second moment.
In the super-resolution image reconstruction method provided by this embodiment, a first image of a target region within the current region is acquired at a first moment, and at least one three-dimensional model corresponding to at least one first target object in the target region is generated according to the first image, where the first target object is a first non-rigid target object; a second image of the current region is acquired at a second moment after the first moment, a third image corresponding to the target region is extracted from the second image, and the at least one three-dimensional model is updated based on the third image; the updated at least one three-dimensional model is mapped into at least one two-dimensional image, and the at least one two-dimensional image is stitched into the second image to obtain a target super-resolution image. This improves the resolution of non-rigid target objects in a low-definition global image sequence.
On the basis of the above embodiment, generating the at least one three-dimensional model corresponding to the at least one first target object in the target region according to the first image includes: performing target object detection on the first image based on a preset target object detection method to obtain at least one first local image in one-to-one correspondence with the at least one first target object; performing two-dimensional pose point estimation on each first local image with a preset two-dimensional pose point estimation method to obtain the first two-dimensional pose points corresponding to each first target object; for each first target object, optimizing an initial three-dimensional model with the first two-dimensional pose points to obtain the three-dimensional model corresponding to that first target object; and, for each three-dimensional model, rendering the three-dimensional model with the texture information in the corresponding first local image so as to update the three-dimensional model.
In this embodiment, before the at least one three-dimensional model is generated, at least one two-dimensional image corresponding to the at least one target object may first be acquired. In an embodiment, the first target objects in the first image may be detected with a preset target object detection method to obtain at least one first local image in one-to-one correspondence with the at least one first target object, where each first target object corresponds to one first local image and each first local image can be marked in the first image with a rectangular region. In this embodiment, the preset target object detection method may be the Faster R-CNN detection algorithm, which offers high detection accuracy and fast computation. Faster R-CNN is a deep learning method that introduces the RPN network structure; the convolutional neural network outputs two branches, one giving the parameters of all candidate regions (the region center coordinates x and y and the region width and height w and h) and the other giving the probability that a candidate region contains a first target object. From these two branches, the specific position of the at least one first target object in the first image can be determined, and thus the position of the at least one first local image.
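A minimal sketch of this detection step, assuming the off-the-shelf Faster R-CNN implementation in torchvision (the score threshold and the restriction to the COCO "person" class are illustrative assumptions, not part of the described method):

```python
import torch
import torchvision

# Off-the-shelf Faster R-CNN; in COCO, label 1 corresponds to "person".
# (Recent torchvision versions prefer the weights= argument over pretrained=.)
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

def detect_first_local_images(image, score_thresh=0.8):
    # image: float tensor of shape (3, H, W), values scaled to [0, 1]
    with torch.no_grad():
        out = model([image])[0]
    keep = (out["labels"] == 1) & (out["scores"] > score_thresh)
    return out["boxes"][keep]  # (N, 4) rectangles (x1, y1, x2, y2), one per target object
```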
After the at least one first local image is obtained, the posture information of each first target object can be determined from the corresponding first local image. In an embodiment, two-dimensional pose point estimation may be performed on each first local image with a preset two-dimensional pose point estimation method to obtain the first two-dimensional pose points corresponding to each first target object. The preset two-dimensional pose point estimation method may be OpenPose, a deep learning method that predicts, for each first local image, the two-dimensional pose points of all first target objects in that image, then partitions all of the two-dimensional pose points according to the features of the first target objects, and finally determines the two-dimensional pose points corresponding to each first target object. In this embodiment, since each first local image contains only one first target object, the first two-dimensional pose points obtained by applying OpenPose to each first local image are exactly the first two-dimensional pose points of the corresponding first target object.
In this embodiment, before the at least one three-dimensional model corresponding to the at least one first target object is generated, an initial three-dimensional model may be constructed with initialization parameters; for each first target object, the initial three-dimensional model is optimized respectively with the first two-dimensional pose points to obtain the three-dimensional model corresponding to that first target object. A three-dimensional model obtained in this way does not yet contain the texture information of the first target object, and a two-dimensional image mapped from it contains no color information. Therefore, each three-dimensional model may be rendered with the texture information in the corresponding first local image so as to update it, so that the updated three-dimensional model contains both the shape and posture information and the color information of the first target object.
In an embodiment, for each first target object, optimizing the initial three-dimensional model with the first two-dimensional pose points to obtain the three-dimensional model corresponding to the first target object includes: constructing the initial three-dimensional model based on a preset three-dimensional model construction method, an initial shape factor matrix β, and an initial pose angle vector θ; performing a two-dimensional mapping of the initial three-dimensional model with an initial camera model parameter matrix K to obtain the initial two-dimensional pose points corresponding to the initial three-dimensional model; for each first target object, computing a shape factor matrix β1 and a first pose angle vector θ1 that satisfy a preset condition, where the preset condition is that the sum of the differences over all matching point pairs between the first two-dimensional pose points and the initial two-dimensional pose points is minimal and the shape factor matrix β1 is minimal; and optimizing the initial three-dimensional model with the shape factor matrix β1 and the first pose angle vector θ1 to obtain the three-dimensional model corresponding to the first target object.
In general, a three-dimensional model consists of a dense point cloud in three-dimensional space. In this embodiment, the initial three-dimensional model may be constructed based on a preset three-dimensional model construction method, the initial shape factor matrix β, and the initial pose angle vector θ; in an embodiment, the SMPLify algorithm may be used to construct the initial three-dimensional model. Taking a human body as the first target object as an example, the SMPLify algorithm constructs the three-dimensional model from the SMPL human body model, a shape factor matrix β, and a pose angle vector θ. The SMPL human body model includes 6890 three-dimensional points and 24 three-dimensional joint points, where the 24 joint points control the position of the whole point cloud of the model and hence its posture, the shape factor matrix β controls structural features of the model such as height and girth, and the pose angle vector θ is expressed as the angle through which each three-dimensional joint point rotates relative to the position of that point in the initial three-dimensional model. Each of the 6890 three-dimensional points of the model can be represented as a linear weighted average over the 24 pose angle vectors. After the initial three-dimensional model is obtained with the SMPLify algorithm, the initial shape factor matrix β, and the initial pose angle vector θ, the initial camera model parameter matrix K can be used to map the 24 three-dimensional joint points of the initial model into two dimensions, yielding the initial two-dimensional pose points corresponding to the initial three-dimensional model.
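In this notation, the fitting step can be summarized by the following objective (a sketch; the weighting term λ and the squared form of the distance are assumptions, since the text only requires the reprojection differences and β to be minimal):

```latex
(\beta_1, \theta_1, K_1) = \arg\min_{\beta,\, \theta,\, K}
\;\sum_{j=1}^{24} \left\| p_j - \Pi_K\!\big(J_j(\beta, \theta)\big) \right\|^2
\;+\; \lambda \left\| \beta \right\|^2
```

Here p_j denotes the j-th first two-dimensional pose point, J_j(β, θ) the j-th three-dimensional joint point of the model, and Π_K the two-dimensional mapping under the camera model parameter matrix K.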
Since the initial three-dimensional model is determined by the initial shape factor matrix β and the initial pose angle vector θ, to obtain the three-dimensional model corresponding to the first target object one can first determine the shape factor matrix β1 and the first pose angle vector θ1 corresponding to the first target object, and then optimize the initial three-dimensional model with β1 and θ1. In an embodiment, for each first target object, a shape factor matrix β1 and a first pose angle vector θ1 satisfying the preset condition are computed, where the preset condition is that the sum of the differences over all matching point pairs between the first two-dimensional pose points and the initial two-dimensional pose points is minimal and the shape factor matrix β1 is minimal; the initial three-dimensional model is then optimized with β1 and θ1 to obtain the three-dimensional model corresponding to the first target object.
In an embodiment, for each first target object, besides computing the shape factor matrix β1 and the first pose angle vector θ1 that satisfy the preset condition, the method further includes computing a camera model parameter matrix K1 that satisfies the preset condition, where the preset condition is that the sum of the differences over all matching point pairs between the first two-dimensional pose points and the initial two-dimensional pose points is minimal and the shape factor matrix β1 is minimal. Correspondingly, rendering each three-dimensional model with the texture information in the corresponding first local image so as to update it includes: for each three-dimensional model, mapping the texture information in the corresponding first local image onto the three-dimensional model with the camera model parameter matrix K1, so as to update the three-dimensional model.
In an embodiment, after the three-dimensional models corresponding to the first target objects are obtained, texture information rendering may also be performed on each three-dimensional model with the camera model parameter matrix. In an embodiment, a camera model parameter matrix K1 satisfying the preset condition is computed, where the preset condition is that the sum of the differences over all matching point pairs between the first two-dimensional pose points and the initial two-dimensional pose points is minimal and the shape factor matrix β1 is minimal. After K1 is obtained, for each three-dimensional model, the texture information in the corresponding first local image is mapped onto the three-dimensional model with K1 to update the model.
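A minimal sketch of this texture mapping, assuming the model vertices are already expressed in the camera coordinate frame of K1 (the function name and the nearest-pixel sampling are illustrative assumptions):

```python
import numpy as np

def map_texture(vertices, K1, local_image):
    """Assign each model vertex the color of the pixel it projects to
    under the camera model parameter matrix K1."""
    h, w = local_image.shape[:2]
    colors = np.zeros((len(vertices), 3), dtype=local_image.dtype)
    pix = (K1 @ vertices.T).T            # (N, 3) homogeneous image points
    pix = pix[:, :2] / pix[:, 2:3]       # perspective divide -> pixel coordinates
    for i, (u, v) in enumerate(pix):
        ui, vi = int(round(u)), int(round(v))
        if 0 <= ui < w and 0 <= vi < h:  # vertices projecting outside keep no texture
            colors[i] = local_image[vi, ui]
    return colors
```

Vertices that project outside the first local image are exactly the points left without texture, which motivates the interpolation step described next.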
In an embodiment, for each three-dimensional model, after the texture information in the corresponding first local image has been mapped onto the three-dimensional model with the camera model parameter matrix K1, the method further includes: interpolating the texture information of the mapped three-dimensional model with a preset interpolation algorithm to obtain the complete texture information of the three-dimensional model.
In this embodiment, the texture information of a three-dimensional model is provided by the first local image. Since the first local image is two-dimensional, when its texture information is mapped onto the three-dimensional model there are necessarily some three-dimensional points for which no texture information can be obtained, and some of those points may still enter the field of view; moreover, when the three-dimensional model is mapped into a two-dimensional image, only the three-dimensional points that enter the field of view are needed. Therefore, texture information interpolation may be performed for the three-dimensional points that can enter the field of view but for which no texture information was obtained, so that complete texture information is available when the three-dimensional model is mapped into a two-dimensional image. In an embodiment, a bilinear interpolation algorithm may be used to interpolate the texture information of the mapped three-dimensional model to obtain the complete texture information of the model.
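A minimal sketch of bilinear interpolation as it might be used here: sampling a texture at a continuous position from its four neighboring texels (the function name and array layout are assumptions):

```python
import numpy as np

def bilinear_sample(texture, u, v):
    """Bilinearly interpolate texture (H, W, C) at continuous coordinates (u, v).
    Assumes 0 <= u < W and 0 <= v < H."""
    h, w = texture.shape[:2]
    x0, y0 = int(np.floor(u)), int(np.floor(v))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = u - x0, v - y0
    top = (1 - dx) * texture[y0, x0] + dx * texture[y0, x1]
    bottom = (1 - dx) * texture[y1, x0] + dx * texture[y1, x1]
    return (1 - dy) * top + dy * bottom
```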
In an embodiment, updating the at least one three-dimensional model based on the third image includes: performing target object detection on the third image based on the preset target object detection method to obtain at least one second local image in one-to-one correspondence with at least one second target object in the target region, where the second target object is a second non-rigid target object; matching each first local image with each second local image to obtain at least one matching pair of a first local image and a second local image, so as to determine the at least one three-dimensional model corresponding to the second target object in the at least one second local image; performing two-dimensional pose point estimation on each second local image with the preset two-dimensional pose point estimation method to obtain the second two-dimensional pose points corresponding to each second target object; and, for each second target object, updating the three-dimensional model corresponding to that second target object with the second two-dimensional pose points.
In this embodiment, a second target object may be a first target object whose own shape and posture have changed. The second local images are obtained in the same way as the first local images, likewise with the Faster R-CNN detection algorithm. After the at least one second local image is obtained with Faster R-CNN, an image matching algorithm is used to match each first local image with each second local image, yielding, for each first local image, the matching second local image. Since each first local image corresponds to one three-dimensional model, the three-dimensional model corresponding to the second target object in each second local image can be determined from the matched first local image.
The three-dimensional models corresponding to the second local images, as determined in the above steps, were built from the first local images, so their posture information corresponds to the posture of the first target objects in the first local images. To make each three-dimensional model match its second local image, the posture information of each three-dimensional model can be updated with the posture information of the corresponding second target object. In an embodiment, two-dimensional pose point estimation is performed on each second local image with the preset two-dimensional pose point estimation method to obtain the second two-dimensional pose points corresponding to each second target object, and, for each second target object, the three-dimensional model corresponding to that object is updated with the second two-dimensional pose points. The preset two-dimensional pose point estimation method may be OpenPose, and the process of obtaining the second two-dimensional pose points with OpenPose is the same as that of obtaining the first two-dimensional pose points.
In an embodiment, for each second target object, updating the three-dimensional model corresponding to the second target object with the second two-dimensional pose points includes: for each second target object, converting the second two-dimensional pose points into a second pose angle vector θ2 with a preset deep learning algorithm, and updating the three-dimensional model corresponding to the second target object with the shape factor matrix β1 and the second pose angle vector θ2 to obtain the three-dimensional model corresponding to the second target object.
Since the three-dimensional model corresponding to a first local image is obtained by optimization over the shape factor matrix and the pose angle vector, the model can likewise be updated with updated values of these two parameters; and since the target object is fixed, the shape factor matrix does not change, so the model can be updated with the updated pose angle vector alone. In an embodiment, after the second two-dimensional pose points corresponding to each second target object are obtained, a preset deep learning algorithm converts the second two-dimensional pose points into a second pose angle vector θ2, and the three-dimensional model corresponding to the second target object is updated with the shape factor matrix β1 and the second pose angle vector θ2 to obtain the three-dimensional model corresponding to the second target object. The deep learning method is based on a deep residual network and uses a combination of basic linear layers, ReLU activation functions, and reasonable network parameters to obtain the second pose angle vector θ2.
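A minimal PyTorch sketch of such a regressor, assuming 24 two-dimensional pose points in and 24 per-joint rotation angles out (all layer sizes and the axis-angle output format are illustrative assumptions; the text only specifies linear layers, ReLU activations, and a deep residual structure):

```python
import torch
import torch.nn as nn

class PoseAngleRegressor(nn.Module):
    """Residual MLP mapping flattened 2-D pose points (24 x 2 = 48 values)
    to a pose angle vector theta_2 (24 joints x 3 angles = 72 values)."""
    def __init__(self, in_dim=48, hidden=1024, out_dim=72):
        super().__init__()
        self.inp = nn.Linear(in_dim, hidden)
        self.res_block = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.out = nn.Linear(hidden, out_dim)

    def forward(self, x):
        h = torch.relu(self.inp(x))
        h = h + self.res_block(h)  # residual connection of the deep residual network
        return self.out(h)

# Usage sketch: theta_2 = PoseAngleRegressor()(pose_points.flatten(1))
# where pose_points is a (batch, 24, 2) tensor of second 2-D pose points.
```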
In an embodiment, matching each first local image with each second local image to obtain at least one matching pair of a first local image and a second local image includes: determining the center points of each first local image and each second local image; for each second local image, computing the Euclidean distance between the center point of the second local image and the center point of each first local image; and taking the first local image that minimizes the Euclidean distance as the match of the second local image.
In this embodiment, the image matching algorithm that matches the first local images with the second local images may first determine the center point of each first local image and each second local image, where the center point may be taken as the average of the horizontal and vertical coordinates of the four vertices of the rectangular region. After the center points are determined, for each second local image the Euclidean distances between its center point and the center points of the first local images are computed and compared, and the first local image with the smallest Euclidean distance is finally taken as the match of the second local image.
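A minimal sketch of this center-point matching (the box format and function names are assumptions):

```python
import numpy as np

def box_center(box):
    # Center = average of the four vertex coordinates of the rectangle.
    x1, y1, x2, y2 = box
    return np.array([(x1 + x2) / 2.0, (y1 + y2) / 2.0])

def match_local_images(first_boxes, second_boxes):
    """For each second local image, return the index of the first local
    image whose center is nearest in Euclidean distance."""
    first_centers = np.array([box_center(b) for b in first_boxes])
    matches = []
    for b in second_boxes:
        d = np.linalg.norm(first_centers - box_center(b), axis=1)
        matches.append(int(np.argmin(d)))
    return matches
```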
In an embodiment, mapping the updated at least one three-dimensional model into at least one two-dimensional image includes: mapping the updated at least one three-dimensional model into the at least one two-dimensional image with the camera model parameter matrix K1.
FIG. 2 is a schematic structural diagram of a super-resolution image reconstruction apparatus according to an embodiment of the present application. As shown in FIG. 2, the apparatus of this embodiment includes a three-dimensional model generation module 210, a three-dimensional model update module 220, and a super-resolution image acquisition module 230.
The three-dimensional model generation module 210 is configured to acquire a first image of a target region within the current region at a first moment and to generate, according to the first image, at least one three-dimensional model corresponding to at least one first target object in the target region, where the first target object is a first non-rigid target object.
The three-dimensional model update module 220 is configured to acquire a second image of the current region at a second moment after the first moment, to extract a third image corresponding to the target region from the second image, and to update the at least one three-dimensional model based on the third image.
The super-resolution image acquisition module 230 is configured to map the updated at least one three-dimensional model into at least one two-dimensional image and to stitch the at least one two-dimensional image into the second image to obtain a target super-resolution image.
In the super-resolution image reconstruction apparatus provided by this embodiment, the three-dimensional model generation module acquires a first image of a target region within the current region at a first moment and generates, according to the first image, at least one three-dimensional model corresponding to at least one first target object in the target region, where the first target object is a first non-rigid target object; the three-dimensional model update module acquires a second image of the current region at a second moment after the first moment, extracts a third image corresponding to the target region from the second image, and updates the at least one three-dimensional model based on the third image; and the super-resolution image acquisition module maps the updated at least one three-dimensional model into at least one two-dimensional image and stitches the at least one two-dimensional image into the second image to obtain a target super-resolution image. This improves the resolution of non-rigid target objects in a low-definition global image sequence.
On the basis of the above embodiment, the three-dimensional model generation module 210 may include a first local image acquisition submodule, a first two-dimensional pose point acquisition submodule, a three-dimensional model determination submodule, and a texture information rendering submodule.
The first local image acquisition submodule is configured to perform target object detection on the first image based on the preset target object detection method to obtain at least one first local image in one-to-one correspondence with the at least one first target object.
The first two-dimensional pose point acquisition submodule is configured to perform two-dimensional pose point estimation on each first local image with the preset two-dimensional pose point estimation method to obtain the first two-dimensional pose points corresponding to each first target object.
The three-dimensional model determination submodule is configured to, for each first target object, optimize the initial three-dimensional model with the first two-dimensional pose points to obtain the three-dimensional model corresponding to that first target object.
The texture information rendering submodule is configured to, for each three-dimensional model, render the three-dimensional model with the texture information in the corresponding first local image so as to update the three-dimensional model.
In an embodiment, the three-dimensional model determination submodule may include an initial three-dimensional model construction unit, an initial two-dimensional pose point acquisition unit, a parameter acquisition unit, and a three-dimensional model acquisition unit.
The initial three-dimensional model construction unit is configured to construct the initial three-dimensional model based on the preset three-dimensional model construction method, the initial shape factor matrix β, and the initial pose angle vector θ.
The initial two-dimensional pose point acquisition unit is configured to perform a two-dimensional mapping of the initial three-dimensional model with the initial camera model parameter matrix K to obtain the initial two-dimensional pose points corresponding to the initial three-dimensional model.
The parameter acquisition unit is configured to, for each first target object, compute a shape factor matrix β1 and a first pose angle vector θ1 that satisfy the preset condition, where the preset condition is that the sum of the differences over all matching point pairs between the first two-dimensional pose points and the initial two-dimensional pose points is minimal and the shape factor matrix β1 is minimal.
The three-dimensional model acquisition unit is configured to optimize the initial three-dimensional model with the shape factor matrix β1 and the first pose angle vector θ1 to obtain the three-dimensional model corresponding to the first target object.
In an embodiment, the parameter acquisition unit may further be configured to compute a camera model parameter matrix K1 that satisfies the preset condition, where the preset condition is that the sum of the differences over all matching point pairs between the first two-dimensional pose points and the initial two-dimensional pose points is minimal and the shape factor matrix β1 is minimal. Correspondingly, the texture information rendering submodule may be configured to, for each three-dimensional model, map the texture information in the corresponding first local image onto the three-dimensional model with the camera model parameter matrix K1, so as to update the three-dimensional model.
In an embodiment, the texture information rendering submodule may further be configured to, for each three-dimensional model, after mapping the texture information in the corresponding first local image onto the three-dimensional model with the camera model parameter matrix K1, interpolate the texture information of the mapped three-dimensional model with the preset interpolation algorithm to obtain the complete texture information of the three-dimensional model.
In an embodiment, the three-dimensional model update module 220 may include a second local image acquisition submodule, a local image matching submodule, a second two-dimensional pose point acquisition submodule, and a three-dimensional model update submodule.
The second local image acquisition submodule is configured to perform target object detection on the third image based on the preset target object detection method to obtain at least one second local image in one-to-one correspondence with at least one second target object in the target region, where the second target object is a second non-rigid target object.
The local image matching submodule is configured to match each first local image with each second local image to obtain at least one matching pair of a first local image and a second local image, so as to determine the at least one three-dimensional model corresponding to the second target object in the at least one second local image.
The second two-dimensional pose point acquisition submodule is configured to perform two-dimensional pose point estimation on each second local image with the preset two-dimensional pose point estimation method to obtain the second two-dimensional pose points corresponding to each second target object.
The three-dimensional model update submodule is configured to, for each second target object, update the three-dimensional model corresponding to the second target object with the second two-dimensional pose points.
In an embodiment, the local image matching submodule may include an image center point determination unit, a Euclidean distance computation unit, and a local image matching pair determination unit.
The image center point determination unit is configured to determine the center points of each first local image and each second local image.
The Euclidean distance computation unit is configured to, for each second local image, compute the Euclidean distance between the center point of the second local image and the center point of each first local image.
The local image matching pair determination unit is configured to take the first local image that minimizes the Euclidean distance as the match of the second local image.
In an embodiment, the three-dimensional model update submodule may include, for each second target object: a second pose angle vector determination unit, configured to convert the second two-dimensional pose points into a second pose angle vector θ2 with the preset deep learning algorithm; and a three-dimensional model update unit, configured to update the three-dimensional model corresponding to the second target object with the shape factor matrix β1 and the second pose angle vector θ2 to obtain the three-dimensional model corresponding to the second target object.
In an embodiment, the super-resolution image acquisition module 230 is configured to map the updated at least one three-dimensional model into at least one two-dimensional image with the camera model parameter matrix K1.
The super-resolution image reconstruction apparatus provided by the embodiments of the present application can perform the super-resolution image reconstruction method provided by any embodiment of the present application and has the functional modules corresponding to the performed method.
FIG. 3 is a schematic structural diagram of a super-resolution image reconstruction terminal according to another embodiment of the present application. FIG. 3 shows a block diagram of an exemplary super-resolution image reconstruction terminal 312 suitable for implementing embodiments of the present application. The terminal 312 shown in FIG. 3 is merely an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in FIG. 3, the super-resolution image reconstruction terminal 312 takes the form of a general-purpose computing device. Its components may include, but are not limited to, one or more processors 316, a memory 328, and a bus 318 connecting the different system components (including the memory 328 and the processors 316).
The bus 318 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The super-resolution image reconstruction terminal 312 typically includes a variety of computer-system-readable media. These media may be any available media accessible by the terminal 312, including volatile and non-volatile media and removable and non-removable media.
The memory 328 may include computer-system-readable media in the form of volatile memory, such as Random Access Memory (RAM) 330 and/or cache memory 332. The terminal 312 may include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage device 334 may be configured to read from and write to a non-removable, non-volatile magnetic medium (not shown in FIG. 3, commonly called a "hard drive"). Although not shown in FIG. 3, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from and writing to a removable non-volatile optical disk (e.g., a Compact Disc Read-Only Memory (CD-ROM), a Digital Video Disc Read-Only Memory (DVD-ROM), or other optical media) may be provided. In these cases, each drive may be connected to the bus 318 through one or more data media interfaces. The memory 328 may include at least one program product having a set of (e.g., at least one) program modules configured to carry out the functions of the embodiments of the present application.
A program/utility 340, having a set of (at least one) program modules 342, may be stored, for example, in the memory 328. Such program modules 342 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a networking environment. The program modules 342 generally carry out the functions and/or methods of the embodiments described in the present application.
The super-resolution image reconstruction terminal 312 may also communicate with one or more external devices 314 (e.g., a keyboard, a pointing device, or a display 324, where whether the display 324 is provided may be decided according to actual needs), with one or more devices that enable a user to interact with the terminal 312, and/or with any device (e.g., a network card or a modem) that enables the terminal 312 to communicate with one or more other computing devices. Such communication can occur via Input/Output (I/O) interfaces 322. In addition, the terminal 312 can communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via a network adapter 320. As shown, the network adapter 320 communicates with the other modules of the terminal 312 via the bus 318. It should be understood that, although not shown in FIG. 3, other hardware and/or software modules may be used in conjunction with the terminal 312, including, but not limited to, microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
The processor 316 executes various functional applications and data processing by running programs stored in the memory 328, for example, implementing the super-resolution image reconstruction method provided by any embodiment of the present application.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the super-resolution image reconstruction method provided by the embodiments of the present application. The method includes: acquiring a first image of a target region within the current region at a first moment, and generating, according to the first image, at least one three-dimensional model corresponding to at least one first target object in the target region, where the first target object is a first non-rigid target object; acquiring a second image of the current region at a second moment after the first moment, extracting a third image corresponding to the target region from the second image, and updating the at least one three-dimensional model based on the third image; and mapping the updated at least one three-dimensional model into at least one two-dimensional image, and stitching the at least one two-dimensional image into the second image to obtain a target super-resolution image.
Of course, in the computer-readable storage medium provided by the embodiments of the present application, the stored computer program is not limited to the method operations described above and may also perform related operations of the super-resolution image reconstruction method provided by any embodiment of the present application.
The computer storage medium of the embodiments of the present application may employ any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM) or flash memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by, or in connection with, an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by, or in connection with, an instruction execution system, apparatus, or device.
The program code contained on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical cable, radio frequency (RF), and the like, or any suitable combination of the foregoing.
Computer program code for performing the operations of the present application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

Claims (12)

  1. A super-resolution image reconstruction method, comprising:
    acquiring a first image of a target region in a current region at a first moment, and generating, according to the first image, at least one three-dimensional model corresponding to at least one first target object in the target region, wherein the first target object is a first non-rigid target object;
    acquiring a second image of the current region at a second moment after the first moment, extracting a third image corresponding to the target region from the second image, and updating the at least one three-dimensional model based on the third image; and
    mapping the updated at least one three-dimensional model into at least one two-dimensional image, and stitching the at least one two-dimensional image into the second image to obtain a target super-resolution image.
  2. The method according to claim 1, wherein generating, according to the first image, at least one three-dimensional model corresponding to at least one first target object in the target region comprises:
    performing target object detection on the first image based on a preset target object detection method to obtain at least one first partial image in one-to-one correspondence with the at least one first target object;
    performing two-dimensional pose point estimation on each first partial image using a preset two-dimensional pose point estimation method to obtain first two-dimensional pose points corresponding to each first target object;
    for each first target object, optimizing an initial three-dimensional model using the first two-dimensional pose points to obtain a three-dimensional model corresponding to the first target object; and
    for the three-dimensional model corresponding to each first target object, rendering that three-dimensional model using texture information in the corresponding first partial image, so as to update the at least one three-dimensional model.
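The "preset target object detection method" and "preset two-dimensional pose point estimation method" of claim 2 are left open by the patent. As one hedged illustration of the detection step, the sketch below uses OpenCV's classic HOG + linear-SVM pedestrian detector to produce the first partial images; any person detector and any 2D pose estimator could be substituted.

    import cv2

    # One possible stand-in for the "preset target object detection method":
    # OpenCV's HOG + linear-SVM pedestrian detector. The patent does not name
    # a specific detector; this is only an illustration.
    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

    def first_partial_images(first_image):
        """Crop one partial image per detected non-rigid target (person)."""
        boxes, _ = hog.detectMultiScale(first_image, winStride=(8, 8))
        return [((x, y, w, h), first_image[y:y + h, x:x + w])
                for (x, y, w, h) in boxes]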
  3. The method according to claim 2, wherein, for each first target object, optimizing the initial three-dimensional model using the first two-dimensional pose points to obtain the three-dimensional model corresponding to the first target object comprises:
    constructing the initial three-dimensional model based on a preset three-dimensional model construction method, an initial shape factor matrix β and an initial pose angle vector θ;
    performing two-dimensional mapping on the initial three-dimensional model using an initial camera model parameter matrix K to obtain initial two-dimensional pose points corresponding to the initial three-dimensional model; and
    for each first target object:
    calculating a shape factor matrix β1 and a first pose angle vector θ1 that satisfy a preset condition, wherein the preset condition is that the sum of the differences between matched point pairs of the first two-dimensional pose points and the initial two-dimensional pose points is minimized and the shape factor matrix β1 is minimized; and
    optimizing the initial three-dimensional model using the shape factor matrix β1 and the first pose angle vector θ1 to obtain the three-dimensional model corresponding to the first target object.
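Claims 3 and 4 amount to a joint fit: choose β1 and θ1 (and, per claim 4, the camera matrix K1) to minimize the summed distance between observed and projected 2D pose points while keeping β1 small. Below is a minimal sketch of such an objective with scipy, using a toy linear shape/pose model; the model function, dimensions, and regularization weight are illustrative assumptions, not the disclosed construction.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    N_JOINTS, N_SHAPE, N_POSE = 12, 10, 24

    # Toy linear stand-in for the preset 3D model: joints(β, θ) = base + S·β + P·θ.
    BASE = rng.normal(size=(N_JOINTS, 3))
    S = rng.normal(scale=0.1, size=(N_JOINTS, 3, N_SHAPE))
    P = rng.normal(scale=0.1, size=(N_JOINTS, 3, N_POSE))
    OFFSET = np.array([0.0, 0.0, 5.0])   # keep the model in front of the camera

    def joints_3d(beta, theta):
        return BASE + S @ beta + P @ theta + OFFSET

    def project(points, K):
        """Pinhole projection of Nx3 camera-frame points with 3x3 intrinsics K."""
        uvw = points @ K.T
        return uvw[:, :2] / uvw[:, 2:3]

    K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
    # Synthetic "first two-dimensional pose points" from a hidden ground truth.
    observed_2d = project(joints_3d(0.5 * rng.normal(size=N_SHAPE),
                                    0.3 * rng.normal(size=N_POSE)), K)

    LAMBDA = 1e-2   # weight of the "β1 smallest" term (assumed form of the condition)

    def objective(x):
        beta, theta = x[:N_SHAPE], x[N_SHAPE:]
        pred = project(joints_3d(beta, theta), K)
        reproj = np.sum(np.linalg.norm(pred - observed_2d, axis=1))  # matched-pair differences
        return reproj + LAMBDA * np.sum(beta ** 2)                   # plus shape regularizer

    res = minimize(objective, np.zeros(N_SHAPE + N_POSE), method="L-BFGS-B")
    beta1, theta1 = res.x[:N_SHAPE], res.x[N_SHAPE:]

Extending the same objective to also optimize the entries of K (claim 4) only means appending the free camera parameters to the optimization vector x.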
  4. The method according to claim 3, wherein, for each first target object, calculating the shape factor matrix β1 and the first pose angle vector θ1 that satisfy the preset condition further comprises:
    calculating a camera model parameter matrix K1 that satisfies the preset condition, wherein the preset condition is that the sum of the differences between matched point pairs of the first two-dimensional pose points and the initial two-dimensional pose points is minimized and the shape factor matrix β1 is minimized; and
    wherein rendering the three-dimensional model corresponding to each first target object using the texture information in the corresponding first partial image, so as to update the at least one three-dimensional model, comprises:
    for the three-dimensional model corresponding to each first target object, mapping the texture information in the corresponding first partial image onto that three-dimensional model using the camera model parameter matrix K1, so as to update the at least one three-dimensional model.
  5. The method according to claim 4, wherein, for the three-dimensional model corresponding to each first target object, after the texture information in the corresponding first partial image is mapped onto that three-dimensional model using the camera model parameter matrix K1, the method further comprises:
    performing interpolation on the texture information of the mapped three-dimensional model using a preset interpolation algorithm to obtain texture coordinates of the complete three-dimensional model.
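The interpolation of claim 5 fills in texture coordinates for mesh vertices that the K1 mapping leaves uncovered (for example, vertices not visible in the first partial image). A minimal sketch with scipy's griddata follows; the sparse UV samples and vertex parameterization are synthetic placeholders, and the choice of linear interpolation with a nearest-neighbour fallback is an assumption, since the patent only specifies "a preset interpolation algorithm".

    import numpy as np
    from scipy.interpolate import griddata

    rng = np.random.default_rng(1)

    # Synthetic placeholders: a 2D mesh parameterization and the sparse texture
    # coordinates recovered by the K1 mapping for the visible vertices.
    known_vertices = rng.uniform(size=(200, 2))
    known_uv = rng.uniform(size=(200, 2))
    all_vertices = rng.uniform(size=(1000, 2))   # includes untextured vertices

    def interpolate_uv(channel):
        uv = griddata(known_vertices, known_uv[:, channel], all_vertices,
                      method="linear")            # linear inside the convex hull
        hole = np.isnan(uv)                       # outside the hull: fall back
        uv[hole] = griddata(known_vertices, known_uv[:, channel],
                            all_vertices[hole], method="nearest")
        return uv

    full_uv = np.column_stack([interpolate_uv(0), interpolate_uv(1)])  # complete texture coords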
  6. The method according to claim 3, wherein updating the at least one three-dimensional model based on the third image comprises:
    performing target object detection on the third image based on the preset target object detection method to obtain at least one second partial image in one-to-one correspondence with at least one second target object in the target region, wherein the second target object is a second non-rigid target object;
    matching the first partial images with the second partial images to obtain at least one matching pair of a first partial image and a second partial image, so as to determine at least one three-dimensional model corresponding to the second target objects in the at least one second partial image;
    performing two-dimensional pose point estimation on each second partial image using the preset two-dimensional pose point estimation method to obtain second two-dimensional pose points corresponding to each second target object; and
    for each second target object, updating the three-dimensional model corresponding to the second target object using the second two-dimensional pose points.
  7. The method according to claim 6, wherein matching the first partial images with the second partial images to obtain at least one matching pair of a first partial image and a second partial image comprises:
    determining a center point of each first partial image and of each second partial image; and
    for each second partial image:
    calculating the Euclidean distance between the center point of the second partial image and the center point of each first partial image; and
    taking the first partial image that minimizes the Euclidean distance as the match of the second partial image.
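A minimal sketch of claim 7's center-point matching, with partial images represented by (x, y, w, h) bounding boxes (an assumed representation the patent does not prescribe):

    import numpy as np

    def center(box):
        """Center point of an (x, y, w, h) bounding box."""
        x, y, w, h = box
        return np.array([x + w / 2.0, y + h / 2.0])

    def match_partials(first_boxes, second_boxes):
        """For each second partial image, pick the first partial image whose
        center point is nearest in Euclidean distance."""
        first_centers = np.array([center(b) for b in first_boxes])
        pairs = []
        for j, box in enumerate(second_boxes):
            d = np.linalg.norm(first_centers - center(box), axis=1)
            pairs.append((int(np.argmin(d)), j))   # (index at t1, index at t2)
        return pairs

    # Example: two targets that moved slightly between the two moments.
    t1 = [(10, 10, 40, 80), (200, 50, 40, 80)]
    t2 = [(205, 55, 40, 80), (12, 14, 40, 80)]
    print(match_partials(t1, t2))   # [(1, 0), (0, 1)]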
  8. The method according to claim 6, wherein, for each second target object, updating the three-dimensional model corresponding to the second target object using the second two-dimensional pose points comprises:
    for each second target object:
    converting the second two-dimensional pose points into a second pose angle vector θ2 using a preset deep learning algorithm; and
    updating the three-dimensional model corresponding to the second target object using the shape factor matrix β1 and the second pose angle vector θ2 to obtain an updated three-dimensional model corresponding to the second target object.
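Claim 8 replaces a second full optimization with a learned mapping from 2D pose points to the pose angle vector θ2, reusing the shape factors β1 fitted at the first moment (a non-rigid target changes pose over time, but not shape). The patent only says "a preset deep learning algorithm"; as one hedged illustration, a small PyTorch MLP regressor, whose architecture and dimensions are assumptions consistent with the earlier sketch:

    import torch
    import torch.nn as nn

    N_JOINTS, N_POSE = 12, 24   # assumed dimensions, matching the earlier sketch

    # Small MLP regressor mapping flattened 2D pose points to a pose angle vector θ2.
    pose_regressor = nn.Sequential(
        nn.Linear(N_JOINTS * 2, 128),
        nn.ReLU(),
        nn.Linear(128, 128),
        nn.ReLU(),
        nn.Linear(128, N_POSE),
    )

    def update_pose(beta1, pose_points_2d):
        """Keep the shape factors β1 fixed; re-estimate only the pose angles θ2."""
        x = torch.as_tensor(pose_points_2d, dtype=torch.float32).reshape(1, -1)
        with torch.no_grad():
            theta2 = pose_regressor(x).squeeze(0)
        return beta1, theta2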
  9. The method according to claim 4, wherein mapping the updated at least one three-dimensional model into at least one two-dimensional image comprises:
    mapping the updated at least one three-dimensional model into at least one two-dimensional image using the camera model parameter matrix K1.
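The mapping of claim 9 is an ordinary pinhole projection with the recovered matrix K1, the same operation as the project function in the optimization sketch above; shown here standalone with placeholder vertices:

    import numpy as np

    def project_model(vertices_cam, K1):
        """Project Nx3 camera-frame vertices to pixel coordinates with a 3x3 K1."""
        uvw = vertices_cam @ K1.T
        return uvw[:, :2] / uvw[:, 2:3]

    K1 = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
    vertices = np.array([[0.0, 0.0, 5.0], [0.5, -0.2, 4.0]])  # placeholder mesh vertices
    print(project_model(vertices, K1))   # pixel positions of the projected vertices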
  10. A super-resolution image reconstruction apparatus, comprising:
    a three-dimensional model generation module, configured to acquire a first image of a target region in a current region at a first moment, and generate, according to the first image, at least one three-dimensional model corresponding to at least one first target object in the target region, wherein the first target object is a first non-rigid target object;
    a three-dimensional model update module, configured to acquire a second image of the current region at a second moment after the first moment, extract a third image corresponding to the target region from the second image, and update the at least one three-dimensional model based on the third image; and
    a super-resolution image acquisition module, configured to map the updated at least one three-dimensional model into at least one two-dimensional image, and stitch the at least one two-dimensional image into the second image to obtain a target super-resolution image.
  11. A super-resolution image reconstruction terminal, comprising:
    at least one processor; and
    a storage device configured to store at least one program,
    wherein, when the at least one program is executed by the at least one processor, the at least one processor implements the super-resolution image reconstruction method according to any one of claims 1-9.
  12. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the super-resolution image reconstruction method according to any one of claims 1-9.
PCT/CN2019/104388 2018-09-04 2019-09-04 Super-resolution image reconstruction method and apparatus, and terminal and storage medium WO2020048484A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811027057.5 2018-09-04
CN201811027057.5A CN109191554B (en) 2018-09-04 2018-09-04 Super-resolution image reconstruction method, device, terminal and storage medium

Publications (1)

Publication Number Publication Date
WO2020048484A1 (en)

Family

ID=64914431

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/104388 WO2020048484A1 (en) 2018-09-04 2019-09-04 Super-resolution image reconstruction method and apparatus, and terminal and storage medium

Country Status (2)

Country Link
CN (1) CN109191554B (en)
WO (1) WO2020048484A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109191554B (en) * 2018-09-04 2021-01-01 Tsinghua-Berkeley Shenzhen Institute Preparation Office Super-resolution image reconstruction method, device, terminal and storage medium
CN109859296B (en) * 2019-02-01 2022-11-29 Tencent Technology (Shenzhen) Co., Ltd. Training method of SMPL parameter prediction model, server and storage medium
EP4036863A4 (en) * 2019-09-30 2023-02-08 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Human body model reconstruction method and reconstruction system, and storage medium
CN113643433A (en) * 2020-04-27 2021-11-12 Chengdu Shutong Technology Co., Ltd. Shape and pose estimation method, device, equipment and storage medium
CN112634139B (en) * 2021-02-25 2021-05-28 Hangzhou Hikvision Digital Technology Co., Ltd. Light field super-resolution imaging method, device and equipment
CN113112402A (en) * 2021-03-22 2021-07-13 DeepBlue Technology (Shanghai) Co., Ltd. Model acquisition method and device, electronic equipment and storage medium
CN113538649B (en) * 2021-07-14 2022-09-16 Shenzhen Institute of Information Technology Super-resolution three-dimensional texture reconstruction method, device and equipment
CN113610713B (en) * 2021-08-13 2023-11-28 Beijing Dajia Internet Information Technology Co., Ltd. Training method of video super-resolution model, video super-resolution method and device


Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104091364B (en) * 2014-07-10 2017-01-11 Northwestern Polytechnical University Single-image super-resolution reconstruction method
CN105006018B (en) * 2015-06-30 2017-11-14 Sichuan University Three-dimensional CT core image super-resolution reconstruction method
CN105023275B (en) * 2015-07-14 2018-08-28 Tsinghua University Super-resolution light field acquisition device and its three-dimensional reconstruction method
US9918792B1 (en) * 2016-05-31 2018-03-20 American Medical Technologies, Llc Methods and system for atrial fibrillation ablation using a fluoroscopy and/or medical images based cardiac mapping system with optional esophageal temperature monitoring
CN107888707B (en) * 2017-12-08 2021-04-20 Beijing QIYI Century Science & Technology Co., Ltd. Picture transmission method and device, and electronic equipment
CN107944428B (en) * 2017-12-15 2021-07-30 Beijing University of Technology Indoor scene semantic annotation method based on super-pixel sets
CN108038905B (en) * 2017-12-25 2018-12-07 Beihang University Object reconstruction method based on super-pixels
CN108416821B (en) * 2018-03-08 2019-08-02 Shandong University of Finance and Economics CT image super-resolution reconstruction method based on a deep neural network
CN108447020A (en) * 2018-03-12 2018-08-24 Nanjing University of Information Science and Technology Face super-resolution reconstruction method based on deep convolutional neural networks
CN108376386A (en) * 2018-03-23 2018-08-07 Shenzhen Tianqin Medical Technology Co., Ltd. Method and device for constructing an image super-resolution model

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100124383A1 * 2008-11-19 2010-05-20 Nec Laboratories America, Inc. Systems and methods for resolution-invariant image representation
CN102682442A * 2012-04-28 2012-09-19 Southeast University Motion target super-resolution image reconstruction method based on optical flow field
CN103810685A * 2014-02-25 2014-05-21 Graduate School at Shenzhen, Tsinghua University Super-resolution processing method for depth images
CN105741252A * 2015-11-17 2016-07-06 Xidian University Sparse representation and dictionary learning-based video image layered reconstruction method
CN109191554A * 2018-09-04 2019-01-11 Tsinghua-Berkeley Shenzhen Institute Preparation Office Super-resolution image reconstruction method, device, terminal and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHENG, HAITIAN ET AL.: "Combining Exemplar-based Approach and Learning-Based Approach for Light Field Super-resolution Using a Hybrid Imaging System", 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, 31 December 2017 (2017-12-31), XP033303718 *

Also Published As

Publication number Publication date
CN109191554B (en) 2021-01-01
CN109191554A (en) 2019-01-11

Similar Documents

Publication Publication Date Title
WO2020048484A1 (en) Super-resolution image reconstruction method and apparatus, and terminal and storage medium
EP3786890B1 (en) Method and apparatus for determining pose of image capture device, and storage medium therefor
WO2020001168A1 (en) Three-dimensional reconstruction method, apparatus, and device, and storage medium
CN108335353B (en) Three-dimensional reconstruction method, device and system of dynamic scene, server and medium
JP7373554B2 (en) Cross-domain image transformation
WO2019238114A1 (en) Three-dimensional dynamic model reconstruction method, apparatus and device, and storage medium
CN112927362A (en) Map reconstruction method and device, computer readable medium and electronic device
US20220358675A1 (en) Method for training model, method for processing video, device and storage medium
CN112598780B (en) Instance object model construction method and device, readable medium and electronic equipment
WO2021237743A1 (en) Video frame interpolation method and apparatus, and computer-readable storage medium
US11961266B2 (en) Multiview neural human prediction using implicit differentiable renderer for facial expression, body pose shape and clothes performance capture
CN113313832A (en) Semantic generation method and device of three-dimensional model, storage medium and electronic equipment
WO2022208440A1 (en) Multiview neural human prediction using implicit differentiable renderer for facial expression, body pose shape and clothes performance capture
CN116863078A (en) Three-dimensional human body model reconstruction method, three-dimensional human body model reconstruction device, electronic equipment and readable medium
CN115578515B (en) Training method of three-dimensional reconstruction model, three-dimensional scene rendering method and device
CN112085842A (en) Depth value determination method and device, electronic equipment and storage medium
US11741671B2 (en) Three-dimensional scene recreation using depth fusion
KR20230078502A (en) Apparatus and method for image processing
CN116797713A (en) Three-dimensional reconstruction method and terminal equipment
CN116246026B (en) Training method of three-dimensional reconstruction model, three-dimensional scene rendering method and device
CN115272575B (en) Image generation method and device, storage medium and electronic equipment
CN113920282B (en) Image processing method and device, computer readable storage medium, and electronic device
CN115100360B (en) Image generation method and device, storage medium and electronic equipment
US20240071009A1 (en) Visually coherent lighting for mobile augmented reality
CN117294956A (en) Method for generating image and optical flow based on moving target, related method and related product

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19856744; Country of ref document: EP; Kind code of ref document: A1)

NENP Non-entry into the national phase (Ref country code: DE)

122 Ep: pct application non-entry in european phase (Ref document number: 19856744; Country of ref document: EP; Kind code of ref document: A1)