CN116309158A - Training method, three-dimensional reconstruction method, device, equipment and medium of network model - Google Patents


Info

Publication number
CN116309158A
CN116309158A CN202310258178.5A
Authority
CN
China
Prior art keywords
reconstruction
module
image
sample image
blurred
Prior art date
Legal status
Pending
Application number
CN202310258178.5A
Other languages
Chinese (zh)
Inventor
赵毅
冀志龙
Current Assignee
Beijing Century TAL Education Technology Co Ltd
Original Assignee
Beijing Century TAL Education Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Century TAL Education Technology Co Ltd filed Critical Beijing Century TAL Education Technology Co Ltd
Priority to CN202310258178.5A priority Critical patent/CN116309158A/en
Publication of CN116309158A publication Critical patent/CN116309158A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The disclosure relates to a training method of a network model, a three-dimensional reconstruction method, a device, equipment and a medium, wherein the network model comprises a feature extraction module, a deblurring module and a reconstruction module, and the method comprises the following steps: acquiring a blurred sample image, a clear sample image corresponding to the blurred sample image, and reconstruction sample data corresponding to the clear sample image; extracting feature information of the blurred sample image with the feature extraction module; deblurring the feature information with the deblurring module, and calculating a first loss according to the obtained deblurred image and the clear sample image; reconstructing the feature information with the reconstruction module, and calculating a second loss according to the obtained reconstruction prediction data and the reconstruction sample data; and optimizing model parameters of the network model based on the first loss and the second loss until convergence, to obtain a trained network model. The method provided by the disclosure can effectively address motion blur and improve the reconstruction effect in motion-blur scenes.

Description

Training method, three-dimensional reconstruction method, device, equipment and medium of network model
Technical Field
The disclosure relates to the field of computer technology, and in particular, to a training method, a three-dimensional reconstruction method, a device, equipment and a medium for a network model.
Background
With the rise of the metaverse, virtual human technology has become increasingly mature. Human body reconstruction is an important link in the production pipeline of drivable virtual humans, and the reconstruction result directly affects the driving effect of the virtual human. At present, motion blur often occurs while the human body is in motion, which degrades reconstruction accuracy; the reconstruction effect is especially poor for body parts that move frequently and therefore have a high probability of motion blur.
Disclosure of Invention
In order to solve the above technical problems, the present disclosure provides a training method of a network model, a three-dimensional reconstruction method, a device, equipment and a medium, which can effectively address motion blur and improve the reconstruction effect in motion-blur scenes.
According to an aspect of the present disclosure, there is provided a training method of a network model, the network model including a feature extraction module, a deblurring module, and a reconstruction module, the method including:
acquiring a blurred sample image, a clear sample image corresponding to the blurred sample image and reconstructed sample data corresponding to the clear sample image;
Extracting feature information of the blurred sample image by utilizing the feature extraction module;
deblurring the feature information with the deblurring module to obtain a deblurred image of the blurred sample image, and calculating a first loss according to the deblurred image and the clear sample image;
reconstructing the feature information with the reconstruction module to obtain reconstruction prediction data of the blurred sample image, and calculating a second loss according to the reconstruction prediction data and the reconstruction sample data;
and optimizing the model parameters of the network model based on the first loss and the second loss until convergence, to obtain the trained network model.
According to another aspect of the present disclosure, there is provided a three-dimensional reconstruction method, the method including:
acquiring a blurred image;
extracting feature information of the blurred image by using the feature extraction module in a network model trained with the above training method of a network model;
and generating a reconstructed image of the blurred image based on the feature information through the reconstruction module in the network model.
According to another aspect of the present disclosure, there is provided a training apparatus of a network model, the network model including a feature extraction module, a deblurring module, and a reconstruction module, the apparatus comprising:
The first acquisition unit is used for acquiring a blurred sample image, a clear sample image corresponding to the blurred sample image and reconstructed sample data corresponding to the clear sample image;
a first extraction unit for extracting feature information of the blurred sample image by using the feature extraction module;
the deblurring processing unit is used for deblurring the feature information with the deblurring module to obtain a deblurred image of the blurred sample image, and calculating a first loss according to the deblurred image and the clear sample image;
the first reconstruction unit is used for reconstructing the feature information with the reconstruction module to obtain reconstruction prediction data of the blurred sample image, and calculating a second loss according to the reconstruction prediction data and the reconstruction sample data;
and the training unit is used for optimizing the model parameters of the network model based on the first loss and the second loss until convergence, to obtain the trained network model.
According to another aspect of the present disclosure, there is provided a three-dimensional reconstruction apparatus including:
a second acquisition unit configured to acquire a blurred image;
the second extraction unit is used for extracting feature information of the blurred image by using the feature extraction module in a network model trained with the above method;
and the second reconstruction unit is used for generating a reconstructed image of the blurred image based on the feature information through the reconstruction module in the network model.
According to another aspect of the present disclosure, there is provided an electronic device including: a processor; and a memory storing a program, wherein the program comprises instructions that, when executed by the processor, cause the processor to perform the above training method of a network model or the above three-dimensional reconstruction method.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the above training method of a network model or the above three-dimensional reconstruction method.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the above-described training method of a network model or implements the above-described three-dimensional reconstruction method.
Compared with the prior art, the technical scheme provided by the embodiment of the disclosure has the following advantages:
the network model comprises a feature extraction module, a deblurring module and a reconstruction module, and the method comprises the following steps: acquiring a blurred sample image, a clear sample image corresponding to the blurred sample image and reconstructed sample data corresponding to the clear sample image; extracting feature information of the fuzzy sample image by utilizing a feature extraction module; deblurring the characteristic information by adopting a deblurring module, and calculating a first loss according to the obtained deblurred image and the clear sample image; carrying out reconstruction processing on the characteristic information by adopting a reconstruction module, and calculating a second loss according to the obtained reconstruction prediction data and the reconstruction sample data; and optimizing model parameters of the network model based on the first loss and the second loss until convergence to obtain a trained network model. The method provided by the disclosure can effectively solve the problem of motion blur and improve the reconstruction effect under the motion blur scene.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
In order to more clearly illustrate the embodiments of the present disclosure or the solutions in the prior art, the drawings that are required for the description of the embodiments or the prior art will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a flowchart of a method for training a network model provided by an embodiment of the present disclosure;
fig. 2 is a schematic structural diagram of a network model according to an embodiment of the disclosure;
FIG. 3 is a schematic diagram of a refinement flow of S140 in the training method of the network model shown in FIG. 1;
FIG. 4 is a flow chart of a three-dimensional reconstruction method provided by an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of a training device for a network model according to an embodiment of the disclosure;
fig. 6 is a schematic structural diagram of a three-dimensional reconstruction device according to an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the accompanying drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "including" and variations thereof as used herein are intended to be open-ended, i.e., "including, but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Related definitions of other terms will be given in the description below. It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units, and are not used to define an order or interdependence of the functions performed by these devices, modules, or units.
It should be noted that references to "a" and "an" in this disclosure are intended to be illustrative rather than limiting, and those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
With the rise of the metaverse, virtual human technology has become increasingly mature. Human body reconstruction is an important link in the production pipeline of drivable virtual humans, and the quality of the reconstruction directly affects the final driving effect. At present, motion blur frequently occurs while the human body is in motion, particularly in the hand region: on the one hand, the hand occupies a smaller area of the picture than the rest of the body; on the other hand, the hand serves as an important carrier of virtual human interaction and must move frequently to convey information outward, so the probability of hand motion blur is significantly higher than that of other regions of the human body. Therefore, how to effectively solve the problem of hand blurring during motion is a significant challenge in achieving high-quality drivable virtual human technology.
At present, deep learning methods are used to address motion blur, and existing deep learning methods fall into two approaches. One approach adds blurred training samples by way of data augmentation; while this mitigates the blur problem, its deblurring effect is limited. Another approach trains a standalone deblurring model, deblurs the data with it, and then reconstructs the deblurred data, i.e., deblurring first and reconstruction second; while this deblurs better, it requires training a separate deblurring model, which increases inference time.
Aiming at the above technical problems, an embodiment of the present disclosure provides a training method of a network model. A network model is constructed that comprises a feature extraction module, a deblurring module and a reconstruction module, where the deblurring module and the reconstruction module share the feature information output by the feature extraction module, so that the network model can perform the deblurring task and the reconstruction task simultaneously. The network model is then trained based on the outputs of the deblurring module and the reconstruction module. This joint (multi-task) training of the deblurring module and the reconstruction module effectively reduces training time; meanwhile, the reconstruction module also acquires a certain deblurring capability, so a three-dimensional reconstruction inference task on a blurred image can effectively overcome motion blur based on the reconstruction module alone, without first running a separately trained deblurring model before reconstruction. This effectively reduces inference time while enabling the virtual human to accurately recover real actions, and has strong practicability. The following embodiments are described in detail taking the human hand as an example.
Specifically, the training method of the network model and the three-dimensional reconstruction method may be executed by a server or a terminal. In one application scenario, the server executes the training method of the network model to obtain a trained network model, and the terminal executes the three-dimensional reconstruction method: the terminal obtains the trained network model from the server and performs deblurring and three-dimensional reconstruction on a blurred image through the reconstruction module in the network model. The blurred image may be captured by the terminal, or acquired by the terminal from another device. The execution subject of the training method and that of the three-dimensional reconstruction method may be the same or different. In another application scenario, the server trains the network model and then performs deblurring and three-dimensional reconstruction on the blurred image through the trained network model; the manner in which the server obtains the blurred image may be similar to that of the terminal described above, and is not repeated here. In yet another application scenario, the terminal trains the network model and then performs deblurring and three-dimensional reconstruction on the blurred image through the trained network model. It can be appreciated that the training method and the three-dimensional reconstruction method provided by the embodiments of the present disclosure are not limited to the scenarios described above. One or more embodiments described below take the server executing the training method and the three-dimensional reconstruction method as an example.
Fig. 1 is a flowchart of a training method of a network model according to an embodiment of the present disclosure, which is applied to a server, and specifically includes the following steps S110 to S150 shown in fig. 1:
the network model comprises a feature extraction module, a deblurring module and a reconstruction module.
Referring to fig. 2, which shows a schematic structural diagram of a network model according to an exemplary embodiment of the present disclosure, the network model includes a feature extraction module, a deblurring module and a reconstruction module, and may be regarded as an improved reconstruction model. In the network model, the deblurring module and the reconstruction module share the feature extraction module, so that in the training stage the network model can complete the three-dimensional reconstruction task and the deblurring task simultaneously. The feature extraction module (encoder) may use a high-resolution network (HRNet), commonly used for human keypoint detection, to extract feature information of the blurred image; for example, HRNet32 is used as the encoder, and HRNet can output feature information (features) at different feature scales. The deblurring module (Deblur) corrects the feature information of the blurred image, i.e., deblurs the blurred image and outputs a clear image. The reconstruction module predicts reconstruction data (a 3D mesh) based on the feature information, i.e., performs three-dimensional reconstruction on the blurred image and outputs the reconstructed result. The reconstruction module further comprises a perception sub-module and a reconstruction sub-module: the perception sub-module may compute parameter information with a multilayer perceptron (MLP), and the reconstruction sub-module may predict the reconstruction data from the parameter information with a mesh generation model; for example, the mesh generation model may be a parameterized hand model (MANO). The reconstruction data may be a reconstructed image of the blurred image, or key points in the blurred image.
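The data flow described above — a shared encoder feeding a deblurring head and a reconstruction head — can be sketched as follows. This is a minimal shape-level sketch using stand-in stubs: the function names, the 21-joint hand output, and the parameter vector size are illustrative assumptions; only the two feature scales (64×64×32 and 8×8×256) and the 256×256×3 image size come from the text.

```python
import numpy as np

def encoder(image):
    """Stand-in for the HRNet-style feature extractor: returns features at
    the two scales named in the text (64x64x32 and 8x8x256)."""
    f1 = np.zeros((64, 64, 32))    # first feature scale, 1/4 resolution
    f2 = np.zeros((8, 8, 256))     # second feature scale, 1/32 resolution
    return f1, f2

def deblur_head(f1):
    """Deblurring module: corrects features into a sharp image prediction."""
    return np.zeros((256, 256, 3))

def reconstruction_head(f2):
    """Reconstruction module: the perception sub-module predicts parameters,
    the reconstruction sub-module turns them into keypoints/mesh."""
    params = np.zeros(61)          # pose + shape + camera (assumed sizes)
    keypoints = np.zeros((21, 3))  # e.g. 21 hand joints (assumed)
    return params, keypoints

blurred = np.zeros((256, 256, 3))
f1, f2 = encoder(blurred)
deblurred = deblur_head(f1)                   # drives the first loss
params, keypoints = reconstruction_head(f2)   # drives the second loss
```

Both heads consume outputs of the same encoder call, which is the sharing that makes joint (multi-task) training possible.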
S110, obtaining a blurred sample image, a clear sample image corresponding to the blurred sample image and reconstructed sample data corresponding to the clear sample image.
It may be appreciated that, taking the human hand region as an example, the clear sample image may be a clear hand image and the blurred sample image a blurred hand image; the reconstruction sample data may be a reconstructed sample image of the clear sample image, or reconstruction-related parameter data and keypoint data. The reconstruction sample data can be understood as ground-truth data; the ground-truth data and the prediction data output by the network model are used together to train the network model. In particular, a plurality of blurred sample images, clear sample images and reconstruction sample data may be combined into a first data set for training the network model. It will be appreciated that each blurred sample image has one corresponding clear sample image, and each clear sample image has one corresponding set of reconstruction sample data.
Optionally, the step S110 may be specifically implemented by the following steps:
obtaining a reconstruction sample set, wherein the reconstruction sample set comprises a clear sample image and reconstruction sample data corresponding to the clear sample image; and blurring the clear sample image with a constructed blur kernel to obtain a blurred sample image.
It is understood that the reconstruction sample set refers to a second data set used to train the original reconstruction model, comprising the clear sample image and the reconstruction sample data corresponding to it. In the sample preprocessing stage, the clear sample image is blurred with the constructed blur kernel to obtain the corresponding blurred sample image, so that the first data set is obtained on the basis of the second data set; that is, motion-blur sample pairs, each comprising a clear sample image and a blurred sample image, are constructed via the blur kernel.
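As one concrete possibility, the blur kernel can be a linear motion-blur kernel: a normalized line of weights convolved over the clear image. The sketch below is illustrative — the patent does not fix a specific kernel construction, and the edge-padded convolution on a single-channel image is an assumption for simplicity.

```python
import numpy as np

def motion_blur_kernel(length=9, angle_deg=0.0):
    """A simple linear motion-blur kernel: unit weights along a line through
    the kernel centre, normalized to sum to 1 (illustrative construction)."""
    k = np.zeros((length, length))
    c = length // 2
    theta = np.deg2rad(angle_deg)
    for t in np.linspace(-c, c, length * 4):
        x = int(round(c + t * np.cos(theta)))
        y = int(round(c + t * np.sin(theta)))
        if 0 <= x < length and 0 <= y < length:
            k[y, x] = 1.0
    return k / k.sum()

def apply_kernel(img, k):
    """Convolve a single-channel image with the kernel (edge padding)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), mode="edge")
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = (padded[i:i + kh, j:j + kw] * k).sum()
    return out
```

Applying `apply_kernel` per colour channel of a clear sample image yields the paired blurred sample image for the first data set.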
And S120, extracting the characteristic information of the fuzzy sample image by utilizing the characteristic extraction module.
It can be understood that, based on S110, the blurred sample image is input to the feature extraction module, which extracts feature information of the blurred sample image at different feature scales; feature information at different feature scales can be understood as feature maps at different resolution classes. Specifically, taking four resolution classes as an example: a feature map is first extracted from the blurred sample image to obtain a first-resolution feature map; the first-resolution feature map is downsampled to obtain a second-resolution feature map; the second-resolution feature map is downsampled to obtain a third-resolution feature map; and so on until a fourth-resolution feature map is obtained. The feature information of the blurred sample image is then obtained based on these four resolution feature maps.
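The successive-downsampling scheme above can be sketched with plain 2×2 average pooling — a stand-in for the strided convolutions a real encoder such as HRNet would use:

```python
import numpy as np

def avg_pool2(x):
    """2x2 average pooling (stride 2) on an HxW map with even H and W."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def feature_pyramid(feat, levels=4):
    """Return `levels` feature maps, each at half the resolution of the
    previous one, mirroring the four resolution classes described above."""
    maps = [feat]
    for _ in range(levels - 1):
        maps.append(avg_pool2(maps[-1]))
    return maps
```

For a 64×64 first-resolution map, this yields the four classes 64×64, 32×32, 16×16 and 8×8.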
S130, performing deblurring processing on the characteristic information by adopting the deblurring module to obtain a deblurred image of the blurred sample image, and calculating a first loss according to the deblurred image and the clear sample image.
It can be understood that, based on the above S110 and S120, the feature information is input to the deblurring module for deblurring, obtaining a deblurred image of the blurred sample image. The deblurred image can be understood as the predicted image output after the deblurring module corrects the feature information, and its dimensions may be the same as those of the blurred sample image, for example 256×256×3. After the deblurred image is obtained, it is compared with the clear sample image, which serves as the ground-truth image, and the first loss of the deblurring module is calculated.
Wherein the feature information includes first feature information at a first feature scale.
Optionally, in S130, the deblurring module is used to deblur the feature information to obtain a deblurred image of the blurred sample image, which may be specifically implemented by the following steps:
and adopting the deblurring module to deblur the first characteristic information to obtain a deblurred image of the blurred sample image.
It can be appreciated that the first feature scale may be 64×64×32; the first feature information may be obtained at the 4× downsampling stage of the encoder, i.e., it is feature information extracted directly from the blurred sample image. The first feature information is then input to the deblurring module for deblurring to obtain a deblurred image of the blurred sample image; the specific deblurring method is not limited.
And S140, carrying out reconstruction processing on the characteristic information by adopting the reconstruction module to obtain reconstruction prediction data of the fuzzy sample image, and calculating a second loss according to the reconstruction prediction data and the reconstruction sample data.
It can be understood that, based on the above S110 and S120, the feature information is input into the reconstruction module, which predicts the 3D mesh of the blurred sample image based on the feature information to obtain the reconstruction prediction data of the blurred sample image. Subsequently, the reconstruction prediction data, which may be a reconstructed image also of size 256×256×3, is compared with the reconstruction sample data, which serves as the ground-truth data, and the second loss of the reconstruction module is calculated.
Wherein the feature information includes second feature information at a second feature scale.
Optionally, in S140, the reconstructing module is used to reconstruct the feature information to obtain reconstructed prediction data of the blurred sample image, which is specifically implemented by the following steps:
and carrying out reconstruction processing on the second characteristic information by adopting the reconstruction module to obtain reconstruction prediction data of the blurred sample image.
It can be understood that the second feature scale may be 8×8×256; the second feature information may be obtained at the 32× downsampling stage of the encoder. The second feature information is input into the reconstruction module for three-dimensional reconstruction to obtain the reconstruction prediction data of the blurred sample image; the specific reconstruction method is not limited.
The reconstruction module comprises a perception sub-module and a reconstruction sub-module. The reconstructed prediction data of the blurred sample image includes the parameter prediction data and the keypoint prediction data.
Optionally, the reconstructing module outputs reconstructed prediction data, which may be specifically implemented by the following steps:
performing parameter calculation according to the second characteristic information by using the perception sub-module to obtain parameter prediction data; and carrying out reconstruction processing on the parameter prediction data by utilizing the reconstruction sub-module to obtain key point prediction data.
It can be understood that the second feature information is input into the perception sub-module for parameter prediction to obtain parameter prediction data; the parameter prediction data includes at least one of a pose parameter (pose), a shape parameter (shape) and camera parameters (camera), where the camera parameters (R, T, S) include intrinsic and extrinsic parameters, the extrinsic parameters being the pose of the camera. The parameter prediction data is input into the reconstruction sub-module for three-dimensional reconstruction, from which the keypoint prediction data may be obtained directly, or a reconstructed prediction image (render) is obtained first and the keypoint prediction data is then computed from the reconstructed prediction image.
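A minimal sketch of the perception sub-module as an MLP regressing pose, shape and camera parameters from the flattened second-scale features. The hidden size and the 48/10/3 output split (48 MANO-style pose values, 10 shape coefficients, 3 camera values) are illustrative assumptions, not values from the patent, and random untrained weights are used:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_perception(feat, hidden=64, n_pose=48, n_shape=10, n_cam=3):
    """Toy MLP head: flattens the 8x8x256 feature map and regresses pose,
    shape and camera parameters (sizes are assumptions for illustration)."""
    x = feat.ravel()
    w1 = rng.standard_normal((x.size, hidden)) * 0.01
    w2 = rng.standard_normal((hidden, n_pose + n_shape + n_cam)) * 0.01
    h = np.maximum(x @ w1, 0.0)                 # ReLU hidden layer
    out = h @ w2
    return out[:n_pose], out[n_pose:n_pose + n_shape], out[-n_cam:]
```

In a trained network the weights would of course be learned parameters rather than random draws; the sketch only fixes the input/output interface.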
The reconstructed sample data comprises parameter sample data, two-dimensional key point sample data and three-dimensional key point sample data, and the key point prediction data comprises two-dimensional key point prediction data and three-dimensional key point prediction data.
Optionally, as shown in fig. 3, a detailed flowchart of S140 in the training method of the network model shown in fig. 1 is shown. The second loss is calculated according to the reconstructed prediction data and the reconstructed sample data in S140 described above with reference to fig. 3, and specifically includes the following steps S1401 to S1404 as shown in fig. 3:
s1401, calculating a parameterized model loss according to the parameter sample data and the parameter prediction data.
It can be understood that the parameterized model loss of the perception sub-module is calculated from the labeled ground-truth parameter sample data and the parameter prediction data output by the module. With the parameter sample data denoted (θ, β, C) and the parameter prediction data denoted (θ̂, β̂, Ĉ), the parameterized model loss is shown in formula (1):
L_params = ‖θ̂ − θ‖² + ‖β̂ − β‖² + ‖Ĉ − C‖²   (1)
where L_params represents the parameterized model loss, θ is the pose parameter, β is the shape parameter, and C is the camera parameter.
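Formula (1) can be computed directly. The sketch below assumes an unweighted squared-L2 form over the three parameter groups (the equal weighting is an assumption; the patent does not state weights):

```python
import numpy as np

def params_loss(theta_hat, theta, beta_hat, beta, c_hat, c):
    """L_params: sum of squared L2 errors on the pose, shape and camera
    parameter groups (equal weights assumed)."""
    return (np.sum((theta_hat - theta) ** 2)
            + np.sum((beta_hat - beta) ** 2)
            + np.sum((c_hat - c) ** 2))
```

A perfect prediction yields zero loss; any deviation in any parameter group increases it.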
S1402, calculating two-dimensional key point loss according to the two-dimensional key point sample data and the two-dimensional key point prediction data.
It can be understood that the two-dimensional keypoint loss of the reconstruction sub-module is calculated from the two-dimensional keypoint sample data, which serves as the ground truth, and the two-dimensional keypoint prediction data output by the module, as shown in formula (2):
L_2D joints = (1/N) · Σᵢ₌₁ᴺ ‖x̂ᵢ − xᵢ‖²   (2)
where L_2D joints is the two-dimensional keypoint loss, N is the number of samples, x̂ᵢ is the i-th two-dimensional keypoint prediction, and xᵢ is the i-th two-dimensional keypoint sample.
S1403, calculating three-dimensional key point loss according to the three-dimensional key point sample data and the three-dimensional key point prediction data.
It can be understood that the three-dimensional key point loss associated with the reconstruction sub-module is calculated from the three-dimensional key point sample data, which serves as the ground truth, and the three-dimensional key point prediction data output by the module, as shown in equation (3):

L_3Djoints = (1/N) Σᵢ ‖X̂ᵢ − Xᵢ‖  (3)

wherein L_3Djoints represents the three-dimensional key point loss, X̂ᵢ represents the i-th predicted 3D key point, and Xᵢ represents the i-th 3D key point label.
And S1404, calculating the sum of the parameterized model loss, the two-dimensional key point loss and the three-dimensional key point loss to obtain a second loss.
It can be understood that the sum of the parameterized model loss, the two-dimensional key point loss and the three-dimensional key point loss is calculated to obtain the second loss, which is used to update the network parameters of the reconstruction module, as shown in equation (4):

L_reconstruct = L_2Djoints + L_3Djoints + L_params  (4)

wherein L_reconstruct is the second loss of the reconstruction module.
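Steps S1402 to S1404 can be sketched together as below. Reading equations (2) and (3) as an L2 distance averaged over the N key points is an assumption, since the source formulas survive only as image placeholders; the helper names are illustrative.

```python
import numpy as np

def keypoint_loss(pred, gt):
    """Mean L2 distance over N keypoints, as in equations (2) and (3)."""
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))

def second_loss(l_params, x2d_hat, x2d, x3d_hat, x3d):
    """Equation (4): the second loss is the sum of the parameterized
    model loss and the 2D and 3D keypoint losses."""
    return keypoint_loss(x2d_hat, x2d) + keypoint_loss(x3d_hat, x3d) + l_params
```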
It can be appreciated that by calculating the two-dimensional and three-dimensional key point losses, the network parameters of the critical reconstruction sub-module can be quickly estimated and learned.
And S150, optimizing the model parameters of the network model based on the first loss and the second loss until convergence to obtain the trained network model.
It can be understood that, based on S130 and S140 above, the sum of the first loss and the second loss is calculated to obtain a total loss, and the network parameters of the whole network model are continuously optimized against this total loss until the calculated total loss is less than or equal to a preset loss value, at which point the network model has converged and the trained network model is obtained. It can be appreciated that the network parameters of the feature extraction module, the deblurring module and the reconstruction module are all updated synchronously during this optimization.
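The total-loss convergence check described above can be sketched minimally as follows; the preset loss value of 1e-3 is an illustrative assumption, as the patent leaves it unspecified.

```python
def total_loss(first_loss, second_loss):
    """Total objective of S150: the sum of the deblurring (first) loss
    and the reconstruction (second) loss; all three modules' parameters
    are updated against this value."""
    return first_loss + second_loss

def converged(total, preset=1e-3):
    """Training stops once the total loss is at or below a preset value
    (the threshold here is an assumed placeholder)."""
    return total <= preset
```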
The embodiment of the disclosure provides a training method for a network model in which a deblurring module is added in the training stage and the network model is jointly trained through the deblurring module and the reconstruction module. As a result, the feature extraction module can extract effective feature information suited to blurred reconstruction scenes, the reconstruction module acquires deblurring capability during reconstruction, and the time consumed by model training is hardly increased.
On the basis of the above embodiment, fig. 4 is a flowchart of the three-dimensional reconstruction method provided in the embodiment of the present disclosure, which is applied to a server, and after the training of the network model is completed, performs three-dimensional reconstruction based on the feature extraction module and the reconstruction module, and specifically includes the following steps S410 to S430 shown in fig. 4:
S410, acquiring a blurred image.
It can be understood that a blurred image is acquired; the blurred image may be an RGB image, the blur type of the blurred image is not limited, and the blurred image may be preprocessed so as to satisfy the input condition of the feature extraction module.
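One plausible preprocessing step (nearest-neighbour resize plus scaling to [0, 1]) is sketched below. The target resolution is an assumption; the patent only requires that the input condition of the feature extraction module be satisfied.

```python
import numpy as np

def preprocess(img, size=224):
    """Nearest-neighbour resize to size x size and scale pixel values
    to [0, 1]. The 224x224 resolution is an assumed placeholder for the
    feature extractor's input condition."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size   # nearest source row per output row
    cols = np.arange(size) * w // size   # nearest source column per output column
    resized = img[rows][:, cols]
    return resized.astype(np.float32) / 255.0
```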
S420, extracting feature information of the blurred image by using a feature extraction module in the network model obtained by training by using a training method of the network model.
It can be understood that, based on step S410 above, the feature extraction module obtained by training with the training method of the network model receives the blurred image as input and extracts its feature information; at deployment time, only the second feature information applicable to the reconstruction module needs to be retained.
S430, generating a reconstructed image of the blurred image based on the characteristic information through a reconstruction module in the network model.
It can be understood that, based on the above S420, the second feature information is input into the trained reconstruction module, and the reconstruction module performs deblurring processing and reconstruction processing on the second feature information simultaneously in the reconstruction process, that is, the trained reconstruction module has deblurring and reconstruction functions at the same time, so as to generate a reconstructed image of the blurred image.
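The inference path of S410 to S430 can be sketched as a simple composition; `extract_features` and `reconstruct_module` are hypothetical stand-ins for the trained modules, since the patent does not expose their interfaces.

```python
def run_reconstruction(blurred_img, extract_features, reconstruct_module):
    """Inference path: no explicit deblurred image is produced. The
    features go straight from the feature extractor to the reconstruction
    module, which absorbed the deblurring behaviour during joint training."""
    feats = extract_features(blurred_img)   # second feature information only
    return reconstruct_module(feats)
```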
The embodiment of the disclosure provides a three-dimensional reconstruction method that realizes deblurring and three-dimensional reconstruction of a blurred image based on the trained feature extraction module and reconstruction module, without needing to explicitly obtain a deblurred image. After feature extraction is completed, a single reconstruction module can effectively mitigate the motion blur problem, yielding a more accurate reconstruction result in motion-blur scenes and enabling a virtual person to accurately restore real actions.
Fig. 5 is a schematic structural diagram of a training apparatus for a network model according to an embodiment of the present disclosure, where the network model includes a feature extraction module, a deblurring module, and a reconstruction module, as shown in fig. 5, an apparatus 500 according to an embodiment of the present disclosure may include a first acquisition unit 510, a first extraction unit 520, a deblurring processing unit 530, a first reconstruction unit 540, and a training unit 550, where:
a first obtaining unit 510, configured to obtain a blurred sample image, a clear sample image corresponding to the blurred sample image, and reconstructed sample data corresponding to the clear sample image;
a first extracting unit 520 for extracting feature information of the blurred sample image using the feature extracting module;
a deblurring processing unit 530, configured to perform deblurring processing on the feature information by using the deblurring module, obtain a deblurred image of the blurred sample image, and calculate a first loss according to the deblurred image and the clear sample image;
a first reconstruction unit 540, configured to perform reconstruction processing on the feature information by using the reconstruction module, obtain reconstructed prediction data of the blurred sample image, and calculate a second loss according to the reconstructed prediction data and the reconstructed sample data;
And the training unit 550 is configured to optimize the model parameters of the network model based on the first loss and the second loss until convergence, so as to obtain the trained network model.
In one embodiment, the first obtaining unit 510 is configured to:
obtaining a reconstruction sample set, wherein the reconstruction sample set comprises a clear sample image and reconstruction sample data corresponding to the clear sample image;
and carrying out blurring processing on the clear sample image through the constructed blurring kernel to obtain a blurring sample image.
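The blurring of a clear sample image by a constructed blur kernel can be sketched as below. The horizontal linear motion kernel is just one illustrative choice, as the patent does not fix the kernel type.

```python
import numpy as np

def motion_blur_kernel(length=5):
    """Horizontal linear motion-blur kernel, normalized to sum to 1;
    one plausible example of a 'constructed blur kernel'."""
    k = np.zeros((length, length))
    k[length // 2, :] = 1.0 / length
    return k

def blur(img, kernel):
    """Valid-mode 2D convolution of a single-channel image with the kernel,
    producing the blurred sample image."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out
```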
In one embodiment, the feature information includes first feature information at a first feature scale.
In one embodiment, the deblurring processing unit 530 is configured to:
and adopting the deblurring module to deblur the first characteristic information to obtain a deblurred image of the blurred sample image.
In one embodiment, the feature information includes second feature information at a second feature scale.
In one embodiment, the first reconstruction unit 540 is configured to:
and carrying out reconstruction processing on the second characteristic information by adopting the reconstruction module to obtain reconstruction prediction data of the blurred sample image.
In one embodiment, the reconstruction module includes a perception sub-module and a reconstruction sub-module.
In one embodiment, the first reconstruction unit 540 is configured to:
performing parameter calculation according to the second characteristic information by using the perception sub-module to obtain parameter prediction data;
carrying out reconstruction processing on the parameter prediction data by utilizing the reconstruction sub-module to obtain key point prediction data;
wherein the reconstructed prediction data of the blurred sample image comprises the parameter prediction data and the keypoint prediction data.
In one embodiment, the reconstructed sample data includes parametric sample data, two-dimensional keypoint sample data, and three-dimensional keypoint sample data, and the keypoint prediction data includes two-dimensional keypoint prediction data and three-dimensional keypoint prediction data.
In one embodiment, the first reconstruction unit 540 is configured to:
calculating parameterized model loss according to the parameter sample data and the parameter prediction data;
calculating two-dimensional key point loss according to the two-dimensional key point sample data and the two-dimensional key point prediction data;
calculating three-dimensional key point loss according to the three-dimensional key point sample data and the three-dimensional key point prediction data;
and calculating the sum of the parameterized model loss, the two-dimensional key point loss and the three-dimensional key point loss to obtain a second loss.
The device provided in this embodiment has the same implementation principle and technical effects as those of the foregoing method embodiment, and for brevity, reference may be made to the corresponding content of the foregoing method embodiment where the device embodiment is not mentioned.
Fig. 6 is a schematic structural diagram of a three-dimensional reconstruction device according to an embodiment of the present disclosure, as shown in fig. 6, an apparatus 600 according to an embodiment of the present disclosure may include a second obtaining unit 610, a second extracting unit 620, and a second reconstruction unit 630, where:
a second acquisition unit 610 for acquiring a blurred image;
a second extracting unit 620, configured to extract feature information of the blurred image by using a feature extracting module in the network model obtained by training using the training method of the network model;
a second reconstruction unit 630, configured to generate, by a reconstruction module in the network model, a reconstructed image of the blurred image based on the feature information.
The device provided in this embodiment has the same implementation principle and technical effects as those of the foregoing method embodiment, and for brevity, reference may be made to the corresponding content of the foregoing method embodiment where the device embodiment is not mentioned.
The exemplary embodiments of the present disclosure also provide an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor. The memory stores a computer program executable by the at least one processor for causing the electronic device to perform a method according to embodiments of the present disclosure when executed by the at least one processor.
The present disclosure also provides a computer program product comprising a computer program, wherein the computer program, when executed by a processor of a computer, is for causing the computer to perform a method according to embodiments of the disclosure.
Referring to fig. 7, a block diagram of an electronic device 700 that may be a server or a client of the present disclosure, which is an example of a hardware device that may be applied to aspects of the present disclosure, will now be described. Electronic devices are intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the electronic device 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 may also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Various components in the electronic device 700 are connected to the I/O interface 705, including: an input unit 706, an output unit 707, a storage unit 708, and a communication unit 709. The input unit 706 may be any type of device capable of inputting information to the electronic device 700, and the input unit 706 may receive input numeric or character information and generate key signal inputs related to user settings and/or function controls of the electronic device. The output unit 707 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, video/audio output terminals, vibrators, and/or printers. The storage unit 708 may include, but is not limited to, magnetic disks and optical disks. The communication unit 709 allows the electronic device 700 to exchange information/data with other devices through computer networks, such as the internet, and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as Bluetooth™ devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.
The computing unit 701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 701 performs the various methods and processes described above. For example, in some embodiments, the training method of the network model or the training method of the reconstructed model may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 700 via the ROM 702 and/or the communication unit 709. In some embodiments, the computing unit 701 may be configured to perform a training method of the network model or a training method of the reconstructed model by any other suitable means (e.g. by means of firmware).
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
As used in this disclosure, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
The foregoing is merely a specific embodiment of the disclosure to enable one skilled in the art to understand or practice the disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown and described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (11)

1. A method for training a network model, wherein the network model includes a feature extraction module, a deblurring module, and a reconstruction module, the method comprising:
acquiring a blurred sample image, a clear sample image corresponding to the blurred sample image and reconstructed sample data corresponding to the clear sample image;
extracting feature information of the blurred sample image by utilizing the feature extraction module;
deblurring the characteristic information by adopting the deblurring module to obtain a deblurred image of the blurred sample image, and calculating a first loss according to the deblurred image and the clear sample image;
carrying out reconstruction processing on the characteristic information by adopting the reconstruction module to obtain reconstruction prediction data of the fuzzy sample image, and calculating a second loss according to the reconstruction prediction data and the reconstruction sample data;
and optimizing the model parameters of the network model based on the first loss and the second loss until convergence, so as to obtain the trained network model.
2. The method of claim 1, wherein the acquiring the blurred sample image, the sharp sample image corresponding to the blurred sample image, and the reconstructed sample data corresponding to the sharp sample image comprises:
Obtaining a reconstruction sample set, wherein the reconstruction sample set comprises a clear sample image and reconstruction sample data corresponding to the clear sample image;
and carrying out blurring processing on the clear sample image through the constructed blurring kernel to obtain a blurring sample image.
3. The method according to claim 1, wherein the feature information includes first feature information at a first feature scale, the deblurring of the feature information using the deblurring module results in a deblurred image of the blurred sample image, comprising:
and adopting the deblurring module to deblur the first characteristic information to obtain a deblurred image of the blurred sample image.
4. The method according to claim 1, wherein the feature information includes second feature information at a second feature scale, the reconstructing the feature information using the reconstruction module to obtain reconstructed prediction data of the blurred sample image includes:
and carrying out reconstruction processing on the second characteristic information by adopting the reconstruction module to obtain reconstruction prediction data of the blurred sample image.
5. The method according to claim 4, wherein the reconstruction module includes a perception sub-module and a reconstruction sub-module, and the reconstructing the second feature information using the reconstruction module to obtain reconstructed prediction data of the blurred sample image includes:
Performing parameter calculation according to the second characteristic information by using the perception sub-module to obtain parameter prediction data;
carrying out reconstruction processing on the parameter prediction data by utilizing the reconstruction sub-module to obtain key point prediction data;
wherein the reconstructed prediction data of the blurred sample image comprises the parameter prediction data and the keypoint prediction data.
6. The method of claim 5, wherein the reconstructed sample data comprises parametric sample data, two-dimensional keypoint sample data, and three-dimensional keypoint sample data, the keypoint prediction data comprising two-dimensional keypoint prediction data and three-dimensional keypoint prediction data, the calculating a second loss from the reconstructed prediction data and the reconstructed sample data comprising:
calculating parameterized model loss according to the parameter sample data and the parameter prediction data;
calculating two-dimensional key point loss according to the two-dimensional key point sample data and the two-dimensional key point prediction data;
calculating three-dimensional key point loss according to the three-dimensional key point sample data and the three-dimensional key point prediction data;
and calculating the sum of the parameterized model loss, the two-dimensional key point loss and the three-dimensional key point loss to obtain a second loss.
7. A method of three-dimensional reconstruction, the method comprising:
acquiring a blurred image;
extracting feature information of the blurred image by using a feature extraction module in the network model obtained by training the network model training method according to any one of claims 1 to 6;
and generating a reconstructed image of the blurred image based on the characteristic information by a reconstruction module in the network model.
8. A training apparatus for a network model, the network model comprising a feature extraction module, a deblurring module, and a reconstruction module, the apparatus comprising:
the first acquisition unit is used for acquiring a blurred sample image, a clear sample image corresponding to the blurred sample image and reconstructed sample data corresponding to the clear sample image;
a first extraction unit for extracting feature information of the blurred sample image by using the feature extraction module;
the deblurring processing unit is used for carrying out deblurring processing on the characteristic information by adopting the deblurring module to obtain a deblurred image of the blurred sample image, and calculating a first loss according to the deblurred image and the clear sample image;
The first reconstruction unit is used for carrying out reconstruction processing on the characteristic information by adopting the reconstruction module to obtain reconstruction prediction data of the fuzzy sample image, and calculating a second loss according to the reconstruction prediction data and the reconstruction sample data;
and the training unit is used for optimizing the model parameters of the network model based on the first loss and the second loss until convergence to obtain the trained network model.
9. A three-dimensional reconstruction apparatus, the apparatus comprising:
a second acquisition unit configured to acquire a blurred image;
a second extraction unit, configured to extract feature information of the blurred image by using a feature extraction module in the network model obtained by training according to the method of any one of claims 1 to 6;
and the second reconstruction unit is used for generating a reconstructed image of the blurred image based on the characteristic information through a reconstruction module in the network model.
10. An electronic device, the electronic device comprising:
a processor; and
a memory in which a program is stored,
wherein the program comprises instructions which, when executed by the processor, cause the processor to perform the training method of the network model according to any one of claims 1 to 6 or to perform the three-dimensional reconstruction method according to claim 7.
11. A non-transitory computer-readable storage medium storing computer instructions for causing the computer to execute the training method of the network model according to any one of claims 1 to 6 or the three-dimensional reconstruction method according to claim 7.
CN202310258178.5A 2023-03-10 2023-03-10 Training method, three-dimensional reconstruction method, device, equipment and medium of network model Pending CN116309158A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310258178.5A CN116309158A (en) 2023-03-10 2023-03-10 Training method, three-dimensional reconstruction method, device, equipment and medium of network model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310258178.5A CN116309158A (en) 2023-03-10 2023-03-10 Training method, three-dimensional reconstruction method, device, equipment and medium of network model

Publications (1)

Publication Number Publication Date
CN116309158A true CN116309158A (en) 2023-06-23

Family

ID=86788327

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310258178.5A Pending CN116309158A (en) 2023-03-10 2023-03-10 Training method, three-dimensional reconstruction method, device, equipment and medium of network model

Country Status (1)

Country Link
CN (1) CN116309158A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117726760A (en) * 2024-02-07 2024-03-19 之江实验室 Training method and device for three-dimensional human body reconstruction model of video
CN117726760B (en) * 2024-02-07 2024-05-07 之江实验室 Training method and device for three-dimensional human body reconstruction model of video

Similar Documents

Publication Publication Date Title
CN109035319B (en) Monocular image depth estimation method, monocular image depth estimation device, monocular image depth estimation apparatus, monocular image depth estimation program, and storage medium
CN111507914A (en) Training method, repairing method, device, equipment and medium of face repairing model
CN112785674A (en) Texture map generation method, rendering method, device, equipment and storage medium
CN114511041B (en) Model training method, image processing method, device, equipment and storage medium
CN117274491B (en) Training method, device, equipment and medium for three-dimensional reconstruction model
CN112967196A (en) Image restoration method and device, electronic device and medium
CN116309983B (en) Training method and generating method and device of virtual character model and electronic equipment
US20230005171A1 (en) Visual positioning method, related apparatus and computer program product
CN115239581A (en) Image processing method and related device
CN116309158A (en) Training method, three-dimensional reconstruction method, device, equipment and medium of network model
CN114140320B (en) Image migration method and training method and device of image migration model
WO2024159888A1 (en) Image restoration method and apparatus, and computer device, program product and storage medium
CN116246026B (en) Training method of three-dimensional reconstruction model, three-dimensional scene rendering method and device
CN113920023A (en) Image processing method and device, computer readable medium and electronic device
CN115457365B (en) Model interpretation method and device, electronic equipment and storage medium
CN108898557B (en) Image restoration method and apparatus, electronic device, computer program, and storage medium
CN116363429A (en) Training method of image recognition model, image recognition method, device and equipment
CN113554550B (en) Training method and device for image processing model, electronic equipment and storage medium
CN114078097A (en) Method and device for acquiring image defogging model and electronic equipment
CN114120423A (en) Face image detection method and device, electronic equipment and computer readable medium
CN113628192A (en) Image blur detection method, device, apparatus, storage medium, and program product
CN112132871A (en) Visual feature point tracking method and device based on feature optical flow information, storage medium and terminal
CN118172286B (en) License plate image deblurring method, model training method, device, equipment and medium
CN112950516B (en) Method and device for enhancing local contrast of image, storage medium and electronic equipment
CN114511458A (en) Image processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination