CN114581316A - Image reconstruction method, electronic device, storage medium, and program product

Info

Publication number: CN114581316A
Application number: CN202210035539.5A
Authority: CN (China)
Other languages: Chinese (zh)
Inventors: 刘震, 刘帅成
Current and original assignee: Beijing Kuangshi Technology Co Ltd; Beijing Megvii Technology Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed; the priority date is likewise an assumption)
Prior art keywords: image, images, processed, reconstruction, alignment
Application filed by Beijing Kuangshi Technology Co Ltd and Beijing Megvii Technology Co Ltd
Priority to CN202210035539.5A
Publication of CN114581316A

Classifications

    • G06T5/92
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20172Image enhancement details
    • G06T2207/20208High dynamic range [HDR] image processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)

Abstract

Embodiments of the present invention provide an image reconstruction method, an electronic device, a computer-readable storage medium, and a computer program product. The method comprises the following steps: acquiring a plurality of gamma-corrected images in one-to-one correspondence with a plurality of images to be processed, where the images to be processed are low dynamic range images and the gamma-corrected images are obtained by performing gamma correction on the images to be processed; inputting the gamma-corrected images into a deformable alignment module in a reconstruction network model to obtain alignment features in one-to-one correspondence with the images to be processed; and inputting the alignment features into a fusion module in the reconstruction network model to obtain a reconstructed image, which is a high dynamic range image. The method can effectively solve the ghosting problem in HDR image reconstruction.

Description

Image reconstruction method, electronic device, storage medium, and program product
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an image reconstruction method, an electronic device, a computer-readable storage medium, and a computer program product.
Background
In the field of digital image processing, dynamic range refers to the range of light intensities in a scene that an image can capture. The dynamic range of a natural scene observable by the human eye can reach 10000:1, while common consumer-grade photographic equipment (such as a mobile phone) can only capture a limited low dynamic range (LDR) image (the dynamic range of an LDR image is about 100:1 to 300:1). Compared with an LDR image, a high dynamic range (HDR) image has a wider dynamic range and can more faithfully restore the light and shadow of a real scene, yielding photos with richer gradation, a more realistic picture, and higher quality.
There are two methods of obtaining HDR images. The first is to capture HDR images directly with specialized equipment, but such equipment is bulky and expensive and has not become popular in consumer electronics (e.g., smart phones). The second is to obtain HDR images by fusing multiple frames of LDR images taken at different exposures (i.e., multi-exposure HDR image reconstruction); however, multi-frame fusion is prone to ghosting problems caused by the shake of a hand-held camera or the motion of foreground objects.
Of the above two methods, the second, i.e., multi-exposure HDR image reconstruction in a dynamic scene, is the more common way of acquiring HDR images. To obtain a high-quality HDR image, the ghosting caused by foreground motion or camera shake when fusing multiple LDR images with different exposures needs to be addressed.
Disclosure of Invention
The present invention has been made in view of the above problems. The present invention provides an image reconstruction method, an electronic device, a computer-readable storage medium, and a computer program product.
According to an aspect of the present invention, there is provided an image reconstruction method including: acquiring a plurality of gamma-corrected images in one-to-one correspondence with a plurality of images to be processed, where the images to be processed are low dynamic range images and the gamma-corrected images are obtained by performing gamma correction on the images to be processed; inputting the plurality of gamma-corrected images into a deformable alignment module in a reconstruction network model to obtain alignment features in one-to-one correspondence with the plurality of images to be processed; and inputting the alignment features into a fusion module in the reconstruction network model to obtain a reconstructed image, which is a high dynamic range image.
Illustratively, the reconstruction network model further comprises a spatial attention module, and the method further comprises: acquiring the plurality of images to be processed; and inputting the plurality of images to be processed into the spatial attention module to obtain attention features in one-to-one correspondence with the plurality of images to be processed. In this case, inputting the alignment features into the fusion module in the reconstruction network model to obtain the reconstructed image comprises: inputting the alignment features together with the attention features into the fusion module to obtain the reconstructed image.
Illustratively, inputting the plurality of gamma-corrected images into the deformable alignment module in the reconstruction network model to obtain the alignment features in one-to-one correspondence with the plurality of images to be processed comprises performing the following operations in the deformable alignment module: performing feature extraction on the plurality of gamma-corrected images respectively to obtain image features in one-to-one correspondence with the plurality of images to be processed; for any non-reference image, stitching the image features corresponding to a reference image and the image features corresponding to the non-reference image to obtain stitched features, where the reference image is one of the plurality of images to be processed and the non-reference images are the images other than the reference image among the plurality of images to be processed; convolving the stitched features to calculate an offset between the reference image and the non-reference image; and performing deformable convolution on the image features corresponding to the non-reference image based on the offset to obtain the alignment features corresponding to the non-reference image, where the alignment features corresponding to the reference image are the image features corresponding to the reference image.
Illustratively, the reference image is the image whose exposure value is closest to 0 among the plurality of images to be processed.
Illustratively, the reference image is one of the images whose exposure value is in the middle among the plurality of images to be processed.
Illustratively, the method further comprises: acquiring training images, where the training images include a plurality of sample corrected images in one-to-one correspondence with a plurality of sample images and an annotated image corresponding to the plurality of sample images, the sample images are low dynamic range images, the annotated image is a high dynamic range image, and the sample corrected images are obtained by performing gamma correction on the sample images; processing the plurality of sample corrected images using an initial reconstruction network model to obtain a predicted reconstructed image; calculating a reconstruction loss term based on the predicted reconstructed image and the annotated image; acquiring predicted image features and annotated image features corresponding to the predicted reconstructed image and the annotated image respectively; calculating a perceptual loss term based on the predicted image features and the annotated image features; calculating a total loss based on the reconstruction loss term and the perceptual loss term; and optimizing the parameters of the initial reconstruction network model based on the total loss to obtain the reconstruction network model.
Illustratively, acquiring the predicted image features and the annotated image features corresponding to the predicted reconstructed image and the annotated image respectively comprises: performing tone mapping on the predicted reconstructed image and the annotated image respectively to obtain a new predicted reconstructed image and a new annotated image; and inputting the new predicted reconstructed image and the new annotated image respectively into a pre-trained network model to obtain the corresponding predicted image features and annotated image features.
According to another aspect of the present invention, there is provided an electronic device comprising a memory, a processor and a computer program stored on the memory, wherein the processor executes the computer program to implement the image reconstruction method described above.
According to another aspect of the present invention, there is provided a computer readable storage medium having stored thereon a computer program/instructions, characterized in that the computer program/instructions, when executed by a processor, implement the above-mentioned image reconstruction method.
According to another aspect of the invention, there is provided a computer program product comprising computer program/instructions, characterized in that the computer program/instructions, when executed by a processor, implement the above-described image reconstruction method.
According to the image reconstruction method, the electronic device, the computer-readable storage medium, and the computer program product of the embodiments of the present invention, an HDR image is reconstructed by performing deformable alignment on a plurality of gamma-corrected images in one-to-one correspondence with a plurality of LDR images and fusing the resulting alignment features. Gamma correction pulls the exposure levels of the original LDR images to an approximately uniform level, so aligning the gamma-corrected images instead of the original LDR images is more effective. In addition, deformable alignment achieves alignment at the feature level rather than the image level, which copes better with ghosting caused by foreground motion or camera shake. Therefore, the method can effectively solve the ghosting problem in HDR image reconstruction.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent by describing in more detail embodiments of the present invention with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings, like reference numbers generally represent like parts or steps.
FIG. 1 shows a schematic block diagram of an example electronic device for implementing an image reconstruction method and apparatus in accordance with embodiments of the present invention;
FIG. 2 shows a schematic flow diagram of an image reconstruction method according to an embodiment of the invention;
FIG. 3 illustrates a network structure and process flow for reconstructing a network model according to one embodiment of the present invention;
FIG. 4 illustrates a network structure of a deformable alignment module according to one embodiment of the invention;
FIG. 5 shows a schematic block diagram of an image reconstruction apparatus according to an embodiment of the present invention; and
FIG. 6 shows a schematic block diagram of an electronic device according to one embodiment of the invention.
Detailed Description
In recent years, technical research based on artificial intelligence, such as computer vision, deep learning, machine learning, image processing, and image recognition, has developed rapidly. Artificial intelligence (AI) is an emerging science and technology that studies and develops theories, methods, techniques, and application systems for simulating and extending human intelligence. Artificial intelligence is a comprehensive discipline involving many technical areas, such as chips, big data, cloud computing, the Internet of Things, distributed storage, deep learning, machine learning, and neural networks. Computer vision, an important branch of artificial intelligence, specifically concerns making machines perceive and recognize the world. Computer vision technologies generally include face recognition, image reconstruction, fingerprint recognition and anti-counterfeiting verification, biometric recognition, face detection, pedestrian detection, object detection, pedestrian recognition, image processing, image recognition, image semantic understanding, image retrieval, character recognition, video processing, video content recognition, behavior recognition, three-dimensional reconstruction, virtual reality, augmented reality, simultaneous localization and mapping (SLAM), computational photography, and robot navigation and positioning. With the research and progress of artificial intelligence technology, it has been applied in many fields, such as security, city management, traffic management, building management, park management, face-based access, face-based attendance, logistics management, warehouse management, robots, intelligent marketing, computational photography, mobile-phone imaging, cloud services, smart homes, wearable devices, unmanned driving, automatic driving, smart medical care, face payment, face unlocking, fingerprint unlocking, person-ID verification, smart screens, smart televisions, cameras, the mobile Internet, live webcasts, beauty applications, medical beauty, intelligent temperature measurement, and the like.
In order to make the objects, technical solutions and advantages of the present invention more apparent, exemplary embodiments according to the present invention will be described in detail below with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset of embodiments of the invention and not all embodiments of the invention, with the understanding that the invention is not limited to the example embodiments described herein. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the invention described herein without inventive step, shall fall within the scope of protection of the invention.
To at least partially solve the above technical problems, embodiments of the present invention provide an image reconstruction method, an electronic device, a computer-readable storage medium, and a computer program product. According to the image reconstruction method of the embodiments of the present invention, an HDR image is reconstructed by performing deformable alignment on a plurality of gamma-corrected images in one-to-one correspondence with a plurality of LDR images and fusing the resulting alignment features. Gamma correction pulls the exposure levels of the original LDR images to an approximately uniform level, so aligning the gamma-corrected images instead of the original LDR images is more effective. In addition, deformable alignment achieves alignment at the feature level rather than the image level, which copes better with ghosting caused by foreground motion or camera shake. Therefore, the method can effectively solve the ghosting problem in HDR image reconstruction. The image reconstruction technique according to the embodiments of the present invention can be applied to any field where HDR images need to be generated.
First, an exemplary electronic device 100 for implementing an image reconstruction method and apparatus according to an embodiment of the present invention is described with reference to fig. 1.
As shown in fig. 1, electronic device 100 includes one or more processors 102, one or more memory devices 104. Optionally, the electronic device 100 may also include an input device 106, an output device 108, and an image capture device 110, which are interconnected via a bus system 112 and/or other form of connection mechanism (not shown). It should be noted that the components and structure of the electronic device 100 shown in fig. 1 are exemplary only, and not limiting, and the electronic device may have other components and structures as desired.
The processor 102 may be implemented in at least one hardware form of a digital signal processor (DSP), a field-programmable gate array (FPGA), a programmable logic array (PLA), or a microprocessor. The processor 102 may be one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), or other forms of processing units having data processing capability and/or instruction execution capability, and may control other components in the electronic device 100 to perform desired functions.
The storage device 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), hard disks, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage media and may be executed by the processor 102 to implement client-side functionality (implemented by the processor) and/or other desired functionality in the embodiments of the invention described below. Various applications and various data, such as data used and/or generated by the applications, may also be stored in the computer-readable storage media.
The input device 106 may be a device used by a user to input instructions and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.
The output device 108 may output various information (e.g., images and/or sounds) to an external (e.g., user), and may include one or more of a display, a speaker, etc. Alternatively, the input device 106 and the output device 108 may be integrated together, implemented using the same interactive device (e.g., a touch screen).
The image capture device 110 may capture images and store the captured images in the storage device 104 for use by other components. The image capture device 110 may be a separate camera or a camera in a mobile terminal, etc. It should be understood that the image capture device 110 is merely an example, and the electronic device 100 may not include the image capture device 110. In this case, other devices having image capturing capabilities may be used to capture an image and transmit the captured image to the electronic device 100.
Exemplary electronic devices for implementing the image reconstruction method and apparatus according to embodiments of the present invention may be implemented on devices such as personal computers or remote servers, for example.
Next, an image reconstruction method according to an embodiment of the present invention will be described with reference to fig. 2. Fig. 2 shows a schematic flow diagram of an image reconstruction method 200 according to an embodiment of the invention. As shown in fig. 2, the image reconstruction method 200 includes steps S210, S220, and S230.
In step S210, a plurality of gamma correction images in one-to-one correspondence with a plurality of images to be processed, which are LDR images, are acquired, and the plurality of gamma correction images are obtained by gamma correcting the plurality of images to be processed.
By way of example and not limitation, acquiring a plurality of gamma-corrected images in one-to-one correspondence with a plurality of images to be processed (step S210) may include: acquiring a plurality of images to be processed; the gamma correction is performed on the plurality of images to be processed, respectively, to obtain a plurality of gamma-corrected images.
In one example, the gamma correction operation for a plurality of images to be processed may be performed by the electronic device 100 for implementing the image reconstruction method and apparatus according to the embodiment of the present invention. In this case, the electronic device 100 may first obtain a plurality of images to be processed, perform gamma correction on the plurality of images to be processed, and then perform the subsequent step S220 based on the gamma-corrected images. In another example, the gamma correction operation for a plurality of images to be processed may be performed by an external device other than the electronic device 100 for implementing the image reconstruction method and apparatus according to the embodiment of the present invention. In this case, the electronic apparatus 100 may acquire a plurality of gamma corrected images corresponding to a plurality of images to be processed one by one directly from the external apparatus.
The image to be processed may be an original LDR image acquired by an image acquisition device, or may be an LDR image obtained by performing a certain pre-processing on the original LDR image. Illustratively, the preprocessing may include smoothing, filtering, normalizing, and the like.
The number of LDR images and corresponding gamma-corrected images involved in step S210 may be greater than or equal to 2, i.e., at least two gamma-corrected images corresponding to two LDR images are acquired. On this basis, the number of LDR images can be set to any suitable value as needed. By way of example and not limitation, the LDR images involved in step S210 may have exposure values that differ from one another. For example, 3 gamma-corrected images corresponding to 3 LDR images may be acquired in step S210. Among these 3 LDR images, one exposure value may be 0 (i.e., 0EV), and of the other two exposure values, one is greater than 0EV and one is less than 0EV. Of course, the number of LDR images and the exposure values may be set to any other suitable values as needed.
Illustratively, the above-described gamma correction operation may be implemented based on the following gamma formula:

$$\hat{I}_i = \frac{I_i^{\gamma}}{t_i} \tag{1}$$

In the above formula (1), $I_i$ represents the $i$-th image to be processed, $\hat{I}_i$ represents the gamma-corrected image corresponding to the $i$-th image to be processed, $t_i$ represents the exposure time of the $i$-th image to be processed, $\gamma$ represents the gamma correction parameter, and $i = 1, 2, \ldots, n$, where $n$ is the total number of images to be processed. The gamma correction parameter $\gamma$ is preset.
Through gamma correction, the exposure levels of the plurality of LDR images can be unified, i.e., pulled to an approximately consistent level. This facilitates alignment in the presence of foreground motion and background jitter, reduces errors caused by inconsistent exposure during alignment, and makes the alignment result more accurate.
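For illustration only, a minimal PyTorch sketch of the gamma correction of formula (1) is given below; the tensor shapes, the exposure times, and the value gamma = 2.2 are assumptions for illustration, not part of the patent.

```python
import torch

def gamma_correct(ldr: torch.Tensor, exposure_time: float, gamma: float = 2.2) -> torch.Tensor:
    # Formula (1): corrected = I_i ** gamma / t_i. The LDR values are
    # assumed normalized to [0, 1]; gamma = 2.2 is an assumed preset.
    return ldr.pow(gamma) / exposure_time

# Hypothetical usage: three to-be-processed frames with assumed exposure times.
frames = [torch.rand(3, 256, 256) for _ in range(3)]
times = [1.0, 4.0, 16.0]
corrected = [gamma_correct(f, t) for f, t in zip(frames, times)]
```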
In step S220, the plurality of gamma-corrected images are input to a deformable alignment module in the reconstruction network model to obtain alignment features corresponding to the plurality of images to be processed one to one.
Fig. 3 illustrates a network structure of a reconstruction network model and its process flow according to an embodiment of the present invention. The reconstruction network model shown in fig. 3 is only an example and not a limitation of the present invention; other suitable implementations of the reconstruction network model proposed by the present invention are possible. For example, fig. 3 shows a spatial attention module (located in the attention branch M2), but this spatial attention module may be omitted, i.e., the reconstruction network model may include only the deformable alignment module (located in the alignment branch M1) and the fusion module (located in the fusion branch M3).
As shown in FIG. 3, a plurality of gamma corrected images may be input into a deformable alignment module for alignment. Alignment at the feature level may be achieved by a deformable alignment module.
Some image reconstruction methods attempt image-level alignment using optical flow or a homography matrix. These methods rely on the brightness-constancy assumption as a precondition, but this assumption is usually inconsistent with the actual situation, so their alignment is inaccurate and the ghosting problem is difficult to solve. The image reconstruction method according to the embodiments of the present invention performs alignment at the feature level and does not depend on the brightness-constancy assumption, so it can better cope with ghosting caused by foreground motion or camera shake. Therefore, performing deformable alignment on the gamma-corrected LDR images can effectively solve the ghosting problem in multi-frame HDR reconstruction tasks for dynamic scenes. In addition, image reconstruction methods that use optical flow or a homography for explicit alignment usually incur a very large time overhead, whereas the feature-level alignment of the embodiments of the present invention allows the alignment module embedded in the network model to be trained end to end, with a smaller time overhead.
In step S230, the alignment features are input into a fusion module in the reconstruction network model to obtain a reconstructed image, which is an HDR image.
In one example, the reconstruction network model may contain only the alignment branch M1 and not the attention branch M2, in which case the fusion module may directly fuse the alignment features output by the deformable alignment module. In another example, the reconstruction network model may include both the alignment branch M1 and the attention branch M2, and the features output by the two branches may be concatenated (concat) in the fusion module.
Through the fusion module, the alignment features of a plurality of LDR images can be fused together, and then a fused HDR image is obtained.
According to the image reconstruction method of the embodiment of the present invention, an HDR image is reconstructed by performing deformable alignment on a plurality of gamma-corrected images in one-to-one correspondence with a plurality of LDR images and fusing the resulting alignment features. The method achieves feature-level alignment and can effectively solve the ghosting problem in HDR image reconstruction. Moreover, the image reconstruction method according to the embodiment of the present invention realizes end-to-end image reconstruction using a reconstruction network model.
Illustratively, the image reconstruction method according to embodiments of the present invention may be implemented in a device, apparatus, or system having a memory and a processor.
The image reconstruction method according to the embodiment of the invention can be deployed at an image acquisition end, for example, at a personal terminal or a server end.
Alternatively, the image reconstruction method according to the embodiment of the present invention may also be distributively deployed at a server side (or a cloud side) and a personal terminal side. For example, an LDR image may be acquired at a client, and the client transmits the acquired LDR image to a server (or a cloud), so that the server (or the cloud) reconstructs the image.
According to an embodiment of the present invention, the reconstruction network model may further include a spatial attention module, and the method 200 may further include: acquiring the plurality of images to be processed; and inputting the plurality of images to be processed into the spatial attention module to obtain attention features in one-to-one correspondence with the plurality of images to be processed. In this case, inputting the alignment features into the fusion module in the reconstruction network model to obtain the reconstructed image includes: inputting the alignment features together with the attention features into the fusion module to obtain the reconstructed image.
With continued reference to fig. 3. As shown in fig. 3, the reconstructed network model includes two branches, namely an upper alignment branch M1 and a lower attention branch M2.
The upper alignment branch M1 is used to implement step S220. The lower attention branch M2 is used for spatial attention calculation based on the original LDR image.
According to the embodiment of the present invention, a spatial attention module may be used to process the plurality of original LDR images to obtain the attention features of each image. By way of example and not limitation, the spatial attention module may include two 3×3 convolutional layers and one sigmoid activation layer.
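A minimal sketch of such a spatial attention module follows, assuming the stated structure of two 3×3 convolutional layers and a sigmoid activation; how the resulting attention map is combined with image features is not spelled out in the text, so the element-wise weighting of shallow features here is an assumption.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Two 3x3 conv layers + sigmoid, as described above; weighting shallow
    features by the attention map is an assumption for illustration."""
    def __init__(self, in_ch: int = 3, feat_ch: int = 64):
        super().__init__()
        self.feat = nn.Conv2d(in_ch, feat_ch, 3, padding=1)   # shallow features
        self.att = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1),          # first 3x3 conv
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1),        # second 3x3 conv
            nn.Sigmoid(),                                     # attention map in [0, 1]
        )

    def forward(self, ldr: torch.Tensor) -> torch.Tensor:
        return self.feat(ldr) * self.att(ldr)                 # attention feature

# Hypothetical usage: one attention feature per to-be-processed LDR image.
frames = [torch.rand(1, 3, 64, 64) for _ in range(3)]
attn = SpatialAttention()
attn_feats = [attn(f) for f in frames]
```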
Illustratively, the fusion module may include a first stitching layer, a first dilated residual dense block (DRDB), a second DRDB, a third DRDB, a first convolutional layer, and a second convolutional layer connected in sequence, where the output of the first DRDB is also skip-connected to the input of the first convolutional layer, the output of the second DRDB is also skip-connected to the input of the first convolutional layer, and the output of the first stitching layer is also skip-connected to the input of the second convolutional layer. Inputting the alignment features together with the attention features into the fusion module to obtain the reconstructed image includes: inputting the alignment features together with the attention features into the first stitching layer to obtain the reconstructed image output by the second convolutional layer.
The embodiment in which the alignment features and attention features are stitched in the first stitching layer and then fed into the subsequent network structure for fusion is only an example and not a limitation of the present invention. Besides stitching, the alignment features and attention features may be combined in other ways before being input into the subsequent network structure for fusion. Of course, the alignment features and attention features may also be fused together in other suitable fusion manners.
As shown in fig. 3, the fusion module may include three DRDBs and two convolutional layers in addition to the first stitching layer. As can be seen from fig. 3, the first stitching layer, the three DRDBs, and the two convolutional layers are connected in sequence. For convenience of description, in left-to-right order in fig. 3, the three DRDBs are referred to as the first, second, and third DRDB, and the two convolutional layers as the first and second convolutional layer. As can be seen from fig. 3, in addition to the sequential connections, there are some skip connections (shortcuts) between the DRDBs and the convolutional layers. For example, the outputs of the first DRDB and the second DRDB are both skip-connected to the input of the first convolutional layer. Illustratively, at the input of the first convolutional layer, the features output by the first, second, and third DRDBs may be stitched together before being input into the first convolutional layer. Further, in addition to being fed into the first DRDB, the first stitching feature is simultaneously passed through a skip connection to the input of the second convolutional layer. For example, the first stitching feature may be summed element-wise with the features output by the first convolutional layer, and the summed features are then input into the second convolutional layer for processing.
As shown in fig. 3, after the alignment features and the attention features are input into the fusion module, they may first be stitched in the stitching layer (the first stitching layer). It is noted that the terms "first", "second", and the like herein are used primarily for distinguishing purposes and do not denote an order or other particular meaning.
For example, when the alignment features and attention features are stitched, they may be arranged in the order of their corresponding images, with the alignment feature and attention feature corresponding to the same image placed together. For example, assuming a total of 3 LDR images, the alignment features and attention features in the stitched feature may be arranged in the following order: the alignment feature of image 1, the attention feature of image 1, the alignment feature of image 2, the attention feature of image 2, the alignment feature of image 3, the attention feature of image 3. Of course, the above feature arrangement is merely an example and not a limitation of the present invention, and other suitable feature arrangements may be used when stitching the alignment features and attention features. A sketch of the fusion module is given below.
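The following sketch illustrates one possible reading of this fusion module; the DRDB here is simplified to two dilated convolutions with dense connections, and the channel sizes and the convolution applied right after the first stitching layer are assumptions rather than the patent's exact design.

```python
import torch
import torch.nn as nn

class DRDB(nn.Module):
    """Simplified dilated residual dense block (real DRDBs typically use
    more densely connected dilated conv layers)."""
    def __init__(self, ch: int = 64, growth: int = 32):
        super().__init__()
        self.c1 = nn.Conv2d(ch, growth, 3, padding=2, dilation=2)
        self.c2 = nn.Conv2d(ch + growth, growth, 3, padding=2, dilation=2)
        self.fuse = nn.Conv2d(ch + 2 * growth, ch, 1)   # local feature fusion
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        d1 = self.relu(self.c1(x))
        d2 = self.relu(self.c2(torch.cat([x, d1], dim=1)))
        return x + self.fuse(torch.cat([x, d1, d2], dim=1))  # local residual

class Fusion(nn.Module):
    """First stitching layer -> 3 DRDBs -> 2 convs, with the skip
    connections described above (channel sizes assumed)."""
    def __init__(self, in_ch: int, ch: int = 64):
        super().__init__()
        self.head = nn.Conv2d(in_ch, ch, 3, padding=1)    # maps the stitched features to ch channels
        self.drdb1, self.drdb2, self.drdb3 = DRDB(ch), DRDB(ch), DRDB(ch)
        self.conv1 = nn.Conv2d(3 * ch, ch, 3, padding=1)  # takes the stitched DRDB outputs
        self.conv2 = nn.Conv2d(ch, 3, 3, padding=1)       # outputs the HDR image

    def forward(self, align_feats: list, attn_feats: list) -> torch.Tensor:
        stitched = self.head(torch.cat(align_feats + attn_feats, dim=1))
        f1 = self.drdb1(stitched)
        f2 = self.drdb2(f1)
        f3 = self.drdb3(f2)
        merged = self.conv1(torch.cat([f1, f2, f3], dim=1))
        return self.conv2(stitched + merged)  # element-wise sum skip to the second conv

# Hypothetical usage: 3 frames, 64-channel features from each branch.
a = [torch.rand(1, 64, 64, 64) for _ in range(3)]
s = [torch.rand(1, 64, 64, 64) for _ in range(3)]
hdr = Fusion(in_ch=6 * 64)(a, s)  # -> (1, 3, 64, 64)
```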
Since the plurality of originally input LDR images are exposed to different degrees, it is advantageous to detect overexposed saturated regions and underexposed regions from them. Extracting features from these regions helps the subsequent fusion take the detail information of the overexposed and underexposed regions into account, so that their details can be better recovered when reconstructing the HDR image and a more refined HDR image can be obtained.
Existing deep learning networks for image reconstruction tasks directly concatenate multi-frame LDR images with their corresponding gamma-corrected images and feed them into the same network for processing, without treating the LDR images and the gamma-corrected images differently according to their different characteristics.
According to an embodiment of the present invention, a novel dual-branch network structure is provided. One branch of the dual-branch network structure performs spatial attention processing on the original LDR images, which facilitates extracting rich detail information of overexposed saturated regions and underexposed regions from images with different exposure levels. Research shows that, compared with image reconstruction methods that do not use dual branches, the method provided by the embodiment of the present invention recovers more accurate edges of overexposed saturated regions and richer high-frequency texture details within saturated regions. The other branch of the dual-branch network structure performs alignment on the gamma-corrected images, which facilitates feature alignment using images with uniform exposure levels and avoids the inaccurate alignment, and hence ghosting, caused by inconsistent exposure. In summary, through the dual-branch network structure, the original LDR images and the gamma-corrected images can each be processed appropriately, so as to simultaneously solve the ghosting problem and the over/underexposure detail recovery problem in multi-frame HDR reconstruction tasks for dynamic scenes.
According to an embodiment of the present invention, inputting the plurality of gamma-corrected images into a deformable alignment module in the reconstruction network model to obtain alignment features corresponding to the plurality of to-be-processed images one by one (step S220) may include:
in the deformable alignment module, the following operations are performed: respectively extracting the characteristics of the gamma correction images to obtain image characteristics which are in one-to-one correspondence with the images to be processed; for any non-reference image, splicing image features corresponding to a reference image and image features corresponding to the non-reference image to obtain spliced features, wherein the reference image is one of a plurality of images to be processed, and the non-reference image is an image except the reference image in the plurality of images to be processed; convolving the spliced features to calculate the offset between the reference image and the non-reference image; performing deformable convolution on the image features corresponding to the non-reference image based on the offset to obtain alignment features corresponding to the non-reference image; wherein the alignment feature corresponding to the reference image is an image feature corresponding to the reference image.
Alternatively, a certain image may be selected in advance from a plurality of images to be processed as a reference image, and the remaining images may be non-reference images. Each non-reference image is aligned with a reference image at the feature level.
In one example, the reconstruction network model may include deformable alignment modules in one-to-one correspondence with all the non-reference images, each deformable alignment module being used to align its corresponding non-reference image with the reference image. In this way, the alignment of multiple non-reference images can be performed in parallel, and the image reconstruction efficiency is high. In another example, the reconstruction network model may include only a single deformable alignment module, into which each non-reference image and the reference image are input in turn for alignment, the alignment of the previous image being completed before the alignment of the current image is performed. In this way, the network structure is simple, which helps reduce the number of parameters.
By way of example and not limitation, the deformable alignment module includes a feature extraction layer, a second stitching layer, a convolutional layer, and a deformable convolution layer, and inputting the plurality of gamma-corrected images into the deformable alignment module in the reconstruction network model to obtain the alignment features in one-to-one correspondence with the plurality of images to be processed (step S220) may include: inputting the plurality of gamma-corrected images respectively into the feature extraction layer for feature extraction to obtain image features in one-to-one correspondence with the plurality of images to be processed; for any non-reference image, inputting the image features corresponding to the reference image and the image features corresponding to the non-reference image into the second stitching layer for stitching to obtain second stitched features; inputting the second stitched features into the convolutional layer for convolution to calculate the offset between the reference image and the non-reference image; and inputting the image features corresponding to the non-reference image together with the offset into the deformable convolution layer for deformable convolution to obtain the alignment features corresponding to the non-reference image.
Figure 4 illustrates a network structure of a deformable alignment module according to one embodiment of the invention. Referring to fig. 4, a deformable alignment module is shown including a deformable convolution layer. In practice, the deformable alignment module may also include a stitching layer (second stitching layer), a feature extraction layer, and a convolutional layer (not shown).
First, feature extraction may be performed on a plurality of LDR images, respectively, to obtain respective image features of the LDR images.
Feature extraction may be performed on the reference image to obtain the reference feature $f_r$, and on each non-reference image to obtain a non-reference feature $f_i$. Any non-reference feature $f_i$ can be stitched with the reference feature $f_r$ and fed into the convolutional layer, which outputs an offset map; the offset is then used to perform a deformable convolution on the non-reference feature $f_i$, outputting the alignment feature $\hat{f}_i$.
For the reference image, the alignment feature corresponding to the reference image is the image feature corresponding to the reference image, that is, the image feature of the reference image can be directly determined as the alignment feature without processing.
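A minimal sketch of this deformable alignment module, built on torchvision's DeformConv2d, is given below; the single-convolution feature extraction layer and the channel sizes are assumptions, as the patent does not specify the exact feature extractor.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableAlignment(nn.Module):
    """Stitch reference and non-reference features, predict offsets, then
    deformably convolve the non-reference features (fig. 4, sketched)."""
    def __init__(self, in_ch: int = 3, ch: int = 64, k: int = 3):
        super().__init__()
        self.extract = nn.Conv2d(in_ch, ch, 3, padding=1)         # feature extraction layer
        self.offset = nn.Conv2d(2 * ch, 2 * k * k, 3, padding=1)  # offsets from stitched features
        self.dcn = DeformConv2d(ch, ch, k, padding=k // 2)        # deformable convolution layer

    def forward(self, ref_img: torch.Tensor, nonref_img: torch.Tensor) -> torch.Tensor:
        f_r = self.extract(ref_img)                  # reference feature f_r
        f_i = self.extract(nonref_img)               # non-reference feature f_i
        off = self.offset(torch.cat([f_r, f_i], 1))  # second stitching layer + conv
        return self.dcn(f_i, off)                    # alignment feature of the non-reference image

# Hypothetical usage; the reference image's own alignment feature is just f_r.
ref, nonref = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
aligned = DeformableAlignment()(ref, nonref)
```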
According to an embodiment of the present invention, the reference image may be the image whose exposure value is closest to 0 among the plurality of images to be processed.
Preferably, the selected LDR images include at least one 0EV image. In that case, the 0EV image may be directly used as the reference image, and its features serve as the reference features. When the LDR images do not contain a 0EV image, the image closest to 0EV can still be selected as the reference image.
For example, assume that the number of images to be processed is three. In one example, the exposure values of the three images to be processed are +2EV, 0EV, and -1EV respectively, and the 0EV image may be selected as the reference image. In another example, the exposure values are +3EV, +2EV, and +1EV respectively, and the +1EV image may be selected as the reference image. In yet another example, the exposure values are -1EV, -2EV, and -3EV respectively, and the -1EV image may be selected as the reference image. In yet another example, the exposure values are +3EV, +2EV, and -1EV respectively, and the -1EV image may be selected as the reference image.
A 0EV image is an image with standard exposure and contains relatively balanced image information, and the brightness of the reconstructed HDR image is generally close to that of the 0EV image; therefore, using the image closest to 0EV as the reference image is beneficial to reconstructing an accurate HDR image.
According to an embodiment of the present invention, the reference image may be one of the images whose exposure value is in the middle among the plurality of images to be processed.
In the case where the number of images to be processed is odd, the image whose exposure value is in the middle may be directly selected from all the images to be processed as the reference image. For example, assume that the number of images to be processed is three. In one example, the exposure values of the three images to be processed are +2EV, 0EV, and -1EV respectively, and the 0EV image may be selected as the reference image. In another example, the exposure values are +3EV, +2EV, and +1EV respectively, and the +2EV image may be selected as the reference image. In yet another example, the exposure values are -1EV, -2EV, and -3EV respectively, and the -2EV image may be selected as the reference image. In yet another example, the exposure values are +3EV, +2EV, and -1EV respectively, and the +2EV image may be selected as the reference image.
In the case where the number of images to be processed is even, there are two images in the middle, and one of them can be further selected as the reference image. This selection may be achieved in any suitable manner; for example, one of the two images may be selected randomly, or the image whose exposure value is closest to 0 may be selected as the reference image. For example, assume that the number of images to be processed is four. In one example, the exposure values of the four images to be processed are +3EV, +2EV, -1EV, and -3EV; the +2EV and -1EV images are in the middle, and the image closest to 0EV (i.e., -1EV) may be selected as the reference image.
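The middle-exposure selection rule, with the closest-to-0EV tie-break for an even number of images, could be implemented as in the following sketch; the function name is hypothetical.

```python
def choose_reference(evs: list[float]) -> int:
    """Pick the reference frame index per the rules above (sketch).

    Selects the middle image by exposure value; among the two middle
    candidates of an even-length list, the one closest to 0EV wins.
    """
    order = sorted(range(len(evs)), key=lambda i: evs[i])   # indices sorted by exposure value
    mid = len(order) // 2
    # Odd count: the single middle image; even count: the two middle images.
    candidates = [order[mid]] if len(order) % 2 else [order[mid - 1], order[mid]]
    return min(candidates, key=lambda i: abs(evs[i]))       # closest to 0EV

print(choose_reference([+2.0, 0.0, -1.0]))         # -> 1 (the 0EV image)
print(choose_reference([+3.0, +2.0, -1.0, -3.0]))  # -> 2 (the -1EV image)
```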
According to an embodiment of the present invention, the method 200 may further include: acquiring training images, where the training images include a plurality of sample corrected images in one-to-one correspondence with a plurality of sample images and an annotated image corresponding to the plurality of sample images, the sample images are LDR images, the annotated image is an HDR image, and the sample corrected images are obtained by performing gamma correction on the sample images; processing the plurality of sample corrected images using an initial reconstruction network model to obtain a predicted reconstructed image; calculating a reconstruction loss term based on the predicted reconstructed image and the annotated image; acquiring predicted image features and annotated image features corresponding to the predicted reconstructed image and the annotated image respectively; calculating a perceptual loss term based on the predicted image features and the annotated image features; calculating a total loss based on the reconstruction loss term and the perceptual loss term; and optimizing the parameters of the initial reconstruction network model based on the total loss to obtain the reconstruction network model.
The initial reconstructed network model is a reconstructed network model with initialization parameters as parameters. The manner of processing the plurality of sample corrected images by using the initial reconstruction network model is the same as the manner of processing the plurality of gamma corrected images by using the reconstruction network model, and those skilled in the art can understand the use manner of the initial reconstruction network model and the obtaining manner of the predicted reconstruction image in the training process, which are not described herein again.
Those skilled in the art will appreciate that the training process of the reconstruction network model may include: inputting a group of sample corrected images into the initial reconstruction network model to obtain the corresponding predicted reconstructed image, substituting the predicted reconstructed image and the annotated image into the loss function to calculate the loss terms, and optimizing the parameters of the initial reconstruction network model through gradient back-propagation until the values of the loss terms meet the requirement. After the current round of optimization is finished, the next group of sample corrected images is input into the optimized reconstruction network model, and the optimization process is repeated until the maximum number of iterations is reached or the model converges, completing the training process. The process of optimizing the parameters of the reconstruction network model based on the loss terms can be implemented in a conventional manner and is not described further here. The following mainly describes how the loss terms involved in training the reconstruction network model are calculated.
Unlike existing loss function calculation methods that use only a pixel-level loss, the loss function used by embodiments of the present invention may include a reconstruction loss term and a perceptual loss term, as follows:

$$\ell = \ell_{rec} + \lambda \, \ell_{per} \tag{2}$$

In formula (2), $\ell$ represents the total loss, $\ell_{rec}$ and $\ell_{per}$ represent the reconstruction loss term and the perceptual loss term respectively, and $\lambda$ represents a preset scaling factor. The preset scaling factor $\lambda$ may be any suitable value, such as 0.01.
The reconstruction loss term is used to calculate the error between the predicted reconstructed image and the annotated image (the ground-truth label). By way of example and not limitation, this error may be, for example, a least absolute ($\ell_1$) error, namely:

$$\ell_{rec} = \left\lVert \mathcal{T}(I_H) - \mathcal{T}(I_{GT}) \right\rVert_1 \tag{3}$$

where $I_{GT}$ represents the annotated image, $I_H$ represents the predicted reconstructed image, and $\mathcal{T}(\cdot)$ represents the μ-law function. The expression of the μ-law function is as follows:

$$\mathcal{T}(x) = \frac{\log(1 + \mu x)}{\log(1 + \mu)} \tag{4}$$

The μ-law function performs tone mapping on the input image (denoted by $x$ in formula (4)), i.e., transforms the original input image onto another luminance domain. For better display, the reconstructed HDR image usually goes through the $\mathcal{T}(\cdot)$ transformation before being shown on a display.
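A one-line sketch of the μ-law tone mapping of formula (4) follows; the value μ = 5000, common in the HDR literature, is an assumption, since the patent does not specify it.

```python
import math
import torch

def mu_law(x: torch.Tensor, mu: float = 5000.0) -> torch.Tensor:
    # Formula (4); assumes non-negative linear-domain HDR values.
    return torch.log(1.0 + mu * x) / math.log(1.0 + mu)
```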
The perceptual loss term may be used to calculate the error between the image features corresponding respectively to the predicted reconstructed image and the annotated image. By way of example and not limitation, this error may also be, for example, an $\ell_1$ error. The perceptual loss term constrains the quality of the reconstructed HDR image at the feature level. With the perceptual loss term added, the errors in the image features corresponding to the predicted reconstructed image and the annotated image are also taken into account when optimizing the reconstruction network model based on the loss terms, which helps improve the quality of the HDR image output by the reconstruction network model.
Illustratively, acquiring the predicted image features and the annotated image features corresponding to the predicted reconstructed image and the annotated image respectively comprises: performing tone mapping on the predicted reconstructed image and the annotated image respectively to obtain a new predicted reconstructed image and a new annotated image; and inputting the new predicted reconstructed image and the new annotated image respectively into the pre-trained network model to obtain the corresponding predicted image features and annotated image features.
The perceptual loss term may be defined as:

$$\ell_{per} = \sum_{i=1}^{m} \left\lVert \phi_i(\mathcal{T}(I_H)) - \phi_i(\mathcal{T}(I_{GT})) \right\rVert_1 \tag{5}$$

where $\phi_i(\cdot)$ represents the output features of the $i$-th of the $m$ layers of the pre-trained network model that participate in the training of the reconstruction network model, $i = 1, 2, \ldots, m$.
The m layers of the pre-trained network model that participate in the training for the reconstructed network model may include any m network layers of the pre-trained network model other than the input layer and the output layer, that is, any m hidden layers of the pre-trained network model. m is an integer greater than or equal to 1. The size of m may be set as desired, but the present invention is not limited thereto. Preferably, the m layers of the pre-trained network model that participate in the training for the reconstructed network model are the m layers of the pre-trained network model that are closest to the output layer (i.e., the last m hidden layers).
As mentioned above, for better display, the reconstructed HDR image usually goes through the $\mathcal{T}(\cdot)$ transformation before being shown on a display. Therefore, when extracting features with the pre-trained network model, the predicted reconstructed image and the annotated image may each first be subjected to the $\mathcal{T}(\cdot)$ transformation, and features are then extracted from the transformed images rather than directly from the predicted reconstructed image and the annotated image. Since the perceptual loss term is calculated from the features of the $\mathcal{T}(\cdot)$-transformed images, the HDR image reconstructed by the reconstruction network model trained on this loss can have a good display effect.
By way of example and not limitation, the pre-trained network model may be the VGG16 model.
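Combining formulas (2), (3), and (5) with VGG16 as the pre-trained network, the total loss could be sketched as follows; the tapped VGG16 layers (relu2_2, relu3_3, relu4_3), μ = 5000, and the omission of ImageNet input normalization are assumptions for illustration, not the patent's prescription.

```python
import math
import torch
import torch.nn as nn
import torchvision

def mu_law(x: torch.Tensor, mu: float = 5000.0) -> torch.Tensor:
    # Formula (4); assumes non-negative linear-domain HDR values.
    return torch.log(1.0 + mu * x) / math.log(1.0 + mu)

class TotalLoss(nn.Module):
    """Sketch of formula (2): l = l_rec + lambda * l_per.

    VGG16 serves as the pre-trained network per the text above; the tapped
    layer indices and mu = 5000 are assumptions, and ImageNet input
    normalization is omitted for brevity.
    """
    def __init__(self, lam: float = 0.01, taps=(8, 15, 22)):
        super().__init__()
        vgg = torchvision.models.vgg16(
            weights=torchvision.models.VGG16_Weights.DEFAULT).features.eval()
        for p in vgg.parameters():
            p.requires_grad_(False)          # the pre-trained network is frozen
        self.vgg, self.taps, self.lam = vgg, set(taps), lam

    def _phi(self, x: torch.Tensor):
        feats = []
        for i, layer in enumerate(self.vgg):
            x = layer(x)
            if i in self.taps:
                feats.append(x)              # phi_i: tapped hidden-layer features
        return feats

    def forward(self, pred_hdr: torch.Tensor, gt_hdr: torch.Tensor) -> torch.Tensor:
        p, g = mu_law(pred_hdr), mu_law(gt_hdr)   # tone-map first, per formulas (3) and (5)
        l_rec = (p - g).abs().mean()              # l1 reconstruction term, formula (3)
        l_per = sum((fp - fg).abs().mean()        # l1 perceptual term, formula (5)
                    for fp, fg in zip(self._phi(p), self._phi(g)))
        return l_rec + self.lam * l_per           # formula (2) with lambda = 0.01
```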
During training, the training images may first be cropped to a size of 256×256. Illustratively, the reconstruction network model may be trained using an Adam optimizer. The parameters of the Adam optimizer may be set as follows: β1 = 0.9, β2 = 0.999, ε = 10^(-8), learning rate 1e-4. The training code can be implemented using the PyTorch framework.
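A minimal sketch of the stated optimizer configuration follows; the stand-in model and the use of random cropping are assumptions so that the snippet runs on its own.

```python
import torch
import torch.nn as nn
from torchvision import transforms

crop = transforms.RandomCrop(256)        # 256x256 training crops (random cropping assumed)
model = nn.Conv2d(3, 3, 3, padding=1)    # stand-in for the reconstruction network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4,
                             betas=(0.9, 0.999), eps=1e-8)
```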
The reconstruction network model using deformable alignment provided by the embodiments of the present invention can effectively remove ghosting in image reconstruction and reconstruct HDR images of higher quality.
According to another aspect of the present invention, there is provided an image reconstruction apparatus. Fig. 5 shows a schematic block diagram of an image reconstruction apparatus 500 according to an embodiment of the present invention.
As shown in fig. 5, the image reconstruction apparatus 500 according to an embodiment of the present invention includes an acquisition module 510, a first input module 520, and a second input module 530. The respective modules may perform the respective steps/functions of the image reconstruction method described above in connection with fig. 2, respectively. Only the main functions of the respective components of the image reconstruction apparatus 500 will be described below, and the details that have been described above will be omitted.
The obtaining module 510 is configured to obtain a plurality of gamma correction images corresponding to a plurality of images to be processed one by one, where the images to be processed are low dynamic range images, and the plurality of gamma correction images are obtained by performing gamma correction on the plurality of images to be processed. The obtaining module 510 may be implemented by the processor 102 in the electronic device shown in fig. 1 executing program instructions stored in the storage 104.
The first input module 520 is used for inputting the plurality of gamma corrected images into a deformable alignment module in the reconstruction network model to obtain alignment features corresponding to the plurality of images to be processed one by one. The first input module 520 may be implemented by the processor 102 in the electronic device shown in fig. 1 executing program instructions stored in the storage 104.
The second input module 530 is used to input the alignment features into a fusion module in the reconstruction network model to obtain a reconstructed image, which is a high dynamic range image. The second input module 530 may be implemented by the processor 102 in the electronic device shown in fig. 1 executing program instructions stored in the storage 104.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
FIG. 6 shows a schematic block diagram of an electronic device 600 according to one embodiment of the invention. The electronic device 600 includes a storage device (memory) 610, a processor 620, and a computer program stored in the memory.
The storage device 610 stores computer program instructions for implementing respective steps in the image reconstruction method according to an embodiment of the present invention.
The processor 620 is configured to execute computer program instructions stored in the storage device 610 to perform the corresponding steps of the image reconstruction method according to the embodiment of the present invention.
In one embodiment, the computer program instructions, when executed by the processor 620, are for performing the steps of: acquiring a plurality of gamma correction images which correspond to a plurality of images to be processed one by one, wherein the images to be processed are low dynamic range images, and the plurality of gamma correction images are obtained by performing gamma correction on the plurality of images to be processed; inputting the gamma correction images into a deformable alignment module in a reconstruction network model to obtain alignment features corresponding to the images to be processed one by one; the alignment features are input into a fusion module in the reconstruction network model to obtain a reconstructed image, which is a high dynamic range image.
Furthermore, according to an embodiment of the present invention, there is also provided a computer-readable storage medium on which a computer program/instructions is stored. When executed by a computer or a processor, the computer program/instructions performs the respective steps of the image reconstruction method according to the embodiment of the present invention and implements the respective modules in the image reconstruction apparatus according to the embodiment of the present invention. The storage medium may include, for example, a memory card of a smart phone, a storage component of a tablet computer, a hard disk of a personal computer, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a portable compact disc read-only memory (CD-ROM), a USB memory, or any combination of the above storage media.
In one embodiment, the computer program/instructions when executed are for performing the steps of: acquiring a plurality of gamma correction images which correspond to a plurality of images to be processed one by one, wherein the images to be processed are low dynamic range images, and the plurality of gamma correction images are obtained by performing gamma correction on the plurality of images to be processed; inputting the gamma correction images into a deformable alignment module in a reconstruction network model to obtain alignment features corresponding to the images to be processed one by one; the alignment features are input into a fusion module in the reconstruction network model to obtain a reconstructed image, which is a high dynamic range image.
Furthermore, according to an embodiment of the present invention, there is also provided a computer program product including a computer program/instructions which, when executed by a processor, implement the above-described image reconstruction method.
In one embodiment, the computer program/instructions, when executed by a processor, are for performing the steps of: acquiring a plurality of gamma correction images which correspond to a plurality of images to be processed one by one, wherein the images to be processed are low dynamic range images, and the plurality of gamma correction images are obtained by performing gamma correction on the plurality of images to be processed; inputting the gamma correction images into a deformable alignment module in a reconstruction network model to obtain alignment features corresponding to the images to be processed one by one; the alignment features are input into a fusion module in the reconstruction network model to obtain a reconstructed image, which is a high dynamic range image.
The modules in the electronic device according to embodiments of the present invention may be implemented by a processor of the electronic device running computer program instructions stored in the memory, or by a computer running computer instructions stored in a computer-readable storage medium of a computer program product according to embodiments of the present invention.
Although the illustrative embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the foregoing illustrative embodiments are merely exemplary and are not intended to limit the scope of the invention thereto. Various changes and modifications may be effected therein by one of ordinary skill in the pertinent art without departing from the scope or spirit of the present invention. All such changes and modifications are intended to be included within the scope of the present invention as set forth in the appended claims.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, a division of a unit is only one type of division of a logical function, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another device, or some features may be omitted, or not executed.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
It will be understood by those skilled in the art that all of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where such features are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Moreover, those of skill in the art will appreciate that while some embodiments herein include some features included in other embodiments, not others, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. It will be appreciated by those skilled in the art that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some of the blocks in an image reconstruction apparatus according to an embodiment of the present invention. The present invention may also be embodied as apparatus programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.
The above description covers only specific embodiments of the present invention; the protection scope of the present invention is not limited thereto. Any changes or substitutions that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention, which shall be subject to the protection scope of the claims.

Claims (10)

1. An image reconstruction method, comprising:
acquiring a plurality of gamma correction images in one-to-one correspondence with a plurality of images to be processed, wherein the images to be processed are low dynamic range images, and the plurality of gamma correction images are obtained by performing gamma correction on the plurality of images to be processed;
inputting the plurality of gamma correction images into a deformable alignment module in a reconstruction network model to obtain alignment features in one-to-one correspondence with the plurality of images to be processed;
inputting the alignment features into a fusion module in the reconstruction network model to obtain a reconstructed image, the reconstructed image being a high dynamic range image.
2. The method of claim 1, wherein the reconstruction network model further comprises a spatial attention module,
the method further comprises the following steps:
acquiring the plurality of images to be processed;
inputting the plurality of images to be processed into the spatial attention module to obtain attention features in one-to-one correspondence with the plurality of images to be processed;
the inputting the alignment features into a fusion module in the reconstruction network model to obtain a reconstructed image comprises:
inputting the alignment features and the attention features into the fusion module to obtain the reconstructed image.
3. The method of claim 1 or 2, wherein said inputting the plurality of gamma correction images into the deformable alignment module in the reconstruction network model to obtain alignment features in one-to-one correspondence with the plurality of images to be processed comprises:
in the deformable alignment module, the following operations are performed:
respectively extracting features of the plurality of gamma correction images to obtain image features in one-to-one correspondence with the plurality of images to be processed;
for any non-reference image,
stitching image features corresponding to a reference image and image features corresponding to the non-reference image to obtain stitched features, wherein the reference image is one of the images to be processed, and the non-reference image is an image of the images to be processed except for the reference image;
convolving the stitched features to calculate an offset between the reference image and the non-reference image;
performing deformable convolution on the image feature corresponding to the non-reference image based on the offset to obtain an alignment feature corresponding to the non-reference image;
wherein the alignment feature corresponding to the reference image is an image feature corresponding to the reference image.
4. The method of claim 3, wherein the reference image is an image of the plurality of images to be processed having an exposure value closest to 0.
5. The method of claim 3, wherein the reference image is the image of the plurality of images to be processed whose exposure value is in the middle.
6. The method of claim 1 or 2, wherein the method further comprises:
acquiring training images, wherein the training images comprise a plurality of sample correction images in one-to-one correspondence with a plurality of sample images and an annotation image corresponding to the plurality of sample images, the sample images are low dynamic range images, the annotation image is a high dynamic range image, and the plurality of sample correction images are obtained by performing gamma correction on the plurality of sample images;
processing the plurality of sample correction images using an initial reconstruction network model to obtain a predicted reconstructed image;
calculating a reconstruction loss term based on the predicted reconstructed image and the annotation image;
obtaining predicted image features and annotation image features corresponding respectively to the predicted reconstructed image and the annotation image;
calculating a perceptual loss term based on the predicted image features and the annotation image features;
calculating a total loss based on the reconstruction loss term and the perceptual loss term;
and optimizing the parameters of the initial reconstruction network model based on the total loss to obtain the reconstruction network model.
7. The method of claim 6, wherein said obtaining predicted image features and annotation image features corresponding respectively to the predicted reconstructed image and the annotation image comprises:
carrying out tone mapping on the predicted reconstructed image and the annotation image respectively to obtain a new predicted reconstructed image and a new annotation image;
and inputting the new predicted reconstructed image and the new annotation image respectively into a pre-trained network model to obtain the predicted image features and the annotation image features.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory, wherein the processor executes the computer program to implement the image reconstruction method according to any one of claims 1 to 7.
9. A computer-readable storage medium on which a computer program/instructions is stored, wherein the computer program/instructions, when executed by a processor, carries out the image reconstruction method according to any one of claims 1 to 7.
10. A computer program product comprising computer programs/instructions, characterized in that the computer programs/instructions, when executed by a processor, implement the image reconstruction method according to any of claims 1-7.
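For the spatial attention module recited in claim 2, the patent again gives no internal architecture. The sketch below assumes a reference-guided design in the spirit of AHDRNet-style HDR deghosting, where an attention map computed from an image to be processed and the reference image is multiplied onto that image's embedded features; every layer choice is an illustrative assumption.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Produce an attention feature for one image to be processed, guided
    by the reference image (hypothetical design, not the patented one)."""

    def __init__(self, in_channels: int = 3, channels: int = 64):
        super().__init__()
        self.embed = nn.Conv2d(in_channels, channels, 3, padding=1)
        self.att = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Sigmoid(),  # per-pixel attention weights in [0, 1]
        )

    def forward(self, image: torch.Tensor, reference: torch.Tensor) -> torch.Tensor:
        f_img, f_ref = self.embed(image), self.embed(reference)
        attention = self.att(torch.cat([f_img, f_ref], dim=1))
        return f_img * attention  # attention feature for this image
```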
CN202210035539.5A 2022-01-13 2022-01-13 Image reconstruction method, electronic device, storage medium, and program product Pending CN114581316A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210035539.5A CN114581316A (en) 2022-01-13 2022-01-13 Image reconstruction method, electronic device, storage medium, and program product

Publications (1)

Publication Number Publication Date
CN114581316A true CN114581316A (en) 2022-06-03

Family

ID=81771916

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210035539.5A Pending CN114581316A (en) 2022-01-13 2022-01-13 Image reconstruction method, electronic device, storage medium, and program product

Country Status (1)

Country Link
CN (1) CN114581316A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115205157A (en) * 2022-07-29 2022-10-18 如你所视(北京)科技有限公司 Image processing method and system, electronic device, and storage medium
CN115205157B (en) * 2022-07-29 2024-04-26 如你所视(北京)科技有限公司 Image processing method and system, electronic device and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination