CN113409186A - Single-image relighting method, system, terminal and storage medium based on prior knowledge - Google Patents

Single-image relighting method, system, terminal and storage medium based on prior knowledge

Info

Publication number
CN113409186A
CN113409186A (application CN202110731039.0A)
Authority
CN
China
Prior art keywords
picture
ambient light
neural network
convolutional neural
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110731039.0A
Other languages
Chinese (zh)
Inventor
张启煊
张龙文
虞晶怡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ShanghaiTech University
Original Assignee
ShanghaiTech University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ShanghaiTech University filed Critical ShanghaiTech University
Priority to CN202110731039.0A priority Critical patent/CN113409186A/en
Publication of CN113409186A publication Critical patent/CN113409186A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 — Geometric image transformations in the plane of the image
    • G06T 3/04 — Context-preserving transformations, e.g. by using an importance map
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/04 — Architecture, e.g. interconnection topology
    • G06N 3/044 — Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 — Combinations of networks
    • G06N 3/08 — Learning methods
    • G06N 3/084 — Backpropagation, e.g. using gradient descent
    • G06T 9/00 — Image coding
    • G06T 9/002 — Image coding using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a single-image relighting method, system, terminal and storage medium based on prior knowledge, wherein the method comprises the following steps: giving an object picture to be relit and a panoramic picture serving as the ambient light; inputting the object picture and the panoramic picture into a convolutional neural network based on prior knowledge; extracting high-dimensional features of the object in the object picture and of the ambient light through the encoder of the convolutional neural network; and replacing the extracted ambient light with the given ambient light, inputting the picture serving as the ambient light and the extracted high-dimensional features into the decoder of the convolutional neural network, with skip connections arranged between the encoder and the decoder, so as to obtain the relit picture. The method has no prerequisites: there is no need to photograph all light rays in a dome light field in advance. Rendering is fast, and the convolutional neural network can run in real time on existing hardware. Compared with the several gigabytes a dome light field requires to relight a single object, the space occupied by the invention is only tens of megabytes.

Description

Single-image relighting method, system, terminal and storage medium based on prior knowledge
Technical Field
The present application relates to the field of single-image relighting, and in particular to a single-image relighting method, system, terminal, and storage medium based on prior knowledge.
Background
Relighting generally refers to modifying the shading of a particular object so that it appears to have been lit by a specified ambient light (environment map). The existing relighting technology is mainly the dome light field (Light Stage). The traditional approach needs to photograph all light rays inside the device in advance, which is time-consuming, data-heavy and slow to render, and performs poorly on existing mobile terminals.
Disclosure of Invention
In view of the above shortcomings of the prior art, the present application aims to provide a method that solves the technical problems of the conventional relighting technology, namely its heavy prerequisites, slow rendering and large data volume.
To achieve the above and other related objects, a first aspect of the present application provides a single-image relighting method based on prior knowledge, including: giving an object picture to be relit and a panoramic picture serving as the ambient light; inputting the object picture and the panoramic picture into a convolutional neural network based on prior knowledge; extracting high-dimensional features of the object in the object picture and of the ambient light through the encoder of the convolutional neural network; and replacing the extracted ambient light with the given ambient light, inputting the picture serving as the ambient light and the extracted high-dimensional features into the decoder of the convolutional neural network, with skip connections arranged between the encoder and the decoder, so as to obtain the relit picture.
In some embodiments of the first aspect of the present application, the training process of the prior-knowledge-based convolutional neural network comprises: acquiring a plurality of images of a photographed object under different illumination conditions using a dome light field; and training the convolutional neural network with the plurality of images as the training set.
In some embodiments of the first aspect of the present application, the convolutional neural network is trained in a multi-scale progressive mode: training starts with 64 × 64 input and output images and, after each convergence, the image size is doubled and training continues, until the image size reaches 512 × 512.
In some embodiments of the first aspect of the present application, extracting the high-dimensional features of the object in the object picture and of the ambient light includes: the encoder encodes a 512 × 512 input image into 512 feature layers of size 16 × 16 through several convolutional layers, activation layers and pooling layers; 3 of these 512 layers are designated as the ambient light in which the input image is located.
In some embodiments of the first aspect of the present application, obtaining the relit picture includes: the decoder of the pre-trained convolutional neural network decodes the 512 feature layers of size 16 × 16, together with the corresponding skip connections, into a 512 × 512 output image through several upsampling layers, convolutional layers and activation layers.
In some embodiments of the first aspect of the present application, the convolutional neural network further sets a loss function; the loss function comprises any one or a combination of the following: the log-space 1-norm distance between the ambient light output by the encoder from the input image and the real ambient light; the 1-norm distance and multi-scale structural similarity between the input image and the output image produced by the decoder without replacing the ambient light; the 1-norm distance and multi-scale structural similarity between the real image and the output image produced by the decoder after replacing the ambient light; a generative adversarial network loss between the real image and the output image produced by the decoder after replacing the ambient light; the cross-entropy loss between the output segmentation and the real segmentation; and, for two adjacent frames with video continuity, the 1-norm distance and multi-scale structural similarity between the two output images produced by the decoder after replacing the ambient light, once the corresponding optical-flow transformation has been applied.
To achieve the above and other related objects, a second aspect of the present application provides a prior-knowledge-based single-image relighting system, comprising: a picture input module, used for giving an object picture to be relit and a panoramic picture serving as the ambient light, and inputting the object picture and the panoramic picture into a convolutional neural network based on prior knowledge; a feature extraction module, used for extracting high-dimensional features of the object in the object picture and of the ambient light through the encoder of the convolutional neural network; and a relighting module, used for replacing the extracted ambient light with the given ambient light, inputting the picture serving as the ambient light and the extracted high-dimensional features into the decoder of the convolutional neural network, with skip connections arranged between the encoder and the decoder, so as to obtain the relit picture.
To achieve the above and other related objects, a third aspect of the present application provides a computer-readable storage medium having a computer program stored thereon, where the computer program, when executed by a processor, implements the single-image relighting method based on prior knowledge.
To achieve the above and other related objects, a fourth aspect of the present application provides an electronic terminal comprising: a processor and a memory; the memory is used for storing a computer program, and the processor is used for executing the computer program stored in the memory, so that the terminal executes the single-image relighting method based on prior knowledge.
As described above, the single-image relighting method, system, terminal and storage medium based on prior knowledge of the present application have the following beneficial effects:
1) There are no prerequisites: the steps are simple, a single picture of the given object suffices, and there is no need to photograph all light rays in a dome light field in advance;
2) Rendering is fast: the convolutional neural network can run in real time on existing hardware;
3) Little storage is occupied: compared with the several gigabytes a dome light field requires to relight a single object, the space occupied by the invention is only tens of megabytes.
Drawings
Fig. 1 is a schematic flowchart of a single-image relighting method based on prior knowledge in an embodiment of the present application.
Fig. 2 is a schematic structural diagram of a single-image relighting system based on prior knowledge in an embodiment of the present application.
Fig. 3 is a schematic structural diagram of an electronic terminal according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application is provided by way of specific examples, and other advantages and effects of the present application will be readily apparent to those skilled in the art from the disclosure herein. The present application is capable of other and different embodiments and its several details are capable of modifications and/or changes in various respects, all without departing from the spirit of the present application. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It is noted that in the following description, reference is made to the accompanying drawings, which illustrate several embodiments of the present application. It is to be understood that other embodiments may be utilized and that mechanical, structural, electrical, and operational changes may be made without departing from the spirit and scope of the present application. The following detailed description is not to be taken in a limiting sense, and the scope of embodiments of the present application is defined only by the claims of the issued patent. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. Spatially relative terms, such as "upper," "lower," "left," "right," "below," "above," and the like, may be used herein to facilitate describing one element or feature's relationship to another element or feature as illustrated in the figures.
In this application, unless expressly stated or limited otherwise, the terms "mounted," "connected," "secured," "retained," and the like are to be construed broadly and can, for example, be fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art as appropriate.
Also, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," and/or "including" specify the presence of stated features, operations, elements, components, items, species, and/or groups, but do not preclude the presence, or addition of one or more other features, operations, elements, components, items, species, and/or groups thereof. It should be further understood that the terms "or" and/or "as used herein are to be interpreted as being inclusive or meaning any one or any combination. Thus, "A, B or C" or "A, B and/or C" means "any of the following: a; b; c; a and B; a and C; b and C; A. b and C ". An exception to this definition will occur only when a combination of elements, functions or operations are inherently mutually exclusive in some way.
The method uses the dome light field to pre-train the convolutional neural network, endowing the network with prior knowledge of the shading of a certain class of objects. The method can then quickly relight other objects of the same type, effectively solving the problems of the traditional relighting technology, which needs to photograph all light rays inside the device in advance and is therefore time-consuming, data-heavy and slow to render. In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions in the embodiments of the present invention are further described in detail below in conjunction with the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit it.
Fig. 1 shows a schematic flowchart of a single-image relighting method based on prior knowledge in an embodiment of the present invention. It should be noted that the single-image relighting method of the present invention can be applied to computer devices, such as desktop computers, notebook computers, tablet computers, smart phones, smart bracelets, smart watches, smart helmets, and the like; it can also be applied to servers, which may be arranged on one or more physical servers according to factors such as function and load, or may be formed from a distributed or centralized server cluster.
In the present embodiment, the single-image relighting method based on prior knowledge mainly includes steps S11-S14, each of which is further explained below.
Step S11: an object picture to be relit and a panoramic picture serving as the ambient light are given. It should be noted that ambient light refers to the various kinds of light used for illumination in daily life, such as the artificial light or scene lighting of dramatic creation; many lighting conditions can be created with ambient light, such as overcast, sunny, day, night, dusk and dawn. Common light, such as sunlight, the light produced by lamps and other light sources, and the light reflected by the glass curtain walls of buildings, is also ambient light.
Step S12: the object picture and the panoramic picture are input into a convolutional neural network based on prior knowledge.
A convolutional neural network is a feedforward neural network with a deep structure that includes convolutional computation; it is constructed in imitation of the biological mechanism of visual perception and can perform both supervised and unsupervised learning. In general, the structure of a convolutional neural network includes an input layer, hidden layers, and an output layer.
The input layer can process multidimensional data. Commonly, the input layer of a one-dimensional convolutional neural network receives a one-dimensional or two-dimensional array (the one-dimensional array is usually samples over time or frequency; the two-dimensional array may include multiple channels); the input layer of a two-dimensional convolutional neural network receives a two-dimensional or three-dimensional array; and the input layer of a three-dimensional convolutional neural network receives a four-dimensional array.
The hidden layers comprise three common structures: convolutional layers, pooling layers and fully connected layers. The convolutional layer extracts features from the input data; it contains multiple convolution kernels, and each element of a kernel corresponds to a weight coefficient and a bias, analogous to a neuron of a feedforward neural network. Excitation functions (e.g., the ReLU function) are included in the convolutional layer to help express complex features. After the convolutional layer has extracted features, its output feature map is passed to the pooling layer for feature selection and information filtering; the pooling layer contains a preset pooling function, which replaces the result at a single point of the feature map with a statistic of its neighboring region. The fully connected layer is located at the end of the hidden layers and only passes signals to other fully connected layers; its function is to combine the extracted features nonlinearly into an output, completing the learning target with the existing high-order features.
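The three hidden-layer structures just described can be sketched as a minimal PyTorch pipeline. This is an illustrative example only, not the patent's network; the channel counts and layer sizes are assumptions chosen for clarity.

```python
import torch
import torch.nn as nn

# convolution (feature extraction) -> excitation (ReLU) -> pooling
# (feature selection / downsampling) -> fully connected (nonlinear
# combination of features into an output)
block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolutional layer
    nn.ReLU(),                                   # excitation function
    nn.MaxPool2d(2),                             # pooling layer: 64x64 -> 32x32
    nn.Flatten(),
    nn.Linear(16 * 32 * 32, 10),                 # fully connected layer
)

x = torch.randn(1, 3, 64, 64)  # one 64x64 RGB image
y = block(x)
print(y.shape)                 # torch.Size([1, 10])
```

The pooling layer halves the spatial resolution, so the fully connected layer's input size (16 × 32 × 32) follows directly from the 64 × 64 input.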
Upstream of the output layer is typically a fully-connected layer at the end of the hidden layer, which outputs the classification labels using a logistic function or a normalized exponential function (softmax function); in the object recognition problem, the output layer may be designed to output the center coordinates, size, and classification of the object; in the image semantic segmentation, the output layer directly outputs the classification result of each pixel.
The training of a convolutional neural network has two main stages. The first is the propagation of data from low layers to high layers, i.e., the forward-propagation stage; the second, entered when the result of the current propagation does not match the expectation, propagates the error from high layers back to low layers, i.e., the backpropagation stage. The specific training steps are: 1. the network initializes its weights; 2. the input data is propagated forward through the convolutional layers, down-sampling layers and fully connected layers to obtain an output value; 3. the error between the network's output value and the target value is computed; 4. when the error is larger than expected, it is propagated back through the network, obtaining in turn the errors of the fully connected layers, down-sampling layers and convolutional layers (the error of each layer can be understood as its contribution to the network's total error); 5. the weights are updated and, if the error is still larger than expected, training returns to step 2; 6. if the total error is equal to or less than the expected value, training ends.
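The two-stage loop above can be sketched with a toy model in PyTorch. The model, data and hyperparameters here are hypothetical placeholders; the point is only to show forward propagation, error computation, backpropagation and the repeat-until-converged loop.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Step 1: weights are initialized when the layers are constructed.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()

x = torch.randn(32, 8)        # toy input data
target = torch.randn(32, 1)   # toy target values

initial_error = loss_fn(model(x), target).item()
for _ in range(200):                  # steps 5-6: loop until error is small enough
    opt.zero_grad()
    loss = loss_fn(model(x), target)  # steps 2-3: forward pass, compute error
    loss.backward()                   # step 4: propagate error back layer by layer
    opt.step()                        # step 5: update the weights
final_error = loss.item()
```

After enough iterations the error on the fixed toy batch drops below its initial value, which is exactly the termination criterion of step 6.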
In this embodiment, the training process of the prior-knowledge-based convolutional neural network includes: acquiring a plurality of images of a photographed object with a dome light field, and training the convolutional neural network with these images as the training set. The dome light field may be a spherical photographic lighting system comprising about 150 controllable LED light sources with 256 brightness levels. Each light source carries 6 high-brightness LED beads, divided into two independently controlled groups, each fitted with a different polarizing film. The light sources are distributed uniformly over an aluminum-alloy frame sphere about 3 meters in diameter; all lights change brightness at up to 1000 Hz, in millisecond synchronization with one high-speed camera and thirty industrial cameras, the high-speed camera taking one shot for each brightness change. In this way the appearance of the photographed object under different illumination conditions can be collected and used as training data for the convolutional neural network.
Preferably, in the training process of the prior-knowledge-based convolutional neural network, the network is trained in advance on data captured by the dome light field for a certain class of objects, so that it acquires prior knowledge of the shading of that class.
Preferably, the training uses a multi-scale progressive mode: for example, the network is first trained with 64 × 64 input and output images, and after each convergence the output image size is doubled, until it reaches 512 × 512.
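The progressive schedule above (64 × 64 doubling to 512 × 512) amounts to a simple geometric sequence of training resolutions, which can be sketched as:

```python
# Resolutions visited by the multi-scale progressive schedule:
# start at 64x64 and double after each convergence until 512x512.
def progressive_resolutions(start=64, stop=512):
    size = start
    while size <= stop:
        yield size
        size *= 2

schedule = list(progressive_resolutions())
print(schedule)  # [64, 128, 256, 512]
```

In an actual training loop one would train to convergence at each resolution in `schedule` before moving to the next; that loop is omitted here.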
Step S13: the high-dimensional features of the object in the object picture and of the ambient light are extracted with the encoder of the convolutional neural network.
For example, the pre-trained convolutional neural network includes an encoder and a decoder. The encoder encodes a 512 × 512 input image into 512 feature layers of size 16 × 16 through several convolutional layers, activation layers and pooling layers. Three of these 512 layers are designated, by construction, as the ambient light in which the input image is located, i.e., a 16 × 16 unwrapped ambient-light map, so that the ambient light can subsequently be replaced.
It should be noted that skip connections are generally used in residual networks, where their role is to mitigate exploding and vanishing gradients when training deeper networks. In an ordinary neural network, layer l + 2 is reached only through layer l + 1; in a residual block, the activation a[l] is passed directly to layer l + 2, so that a[l + 2] = g(z[l + 2] + a[l]). That is, the activation a[l + 2] depends not only on z[l + 2] but also on the activation a[l] of layer l: this is a skip connection.
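The formula a[l + 2] = g(z[l + 2] + a[l]) corresponds directly to a minimal residual block; the sketch below is a generic illustration of that formula, not the patent's encoder-decoder skip connections.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Implements a[l+2] = g(z[l+2] + a[l]): the input skips two layers."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.g = nn.ReLU()  # the activation function g

    def forward(self, a_l):
        z = self.conv2(self.g(self.conv1(a_l)))  # z[l+2], two layers deep
        return self.g(z + a_l)                   # add a[l] via the skip connection

x = torch.randn(1, 8, 16, 16)
out = ResidualBlock(8)(x)
print(out.shape)  # torch.Size([1, 8, 16, 16]) -- shape preserved by the skip
```

Because the skip adds the input directly to the output, gradients have a short path back to earlier layers, which is why such connections stabilize deep-network training.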
Step S14: the extracted ambient light is replaced with the given ambient light, and the picture serving as the ambient light and the extracted high-dimensional features are input into the decoder of the convolutional neural network, with skip connections arranged between the encoder and the decoder, so as to obtain the relit picture.
The decoder of the pre-trained convolutional neural network decodes the 512 feature layers of size 16 × 16, together with the corresponding skip connections, into a 512 × 512 output image through several upsampling layers, convolutional layers and activation layers. As above, 3 of the 512 layers are designated as the ambient light in which the input image is located, i.e., a 16 × 16 unwrapped ambient-light map, so that the ambient light can be replaced. In the task of face relighting, besides the 3 layers of 512 × 512 representing color, the output additionally requires 2 layers separating face skin from hair, so that the network can recognize the difference between the two and prevent flickering.
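A hedged sketch of such a decoder is given below: upsampling + convolution + activation stages mapping the 512 × 16 × 16 features back to 512 × 512. The 5 output channels follow the face-relighting case just described (3 color layers plus a 2-layer skin/hair segmentation); the channel widths are assumptions, and the encoder-decoder skip connections are omitted for brevity.

```python
import torch
import torch.nn as nn

def up(cin, cout):
    # one decoder stage: upsampling -> convolution -> activation
    return nn.Sequential(
        nn.Upsample(scale_factor=2),
        nn.Conv2d(cin, cout, 3, padding=1),
        nn.ReLU())

decoder = nn.Sequential(
    up(512, 256),  # 16  -> 32
    up(256, 128),  # 32  -> 64
    up(128, 64),   # 64  -> 128
    up(64, 32),    # 128 -> 256
    up(32, 5),     # 256 -> 512; 3 color + 2 segmentation channels
)

out = decoder(torch.randn(1, 512, 16, 16))
print(out.shape)  # torch.Size([1, 5, 512, 512])
```

In the full network, each `up` stage would also concatenate or add the matching encoder activation (the skip connection) before its convolution.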
In some examples, the convolutional neural network also sets a loss function, which evaluates the degree to which the model's predicted value differs from the true value: the better the loss function, generally, the better the model's performance. In the convolutional neural network of the present embodiment, the loss function can be constructed from: the log-space 1-norm distance between the ambient light output by the encoder from the input image and the real ambient light; the 1-norm distance and multi-scale structural similarity between the input image and the output image produced by the decoder without replacing the ambient light; the 1-norm distance and multi-scale structural similarity between the real image and the output image produced by the decoder after replacing the ambient light; a generative adversarial network loss between the real image and the output image produced by the decoder after replacing the ambient light; the cross-entropy loss between the output segmentation and the real segmentation; and, for two adjacent frames with video continuity, the 1-norm distance and multi-scale structural similarity between the two output images produced by the decoder after replacing the ambient light, once the corresponding optical-flow transformation has been applied.
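Two of the loss terms listed above can be sketched as follows: the log-space 1-norm distance on the predicted ambient light, and the plain 1-norm reconstruction distance on a relit output. The function names and the `eps` stabilizer are assumptions; the MS-SSIM, GAN, cross-entropy and optical-flow terms are omitted for brevity.

```python
import torch
import torch.nn.functional as F

def ambient_light_loss(pred_light, true_light, eps=1e-6):
    # log-space L1 distance between predicted and real ambient light;
    # eps avoids log(0) on dark pixels (an implementation assumption).
    return F.l1_loss(torch.log(pred_light + eps), torch.log(true_light + eps))

def reconstruction_loss(output, reference):
    # plain L1 distance between a decoder output and the reference image
    return F.l1_loss(output, reference)

pred = torch.rand(1, 3, 16, 16)   # predicted 16x16 ambient-light map
true = torch.rand(1, 3, 16, 16)   # ground-truth ambient-light map
total = ambient_light_loss(pred, true) + reconstruction_loss(pred, true)
print(total.item() >= 0)  # True: both terms are non-negative distances
```

In practice the individual terms would be weighted and summed into one scalar objective before backpropagation.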
Fig. 2 shows a schematic structural diagram of a single-image relighting system based on prior knowledge in an embodiment of the present invention. The single-image relighting system 200 of the present embodiment includes a picture input module 201, a feature extraction module 202, and a relighting module 203.
The picture input module 201 is used for giving an object picture to be relit and a panoramic picture serving as the ambient light, and for inputting the object picture and the panoramic picture into a convolutional neural network based on prior knowledge.
As described above, a convolutional neural network is a feedforward neural network with a deep structure that includes convolutional computation; it is constructed in imitation of the biological mechanism of visual perception and can perform both supervised and unsupervised learning. Its structure, comprising an input layer, hidden layers and an output layer, has already been described in detail and is not repeated here.
The feature extraction module 202 is configured to extract, through the encoder of the convolutional neural network, the high-dimensional features of the object in the object picture and of the ambient light.
For example, the pre-trained convolutional neural network includes an encoder and a decoder. The encoder encodes a 512 × 512 input image into 512 feature layers of size 16 × 16 through several convolutional layers, activation layers and pooling layers. Three of these 512 layers are designated, by construction, as the ambient light in which the input image is located, i.e., a 16 × 16 unwrapped ambient-light map, so that the ambient light can subsequently be replaced.
It should be noted that skip connections are generally used in residual networks, where their role is to mitigate exploding and vanishing gradients when training deeper networks. In an ordinary neural network, layer l + 2 is reached only through layer l + 1; in a residual block, the activation a[l] is passed directly to layer l + 2, so that a[l + 2] = g(z[l + 2] + a[l]). That is, the activation a[l + 2] depends not only on z[l + 2] but also on the activation a[l] of layer l: this is a skip connection.
The relighting module 203 is configured to replace the extracted ambient light with the given ambient light and to input the picture serving as the ambient light and the extracted high-dimensional features into the decoder of the convolutional neural network, with skip connections arranged between the encoder and the decoder, thereby obtaining the relit picture.
The decoder of the trained convolutional neural network decodes the 512 feature layers of size 16 × 16, together with the corresponding skip connections, into a 512 × 512 output image through several upsampling layers, convolutional layers and activation layers. As above, 3 of the 512 layers are designated as the ambient light in which the input image is located, i.e., a 16 × 16 unwrapped ambient-light map, so that the ambient light can be replaced. In the task of face relighting, besides the 3 layers of 512 × 512 representing color, the output additionally requires 2 layers separating face skin from hair, so that the network can recognize the difference between the two and prevent flickering.
In some examples, the system 200 further includes a loss function module 204, configured to evaluate the degree to which the model's predicted value differs from the true value: the better the loss function, generally, the better the model's performance. In the convolutional neural network of the present embodiment, the loss function can be constructed from: the log-space 1-norm distance between the ambient light output by the encoder from the input image and the real ambient light; the 1-norm distance and multi-scale structural similarity between the input image and the output image produced by the decoder without replacing the ambient light; the 1-norm distance and multi-scale structural similarity between the real image and the output image produced by the decoder after replacing the ambient light; a generative adversarial network loss between the real image and the output image produced by the decoder after replacing the ambient light; the cross-entropy loss between the output segmentation and the real segmentation; and, for two adjacent frames with video continuity, the 1-norm distance and multi-scale structural similarity between the two output images produced by the decoder after replacing the ambient light, once the corresponding optical-flow transformation has been applied.
It should be understood that the division of the above apparatus into modules is merely a logical division; in an actual implementation the modules may be wholly or partially integrated into one physical entity, or kept physically separate. The modules may all be implemented as software invoked by a processing element, entirely in hardware, or partly as software invoked by a processing element and partly in hardware. For example, the x module may be a separately established processing element, may be integrated into a chip of the apparatus, or may be stored in a memory of the apparatus as program code whose function is called and executed by a processing element of the apparatus. The other modules are implemented similarly. In addition, all or some of these modules may be integrated together or implemented independently. The processing element described herein may be an integrated circuit with signal processing capability. In implementation, each step of the above method, or each of the above modules, may be completed by an integrated logic circuit of hardware in a processor element or by instructions in the form of software.
For example, the above modules may be one or more integrated circuits configured to implement the above method, such as one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field-programmable gate arrays (FPGAs). For another example, when one of the above modules is implemented as program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor capable of calling program code. For another example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SoC).
Fig. 3 is a schematic structural diagram of an electronic terminal according to an embodiment of the present invention. This embodiment provides an electronic terminal comprising a processor 31, a memory 32, and a communicator 33. The memory 32 is connected to the processor 31 and the communicator 33 through a system bus to complete mutual communication; the memory 32 is used for storing a computer program, the communicator 33 is used for communicating with other devices, and the processor 31 is used for running the computer program so that the electronic terminal executes the steps of the above single-picture relighting method based on prior knowledge.
The above-mentioned system bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus. The communication interface is used to enable communication between the database access device and other equipment (such as a client, a read-write library, and a read-only library). The memory may include random access memory (RAM) and may also include non-volatile memory, such as at least one disk memory.
The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The present invention also provides a computer-readable storage medium on which a computer program is stored, which, when executed by a processor, implements the above single-picture relighting method based on prior knowledge.
Those of ordinary skill in the art will understand that all or part of the steps for implementing the above method embodiments may be completed by hardware associated with a computer program. The aforementioned computer program may be stored in a computer-readable storage medium; when executed, it performs the steps of the above method embodiments. The aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical disks.
In the embodiments provided herein, the computer-readable and -writable storage medium may include read-only memory, random access memory, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, a USB flash drive, a removable hard disk, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber-optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber-optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable and -writable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are intended to be non-transitory, tangible storage media. Disk and disc, as used in this application, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers.
In summary, the present application provides a single-picture relighting method, system, terminal, and storage medium based on prior knowledge. 1) The method has no prerequisites and its steps are simple: it can be completed with only a single picture of the given object, without capturing the object in advance under every lighting direction of a dome light field. 2) The rendering speed is high: the convolutional neural network can run in real time on existing hardware. 3) The storage footprint is small: compared with the several GB a dome light field requires to relight a single object, the present invention occupies only tens of MB. Therefore, the present application effectively overcomes various defects in the prior art and has high industrial utilization value.
The above embodiments are merely illustrative of the principles and utilities of the present application and are not intended to limit the application. Any person skilled in the art can modify or change the above-described embodiments without departing from the spirit and scope of the present application. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical concepts disclosed in the present application shall be covered by the claims of the present application.

Claims (10)

1. A single-picture relighting method based on prior knowledge, characterized by comprising the following steps:
providing an object picture to be relit and a panoramic picture serving as the ambient light;
inputting the object picture and the panoramic picture into a convolutional neural network based on prior knowledge;
extracting, through an encoder of the convolutional neural network, the high-dimensional features of the object in the object picture and the ambient light;
and replacing the extracted ambient light with the given ambient light, and inputting the picture serving as the ambient light and the extracted high-dimensional features into a decoder of the convolutional neural network, skip connections being arranged between the encoder and the decoder, so as to obtain the relit picture.
2. The single-picture relighting method based on prior knowledge according to claim 1, wherein the training process of the convolutional neural network based on prior knowledge comprises: acquiring, based on a dome light field, a plurality of images of a photographed object under different illumination conditions; and training the convolutional neural network using the plurality of images as a training set.
3. The method according to claim 2, further comprising training the convolutional neural network in a multi-scale progressive mode, which comprises: first training the convolutional neural network with input images of size 64 × 64; and, after convergence, doubling the image size each time and training the convolutional neural network again, until the image size reaches 512 × 512.
4. The single-picture relighting method based on prior knowledge according to claim 1, wherein the process of extracting the high-dimensional features of the object in the object picture and the ambient light comprises: the encoder encodes an input image of size 512 × 512 into 512 feature layers of size 16 × 16 through several convolutional layers, activation layers, and pooling layers, wherein 3 of the 512 feature layers of size 16 × 16 are set as the ambient light in which the input image is located.
5. The single-picture relighting method based on prior knowledge according to claim 1, wherein the relit picture is obtained as follows: the decoder in the trained convolutional neural network decodes the 512 feature layers of size 16 × 16 and the corresponding skip connections into a 512 × 512 output image through several upsampling layers, convolutional layers, and activation layers.
6. The single-picture relighting method based on prior knowledge according to claim 1, wherein the convolutional neural network further sets a loss function; the loss function comprises any one or a combination of the following: the log-scale L1 distance between the ambient light estimated by the encoder from the input image and the ground-truth ambient light; the L1 distance and multi-scale structural similarity between the input image and the output image decoded without replacing the ambient light; the L1 distance and multi-scale structural similarity between the ground-truth image and the output image decoded after replacing the ambient light; a generative adversarial network loss between the output image decoded after replacing the ambient light and the ground-truth image; the cross-entropy loss between the output segmentation and the ground-truth segmentation; and, for two temporally adjacent video frames, the L1 distance and multi-scale structural similarity between the two output images decoded after replacing the ambient light, after applying the corresponding optical-flow transformation.
7. A single-picture relighting system based on prior knowledge, comprising:
a picture input module, configured to provide an object picture to be relit and a panoramic picture serving as the ambient light, and to input the object picture and the panoramic picture into a convolutional neural network based on prior knowledge;
a feature extraction module, configured to extract, through an encoder of the convolutional neural network, the high-dimensional features of the object in the object picture and the ambient light;
and a relighting module, configured to replace the extracted ambient light with the given ambient light and input the picture serving as the ambient light and the extracted high-dimensional features into a decoder of the convolutional neural network, skip connections being arranged between the encoder and the decoder, so as to obtain the relit picture.
8. The single-picture relighting system based on prior knowledge according to claim 7, wherein the system further comprises a loss function module for setting a loss function for the convolutional neural network, the loss function comprising any one or a combination of the following: the log-scale L1 distance between the ambient light estimated by the encoder from the input image and the ground-truth ambient light; the L1 distance and multi-scale structural similarity between the input image and the output image decoded without replacing the ambient light; the L1 distance and multi-scale structural similarity between the ground-truth image and the output image decoded after replacing the ambient light; a generative adversarial network loss between the output image decoded after replacing the ambient light and the ground-truth image; the cross-entropy loss between the output segmentation and the ground-truth segmentation; and, for two temporally adjacent video frames, the L1 distance and multi-scale structural similarity between the two output images decoded after replacing the ambient light, after applying the corresponding optical-flow transformation.
9. A computer-readable storage medium on which a computer program is stored, wherein, when executed by a processor, the computer program implements the single-picture relighting method based on prior knowledge according to any one of claims 1 to 6.
10. An electronic terminal, comprising: a processor and a memory;
the memory is used for storing a computer program;
the processor is configured to execute the computer program stored in the memory to cause the terminal to execute the single-picture relighting method based on prior knowledge according to any one of claims 1 to 6.
CN202110731039.0A 2021-06-30 2021-06-30 Single-picture relighting method, system, terminal and storage medium based on prior knowledge Pending CN113409186A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110731039.0A CN113409186A (en) 2021-06-30 2021-06-30 Single-picture relighting method, system, terminal and storage medium based on prior knowledge

Publications (1)

Publication Number Publication Date
CN113409186A true CN113409186A (en) 2021-09-17

Family

ID=77680258

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110731039.0A Pending CN113409186A (en) Single-picture relighting method, system, terminal and storage medium based on prior knowledge

Country Status (1)

Country Link
CN (1) CN113409186A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111815750A (en) * 2020-06-30 2020-10-23 深圳市商汤科技有限公司 Method and device for polishing image, electronic equipment and storage medium
CN112183637A (en) * 2020-09-29 2021-01-05 中科方寸知微(南京)科技有限公司 Single-light-source scene illumination re-rendering method and system based on neural network
CN112287779A (en) * 2020-10-19 2021-01-29 华南农业大学 Low-illuminance image natural illuminance reinforcing method and application

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114549756A (en) * 2022-02-22 2022-05-27 清华大学 Three-dimensional reconstruction method and device under uncontrollable illumination condition
CN117319807A (en) * 2023-11-30 2023-12-29 南昌菱形信息技术有限公司 Light and shadow imaging method and system for karst cave dome
CN117319807B (en) * 2023-11-30 2024-02-02 南昌菱形信息技术有限公司 Light and shadow imaging method and system for karst cave dome

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination