CN112184586A

CN112184586A - Method and system for rapidly blurring monocular visual image background based on depth perception

Info

Publication number: CN112184586A
Application number: CN202011049747.8A
Authority: CN
Inventors: 冷聪; 李成华; 乔聪玉; 程健
Original assignee: Nanjing Artificial Intelligence Chip Innovation Institute Institute Of Automation Chinese Academy Of Sciences; Zhongke Fangcun Zhiwei Nanjing Technology Co ltd
Current assignee: Nanjing Artificial Intelligence Chip Innovation Institute Institute Of Automation Chinese Academy Of Sciences; Zhongke Fangcun Zhiwei Nanjing Technology Co ltd
Priority date: 2020-09-29
Filing date: 2020-09-29
Publication date: 2021-01-05

Abstract

The invention provides a depth-perception-based method for rapidly blurring the background of a monocular visual image, and a system implementing the method. The invention mitigates the poor generalization caused by over-reliance on a data set of a single scene type. It trains an end-to-end neural network on a large data set, thereby reducing complexity and improving running speed.

Description

Method and system for rapidly blurring monocular visual image background based on depth perception

Technical Field

The invention relates to a bokeh effect rendering method and system based on a generative adversarial network, relates to general image data processing and image reconstruction technology based on deep learning, and particularly relates to the field of bokeh effect processing and analysis based on neural networks.

Background

Background blurring is an important technique in the field of photography, and in general, a photographer blurs an uninteresting portion of an image, that is, blurs a background, in order to highlight a certain region in the image.

In the prior art, background blurring algorithms are either limited to data containing portraits, or train an end-to-end network using depth estimation and image saliency segmentation as prior knowledge together with the original image as input data. Both approaches have obvious drawbacks. The data sets of the first category do not contain general scenes such as natural scenery, so these algorithms generally do not work on images other than portraits. The second category, which merely feeds depth estimation and image saliency as prior knowledge into end-to-end neural network training, has poor interpretability and runs slowly.

Disclosure of Invention

The purpose of the invention is as follows: an object is to provide a method for fast blurring a monocular visual image background based on depth perception, so as to solve the above problems in the prior art. A further object is to propose a system implementing the above method.

The technical scheme is as follows: a method for rapidly blurring the background of a monocular visual image based on depth perception comprises the following steps:

step one, establishing a model training data set;

step two, constructing a depth-aware monocular visual image training network;

step three, receiving, by the trained neural network model, the picture to be bokeh-rendered, and outputting the picture after bokeh rendering is finished.

In a further embodiment, the first step is further: establishing the picture data set used for model training in step two. So that the model learns from data found in real scenes, all pictures in the data set are real-scene pictures. The pictures in the data set are stored in pairs, that is, each original picture is stored together with the corresponding picture that has the bokeh rendering effect. The original pictures in the data set are used as input data during model training, and the pictures with the bokeh effect are used as comparison data against which the model output pictures are compared during training.
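As an illustration of the paired storage described above, the following minimal sketch pairs each original picture with its background-blurred counterpart. The directory layout and the rule of matching pairs by identical file name are assumptions made for this example; the text does not specify how the pairs are stored.

```python
from pathlib import Path
from typing import List, Tuple


def pair_images(original_dir: str, bokeh_dir: str) -> List[Tuple[Path, Path]]:
    """Pair each original picture with the comparison picture of the same file name.

    Assumption: both directories use identical file names for a pair.
    Pictures without a counterpart are skipped.
    """
    bokeh_by_name = {p.name: p for p in Path(bokeh_dir).iterdir() if p.is_file()}
    pairs = []
    for orig in sorted(Path(original_dir).iterdir()):
        if orig.is_file() and orig.name in bokeh_by_name:
            # (input data, comparison data) in the sense used for training
            pairs.append((orig, bokeh_by_name[orig.name]))
    return pairs
```

Each returned tuple holds the input picture first and the comparison picture second.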

In a further embodiment, the second step is further:

A depth-aware monocular visual image training network is constructed and trained to form a network model that generates the bokeh rendering. The network is trained as follows: first, it receives the input picture data $I_{org}$ from the data set established in step one; second, feature maps are extracted by the designed convolutional layers working together with an activation function; then, the weights are learned under the loss function; finally, the preset degree of blur is produced by iterating the blur function, and the final image is output as the weighted sum of the weights and the iterated blur results. Training in this way yields a network model whose output picture is $I_{bokeh}$, the picture rendered with the bokeh effect. The model for generating the picture with the bokeh rendering effect can be constructed as:

$$I_{bokeh} = \sum_{i} W_i \odot B_i(I_{org})$$

That is, the comparison data in the data set created in step one is regarded as a weighted sum of smoothed versions of the input data. Here $I_{org}$ denotes the original image, $\odot$ denotes element-wise matrix multiplication, $B_i(\cdot)$ denotes the i-th order blur function, and $W_i$ denotes the feature weight matrix of the i-th layer data image. The matrices $W_i$ also satisfy:

[constraint formula not reproduced in the source text]

The i-th order blur function $B_i(\cdot)$ is obtained by iterating a shallow blur neural network, written here as $\phi$, i times:

$$B_i(I_{org}) = \underbrace{\phi \circ \phi \circ \cdots \circ \phi}_{i\ \text{times}}(I_{org})$$

In the training process, the loss function $l$ combines a reconstruction function with the structural similarity SSIM, which tightens the comparison between the output data and the comparison data. Back-propagating the error value reduces the difference between the model output and the actual data, so that the model is better optimized. $l$ is specifically:

[loss formula not reproduced in the source text]

where $I_{bokeh}$ denotes the image with the bokeh effect generated by the model, $I_{gt}$ denotes the image in the data set that actually carries the bokeh effect, and $\mathrm{SSIM}(I_{bokeh}, I_{gt})$ denotes the structural similarity between the generated image and the actual image:

$$\mathrm{SSIM}(I_{bokeh}, I_{gt}) = \left[l(I_{bokeh}, I_{gt})\right]^{\alpha} \cdot \left[c(I_{bokeh}, I_{gt})\right]^{\beta} \cdot \left[s(I_{bokeh}, I_{gt})\right]^{\gamma}$$

where $\alpha$, $\beta$ and $\gamma$ are preset constants, and $l(I_{bokeh}, I_{gt})$, $c(I_{bokeh}, I_{gt})$ and $s(I_{bokeh}, I_{gt})$ denote the luminance relationship, the contrast relationship and the structural relationship, respectively, between the generated image $I_{bokeh}$ and the actual image $I_{gt}$.
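The weighted-sum model described above, in which the output is a per-pixel blend of increasingly smoothed copies of the input, can be sketched as follows. This is an illustration under stated assumptions rather than the patented network: a 3x3 mean filter stands in for the learned shallow blur network, the weights come from a softmax over hypothetical per-pixel scores so that they sum to 1 at every pixel (the constraint on W_i is not given in this text), and B_0 is taken to be the unblurred input.

```python
import numpy as np


def box_blur(img: np.ndarray) -> np.ndarray:
    """One pass of a 3x3 mean filter with edge padding.

    Assumption: stands in for the learned shallow blur network.
    """
    h, w = img.shape[:2]
    padded = np.pad(img, ((1, 1), (1, 1), (0, 0)), mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0


def blur_stack(img: np.ndarray, levels: int) -> list:
    """Return [B_0(img), ..., B_levels(img)], where B_i applies the blur i times."""
    stack = [np.asarray(img, dtype=np.float64)]
    for _ in range(levels):
        stack.append(box_blur(stack[-1]))
    return stack


def softmax_weights(scores: np.ndarray) -> np.ndarray:
    """Turn scores of shape (levels + 1, H, W) into weights summing to 1 per pixel."""
    shifted = scores - scores.max(axis=0, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=0, keepdims=True)


def render_bokeh(img: np.ndarray, scores: np.ndarray) -> np.ndarray:
    """Compute sum_i W_i * B_i(img) with element-wise multiplication."""
    weights = softmax_weights(scores)
    stack = blur_stack(img, levels=weights.shape[0] - 1)
    # Broadcast each (H, W) weight map over the colour channels.
    return sum(w[..., None] * b for w, b in zip(weights, stack))
```

Here `img` is an H x W x C array and `scores` has one H x W map per blur level. In the method described, the weights would come from the convolutional layers rather than from externally supplied scores.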

In a further embodiment, the third step is further:

First, the bokeh rendering network model obtained in step two receives a picture whose background is to be blurred. Second, the received picture is resized by a preset multiple so that its size matches the input size accepted by the depth network in the model; an estimated image is obtained from the depth estimation network and up-sampled by deconvolution to obtain the depth map. Third, the depth map is input into the convolutional layers of the bokeh rendering network model, which, together with the activation function, extract the feature map of the received image and compute its weights; the resulting values serve as the weights for the weighted sum over the blur layers. Fourth, the shallow blur network in the bokeh rendering network model blurs the received picture by means of the blur function. Fifth, the weights and the blur function outputs are combined by the weighted sum defined in the bokeh rendering network model to obtain the values of the processed image. Finally, the bokeh rendering network model outputs the processed image values, that is, the picture rendered with the bokeh effect.

A system for rapidly blurring the background of a monocular visual image based on depth perception specifically comprises:

a first module for establishing a training set;

a second module for obtaining a network model with the bokeh effect;

a third module for realizing the bokeh effect.

In a further embodiment, the first module is further to:

The first module establishes the picture data set used for model training in the second module. The data set consists of photos that appear in pairs, each pair being a plain picture of a monocular visual image and the corresponding image with the bokeh rendering effect. The plain pictures in the data set are used as input data during model training, and the pictures with the bokeh effect are used as comparison data against which the model output pictures are compared during training. To ensure that the bokeh effect is effectively improved across different scenes, the pictures in the set show real scenes and a variety of subjects.

In a further embodiment, the second module is further to:

The module first establishes a depth estimation network for obtaining a depth map. The network receives the plain pictures of the monocular visual images in the first module as training data and takes the images with the bokeh rendering effect as comparison data for the images generated by the network, so that the resulting error is back-propagated to optimize the parameters of the depth estimation network. After the depth estimation network has been trained, the depth map it produces is input into a preset number of convolutional layers with an activation function to obtain a feature map, from which the feature-map weights are obtained, and blurred images are obtained by running the blur network for a preset number of iterations. Finally, the feature weights are combined with the blurred images by weighted summation to obtain the image that finally contains the bokeh effect, which completes the bokeh effect network model.

The training process of the depth estimation network is as follows: first, the pictures used as input image data in the first module are resized, that is, their size is adjusted by a preset multiple to obtain pictures that match the network input size; then, the processed pictures are input into the depth estimation network to generate depth estimation images; finally, the image size of the resulting depth estimation images is restored by a deconvolution layer.

The depth map generated by the depth estimation network passes through a set number of convolution and activation operations to obtain the feature-map weights; blurred images are then obtained by iterating the blur network function a preset number of times, and the image with the bokeh effect is finally obtained by weighted summation.

The model that finally generates the picture with the bokeh rendering effect can be constructed as:

$$I_{bokeh} = \sum_{i} W_i \odot B_i(I_{org})$$

That is, the comparison data in the data set created by the first module is regarded as a weighted sum of smoothed versions of the input data. Here $I_{bokeh}$ denotes the finally generated image, $I_{org}$ denotes the original image, $\odot$ denotes element-wise matrix multiplication, $B_i(\cdot)$ is the i-th order blur function, and $W_i$ denotes the feature weight matrix of the i-th layer data image.

The i-th order blur function $B_i(\cdot)$ involved is obtained by iterating a shallow blur neural network, written here as $\phi$, i times, which is expressed as:

$$B_i(I_{org}) = \underbrace{\phi \circ \phi \circ \cdots \circ \phi}_{i\ \text{times}}(I_{org})$$

In the training process, the loss function $l$ combines a reconstruction function with the structural similarity SSIM, which tightens the comparison between the output data and the comparison data and reduces the difference between the model output and the actual data, so that the model is better optimized. $l$ is specifically:

[loss formula not reproduced in the source text]

where $I_{bokeh}$ denotes the image with the bokeh effect generated by the model, $I_{gt}$ denotes the image that actually carries the bokeh effect, and $\mathrm{SSIM}(I_{bokeh}, I_{gt})$ denotes the structural similarity between the generated image and the actual image:

$$\mathrm{SSIM}(I_{bokeh}, I_{gt}) = \left[l(I_{bokeh}, I_{gt})\right]^{\alpha} \cdot \left[c(I_{bokeh}, I_{gt})\right]^{\beta} \cdot \left[s(I_{bokeh}, I_{gt})\right]^{\gamma}$$

where $\alpha$, $\beta$ and $\gamma$ are preset constants, and $l(I_{bokeh}, I_{gt})$, $c(I_{bokeh}, I_{gt})$ and $s(I_{bokeh}, I_{gt})$ denote the luminance relationship, the contrast relationship and the structural relationship, respectively, between the generated image $I_{bokeh}$ and the actual image $I_{gt}$.
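To make the role of the loss concrete, the sketch below computes a structural similarity value as the product of luminance, contrast and structure terms raised to alpha, beta and gamma, and combines it with a reconstruction term. Several points are assumptions for illustration only: the statistics are taken over the whole image instead of local windows, the stabilizing constants follow the commonly used SSIM defaults, the reconstruction term is a mean absolute error, and the combination "L1 + (1 - SSIM)" is one common choice, since the exact loss expression is not reproduced in this text.

```python
import numpy as np


def ssim_global(x, y, alpha=1.0, beta=1.0, gamma=1.0, data_range=1.0):
    """SSIM from whole-image statistics: l**alpha * c**beta * s**gamma.

    Assumption: global statistics and the customary constants (0.01, 0.03).
    Non-integer gamma yields NaN when the structure term is negative.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    c3 = c2 / 2.0
    mu_x, mu_y = x.mean(), y.mean()
    sd_x, sd_y = x.std(), y.std()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast = (2 * sd_x * sd_y + c2) / (sd_x ** 2 + sd_y ** 2 + c2)
    structure = (cov + c3) / (sd_x * sd_y + c3)
    return luminance ** alpha * contrast ** beta * structure ** gamma


def training_loss(generated, actual):
    """Assumed combination: mean absolute error plus (1 - SSIM)."""
    generated = np.asarray(generated, dtype=np.float64)
    actual = np.asarray(actual, dtype=np.float64)
    reconstruction = np.abs(generated - actual).mean()
    return reconstruction + (1.0 - ssim_global(generated, actual))
```

With identical inputs the similarity is 1 and the loss is 0; the loss grows as the generated image departs from the comparison image in either pixel values or structure.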

In a further embodiment, the third module is further to:

The module first uses the bokeh rendering network model obtained in the second module to receive a picture whose background is to be blurred, and inputs the picture into the depth estimation network in the model. Second, the depth map obtained by the depth estimation network is up-sampled by deconvolution so as to restore the size of the input image data; the depth map is then input into the convolutional layers of the bokeh rendering network model, which, together with the activation function, compute the weights for the feature map of the received image, and the resulting values serve as the weights for the weighted sum over the blur network layers. Third, the shallow blur network in the bokeh rendering network model blurs the received depth image signal of the picture whose background is to be blurred. Fourth, the weights and the blur function outputs are combined by the weighted sum defined in the bokeh rendering network model to obtain the values of the processed image. Finally, the bokeh rendering network model outputs the processed image values, that is, the picture rendered with the bokeh effect.

Advantageous effects: the invention provides a depth-perception-based method for rapidly blurring the background of a monocular visual image, and a system implementing the method. Compared with other algorithms, the background blurring algorithm adopted by the invention generalizes better, has a smaller error, and adapts to a variety of scenes. It trains an end-to-end neural network on a large data set, thereby addressing the running speed of the background blurring algorithm.

Drawings

FIG. 1 is a schematic flow chart of the method of the present invention.

Fig. 2 is a flowchart of background blurring based on portrait segmentation.

Fig. 3 is a flow chart of background blurring based on a combination of hardware and software.

Fig. 4 is a diagram of a depth estimation network architecture.

Fig. 5 is a flow chart of obtaining the weighted composite bokeh image from the depth network map.

Fig. 6 is effect diagram a, showing background blurring of a seat.

Fig. 7 is effect diagram b, showing background blurring of various leaves.

Fig. 8 is effect diagram c, showing background blurring generated by the single method.

Detailed Description

The applicant observes that a single-lens reflex camera, owing to its large aperture and its many integrated sensors, can easily achieve background blurring by adjusting a few parameters; a binocular camera can perceive a certain scene depth during shooting and can produce a background blurring effect through a depth algorithm. However, on devices with a monocular camera and few optical sensors, the problem of obtaining a fast and good background blurring effect directly from an arbitrary image has not been effectively solved.

In the prior art, bokeh rendering methods either concentrate on processing portrait photos, or use depth estimation and image saliency segmentation as prior knowledge together with the original image as input data to train an end-to-end network. For portrait photos, a semantic segmentation method is generally used to segment the person from the image, and the remaining area is then blurred. Some approaches combine hardware and software, as shown in Fig. 3. They first segment the person in the image with a neural network and generate a foreground mask when the image contains a person. They then generate a dense depth map using a sensor with dual-pixel autofocus hardware, and use this depth map together with the foreground mask for depth-dependent rendering of a shallow depth-of-field image. Still others use existing networks for portrait segmentation and depth estimation on a single image, and blur the scene outside the portrait to different degrees according to depth. As mentioned above, the data sets of the first category of methods do not contain general scenes such as natural scenery, so these algorithms generally do not work on images other than portraits. The second category, which feeds depth estimation and image saliency as prior knowledge into end-to-end neural network training, has poor interpretability and runs slowly.

In order to solve the problems in the prior art, the invention provides a monocular visual image background rapid blurring method based on depth perception and a system for realizing the method.

The present invention will be further described in detail with reference to the following examples and accompanying drawings.

In the present application, a method for fast blurring a monocular visual image background based on depth perception and a system for implementing the method are provided, wherein the method for fast blurring a monocular visual image background based on depth perception includes the following steps:

Step one, establishing a model training data set; in this step, the picture data set used for model training in step two is established. So that the model learns from data found in real scenes, all pictures in the data set are real-scene pictures. The pictures in the data set are stored in pairs, that is, each original picture is stored together with the corresponding picture that has the bokeh rendering effect. The original pictures in the data set are used as input data during model training, and the pictures with the bokeh effect are used as comparison data against which the model output pictures are compared during training.

Step two, constructing a depth-aware monocular visual image training network; this step builds a depth-aware monocular visual image training network and trains it to form a network model that generates the bokeh rendering. The network is trained as follows: first, it receives the input picture data $I_{org}$ from the data set established in step one; second, features are extracted by the designed convolutional layers; then, the preset blur effect is produced by iterating the blur function; finally, the weights are learned under the loss function. Training in this way yields a network model whose output picture is $I_{bokeh}$, the picture rendered with the bokeh effect. The model for generating the picture with the bokeh rendering effect can be constructed as:

$$I_{bokeh} = \sum_{i} W_i \odot B_i(I_{org})$$

The i-th order blur function $B_i(\cdot)$ involved is obtained by iterating a shallow blur neural network, written here as $\phi$, i times, which is expressed as:

$$B_i(I_{org}) = \underbrace{\phi \circ \phi \circ \cdots \circ \phi}_{i\ \text{times}}(I_{org})$$

During training, the loss function $l$ combines a reconstruction function with the structural similarity SSIM, which tightens the comparison between the output data and the comparison data and reduces the difference between the model output and the actual data, so that the model is better optimized. $l$ is specifically:

[loss formula not reproduced in the source text]

where $I_{bokeh}$ denotes the image with the bokeh effect generated by the model, $I_{gt}$ denotes the image that actually carries the bokeh effect, and $\mathrm{SSIM}(I_{bokeh}, I_{gt})$ denotes the structural similarity between the generated image and the actual image:

$$\mathrm{SSIM}(I_{bokeh}, I_{gt}) = \left[l(I_{bokeh}, I_{gt})\right]^{\alpha} \cdot \left[c(I_{bokeh}, I_{gt})\right]^{\beta} \cdot \left[s(I_{bokeh}, I_{gt})\right]^{\gamma}$$

where $\alpha$, $\beta$ and $\gamma$ are preset constants, and $l(I_{bokeh}, I_{gt})$, $c(I_{bokeh}, I_{gt})$ and $s(I_{bokeh}, I_{gt})$ denote the luminance relationship, the contrast relationship and the structural relationship, respectively, between the generated image $I_{bokeh}$ and the actual image $I_{gt}$.
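The luminance, contrast and structure terms are not written out in this text. For orientation only, the definitions customarily used for SSIM are given below, with $x$ and $y$ standing for the generated and the actual image, $\mu$ for the mean, $\sigma$ for the standard deviation, $\sigma_{xy}$ for the covariance, and $C_1$, $C_2$, $C_3$ for small stabilizing constants. Whether the invention uses exactly these forms and constants is an assumption.

```latex
l(x, y) = \frac{2\mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}, \qquad
c(x, y) = \frac{2\sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}, \qquad
s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3}
```

Each term equals 1 when the two images agree in the corresponding statistic, so the product reaches 1 only for images that match in luminance, contrast and structure.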

Step three, receiving, by the trained neural network model, the picture to be bokeh-rendered, and outputting the picture after bokeh rendering is finished. First, the bokeh rendering network model obtained in step two receives a picture whose background is to be blurred. Second, the convolutional layers and the activation function in the bokeh rendering network model extract the feature map of the received image and compute its weights, and the resulting values serve as the weights for the weighted sum over the blur layers. Third, the shallow blur network in the bokeh rendering network model blurs the received picture. Fourth, the weights and the blur function outputs are combined by the weighted sum defined in the bokeh rendering network model to obtain the values of the processed image. Finally, the bokeh rendering network model outputs the processed image values, that is, the picture rendered with the bokeh effect.

Based on the method, a system implementing the method can be constructed, which specifically comprises the following modules:

A first module for establishing a training set; the module establishes the picture data set used for model training in the second module. The data set consists of photos that appear in pairs, each pair being a plain picture of a monocular visual image and the corresponding image with the bokeh rendering effect. The plain pictures in the data set are used as input data during model training, and the pictures with the bokeh effect are used as comparison data against which the model output pictures are compared during training. To ensure that the bokeh effect is effectively improved across different scenes, the pictures in the set show real scenes and a variety of subjects.

A second module for obtaining a network model with the bokeh effect; the module first establishes a depth estimation network for obtaining a depth map. The network receives the plain pictures of the monocular visual images in the first module as training data and takes the images with the bokeh rendering effect as comparison data for the images generated by the network, so that the resulting error is back-propagated to optimize the parameters of the depth estimation network. After the depth estimation network has been trained, the depth map it produces is input into a preset number of convolutional layers with an activation function to obtain a feature map, from which the feature-map weights are obtained, and blurred images are obtained by running the blur network for a preset number of iterations. Finally, the feature weights are combined with the blurred images by weighted summation to obtain the image that finally contains the bokeh effect, which completes the bokeh effect network model.

The training process of the depth estimation network is as follows: first, the pictures used as input images in the first module are resized, that is, their size is adjusted by a set multiple to obtain pictures that match the network input size; then, the processed pictures are input into the depth estimation network to generate depth estimation images; finally, the image size of the resulting depth estimation images is restored by a deconvolution layer.
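The size handling around the depth estimation network can be illustrated with a small sketch. It rests on one possible reading of "adjusting the size by a set multiple", namely that each side must be a multiple of a fixed value (32 here, a hypothetical choice) so that repeated down-sampling divides evenly. Padding and cropping stand in for the resizing and the deconvolution-based restoration described in the text; they are not the method itself.

```python
import numpy as np


def pad_to_multiple(img: np.ndarray, multiple: int = 32):
    """Pad height and width up to the next multiple; return the padded array and the original size.

    Assumption: edge padding replaces the resizing step of the text.
    """
    h, w = img.shape[:2]
    new_h = -(-h // multiple) * multiple  # ceiling division, then scale back up
    new_w = -(-w // multiple) * multiple
    pad = ((0, new_h - h), (0, new_w - w)) + ((0, 0),) * (img.ndim - 2)
    return np.pad(img, pad, mode="edge"), (h, w)


def crop_to_size(arr: np.ndarray, size) -> np.ndarray:
    """Restore the original height and width of a network output."""
    h, w = size
    return arr[:h, :w]
```

For example, a 50 x 70 picture is padded to 64 x 96 before entering the network, and the resulting depth map is cropped back to 50 x 70 afterwards.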

The i-th order blur function $B_i(\cdot)$ involved is obtained by iterating a shallow blur neural network, written here as $\phi$, i times, which is expressed as:

$$B_i(I_{org}) = \underbrace{\phi \circ \phi \circ \cdots \circ \phi}_{i\ \text{times}}(I_{org})$$

In the loss function, $I_{bokeh}$ denotes the image with the bokeh effect generated by the model, $I_{gt}$ denotes the image that actually carries the bokeh effect, and $\mathrm{SSIM}(I_{bokeh}, I_{gt})$ denotes the structural similarity between the generated image and the actual image:

$$\mathrm{SSIM}(I_{bokeh}, I_{gt}) = \left[l(I_{bokeh}, I_{gt})\right]^{\alpha} \cdot \left[c(I_{bokeh}, I_{gt})\right]^{\beta} \cdot \left[s(I_{bokeh}, I_{gt})\right]^{\gamma}$$

where $\alpha$, $\beta$ and $\gamma$ are preset constants, and $l(I_{bokeh}, I_{gt})$, $c(I_{bokeh}, I_{gt})$ and $s(I_{bokeh}, I_{gt})$ denote the luminance relationship, the contrast relationship and the structural relationship, respectively, between the generated image $I_{bokeh}$ and the actual image $I_{gt}$.

A third module for realizing the bokeh effect. The module first uses the bokeh rendering network model obtained in the second module to receive a picture whose background is to be blurred, and inputs the picture into the depth estimation network in the model. Second, the depth map obtained by the depth estimation network is up-sampled by deconvolution so as to restore the size of the input image data; the depth map is then input into the convolutional layers of the bokeh rendering network model, which, together with the activation function, compute the weights for the feature map of the received image, and the resulting values serve as the weights for the weighted sum over the blur network layers. Third, the shallow blur network in the bokeh rendering network model blurs the received depth image signal of the picture whose background is to be blurred. Fourth, the weights and the blur function outputs are combined by the weighted sum defined in the bokeh rendering network model to obtain the values of the processed image. Finally, the bokeh rendering network model outputs the processed image values, that is, the picture rendered with the bokeh effect.
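The inference flow of the third module (estimate depth, derive per-pixel weights from the depth map, blur iteratively, blend by weighted sum) can be sketched end to end as below. Every component is a hypothetical stand-in chosen so that the example runs without a trained model: `estimate_depth` returns a synthetic top-to-bottom gradient instead of a network prediction, `weights_from_depth` uses a soft assignment in place of the convolutional layers and activation function, and `mean_blur` replaces the shallow blur network. For simplicity the blur is applied to the input picture.

```python
import numpy as np


def mean_blur(img: np.ndarray) -> np.ndarray:
    """3x3 mean filter with edge padding (stand-in for the shallow blur network)."""
    h, w = img.shape[:2]
    padded = np.pad(img, ((1, 1), (1, 1), (0, 0)), mode="edge")
    acc = np.zeros(img.shape, dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            acc += padded[dy:dy + h, dx:dx + w]
    return acc / 9.0


def estimate_depth(img: np.ndarray) -> np.ndarray:
    """Stand-in for the depth estimation network: 0 at the top row, 1 at the bottom row."""
    h, w = img.shape[:2]
    return np.repeat(np.linspace(0.0, 1.0, h)[:, None], w, axis=1)


def weights_from_depth(depth: np.ndarray, levels: int, temperature: float = 0.05) -> np.ndarray:
    """Soft per-pixel assignment of depth to blur levels; weights sum to 1 over levels."""
    idx = np.arange(levels + 1, dtype=np.float64)[:, None, None]
    logits = -((depth[None] * levels - idx) ** 2) / temperature
    logits -= logits.max(axis=0, keepdims=True)
    w = np.exp(logits)
    return w / w.sum(axis=0, keepdims=True)


def blur_background(img, levels: int = 3) -> np.ndarray:
    """Blend increasingly blurred copies of img, with deeper pixels receiving more blur."""
    img = np.asarray(img, dtype=np.float64)
    weights = weights_from_depth(estimate_depth(img), levels)
    out = np.zeros_like(img)
    blurred = img
    for i in range(levels + 1):
        out += weights[i][..., None] * blurred  # W_i multiplied element-wise with B_i(img)
        blurred = mean_blur(blurred)            # next blur order
    return out
```

With the synthetic depth used here, the top row of the output stays essentially sharp while the bottom row receives the strongest blur; a trained depth network would instead keep the subject sharp wherever it appears.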

The invention can be applied to image blurring on a PC, automatic blurring during shooting on mobile devices, and similar uses, to blur the background of a picture. Figs. 6 to 8 show the results of blurring the background of photos according to the invention. In each figure the left side is the original photo and the right side is the processed photo with the bokeh effect; the effect diagrams show that the sharpness of the subject is maintained while the background is blurred.

As noted above, while the present invention has been shown and described with reference to certain preferred embodiments, it is not to be construed as limited thereto. Various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

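1. A method for rapidly blurring the background of a monocular visual image based on depth perception, characterized by comprising the following steps:

step one, establishing a model training data set;

step two, constructing a depth-aware monocular visual image training network;

step three, receiving, by the trained neural network model, the picture to be bokeh-rendered, and outputting the picture after bokeh rendering is finished.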

2. The method for fast blurring the background of a monocular visual image based on depth perception according to claim 1, wherein the first step is further as follows:

establishing the data set used for model training in step two, wherein the data set consists of real-scene pictures, each scene picture has a corresponding photo, and the corresponding photo is a picture with background blurring; the picture with background blurring is compared with the model output picture to obtain an error value.

3. The method for fast blurring the background of a monocular visual image based on depth perception according to claim 1, wherein the second step is further:

constructing a depth-aware monocular visual image training network, and training the network to form a network model that generates the bokeh rendering; the network is trained as follows: first, the input picture data in the data set established in step one is received; second, features are extracted by the designed convolutional layers working together with an activation function; then, the weights are learned under the loss function; finally, the preset degree of blur is produced by iterating the blur function, the final image is output as the weighted sum of the weights and the iterated blur results, and network training is completed according to this process to obtain a network model whose output image has the bokeh effect.

4. The method for fast blurring the background of a monocular visual image based on depth perception according to claim 1, wherein the second step is further:

the model for generating the picture with the bokeh rendering effect can be constructed as:

$$I_{bokeh} = \sum_{i} W_i \odot B_i(I_{org})$$

wherein $I_{bokeh}$ denotes the finally obtained image and $I_{org}$ denotes the original image;

the i-th order blur function $B_i(\cdot)$ involved is obtained by iterating a shallow blur neural network, written here as $\phi$, i times, which is expressed as:

$$B_i(I_{org}) = \underbrace{\phi \circ \phi \circ \cdots \circ \phi}_{i\ \text{times}}(I_{org})$$

the loss function $l$ combines a reconstruction function with the structural similarity SSIM, and the model is optimized by back-propagating the error value; wherein $l$ is specifically:

[loss formula not reproduced in the source text]

wherein $I_{bokeh}$ denotes the image with the bokeh effect generated by the model, $I_{gt}$ denotes the image that actually carries the bokeh effect, and $\mathrm{SSIM}(I_{bokeh}, I_{gt})$ denotes the structural similarity between the generated image and the actual image:

$$\mathrm{SSIM}(I_{bokeh}, I_{gt}) = \left[l(I_{bokeh}, I_{gt})\right]^{\alpha} \cdot \left[c(I_{bokeh}, I_{gt})\right]^{\beta} \cdot \left[s(I_{bokeh}, I_{gt})\right]^{\gamma}$$

wherein $\alpha$, $\beta$ and $\gamma$ are preset constants, and $l(I_{bokeh}, I_{gt})$, $c(I_{bokeh}, I_{gt})$ and $s(I_{bokeh}, I_{gt})$ denote the luminance relationship, the contrast relationship and the structural relationship, respectively, between the generated image $I_{bokeh}$ and the actual image $I_{gt}$.

5. The method for fast blurring the background of a monocular visual image based on depth perception according to claim 1, wherein the third step is further:

first, the bokeh rendering network model obtained in step two receives a picture whose background is to be blurred; second, the convolutional layers and the activation function in the bokeh rendering network model extract the feature map of the received image and compute its weights, and the resulting values serve as the weights for the weighted sum over the blur layers; third, the shallow blur network in the bokeh rendering network model blurs the received picture; fourth, the weights and the blur function outputs are combined by the weighted sum defined in the bokeh rendering network model to obtain the values of the processed image; and finally, the bokeh rendering network model outputs the processed image values, that is, the picture rendered with the bokeh effect.

6. A system for fast blurring a background of a monocular visual image based on depth perception, which is used for implementing the method according to any one of claims 1-5, and is characterized by comprising the following modules:

a first module for establishing a training set;

a second module for obtaining a network model with a bokeh effect;

and a third module for realizing the bokeh effect.

7. The system of claim 6, wherein the first module is further configured to create a picture data set for model training in the second module, the data set consisting entirely of paired photos, that is, each planar image of a monocular visual image is paired with a corresponding image having the bokeh rendering effect; the planar image of the monocular visual image in the data set is used as input data in the model training process, and the picture with the bokeh effect in the data set is used as comparison data against which the model output picture is compared in the model training process.

8. The system for rapid blurring of monocular visual image background based on depth perception according to claim 6, wherein the second module is further configured to: first, establish a depth estimation network for obtaining a depth map, the network receiving the planar images of the monocular visual images from the first module as training data and using the images with the bokeh rendering effect as comparison data for the images generated by the network, so that the parameters of the depth estimation network are optimized by back-propagating the resulting error; then, after training of the depth estimation network is finished, input the depth map obtained by the depth estimation network into a preset number of convolutional layers with activation functions to obtain a feature map and, from it, the feature-map weights, and obtain blurred images by applying the blur network for a preset number of iterations; and finally, combine the feature weights with the blurred images by a weighted sum to obtain the final image containing the bokeh effect, thereby completing the establishment of the bokeh effect network model.

9. The system of claim 6, wherein the second module is further configured to, when training the depth estimation network, first resize the pictures from the first module that serve as input image data, that is, scale each picture by a preset multiple to obtain a picture that matches the input size accepted by the network; then input the processed picture into the depth estimation network to generate a depth estimation image; and finally restore the size of the obtained depth estimation image through a deconvolution layer;

the depth map generated by the depth estimation network is passed through a set number of convolutions and activation functions to obtain the feature-map weights, and the image with the bokeh effect is then obtained by a weighted sum over the iterated outputs of a predetermined blur network function:

[formula not reproduced in this text]

wherein I_bokeh denotes the finally generated image, I_org denotes the original image, and the i-th order blur function B_i(·) is obtained by iterating a shallow blur neural network i times, which is expressed as:

[formula not reproduced in this text]

in the training process, the loss function l combines a reconstruction function with the structural similarity (SSIM), which improves the closeness analysis between the output data and the comparison data, effectively reduces the difference between the model output and the actual data, and better optimizes the model; wherein l is specifically:

[formula not reproduced in this text]

wherein I_bokeh denotes the image with the bokeh effect generated by the model, Î_bokeh denotes the actual image with the bokeh effect, and SSIM(I_bokeh, Î_bokeh) denotes the structural similarity between the generated image and the actual image, as follows:

$$\mathrm{SSIM}(I_{bokeh}, \hat{I}_{bokeh}) = \left[l(I_{bokeh}, \hat{I}_{bokeh})\right]^{\alpha} \left[c(I_{bokeh}, \hat{I}_{bokeh})\right]^{\beta} \left[s(I_{bokeh}, \hat{I}_{bokeh})\right]^{\gamma}$$

wherein α, β and γ are preset constants, l(I_bokeh, Î_bokeh) denotes the luminance relationship between the generated image and the actual image, c(I_bokeh, Î_bokeh) denotes the contrast relationship between them, and s(I_bokeh, Î_bokeh) denotes the structural relationship between them.
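
The size adjustment at the start of this claim (scale by a preset multiple, estimate depth, restore the original size) can be sketched as below. This is a simplified NumPy stand-in under stated assumptions: block averaging replaces whatever resizing the system uses, nearest-neighbour repetition replaces the learned deconvolution layer, the image is single-channel, and the factor of 2 is an arbitrary example. The function names are hypothetical.

```python
import numpy as np

def downscale(img, factor=2):
    """Shrink by an integer factor using block averaging (assumed resizing method)."""
    h, w = img.shape
    h2, w2 = h // factor, w // factor
    cropped = img[:h2 * factor, :w2 * factor]   # drop rows/cols that do not divide evenly
    return cropped.reshape(h2, factor, w2, factor).mean(axis=(1, 3))

def upscale(depth, factor=2):
    """Restore size by nearest-neighbour repetition (stand-in for deconvolution)."""
    return np.repeat(np.repeat(depth, factor, axis=0), factor, axis=1)

picture = np.arange(16, dtype=float).reshape(4, 4)
small = downscale(picture, 2)      # what the depth network would receive
restored = upscale(small, 2)       # depth map brought back to the input size
print(small.shape, restored.shape)
```

A learned transposed convolution would produce smoother results than pixel repetition; the sketch only shows how the shapes round-trip.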

10. The system of claim 6, wherein the third module is further configured as follows: first, the bokeh effect rendering network model obtained in the second module receives a picture whose background is to be blurred, scales the picture by a predetermined multiple, and inputs the scaled picture into the depth estimation network in the model; second, the depth map obtained by the depth estimation network is up-sampled by deconvolution to restore the size of the input image data, and the depth map is input into the convolution layers in the bokeh effect rendering network model, which, together with an activation function, calculate weights from the resulting image feature map, the resulting values serving as the weights for the weighted sum over the blur network layers; third, a shallow blur network in the bokeh effect rendering network model performs the blurring operation on the received depth image signal of the picture whose background is to be blurred; fourth, the obtained weights and the outputs of the blur function are combined using the weighted-sum calculation defined in the bokeh effect rendering network model to obtain the values of the processed image; and finally, the bokeh effect rendering network model outputs the processed image values, that is, the image with the bokeh effect rendered.
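
The step in which the depth map is turned into weights for the blur levels can be illustrated with the following NumPy sketch. It rests on assumptions not stated in the claim: a single per-level scale and bias play the role of a 1x1 convolution, softmax is used as the activation so that the weights at each pixel sum to one, and larger depth values are taken to mean farther from the camera. The function name and the parameter values are hypothetical.

```python
import numpy as np

def depth_to_weights(depth, scale, bias):
    """Map a depth map (H, W) to per-pixel weights (K, H, W) over K blur levels.

    Assumption: 1x1 'convolution' (scale * depth + bias) followed by softmax.
    """
    logits = scale[:, None, None] * depth[None, :, :] + bias[:, None, None]
    logits = logits - logits.max(axis=0, keepdims=True)   # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=0, keepdims=True)

depth = np.linspace(0.0, 1.0, 12).reshape(3, 4)   # 0 = near, 1 = far (assumed)
scale = np.array([-4.0, 0.0, 4.0])                # hypothetical learned parameters
bias = np.array([2.0, 0.0, -2.0])
weights = depth_to_weights(depth, scale, bias)
print(weights.shape)
```

With these example parameters, near pixels put most of their weight on the sharp level 0 and far pixels on the most blurred level, so a weighted sum over the blur levels blurs the background more than the foreground.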