CN112184586A - Method and system for rapidly blurring monocular visual image background based on depth perception - Google Patents


Info

Publication number
CN112184586A
Authority
CN
China
Prior art keywords
image
network
picture
model
depth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011049747.8A
Other languages
Chinese (zh)
Inventor
冷聪
李成华
乔聪玉
程健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Artificial Intelligence Chip Innovation Institute, Institute of Automation, Chinese Academy of Sciences
Zhongke Fangcun Zhiwei Nanjing Technology Co ltd
Original Assignee
Nanjing Artificial Intelligence Chip Innovation Institute, Institute of Automation, Chinese Academy of Sciences
Zhongke Fangcun Zhiwei Nanjing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Artificial Intelligence Chip Innovation Institute, Institute of Automation, Chinese Academy of Sciences and Zhongke Fangcun Zhiwei Nanjing Technology Co., Ltd.
Priority to CN202011049747.8A
Publication of CN112184586A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/70 Denoising; Smoothing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a method for rapidly blurring the background of a monocular visual image based on depth perception, and a system implementing the method. The invention effectively reduces the poor generalization caused by over-reliance on data sets from a single scene type, and trains an end-to-end neural network on a large number of data sets, thereby reducing complexity and improving running speed.

Description

Method and system for rapidly blurring monocular visual image background based on depth perception
Technical Field
The invention relates to a bokeh-effect rendering method and system based on a generative adversarial network, relates to general image data processing and image reconstruction technology based on deep machine learning, and particularly relates to the field of bokeh-effect processing and analysis based on neural network construction.
Background
Background blurring is an important technique in photography: to highlight a certain region in an image, a photographer blurs the uninteresting portion, that is, the background.
In the prior art, background-blurring algorithms are either limited to data containing portraits, or train an end-to-end network using depth estimation and image saliency segmentation as prior knowledge, fed together with the original image as input data. The disadvantages of both approaches are obvious. The data sets of the first category contain no images of general scenes such as natural scenery, so these algorithms generally fail on images other than portraits. The second category, which simply feeds depth estimation and image saliency as prior knowledge into end-to-end neural network training, has poor interpretability and slow running speed.
Disclosure of Invention
Purpose of the invention: one object is to provide a method for rapidly blurring the background of a monocular visual image based on depth perception, so as to solve the above problems in the prior art. A further object is to propose a system implementing the method.
Technical scheme: a method for rapidly blurring the background of a monocular visual image based on depth perception comprises the following steps:
step one, establishing a model training data set;
step two, constructing a depth-aware monocular visual image training network;
and step three, receiving, by the trained neural network model, the picture to be bokeh-rendered, and outputting the picture after the bokeh rendering is finished.
In a further embodiment, step one is further: a picture data set is established for the model training in step two; to improve learning of the data contained in real scenes, the data sets are all real-scene pictures. The pictures in the data set are stored in pairs, i.e. each original picture is stored together with the corresponding picture carrying the bokeh rendering effect. The original pictures in the data set serve as input data during model training, and the bokeh-effect pictures in the data set serve as comparison data against which the model's output pictures are compared.
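To make the paired storage concrete, the following is a minimal sketch of such a paired data set in PyTorch. The directory layout and file naming (an `original/` folder and a `bokeh/` folder with matching file names) are illustrative assumptions, not something the patent specifies.

```python
import os
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class PairedBokehDataset(Dataset):
    """Yields (original picture, bokeh-rendered picture) pairs stored side by side."""
    def __init__(self, root):
        self.orig_dir = os.path.join(root, "original")   # input data
        self.bokeh_dir = os.path.join(root, "bokeh")     # comparison data
        self.names = sorted(os.listdir(self.orig_dir))
        self.to_tensor = transforms.ToTensor()

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]
        orig = Image.open(os.path.join(self.orig_dir, name)).convert("RGB")
        bokeh = Image.open(os.path.join(self.bokeh_dir, name)).convert("RGB")
        return self.to_tensor(orig), self.to_tensor(bokeh)
```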
In a further embodiment, step two is further:
a depth-aware monocular visual image training network is constructed and trained to form a network model that generates the bokeh-effect rendering. The network is trained as follows: first, it receives the input picture data $I_{org}$ from the data set established in step one; second, feature maps are extracted by the designed convolutional layers working together with an activation function; then the learned weights are activated using a loss function; finally, the preset degree of blur is produced through iteration of the blur function, and the final image is output as the weighted sum of the weights with the iterates of the blur function. Network learning and training are completed through this process, yielding a network model that outputs the picture $I_{bokeh}$, where $I_{bokeh}$ is the picture rendered with the bokeh effect. The model generating the bokeh-rendered picture can be constructed specifically as:

$$I_{bokeh} = \sum_{i=0}^{n} W_i \odot B_i(I_{org})$$

i.e., the comparison data in the data set created in step one is regarded as a weighted sum of smoothed versions of the input data in the data set. Here $I_{org}$ denotes the original image, $\odot$ denotes element-wise matrix multiplication, $B_i(\cdot)$ denotes the $i$-th level blur function, and $W_i$ denotes the feature weight matrix of the $i$-th layer data image; the $W_i$ further satisfy

$$\sum_{i=0}^{n} W_i = \mathbf{1}.$$

The $i$-th level blur function $B_i(\cdot)$ is a shallow blur neural network $b(\cdot)$ iterated $i$ times:

$$B_i(I_{org}) = \underbrace{\,b \circ b \circ \cdots \circ b\,}_{i\ \text{times}}(I_{org}), \qquad B_0(I_{org}) = I_{org}.$$
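The weighted-sum construction above can be sketched in PyTorch as follows. The shallow blur network $b(\cdot)$, the number of blur levels $n$, the channel widths, and the use of a softmax head over the depth map to enforce $\sum_i W_i = \mathbf{1}$ are all illustrative assumptions; the patent does not fix these architectural details.

```python
import torch
import torch.nn as nn

class ShallowBlur(nn.Module):
    """b(.): a shallow blur network; a two-layer conv stack is assumed here."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 5, padding=2),
        )

    def forward(self, x):
        return self.net(x)

class BokehComposition(nn.Module):
    """I_bokeh = sum_i W_i * B_i(I_org), with B_i = b applied i times and B_0 the identity."""
    def __init__(self, n=4):
        super().__init__()
        self.n = n
        self.blur = ShallowBlur()
        # Weight head: predicts n+1 maps from the depth map; softmax enforces sum_i W_i = 1.
        self.weight_head = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, n + 1, 3, padding=1),
        )

    def forward(self, image, depth):
        weights = torch.softmax(self.weight_head(depth), dim=1)  # (B, n+1, H, W)
        out, level = torch.zeros_like(image), image              # level starts at B_0(I) = I
        for i in range(self.n + 1):
            out = out + weights[:, i:i + 1] * level              # accumulate W_i ⊙ B_i(I_org)
            level = self.blur(level)                             # advance to B_{i+1}
        return out
```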
During training, the loss function $\ell$ adopts a combination of a reconstruction term and the structural similarity SSIM, improving the closeness between the output data and the comparison data; through back-propagation of the error value, the difference between the model and the actual data is effectively reduced, so that the model is better optimized. Specifically:

$$\ell = \left\| I_{bokeh} - \hat{I}_{bokeh} \right\|_{1} + \left( 1 - \mathrm{SSIM}\left( I_{bokeh}, \hat{I}_{bokeh} \right) \right)$$

where $I_{bokeh}$ denotes the image with the bokeh effect generated by the model, $\hat{I}_{bokeh}$ denotes the original image in the data set that actually carries the bokeh effect, and $\mathrm{SSIM}(I_{bokeh}, \hat{I}_{bokeh})$ denotes the structural similarity between the generated image $I_{bokeh}$ and the actual image $\hat{I}_{bokeh}$:

$$\mathrm{SSIM}\left(I_{bokeh}, \hat{I}_{bokeh}\right) = \left[ l\left(I_{bokeh}, \hat{I}_{bokeh}\right) \right]^{\alpha} \left[ c\left(I_{bokeh}, \hat{I}_{bokeh}\right) \right]^{\beta} \left[ s\left(I_{bokeh}, \hat{I}_{bokeh}\right) \right]^{\gamma}$$

where $\alpha$, $\beta$ and $\gamma$ are preset constants, and $l(I_{bokeh}, \hat{I}_{bokeh})$, $c(I_{bokeh}, \hat{I}_{bokeh})$ and $s(I_{bokeh}, \hat{I}_{bokeh})$ denote, respectively, the luminance relationship, the contrast relationship and the structure relationship between the generated image $I_{bokeh}$ and the actual image $\hat{I}_{bokeh}$.
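A compact sketch of this loss, assuming an L1 reconstruction term, exponents $\alpha = \beta = \gamma = 1$, and a uniform averaging window for the SSIM statistics (the patent fixes none of these choices):

```python
import torch
import torch.nn.functional as F

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2, win=11):
    """SSIM as the product of luminance and contrast-structure terms (alpha=beta=gamma=1)."""
    mu_x = F.avg_pool2d(x, win, 1, win // 2)
    mu_y = F.avg_pool2d(y, win, 1, win // 2)
    var_x = F.avg_pool2d(x * x, win, 1, win // 2) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, win, 1, win // 2) - mu_y ** 2
    cov = F.avg_pool2d(x * y, win, 1, win // 2) - mu_x * mu_y
    lum = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)   # luminance term l
    cs = (2 * cov + c2) / (var_x + var_y + c2)                    # contrast-structure c*s
    return (lum * cs).mean()

def bokeh_loss(generated, target):
    """Reconstruction term (L1 assumed) plus the structural-similarity term."""
    return F.l1_loss(generated, target) + (1.0 - ssim(generated, target))
```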
In a further embodiment, step three is further:
first, the bokeh-effect rendering network model obtained in step two receives the picture whose background is to be blurred; second, the received picture is resized by a preset multiple so that its size conforms to the size the depth network in the model accepts, an estimated image is obtained from the depth estimation network, and up-sampling by deconvolution then yields the depth map; the depth map is input to the convolutional layers in the bokeh-effect rendering network model, which, combined with an activation function, extract and compute the weights of the received image's feature maps, the resulting values serving as the corresponding terms of the blur-layer weighted sum; third, the shallow blur network in the bokeh-effect rendering network model applies the blur function to the received picture signal whose background is to be blurred; next, the weighted-sum computation set in the bokeh-effect rendering network model combines the obtained weights with the blur-function outputs to obtain the values of the processed image; finally, the bokeh-effect rendering network model outputs the obtained processed image values, i.e. the image with the bokeh-effect rendering.
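The step-three pipeline might be sketched as below, reusing the hypothetical `BokehComposition` module from earlier. The multiple-of-32 size constraint and the use of bilinear resampling in place of a dedicated deconvolution layer are simplifying assumptions.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def render_bokeh(image, depth_net, composer, multiple=32):
    """Resize to a size the depth network accepts, estimate depth, restore size, compose."""
    _, _, h, w = image.shape
    new_h = (h + multiple - 1) // multiple * multiple
    new_w = (w + multiple - 1) // multiple * multiple
    resized = F.interpolate(image, size=(new_h, new_w), mode="bilinear", align_corners=False)
    depth = depth_net(resized)                                          # coarse depth estimate
    depth = F.interpolate(depth, size=(h, w), mode="bilinear", align_corners=False)
    return composer(image, depth)                                       # weighted sum of blur levels
```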
A system for rapidly blurring the background of a monocular visual image based on depth perception specifically comprises:
a first module for establishing a training set;
a second module for obtaining a network model with the bokeh effect;
and a third module for realizing the bokeh effect.
In a further embodiment, the first module is further to:
establish the picture data set used for model training in the second module. The data sets in the second module are all photo sets that appear in pairs, namely a plan view of a monocular visual image and the corresponding image with the bokeh rendering effect. The plan view of the monocular visual image in the data set serves as input data during model training, and the bokeh-effect picture in the data set serves as comparison data against which the model output picture is compared. To ensure the bokeh effect is effectively improved in different scenes, the scenes depicted in the picture set and the object scenes contained in it are real scenes with different subjects.
In a further embodiment, the second module is further to:
first establish a depth estimation network for obtaining a depth map. The network receives the plan views of the monocular visual images in the first module as training data and uses the images with the bokeh rendering effect as comparison data for the images generated by the network learning process, so that the resulting error is back-propagated to optimize the parameters of the depth estimation network. Then, after the depth estimation network is trained, the depth map it produces is input to a preset number of convolutional layers with an activation function to obtain feature maps and, from them, the feature-map weights, and blurred images are obtained through a preset number of iterations of the blur network. Finally, the feature weights are combined with the blurred images, and the image finally containing the bokeh effect is obtained by weighted summation, completing the establishment of the bokeh-effect network model.
The training process of the depth estimation network is as follows: first, the pictures serving as input image data in the first module are resized, i.e. adjusted by a preset multiple to obtain pictures that conform to the network input size; then the processed pictures are input to the depth estimation network to generate depth estimation images; finally, the obtained depth estimation images are restored to the original image size through a deconvolution layer.
The depth map generated by the depth estimation network passes through a set number of convolution and activation functions to obtain the feature-map weights; the image with the bokeh effect is then obtained through a preset number of iterations of the blur network function, followed by weighted summation.
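As a sketch of a depth estimation network whose deconvolution stage restores the image size, a toy encoder-decoder might look as follows; the 4x downsampling factor and channel widths are assumptions, not the patent's architecture.

```python
import torch.nn as nn

class TinyDepthNet(nn.Module):
    """Toy depth estimation network: strided-conv encoder, deconvolution decoder."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Deconvolution (transposed conv) layers restore the original image size.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):                       # x: (B, 3, H, W), H and W divisible by 4
        return self.decoder(self.encoder(x))    # (B, 1, H, W) depth map
```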
The model finally generating the picture with the bokeh rendering effect can be constructed specifically as:

$$I_{bokeh} = \sum_{i=0}^{n} W_i \odot B_i(I_{org})$$

i.e., the comparison data in the data set created by the first module is regarded as a weighted sum of smoothed versions of the input data in the data set. Here $I_{bokeh}$ denotes the finally generated image, $I_{org}$ the original image, $\odot$ element-wise matrix multiplication, $B_i(\cdot)$ the $i$-th level blur function, and $W_i$ the feature weight matrix of the $i$-th layer data image, the $W_i$ satisfying

$$\sum_{i=0}^{n} W_i = \mathbf{1}.$$

The $i$-th level blur function $B_i(\cdot)$ is the shallow blur neural network $b(\cdot)$ iterated $i$ times:

$$B_i(I_{org}) = \underbrace{\,b \circ b \circ \cdots \circ b\,}_{i\ \text{times}}(I_{org}), \qquad B_0(I_{org}) = I_{org}.$$
During training, the loss function $\ell$ adopts a combination of a reconstruction term and the structural similarity SSIM, improving the closeness between the output data and the comparison data, effectively reducing the difference between the model and the actual data, and thus better optimizing the model. Specifically:

$$\ell = \left\| I_{bokeh} - \hat{I}_{bokeh} \right\|_{1} + \left( 1 - \mathrm{SSIM}\left( I_{bokeh}, \hat{I}_{bokeh} \right) \right)$$

where $I_{bokeh}$ denotes the image with the bokeh effect generated by the model, $\hat{I}_{bokeh}$ denotes the original image with the actual bokeh effect, and $\mathrm{SSIM}(I_{bokeh}, \hat{I}_{bokeh})$ denotes the structural similarity between the generated image $I_{bokeh}$ and the actual image $\hat{I}_{bokeh}$:

$$\mathrm{SSIM}\left(I_{bokeh}, \hat{I}_{bokeh}\right) = \left[ l\left(I_{bokeh}, \hat{I}_{bokeh}\right) \right]^{\alpha} \left[ c\left(I_{bokeh}, \hat{I}_{bokeh}\right) \right]^{\beta} \left[ s\left(I_{bokeh}, \hat{I}_{bokeh}\right) \right]^{\gamma}$$

where $\alpha$, $\beta$ and $\gamma$ are preset constants, and $l(I_{bokeh}, \hat{I}_{bokeh})$, $c(I_{bokeh}, \hat{I}_{bokeh})$ and $s(I_{bokeh}, \hat{I}_{bokeh})$ denote, respectively, the luminance relationship, the contrast relationship and the structure relationship between the generated image $I_{bokeh}$ and the actual image $\hat{I}_{bokeh}$.
In a further embodiment, the third module is further to:
first receive, using the bokeh-effect rendering network model obtained in the second module, the picture whose background is to be blurred, and input the picture into the depth estimation network in the model; second, up-sample the depth map obtained by the depth estimation network by deconvolution so as to restore the input image data size, input the depth map into the convolutional layers in the bokeh-effect rendering network model, compute, in combination with an activation function, the weights for the received image's feature maps, and take the resulting values as the corresponding terms of the blur-network-layer weighted sum; third, the shallow blur network in the bokeh-effect rendering network model applies the blur operation to the received depth-image signal of the picture whose background is to be blurred; next, the weighted-sum computation set in the model combines the obtained weights with the blur-function outputs to obtain the values of the processed image; finally, the model outputs the obtained processed image values, i.e. the image with the bokeh-effect rendering.
Beneficial effects: the invention provides a method for rapidly blurring the background of a monocular visual image based on depth perception and a system implementing the method. Compared with other algorithms, the background-blurring technique adopted by the invention has stronger generalization and smaller error, adapts to a variety of scenes, and trains an end-to-end neural network with a large number of data sets, thereby solving the running-speed problem of background-blurring algorithms.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention.
FIG. 2 is a flow chart of background blurring based on portrait segmentation.
FIG. 3 is a flow chart of background blurring based on a combination of hardware and software.
FIG. 4 is a diagram of the depth estimation network architecture.
FIG. 5 is a flow chart of obtaining the weighted composite out-of-focus image from the depth network map.
FIG. 6 is a diagram (a) of the background blurring effect generated by the present method, showing a seat.
FIG. 7 is a diagram (b) of the background blurring effect generated by the present method, showing leaves.
FIG. 8 is a diagram (c) of the background blurring effect generated by the present method.
Detailed Description
The applicant notes that a single-lens reflex camera, owing to its large aperture and numerous integrated sensors, can easily produce background blurring through the adjustment of a few parameters, and that a binocular camera can perceive a certain scene depth while shooting and generate the background-blurring effect through a suitable depth algorithm. However, on devices with only a monocular camera and fewer optical sensors, the problem of achieving a fast, high-quality background-blurring effect directly from an arbitrary image has not been solved effectively.
In the prior art, one approach to bokeh rendering processes image photos centrally, using depth estimation and image saliency segmentation as prior knowledge and feeding them together with the original image as input data to train an end-to-end network. For portrait photos, a semantic segmentation method is generally used to separate the portrait from the image, after which the remaining area is blurred. Some use an algorithm combining hardware and software, as shown in FIG. 3: a neural network first segments the person in the image and generates a foreground mask when a portrait is present; a sensor with dual-pixel autofocus hardware then produces a dense depth map, which is used together with the foreground mask for depth-dependent rendering of a shallow depth-of-field image. Still others use existing networks for portrait segmentation and depth estimation of individual images, blurring the non-portrait scene to different degrees according to depth. As mentioned above, the data sets of the first category contain no images of general scenes such as natural scenery, so these algorithms generally fail on images other than portraits. The second category feeds depth estimation and image saliency as prior knowledge into end-to-end neural network training, which has poor interpretability and slow running speed.
To solve the problems in the prior art, the invention provides a method for rapidly blurring the background of a monocular visual image based on depth perception and a system implementing the method.
The present invention will be further described in detail with reference to the following examples and accompanying drawings.
The present application provides a method for rapidly blurring the background of a monocular visual image based on depth perception and a system implementing the method, wherein the method comprises the following steps:
step one, establishing a model training data set. In this step, the picture data set used for the model training in step two is established; to improve learning of the data contained in real scenes, the data sets are all real-scene pictures. The pictures in the data set are stored in pairs, i.e. each original picture is stored together with the corresponding picture carrying the bokeh rendering effect. The original pictures in the data set serve as input data during model training, and the bokeh-effect pictures in the data set serve as comparison data against which the model output pictures are compared.
Step two, constructing a depth-aware monocular visual image training network. This step builds the depth-aware monocular visual image training network and trains it to form the network model that generates the bokeh-effect rendering. The network is trained as follows: first, it receives the input picture data $I_{org}$ from the data set established in step one; second, features are extracted by the designed convolutional layers; then the preset degree of blur is produced through iteration of the blur function, and finally the learned weights are activated by the loss function. Network learning and training are completed through this process, yielding a network model that outputs the picture $I_{bokeh}$, where $I_{bokeh}$ is the picture rendered with the bokeh effect. The model generating the bokeh-rendered picture can be constructed specifically as:

$$I_{bokeh} = \sum_{i=0}^{n} W_i \odot B_i(I_{org})$$

i.e., the comparison data in the data set created in step one is regarded as a weighted sum of smoothed versions of the input data in the data set. Here $I_{org}$ denotes the original image, $\odot$ denotes element-wise matrix multiplication, $B_i(\cdot)$ is the $i$-th level blur function, and $W_i$ is the feature weight matrix of the $i$-th layer data image, the $W_i$ satisfying

$$\sum_{i=0}^{n} W_i = \mathbf{1}.$$

The $i$-th level blur function $B_i(\cdot)$ is the shallow blur neural network $b(\cdot)$ iterated $i$ times:

$$B_i(I_{org}) = \underbrace{\,b \circ b \circ \cdots \circ b\,}_{i\ \text{times}}(I_{org}), \qquad B_0(I_{org}) = I_{org}.$$
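The iterated blur $B_i$ and the final weighted sum can also be written out explicitly, precomputing all blur levels first; here `b` is any shallow blur module (such as the hypothetical `ShallowBlur` above), and the weight maps are assumed softmax-normalized with one channel per blur level.

```python
import torch

def blur_level(b, image, i):
    """B_i(I): apply the shallow blur network b to the image i times; B_0 is the identity."""
    out = image
    for _ in range(i):
        out = b(out)
    return out

def compose_bokeh(image, b, weights, n):
    """I_bokeh = sum_i W_i ⊙ B_i(I_org); weights: (B, n+1, H, W), summing to 1 over dim 1."""
    levels = [blur_level(b, image, i) for i in range(n + 1)]  # B_0(I) ... B_n(I)
    stack = torch.stack(levels, dim=1)                        # (B, n+1, C, H, W)
    return (weights.unsqueeze(2) * stack).sum(dim=1)          # broadcast W_i over channels
```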
During training, the loss function $\ell$ combines a reconstruction term with the structural similarity SSIM, improving the closeness between the output data and the comparison data, effectively reducing the difference between the model and the actual data, and thus better optimizing the model. Specifically:

$$\ell = \left\| I_{bokeh} - \hat{I}_{bokeh} \right\|_{1} + \left( 1 - \mathrm{SSIM}\left( I_{bokeh}, \hat{I}_{bokeh} \right) \right)$$

where $I_{bokeh}$ denotes the image with the bokeh effect generated by the model, $\hat{I}_{bokeh}$ denotes the original image with the actual bokeh effect, and $\mathrm{SSIM}(I_{bokeh}, \hat{I}_{bokeh})$ denotes the structural similarity between the generated image $I_{bokeh}$ and the actual image $\hat{I}_{bokeh}$:

$$\mathrm{SSIM}\left(I_{bokeh}, \hat{I}_{bokeh}\right) = \left[ l\left(I_{bokeh}, \hat{I}_{bokeh}\right) \right]^{\alpha} \left[ c\left(I_{bokeh}, \hat{I}_{bokeh}\right) \right]^{\beta} \left[ s\left(I_{bokeh}, \hat{I}_{bokeh}\right) \right]^{\gamma}$$

where $\alpha$, $\beta$ and $\gamma$ are preset constants, and $l(I_{bokeh}, \hat{I}_{bokeh})$, $c(I_{bokeh}, \hat{I}_{bokeh})$ and $s(I_{bokeh}, \hat{I}_{bokeh})$ denote, respectively, the luminance relationship, the contrast relationship and the structure relationship between the generated image $I_{bokeh}$ and the actual image $\hat{I}_{bokeh}$.
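Tying the earlier sketches together, a hypothetical training step with back-propagation of the error value could look like this. `PairedBokehDataset`, `TinyDepthNet`, `BokehComposition` and `bokeh_loss` are the illustrative components defined above, not the patent's reference implementation, and batching assumes pictures of a common size divisible by 4.

```python
import torch
from torch.utils.data import DataLoader

def train(root, epochs=10, lr=1e-4, device="cpu"):
    data = DataLoader(PairedBokehDataset(root), batch_size=8, shuffle=True)
    depth_net = TinyDepthNet().to(device)
    composer = BokehComposition().to(device)
    params = list(depth_net.parameters()) + list(composer.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        for orig, bokeh_gt in data:
            orig, bokeh_gt = orig.to(device), bokeh_gt.to(device)
            pred = composer(orig, depth_net(orig))  # weighted sum of blurred versions
            loss = bokeh_loss(pred, bokeh_gt)       # reconstruction + (1 - SSIM)
            opt.zero_grad()
            loss.backward()                         # back-propagate the error value
            opt.step()
```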
Step three, the trained neural network model receives the picture to be bokeh-rendered and outputs the picture after bokeh rendering is finished. First, the bokeh-effect rendering network model obtained in step two receives the picture whose background is to be blurred; second, the convolutional layers and activation function in the bokeh-effect rendering network model extract and compute the weights of the received image's feature maps, the resulting values serving as the corresponding terms of the blur-layer weighted sum; third, the shallow blur network in the model applies the blur operation to the received picture signal whose background is to be blurred; next, the weighted-sum computation set in the model combines the obtained weights with the blur-function outputs to obtain the values of the processed image; finally, the model outputs the obtained processed image values, i.e. the image with the bokeh-effect rendering.
Based on the above method, a system implementing it can be constructed, specifically comprising:
a first module for establishing a training set; this module establishes the picture data set used for model training in the second module, and the data sets in this module are all photo sets that appear in pairs, namely a plan view of a monocular visual image and the corresponding image with the bokeh rendering effect. The plan view of the monocular visual image in the data set serves as input data during model training, and the bokeh-effect picture in the data set serves as comparison data against which the model output picture is compared. To ensure the bokeh effect is effectively improved in different scenes, the scenes depicted in the picture set and the object scenes contained in it are real scenes with different subjects.
A second module for obtaining a network model with the bokeh effect; this module first establishes a depth estimation network for obtaining a depth map. The network receives the plan views of the monocular visual images in the first module as training data and uses the images with the bokeh rendering effect as comparison data for the images generated by the network learning process, so that the resulting error is back-propagated to optimize the parameters of the depth estimation network. Then, after the depth estimation network is trained, the depth map it produces is input to a preset number of convolutional layers with an activation function to obtain feature maps and, from them, the feature-map weights, and blurred images are obtained through a preset number of iterations of the blur network. Finally, the feature weights are combined with the blurred images, and the image finally containing the bokeh effect is obtained by weighted summation, completing the establishment of the bokeh-effect network model.
The training process of the depth estimation network is as follows: first, the pictures serving as input images in the first module are resized, i.e. adjusted by a set multiple to obtain pictures that conform to the network input size; then the processed pictures are input to the depth estimation network to generate depth estimation images; finally, the obtained depth estimation images are restored to the original image size through a deconvolution layer.
The depth map generated by the depth estimation network passes through a set number of convolution and activation functions to obtain the feature-map weights; the image with the bokeh effect is then obtained through a preset number of iterations of the blur network function, followed by weighted summation.
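A quick numerical check that softmax normalization of the weight maps satisfies the per-pixel constraint that the feature weights sum to one (the normalization scheme itself is an assumption, as noted earlier):

```python
import torch

n = 4
logits = torch.randn(1, n + 1, 4, 4)    # one map per blur level, on a toy 4x4 grid
weights = torch.softmax(logits, dim=1)  # W_0 ... W_n
assert torch.allclose(weights.sum(dim=1), torch.ones(1, 4, 4))  # sum_i W_i = 1 per pixel
```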
The model finally generating the picture with the bokeh rendering effect can be constructed specifically as:

$$I_{bokeh} = \sum_{i=0}^{n} W_i \odot B_i(I_{org})$$

i.e., the comparison data in the data set created by the first module is regarded as a weighted sum of smoothed versions of the input data in the data set. Here $I_{bokeh}$ denotes the finally generated image, $I_{org}$ the original image, $\odot$ element-wise matrix multiplication, $B_i(\cdot)$ the $i$-th level blur function, and $W_i$ the feature weight matrix of the $i$-th layer data image, the $W_i$ satisfying

$$\sum_{i=0}^{n} W_i = \mathbf{1}.$$

The $i$-th level blur function $B_i(\cdot)$ is the shallow blur neural network $b(\cdot)$ iterated $i$ times:

$$B_i(I_{org}) = \underbrace{\,b \circ b \circ \cdots \circ b\,}_{i\ \text{times}}(I_{org}), \qquad B_0(I_{org}) = I_{org}.$$
During training, the loss function $\ell$ adopts a combination of a reconstruction term and the structural similarity SSIM, improving the closeness between the output data and the comparison data, effectively reducing the difference between the model and the actual data, and thus better optimizing the model. Specifically:

$$\ell = \left\| I_{bokeh} - \hat{I}_{bokeh} \right\|_{1} + \left( 1 - \mathrm{SSIM}\left( I_{bokeh}, \hat{I}_{bokeh} \right) \right)$$

where $I_{bokeh}$ denotes the image with the bokeh effect generated by the model, $\hat{I}_{bokeh}$ denotes the original image with the actual bokeh effect, and $\mathrm{SSIM}(I_{bokeh}, \hat{I}_{bokeh})$ denotes the structural similarity between the generated image $I_{bokeh}$ and the actual image $\hat{I}_{bokeh}$:

$$\mathrm{SSIM}\left(I_{bokeh}, \hat{I}_{bokeh}\right) = \left[ l\left(I_{bokeh}, \hat{I}_{bokeh}\right) \right]^{\alpha} \left[ c\left(I_{bokeh}, \hat{I}_{bokeh}\right) \right]^{\beta} \left[ s\left(I_{bokeh}, \hat{I}_{bokeh}\right) \right]^{\gamma}$$

where $\alpha$, $\beta$ and $\gamma$ are preset constants, and $l(I_{bokeh}, \hat{I}_{bokeh})$, $c(I_{bokeh}, \hat{I}_{bokeh})$ and $s(I_{bokeh}, \hat{I}_{bokeh})$ denote, respectively, the luminance relationship, the contrast relationship and the structure relationship between the generated image $I_{bokeh}$ and the actual image $\hat{I}_{bokeh}$.
And a third module for realizing the bokeh effect. This module first receives, using the bokeh-effect rendering network model obtained in the second module, the picture whose background is to be blurred, and inputs the picture into the depth estimation network in the model; second, the depth map obtained by the depth estimation network is up-sampled by deconvolution so as to restore the input image data size, then input into the convolutional layers in the bokeh-effect rendering network model, where, combined with an activation function, the weights for the received image's feature maps are computed and the resulting values taken as the corresponding terms of the blur-network-layer weighted sum; third, the shallow blur network in the model applies the blur operation to the received depth-image signal of the picture whose background is to be blurred; next, the weighted-sum computation set in the model combines the obtained weights with the blur-function outputs to obtain the values of the processed image; finally, the model outputs the obtained processed image values, i.e. the image with the bokeh-effect rendering.
The invention can be applied to image blurring on a PC, to automatic blurring of photos taken on mobile devices, and the like, realizing blurring of a picture's background. FIGS. 6 to 8 show results of background blurring produced by the invention: the left side is the original photo and the right side the processed photo with the bokeh effect. As the effect diagrams show, background blurring is achieved while the sharpness of the subject is maintained.
As noted above, while the present invention has been shown and described with reference to certain preferred embodiments, it is not to be construed as limited thereto. Various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A method for rapidly blurring the background of a monocular visual image based on depth perception, characterized by comprising the following steps:
step one, establishing a model training data set;
step two, constructing a depth-aware monocular visual image training network;
and step three, receiving, by the trained neural network model, the picture to be bokeh-rendered, and outputting the picture after the bokeh rendering is finished.
2. The method for rapidly blurring the background of a monocular visual image based on depth perception according to claim 1, wherein step one is further:
establishing the data set used for the model training in step two, wherein the data set consists of real-scene pictures, each scene picture having a corresponding photo, the corresponding photo being a picture with background blurring; the picture with background blurring is used for comparison with the model output picture to obtain the resulting error value.
3. The method for rapidly blurring the background of a monocular visual image based on depth perception according to claim 1, wherein step two is further:
constructing a depth-aware monocular visual image training network, and training the network to form a network model that generates the bokeh-effect rendering; the network is trained by first receiving the input picture data from the data set established in step one; second, extracting features through the designed convolutional layers cooperating with an activation function; then activating the learned weights with a loss function; and finally producing the preset degree of blur through iteration of the blur function and outputting the final image as the weighted sum of the weights with the iterates of the blur function; network learning and training are completed according to this process, and the result is the network model with the bokeh effect.
4. The method for rapidly blurring the background of a monocular visual image based on depth perception according to claim 1, wherein step two is further:
the model generating the picture with the bokeh rendering effect can be constructed specifically as:

$$I_{bokeh} = \sum_{i=0}^{n} W_i \odot B_i(I_{org})$$

wherein $I_{bokeh}$ denotes the finally obtained image, $I_{org}$ the original image, $\odot$ element-wise matrix multiplication, $B_i(\cdot)$ the $i$-th level blur function, and $W_i$ the feature weight matrix of the $i$-th layer data image, the $W_i$ satisfying

$$\sum_{i=0}^{n} W_i = \mathbf{1};$$

the $i$-th level blur function $B_i(\cdot)$ is the shallow blur neural network $b(\cdot)$ iterated $i$ times:

$$B_i(I_{org}) = \underbrace{\,b \circ b \circ \cdots \circ b\,}_{i\ \text{times}}(I_{org}), \qquad B_0(I_{org}) = I_{org};$$
the loss function l adopts the combination of a reconstruction function and a structural similarity SSIM, and optimizes a model through the back propagation of an error value; wherein l1The method specifically comprises the following steps:
Figure FDA0002709182600000021
wherein, IbokehThe representation model generates an image with a shot effect,
Figure FDA0002709182600000022
an original image representing an image with an actual shot effect,
Figure FDA0002709182600000023
representing the generated image IbokehWith the actual image
Figure FDA0002709182600000024
The structural similarity between the two is as follows:
Figure FDA0002709182600000025
wherein alpha, beta and gamma are preset constants,
Figure FDA0002709182600000026
representing the generated image IbokehWith the actual image
Figure FDA0002709182600000027
The relationship between the brightness of the light source and the brightness of the light source,
Figure FDA0002709182600000028
representing the generated image IbokehWith the actual image
Figure FDA0002709182600000029
The contrast ratio relationship between the two components,
Figure FDA00027091826000000210
representing the generated image IbokehWith the actual image
Figure FDA00027091826000000211
Structural relationship between them.
5. The method for rapidly blurring the background of a monocular visual image based on depth perception according to claim 1, wherein step three is further:
firstly, the bokeh-effect rendering network model obtained in step two receives the picture whose background is to be blurred; secondly, the convolutional layers and activation function in the bokeh-effect rendering network model extract and compute the weights of the received image's feature maps, the resulting values serving as the corresponding terms of the blur-layer weighted sum; thirdly, the shallow blur network in the model applies the blur operation to the received picture signal whose background is to be blurred; next, the weighted-sum computation set in the model combines the obtained weights with the blur-function outputs to obtain the values of the processed image; finally, the model outputs the obtained processed image values, i.e. the image with the bokeh-effect rendering.
6. A system for rapidly blurring the background of a monocular visual image based on depth perception, for implementing the method according to any one of claims 1-5, characterized by comprising the following modules:
a first module for establishing a training set;
a second module for obtaining a network model with the bokeh effect;
and a third module for realizing the bokeh effect.
7. The system according to claim 6, wherein the first module is further configured to create the picture data set for model training in the second module, the data sets in this module all being photo sets that appear in pairs, namely a plan view of a monocular visual image and the corresponding image with the bokeh rendering effect; the plan view of the monocular visual image in the data set serves as input data during model training, and the bokeh-effect picture in the data set serves as comparison data against which the model output picture is compared.
8. The system for rapidly blurring the background of a monocular visual image based on depth perception according to claim 6, wherein the second module first establishes a depth estimation network for obtaining a depth map, the network receiving the plan views of the monocular visual images in the first module as training data and using the images with the bokeh rendering effect as comparison data for the images generated through network learning, so that the resulting error is back-propagated to optimize the parameters of the depth estimation network; then, after the depth estimation network is trained, the depth map it produces is input to a preset number of convolutional layers with an activation function to obtain feature maps and, from them, the feature-map weights, and blurred images are obtained through a preset number of iterations of the blur network; finally, the feature weights are combined with the blurred images, and the image finally containing the bokeh effect is obtained by weighted summation, completing the establishment of the bokeh-effect network model.
9. The system according to claim 6, wherein the second module is further configured to train the depth estimation network by first resetting the size of the pictures serving as input image data in the first module, i.e. adjusting the picture size by a preset multiple to obtain pictures meeting the network input size; then inputting the processed pictures into the depth estimation network to generate depth estimation images; and finally restoring the obtained depth estimation images to the original image size through a deconvolution layer;
the depth map generated by the depth estimation network passes through a set number of convolution and activation functions to obtain the feature-map weights, and the image with the bokeh effect is then obtained through a preset number of iterations of the blur network function followed by weighted summation;
the model finally generating the picture with the bokeh rendering effect can be constructed specifically as:

$$I_{bokeh} = \sum_{i=0}^{n} W_i \odot B_i(I_{org})$$

wherein $I_{bokeh}$ denotes the finally generated image, $I_{org}$ the original image, $\odot$ element-wise matrix multiplication, $B_i(\cdot)$ the $i$-th level blur function, and $W_i$ the feature weight matrix of the $i$-th layer data image, the $W_i$ satisfying

$$\sum_{i=0}^{n} W_i = \mathbf{1};$$

the $i$-th level blur function $B_i(\cdot)$ is the shallow blur neural network $b(\cdot)$ iterated $i$ times:

$$B_i(I_{org}) = \underbrace{\,b \circ b \circ \cdots \circ b\,}_{i\ \text{times}}(I_{org}), \qquad B_0(I_{org}) = I_{org};$$
in the training process, the loss function $\ell$ adopts the combination of a reconstruction term and the structural similarity SSIM, improving the closeness between the output data and the comparison data, effectively reducing the difference between the model and the actual data, and thus better optimizing the model; specifically:

$$\ell = \left\| I_{bokeh} - \hat{I}_{bokeh} \right\|_{1} + \left( 1 - \mathrm{SSIM}\left( I_{bokeh}, \hat{I}_{bokeh} \right) \right)$$

wherein $I_{bokeh}$ denotes the image with the bokeh effect generated by the model, $\hat{I}_{bokeh}$ denotes the original image with the actual bokeh effect, and $\mathrm{SSIM}(I_{bokeh}, \hat{I}_{bokeh})$ denotes the structural similarity between the generated image $I_{bokeh}$ and the actual image $\hat{I}_{bokeh}$:

$$\mathrm{SSIM}\left(I_{bokeh}, \hat{I}_{bokeh}\right) = \left[ l\left(I_{bokeh}, \hat{I}_{bokeh}\right) \right]^{\alpha} \left[ c\left(I_{bokeh}, \hat{I}_{bokeh}\right) \right]^{\beta} \left[ s\left(I_{bokeh}, \hat{I}_{bokeh}\right) \right]^{\gamma}$$

wherein $\alpha$, $\beta$ and $\gamma$ are preset constants, and $l(I_{bokeh}, \hat{I}_{bokeh})$, $c(I_{bokeh}, \hat{I}_{bokeh})$ and $s(I_{bokeh}, \hat{I}_{bokeh})$ denote, respectively, the luminance, contrast and structure relationships between the generated image $I_{bokeh}$ and the actual image $\hat{I}_{bokeh}$.
10. The system according to claim 6, wherein the third module is further configured to receive, using the bokeh-effect rendering network model obtained in the second module, the picture whose background is to be blurred, adjust the picture by a preset multiple, and input it into the depth estimation network in the model; secondly, the depth map obtained by the depth estimation network is up-sampled by deconvolution so as to restore the input image data size, then input into the convolutional layers in the bokeh-effect rendering network model, where, combined with an activation function, the weights for the received image's feature maps are computed and the resulting values taken as the corresponding terms of the blur-network-layer weighted sum; thirdly, the shallow blur network in the model applies the blur operation to the received depth-image signal of the picture whose background is to be blurred; next, the weighted-sum computation set in the model combines the obtained weights with the blur-function outputs to obtain the values of the processed image; finally, the model outputs the obtained processed image values, i.e. the image with the bokeh-effect rendering.
CN202011049747.8A 2020-09-29 2020-09-29 Method and system for rapidly blurring monocular visual image background based on depth perception Pending CN112184586A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011049747.8A CN112184586A (en) 2020-09-29 2020-09-29 Method and system for rapidly blurring monocular visual image background based on depth perception

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011049747.8A CN112184586A (en) 2020-09-29 2020-09-29 Method and system for rapidly blurring monocular visual image background based on depth perception

Publications (1)

Publication Number Publication Date
CN112184586A 2021-01-05

Family

ID=73946625

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011049747.8A Pending CN112184586A (en) 2020-09-29 2020-09-29 Method and system for rapidly blurring monocular visual image background based on depth perception

Country Status (1)

Country Link
CN (1) CN112184586A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106683147A (en) * 2017-01-23 2017-05-17 浙江大学 Method of image background blur
CN107680053A (en) * 2017-09-20 2018-02-09 长沙全度影像科技有限公司 A kind of fuzzy core Optimized Iterative initial value method of estimation based on deep learning classification
CN107767413A (en) * 2017-09-20 2018-03-06 华南理工大学 A kind of image depth estimation method based on convolutional neural networks
CN108154465A (en) * 2017-12-19 2018-06-12 北京小米移动软件有限公司 Image processing method and device
CN110120009A (en) * 2019-05-09 2019-08-13 西北工业大学 Background blurring implementation method based on obvious object detection and depth estimation algorithm

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JINJIN GU et al.: "Blind Super-Resolution With Iterative Kernel Correction", arXiv:1904.03377v2 [cs.CV], 29 May 2019 (2019-05-29), page 3 *
QIAN M et al.: "BGGAN: Bokeh-Glass Generative Adversarial Network for Rendering Realistic Bokeh", Computer Vision - ECCV 2020 Workshops, 4 November 2020 (2020-11-04), pages 1-17 *
SAIKAT DUTTA: "Depth-aware Blending of Smoothed Images for Bokeh Effect Generation", arXiv:2005.14214v1 [cs.CV], 28 May 2020 (2020-05-28), pages 3-4 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113810597A (en) * 2021-08-10 2021-12-17 杭州电子科技大学 Rapid image and scene rendering method based on semi-prediction filtering
CN113810597B (en) * 2021-08-10 2022-12-13 杭州电子科技大学 Rapid image and scene rendering method based on semi-predictive filtering
CN117893440A (en) * 2024-03-15 2024-04-16 昆明理工大学 Image defogging method based on diffusion model and depth-of-field guidance generation
CN117893440B (en) * 2024-03-15 2024-05-14 昆明理工大学 Image defogging method based on diffusion model and depth-of-field guidance generation


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Address after: Room 203b, building 3, artificial intelligence Industrial Park, 266 Chuangyan Road, Qilin science and Technology Innovation Park, Jiangning District, Nanjing City, Jiangsu Province, 211000

Applicant after: Zhongke Fangcun Zhiwei (Nanjing) Technology Co.,Ltd.

Applicant after: Zhongke Nanjing artificial intelligence Innovation Research Institute

Address before: Room 203b, building 3, artificial intelligence Industrial Park, 266 Chuangyan Road, Qilin science and Technology Innovation Park, Jiangning District, Nanjing City, Jiangsu Province, 211000

Applicant before: Zhongke Fangcun Zhiwei (Nanjing) Technology Co.,Ltd.

Applicant before: NANJING ARTIFICIAL INTELLIGENCE CHIP INNOVATION INSTITUTE, INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES