CN113344997B - Method and system for rapidly acquiring high-definition foreground image only containing target object - Google Patents

Method and system for rapidly acquiring high-definition foreground image only containing target object

Info

Publication number
CN113344997B
CN113344997B (application CN202110655267.4A; published as CN113344997A)
Authority
CN
China
Prior art keywords
foreground
target
depth
depth map
target object
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110655267.4A
Other languages
Chinese (zh)
Other versions
CN113344997A (en)
Inventor
Chen Guanyu (陈冠宇)
Wang Lei (王磊)
Wang Fei (王飞)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fangtian Shenghua Beijing Digital Technology Co ltd
Original Assignee
Fangtian Shenghua Beijing Digital Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fangtian Shenghua Beijing Digital Technology Co ltd filed Critical Fangtian Shenghua Beijing Digital Technology Co ltd
Priority to CN202110655267.4A
Publication of CN113344997A
Application granted
Publication of CN113344997B
Legal status: Active
Anticipated expiration

Classifications

    • G06T 7/50: Image analysis; depth or shape recovery
    • G06N 3/045: Neural network architectures; combinations of networks
    • G06T 5/40: Image enhancement or restoration by the use of histogram techniques
    • G06T 5/50: Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06T 7/13: Segmentation; edge detection
    • H04N 23/80: Camera processing pipelines; components thereof
    • G06T 2207/10028: Range image; depth image; 3D point clouds
    • G06T 2207/20016: Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; pyramid transform
    • G06T 2207/20081: Training; learning
    • G06T 2207/20084: Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A depth map with an extremely low loss rate is obtained from an optimized deep neural network model, and a high-definition foreground image containing only the target object is extracted from the depth map through depth-layer interception, contour edge calculation and target contour extraction. If the foreground image containing only the target object is composited with a background image, a high-definition picture with all non-target foreground removed is obtained. This solves the problem that, when photographing at a scenic spot, a popular check-in location or any other photo site, non-target objects such as passers-by and other tourists enter the frame, so that a high-definition picture containing only the specified target object cannot be obtained in a short time.

Description

Method and system for rapidly acquiring high-definition foreground image only containing target object
Technical Field
The invention relates to the technical field of real-time image processing, in particular to a method and a system for rapidly acquiring a high-definition foreground image containing only a target object.
Background
When photographs are taken at places such as scenic spots or popular check-in locations, the presence of large numbers of other visitors makes it impossible to obtain a target photo containing only the complete information of the specified subject, so visitors cannot capture themselves against the scenery as intended. The common remedy is to remove non-target objects such as passers-by with image-editing software afterwards, which greatly increases time and labor costs.
Disclosure of Invention
The invention provides a method and a system for rapidly acquiring a high-definition foreground image containing only a target object. A depth map with an extremely low loss rate is obtained from an optimized deep neural network model, and depth-layer interception, contour edge calculation and target contour extraction are performed on the depth map to obtain the high-definition foreground image containing only the target object. The target object may include not only a tourist but also the articles the tourist carries, the tourist's shadow and similar information, all preserved intact, so that subsequent processing such as compositing the foreground with a background does not introduce distortion.
The technical scheme of the invention is as follows:
a method for rapidly acquiring a high-definition foreground image only containing a target object is characterized by comprising the following steps:
S1, acquiring a target processing picture and an auxiliary processing picture which are shot at the same time and exhibit parallax;
S2, directly inputting the target processing picture and the auxiliary processing picture into an optimized deep neural network model, and acquiring parallax information of the two pictures so as to obtain a depth map of the target processing picture;
S3, intercepting a foreground layer depth map from the depth map according to the maximum and minimum depth-of-field levels occupied by the target object;
S4, performing contour edge calculation on the foreground layer depth map to obtain all foreground contours in it, yielding a foreground contour depth map;
S5, performing target contour extraction on the foreground contour depth map to obtain the target foreground contour of the target object; the set of pixels of the target processing picture corresponding to the pixels contained in the target foreground contour is the high-definition foreground image containing only the target object.
Preferably, the target object includes a designated subject and all objects in direct contact with that subject.
Preferably, in S3, the depth-of-field range occupied by the target object is intercepted from the depth map according to the minimum and maximum distances between the target object and the camera lens, so as to obtain the foreground layer depth map.
Preferably, in S4, the contour edge calculation includes identifying all feature point sets in the foreground layer depth map and computing the edge of each feature point set to obtain the contours of all feature point sets on the foreground layer depth map, each feature point set being one foreground contour.
Preferably, in S5, the target contour extraction includes extracting the foreground contour of the target object from the contours of all feature point sets of the foreground contour depth map according to the area occupied by the target object and/or the depth of field of the region where the target object is located.
Preferably, in S1, the target processing picture and the auxiliary processing picture include binocular pictures shot by a binocular camera, or such binocular pictures after preprocessing, or pictures shot by a plurality of cameras with parallax, or such pictures after preprocessing.
Preferably, the preprocessing includes image rectification, and the image rectification includes contour detection rectification and/or rotation angle rectification and/or rectification by connecting lines between corresponding similar regions for image matching and/or gray-level rectification and/or binarization rectification and/or histogram equalization rectification.
Preferably, the optimized deep neural network model is obtained through multiple rounds of training and testing, comprising the following steps:
S2.1, splicing the data of the target processing picture and the auxiliary processing picture into different parts of the same image to obtain a feature extraction original image;
S2.2, performing two-dimensional convolution and pooling operations on the feature extraction original image several times to obtain a first feature data set; in the first feature data set the data of the target processing picture and the auxiliary processing picture are not yet correlated but merely spliced together;
S2.3, extracting from the first feature data set, through residual network operations and spatial pyramid pooling operations, second feature data sets at a plurality of resolution levels from high to low, each resolution level corresponding to one second feature data set;
S2.4, symmetrically fusing and normalizing the data information belonging to the target processing picture and to the auxiliary processing picture in each second feature data set to obtain a group of third feature data sets;
S2.5, performing three-dimensional convolution on each third feature data set to obtain a group of initial depth maps;
S2.6, comparing each initial depth map with a real calibrated depth map and computing the loss function of the initial depth maps;
S2.7, extracting features from a large number of original picture groups shot at different moments, repeating steps S2.2 to S2.6, and continuously optimizing the network weights through back propagation to make the loss function value L as small as possible, thereby obtaining the optimized deep neural network model.
Preferably, in S2.6, the loss function value L = Σ Ak·Lk (k = 1, 2, 3, 4, …), where Lk denotes the loss of the initial depth map at each resolution: L1 is the loss of the initial depth map at the highest resolution, L2, L3, … are the losses of the initial depth maps at successively lower resolutions, and Ak is a loss coefficient, a fixed constant, with Ak > Ak+1.
A system for rapidly acquiring a high-definition foreground image containing only a target object comprises a depth map acquisition module, a foreground layer depth map acquisition module, a foreground contour depth map acquisition module and a target foreground image acquisition module. A deep neural network model is arranged in the depth map acquisition module. The depth map acquisition module processes the input target processing picture and auxiliary processing picture through the deep neural network model to obtain a depth map, and the depth map is then processed by the foreground layer depth map acquisition module, the foreground contour depth map acquisition module and the target foreground image acquisition module in sequence to obtain the high-definition target foreground image.
Preferably, the foreground layer depth map acquisition module processes the input depth map according to the maximum and minimum depth-of-field levels occupied by the target object, intercepting the set of depth points between the two levels to obtain a foreground layer depth map; the foreground contour depth map acquisition module performs contour edge calculation on the input foreground layer depth map, calibrating and dividing all foreground contours in it to obtain a foreground contour depth map; the target foreground image acquisition module performs target contour extraction on the input foreground contour depth map to obtain the set of pixels of the target processing picture corresponding to the pixels contained in the target foreground contour, namely the high-definition target foreground image.
Preferably, the depth map acquisition module further comprises a neural network model training submodule. The neural network model training submodule comprises a training set input submodule, a feature extraction submodule, a feature fusion submodule, a depth calculation submodule, a depth information comparison submodule and a loss function adjustment submodule. The training set input submodule splices the data of the target processing picture and the auxiliary processing picture into different parts of the same image to obtain a feature extraction original image. The feature extraction submodule sends the feature extraction original image through convolution and pooling layers for two-dimensional convolution and pooling operations to obtain a first feature data set, and extracts from the first feature data set, through residual network operations and spatial pyramid pooling operations, second feature data sets at a plurality of resolution levels from high to low, each resolution level corresponding to one second feature data set. The feature fusion submodule fuses and normalizes the data information belonging to the target processing picture in each second feature data set with the data information belonging to the auxiliary processing picture, associating the two pictures' data within each second feature data set to obtain a group of third feature data sets. The depth calculation submodule performs three-dimensional convolution on each third feature data set to obtain a group of initial depth maps. The depth information comparison submodule compares each initial depth map with a real calibrated depth map to obtain a loss function. The loss function adjustment submodule continuously optimizes the network weights through back propagation according to the loss functions computed over a large number of original picture groups, making the loss function value L as small as possible and thereby obtaining the deep neural network model.
Compared with the prior art, the invention has the following advantages. In the method for rapidly acquiring a high-definition foreground image containing only a target object, a depth map with an extremely low loss rate is obtained from the optimized deep neural network model, and depth-layer interception, contour edge calculation and target contour extraction are performed on the depth map to obtain the high-definition foreground image containing only the target object. The target object may include not only a tourist but also the articles the tourist carries, the tourist's shadow and similar information, all preserved intact, so that subsequent processing such as compositing the foreground with a background does not introduce distortion. If the foreground image containing only the target object is composited with a background image, a high-definition picture with all non-target foreground removed is obtained. This solves the problem that, when photographing at a scenic spot, a popular check-in location or any other photo site, non-target objects such as passers-by and other tourists enter the frame, so that a high-definition picture containing only the specified target object cannot be obtained in a short time.
Drawings
FIG. 1 is a flowchart of a method for rapidly acquiring a high-definition foreground image containing only a target object according to the present invention;
FIG. 2 is a flowchart of the operation of the optimized deep neural network model of the method for rapidly obtaining a high-definition foreground map containing only a target object according to the present invention;
FIG. 3 is a schematic diagram illustrating an exemplary feature extraction and feature fusion process of the method for rapidly obtaining a high-definition foreground image containing only a target object according to the present invention;
FIG. 4 is a schematic diagram illustrating an example of a depth calculation process of the method for rapidly obtaining a high-definition foreground image containing only a target object according to the present invention;
FIG. 5 is a block diagram of a system for rapidly acquiring a high-definition foreground image containing only a target object according to the present invention.
Detailed Description
To facilitate an understanding of the invention, the invention is described in more detail below with reference to the accompanying figures 1-4 and the specific examples.
Example 1
This example discloses a method for rapidly acquiring a high-definition foreground image containing only a target object; its flowchart is shown in FIG. 1, and it comprises the following steps:
S1, acquiring a target processing picture and an auxiliary processing picture which are shot at the same time and exhibit parallax; here the two pictures are shot in real time by a binocular camera. Two pictures with parallax shot at the same moment by a plurality of cameras may equally serve as the target processing picture and the auxiliary processing picture, and the pictures shot by the binocular camera or by the plurality of parallax cameras may first be preprocessed; in every case the pictures must contain the target object. The preprocessing includes image rectification, and the image rectification includes contour detection rectification and/or rotation angle rectification and/or rectification by connecting lines between corresponding similar regions for image matching and/or gray-level rectification and/or binarization rectification and/or histogram equalization rectification.
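As an illustration of this preprocessing, the sketch below rectifies a binocular pair with OpenCV and applies histogram equalization. It is a minimal sketch under assumptions the patent does not state: the calibration inputs (K1, D1, K2, D2, R, T) and all names are hypothetical, and only two of the listed rectification variants are shown.

```python
import cv2

def preprocess_stereo_pair(img_l, img_r, K1, D1, K2, D2, R, T):
    """Rectify a binocular pair (hypothetical calibration) and apply
    histogram-equalization rectification, one possible reading of S1."""
    size = (img_l.shape[1], img_l.shape[0])  # (width, height)
    # Compute rectification transforms so that epipolar lines become horizontal.
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, D1, K2, D2, size, R, T)
    map1l, map2l = cv2.initUndistortRectifyMap(K1, D1, R1, P1, size, cv2.CV_32FC1)
    map1r, map2r = cv2.initUndistortRectifyMap(K2, D2, R2, P2, size, cv2.CV_32FC1)
    rect_l = cv2.remap(img_l, map1l, map2l, cv2.INTER_LINEAR)
    rect_r = cv2.remap(img_r, map1r, map2r, cv2.INTER_LINEAR)

    def equalize(img):
        # Histogram equalization on the luminance channel only.
        ycc = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
        ycc[:, :, 0] = cv2.equalizeHist(ycc[:, :, 0])
        return cv2.cvtColor(ycc, cv2.COLOR_YCrCb2BGR)

    return equalize(rect_l), equalize(rect_r)
```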
The target object includes a designated subject and all objects in direct contact with that subject. It may be a tourist being photographed at a scenic spot, or the tourist together with everything in contact with the tourist, such as accessories and shadows.
S2, inputting the target processing picture and the auxiliary processing picture into the optimized deep neural network model, and acquiring parallax information of the two pictures so as to obtain a depth map of the target processing picture; each pixel value of the depth map represents the distance of a point in the scene from the camera.
S3, determining the maximum and minimum depth of field occupied by the target object according to the minimum and maximum distances between the target object and the camera lens, thereby obtaining a maximum depth-of-field level and a minimum depth-of-field level, and intercepting from the depth map the set of pixels between the two levels to obtain a foreground layer depth map. At this point the target object may include the designated tourist A and A's shadow, and even a companion B in direct contact with A. The maximum depth-of-field level is the distance level of the pixel belonging to the target object that is farthest from the camera, and the minimum depth-of-field level is that of the pixel belonging to the target object that is closest to the camera. The foreground layer depth map includes all depth points between the maximum and minimum depth-of-field levels.
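Read as code, S3 is a per-pixel band-pass on depth between the two levels. A minimal sketch, assuming a metric depth map; the function and parameter names are hypothetical:

```python
import numpy as np

def intercept_foreground_layer(depth_map, d_min, d_max):
    """S3: keep only the depth points lying between the minimum and maximum
    depth-of-field levels occupied by the target object."""
    mask = (depth_map >= d_min) & (depth_map <= d_max)
    foreground_layer = np.where(mask, depth_map, 0.0)
    return foreground_layer, mask
```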
S4, performing contour edge calculation on the foreground layer depth map to obtain all foreground contours in it, yielding a foreground contour depth map. The contour edge calculation includes identifying all feature point sets in the foreground layer depth map and computing the edge of each set, giving the contours of all feature point sets on the foreground layer depth map. A feature point set is the set of pixels left in the foreground layer depth map by every tourist and/or passer-by and/or briefly present non-background object within the shooting range.
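One way to realize the contour edge calculation is contour tracing on the binarized foreground layer; the patent does not name a specific operator, so the OpenCV call below (assuming OpenCV 4.x) is an illustrative choice:

```python
import cv2
import numpy as np

def compute_foreground_contours(foreground_layer):
    """S4: identify each feature point set and compute its edge; every
    external contour found corresponds to one foreground contour."""
    binary = (foreground_layer > 0).astype(np.uint8) * 255
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return contours  # one entry per foreground contour
```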
S5, extracting the target foreground contour of the target object from the foreground contour depth map according to the area occupied by the target object and/or the depth of field of the region where the target object is located. The set of pixels of the target processing picture corresponding to the pixels contained in the target foreground contour is the high-definition foreground image containing only the target object, and this high-definition target foreground image completely excludes the foreground of all non-target objects. The target foreground contour contains only the contour of the target object. When no other person is in contact with tourist A, the target object may be the contour of A together with the accessories in contact with A (a satchel, a mobile phone, and the like) and A's shadow. The target object may also include companion B: if B is in contact with the designated tourist A, A and B are both treated as the target object at the same time, and only one round of target contour extraction is needed. Target contour extraction consists of extracting the contour of the designated target object from the contours of all feature point sets of the foreground contour depth map according to the area occupied by the designated target object and/or the depth of field of the central region where it is located; during extraction, only the target foreground is retained, for example the designated tourist together with the shadows and accessories in contact with the tourist. If companion B is not in contact with the designated tourist A, a high-definition target foreground image containing only the target object can still be obtained by performing target contour extraction twice or by changing the extraction method. This solves the problem that, when photographing at a scenic spot, a popular check-in location or any other photo site, non-target objects such as passers-by and other tourists enter the frame, so that a high-definition picture containing only the designated target object cannot be obtained in a short time. The target foreground contour obtained through contour edge calculation and target contour extraction includes not only the contour of the designated target object but also the contours of all foreground objects in contact with it, such as the shadow of the photographic subject and/or the various articles worn on the body, the shadow in particular, which ensures to the greatest extent that the picture obtained by compositing the foreground image containing only the target foreground with a background image is not distorted.
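A hedged sketch of the extraction and of the subsequent foreground-background compositing: the area threshold, depth tolerance and selection rule below are illustrative assumptions, not the patent's prescribed algorithm.

```python
import cv2
import numpy as np

def extract_target_foreground(image, depth_map, contours,
                              min_area=5000.0, center_depth=None, tol=0.5):
    """S5: select the contour(s) matching the target by occupied area and/or
    by the depth of the region they cover, then cut the corresponding pixels
    out of the original target processing picture."""
    mask = np.zeros(depth_map.shape, dtype=np.uint8)
    for c in contours:
        if cv2.contourArea(c) < min_area:
            continue  # too small to be the designated target
        if center_depth is not None:
            m = np.zeros_like(mask)
            cv2.drawContours(m, [c], -1, 255, thickness=cv2.FILLED)
            region = depth_map[m > 0]
            if region.size == 0 or abs(float(np.median(region)) - center_depth) > tol:
                continue  # wrong depth-of-field region
        cv2.drawContours(mask, [c], -1, 255, thickness=cv2.FILLED)
    foreground = cv2.bitwise_and(image, image, mask=mask)
    return foreground, mask

def composite_over_background(image, mask, background):
    """Composite the extracted target foreground over a clean background
    plate, yielding a picture with all non-target foreground removed."""
    m3 = cv2.merge([mask, mask, mask]) > 0
    return np.where(m3, image, background)
```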
Preferably, the optimized deep neural network model in step S2 is obtained through multiple rounds of training and testing; its flowchart is shown in FIG. 2, and it includes the following steps:
S2.1, splicing the data of the target processing picture and the auxiliary processing picture into different parts of the same image to obtain a feature extraction original image;
S2.2, as shown in FIG. 3, sending the feature extraction original image through convolution and pooling layers for two-dimensional convolution and pooling operations to obtain a first feature data set; in the first feature data set the data of the target processing picture and the auxiliary processing picture are not yet correlated but merely spliced together;
S2.3, performing residual network operations on the first feature data set through a residual network layer and spatial pyramid pooling operations through a spatial pyramid pooling layer, extracting a group of second feature data sets at four resolutions from high to low, with one second feature data set at each resolution.
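The PyTorch sketch below illustrates only the shape of this stage. The channel counts, strides and the splice-then-split layout are assumptions for illustration, the average-pooling pyramid stands in for the spatial pyramid pooling layer, and nothing here is the patent's exact architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeaturePyramid(nn.Module):
    """Sketch of S2.1-S2.3: splice the two pictures into one image, run shared
    2D convolution/pooling, then residual blocks, and return features at four
    resolutions from high to low."""
    def __init__(self, ch=32):
        super().__init__()
        self.stem = nn.Sequential(                      # S2.2: 2D conv + pooling
            nn.Conv2d(3, ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2))
        self.stages = nn.ModuleList([self._res_block(ch) for _ in range(4)])

    @staticmethod
    def _res_block(ch):
        return nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, target_img, aux_img):
        x = torch.cat([target_img, aux_img], dim=3)     # S2.1: splice side by side
        x = self.stem(x)
        feats = []
        for stage in self.stages:                       # S2.3: residual blocks
            x = F.relu(x + stage(x))
            feats.append(x)                             # one second feature set
            x = F.avg_pool2d(x, 2)                      # next, lower resolution
        return feats                                    # four scales, high to low
```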
S2.4, symmetrically fusing and normalizing, within each second feature data set, the data information belonging to the target processing picture with the data information belonging to the auxiliary processing picture, to obtain a group of third feature data sets;
S2.5, as shown in FIG. 4, performing three-dimensional convolution on each third feature data set, i.e. performing the depth calculation, to obtain a group of initial depth maps;
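A hedged sketch of what symmetric fusion followed by 3D convolution can look like. The concatenation-based cost volume below is an assumption modeled on published stereo-matching designs such as PSMNet, not necessarily the patent's exact construction; the per-picture features can be recovered from the spliced feature map of the previous sketch, e.g. feat_t, feat_a = feat.chunk(2, dim=3).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fuse_cost_volume(feat_t, feat_a, max_disp):
    """Sketch of S2.4: associate target-picture and auxiliary-picture features
    by stacking them at every candidate disparity (a third feature data set)."""
    b, c, h, w = feat_t.shape
    cost = feat_t.new_zeros(b, 2 * c, max_disp, h, w)
    for d in range(max_disp):
        cost[:, :c, d, :, d:] = feat_t[:, :, :, d:]
        cost[:, c:, d, :, d:] = feat_a[:, :, :, :w - d]
    return cost

class DepthHead(nn.Module):
    """Sketch of S2.5: 3D convolutions over the fused volume, then a
    soft-argmin over disparities gives an initial depth (disparity) map."""
    def __init__(self, ch):
        super().__init__()
        self.conv3d = nn.Sequential(
            nn.Conv3d(2 * ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(ch, 1, 3, padding=1))

    def forward(self, cost):
        logits = self.conv3d(cost).squeeze(1)           # (B, D, H, W) matching cost
        prob = F.softmax(-logits, dim=1)                # low cost -> high probability
        disp = torch.arange(prob.shape[1], device=prob.device, dtype=prob.dtype)
        return (prob * disp.view(1, -1, 1, 1)).sum(dim=1)  # expected disparity
```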
S2.6, comparing each initial depth map with a real calibrated depth map and computing the loss function of the initial depth maps: the loss function L = A1·L1 + A2·L2 + A3·L3 + A4·L4, where L1 denotes the loss of the initial depth map at the highest resolution, L2, L3 and L4 denote the losses of the initial depth maps at successively lower resolutions, and each Ak is a loss coefficient, a fixed constant, with A1 > A2 > A3 > A4. The real calibrated depth map may be a depth information map computed from the camera lens and information about the camera's position, or depth information entered manually after the real site has been calibrated in advance.
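A minimal sketch of this multi-scale loss. The coefficient values and the smooth-L1 per-scale term are illustrative assumptions; the patent fixes only the ordering A1 > A2 > A3 > A4:

```python
import torch.nn.functional as F

def multiscale_depth_loss(initial_depths, gt_depth, coeffs=(1.0, 0.7, 0.5, 0.3)):
    """S2.6: L = A1*L1 + A2*L2 + A3*L3 + A4*L4, comparing each initial depth
    map with the real calibrated depth map (gt_depth, shape (B, 1, H, W))."""
    loss = 0.0
    for pred, a_k in zip(initial_depths, coeffs):       # high -> low resolution
        if pred.dim() == 3:                             # (B, H, W) -> (B, 1, H, W)
            pred = pred.unsqueeze(1)
        gt = F.interpolate(gt_depth, size=pred.shape[-2:], mode='nearest')
        loss = loss + a_k * F.smooth_l1_loss(pred, gt)
    return loss
```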
S2.7, extracting features from a large number of original picture groups shot at different moments, repeating steps S2.2 to S2.6, and continuously optimizing the network weights through back propagation according to each obtained loss function; optimizing the network weights includes adjusting the two-dimensional convolution parameters of S2.2 and the three-dimensional convolution parameters of S2.5, so that the loss function value L is minimized and the optimized deep neural network model is obtained.
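Put together, S2.7 is an ordinary supervised training loop. The optimizer choice and learning rate are assumptions, and model and loader are hypothetical stand-ins wiring the sketches above to a dataset of picture groups:

```python
import torch

# model: combines the FeaturePyramid / DepthHead sketches and returns a list
# of initial depth maps at four resolutions (hypothetical wiring).
# loader: yields (target_img, aux_img, gt_depth) groups shot at different moments.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for target_img, aux_img, gt_depth in loader:
    initial_depths = model(target_img, aux_img)              # S2.2 - S2.5
    loss = multiscale_depth_loss(initial_depths, gt_depth)   # S2.6
    optimizer.zero_grad()
    loss.backward()                                          # S2.7: back propagation
    optimizer.step()                                         # adjust 2D/3D conv weights
```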
Example 2
A modularized block diagram of a system for rapidly acquiring a high-definition foreground image containing only a target object is shown in FIG. 5. The system comprises a depth map acquisition module, a foreground layer depth map acquisition module, a foreground contour depth map acquisition module and a target foreground image acquisition module; a deep neural network model is arranged in the depth map acquisition module; the depth map acquisition module processes the input target processing picture and auxiliary processing picture through the deep neural network model to acquire their parallax information and thereby the depth map of the target processing picture.
The depth map is then processed by the foreground layer depth map acquisition module, the foreground contour depth map acquisition module and the target foreground image acquisition module in sequence to obtain the high-definition target foreground image.
Preferably, the foreground layer depth map acquisition module determines the maximum and minimum depth-of-field levels occupied by the target object according to the minimum and maximum distances between the target object and the camera, and intercepts the foreground layer depth map from the depth map; the foreground layer depth map includes all depth points between the maximum and minimum depth-of-field levels. The maximum depth-of-field level is the distance level of the pixel belonging to the target object that is farthest from the camera, and the minimum depth-of-field level is that of the pixel belonging to the target object that is closest to the camera.
The foreground contour depth map acquisition module performs contour edge calculation on the input foreground layer depth map, calibrating and dividing all foreground contours in it to obtain a foreground contour depth map. The target foreground image acquisition module performs target contour extraction on the input foreground contour depth map to obtain the set of pixels of the target processing picture corresponding to the pixels contained in the target foreground contour, namely the high-definition target foreground image; the high-definition target foreground image completely excludes the foreground of non-target objects.
Preferably, the depth map acquisition module further comprises a neural network model training submodule. The neural network model training submodule comprises a training set input submodule, a feature extraction submodule, a feature fusion submodule, a depth calculation submodule, a depth information comparison submodule and a loss function adjustment submodule. The training set input submodule splices the data of the target processing picture and the auxiliary processing picture into different parts of the same image to obtain a feature extraction original image. The feature extraction submodule sends the feature extraction original image through convolution and pooling layers for two-dimensional convolution and pooling operations to obtain a first feature data set, and extracts from the first feature data set, through residual network operations and spatial pyramid pooling operations, second feature data sets at a plurality of resolution levels from high to low, each resolution level corresponding to one second feature data set. The feature fusion submodule fuses and normalizes the data information belonging to the target processing picture in each second feature data set with the data information belonging to the auxiliary processing picture, associating the two pictures' data within each second feature data set to obtain a group of third feature data sets. The depth calculation submodule performs three-dimensional convolution on each third feature data set to obtain a group of initial depth maps. The depth information comparison submodule compares each initial depth map with a real calibrated depth map to obtain a loss function. The loss function adjustment submodule continuously optimizes the network weights through back propagation according to the loss functions computed over a large number of original picture groups, making the loss function value L as small as possible and thereby obtaining the deep neural network model.
It should be noted that the above-described embodiments are intended to help those skilled in the art understand the invention more fully and do not limit it in any way. Although the invention has been described in detail with reference to the drawings and examples, those skilled in the art will appreciate that it may still be modified or equivalently substituted; all technical solutions and modifications that do not depart from the spirit and scope of the invention are covered by the protection scope of this patent.

Claims (6)

1. A method for rapidly acquiring a high-definition foreground image only containing a target object is characterized by comprising the following steps:
S1, acquiring a target processing picture and an auxiliary processing picture which are shot at the same time and exhibit parallax;
S2, directly inputting the target processing picture and the auxiliary processing picture into an optimized deep neural network model, and acquiring parallax information of the two pictures so as to obtain a depth map of the target processing picture;
wherein the optimized deep neural network model is obtained through multiple rounds of training and testing, comprising the following steps:
S2.1, splicing the data of the target processing picture and the auxiliary processing picture into different parts of the same image to obtain a feature extraction original image;
S2.2, performing two-dimensional convolution and pooling operations on the feature extraction original image several times to obtain a first feature data set; in the first feature data set the data of the target processing picture and the auxiliary processing picture are not correlated but merely spliced together;
S2.3, extracting from the first feature data set, through residual network operations and spatial pyramid pooling operations, second feature data sets at a plurality of resolution levels from high to low, each resolution level corresponding to one second feature data set;
S2.4, symmetrically fusing and normalizing the data information belonging to the target processing picture and to the auxiliary processing picture in each second feature data set to obtain a group of third feature data sets;
S2.5, performing three-dimensional convolution on each third feature data set to obtain a group of initial depth maps;
S2.6, comparing each initial depth map with a real calibrated depth map and computing the loss function of the initial depth maps;
S2.7, extracting a large number of different features from the original pictures, repeating steps S2.2 to S2.6, and continuously optimizing the network weights through back propagation until the loss function value L meets the requirement, thereby obtaining the optimized deep neural network model;
S3, intercepting a foreground layer depth map from the depth map according to the maximum and minimum depth-of-field levels occupied by the target object;
S4, performing contour edge calculation on the foreground layer depth map to obtain all foreground contours in it, yielding a foreground contour depth map;
S5, performing target contour extraction on the foreground contour depth map to obtain the target foreground contour of a target object, wherein the set of pixels of the target processing picture corresponding to the pixels contained in the target foreground contour is the high-definition foreground image containing only the target object;
the target object includes a designated subject and all objects in direct contact with that subject.
2. The method for rapidly acquiring a high-definition foreground image containing only a target object according to claim 1, wherein, in S3, the depth-of-field range occupied by the target object is intercepted from the depth map according to the minimum and maximum distances between the target object and the camera lens, so as to obtain the foreground layer depth map.
3. The method for rapidly acquiring a high-definition foreground image containing only a target object according to claim 1, wherein, in S4, the contour edge calculation includes identifying all feature point sets in the foreground layer depth map and computing the edge of each feature point set to obtain the contours of all feature point sets on the foreground layer depth map, each feature point set being one foreground contour; and/or, in S5, the target contour extraction includes extracting the foreground contour of the target object from the contours of all feature point sets of the foreground contour depth map according to the area occupied by the target object and/or the depth of field of the region where the target object is located.
4. The method for rapidly acquiring a high-definition foreground image containing only a target object according to claim 1, wherein, in S1, the target processing picture and the auxiliary processing picture include binocular pictures shot by a binocular camera, or such binocular pictures after preprocessing, or pictures shot by a plurality of cameras with parallax, or such pictures after preprocessing.
5. The method for rapidly acquiring a high-definition foreground image containing only a target object according to claim 1, wherein, in S2.6, the loss function L = Σ Ak·Lk (k = 1, 2, 3, 4, …), where Lk denotes the loss of the initial depth map at each resolution: L1 is the loss of the initial depth map at the highest resolution, L2, L3, … are the losses of the initial depth maps at successively lower resolutions, and Ak is a loss coefficient, a fixed constant, with Ak > Ak+1.
6. A system for rapidly acquiring a high-definition foreground image containing only a target object, comprising a depth map acquisition module, a foreground layer depth map acquisition module, a foreground contour depth map acquisition module and a target foreground image acquisition module; a deep neural network model is arranged in the depth map acquisition module; the depth map acquisition module processes the input target processing picture and auxiliary processing picture through the deep neural network model to obtain a depth map, and the depth map is processed by the foreground layer depth map acquisition module, the foreground contour depth map acquisition module and the target foreground image acquisition module in sequence to obtain the high-definition target foreground image;
the foreground layer depth map acquisition module processes the input depth map according to the maximum and minimum depth-of-field levels occupied by the target object, intercepting the set of depth points between the two levels to obtain a foreground layer depth map; the foreground contour depth map acquisition module performs contour edge calculation on the input foreground layer depth map, calibrating and dividing all foreground contours in it to obtain a foreground contour depth map; the target foreground image acquisition module performs target contour extraction on the input foreground contour depth map to obtain the set of pixels of the target processing picture corresponding to the pixels contained in the target foreground contour, namely the high-definition target foreground image; the target object includes a designated subject and all objects in direct contact with that subject; the depth map acquisition module further comprises a neural network model training submodule; the neural network model training submodule comprises a training set input submodule, a feature extraction submodule, a feature fusion submodule, a depth calculation submodule, a depth information comparison submodule and a loss function adjustment submodule; the training set input submodule splices the data of the target processing picture and the auxiliary processing picture into different parts of the same image to obtain a feature extraction original image; the feature extraction submodule sends the feature extraction original image through convolution and pooling layers for two-dimensional convolution and pooling operations to obtain a first feature data set, and extracts from the first feature data set, through residual network operations and spatial pyramid pooling operations, second feature data sets at a plurality of resolution levels from high to low, each resolution level corresponding to one second feature data set; the feature fusion submodule fuses and normalizes the data information belonging to the target processing picture in each second feature data set with the data information belonging to the auxiliary processing picture, associating the two pictures' data within each second feature data set to obtain a group of third feature data sets; the depth calculation submodule performs three-dimensional convolution on each third feature data set to obtain a group of initial depth maps; the depth information comparison submodule compares each initial depth map with a real calibrated depth map to obtain a loss function; the loss function adjustment submodule continuously optimizes the network weights through back propagation according to the loss functions computed over a large number of original picture groups until the loss function value L meets the requirement, thereby obtaining the optimized deep neural network model.
CN202110655267.4A 2021-06-11 2021-06-11 Method and system for rapidly acquiring high-definition foreground image only containing target object Active CN113344997B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202110655267.4A (published as CN113344997B) | 2021-06-11 | 2021-06-11 | Method and system for rapidly acquiring high-definition foreground image only containing target object

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202110655267.4A (published as CN113344997B) | 2021-06-11 | 2021-06-11 | Method and system for rapidly acquiring high-definition foreground image only containing target object

Publications (2)

Publication Number | Publication Date
CN113344997A (en) | 2021-09-03
CN113344997B (en), granted | 2022-07-26

Family

ID=77477071

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202110655267.4A (Active; published as CN113344997B) | Method and system for rapidly acquiring high-definition foreground image only containing target object | 2021-06-11 | 2021-06-11

Country Status (1)

Country Link
CN (1) CN113344997B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102509343A (en) * 2011-09-30 2012-06-20 北京航空航天大学 Binocular image and object contour-based virtual and actual sheltering treatment method
CN105894998A (en) * 2014-11-30 2016-08-24 黄石木信息科技有限公司 Making method of three-dimensional virtual scene tour guidance system
CN106021330A (en) * 2016-05-06 2016-10-12 浙江工业大学 A three-dimensional model retrieval method used for mixed contour line views
CN109035319A (en) * 2018-07-27 2018-12-18 深圳市商汤科技有限公司 Monocular image depth estimation method and device, equipment, program and storage medium
CN111161291A (en) * 2019-12-31 2020-05-15 广西科技大学 Contour detection method based on target depth of field information

Family Cites Families (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5780865B2 (en) * 2011-07-14 2015-09-16 キヤノン株式会社 Image processing apparatus, imaging system, and image processing system
RU2014109439A 2014-03-12 2015-09-20 LSI Corporation Image processor comprising a gesture recognition system with hand-position matching based on contour features
HUP1400600A2 (en) * 2014-12-17 2016-06-28 Pi Holding Zrt Method to replace image segment content
CN105225230B (en) * 2015-09-11 2018-07-13 浙江宇视科技有限公司 A kind of method and device of identification foreground target object
CN107481261B (en) * 2017-07-31 2020-06-16 中国科学院长春光学精密机械与物理研究所 Color video matting method based on depth foreground tracking
CN107395982A (en) * 2017-08-22 2017-11-24 北京小米移动软件有限公司 Photographic method and device
CN109934834A (en) * 2017-12-19 2019-06-25 北京京东尚科信息技术有限公司 Image outline extracting method and system
CN110009555B (en) * 2018-01-05 2020-08-14 Oppo广东移动通信有限公司 Image blurring method and device, storage medium and electronic equipment
CN108510535B (en) * 2018-03-14 2020-04-24 大连理工大学 High-quality depth estimation method based on depth prediction and enhancer network
CN109598754B (en) * 2018-09-29 2020-03-17 天津大学 Binocular depth estimation method based on depth convolution network
WO2020087485A1 (en) * 2018-11-02 2020-05-07 Oppo广东移动通信有限公司 Method for acquiring depth image, device for acquiring depth image, and electronic device
US10839543B2 (en) * 2019-02-26 2020-11-17 Baidu Usa Llc Systems and methods for depth estimation using convolutional spatial propagation networks
CN109829850B (en) * 2019-03-06 2023-04-28 百度在线网络技术(北京)有限公司 Image processing method, device, equipment and computer readable medium
CN110189339A (en) * 2019-06-03 2019-08-30 重庆大学 The active profile of depth map auxiliary scratches drawing method and system
CN110414674B (en) * 2019-07-31 2021-09-10 浙江科技学院 Monocular depth estimation method based on residual error network and local refinement
CN110517306B (en) * 2019-08-30 2023-07-28 的卢技术有限公司 Binocular depth vision estimation method and system based on deep learning
CN110738697B (en) * 2019-10-10 2023-04-07 福州大学 Monocular depth estimation method based on deep learning
CN111105451B (en) * 2019-10-31 2022-08-05 武汉大学 Driving scene binocular depth estimation method for overcoming occlusion effect
CN112699844B (en) * 2020-04-23 2023-06-20 华南理工大学 Image super-resolution method based on multi-scale residual hierarchy close-coupled network
CN111797841B (en) * 2020-05-10 2024-03-22 浙江工业大学 Visual saliency detection method based on depth residual error network
CN112070782B (en) * 2020-08-31 2024-01-09 腾讯科技(深圳)有限公司 Method, device, computer readable medium and electronic equipment for identifying scene contour
CN112070054B (en) * 2020-09-17 2022-07-29 福州大学 Vehicle-mounted laser point cloud marking classification method based on graph structure and attention mechanism
CN112672048A (en) * 2020-12-21 2021-04-16 山西方天圣华数字科技有限公司 Image processing method based on binocular image and neural network algorithm
CN112802079A (en) * 2021-01-19 2021-05-14 奥比中光科技集团股份有限公司 Disparity map acquisition method, device, terminal and storage medium
CN112767467B (en) * 2021-01-25 2022-11-11 郑健青 Double-image depth estimation method based on self-supervision deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"基于条件生成对抗网络的手势图像背景去除方法";王庆飞等;《计算机应用研究》;20201231;第37卷;第401-403页 *

Also Published As

Publication number Publication date
CN113344997A (en) 2021-09-03

Similar Documents

Publication Publication Date Title
JP6929047B2 (en) Image processing equipment, information processing methods and programs
CN110222787B (en) Multi-scale target detection method and device, computer equipment and storage medium
CN111107337B (en) Depth information complementing method and device, monitoring system and storage medium
CN106462956A (en) Local adaptive histogram equalization
WO2018171008A1 (en) Specular highlight area restoration method based on light field image
AU2020203790B2 (en) Transformed multi-source content aware fill
CN112085802A (en) Method for acquiring three-dimensional finger vein image based on binocular camera
KR20150031085A (en) 3D face-modeling device, system and method using Multiple cameras
CN112969023A (en) Image capturing method, apparatus, storage medium, and computer program product
CN115115611A (en) Vehicle damage identification method and device, electronic equipment and storage medium
CN111310567A (en) Face recognition method and device under multi-person scene
CN113225484B (en) Method and device for rapidly acquiring high-definition picture shielding non-target foreground
CN113096016A (en) Low-altitude aerial image splicing method and system
CN113344997B (en) Method and system for rapidly acquiring high-definition foreground image only containing target object
CN111105370A (en) Image processing method, image processing apparatus, electronic device, and readable storage medium
CN116051736A (en) Three-dimensional reconstruction method, device, edge equipment and storage medium
CN113538315B (en) Image processing method and device
CN111630569B (en) Binocular matching method, visual imaging device and device with storage function
CN112288669A (en) Point cloud map acquisition method based on light field imaging
CN116452776B (en) Low-carbon substation scene reconstruction method based on vision synchronous positioning and mapping system
CN112085653B (en) Parallax image splicing method based on depth of field compensation
CN111010558B (en) Stumpage depth map generation method based on short video image
CN111080689B (en) Method and device for determining face depth map
CN113487492A (en) Parallax value correction method, parallax value correction device, electronic apparatus, and storage medium
CN116664386A (en) Image processing method, device, mobile terminal and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20211215

Address after: 101125 404-c37, zone B, No. a 560, Luyuan South Street, Tongzhou District, Beijing

Applicant after: Fangtian Shenghua (Beijing) Digital Technology Co.,Ltd.

Address before: 030000 floor 20, Hongfu complex building, Xiaodian District, Taiyuan City, Shanxi Province

Applicant before: Shanxi Fangtian Shenghua Digital Technology Co.,Ltd.

GR01 Patent grant