WO2020224244A1 - Method and apparatus for obtaining depth-of-field image


Info

Publication number
WO2020224244A1
Authority
WO
WIPO (PCT)
Prior art keywords: layer, feature, fusion, output, extraction
Application number: PCT/CN2019/121603
Other languages: French (fr), Chinese (zh)
Inventors: 赵培骁, 黄轩, 王孝宇
Original Assignee: 深圳云天励飞技术有限公司
Application filed by 深圳云天励飞技术有限公司
Publication of WO2020224244A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10004 Still image; Photographic image
    • G06T 2207/10024 Color image
    • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]

Definitions

  • the present invention relates to the field of computers, in particular to a method and device for acquiring a depth map.
  • depth information is typically obtained with special equipment such as a dedicated depth camera, a binocular camera, or a laser rangefinder, and the cost of such equipment is relatively high.
  • current algorithms for constructing depth images rely on a single feature-extraction structure, so the extracted feature information is relatively limited, which is not conducive to constructing complex three-dimensional images.
  • the embodiments of the present invention provide a method and device for acquiring a depth map.
  • by setting multi-scale, multi-level feature extractors in a deep learning network architecture, depth image information can be obtained from a single image captured by an ordinary camera.
  • the first aspect of the present invention discloses a method for acquiring a depth map, and the method includes:
  • a neural network is constructed, and the neural network is used to perform multiple feature extraction and fusion on the target image to obtain a depth map of the target image.
  • the neural network includes N layers, and each layer includes a cascaded main feature extractor, an extraction and fusion module, and a fusion output device, where N is a positive integer greater than 1;
  • the main feature extractor of the first layer is used to perform feature extraction on the target image, and output the obtained feature map to the main feature extractor of the second layer and the extraction and fusion module and the fusion output device of the first layer;
  • the extraction and fusion module of the first layer performs feature extraction on the feature map output by the main feature extractor of the first layer, and outputs the obtained feature map to the fusion output device of the first layer and to the extraction and fusion module and the fusion output device of the second layer;
  • the main feature extractor of the i-th layer is used to perform feature extraction on the feature map output by the main feature extractor of the (i-1)-th layer, and output the obtained feature map to the main feature extractor of the (i+1)-th layer and to the extraction and fusion module and the fusion output device of the i-th layer, where i is an integer and 1 < i < N;
  • the extraction and fusion module of the i-th layer is used to perform feature extraction and fusion on the feature map output by the extraction and fusion module of the (i-1)-th layer and the feature map output by the main feature extractor of the i-th layer, and output the obtained feature map to the fusion output device of the i-th layer and to the extraction and fusion module and the fusion output device of the (i+1)-th layer;
  • the main feature extractor of the N-th layer is used to perform feature extraction on the feature map output by the main feature extractor of the (N-1)-th layer, and output the obtained feature map to the extraction and fusion module and the fusion output device of the N-th layer;
  • the extraction and fusion module of the N-th layer is used to perform feature extraction and fusion on the feature map output by the extraction and fusion module of the (N-1)-th layer and the feature map output by the main feature extractor of the N-th layer, and output the obtained feature map to the fusion output device of the N-th layer;
  • the fusion output device of the N-th layer is used to perform feature extraction and fusion on the feature map output by the main feature extractor of the N-th layer, the feature map output by the extraction and fusion module of the N-th layer, and the feature map output by the extraction and fusion module of the (N-1)-th layer, and output the obtained feature map to the fusion output device of the (N-1)-th layer;
  • the fusion output device of the i-th layer is used to perform feature extraction and fusion on the feature map output by the main feature extractor of the i-th layer, the feature map output by the extraction and fusion module of the i-th layer, the feature map output by the extraction and fusion module of the (i-1)-th layer, and the feature map output by the fusion output device of the (i+1)-th layer, and output the obtained feature map to the fusion output device of the (i-1)-th layer;
  • the fusion output device of the first layer is used to perform feature extraction and fusion on the feature map output by the main feature extractor of the first layer, the feature map output by the extraction and fusion module of the first layer, and the feature map output by the fusion output device of the second layer, to obtain a depth map of the target image.
  • cascading means that multiple components or functional modules are connected in series, with the output of each component or functional module serving as the input of the next; the depth map is an image indicating the distance of each object in the shooting scene from the camera.
  • the extraction and fusion module of the j-th layer includes N+1-j auxiliary feature extractors, where j is an integer and 1 ≤ j ≤ N;
  • the first auxiliary feature extractor of the first layer is used to perform feature extraction on the feature map output by the main feature extractor of the first layer, and output the obtained feature map to the second to N-th auxiliary feature extractors and the fusion output device of the first layer, and to the first auxiliary feature extractor of the second layer;
  • the k-th auxiliary feature extractor of the first layer is used to perform feature extraction and fusion on the feature maps output by the main feature extractor and the first to (k-1)-th auxiliary feature extractors of the first layer, and output the obtained feature map to the (k+1)-th to N-th auxiliary feature extractors and the fusion output device of the first layer, and to the k-th auxiliary feature extractor of the second layer, where k is an integer and 1 < k < N;
  • the N-th auxiliary feature extractor of the first layer is used to perform feature extraction and fusion on the feature maps output by the main feature extractor and the first to (N-1)-th auxiliary feature extractors of the first layer, and output the obtained feature map to the fusion output device of the first layer and the fusion output device of the second layer.
  • the first auxiliary feature extractor of the m-th layer is used to perform feature extraction on the feature map output by the main feature extractor of the m-th layer to obtain the first feature map;
  • the x-th auxiliary feature extractor of the m-th layer is used to perform feature extraction and fusion on the feature maps output by the main feature extractor and the first to (x-1)-th auxiliary feature extractors of the m-th layer, and output the obtained feature map to the (x+1)-th to n-th auxiliary feature extractors and the fusion output device of the m-th layer, and to the x-th auxiliary feature extractor of the (m+1)-th layer, where x is an integer and 1 < x < n.
  • the first auxiliary feature extractor of the (N-1)-th layer is used to perform feature extraction on the feature map output by the main feature extractor of the (N-1)-th layer to obtain a fourth feature map; it receives the fifth feature map output by the first auxiliary feature extractor of the (N-2)-th layer, merges the fourth feature map with the fifth feature map to obtain a sixth feature map, and outputs the sixth feature map to the fusion output device of the (N-1)-th layer.
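As a reading aid, the layer-to-layer wiring described in the first aspect can be traced in code. The following is a hypothetical sketch, not the patented implementation: `extract` and `fuse` are stand-ins (an increment and a sum) so that only the data flow through the main feature extractors, extraction-and-fusion modules, and fusion output devices is illustrated, here for N = 4.

```python
# Hypothetical trace of the claimed N-layer wiring; "feature maps" are
# plain numbers so the routing between modules can be followed.
N = 4

def extract(x):          # stand-in for a (main) feature extractor
    return x + 1

def fuse(*maps):         # stand-in for feature extraction + fusion
    return sum(maps)

# Top-down pass: main feature extractors of layers 1..N.
main = {}
x = 0                                    # the target image
for i in range(1, N + 1):
    x = extract(x)                       # layer i main extractor
    main[i] = x

# Extraction-and-fusion modules: layer 1 uses only its own main
# extractor's output; layer i (>1) also fuses the module output of
# layer i-1.
ef = {1: extract(main[1])}
for i in range(2, N + 1):
    ef[i] = fuse(ef[i - 1], main[i])

# Bottom-up pass: fusion output devices of layers N..1.  Layer N fuses
# the N-th main/module outputs with the (N-1)-th module output; layer i
# also takes the fused map coming up from layer i+1; layer 1 emits the
# depth map.
fo = {N: fuse(main[N], ef[N], ef[N - 1])}
for i in range(N - 1, 1, -1):
    fo[i] = fuse(main[i], ef[i], ef[i - 1], fo[i + 1])
depth_map = fuse(main[1], ef[1], fo[2])
```

The numeric result is of course meaningless; the point is that every module receives exactly the inputs listed in the claims.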
  • the second aspect of the present invention discloses a depth map acquisition device, which includes an acquisition unit and a construction unit;
  • the acquiring unit is used to acquire a single target image;
  • the construction unit is used to construct a neural network, and the neural network is used to perform multiple feature extraction and fusion on the target image to obtain a depth map of the target image.
  • the neural network includes N layers, and each layer includes a cascaded main feature extractor, an extraction and fusion module, and a fusion output device, where N is a positive integer greater than 1;
  • the main feature extractor of the first layer is used to perform feature extraction on the target image, and output the obtained feature map to the main feature extractor of the second layer and the extraction and fusion module and the fusion output device of the first layer;
  • the extraction and fusion module of the first layer performs feature extraction on the feature map output by the main feature extractor of the first layer, and outputs the obtained feature map to the fusion output device of the first layer and to the extraction and fusion module and the fusion output device of the second layer;
  • the main feature extractor of the i-th layer is used to perform feature extraction on the feature map output by the main feature extractor of the (i-1)-th layer, and output the obtained feature map to the main feature extractor of the (i+1)-th layer and to the extraction and fusion module and the fusion output device of the i-th layer, where i is an integer and 1 < i < N;
  • the extraction and fusion module of the i-th layer is used to perform feature extraction and fusion on the feature map output by the extraction and fusion module of the (i-1)-th layer and the feature map output by the main feature extractor of the i-th layer, and output the obtained feature map to the fusion output device of the i-th layer and to the extraction and fusion module and the fusion output device of the (i+1)-th layer;
  • the main feature extractor of the N-th layer is used to perform feature extraction on the feature map output by the main feature extractor of the (N-1)-th layer, and output the obtained feature map to the extraction and fusion module and the fusion output device of the N-th layer;
  • the extraction and fusion module of the N-th layer is used to perform feature extraction and fusion on the feature map output by the extraction and fusion module of the (N-1)-th layer and the feature map output by the main feature extractor of the N-th layer, and output the obtained feature map to the fusion output device of the N-th layer;
  • the fusion output device of the N-th layer is used to perform feature extraction and fusion on the feature map output by the main feature extractor of the N-th layer, the feature map output by the extraction and fusion module of the N-th layer, and the feature map output by the extraction and fusion module of the (N-1)-th layer, and output the obtained feature map to the fusion output device of the (N-1)-th layer;
  • the fusion output device of the i-th layer is used to perform feature extraction and fusion on the feature map output by the main feature extractor of the i-th layer, the feature map output by the extraction and fusion module of the i-th layer, the feature map output by the extraction and fusion module of the (i-1)-th layer, and the feature map output by the fusion output device of the (i+1)-th layer, and output the obtained feature map to the fusion output device of the (i-1)-th layer;
  • the fusion output device of the first layer is used to perform feature extraction and fusion on the feature map output by the main feature extractor of the first layer, the feature map output by the extraction and fusion module of the first layer, and the feature map output by the fusion output device of the second layer, to obtain a depth map of the target image.
  • the extraction and fusion module of the j-th layer includes N+1-j auxiliary feature extractors, where j is an integer and 1 ≤ j ≤ N;
  • the first auxiliary feature extractor of the first layer is used to perform feature extraction on the feature map output by the main feature extractor of the first layer, and output the obtained feature map to the second to N-th auxiliary feature extractors and the fusion output device of the first layer, and to the first auxiliary feature extractor of the second layer;
  • the k-th auxiliary feature extractor of the first layer is used to perform feature extraction and fusion on the feature maps output by the main feature extractor and the first to (k-1)-th auxiliary feature extractors of the first layer, and output the obtained feature map to the (k+1)-th to N-th auxiliary feature extractors and the fusion output device of the first layer, and to the k-th auxiliary feature extractor of the second layer, where k is an integer and 1 < k < N;
  • the N-th auxiliary feature extractor of the first layer is used to perform feature extraction and fusion on the feature maps output by the main feature extractor and the first to (N-1)-th auxiliary feature extractors of the first layer, and output the obtained feature map to the fusion output device of the first layer and the fusion output device of the second layer.
  • the first auxiliary feature extractor of the m-th layer is used to perform feature extraction on the feature map output by the main feature extractor of the m-th layer to obtain the first feature map;
  • the x-th auxiliary feature extractor of the m-th layer is used to perform feature extraction and fusion on the feature maps output by the main feature extractor and the first to (x-1)-th auxiliary feature extractors of the m-th layer, and output the obtained feature map to the (x+1)-th to n-th auxiliary feature extractors and the fusion output device of the m-th layer, and to the x-th auxiliary feature extractor of the (m+1)-th layer, where x is an integer and 1 < x < n.
  • the first auxiliary feature extractor of the (N-1)-th layer is used to perform feature extraction on the feature map output by the main feature extractor of the (N-1)-th layer to obtain a fourth feature map; it receives the fifth feature map output by the first auxiliary feature extractor of the (N-2)-th layer, merges the fourth feature map with the fifth feature map to obtain a sixth feature map, and outputs the sixth feature map to the fusion output device of the (N-1)-th layer.
  • a third aspect of the present invention discloses a storage medium in which a program code is stored, and when the program code is executed, the method of the first aspect is executed;
  • a fourth aspect of the present invention discloses an image fusion device, the device including a processor and a transceiver, where the transceiver function described in the second aspect can be implemented by the transceiver, and the logic function described in the second aspect (i.e., the specific function of each logic unit) can be implemented by the processor;
  • the fifth aspect of the present invention discloses a computer program product, the computer program product contains program code; when the program code is executed, the method of the first aspect is executed.
  • a method for acquiring a depth map is disclosed in the embodiment provided by the present invention.
  • a single target image is acquired; a neural network is constructed, and the neural network is used to perform multiple feature extraction and fusion on the target image to obtain a depth map of the target image.
  • a feature image can be extracted at each layer by setting multi-scale, multi-level feature extractors in the neural network, and the multiple feature images can then be fused to obtain a multi-scale, multi-level depth image, so that users can use the depth image for three-dimensional modeling or simulation; this makes it convenient to perform complex three-dimensional image processing based on a single image.
  • the invention greatly reduces the equipment cost by processing a single image to obtain a depth image.
  • FIG. 1 is a schematic diagram of an image fusion network architecture provided by an embodiment of the present invention
  • Figure 1a is a schematic diagram of a deep residual network provided by an embodiment of the present invention.
  • FIG. 2 is a depth prediction diagram provided by an embodiment of the present invention.
  • FIG. 3 is a schematic flowchart of a method for acquiring a depth map according to an embodiment of the present invention
  • FIG. 4 is a schematic structural diagram of an apparatus for acquiring a depth map according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of the physical structure of a depth map acquiring device provided by an embodiment of the present invention.
  • the embodiment of the present invention provides a method and device for acquiring a depth map.
  • the method includes: acquiring a single target image; and constructing a neural network that performs multiple rounds of feature extraction and fusion on the target image to obtain a depth map of the target image.
  • a feature image can be extracted at each layer by setting multi-scale, multi-level feature extractors in the neural network, and the multiple feature images can then be fused to obtain a multi-scale, multi-level depth image, so that users can use the depth image for three-dimensional modeling or simulation; this makes it convenient to perform complex three-dimensional image processing based on a single image.
  • the invention greatly reduces the equipment cost by processing a single image to obtain a depth image.
  • the single target image acquired by the present invention can be an RGB image, a grayscale image, or a binary image.
  • this embodiment obtains the depth map from a single image captured by an ordinary camera.
  • the technology can be applied to personal computers (PCs) and small devices, including but not limited to mobile devices running Android or iOS.
  • the RGB picture refers to the picture obtained by the changes of the three color channels of red (R), green (G), and blue (B) and their mutual superposition.
  • RGB is the color representing the three channels of red, green, and blue. This standard includes almost all the colors that human vision can perceive, and it is one of the most widely used color systems at present.
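The RGB representation described above can be shown in a minimal, illustrative snippet (not part of the patent): a color image is an H x W x 3 array formed by superposing the red, green, and blue channel planes.

```python
import numpy as np

# A 2x2 pure-red image built by stacking the three channel planes.
h, w = 2, 2
r = np.full((h, w), 255, dtype=np.uint8)   # red channel at full intensity
g = np.zeros((h, w), dtype=np.uint8)       # green channel empty
b = np.zeros((h, w), dtype=np.uint8)       # blue channel empty
img = np.stack([r, g, b], axis=-1)          # H x W x 3 RGB image
```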
  • in the network structure diagram shown in Figure 1 (the diagram is only schematic; the present invention does not limit the number of layers or the number of auxiliary feature extractors), there are three main stages: first, the target image is preprocessed so that it meets the input requirements of the main feature extractor; second, feature maps are extracted with the main feature extractor and the auxiliary feature extractors; third, the extracted feature maps are fused.
  • the main feature extractor is a ResNet50 structure, a deep residual network, which performs feature extraction on images. Specifically, feature extraction extracts information such as object texture, object contours, and object edges from the image; how features are extracted is learned by each layer of feature extractors from sample inputs.
  • the main feature extractor can also be an AlexNet structure (a machine learning model named after Alex Krizhevsky) or a VGG structure (a model proposed by the Visual Geometry Group of the University of Oxford); the present invention is not limited in this respect.
  • as shown in the schematic diagram of the deep residual network in Figure 1a, the depth of a network affects the model's classification and recognition performance; for example, when a conventional network is stacked beyond a certain depth, the deeper the network, the more pronounced the vanishing-gradient problem becomes, and the worse the network's classification performance.
  • the deep residual network structure can deepen the network layer while preventing the gradient from disappearing, so as to achieve a better classification effect.
  • the deep residual network has a skip structure. For example, suppose the input of a certain sub-network is x and the expected output is H(x), where H(x) is the desired complex latent mapping; with a skip connection, the stacked layers only need to learn the residual F(x) = H(x) - x, and the block outputs F(x) + x.
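The skip structure can be sketched in a few lines. This is an illustrative toy block, not the patented network: the weights are arbitrary random matrices, and the block simply computes a residual branch F(x) and adds the input back through the shortcut.

```python
import numpy as np

# Toy residual block: output = activation(F(x) + x), where F is a small
# two-layer branch with arbitrary (untrained) weights.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((8, 8))
W2 = rng.standard_normal((8, 8))

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x):
    f = relu(x @ W1) @ W2      # the residual branch F(x)
    return relu(f + x)         # skip connection adds x back

x = rng.standard_normal(8)
y = residual_block(x)
```

Because gradients can flow through the identity shortcut, stacking many such blocks avoids the vanishing-gradient effect described above.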
  • image preprocessing refers to preprocessing the input image so that it meets the input requirements of the deep residual network ResNet50. Specifically, image preprocessing essentially scales and crops the image.
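A minimal sketch of such a scale-and-crop step follows. The patent does not specify the target size or resampling method; the 224x224 input is the conventional ResNet50 default and nearest-neighbor resizing is used here purely for simplicity, so both are assumptions.

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbor resize via integer index maps."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]

def center_crop(img, size):
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def preprocess(img, size=224):
    """Scale so the short side equals `size`, then center-crop."""
    h, w = img.shape[:2]
    scale = size / min(h, w)
    img = resize_nearest(img, int(round(h * scale)), int(round(w * scale)))
    return center_crop(img, size)

img = np.zeros((480, 640, 3), dtype=np.uint8)   # dummy RGB frame
out = preprocess(img)                            # 224 x 224 x 3
```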
  • the auxiliary feature extractor fuses the feature map extracted by the previous feature extractor in the same layer, the feature map obtained by up-sampling from the lower layer, and the feature map obtained by down-sampling from the upper layer, to obtain a new feature map.
  • the auxiliary feature extractor contains several convolutional structures; its function is to fuse the feature map extracted by the previous feature extractor in the same layer with the feature map obtained by down-sampling from the upper layer to form a new feature map.
  • the number of auxiliary feature extractors in Figure 1 is 4; extractors 1-4 share the same structure, including the same size and number of convolution kernels.
  • the auxiliary feature extractor structure includes two convolutional layers and one activation layer.
  • the size of the convolution kernel is 3 ⁇ 3.
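The stated structure (two 3x3 convolutional layers followed by one activation layer) can be sketched as follows. This is a hypothetical single-channel, stride-1 illustration with arbitrary kernels; the activation is assumed to be ReLU, which the patent does not specify.

```python
import numpy as np

def conv3x3(x, k):
    """Naive single-channel 3x3 convolution with zero padding, stride 1."""
    h, w = x.shape
    p = np.pad(x, 1)
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = (p[i:i + 3, j:j + 3] * k).sum()
    return out

def aux_extractor(x, k1, k2):
    y = conv3x3(x, k1)          # first 3x3 convolutional layer
    y = conv3x3(y, k2)          # second 3x3 convolutional layer
    return np.maximum(y, 0.0)   # activation layer (ReLU assumed)

# Sanity check: with identity kernels the block passes the input through.
x = np.ones((8, 8))
identity = np.zeros((3, 3))
identity[1, 1] = 1.0
y = aux_extractor(x, identity, identity)
```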
  • the network structure for image feature collection in this embodiment is divided into 4 layers, each using one main feature extractor and several auxiliary feature extractors. As shown in Figure 1, the number of auxiliary feature extractors decreases layer by layer from the first layer to the fourth: the first layer has 4 auxiliary feature extractors, the second layer has 3, the third layer has 2, and the fourth layer has 1.
  • it should be pointed out that the auxiliary feature extractors of the second layer reuse any three of the four auxiliary feature extractors of the first layer; likewise, the third layer reuses any 2 of the first layer's 4 auxiliary feature extractors, and so on, which will not be listed one by one below.
  • Feature map 4_0, feature map 3_2, feature map 2_3, feature map 1_4, and feature map 0_5 respectively correspond to the result of fusion of the feature map extracted by the main feature extractor with the feature map extracted by the auxiliary feature extractor layer by layer.
  • the feature map 0_5 is the final depth prediction image (the depth prediction image shown in Figure 2).
  • the fusion method is element-wise addition. After the addition, the result undergoes two 3×3 convolutions and one activation-layer operation, and the processed result is then sent to the layer above and merged with that layer's image features.
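The fusion-and-pass-up step can be sketched as follows. This is an illustrative assumption-laden version: the two 3x3 convolutions are abbreviated to the activation alone so the addition and up-sampling are the focus, and nearest-neighbor 2x up-sampling is assumed as the way a lower-layer map is brought to the upper layer's resolution.

```python
import numpy as np

def fuse(a, b):
    """Element-wise addition followed by an activation.
    (The two 3x3 convolutions of the real module are omitted here.)"""
    return np.maximum(a + b, 0.0)

def upsample2x(x):
    """Nearest-neighbor 2x up-sampling (assumed)."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

low = np.full((4, 4), 2.0)            # feature map coming from below
same = np.full((4, 4), 1.0)           # feature map extracted in this layer
up = upsample2x(fuse(low, same))      # fused result sent to the layer above
upper = np.full((8, 8), 0.5)          # upper layer's own feature map
merged = fuse(up, upper)              # merged with the upper layer's features
```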
  • it should be pointed out that Figure 2 can be read simply as: the closer an object is to the camera, the darker its color; the farther away, the lighter.
  • FIG. 3 is a schematic flowchart of a method for acquiring a depth map according to an embodiment of the present invention.
  • a method for acquiring a depth map provided by an embodiment of the present invention includes the following contents:
  • the execution subject of this embodiment may be a smart phone, a wearable device, an electronic device with a camera function, or a personal computer and other devices.
  • the following description takes a smartphone as the execution subject as an example.
  • the target image can be downloaded from the Internet, received from another electronic device, or captured with the device's camera.
  • the target image may be a single RGB image.
  • S102 Construct a neural network, which is used to perform multiple feature extraction and fusion on the target image to obtain a depth map of the target image.
  • the depth map refers to an image that can indicate the distance of each object in the shooting scene from the camera.
  • the neural network includes N layers, and each layer includes a cascaded main feature extractor, an extraction and fusion module, and a fusion output device, where N is a positive integer greater than 1; where the cascade is Refers to multiple components or functional modules in a linear series, and the output of the previous component or functional module is used as the input of the next component or functional module;
  • the number of layers can be chosen according to the capability of the execution device; it can be a system default or selected manually, and is not restricted here. Specific layer counts are used for illustration below.
  • the method further includes: determining whether the size of the target image exceeds a threshold, and if so, preprocessing the target image to obtain a processed target image.
  • the main feature extractor is used to extract at least one of the following information in the target image: object texture, object contour, object and object edge.
  • the functions of the main feature extractor, the extraction and fusion module, and the fusion output device of each layer are as follows:
  • the main feature extractor of the first layer is used to perform feature extraction on the target image, and output the obtained feature map to the main feature extractor of the second layer and the extraction and fusion module and the fusion output device of the first layer;
  • the extraction and fusion module of the first layer performs feature extraction on the feature map output by the main feature extractor of the first layer, and outputs the obtained feature map to the fusion output device of the first layer and to the extraction and fusion module and the fusion output device of the second layer;
  • the main feature extractor of the i-th layer is used to perform feature extraction on the feature map output by the main feature extractor of the (i-1)-th layer, and output the obtained feature map to the main feature extractor of the (i+1)-th layer and to the extraction and fusion module and the fusion output device of the i-th layer, where i is an integer and 1 < i < N;
  • the extraction and fusion module of the i-th layer is used to perform feature extraction and fusion on the feature map output by the extraction and fusion module of the (i-1)-th layer and the feature map output by the main feature extractor of the i-th layer, and output the obtained feature map to the fusion output device of the i-th layer and to the extraction and fusion module and the fusion output device of the (i+1)-th layer;
  • the main feature extractor of the N-th layer is used to perform feature extraction on the feature map output by the main feature extractor of the (N-1)-th layer, and output the obtained feature map to the extraction and fusion module and the fusion output device of the N-th layer;
  • the extraction and fusion module of the N-th layer is used to perform feature extraction and fusion on the feature map output by the extraction and fusion module of the (N-1)-th layer and the feature map output by the main feature extractor of the N-th layer, and output the obtained feature map to the fusion output device of the N-th layer;
  • the fusion output device of the N-th layer is used to perform feature extraction and fusion on the feature map output by the main feature extractor of the N-th layer, the feature map output by the extraction and fusion module of the N-th layer, and the feature map output by the extraction and fusion module of the (N-1)-th layer, and output the obtained feature map to the fusion output device of the (N-1)-th layer;
  • the fusion output device of the i-th layer is used to perform feature extraction and fusion on the feature map output by the main feature extractor of the i-th layer, the feature map output by the extraction and fusion module of the i-th layer, the feature map output by the extraction and fusion module of the (i-1)-th layer, and the feature map output by the fusion output device of the (i+1)-th layer, and output the obtained feature map to the fusion output device of the (i-1)-th layer;
  • the fusion output device of the first layer is used to perform feature extraction and fusion on the feature map output by the main feature extractor of the first layer, the feature map output by the extraction and fusion module of the first layer, and the feature map output by the fusion output device of the second layer, to obtain a depth map of the target image.
  • the extraction and fusion module of the j-th layer includes N+1-j auxiliary feature extractors, where j is an integer and 1 ≤ j ≤ N;
  • auxiliary feature extractor of each layer is as follows:
  • the first auxiliary feature extractor of the first layer is used to perform feature extraction on the feature map output by the main feature extractor of the first layer, and output the obtained feature map to the second to N-th auxiliary feature extractors and the fusion output device of the first layer, and to the first auxiliary feature extractor of the second layer;
  • the k-th auxiliary feature extractor of the first layer is used to perform feature extraction and fusion on the feature maps output by the main feature extractor and the first to (k-1)-th auxiliary feature extractors of the first layer, and output the obtained feature map to the (k+1)-th to N-th auxiliary feature extractors and the fusion output device of the first layer, and to the k-th auxiliary feature extractor of the second layer, where k is an integer and 1 < k < N;
  • the N-th auxiliary feature extractor of the first layer is used to perform feature extraction and fusion on the feature maps output by the main feature extractor and the first to (N-1)-th auxiliary feature extractors of the first layer, and output the obtained feature map to the fusion output device of the first layer and the fusion output device of the second layer.
  • the first auxiliary feature extractor of the m-th layer is used to perform feature extraction on the feature map output by the main feature extractor of the m-th layer to obtain the first feature map;
  • the x-th auxiliary feature extractor of the m-th layer is used to perform feature extraction and fusion on the feature maps output by the main feature extractor and the first to (x-1)-th auxiliary feature extractors of the m-th layer, and output the obtained feature map to the (x+1)-th to n-th auxiliary feature extractors and the fusion output device of the m-th layer, and to the x-th auxiliary feature extractor of the (m+1)-th layer, where x is an integer and 1 < x < n.
  • the first auxiliary feature extractor of the (N-1)th layer is used to perform feature extraction on the feature map output by the main feature extractor of the (N-1)th layer to obtain a fourth feature map; receive the fifth feature map output by the first auxiliary feature extractor of the (N-2)th layer; fuse the fourth feature map with the fifth feature map to obtain a sixth feature map; and output the sixth feature map to the fusion output device of the (N-1)th layer.
  • the length and width of the target image are X and Y respectively; each pass through a main feature extractor scales the length and width to 1/2 of the input.
  • the length and width of the original map of the first layer are therefore X/2 and Y/2 respectively (the original map of a layer is the image already processed by that layer's main feature extractor); likewise, the length and width of the original map of the second layer are X/4 and Y/4 respectively, and the length and width of the original map of the third layer are X/8 and Y/8 respectively. Since the third layer is the last layer, the map whose length and width are X/8 and Y/8 is the feature map of the third layer.
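As a worked illustration of the halving described above, a minimal sketch (the function name and integer division are illustrative assumptions, not part of the patent):

```python
# Each main feature extractor halves the length and width of its input,
# so the layer-l map of an X x Y target image measures X/2**l x Y/2**l.
def layer_sizes(x, y, n_layers):
    """Return the (length, width) of each layer's original map."""
    return [(x // 2 ** l, y // 2 ** l) for l in range(1, n_layers + 1)]

# A 3-layer network on a 640 x 480 image:
print(layer_sizes(640, 480, 3))  # [(320, 240), (160, 120), (80, 60)]
```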
  • the auxiliary feature extractors of each layer process the feature maps extracted within that layer.
  • suppose the auxiliary feature extractors of the first layer are numbered 1 and 2.
  • auxiliary feature extractor 1 processes the original map of the first layer to obtain feature map 1; the original map of the first layer and feature map 1 are then input into auxiliary feature extractor 2, which produces feature map 2.
  • in this way the feature map of the first layer is obtained. Since the second layer has only one auxiliary feature extractor, that extractor takes the original map of the second layer and feature map 1 of the first layer to obtain feature map 3; the feature map of the second layer is then obtained from feature map 3, the original map of the third layer, and feature map 2.
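The within-layer ordering described above can be simulated in a few lines; the extractor itself is stubbed out and the names are hypothetical, so only the wiring is shown:

```python
def run_layer(original, n_aux, extract):
    """Auxiliary extractor k fuses the layer's original map with the
    outputs of extractors 1..k-1 of the same layer."""
    outputs = []
    for k in range(n_aux):
        inputs = [original] + outputs[:k]  # maps available to extractor k+1
        outputs.append(extract(inputs))
    return outputs

# Stub extractor that just reports how many maps each step fused:
print(run_layer("orig", 3, len))  # [1, 2, 3]
```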
  • each feature extractor is numbered in Figure 1, where 0_0, 1_0, 2_0, 3_0 and 4_0 identify the main feature extractors, and the remaining labels are all auxiliary feature extractors.
  • for the feature extractor 1_2, the input is the feature map of 1_0, the feature map of 1_1 (both have the same length and width as the feature map of 1_2), and the feature map of 0_2 (the feature map of 0_2 is down-sampled once so that its length and width are halved).
  • in general, the input of each feature extractor is the output feature maps of all the feature extractors to its left in the same layer, together with the down-sampled feature map of the corresponding extractor in the layer above.
  • the method of fusion is channel addition.
  • for the feature extractor 1_4, the input is (1_0, 1_1, 1_2, 1_3, the down-sampled 0_4, and the up-sampled 2_3).
  • for the uppermost layer, for example 0_5, the input is (0_0, 0_1, 0_2, 0_3, 0_4, and the up-sampled 1_4).
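The channel-addition fusion with the down-/up-sampling just described can be sketched with nested lists standing in for single-channel feature maps. The helper names and the nearest-neighbour resampling are illustrative assumptions, not the patent's stated method:

```python
def downsample(fm):
    """Halve length and width (nearest neighbour: keep every other value)."""
    return [row[::2] for row in fm[::2]]

def upsample(fm):
    """Double length and width by repeating each value."""
    return [[v for v in row for _ in (0, 1)] for row in fm for _ in (0, 1)]

def fuse(maps):
    """Channel addition: element-wise sum of equally sized feature maps."""
    h, w = len(maps[0]), len(maps[0][0])
    return [[sum(m[r][c] for m in maps) for c in range(w)] for r in range(h)]

# e.g. feature extractor 1_2 fuses same-size maps from 1_0 and 1_1 with
# the once-down-sampled map from 0_2:
map_0_2 = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
map_1_0 = [[1, 1], [1, 1]]
map_1_1 = [[2, 2], [2, 2]]
print(fuse([map_1_0, map_1_1, downsample(map_0_2)]))  # [[4, 6], [12, 14]]
```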
  • the main feature extractors use ResNet to perform the initial feature extraction.
  • the subsequent auxiliary feature extractors enrich the feature maps extracted by the main feature extractors; the fused result is then output to subsequent computation.
  • the shallow feature extractors extract features such as position, shape, and size; it is understandable that, for example, the feature extractors of the first N/2 layers can be regarded as shallow feature extractors.
  • the deep feature extractors process the features of the upper layer and of the current layer according to a preset feature matrix.
  • correspondingly, the feature extractors of the last N/2 layers can be regarded as deep feature extractors.
  • the feature map extracted in the shallow layer corresponds to the feature map extracted in the first layer
  • the feature map extracted in the deep layer is the feature map extracted from the remaining layers.
  • the feature map extracted in the shallow layer corresponds to the feature map extracted in the first two layers
  • the feature map extracted in the deep layer is the feature map extracted from the remaining layers.
  • a single target image is obtained; a neural network is constructed, and the neural network is used to perform multiple feature extraction and fusion on the target image to obtain a depth map of the target image.
  • the feature map of each layer can be extracted by setting multi-scale, multi-level feature extractors in the neural network, and multiple feature maps can then be fused to obtain a multi-scale, multi-level depth map, so that users can use the depth map for three-dimensional modeling or simulation, which makes it convenient to perform complex three-dimensional image processing based on a single image.
  • the invention greatly reduces the equipment cost by processing a single image to obtain a depth image.
  • FIG. 4 is a schematic structural diagram of an image fusion provided by an embodiment of the present invention.
  • an apparatus 200 for acquiring a depth map provided by an embodiment of the present invention, wherein the apparatus 200 includes an acquiring unit 201 and a construction unit 202;
  • the obtaining unit 201 is used to obtain a single target image
  • the construction unit 202 is configured to construct a neural network, and the neural network is used to perform multiple feature extraction and fusion on the target image to obtain a depth map of the target image.
  • the neural network includes N layers, and each layer includes a cascaded main feature extractor, an extraction and fusion module, and a fusion output device, where N is a positive integer greater than 1;
  • the main feature extractor of the first layer is used to perform feature extraction on the target image, and output the obtained feature map to the main feature extractor of the second layer and the extraction and fusion module and the fusion output device of the first layer;
  • the extraction and fusion module of the first layer performs feature extraction on the feature map output by the main feature extractor of the first layer, and outputs the obtained feature map to the fusion output device of the first layer and the extraction and fusion module and fusion output device of the second layer;
  • the main feature extractor of the i-th layer is used to perform feature extraction on the feature map output by the main feature extractor of the (i-1)th layer, and output the obtained feature map to the main feature extractor of the (i+1)th layer and the extraction and fusion module and fusion output device of the i-th layer, where i is an integer and 1 < i < N;
  • the extraction and fusion module of the i-th layer is used to perform feature extraction and fusion on the feature map output by the extraction and fusion module of the (i-1)th layer and the feature map output by the main feature extractor of the i-th layer, and output the obtained feature map to the fusion output device of the i-th layer and the extraction and fusion module and fusion output device of the (i+1)th layer;
  • the main feature extractor of the Nth layer is used to perform feature extraction on the feature map output by the main feature extractor of the (N-1)th layer, and output the obtained feature map to the extraction and fusion module and fusion output device of the Nth layer;
  • the extraction and fusion module of the Nth layer is used to perform feature extraction and fusion on the feature map output by the extraction and fusion module of the (N-1)th layer and the feature map output by the main feature extractor of the Nth layer, and output the obtained feature map to the fusion output device of the Nth layer;
  • the fusion output device of the Nth layer is used to perform feature extraction and fusion on the feature map output by the main feature extractor of the Nth layer, the feature map output by the extraction and fusion module of the Nth layer, and the feature map output by the extraction and fusion module of the (N-1)th layer, and output the obtained feature map to the fusion output device of the (N-1)th layer;
  • the fusion output device of the i-th layer is used to perform feature extraction and fusion on the feature map output by the main feature extractor of the i-th layer, the feature map output by the extraction and fusion module of the i-th layer, the feature map output by the extraction and fusion module of the (i-1)th layer, and the feature map output by the fusion output device of the (i+1)th layer, and output the obtained feature map to the fusion output device of the (i-1)th layer;
  • the fusion output device of the first layer is used to perform feature extraction and fusion on the feature map output by the main feature extractor of the first layer, the feature map output by the extraction and fusion module of the first layer, and the feature map output by the fusion output device of the second layer, to obtain a depth map of the target image.
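As a compact check of the fusion output device wiring in the bullets above (the labels are illustrative placeholders, not tensors or the patent's terms):

```python
def fusion_inputs(i, n):
    """Feature maps fused by the i-th layer's fusion output device
    in an n-layer network (illustrative labels)."""
    inputs = [f"main_{i}", f"extract_fuse_{i}"]
    if i > 1:
        inputs.append(f"extract_fuse_{i-1}")  # from the layer above
    if i < n:
        inputs.append(f"fused_{i+1}")         # from the layer below
    return inputs

print(fusion_inputs(2, 3))
# ['main_2', 'extract_fuse_2', 'extract_fuse_1', 'fused_3']
```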
  • the extraction and fusion module of the jth layer includes N+1-j auxiliary feature extractors, where j is an integer and 1 ≤ j ≤ N;
  • the first auxiliary feature extractor of the first layer is used to perform feature extraction on the feature map output by the main feature extractor of the first layer, and output the obtained feature map to the second to Nth auxiliary feature extractors and the fusion output device of the first layer, and to the first auxiliary feature extractor of the second layer;
  • the k-th auxiliary feature extractor of the first layer is used to perform feature extraction and fusion on the feature maps output by the main feature extractor and the first to (k-1)th auxiliary feature extractors of the first layer, and output the obtained feature map to the (k+1)th to Nth auxiliary feature extractors and the fusion output device of the first layer, and to the k-th auxiliary feature extractor of the second layer, where k is an integer and 1 < k < N;
  • the Nth auxiliary feature extractor of the first layer is used to perform feature extraction and fusion on the feature maps output by the main feature extractor and the first to (N-1)th auxiliary feature extractors of the first layer, and output the obtained feature map to the fusion output device of the first layer and the fusion output device of the second layer.
  • the first auxiliary feature extractor of the m-th layer is used to perform feature extraction on the feature map output by the main feature extractor of the m-th layer to obtain a first feature map; receive the second feature map output by the first auxiliary feature extractor of the (m-1)th layer; fuse the first feature map with the second feature map to obtain a third feature map; and output the third feature map to the second to nth auxiliary feature extractors and the fusion output device of the m-th layer, and to the first auxiliary feature extractor of the (m+1)th layer, where m is a positive integer and 1 < m < N-1, and n is an integer and n = N+1-m;
  • the x-th auxiliary feature extractor of the m-th layer is used to perform feature extraction and fusion on the feature maps output by the main feature extractor and the first to (x-1)th auxiliary feature extractors of the m-th layer, and output the obtained feature map to the (x+1)th to nth auxiliary feature extractors and the fusion output device of the m-th layer, and to the x-th auxiliary feature extractor of the (m+1)th layer, where x is an integer and 1 < x < n.
  • the first auxiliary feature extractor of the (N-1)th layer is used to perform feature extraction on the feature map output by the main feature extractor of the (N-1)th layer to obtain a fourth feature map; receive the fifth feature map output by the first auxiliary feature extractor of the (N-2)th layer; fuse the fourth feature map with the fifth feature map to obtain a sixth feature map; and output the sixth feature map to the fusion output device of the (N-1)th layer.
  • the above-mentioned units may be used to execute the method described in any of the above-mentioned embodiments.
  • FIG. 5 is a schematic structural diagram of an electronic device 300 provided by an embodiment of the present application.
  • the electronic device 300 includes an application processor 310, a memory 320, a communication interface 330, and one or more programs 321, where the one or more programs 321 are stored in the memory 320 and configured to be executed by the application processor 310.
  • the processor 310 performs the following operations:
  • a neural network is constructed, and the neural network is used to perform multiple feature extraction and fusion on the target image to obtain a depth map of the target image.
  • the main feature extractor of the first layer is used to perform feature extraction on the target image, and output the obtained feature map to the main feature extractor of the second layer and the extraction and fusion module and the fusion output device of the first layer;
  • the extraction and fusion module of the first layer performs feature extraction on the feature map output by the main feature extractor of the first layer, and outputs the obtained feature map to the fusion output device of the first layer and the extraction and fusion module and fusion output device of the second layer;
  • the main feature extractor of the i-th layer is used to perform feature extraction on the feature map output by the main feature extractor of the (i-1)th layer, and output the obtained feature map to the main feature extractor of the (i+1)th layer and the extraction and fusion module and fusion output device of the i-th layer, where i is an integer and 1 < i < N;
  • the extraction and fusion module of the i-th layer is used to perform feature extraction and fusion on the feature map output by the extraction and fusion module of the (i-1)th layer and the feature map output by the main feature extractor of the i-th layer, and output the obtained feature map to the fusion output device of the i-th layer and the extraction and fusion module and fusion output device of the (i+1)th layer;
  • the main feature extractor of the Nth layer is used to perform feature extraction on the feature map output by the main feature extractor of the (N-1)th layer, and output the obtained feature map to the extraction and fusion module and fusion output device of the Nth layer;
  • the extraction and fusion module of the Nth layer is used to perform feature extraction and fusion on the feature map output by the extraction and fusion module of the (N-1)th layer and the feature map output by the main feature extractor of the Nth layer, and output the obtained feature map to the fusion output device of the Nth layer;
  • the fusion output device of the Nth layer is used to perform feature extraction and fusion on the feature map output by the main feature extractor of the Nth layer, the feature map output by the extraction and fusion module of the Nth layer, and the feature map output by the extraction and fusion module of the (N-1)th layer, and output the obtained feature map to the fusion output device of the (N-1)th layer;
  • the fusion output device of the i-th layer is used to perform feature extraction and fusion on the feature map output by the main feature extractor of the i-th layer, the feature map output by the extraction and fusion module of the i-th layer, the feature map output by the extraction and fusion module of the (i-1)th layer, and the feature map output by the fusion output device of the (i+1)th layer, and output the obtained feature map to the fusion output device of the (i-1)th layer;
  • the fusion output device of the first layer is used to perform feature extraction and fusion on the feature map output by the main feature extractor of the first layer, the feature map output by the extraction and fusion module of the first layer, and the feature map output by the fusion output device of the second layer, to obtain a depth map of the target image.
  • the first auxiliary feature extractor of the first layer is used to perform feature extraction on the feature map output by the main feature extractor of the first layer, and output the obtained feature map to the second to Nth auxiliary feature extractors and the fusion output device of the first layer, and to the first auxiliary feature extractor of the second layer;
  • the k-th auxiliary feature extractor of the first layer is used to perform feature extraction and fusion on the feature maps output by the main feature extractor and the first to (k-1)th auxiliary feature extractors of the first layer, and output the obtained feature map to the (k+1)th to Nth auxiliary feature extractors and the fusion output device of the first layer, and to the k-th auxiliary feature extractor of the second layer, where k is an integer and 1 < k < N;
  • the Nth auxiliary feature extractor of the first layer is used to perform feature extraction and fusion on the feature maps output by the main feature extractor and the first to (N-1)th auxiliary feature extractors of the first layer, and output the obtained feature map to the fusion output device of the first layer and the fusion output device of the second layer.
  • a storage medium stores program code.
  • when the program code is executed, the method in the foregoing method embodiment is executed.
  • in another embodiment, a computer program product contains program code; when the program code is executed, the method in the foregoing method embodiment is executed.

Abstract

A method and apparatus for obtaining a depth-of-field image, the method comprising: obtaining a single target image (S101); and constructing a neural network which is used for performing multiple feature extraction and fusion on the target image to obtain a depth-of-field image of the target image (S102). In the method, on the basis of a single image obtained by an ordinary camera, a feature image of each layer may be extracted by means of setting a multi-scale and multi-level feature extractor in the neural network, and then a plurality of feature images may be fused so as to obtain a multi-scale and multi-level depth-of-field image, thereby facilitating users in utilizing the depth-of-field image for three-dimensional modeling or simulation, and further facilitating the users in performing complex three-dimensional image processing on the basis of the single image. At the same time, the described method greatly reduces device costs by means of processing a single image to obtain a depth-of-field image.

Description

A Method and Device for Acquiring a Depth Map

This application claims priority to a Chinese patent application filed with the Chinese Patent Office on May 7, 2019, with application number 201910377551.2 and invention title "A method and device for acquiring a depth map", the entire content of which is incorporated into this application by reference.

Technical Field

The present invention relates to the field of computers, and in particular to a method and device for acquiring a depth map.
Background

With the development of science and technology, users have ever higher requirements for photos; for example, physical techniques are used to improve resolution and color contrast to enhance user perception.

In the prior art, depth-of-field information is often obtained with special cameras, binocular cameras, laser ranging, or similar equipment, whose cost is relatively high. In addition, current algorithms for constructing depth maps suffer from a single feature extraction structure, so the extracted feature information is rather limited, which is not conducive to constructing complex three-dimensional images.
Summary of the Invention

The embodiments of the present invention provide a method and device for acquiring a depth map. Using the method provided by the present invention, depth-map information can be obtained from a single image captured by an ordinary camera, by setting multi-scale, multi-level feature extractors in a deep-learning network architecture.

A first aspect of the present invention discloses a method for acquiring a depth map, the method including:

obtaining a single target image; and

constructing a neural network, the neural network being used to perform multiple feature extraction and fusion on the target image to obtain a depth map of the target image.
Optionally, the neural network includes N layers, each layer including a cascaded main feature extractor, an extraction and fusion module, and a fusion output device, where N is a positive integer greater than 1;

the main feature extractor of the first layer is used to perform feature extraction on the target image, and output the obtained feature map to the main feature extractor of the second layer and the extraction and fusion module and fusion output device of the first layer;

the extraction and fusion module of the first layer performs feature extraction on the feature map output by the main feature extractor of the first layer, and outputs the obtained feature map to the fusion output device of the first layer and the extraction and fusion module and fusion output device of the second layer;

the main feature extractor of the i-th layer is used to perform feature extraction on the feature map output by the main feature extractor of the (i-1)th layer, and output the obtained feature map to the main feature extractor of the (i+1)th layer and the extraction and fusion module and fusion output device of the i-th layer, where i is an integer and 1 < i < N;

the extraction and fusion module of the i-th layer is used to perform feature extraction and fusion on the feature map output by the extraction and fusion module of the (i-1)th layer and the feature map output by the main feature extractor of the i-th layer, and output the obtained feature map to the fusion output device of the i-th layer and the extraction and fusion module and fusion output device of the (i+1)th layer;

the main feature extractor of the Nth layer is used to perform feature extraction on the feature map output by the main feature extractor of the (N-1)th layer, and output the obtained feature map to the extraction and fusion module and fusion output device of the Nth layer;

the extraction and fusion module of the Nth layer is used to perform feature extraction and fusion on the feature map output by the extraction and fusion module of the (N-1)th layer and the feature map output by the main feature extractor of the Nth layer, and output the obtained feature map to the fusion output device of the Nth layer;

the fusion output device of the Nth layer is used to perform feature extraction and fusion on the feature map output by the main feature extractor of the Nth layer, the feature map output by the extraction and fusion module of the Nth layer, and the feature map output by the extraction and fusion module of the (N-1)th layer, and output the obtained feature map to the fusion output device of the (N-1)th layer;

the fusion output device of the i-th layer is used to perform feature extraction and fusion on the feature map output by the main feature extractor of the i-th layer, the feature map output by the extraction and fusion module of the i-th layer, the feature map output by the extraction and fusion module of the (i-1)th layer, and the feature map output by the fusion output device of the (i+1)th layer, and output the obtained feature map to the fusion output device of the (i-1)th layer;

the fusion output device of the first layer is used to perform feature extraction and fusion on the feature map output by the main feature extractor of the first layer, the feature map output by the extraction and fusion module of the first layer, and the feature map output by the fusion output device of the second layer, to obtain a depth map of the target image.
In addition, it should be pointed out that cascading means that multiple components or functional modules are connected in series, with the output of each component or functional module serving as the input of the next; a depth map is an image that can represent the distance of every object in the photographed scene from the camera.
It should be pointed out that the extraction and fusion module of the j-th layer includes N+1-j auxiliary feature extractors, where j is an integer and 1 ≤ j ≤ N;

the first auxiliary feature extractor of the first layer is used to perform feature extraction on the feature map output by the main feature extractor of the first layer, and output the obtained feature map to the second to Nth auxiliary feature extractors and the fusion output device of the first layer, and to the first auxiliary feature extractor of the second layer;

the k-th auxiliary feature extractor of the first layer is used to perform feature extraction and fusion on the feature maps output by the main feature extractor and the first to (k-1)th auxiliary feature extractors of the first layer, and output the obtained feature map to the (k+1)th to Nth auxiliary feature extractors and the fusion output device of the first layer, and to the k-th auxiliary feature extractor of the second layer, where k is an integer and 1 < k < N;

the Nth auxiliary feature extractor of the first layer is used to perform feature extraction and fusion on the feature maps output by the main feature extractor and the first to (N-1)th auxiliary feature extractors of the first layer, and output the obtained feature map to the fusion output device of the first layer and the fusion output device of the second layer.
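The count stated above (the j-th layer's module holds N+1-j auxiliary extractors, so layer 1 has N and layer N has one) can be checked with a one-line sketch; the function name is an illustrative assumption:

```python
def aux_count(n_layers, j):
    """Number of auxiliary feature extractors in the j-th layer's
    extraction and fusion module: N + 1 - j."""
    return n_layers + 1 - j

# For N = 4: layer 1 has 4 extractors, layer 4 has exactly 1.
print([aux_count(4, j) for j in range(1, 5)])  # [4, 3, 2, 1]
```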
Optionally, the first auxiliary feature extractor of the m-th layer is used to perform feature extraction on the feature map output by the main feature extractor of the m-th layer to obtain a first feature map; receive the second feature map output by the first auxiliary feature extractor of the (m-1)th layer; fuse the first feature map with the second feature map to obtain a third feature map; and output the third feature map to the second to nth auxiliary feature extractors and the fusion output device of the m-th layer, and to the first auxiliary feature extractor of the (m+1)th layer, where m is a positive integer and 1 < m < N-1, and n is an integer and n = N+1-m;

the x-th auxiliary feature extractor of the m-th layer is used to perform feature extraction and fusion on the feature maps output by the main feature extractor and the first to (x-1)th auxiliary feature extractors of the m-th layer, and output the obtained feature map to the (x+1)th to nth auxiliary feature extractors and the fusion output device of the m-th layer, and to the x-th auxiliary feature extractor of the (m+1)th layer, where x is an integer and 1 < x < n.

Optionally, the first auxiliary feature extractor of the (N-1)th layer is used to perform feature extraction on the feature map output by the main feature extractor of the (N-1)th layer to obtain a fourth feature map; receive the fifth feature map output by the first auxiliary feature extractor of the (N-2)th layer; fuse the fourth feature map with the fifth feature map to obtain a sixth feature map; and output the sixth feature map to the fusion output device of the (N-1)th layer.
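A schematic forward pass over the N-layer structure described in this aspect, with strings standing in for tensors so only the data flow is visible. The main/fuse names are hypothetical, and each layer's extraction and fusion module is folded into its fuse step for brevity:

```python
def forward(n_layers):
    # Main extractors run top-down: each layer refines the previous map.
    main = ["main1(target)"]
    for i in range(2, n_layers + 1):
        main.append("main%d(%s)" % (i, main[-1]))

    # Fusion output devices run bottom-up: layer N fuses its own map,
    # and every layer above also fuses the result rising from below.
    fused = "fuse%d(%s)" % (n_layers, main[-1])
    for i in range(n_layers - 1, 0, -1):
        fused = "fuse%d(%s, %s)" % (i, main[i - 1], fused)
    return fused  # the layer-1 fusion output is the depth map

print(forward(2))  # fuse1(main1(target), fuse2(main2(main1(target))))
```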
A second aspect of the present invention discloses a device for acquiring a depth map, the device including an acquiring unit and a construction unit;

the acquiring unit is used to obtain a single target image;

the construction unit is used to construct a neural network, the neural network being used to perform multiple feature extraction and fusion on the target image to obtain a depth map of the target image.
The neural network includes N layers, each layer including a cascaded main feature extractor, an extraction and fusion module, and a fusion output device, where N is a positive integer greater than 1;

the main feature extractor of the first layer is used to perform feature extraction on the target image, and output the obtained feature map to the main feature extractor of the second layer and the extraction and fusion module and fusion output device of the first layer;

the extraction and fusion module of the first layer performs feature extraction on the feature map output by the main feature extractor of the first layer, and outputs the obtained feature map to the fusion output device of the first layer and the extraction and fusion module and fusion output device of the second layer;

the main feature extractor of the i-th layer is used to perform feature extraction on the feature map output by the main feature extractor of the (i-1)th layer, and output the obtained feature map to the main feature extractor of the (i+1)th layer and the extraction and fusion module and fusion output device of the i-th layer, where i is an integer and 1 < i < N;

the extraction and fusion module of the i-th layer is used to perform feature extraction and fusion on the feature map output by the extraction and fusion module of the (i-1)th layer and the feature map output by the main feature extractor of the i-th layer, and output the obtained feature map to the fusion output device of the i-th layer and the extraction and fusion module and fusion output device of the (i+1)th layer;

the main feature extractor of the Nth layer is used to perform feature extraction on the feature map output by the main feature extractor of the (N-1)th layer, and output the obtained feature map to the extraction and fusion module and fusion output device of the Nth layer;

the extraction and fusion module of the Nth layer is used to perform feature extraction and fusion on the feature map output by the extraction and fusion module of the (N-1)th layer and the feature map output by the main feature extractor of the Nth layer, and output the obtained feature map to the fusion output device of the Nth layer;

the fusion output device of the Nth layer is used to perform feature extraction and fusion on the feature map output by the main feature extractor of the Nth layer, the feature map output by the extraction and fusion module of the Nth layer, and the feature map output by the extraction and fusion module of the (N-1)th layer, and output the obtained feature map to the fusion output device of the (N-1)th layer;

the fusion output device of the i-th layer is used to perform feature extraction and fusion on the feature map output by the main feature extractor of the i-th layer, the feature map output by the extraction and fusion module of the i-th layer, the feature map output by the extraction and fusion module of the (i-1)th layer, and the feature map output by the fusion output device of the (i+1)th layer, and output the obtained feature map to the fusion output device of the (i-1)th layer;

the fusion output device of the first layer is used to perform feature extraction and fusion on the feature map output by the main feature extractor of the first layer, the feature map output by the extraction and fusion module of the first layer, and the feature map output by the fusion output device of the second layer, to obtain a depth map of the target image.
The extraction-and-fusion module of the j-th layer comprises N+1-j auxiliary feature extractors, where j is an integer and 1 ≤ j ≤ N;

The first auxiliary feature extractor of the first layer is configured to perform feature extraction on the feature map output by the main feature extractor of the first layer and to output the resulting feature map to the second to N-th auxiliary feature extractors and the fusion output unit of the first layer and to the first auxiliary feature extractor of the second layer;

The k-th auxiliary feature extractor of the first layer is configured to perform feature extraction and fusion on the feature maps output by the main feature extractor and the first to (k-1)-th auxiliary feature extractors of the first layer, and to output the resulting feature map to the (k+1)-th to N-th auxiliary feature extractors and the fusion output unit of the first layer and to the k-th auxiliary feature extractor of the second layer, where k is an integer and 1 < k < N;

The N-th auxiliary feature extractor of the first layer is configured to perform feature extraction and fusion on the feature maps output by the main feature extractor and the first to (N-1)-th auxiliary feature extractors of the first layer, and to output the resulting feature map to the fusion output unit of the first layer and to the fusion output unit of the second layer.
Optionally, the first auxiliary feature extractor of the m-th layer is configured to perform feature extraction on the feature map output by the main feature extractor of the m-th layer to obtain a first feature map; to receive a second feature map output by the first auxiliary feature extractor of the (m-1)-th layer; to fuse the first feature map with the second feature map to obtain a third feature map; and to output the third feature map to the second to n-th auxiliary feature extractors and the fusion output unit of the m-th layer and to the first auxiliary feature extractor of the (m+1)-th layer, where m is a positive integer with 1 < m < N-1, and n is an integer with n = N+1-m;

The x-th auxiliary feature extractor of the m-th layer is configured to perform feature extraction and fusion on the feature maps output by the main feature extractor and the first to (x-1)-th auxiliary feature extractors of the m-th layer, and to output the resulting feature map to the (x+1)-th to n-th auxiliary feature extractors and the fusion output unit of the m-th layer and to the x-th auxiliary feature extractor of the (m+1)-th layer, where x is an integer and 1 < x < n.

Optionally, the first auxiliary feature extractor of the (N-1)-th layer is configured to perform feature extraction on the feature map output by the main feature extractor of the (N-1)-th layer to obtain a fourth feature map; to receive a fifth feature map output by the first auxiliary feature extractor of the (N-2)-th layer; to fuse the fourth feature map with the fifth feature map to obtain a sixth feature map; and to output the sixth feature map to the fusion output unit of the (N-1)-th layer.
A third aspect of the present invention discloses a storage medium storing program code which, when run, causes the method of the first aspect to be performed;

A fourth aspect of the present invention discloses an image fusion apparatus comprising a processor and a transceiver, where the transceiving functions described in the second aspect may be implemented by the transceiver, and the logic functions described in the second aspect (that is, the specific functions of the logic units) may be implemented by the processor;

A fifth aspect of the present invention discloses a computer program product containing program code which, when run, causes the method of the first aspect to be performed.
It can be seen that the embodiments of the present invention disclose a method for obtaining a depth map: a single target image is acquired, and a neural network is constructed that performs multiple rounds of feature extraction and fusion on the target image to obtain its depth map. With this method, multi-scale, multi-level feature extractors in the neural network extract a feature map at each layer from a single image captured by an ordinary camera, and the feature maps are then fused to obtain a multi-scale, multi-level depth image. This makes it convenient for users to perform three-dimensional modeling or simulation with the depth image, and hence to carry out complex three-dimensional image processing based on a single image. At the same time, by obtaining the depth image from a single processed image, the present invention greatly reduces equipment cost.
Description of the Drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an image fusion network architecture according to an embodiment of the present invention;

FIG. 1a is a schematic diagram of a deep residual network according to an embodiment of the present invention;

FIG. 2 is a depth prediction image according to an embodiment of the present invention;

FIG. 3 is a schematic flowchart of a method for obtaining a depth map according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of an apparatus for obtaining a depth map according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of the physical structure of an apparatus for obtaining a depth map according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention provide a method and apparatus for obtaining a depth map. The method includes: acquiring a single target image; and constructing a neural network that performs multiple rounds of feature extraction and fusion on the target image to obtain its depth map. With this method, multi-scale, multi-level feature extractors in the neural network extract a feature map at each layer from a single image captured by an ordinary camera, and the feature maps are then fused to obtain a multi-scale, multi-level depth image. This makes it convenient for users to perform three-dimensional modeling or simulation with the depth image, and hence to carry out complex three-dimensional image processing based on a single image. At the same time, by obtaining the depth image from a single processed image, the present invention greatly reduces equipment cost.
First, it should be pointed out that the single target image acquired in the present invention may be an RGB image, a grayscale image, a binarized image, or another image type. By setting up multi-scale, multi-level feature extractors in a deep learning network architecture, this embodiment solves the problem mentioned in the background: it is difficult to extract rich image features from a single image captured by an ordinary camera. In principle, the technique can be applied to personal computers (PCs) as well as to small devices, including but not limited to Android/iOS mobile devices. An RGB image is an image obtained from the variations of the red (R), green (G), and blue (B) color channels and their mutual superposition. The RGB standard covers almost all colors perceivable by human vision and is one of the most widely used color systems at present.
The network structure shown in FIG. 1 (the figure is only schematic; the present invention does not limit the number of layers or the number of auxiliary feature extractors) mainly comprises three stages: first, preprocessing the target image so that it meets the requirements of the main feature extractor; second, extracting feature maps with the main and auxiliary feature extractors; and third, fusing the extracted feature maps.
For example, the main feature extractor may be a ResNet50 structure, known as a deep residual network, which performs feature extraction on the image. Specifically, feature extraction obtains information such as object textures, object contours, and edges between objects; the extraction is the result of learning from sample inputs through the feature extractors of each layer. Alternatively, the main feature extractor may be an AlexNet structure (a machine learning model named after its author, Alex) or a VGG structure (a model proposed by the Visual Geometry Group at the University of Oxford); the present invention is not limited in this respect.
FIG. 1a is a schematic diagram of a deep residual network. Network depth affects a model's classification and recognition performance: when a conventional network is stacked to a certain depth, the deeper the layers, the more pronounced the vanishing-gradient problem becomes, and the network classifies poorly. The deep residual network structure can deepen the network while preventing gradients from vanishing, achieving better classification results. As shown in FIG. 1a, the deep residual network contains skip structures. For example, suppose the input of a network segment is x and the desired output is H(x), i.e., H(x) is the desired complex underlying mapping; learning such a mapping directly makes training relatively difficult. Moreover, if the accuracy has already saturated (or the error of the deeper layers is found to grow), the learning target is instead switched to an identity mapping, that is, making the output approximate the input x, so that accuracy does not degrade in the later layers. In the structure of FIG. 1a, a "shortcut connection" passes the input x directly to the output as an initial result, so the output becomes H(x) = F(x) + x; when F(x) = 0, H(x) = x, which is the identity mapping mentioned above. ResNet thus changes the learning target: instead of learning a complete output, it learns the difference between the target value H(x) and x, i.e., the residual F(x) := H(x) - x. The subsequent training goal is to drive the residual toward 0, so that accuracy does not drop as the network deepens. This skip-style residual structure breaks the convention of traditional neural networks that the output of layer n-1 can only be fed to layer n as input, allowing the output of a layer to skip several layers and serve directly as the input of a later layer. Its significance is to offer a new direction for the problem that stacking more layers causes the error rate of the whole learning model to rise rather than fall.
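The relationship H(x) = F(x) + x described above can be sketched in a few lines of illustrative Python; this is only a numeric stand-in, where the function f represents the learned convolutional branch of a real residual block:

```python
def residual_block(x, f):
    """Skip connection: the block's output is H(x) = F(x) + x."""
    return f(x) + x

# When the residual branch learns F(x) = 0, the block degenerates to the
# identity mapping H(x) = x, which is what keeps accuracy from degrading
# as layers are stacked.
identity_out = residual_block(5.0, lambda v: 0.0)   # 5.0

# A non-zero residual branch only has to model the difference H(x) - x.
shifted_out = residual_block(4.0, lambda v: 0.5 * v)  # 4.0 + 2.0 = 6.0
```

The design choice is visible even in this toy form: the branch f only needs to learn the correction to its input, so an "unnecessary" layer can safely learn zero instead of having to reproduce a full identity transform.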
For example, image preprocessing means preprocessing the input image to fit the input of the deep residual network ResNet50. Specifically, preprocessing essentially scales and crops the image.
For example, the auxiliary feature extractor is used to fuse the feature map extracted by the previous feature extractor in the same layer, the upsampled feature map from the lower layer, and the downsampled feature map from the upper layer, to obtain a new feature map.
For example, the auxiliary feature extractor contains several convolution structures; its role is to fuse the feature map extracted by the previous feature extractor in the same layer with the downsampled feature map from the upper layer, forming a new feature map. If, as in FIG. 1, there are four auxiliary feature extractors, extractors 1-4 share the same structure, with convolution kernels of the same size and number. The auxiliary feature extractor consists of two convolutional layers and one activation layer, with 3×3 convolution kernels.
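A minimal pure-Python sketch of that structure follows: two 3×3 "same" convolutions followed by a ReLU activation, on a single channel. The single channel, zero padding, and identity kernel are simplifications for illustration and are not specified by the patent:

```python
def conv3x3(img, kernel):
    """Zero-padded 'same' 3x3 convolution on a 2D list of floats."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        s += img[ii][jj] * kernel[di + 1][dj + 1]
            out[i][j] = s
    return out

def relu(img):
    """Activation layer: clamp negative responses to zero."""
    return [[max(0.0, v) for v in row] for row in img]

def aux_extractor(fused):
    """Two 3x3 convolutional layers followed by one activation layer,
    mirroring the auxiliary-extractor structure described above."""
    # Identity kernel, chosen only so the sketch's output is predictable;
    # a trained extractor would use learned kernel weights.
    k = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
    return relu(conv3x3(conv3x3(fused, k), k))
```

With the identity kernel and non-negative inputs, the extractor simply passes the map through, which makes the wiring easy to verify before substituting real weights.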
As shown in FIG. 1, the network structure for image feature collection in this embodiment is divided into four layers, each using one main feature extractor and several auxiliary feature extractors. The number of auxiliary feature extractors decreases layer by layer from the first-layer network to the fourth-layer network: four in the first layer, three in the second, two in the third, and one in the fourth. It should be noted that the auxiliary feature extractors of the second layer reuse any three of the four auxiliary feature extractors of the first layer; likewise, the third layer reuses any two of the first layer's four, and so on. Feature maps 4_0, 3_2, 2_3, 1_4, and 0_5 respectively correspond to the results of fusing, layer by layer, the feature maps extracted by the main feature extractor with those extracted by the auxiliary feature extractors, and feature map 0_5 is the final depth prediction map (the depth prediction image shown in FIG. 2). Fusion is performed by element-wise (layer) addition; the sum then passes through two 3×3 convolutions and one activation layer, and the result is sent to the layer above and fused with that layer's image features. With respect to FIG. 2, it can be understood simply as: the closer an object is to the camera, the darker its color; the farther away, the lighter.
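The two fusion primitives just described, layer addition and handing the result up to a larger-resolution layer, can be sketched in pure Python; nearest-neighbour 2× upsampling is an illustrative assumption here, since the patent does not name the upsampling method:

```python
def add_maps(a, b):
    """Element-wise (layer) addition of two equally sized feature maps."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling, so the fused result matches the
    spatial size of the layer above before being fused with its features."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in (0, 1)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))                   # duplicate each row
    return out

fused = add_maps([[1.0, 2.0]], [[3.0, 4.0]])  # layer addition
sent_up = upsample2x(fused)                    # passed to the layer above
```

In the real network the two 3×3 convolutions and the activation from the auxiliary-extractor structure would be applied between the addition and the upsampling.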
Please refer to FIG. 3, which is a schematic flowchart of a method for obtaining a depth map according to an embodiment of the present invention. As shown in FIG. 3, the method includes the following steps:
S101: acquire a single target image.
It is understandable that the execution subject of this embodiment may be a smartphone, a wearable device, an electronic device with a camera function, a personal computer, or a similar device. In this embodiment, a smartphone is taken as an example.
It should be pointed out that the target image may be downloaded from the network, received from another electronic device, or captured through a lens.
The target image may be a single RGB image.
S102: construct a neural network, where the neural network is used to perform multiple rounds of feature extraction and fusion on the target image to obtain a depth map of the target image.
Here, a depth map is an image that represents the distance from the camera of every object in the captured scene.
It should be pointed out that the neural network includes N layers, each comprising a cascaded main feature extractor, extraction-and-fusion module, and fusion output unit, where N is a positive integer greater than 1. "Cascaded" means that multiple components or functional modules are connected in series, with the output of one component or module serving as the input of the next.
It should also be noted that the more layers the neural network contains, the richer the extracted features; however, as the number of extractions grows, the feature maps become smaller and smaller, until further extraction yields no effective features. Moreover, more feature extractors mean more network parameters, which slows the network, raises hardware requirements, and increases cost. Conversely, with fewer feature extractors, model speed improves and cost falls, but model accuracy drops because fewer feature maps are extracted. The number of layers may therefore be determined according to the capability of the execution subject, set by system default, or chosen manually; no limitation is imposed here. A specific layering is chosen below for illustration.
In addition, before constructing the neural network, the method further includes: determining whether the size of the target image exceeds a threshold; and, if the size of the target image exceeds the threshold, preprocessing the target image to obtain a processed target image.
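The patent specifies neither the threshold nor the resizing rule, so the following is only a sketch under assumed values: a 224×224 limit (a common ResNet-style input size) and proportional downscaling, with any exact-size cropping left to the real pipeline:

```python
THRESHOLD = (224, 224)  # assumed input limit; not taken from the patent

def exceeds_threshold(size, threshold=THRESHOLD):
    """True when either dimension of (width, height) is over the limit."""
    return size[0] > threshold[0] or size[1] > threshold[1]

def rescale(size, threshold=THRESHOLD):
    """Proportionally downscale so the image fits within the threshold.
    A real preprocessing step would also crop to the exact target size."""
    if not exceeds_threshold(size, threshold):
        return size
    scale = min(threshold[0] / size[0], threshold[1] / size[1])
    return (round(size[0] * scale), round(size[1] * scale))
```

For instance, a 448×224 image would be scaled by 0.5 to 224×112, while an image already within the threshold is returned unchanged.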
It should also be pointed out that the main feature extractor is used to extract at least one of the following kinds of information from the target image: object textures, object contours, and edges between objects.
Specifically, the roles of the main feature extractor, the extraction-and-fusion module, and the fusion output unit of each layer are as follows:
The main feature extractor of the first layer is configured to perform feature extraction on the target image and to output the resulting feature map to the main feature extractor of the second layer and to the extraction-and-fusion module and the fusion output unit of the first layer. The extraction-and-fusion module of the first layer is configured to perform feature extraction on the feature map output by the main feature extractor of the first layer and to output the resulting feature map to the fusion output unit of the first layer and to the extraction-and-fusion module and the fusion output unit of the second layer.

The main feature extractor of the i-th layer is configured to perform feature extraction on the feature map output by the main feature extractor of the (i-1)-th layer and to output the resulting feature map to the main feature extractor of the (i+1)-th layer and to the extraction-and-fusion module and the fusion output unit of the i-th layer, where i is an integer and 1 < i < N. The extraction-and-fusion module of the i-th layer is configured to perform feature extraction and fusion on the feature map output by the extraction-and-fusion module of the (i-1)-th layer and the feature map output by the main feature extractor of the i-th layer, and to output the resulting feature map to the fusion output unit of the i-th layer and to the extraction-and-fusion module and the fusion output unit of the (i+1)-th layer. The main feature extractor of the N-th layer is configured to perform feature extraction on the feature map output by the main feature extractor of the (N-1)-th layer and to output the resulting feature map to the extraction-and-fusion module and the fusion output unit of the N-th layer.

The extraction-and-fusion module of the N-th layer is configured to perform feature extraction and fusion on the feature map output by the extraction-and-fusion module of the (N-1)-th layer and the feature map output by the main feature extractor of the N-th layer, and to output the resulting feature map to the fusion output unit of the N-th layer.

The fusion output unit of the N-th layer is configured to perform feature extraction and fusion on the feature map output by the main feature extractor of the N-th layer, the feature map output by the extraction-and-fusion module of the N-th layer, and the feature map output by the extraction-and-fusion module of the (N-1)-th layer, and to output the resulting feature map to the fusion output unit of the (N-1)-th layer.

The fusion output unit of the i-th layer is configured to perform feature extraction and fusion on the feature map output by the main feature extractor of the i-th layer, the feature map output by the extraction-and-fusion module of the i-th layer, the feature map output by the extraction-and-fusion module of the (i-1)-th layer, and the feature map output by the fusion output unit of the (i+1)-th layer, and to output the resulting feature map to the fusion output unit of the (i-1)-th layer.

The fusion output unit of the first layer is configured to perform feature extraction and fusion on the feature map output by the main feature extractor of the first layer, the feature map output by the extraction-and-fusion module of the first layer, and the feature map output by the fusion output unit of the second layer, to obtain the depth map of the target image.
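The dataflow above can be sanity-checked with a pure-Python sketch that tracks only feature-map sizes; N = 4 and a 256×256 input are assumed for illustration, with each main feature extractor halving the spatial size and each fusion output unit handing a 2× upsampled result to the layer above, as in the embodiment:

```python
def halve(size):
    """Each main feature extractor halves the height and width."""
    return (size[0] // 2, size[1] // 2)

def double(size):
    """Upsampling restores the resolution of the layer above."""
    return (size[0] * 2, size[1] * 2)

N = 4
target = (256, 256)  # assumed input size for the sketch

# Top-down pass: each layer's main feature extractor halves the previous map.
main_sizes, s = [], target
for _ in range(N):
    s = halve(s)
    main_sizes.append(s)

# Bottom-up pass: each fusion output unit works at its own layer's scale and
# hands an upsampled result to the fusion output unit of the layer above.
s = main_sizes[N - 1]
for i in range(N - 2, -1, -1):
    s = double(s)
    assert s == main_sizes[i]  # scales line up layer by layer

depth_map_size = s  # scale at which the first layer's unit emits the depth map
```

The sketch shows why the architecture is called multi-scale: the four layers operate at 128², 64², 32², and 16² for a 256² input, and the bottom-up fusion chain walks back up through exactly those scales.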
It should further be pointed out that the extraction-and-fusion module of the j-th layer includes N+1-j auxiliary feature extractors, where j is an integer and 1 ≤ j ≤ N.
Specifically, the roles of the auxiliary feature extractors of each layer are as follows:
The first auxiliary feature extractor of the first layer is configured to perform feature extraction on the feature map output by the main feature extractor of the first layer and to output the resulting feature map to the second to N-th auxiliary feature extractors and the fusion output unit of the first layer and to the first auxiliary feature extractor of the second layer.

The k-th auxiliary feature extractor of the first layer is configured to perform feature extraction and fusion on the feature maps output by the main feature extractor and the first to (k-1)-th auxiliary feature extractors of the first layer, and to output the resulting feature map to the (k+1)-th to N-th auxiliary feature extractors and the fusion output unit of the first layer and to the k-th auxiliary feature extractor of the second layer, where k is an integer and 1 < k < N.

The N-th auxiliary feature extractor of the first layer is configured to perform feature extraction and fusion on the feature maps output by the main feature extractor and the first to (N-1)-th auxiliary feature extractors of the first layer, and to output the resulting feature map to the fusion output unit of the first layer and to the fusion output unit of the second layer.
It should further be pointed out that the first auxiliary feature extractor of the m-th layer is configured to perform feature extraction on the feature map output by the main feature extractor of the m-th layer to obtain a first feature map; to receive a second feature map output by the first auxiliary feature extractor of the (m-1)-th layer; to fuse the first feature map with the second feature map to obtain a third feature map; and to output the third feature map to the second to n-th auxiliary feature extractors and the fusion output unit of the m-th layer and to the first auxiliary feature extractor of the (m+1)-th layer, where m is a positive integer with 1 < m < N-1, and n is an integer with n = N+1-m.

The x-th auxiliary feature extractor of the m-th layer is configured to perform feature extraction and fusion on the feature maps output by the main feature extractor and the first to (x-1)-th auxiliary feature extractors of the m-th layer, and to output the resulting feature map to the (x+1)-th to n-th auxiliary feature extractors and the fusion output unit of the m-th layer and to the x-th auxiliary feature extractor of the (m+1)-th layer, where x is an integer and 1 < x < n.
It should further be pointed out that the first auxiliary feature extractor of the (N-1)-th layer is configured to perform feature extraction on the feature map output by the main feature extractor of the (N-1)-th layer to obtain a fourth feature map; to receive a fifth feature map output by the first auxiliary feature extractor of the (N-2)-th layer; to fuse the fourth feature map with the fifth feature map to obtain a sixth feature map; and to output the sixth feature map to the fusion output unit of the (N-1)-th layer.
In addition, it should be pointed out that "multi-scale" means that the fused images obtained at each layer have different sizes. For example, suppose there are 3 layers in total and M = N-1, where N is the number of layers and M is the number of auxiliary feature extractors; according to this formula, the number of auxiliary feature extractors is 2. Let the length and width of the target image be X and Y, respectively. Each pass through a main feature extractor scales the length and width to 1/2 of the input image. That is, the length and width of the first layer's original image are X/2 and Y/2, respectively (this "original image" being the image already processed by that layer's main feature extractor); likewise, the length and width of the second layer's original image are X/4 and Y/4, and those of the third layer's original image are X/8 and Y/8. Since the third layer is the last layer, the original image of size X/8 by Y/8 is the feature image of the third layer. The second layer has one auxiliary feature extractor, and the first layer has two; it can be understood that the number of auxiliary feature extractors decreases as the layer number increases. The auxiliary feature extractors of the first layer process the feature maps already extracted in the same layer. For example, suppose the first layer's auxiliary feature extractors are numbered 1 and 2. Auxiliary feature extractor 1 processes the original image to obtain feature map 1; the first layer's original image and feature map 1 are then input into auxiliary feature extractor 2 to obtain feature map 2. The first layer's feature map can then be obtained from feature map 2 and the second layer's feature map.
Since the second layer has only one auxiliary feature extractor, that extractor takes as input the second layer's original image and the first layer's feature map 1 to obtain feature map 3, and the second layer's feature map is then obtained from feature map 3, the third layer's original image, and feature map 2. It should be pointed out that the way image features are extracted differs from layer to layer. This embodiment is illustrated with one main feature extractor and four auxiliary feature extractors, divided into 5 layers and involving four rounds of feature extraction. As shown in Fig. 1, each feature extractor in Fig. 1 is numbered: 0_0, 1_0, 2_0, 3_0 and 4_0 identify the main feature extractors, and the remaining labels are all auxiliary feature extractors. It should be pointed out that, from 0_0 through 1_0, 2_0, 3_0 and 4_0, the length and width of the feature images are successively halved. For example, if the input corresponding to 0_0 has size 1×1, its output is 1/2 × 1/2; feeding this 1/2 × 1/2 map into 1_0, 1_0 outputs 1/4 × 1/4; scaling step by step in the same way, 4_0 outputs 1/32 × 1/32, because the image passes through 5 main feature extractors, each reducing it by a factor of 1/2, and multiplying the five factors of 1/2 together gives 1/32.
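The size bookkeeping above can be sketched in a few lines of Python. This is only an illustration of the arithmetic (the function name and structure are ours, not part of the disclosed network): each main feature extractor halves the height and width, so five extractors yield a cumulative scale of (1/2)^5 = 1/32.

```python
def main_extractor_scales(num_extractors: int) -> list[float]:
    """Cumulative scale factor after each main feature extractor,
    assuming every pass halves the height and width."""
    scales = []
    factor = 1.0
    for _ in range(num_extractors):
        factor *= 0.5  # one main extractor pass: halve length and width
        scales.append(factor)
    return scales

scales = main_extractor_scales(5)
print(scales)  # [0.5, 0.25, 0.125, 0.0625, 0.03125], i.e. 1/2 ... 1/32

# For a target image of X x Y, the feature map after extractor i has size
# (X * scales[i]) x (Y * scales[i]); e.g. X = 256 gives 128, 64, 32, 16, 8.
```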
In addition, regarding image fusion: taking 1_2 as an example, its inputs are the feature map of 1_0, the feature map of 1_1 (both of which have the same length and width as the feature map of 1_2), and the feature map of 0_2 (the feature map of 0_2 is down-sampled once on its way to 1_2, halving its length and width). Except for the topmost layer 0_x (x = 0, 1, 2, 3, 4, 5), every intermediate feature extractor takes as input the output feature maps of all the feature extractors to its left in the same layer, together with the down-sampled feature map from the corresponding position in the layer above. In addition, it should be pointed out that the fusion is performed by channel-wise addition.
As a further example, the inputs of feature extractor 1_4 are (1_0, 1_1, 1_2, 1_3, the down-sampled 0_4, and the up-sampled 2_3), whereas a topmost extractor such as 0_5 takes (0_0, 0_1, 0_2, 0_3, 0_4, and the up-sampled 1_4) as input. In other words, the main feature extractors use ResNet to perform the first round of feature extraction, and the subsequent auxiliary feature extractors enrich the feature maps produced by the main extraction; the fused result is then output to the subsequent computations.
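The input wiring of the extractor grid can be made concrete with a small hypothetical helper. The function below follows the fuller 1_4 and 0_5 examples just given (the earlier 1_2 example does not mention an up-sampled term, so this sketch generalizes from the 1_4 case); `down(...)`/`up(...)` stand for the down- and up-sampling steps, and the `row_col` labels mirror the numbering in Fig. 1.

```python
def fusion_inputs(row: int, col: int, num_rows: int) -> list[str]:
    """List the feature maps fused by extractor row_col: all left
    neighbours in the same row, the down-sampled output of the row above
    at the same column, and the up-sampled output of the row below at the
    previous column (when those rows exist)."""
    inputs = [f"{row}_{c}" for c in range(col)]    # same-row outputs to the left
    if row > 0:
        inputs.append(f"down({row - 1}_{col})")    # from the layer above
    if row < num_rows - 1:
        inputs.append(f"up({row + 1}_{col - 1})")  # from the layer below
    return inputs

print(fusion_inputs(1, 4, 3))
# ['1_0', '1_1', '1_2', '1_3', 'down(0_4)', 'up(2_3)']
print(fusion_inputs(0, 5, 2))
# ['0_0', '0_1', '0_2', '0_3', '0_4', 'up(1_4)']
```

The two printed cases reproduce the 1_4 and 0_5 input lists stated in the text.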
In addition, it should be pointed out that down-sampling uses max pooling and up-sampling uses bilinear interpolation. Both methods are commonly used and are not described in detail here.
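For concreteness, the two resampling operations can be sketched on plain nested lists. This is only an illustration under stated assumptions (2×2 pooling window, 2× bilinear up-sampling with corner-aligned coordinate mapping); a real implementation would use a framework's pooling and interpolation primitives.

```python
def max_pool_2x2(img):
    """Halve height and width by taking the max of each 2x2 block."""
    h, w = len(img), len(img[0])
    return [[max(img[i][j], img[i][j + 1], img[i + 1][j], img[i + 1][j + 1])
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]

def upsample_bilinear_2x(img):
    """Double height and width by bilinear interpolation (corner-aligned)."""
    h, w = len(img), len(img[0])
    out_h, out_w = h * 2, w * 2
    out = []
    for i in range(out_h):
        # Map the output coordinate back into the input grid.
        y = i * (h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0, y1, fy = int(y), min(int(y) + 1, h - 1), y - int(y)
        row = []
        for j in range(out_w):
            x = j * (w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0, x1, fx = int(x), min(int(x) + 1, w - 1), x - int(x)
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out

feature = [[1, 2], [3, 4]]
print(max_pool_2x2(feature))  # [[4]]
up = upsample_bilinear_2x(feature)
print(len(up), len(up[0]))    # 4 4
```

Note that max pooling discards three of every four values, while bilinear up-sampling preserves the corner values and interpolates between them.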
In addition, it should further be pointed out that the shallow feature extractors extract features such as position, shape, and size; it can be understood that, for example, the feature extractors of the first N/2 layers may all be regarded as shallow feature extractors. The deep feature extractors, by contrast, obtain their features by processing the feature samples of the layer above and of the current layer based on a preset feature matrix; it can be understood that, for example, the feature extractors of the last N/2 layers may all be regarded as deep feature extractors. For instance, the shallowly extracted feature maps may correspond to the feature maps extracted by the first layer, with the deeply extracted feature maps being those extracted by all remaining layers; alternatively, the shallowly extracted feature maps may correspond to those extracted by the first two layers, with the deeply extracted feature maps being those extracted by the remaining layers.
It can be seen that, through the technical solutions disclosed in the embodiments of the present invention, a single target image is obtained and a neural network is constructed, the neural network being configured to perform feature extraction and fusion on the target image multiple times to obtain a depth map of the target image. Using the method provided by the present invention, based on a single image captured by an ordinary camera, multi-scale, multi-level feature extractors arranged in the neural network extract the feature image of each layer, and the multiple feature images are then fused to obtain a multi-scale, multi-level depth image. This makes it convenient for users to use the depth image for three-dimensional modeling or simulation, and thus facilitates complex three-dimensional image processing based on a single image. At the same time, by obtaining the depth image from a single image, the present invention greatly reduces equipment cost.
Please refer to Fig. 4, which is a schematic structural diagram of image fusion provided by an embodiment of the present invention. As shown in Fig. 4, an embodiment of the present invention provides a depth map acquisition apparatus 200, where the apparatus 200 includes an acquisition unit 201 and a construction unit 202.
The acquisition unit 201 is configured to acquire a single target image.
The construction unit 202 is configured to construct a neural network, the neural network being configured to perform feature extraction and fusion on the target image multiple times to obtain a depth map of the target image.
The neural network includes N layers, each layer including a cascaded main feature extractor, an extraction and fusion module, and a fusion outputter, where N is a positive integer greater than 1.
The main feature extractor of the first layer is configured to perform feature extraction on the target image and to output the resulting feature map to the main feature extractor of the second layer and to the extraction and fusion module and fusion outputter of the first layer;
the extraction and fusion module of the first layer performs feature extraction on the feature map output by the main feature extractor of the first layer and outputs the resulting feature map to the fusion outputter of the first layer and to the extraction and fusion module and fusion outputter of the second layer;
the main feature extractor of the i-th layer is configured to perform feature extraction on the feature map output by the main feature extractor of the (i-1)-th layer and to output the resulting feature map to the main feature extractor of the (i+1)-th layer and to the extraction and fusion module and fusion outputter of the i-th layer, where i is an integer with 1 < i < N;
the extraction and fusion module of the i-th layer is configured to perform feature extraction and fusion on the feature map output by the extraction and fusion module of the (i-1)-th layer and the feature map output by the main feature extractor of the i-th layer, and to output the resulting feature map to the fusion outputter of the i-th layer and to the extraction and fusion module and fusion outputter of the (i+1)-th layer;
the main feature extractor of the N-th layer is configured to perform feature extraction on the feature map output by the main feature extractor of the (N-1)-th layer and to output the resulting feature map to the extraction and fusion module and fusion outputter of the N-th layer;
the extraction and fusion module of the N-th layer is configured to perform feature extraction and fusion on the feature map output by the extraction and fusion module of the (N-1)-th layer and the feature map output by the main feature extractor of the N-th layer, and to output the resulting feature map to the fusion outputter of the N-th layer;
the fusion outputter of the N-th layer is configured to perform feature extraction and fusion on the feature map output by the main feature extractor of the N-th layer, the feature map output by the extraction and fusion module of the N-th layer, and the feature map output by the extraction and fusion module of the (N-1)-th layer, and to output the resulting feature map to the fusion outputter of the (N-1)-th layer;
the fusion outputter of the i-th layer is configured to perform feature extraction and fusion on the feature map output by the main feature extractor of the i-th layer, the feature map output by the extraction and fusion module of the i-th layer, the feature map output by the extraction and fusion module of the (i-1)-th layer, and the feature map output by the fusion outputter of the (i+1)-th layer, and to output the resulting feature map to the fusion outputter of the (i-1)-th layer;
the fusion outputter of the first layer is configured to perform feature extraction and fusion on the feature map output by the main feature extractor of the first layer, the feature map output by the extraction and fusion module of the first layer, and the feature map output by the fusion outputter of the second layer, so as to obtain the depth map of the target image.
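The cascade described in the preceding paragraphs can be summarized in a highly simplified sketch. Everything here is an illustrative stand-in: `extract` replaces the real (ResNet-based) feature extraction with the identity, and `fuse` replaces channel-wise fusion with a plain sum, purely to show the wiring of main extractors, extraction and fusion modules, and fusion outputters for N = 3 layers.

```python
def extract(x):
    """Stand-in for a real feature extractor (identity here)."""
    return x

def fuse(*maps):
    """Stand-in for channel-wise fusion (plain sum here)."""
    return sum(maps)

def forward(target, n_layers=3):
    """Wire up the N-layer cascade: a top-down pass through the main
    extractors and extraction-and-fusion modules, then a bottom-up pass
    through the fusion outputters, ending at layer 1 (the depth map)."""
    main = [None] * n_layers
    module = [None] * n_layers
    main[0] = extract(target)                 # layer 1 main extractor
    module[0] = extract(main[0])              # layer 1 module
    for i in range(1, n_layers):              # top-down pass
        main[i] = extract(main[i - 1])
        module[i] = fuse(extract(module[i - 1]), extract(main[i]))
    # Layer N fusion outputter: main N, module N, module N-1.
    out = fuse(main[-1], module[-1], module[-2])
    # Intermediate fusion outputters: main i, module i, module i-1,
    # plus the output of the fusion outputter below.
    for i in range(n_layers - 2, 0, -1):      # bottom-up pass
        out = fuse(main[i], module[i], module[i - 1], out)
    # Layer 1 fusion outputter produces the depth map.
    return fuse(main[0], module[0], out)

print(forward(1.0))
```

With these stand-ins the numbers are meaningless, but the call order matches the data flow of the paragraphs above.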
The extraction and fusion module of the j-th layer includes N+1-j auxiliary feature extractors, where j is an integer and 1 ≤ j ≤ N;
the first auxiliary feature extractor of the first layer is configured to perform feature extraction on the feature map output by the main feature extractor of the first layer and to output the resulting feature map to the second through N-th auxiliary feature extractors and the fusion outputter of the first layer, as well as to the first auxiliary feature extractor of the second layer;
the k-th auxiliary feature extractor of the first layer is configured to perform feature extraction and fusion on the feature maps output by the main feature extractor and the first through (k-1)-th auxiliary feature extractors of the first layer, and to output the resulting feature map to the (k+1)-th through N-th auxiliary feature extractors and the fusion outputter of the first layer, as well as to the k-th auxiliary feature extractor of the second layer, where k is an integer with 1 < k < N;
the N-th auxiliary feature extractor of the first layer is configured to perform feature extraction and fusion on the feature maps output by the main feature extractor and the first through (N-1)-th auxiliary feature extractors of the first layer, and to output the resulting feature map to the fusion outputters of the first and second layers.
The first auxiliary feature extractor of the m-th layer is configured to perform feature extraction on the feature map output by the main feature extractor of the m-th layer to obtain a first feature map; to receive a second feature map output by the first auxiliary feature extractor of the (m-1)-th layer; to fuse the first feature map with the second feature map to obtain a third feature map; and to output the third feature map to the second through n-th auxiliary feature extractors and the fusion outputter of the m-th layer, as well as to the first auxiliary feature extractor of the (m+1)-th layer, where m is a positive integer with 1 < m < N-1, and n is an integer with n = N+1-m;
the x-th auxiliary feature extractor of the m-th layer is configured to perform feature extraction and fusion on the feature maps output by the main feature extractor and the first through (x-1)-th auxiliary feature extractors of the m-th layer, and to output the resulting feature map to the (x+1)-th through n-th auxiliary feature extractors and the fusion outputter of the m-th layer, as well as to the x-th auxiliary feature extractor of the (m+1)-th layer, where x is an integer with 1 < x < n.
The first auxiliary feature extractor of the (N-1)-th layer is configured to perform feature extraction on the feature map output by the main feature extractor of the (N-1)-th layer to obtain a fourth feature map; to receive a fifth feature map output by the first auxiliary feature extractor of the (N-2)-th layer; to fuse the fourth feature map with the fifth feature map to obtain a sixth feature map; and to output the sixth feature map to the fusion outputter of the (N-1)-th layer.
The above units may be used to execute the method described in any of the foregoing embodiments; for a detailed description, refer to the description of the method in Embodiment 1, which is not repeated here.
Consistent with the embodiments shown in Fig. 3 and Fig. 4, please refer to Fig. 5, which is a schematic structural diagram of an electronic device 300 provided by an embodiment of the present application. As shown in the figure, the electronic device 300 includes an application processor 310, a memory 320, a communication interface 330, and one or more programs 321, where the one or more programs 321 are stored in the memory 320 and configured to be executed by the application processor 310. When the one or more programs 321 are run, the processor 310 performs the following operations:
acquiring a single target image;
constructing a neural network, the neural network being configured to perform feature extraction and fusion on the target image multiple times to obtain a depth map of the target image.
The main feature extractor of the first layer is configured to perform feature extraction on the target image and to output the resulting feature map to the main feature extractor of the second layer and to the extraction and fusion module and fusion outputter of the first layer;
the extraction and fusion module of the first layer performs feature extraction on the feature map output by the main feature extractor of the first layer and outputs the resulting feature map to the fusion outputter of the first layer and to the extraction and fusion module and fusion outputter of the second layer;
the main feature extractor of the i-th layer is configured to perform feature extraction on the feature map output by the main feature extractor of the (i-1)-th layer and to output the resulting feature map to the main feature extractor of the (i+1)-th layer and to the extraction and fusion module and fusion outputter of the i-th layer, where i is an integer with 1 < i < N;
the extraction and fusion module of the i-th layer is configured to perform feature extraction and fusion on the feature map output by the extraction and fusion module of the (i-1)-th layer and the feature map output by the main feature extractor of the i-th layer, and to output the resulting feature map to the fusion outputter of the i-th layer and to the extraction and fusion module and fusion outputter of the (i+1)-th layer;
the main feature extractor of the N-th layer is configured to perform feature extraction on the feature map output by the main feature extractor of the (N-1)-th layer and to output the resulting feature map to the extraction and fusion module and fusion outputter of the N-th layer;
the extraction and fusion module of the N-th layer is configured to perform feature extraction and fusion on the feature map output by the extraction and fusion module of the (N-1)-th layer and the feature map output by the main feature extractor of the N-th layer, and to output the resulting feature map to the fusion outputter of the N-th layer;
the fusion outputter of the N-th layer is configured to perform feature extraction and fusion on the feature map output by the main feature extractor of the N-th layer, the feature map output by the extraction and fusion module of the N-th layer, and the feature map output by the extraction and fusion module of the (N-1)-th layer, and to output the resulting feature map to the fusion outputter of the (N-1)-th layer;
the fusion outputter of the i-th layer is configured to perform feature extraction and fusion on the feature map output by the main feature extractor of the i-th layer, the feature map output by the extraction and fusion module of the i-th layer, the feature map output by the extraction and fusion module of the (i-1)-th layer, and the feature map output by the fusion outputter of the (i+1)-th layer, and to output the resulting feature map to the fusion outputter of the (i-1)-th layer;
the fusion outputter of the first layer is configured to perform feature extraction and fusion on the feature map output by the main feature extractor of the first layer, the feature map output by the extraction and fusion module of the first layer, and the feature map output by the fusion outputter of the second layer, so as to obtain the depth map of the target image.
The first auxiliary feature extractor of the first layer is configured to perform feature extraction on the feature map output by the main feature extractor of the first layer and to output the resulting feature map to the second through N-th auxiliary feature extractors and the fusion outputter of the first layer, as well as to the first auxiliary feature extractor of the second layer;
the k-th auxiliary feature extractor of the first layer is configured to perform feature extraction and fusion on the feature maps output by the main feature extractor and the first through (k-1)-th auxiliary feature extractors of the first layer, and to output the resulting feature map to the (k+1)-th through N-th auxiliary feature extractors and the fusion outputter of the first layer, as well as to the k-th auxiliary feature extractor of the second layer, where k is an integer with 1 < k < N;
the N-th auxiliary feature extractor of the first layer is configured to perform feature extraction and fusion on the feature maps output by the main feature extractor and the first through (N-1)-th auxiliary feature extractors of the first layer, and to output the resulting feature map to the fusion outputters of the first and second layers.
In another embodiment of the present invention, a storage medium is disclosed. The storage medium stores program code; when the program code is run, the method in the foregoing method embodiments is executed.
In another embodiment of the present invention, a computer program product is disclosed. The computer program product contains program code; when the program code is run, the method in the foregoing method embodiments is executed.

Claims (10)

  1. A method for acquiring a depth map, characterized in that the method comprises:
    acquiring a single target image;
    constructing a neural network, the neural network being configured to perform feature extraction and fusion on the target image multiple times to obtain a depth map of the target image.
  2. The method according to claim 1, characterized in that the neural network includes N layers, each layer including a cascaded main feature extractor, an extraction and fusion module, and a fusion outputter, where N is a positive integer greater than 1;
    the main feature extractor of the first layer is configured to perform feature extraction on the target image and to output the resulting feature map to the main feature extractor of the second layer and to the extraction and fusion module and fusion outputter of the first layer;
    the extraction and fusion module of the first layer performs feature extraction on the feature map output by the main feature extractor of the first layer and outputs the resulting feature map to the fusion outputter of the first layer and to the extraction and fusion module and fusion outputter of the second layer;
    the main feature extractor of the i-th layer is configured to perform feature extraction on the feature map output by the main feature extractor of the (i-1)-th layer and to output the resulting feature map to the main feature extractor of the (i+1)-th layer and to the extraction and fusion module and fusion outputter of the i-th layer, where i is an integer with 1 < i < N;
    the extraction and fusion module of the i-th layer is configured to perform feature extraction and fusion on the feature map output by the extraction and fusion module of the (i-1)-th layer and the feature map output by the main feature extractor of the i-th layer, and to output the resulting feature map to the fusion outputter of the i-th layer and to the extraction and fusion module and fusion outputter of the (i+1)-th layer;
    the main feature extractor of the N-th layer is configured to perform feature extraction on the feature map output by the main feature extractor of the (N-1)-th layer and to output the resulting feature map to the extraction and fusion module and fusion outputter of the N-th layer;
    the extraction and fusion module of the N-th layer is configured to perform feature extraction and fusion on the feature map output by the extraction and fusion module of the (N-1)-th layer and the feature map output by the main feature extractor of the N-th layer, and to output the resulting feature map to the fusion outputter of the N-th layer;
    the fusion outputter of the N-th layer is configured to perform feature extraction and fusion on the feature map output by the main feature extractor of the N-th layer, the feature map output by the extraction and fusion module of the N-th layer, and the feature map output by the extraction and fusion module of the (N-1)-th layer, and to output the resulting feature map to the fusion outputter of the (N-1)-th layer;
    the fusion outputter of the i-th layer is configured to perform feature extraction and fusion on the feature map output by the main feature extractor of the i-th layer, the feature map output by the extraction and fusion module of the i-th layer, the feature map output by the extraction and fusion module of the (i-1)-th layer, and the feature map output by the fusion outputter of the (i+1)-th layer, and to output the resulting feature map to the fusion outputter of the (i-1)-th layer;
    the fusion outputter of the first layer is configured to perform feature extraction and fusion on the feature map output by the main feature extractor of the first layer, the feature map output by the extraction and fusion module of the first layer, and the feature map output by the fusion outputter of the second layer, so as to obtain the depth map of the target image.
  3. The method according to claim 2, characterized in that the extraction and fusion module of the j-th layer includes N+1-j auxiliary feature extractors, where j is an integer and 1 ≤ j ≤ N;
    the first auxiliary feature extractor of the first layer is configured to perform feature extraction on the feature map output by the main feature extractor of the first layer and to output the resulting feature map to the second through N-th auxiliary feature extractors and the fusion outputter of the first layer, as well as to the first auxiliary feature extractor of the second layer;
    the k-th auxiliary feature extractor of the first layer is configured to perform feature extraction and fusion on the feature maps output by the main feature extractor and the first through (k-1)-th auxiliary feature extractors of the first layer, and to output the resulting feature map to the (k+1)-th through N-th auxiliary feature extractors and the fusion outputter of the first layer, as well as to the k-th auxiliary feature extractor of the second layer, where k is an integer with 1 < k < N;
    the N-th auxiliary feature extractor of the first layer is configured to perform feature extraction and fusion on the feature maps output by the main feature extractor and the first through (N-1)-th auxiliary feature extractors of the first layer, and to output the resulting feature map to the fusion outputters of the first and second layers.
  4. The method according to claim 3, wherein:
    the first auxiliary feature extractor of the m-th layer is configured to perform feature extraction on the feature map output by the main feature extractor of the m-th layer to obtain a first feature map; to receive a second feature map output by the first auxiliary feature extractor of the (m-1)-th layer; to fuse the first feature map with the second feature map to obtain a third feature map; and to output the third feature map to the second through n-th auxiliary feature extractors and the fusion output unit of the m-th layer, and to the first auxiliary feature extractor of the (m+1)-th layer, where m is a positive integer and 1 < m < N-1, and n is an integer with n = N+1-m;
    the x-th auxiliary feature extractor of the m-th layer is configured to perform feature extraction and fusion on the feature maps output by the main feature extractor and the first through (x-1)-th auxiliary feature extractors of the m-th layer, and to output the resulting feature map to the (x+1)-th through n-th auxiliary feature extractors and the fusion output unit of the m-th layer, and to the x-th auxiliary feature extractor of the (m+1)-th layer, where x is an integer and 1 < x < n.
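Claims 4 and 5 add the cross-layer link for the first auxiliary extractor: besides extracting from its own layer's main output, it fuses in the map handed down by the previous layer's first auxiliary extractor. A small sketch of that behaviour, with hypothetical `extract`/`fuse` stand-ins:

```python
import numpy as np

def first_aux(main_fmap, upstream_fmap, extract, fuse):
    """First auxiliary extractor of layer m: extract the "first feature
    map" from this layer's main output; if layer m-1 handed down a
    "second feature map", fuse the two into the "third feature map"."""
    first = extract(main_fmap)
    if upstream_fmap is None:            # layer 1 has no upstream layer
        return first
    return fuse(first, upstream_fmap)

# Hypothetical stand-ins for the unspecified operators:
extract = lambda f: 2.0 * f
fuse = lambda a, b: a + b

layer1 = first_aux(np.ones((2, 2)), None, extract, fuse)
layer2 = first_aux(np.ones((2, 2)), layer1, extract, fuse)
```

The `None` branch models layer 1, which per claim 3 has no upstream auxiliary extractor to receive from.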
  5. The method according to claim 3 or 4, wherein:
    the first auxiliary feature extractor of the (N-1)-th layer is configured to perform feature extraction on the feature map output by the main feature extractor of the (N-1)-th layer to obtain a fourth feature map; to receive a fifth feature map output by the first auxiliary feature extractor of the (N-2)-th layer; to fuse the fourth feature map with the fifth feature map to obtain a sixth feature map; and to output the sixth feature map to the fusion output unit of the (N-1)-th layer.
  6. A depth map acquisition apparatus, wherein the apparatus comprises an acquisition unit and a construction unit;
    the acquisition unit is configured to acquire a single target image; the construction unit is configured to construct a neural network, and the neural network is configured to perform feature extraction and fusion on the target image multiple times to obtain a depth map of the target image.
  7. The apparatus according to claim 6, wherein the neural network comprises N layers, each layer comprising a cascaded main feature extractor, extraction-and-fusion module, and fusion output unit, where N is a positive integer greater than 1;
    the main feature extractor of the first layer is configured to perform feature extraction on the target image, and to output the resulting feature map to the main feature extractor of the second layer and to the extraction-and-fusion module and the fusion output unit of the first layer;
    the extraction-and-fusion module of the first layer is configured to perform feature extraction on the feature map output by the main feature extractor of the first layer, and to output the resulting feature map to the fusion output unit of the first layer and to the extraction-and-fusion module and the fusion output unit of the second layer;
    the main feature extractor of the i-th layer is configured to perform feature extraction on the feature map output by the main feature extractor of the (i-1)-th layer, and to output the resulting feature map to the main feature extractor of the (i+1)-th layer and to the extraction-and-fusion module and the fusion output unit of the i-th layer, where i is an integer and 1 < i < N;
    the extraction-and-fusion module of the i-th layer is configured to perform feature extraction and fusion on the feature map output by the extraction-and-fusion module of the (i-1)-th layer and the feature map output by the main feature extractor of the i-th layer, and to output the resulting feature map to the fusion output unit of the i-th layer and to the extraction-and-fusion module and the fusion output unit of the (i+1)-th layer;
    the main feature extractor of the N-th layer is configured to perform feature extraction on the feature map output by the main feature extractor of the (N-1)-th layer, and to output the resulting feature map to the extraction-and-fusion module and the fusion output unit of the N-th layer;
    the extraction-and-fusion module of the N-th layer is configured to perform feature extraction and fusion on the feature map output by the extraction-and-fusion module of the (N-1)-th layer and the feature map output by the main feature extractor of the N-th layer, and to output the resulting feature map to the fusion output unit of the N-th layer;
    the fusion output unit of the N-th layer is configured to perform feature extraction and fusion on the feature map output by the main feature extractor of the N-th layer, the feature map output by the extraction-and-fusion module of the N-th layer, and the feature map output by the extraction-and-fusion module of the (N-1)-th layer, and to output the resulting feature map to the fusion output unit of the (N-1)-th layer;
    the fusion output unit of the i-th layer is configured to perform feature extraction and fusion on the feature map output by the main feature extractor of the i-th layer, the feature map output by the extraction-and-fusion module of the i-th layer, the feature map output by the extraction-and-fusion module of the (i-1)-th layer, and the feature map output by the fusion output unit of the (i+1)-th layer, and to output the resulting feature map to the fusion output unit of the (i-1)-th layer;
    the fusion output unit of the first layer is configured to perform feature extraction and fusion on the feature map output by the main feature extractor of the first layer, the feature map output by the extraction-and-fusion module of the first layer, and the feature map output by the fusion output unit of the second layer, to obtain the depth map of the target image.
  8. The apparatus according to claim 7, wherein the extraction-and-fusion module of the j-th layer comprises N+1-j auxiliary feature extractors, where j is an integer and 1 ≤ j ≤ N;
    the first auxiliary feature extractor of the first layer is configured to perform feature extraction on the feature map output by the main feature extractor of the first layer, and to output the resulting feature map to the second through N-th auxiliary feature extractors and the fusion output unit of the first layer, and to the first auxiliary feature extractor of the second layer;
    the k-th auxiliary feature extractor of the first layer is configured to perform feature extraction and fusion on the feature maps output by the main feature extractor and the first through (k-1)-th auxiliary feature extractors of the first layer, and to output the resulting feature map to the (k+1)-th through N-th auxiliary feature extractors and the fusion output unit of the first layer, and to the k-th auxiliary feature extractor of the second layer, where k is an integer and 1 < k < N;
    the N-th auxiliary feature extractor of the first layer is configured to perform feature extraction and fusion on the feature maps output by the main feature extractor and the first through (N-1)-th auxiliary feature extractors of the first layer, and to output the resulting feature map to the fusion output units of the first layer and of the second layer.
  9. The apparatus according to claim 8, wherein:
    the first auxiliary feature extractor of the m-th layer is configured to perform feature extraction on the feature map output by the main feature extractor of the m-th layer to obtain a first feature map; to receive a second feature map output by the first auxiliary feature extractor of the (m-1)-th layer; to fuse the first feature map with the second feature map to obtain a third feature map; and to output the third feature map to the second through n-th auxiliary feature extractors and the fusion output unit of the m-th layer, and to the first auxiliary feature extractor of the (m+1)-th layer, where m is a positive integer and 1 < m < N-1, and n is an integer with n = N+1-m;
    the x-th auxiliary feature extractor of the m-th layer is configured to perform feature extraction and fusion on the feature maps output by the main feature extractor and the first through (x-1)-th auxiliary feature extractors of the m-th layer, and to output the resulting feature map to the (x+1)-th through n-th auxiliary feature extractors and the fusion output unit of the m-th layer, and to the x-th auxiliary feature extractor of the (m+1)-th layer, where x is an integer and 1 < x < n.
  10. The apparatus according to claim 8 or 9, wherein:
    the first auxiliary feature extractor of the (N-1)-th layer is configured to perform feature extraction on the feature map output by the main feature extractor of the (N-1)-th layer to obtain a fourth feature map; to receive a fifth feature map output by the first auxiliary feature extractor of the (N-2)-th layer; to fuse the fourth feature map with the fifth feature map to obtain a sixth feature map; and to output the sixth feature map to the fusion output unit of the (N-1)-th layer.
PCT/CN2019/121603 2019-05-07 2019-11-28 Method and apparatus for obtaining depth-of-field image WO2020224244A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910377551.2 2019-05-07
CN201910377551.2A CN110223334B (en) 2019-05-07 2019-05-07 Depth-of-field map acquisition method and device

Publications (1)

Publication Number Publication Date
WO2020224244A1 true WO2020224244A1 (en) 2020-11-12

Family

ID=67820707

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/121603 WO2020224244A1 (en) 2019-05-07 2019-11-28 Method and apparatus for obtaining depth-of-field image

Country Status (2)

Country Link
CN (1) CN110223334B (en)
WO (1) WO2020224244A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110223334B (en) * 2019-05-07 2021-09-14 深圳云天励飞技术有限公司 Depth-of-field map acquisition method and device

Citations (4)

Publication number Priority date Publication date Assignee Title
US7983486B2 (en) * 2007-08-29 2011-07-19 Seiko Epson Corporation Method and apparatus for automatic image categorization using image texture
CN105488534A (en) * 2015-12-04 2016-04-13 中国科学院深圳先进技术研究院 Method, device and system for deeply analyzing traffic scene
CN106981080A (en) * 2017-02-24 2017-07-25 东华大学 Night unmanned vehicle scene depth method of estimation based on infrared image and radar data
CN110223334A (en) * 2019-05-07 2019-09-10 深圳云天励飞技术有限公司 A kind of depth of field picture capturing method and device

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
US10699151B2 (en) * 2016-06-03 2020-06-30 Miovision Technologies Incorporated System and method for performing saliency detection using deep active contours
US9990728B2 (en) * 2016-09-09 2018-06-05 Adobe Systems Incorporated Planar region guided 3D geometry estimation from a single image
CN107563390A (en) * 2017-08-29 2018-01-09 苏州智萃电子科技有限公司 A kind of image-recognizing method and system
CN108335322B (en) * 2018-02-01 2021-02-12 深圳市商汤科技有限公司 Depth estimation method and apparatus, electronic device, program, and medium
CN109308483B (en) * 2018-07-11 2021-09-17 南京航空航天大学 Dual-source image feature extraction and fusion identification method based on convolutional neural network
CN109087349B (en) * 2018-07-18 2021-01-26 亮风台(上海)信息科技有限公司 Monocular depth estimation method, device, terminal and storage medium
CN109461177B (en) * 2018-09-29 2021-12-10 浙江科技学院 Monocular image depth prediction method based on neural network


Also Published As

Publication number Publication date
CN110223334B (en) 2021-09-14
CN110223334A (en) 2019-09-10

Similar Documents

Publication Publication Date Title
CN109558832B (en) Human body posture detection method, device, equipment and storage medium
CN109685819B (en) Three-dimensional medical image segmentation method based on feature enhancement
CN111160375B (en) Three-dimensional key point prediction and deep learning model training method, device and equipment
CN111179419B (en) Three-dimensional key point prediction and deep learning model training method, device and equipment
WO2021129181A1 (en) Portrait segmentation method, model training method and electronic device
CN110473137A (en) Image processing method and device
US20210398252A1 (en) Image denoising method and apparatus
CN111860398B (en) Remote sensing image target detection method and system and terminal equipment
CN106650615B (en) A kind of image processing method and terminal
CN111989689A (en) Method for identifying objects within an image and mobile device for performing the method
EP4047509A1 (en) Facial parsing method and related devices
CN111340077B (en) Attention mechanism-based disparity map acquisition method and device
CN110245621B (en) Face recognition device, image processing method, feature extraction model, and storage medium
CN111709268B (en) Human hand posture estimation method and device based on human hand structure guidance in depth image
Zhou et al. A lightweight hand gesture recognition in complex backgrounds
CN111797882A (en) Image classification method and device
CN111833360B (en) Image processing method, device, equipment and computer readable storage medium
WO2021179822A1 (en) Human body feature point detection method and apparatus, electronic device, and storage medium
CN112749576B (en) Image recognition method and device, computing equipment and computer storage medium
CN113869282A (en) Face recognition method, hyper-resolution model training method and related equipment
WO2020224244A1 (en) Method and apparatus for obtaining depth-of-field image
US20230153965A1 (en) Image processing method and related device
KR20180092453A (en) Face recognition method Using convolutional neural network and stereo image
CN114913339B (en) Training method and device for feature map extraction model
CN106845550B (en) Image identification method based on multiple templates

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19927779

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19927779

Country of ref document: EP

Kind code of ref document: A1