CN111429501A - Depth map prediction model generation method and device and depth map prediction method and device - Google Patents

Depth map prediction model generation method and device and depth map prediction method and device

Info

Publication number
CN111429501A
Authority
CN
China
Prior art keywords
image
depth map
training sample
depth
panorama
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010218038.1A
Other languages
Chinese (zh)
Inventor
顾晓东 (Gu Xiaodong)
王明远 (Wang Mingyuan)
杨永林 (Yang Yonglin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
You Can See (Beijing) Technology Co., Ltd.
Original Assignee
Beike Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beike Technology Co Ltd
Priority to CN202010218038.1A
Publication of CN111429501A
Priority to US17/033,129
Priority to US17/338,008
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G06T 7/55 Depth or shape recovery from multiple images
    • G06T 7/593 Depth or shape recovery from multiple images from stereo images
    • G06T 7/596 Depth or shape recovery from multiple images from stereo images from three or more stereo images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)

Abstract

Embodiments of the present disclosure disclose a depth map prediction model generation method and device and a depth map prediction method and device. The generation method includes: acquiring a preset number of training sample image pair sets; and, for each training sample image pair set, taking the panoramas included in its training sample image pairs as the input of an initial model and the depth maps corresponding to the input panoramas as the expected output of the initial model, and training the initial model to obtain a depth map prediction model corresponding to that training sample image pair set. The resulting depth map prediction models can be used for depth map prediction on panoramas with various latitude spans, improving the generalization capability and prediction accuracy of the models.

Description

Depth map prediction model generation method and device and depth map prediction method and device
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a depth map prediction model generation method and apparatus, a depth map prediction method and apparatus, a computer-readable storage medium, and an electronic device.
Background
In the traditional three-dimensional data stitching process, the depth information of an image is required for alignment. Alignment stitches the point cloud data collected by different devices at different shooting positions to generate a three-dimensional model of a real scene.
Since depth information is required in the alignment process, the image acquisition device must include a depth camera that can measure the distance of image points relative to the acquisition device. To reduce shooting cost, a deep-learning-based method can instead be adopted to train a model that generates depth maps from panoramas. Typically, a panorama is an image with a 2:1 aspect ratio, in which the horizontal direction corresponds to 360° of longitude and the vertical direction corresponds to 180° of latitude. Taking the panorama as input, the model directly estimates the depth value corresponding to each point on the panorama.
Disclosure of Invention
The embodiment of the disclosure provides a depth map prediction model generation method and device, a depth map prediction method and device, a computer-readable storage medium and an electronic device.
The embodiment of the disclosure provides a depth map prediction model generation method, which includes: acquiring a preset number of training sample image pair sets, wherein the training sample image pairs in each training sample image pair set comprise a panoramic image and a corresponding depth image with the same effective latitude span, and the effective latitude span is the latitude span of an effective area in the panoramic image and the depth image; for each training sample image pair set in a preset number of training sample image pair sets, taking a panoramic image included in a training sample image pair in the training sample image pair set as the input of an initial model, taking a depth map corresponding to the input panoramic image as the expected output of the initial model, training the initial model, and obtaining a depth map prediction model corresponding to the training sample image pair set.
In some embodiments, the initial model is used to perform a circular padding operation on the input panorama to obtain a padded image, and to perform convolution operations on the padded image.
In some embodiments, prior to obtaining the preset number of training sample image pair sets, the method further comprises: acquiring an initial image pair set, wherein an initial image pair in the initial image pair set comprises a panoramic image with a preset latitude span and a corresponding depth image; acquiring a preset latitude span set; for each latitude span in the set of latitude spans, performing the following steps based on the latitude span: intercepting a sub-panorama and a sub-depth map corresponding to the latitude span from the panorama and the depth map included in each initial image pair in the initial image pair set; performing pixel compensation on the intercepted sub panoramic image and sub depth image to obtain a panoramic image and a depth image with preset proportion as a training sample image pair; and determining each obtained training sample image pair as a training sample image pair set corresponding to the latitude span.
In some embodiments, intercepting the sub-panorama and sub-depth map corresponding to the latitude span from the panorama and depth map included in each initial image pair of the set of initial image pairs comprises: randomly intercepting a sub-panorama and a sub-depth map corresponding to the latitude span from the panorama and the depth map included in each initial image pair in the initial image pair set.
According to another aspect of the embodiments of the present disclosure, there is provided a depth map prediction method, including: acquiring a panorama to be predicted; determining the latitude span of an effective area in the panoramic image to be predicted; selecting a depth map prediction model matched with the latitude span from a depth map prediction model set trained in advance, wherein the depth map prediction model set is obtained by training based on the depth map prediction model generation method in advance; and inputting the panoramic image to be predicted into the selected depth image prediction model to obtain a predicted depth image.
In some embodiments, before obtaining the panorama to be predicted, the method further comprises: acquiring an initial panorama, where the initial panorama includes an image area; determining whether the image area is a rectangular area; if it is a rectangular area, determining the rectangular area as the effective area; if it is not a rectangular area, intercepting the largest inscribed rectangular area from the image area as the effective area; and in response to determining that the aspect ratio of the effective area is not the preset ratio, performing pixel completion on the area outside the effective area to obtain the panorama to be predicted with the preset ratio.
In some embodiments, the depth map prediction model is further configured to output a confidence level corresponding to each pixel point in the predicted depth map; and the method further comprises: for each pixel point in the predicted depth map, determining whether the confidence corresponding to the pixel point is greater than or equal to a preset threshold; if the confidence is greater than or equal to the preset threshold, keeping the depth value corresponding to the pixel point unchanged; and if the confidence is smaller than the preset threshold, modifying the depth value corresponding to the pixel point to a preset depth value.
According to another aspect of the embodiments of the present disclosure, there is provided a depth map prediction model generation apparatus, including: a first acquisition module, configured to acquire a preset number of training sample image pair sets, where the training sample image pairs in each training sample image pair set include a panorama and a corresponding depth map with the same effective latitude span, the effective latitude span being the latitude span of the effective area in the panorama and the depth map; and a training module, configured to, for each training sample image pair set in the preset number of training sample image pair sets, take the panoramas included in the training sample image pairs in the set as the input of an initial model, take the depth maps corresponding to the input panoramas as the expected output of the initial model, and train the initial model to obtain a depth map prediction model corresponding to the set.
In some embodiments, the initial model is used to perform a circular padding operation on the input panorama to obtain a padded image, and to perform convolution operations on the padded image.
In some embodiments, the apparatus further comprises: the second acquisition module is used for acquiring an initial image pair set, wherein the initial image pair in the initial image pair set comprises a panoramic image with a preset latitude span and a corresponding depth image; the third acquisition module is used for acquiring a preset latitude span set; a generating module for, for each latitude span in the set of latitude spans, performing the following steps based on the latitude span: intercepting a sub-panorama and a sub-depth map corresponding to the latitude span from the panorama and the depth map included in each initial image pair in the initial image pair set; performing pixel compensation on the intercepted sub panoramic image and sub depth image to obtain a panoramic image and a depth image with preset proportion as a training sample image pair; and determining each obtained training sample image pair as a training sample image pair set corresponding to the latitude span.
In some embodiments, the generation module is further configured to: randomly intercept a sub-panorama and a sub-depth map corresponding to the latitude span from the panorama and the depth map included in each initial image pair in the initial image pair set.
According to another aspect of the embodiments of the present disclosure, there is provided a depth map prediction apparatus including: the fourth acquisition module is used for acquiring the panoramic image to be predicted; the first determination module is used for determining the latitude span of the effective area in the panoramic image to be predicted; the selection module is used for selecting a depth map prediction model matched with the latitude span from a depth map prediction model set trained in advance, wherein the depth map prediction model set is obtained by training based on the depth map prediction model generation method in advance; and the prediction module is used for inputting the panoramic image to be predicted into the selected depth image prediction model to obtain a predicted depth image.
In some embodiments, the apparatus further comprises: a fifth obtaining module, configured to obtain an initial panorama, where the initial panorama includes an image area; a second determining module, configured to determine whether the image area is a rectangular area; a third determining module, configured to determine the rectangular area as the effective area if the image area is a rectangular area; a fourth determining module, configured to intercept the largest inscribed rectangular area from the image area as the effective area if the image area is not a rectangular area; and a pixel completion module, configured to perform pixel completion on the region outside the effective area in response to determining that the aspect ratio of the effective area is not the preset ratio, to obtain the panorama to be predicted with the preset ratio.
In some embodiments, the depth map prediction model is further configured to output a confidence level corresponding to each pixel point in the predicted depth map; and the apparatus further comprises: a correction module, configured to determine, for each pixel point in the predicted depth map, whether the confidence corresponding to the pixel point is greater than or equal to a preset threshold; keep the depth value corresponding to the pixel point unchanged if the confidence is greater than or equal to the preset threshold; and modify the depth value corresponding to the pixel point to a preset depth value if the confidence is smaller than the preset threshold.
According to another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing a computer program for executing the above depth map prediction model generation method or depth map prediction method.
According to another aspect of the embodiments of the present disclosure, there is provided an electronic device, including: a processor; and a memory for storing processor-executable instructions, where the processor is configured to read the executable instructions from the memory and execute the instructions to implement the above depth map prediction model generation method or depth map prediction method.
Based on the depth map prediction model generation method and device, the depth map prediction method and device, the computer readable storage medium and the electronic device provided by the embodiments of the present disclosure, by setting a plurality of training sample image pair sets, each of which corresponds to one latitude span, and then training the model by using each of the training sample image pair sets, respectively, a depth map prediction model corresponding to each latitude span is obtained, so that the depth map prediction models can be used to perform depth map prediction on panoramas of various latitude spans, and the generalization capability and the prediction accuracy of the models are improved.
The technical solution of the present disclosure is further described in detail by the accompanying drawings and examples.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in more detail embodiments of the present disclosure with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the principles of the disclosure and not to limit the disclosure. In the drawings, like reference numbers generally represent like parts or steps.
Fig. 1 is a system diagram to which the present disclosure is applicable.
Fig. 2 is a flowchart illustrating a method for generating a depth map prediction model according to an exemplary embodiment of the present disclosure.
Fig. 3 is an exemplary schematic diagram of an effective area in a panorama of an embodiment of the present disclosure.
Fig. 4 is an exemplary schematic diagram of a circular padding operation in a panorama of an embodiment of the present disclosure.
Fig. 5 is a flowchart illustrating a method for generating a depth map prediction model according to another exemplary embodiment of the present disclosure.
Fig. 6 is a flowchart illustrating a depth map prediction method according to an exemplary embodiment of the present disclosure.
Fig. 7 is a flowchart illustrating a depth map prediction method according to another exemplary embodiment of the present disclosure.
Fig. 8 is an exemplary schematic diagram of intercepting an effective area in a panorama of an embodiment of the present disclosure.
Fig. 9 is a schematic structural diagram of a depth map prediction model generation apparatus according to an exemplary embodiment of the present disclosure.
Fig. 10 is a schematic structural diagram of a depth map prediction apparatus according to an exemplary embodiment of the present disclosure.
Fig. 11 is a block diagram of an electronic device provided in an exemplary embodiment of the present disclosure.
Detailed Description
Hereinafter, example embodiments according to the present disclosure will be described in detail with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset of the embodiments of the present disclosure and not all embodiments of the present disclosure, with the understanding that the present disclosure is not limited to the example embodiments described herein.
It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
It will be understood by those of skill in the art that the terms "first," "second," and the like in the embodiments of the present disclosure are used merely to distinguish one element from another, and are not intended to imply any particular technical meaning or a necessary logical order between them.
It is also understood that in embodiments of the present disclosure, "a plurality" may refer to two or more and "at least one" may refer to one, two or more.
It is also to be understood that any reference to any component, data, or structure in the embodiments of the disclosure, may be generally understood as one or more, unless explicitly defined otherwise or stated otherwise.
In addition, the term "and/or" in the present disclosure describes only an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" in the present disclosure generally indicates that the former and latter associated objects are in an "or" relationship.
It should also be understood that the description of the various embodiments of the present disclosure emphasizes the differences between the various embodiments, and the same or similar parts may be referred to each other, so that the descriptions thereof are omitted for brevity.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
The disclosed embodiments may be applied to electronic devices such as terminal devices, computer systems, servers, etc., which are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with electronic devices, such as terminal devices, computer systems, servers, and the like, include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above systems, and the like.
Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
Summary of the application
At present, equipment capable of collecting depth information is expensive to manufacture and has poor economy in large-scale industrial application. In the field of three-dimensional reconstruction of indoor and outdoor scenes, depth data are needed to form point clouds for building a model, and the point clouds obtained by a sensor at different positions are stitched according to their distance information. However, depth data acquisition for three-dimensional reconstruction usually requires a high-cost dedicated depth sensor, such as structured light or a laser sensor based on the Time of Flight (ToF) principle.
In view of this, a machine learning method can be adopted to train a model that generates depth information from a panorama, abandoning the strong dependence on depth sensors in existing schemes and greatly reducing the cost of acquiring image information. When training such a model, a sufficiently large database is prepared, in which each piece of data is a pair (panorama, corresponding depth map), and the model is then trained on this database. However, different panorama acquisition modes do not always yield a complete 2:1 panorama: the longitude direction is generally kept as close to the full 360° as possible, but in the latitude direction products often make trade-offs and capture only part of the full 180°, which reduces the precision of depth map prediction on such panoramas.
Exemplary System
Fig. 1 illustrates an exemplary system architecture 100 for a depth map prediction model generation method and apparatus, a depth map prediction method and apparatus, to which embodiments of the present disclosure may be applied.
As shown in fig. 1, system architecture 100 may include terminal device 101, network 102, and server 103. Network 102 is the medium used to provide communication links between terminal devices 101 and server 103. Network 102 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
A user may use terminal device 101 to interact with server 103 over network 102 to receive or send messages and the like. Various communication client applications, such as an image processing application, a three-dimensional design application, and the like, may be installed on the terminal device 101.
The terminal device 101 may be various electronic devices including, but not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle-mounted terminal (e.g., a car navigation terminal), etc., and a fixed terminal such as a digital TV, a desktop computer, etc.
The server 103 may be a server that provides various services, such as a background information processing server that processes information such as panoramas uploaded by the terminal apparatus 101. The background information processing server may process the received panorama and depth map to obtain a processing result (e.g., a trained depth map prediction model, a predicted depth map, etc.).
The depth map prediction model generation method and the depth map prediction method provided in the embodiment of the present disclosure may be executed by the server 103 or the terminal device 101, and accordingly, the depth map prediction model generation device and the depth map prediction device may be provided in the server 103 or the terminal device 101.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. In the case that the panorama and the depth map do not need to be acquired from a remote place, the system architecture may not include a network, and only include a server or a terminal device.
Exemplary method
Fig. 2 is a flowchart illustrating a method for generating a depth map prediction model according to an exemplary embodiment of the present disclosure. The embodiment can be applied to an electronic device (such as the terminal device 101 or the server 103 shown in fig. 1), and as shown in fig. 2, the method includes the following steps:
step 201, a preset number of training sample image pair sets are obtained.
In this embodiment, the electronic device may obtain a preset number of training sample image pair sets from remote or local. The training sample image pairs in each training sample image pair set comprise a panoramic image and a corresponding depth image with the same effective latitude span, and the effective latitude span is the latitude span of an effective area in the panoramic image and the depth image.
The panorama may be directly obtained by shooting with a panorama shooting camera (for example, a camera of a mobile phone), or may be obtained by processing (for example, cutting, trimming, or the like) the panorama shot by the camera. The depth map may be directly obtained by shooting with a depth camera (e.g., a binocular stereo camera, a lidar, etc.) at the same position, or may be processed in the same manner as the panorama is processed. The panoramic image and the depth image have one-to-one correspondence, and each pixel point in the depth image has a corresponding depth value (namely, the distance between the shot point and the camera).
Typically, the panorama is an equirectangular projection image, so each point in the panorama has a longitude value and a latitude value. In this case, the aspect ratio of the panorama can be set to 2:1, and if the vertical viewing angle is less than 180°, the panorama can be padded with supplementary pixels (for example, pixels with RGB values of 0).
The effective area is a rectangular image region in the panorama and the depth map; accordingly, the area outside the effective area contains no image content (for example, a black or other solid-color area). The latitude span of the effective area corresponds to its vertical viewing angle. Generally, the latitude span of the whole panorama is 180°, so multiplying the ratio of the effective area's height to the whole image's height by 180° gives the latitude span of the effective area. As shown in fig. 3, the rectangular image region 3011 in the panorama 301 is an effective area with a latitude span of 90°; the rectangular image region 3021 in the panorama 302 is an effective area with a latitude span of 120°.
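For concreteness, a minimal sketch of this height-ratio calculation follows (the function name and pixel-based interface are illustrative assumptions, not part of the disclosure):

```python
# Latitude span of an effective region, from the height-ratio rule above:
# the full image height corresponds to 180 degrees of latitude.
def latitude_span_degrees(region_height_px: int, image_height_px: int) -> float:
    return 180.0 * region_height_px / image_height_px

# An effective region occupying half the image height spans 90 degrees,
# matching the panorama 301 example; two thirds gives 120 degrees (panorama 302).
assert latitude_span_degrees(512, 1024) == 90.0
assert abs(latitude_span_degrees(682, 1023) - 120.0) < 1e-9
```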
Step 202, regarding each training sample image pair set in a preset number of training sample image pair sets, taking a panoramic image included in a training sample image pair in the training sample image pair set as an input of an initial model, taking a depth map corresponding to the input panoramic image as an expected output of the initial model, training the initial model, and obtaining a depth map prediction model corresponding to the training sample image pair set.
In this embodiment, for each training sample image pair set in a preset number of training sample image pair sets, the electronic device trains the initial model by using a machine learning method, where a panorama included in a training sample image pair in the training sample image pair set is used as an input of the initial model, and a depth map corresponding to the input panorama is used as an expected output of the initial model, so as to obtain a depth map prediction model corresponding to the training sample image pair set.
Each training sample image pair set corresponds to a preset initial model, and finally a preset number of depth map prediction models can be obtained, where each depth map prediction model corresponds to one latitude span. The initial model may include neural networks of various structures, for example a convolutional neural network using DenseNet169 as the backbone network. A convolutional neural network may include convolutional layers, pooling layers, fully-connected layers, and the like.
The training process of the model is an optimal-solution-seeking process, in which the optimal solution is given by data labeling, and the model is fitted to the optimal solution iteratively by error minimization. For an input panorama, a loss function is set that computes the difference between the actual output and the expected output of the model; this difference is propagated to the connections between neurons in the neural network through a back-propagation algorithm, and the difference signal propagated to each connection represents that connection's contribution to the overall error. The original weights are then updated and modified using a gradient descent algorithm.
The electronic device may train an initial model (which may include, for example, a convolutional neural network, a cyclic neural network, or the like) by using a machine learning method to take, as an input, a panorama included in a training sample image pair set and take, as an expected output, a depth map corresponding to the input panorama, and may obtain an actual output for each training input panorama. And the actual output is data actually output by the initial model and is used for representing the depth corresponding to each pixel point. Then, the electronic device may adopt a gradient descent method and a back propagation method, adjust parameters of the initial model based on actual output and expected output, use the model obtained after each parameter adjustment as the initial model for the next training, and end the training under the condition that a preset training end condition is met, so as to train and obtain the depth map prediction model corresponding to the training sample image pair set.
It should be noted that the preset training end condition may include, but is not limited to, at least one of the following: the training time exceeds the preset time; the training times exceed the preset times; the loss value calculated using a predetermined loss function (e.g., a cross entropy loss function) is less than a predetermined loss value threshold.
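The following is a hedged sketch of such a training loop (PyTorch is an assumed framework; the patent does not fix a loss function, optimizer, or stopping values, so nn.L1Loss, SGD, and the thresholds below are illustrative):

```python
import torch
import torch.nn as nn

def train_one_sample_set(model: nn.Module, loader, max_steps=10_000, loss_threshold=1e-3):
    loss_fn = nn.L1Loss()                                     # assumed loss function
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)  # gradient descent
    step = 0
    for panorama, expected_depth in loader:             # input / expected-output pairs
        actual_depth = model(panorama)                  # actual output of the model
        loss = loss_fn(actual_depth, expected_depth)    # difference, actual vs. expected
        optimizer.zero_grad()
        loss.backward()                                 # back-propagate error contributions
        optimizer.step()                                # update and modify the weights
        step += 1
        if step >= max_steps or loss.item() < loss_threshold:
            break                                       # preset training end conditions
    return model                 # depth map prediction model for this sample set
```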
In some optional implementations, the initial model performs a circular padding operation on the input panorama to obtain a padded image, and then performs convolution operations on the padded image. Specifically, since the longitude span of the panorama is 360°, its leftmost and rightmost sides are actually connected. The circular padding operation therefore fills, for the two vertical edges of the effective region, the feature values (for example, RGB values) of the pixels on the opposite edge to the outside of each edge.
As shown in fig. 4, each element (i.e., feature value) in the matrix 401 corresponds to a pixel point. For the edge 4011, the feature values on the edge 4012 are filled to the left of 4011, and at the same time the feature values on 4011 are filled to the right of 4012.
In this implementation, circular padding ensures that when the two side edges of the panorama are convolved, the convolution kernels cover the pixels on the opposite side, which improves the accuracy of image processing.
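A sketch of the operation with explicit slicing follows (assumed PyTorch tensors; torch.nn.functional.pad with mode="circular" along the width would behave equivalently):

```python
import torch

def circular_pad_longitude(x: torch.Tensor, pad: int) -> torch.Tensor:
    """x: (N, C, H, W) panorama features; the left and right edges meet at
    the same longitude, so each edge is padded with columns from the other."""
    left = x[..., -pad:]    # rightmost columns wrap around to the left
    right = x[..., :pad]    # leftmost columns wrap around to the right
    return torch.cat([left, x, right], dim=-1)

# A 3x3 convolution applied after circular_pad_longitude(x, 1) then sees
# the true neighbours across the 360-degree seam instead of zero padding.
```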
In some optional implementations, as shown in fig. 5, before step 201, the electronic device may further perform the following steps:
step 501, acquiring a preset latitude span set.
By way of example, a latitude span set may include the following latitude spans: c_0 = 180°, c_1 = 126°, c_2 = 90°, c_3 = 80°.
Step 502, for each latitude span in the set of latitude spans, performing the following steps based on the latitude span:
step 5021, intercepting a sub-panorama and a sub-depth map corresponding to the latitude span from the panorama and the depth map included in each initial image pair in the initial image pair set.
In particular, the electronic device may intercept in various ways. For example, the interception is performed according to a preset interception position.
Optionally, the electronic device may randomly intercept, from the panorama and the depth map included in each initial image pair of the initial image pair set, a sub-panorama and a sub-depth map corresponding to the latitude span. The randomly intercepted sub-panoramic view and sub-depth view can be positioned at any position in the original panoramic view and depth view, so that the model can learn the panoramic view and the depth view intercepted at any position, and the accuracy of model prediction is improved.
And step 5022, performing pixel compensation on the intercepted sub-panoramic image and the sub-depth image to obtain a panoramic image and a depth image with preset proportions as a training sample image pair.
Wherein the preset ratio may be an aspect ratio of 2:1, i.e. the longitude span is 360 °, and the filled latitude span is 180 °. The filled pixels may be set to any color, such as black.
Step 5023, determining each obtained training sample image pair as a training sample image pair set corresponding to the latitude span.
By performing the above steps 5021-5023, panoramas and depth maps of various latitude spans and various orientations can be obtained. Therefore, the images can be used for training the model, and the generalization capability of the model is improved.
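A hedged sketch of steps 5021-5023 for a single initial image pair follows (the array layout, the zero padding value, and keeping the band at its original latitude are illustrative assumptions):

```python
import numpy as np

def make_training_pair(pano: np.ndarray, depth: np.ndarray, span_deg: float,
                       rng: np.random.Generator):
    """pano: (H, W, 3) and depth: (H, W), both spanning the full 180-degree latitude."""
    h = depth.shape[0]
    band_h = round(h * span_deg / 180.0)         # rows covered by this latitude span
    top = int(rng.integers(0, h - band_h + 1))   # random vertical position (step 5021)
    out_pano = np.zeros_like(pano)               # pixel compensation back to 2:1 (step 5022)
    out_depth = np.zeros_like(depth)
    out_pano[top:top + band_h] = pano[top:top + band_h]
    out_depth[top:top + band_h] = depth[top:top + band_h]
    return out_pano, out_depth                   # one training sample image pair (step 5023)

rng = np.random.default_rng(seed=0)
```

Repeating this over every initial image pair yields the training sample image pair set corresponding to one latitude span.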
According to the method provided by the embodiment of the disclosure, a plurality of training sample image pair sets are set, each training sample image pair set corresponds to one latitude span, and then the training models are respectively used for training the sample image pair sets to obtain the depth map prediction model corresponding to each latitude span, so that the depth map prediction models can be used for depth map prediction of panoramic images of various latitude spans, and the generalization capability and the prediction accuracy of the models are improved.
Fig. 6 is a flowchart illustrating a depth map prediction method according to an exemplary embodiment of the present disclosure. The embodiment can be applied to an electronic device (such as the terminal device 101 or the server 103 shown in fig. 1), and as shown in fig. 6, the method includes the following steps:
step 601, obtaining a panorama to be predicted.
In this embodiment, the electronic device may obtain the panorama to be predicted from a local or remote location. The panorama can be an equirectangular projection image, so each point in the panorama has a longitude value and a latitude value. The upper and lower parts of the panorama need not cover the complete viewing angle; the vertical viewing angle (i.e., the latitude range) can be as low as half of 180° or even lower. The aspect ratio of the panorama can generally be set to 2:1, and if the vertical viewing angle is less than 180°, the panorama can be padded with supplementary pixels (e.g., pixels with RGB values of 0).
Step 602, determining the latitude span of the effective area in the panorama to be predicted.
In this embodiment, the electronic device may determine the latitude span of the effective area in the panorama to be predicted. As an example, the latitude span of the entire panorama is typically 180°, so the ratio of the height of the effective area to the height of the entire image may be multiplied by 180° to obtain the latitude span of the effective area.
Step 603, selecting a depth map prediction model matched with the latitude span from the pre-trained depth map prediction model set.
In this embodiment, the electronic device may select a depth map prediction model that matches a latitude span from a set of depth map prediction models trained in advance. The depth map prediction model set is obtained by training in advance based on the method described in the embodiment corresponding to fig. 2.
Each depth map prediction model in the depth map prediction model set corresponds to a latitude span, and the electronic device may match the latitude span c obtained in step 602 against the latitude span corresponding to each depth map prediction model. As an example, assume each model m_i corresponds to a latitude span c_i, with i = 1, 2, 3. If c = 180°, model m_0 is selected; if c_i ≤ c < c_{i-1}, model m_i is selected; if c < c_3, model m_3 is selected.
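In code, the matching rule reads as follows (a sketch using the example spans c_0 = 180°, c_1 = 126°, c_2 = 90°, c_3 = 80° given earlier; the function name is illustrative):

```python
def select_model_index(c: float, spans=(180.0, 126.0, 90.0, 80.0)) -> int:
    """Returns i such that model m_i matches an effective latitude span c."""
    if c >= spans[0]:
        return 0                      # full-latitude panorama: model m_0
    for i in range(1, len(spans)):
        if spans[i] <= c:             # c_i <= c < c_(i-1)
            return i
    return len(spans) - 1             # c < c_3: fall back to model m_3

assert select_model_index(180.0) == 0
assert select_model_index(100.0) == 2   # 90 <= 100 < 126, so model m_2
assert select_model_index(70.0) == 3
```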
And step 604, inputting the panoramic image to be predicted into the selected depth image prediction model to obtain a predicted depth image.
In this embodiment, the electronic device may input the panorama to be predicted into the selected depth map prediction model, so as to obtain a predicted depth map. The predicted depth map may be output in various ways, for example, the obtained depth values of the respective pixel points are stored in a memory as the predicted depth map in a matrix form. Alternatively, the obtained depth data is converted into an image and displayed on a display.
The calculated depth map can be used to assist operations such as high-precision three-dimensional model alignment and stitching; meanwhile, the depth map can be converted into the point cloud of a single shooting position for subsequent three-dimensional reconstruction of whole indoor and outdoor scenes, such as triangular mesh tiling and texture mapping.
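As an illustration of the depth-map-to-point-cloud conversion mentioned above, the following sketch assumes an equirectangular depth map and one particular spherical coordinate convention (the convention is an assumption; the patent does not specify one):

```python
import numpy as np

def depth_to_point_cloud(depth: np.ndarray) -> np.ndarray:
    """depth: (H, W) camera-to-point distances; returns (H*W, 3) XYZ points."""
    h, w = depth.shape
    lon = (np.arange(w) + 0.5) / w * 2.0 * np.pi - np.pi   # longitude in [-pi, pi)
    lat = np.pi / 2.0 - (np.arange(h) + 0.5) / h * np.pi   # latitude, +pi/2 (top) to -pi/2
    lon, lat = np.meshgrid(lon, lat)
    x = depth * np.cos(lat) * np.sin(lon)
    y = depth * np.sin(lat)
    z = depth * np.cos(lat) * np.cos(lon)
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```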
In some alternative implementations, as shown in fig. 7, before step 601, the electronic device may further perform the following steps:
step 701, obtaining an initial panorama.
Wherein the initial panorama comprises an image area. The image area is an image actually taken by the camera when the initial panorama is generated. As shown in fig. 8, the area 8011 in the initial panorama 801 and the area 8021 in the initial panorama 802 are both composed of a plurality of rectangular images, and the areas 8011 and 8021 are image areas.
At step 702, it is determined whether the image area is a rectangular area. If so, go to step 703, otherwise go to step 704.
In step 703, the rectangular area is determined as the effective area.
For example, the area 8021 in fig. 8 is a rectangular area, so 8021 is an effective area.
Step 704, the largest inscribed rectangle area is cut out from the image area as the effective area.
For example, 8012 in fig. 8 is the largest inscribed rectangle area in the image area, and 8012 is the effective area.
Step 705, in response to determining that the aspect ratio of the effective region is not the preset ratio, performing pixel completion on the region outside the effective region to obtain the panorama to be predicted with the preset ratio.
The preset ratio is usually 2:1. If the effective area is only part of the original image, pixels need to be supplemented; if the effective area is the whole original image, no supplementation is needed.
By executing steps 701 to 705, the panoramic image and the depth map can be adjusted to images of predetermined specifications, so that the influence of irregular shapes of image areas on model prediction is reduced, and the accuracy of model prediction is improved.
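A hedged sketch of the pixel-completion step 705 follows, assuming the longitude direction is complete (360°) so the output width fixes the 2:1 canvas; locating the largest inscribed rectangle of an irregular image area (step 704) is left out, since the patent does not fix an algorithm for it:

```python
import numpy as np

def complete_to_preset_ratio(image: np.ndarray, top: int, left: int,
                             height: int, width: int) -> np.ndarray:
    """image: (H, W, C); (top, left, height, width) bound the effective area."""
    region = image[top:top + height, left:left + width]
    if 2 * height == width:                 # already the preset 2:1 ratio
        return region
    canvas = np.zeros((width // 2, width, image.shape[2]), dtype=image.dtype)
    row = (canvas.shape[0] - height) // 2   # vertical placement is an assumption
    canvas[row:row + height] = region       # pixels outside the effective area stay 0
    return canvas
```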
In some optional implementations, the depth map prediction model is further configured to output a confidence level for each pixel point in the predicted depth map, where the confidence level represents the probability that the depth value of the corresponding pixel point is correct.
Based on the confidence level, the electronic device may further perform the following steps:
for each pixel point in the prediction depth map, determining whether the confidence corresponding to the pixel point is greater than or equal to a preset threshold value; if the depth value is larger than or equal to the preset threshold value, the depth value is credible, and the depth value corresponding to the pixel point is kept unchanged; if the depth value is smaller than the preset threshold value, the depth value is not credible, and the depth value corresponding to the pixel point is modified to be the preset depth value (for example, 0). The realization mode determines whether the depth value is credible or not through judging the credibility, thereby obtaining the depth information with high accuracy.
In the method provided by the embodiment corresponding to fig. 6, by selecting a model corresponding to the latitude span of the effective area in the panorama to be predicted from the plurality of depth map prediction models, the accuracy of obtaining depth information can be improved by using the characteristic that each model has high prediction accuracy for the panorama within a certain latitude span range, and the accuracy of subsequent operations such as three-dimensional model alignment, splicing and the like can be improved.
Exemplary devices
Fig. 9 is a schematic structural diagram of a depth map prediction model generation apparatus according to an exemplary embodiment of the present disclosure. The present embodiment can be applied to an electronic device, and as shown in fig. 9, the depth map prediction model generation apparatus includes: a first obtaining module 901, configured to obtain a preset number of training sample image pair sets, where a training sample image pair in each training sample image pair set includes a panorama and a corresponding depth map with the same effective latitude span, and the effective latitude span is a latitude span of an effective area in the panorama and the depth map; the training module 902 is configured to, for each training sample image pair set in a preset number of training sample image pair sets, use a panorama included in a training sample image pair in the training sample image pair set as an input of an initial model, use a depth map corresponding to the input panorama as an expected output of the initial model, train the initial model, and obtain a depth map prediction model corresponding to the training sample image pair set.
In this embodiment, the first obtaining module 901 may obtain a preset number of training sample image pair sets from a remote location or a local location. The training sample image pairs in each training sample image pair set comprise a panoramic image and a corresponding depth image with the same effective latitude span, and the effective latitude span is the latitude span of an effective area in the panoramic image and the depth image.
The panorama may be directly obtained by shooting with a panorama shooting camera (for example, a camera of a mobile phone), or may be obtained by processing (for example, cutting, trimming, or the like) the panorama shot by the camera. The depth map may be directly obtained by shooting with a depth camera (e.g., a binocular stereo camera, a lidar, etc.) at the same position, or may be processed in the same manner as the panorama is processed. The panoramic image and the depth image have one-to-one correspondence, and each pixel point in the depth image has a corresponding depth value (namely, the distance between the shot point and the camera).
Typically, the panorama is an equirectangular projection image, so each point in the panorama has a longitude value and a latitude value. In this case, the aspect ratio of the panorama can be set to 2:1, and if the vertical viewing angle is less than 180°, the panorama can be padded with supplementary pixels (for example, pixels with RGB values of 0).
The effective area is a rectangular image region in the panorama and the depth map; accordingly, the area outside the effective area contains no image content (for example, a black or other solid-color area). The latitude span of the effective area corresponds to its vertical viewing angle. Generally, the latitude span of the whole panorama is 180°, so multiplying the ratio of the effective area's height to the whole image's height by 180° gives the latitude span of the effective area.
In this embodiment, for each training sample image pair set in the preset number of training sample image pair sets, the training module 902 may use a machine learning method to take a panorama included in a training sample image pair in the training sample image pair set as an input of the initial model, take a depth map corresponding to the input panorama as an expected output of the initial model, train the initial model, and obtain a depth map prediction model corresponding to the training sample image pair set.
Each training sample image pair set corresponds to a preset initial model, and finally a preset number of depth map prediction models can be obtained, where each depth map prediction model corresponds to one latitude span. The initial model may include neural networks of various structures, for example a convolutional neural network using DenseNet169 as the backbone network. A convolutional neural network may include convolutional layers, pooling layers, fully-connected layers, and the like.
The training process of the model is an optimal-solution-seeking process, in which the optimal solution is given by data labeling, and the model is fitted to the optimal solution iteratively by error minimization. For an input panorama, a loss function is set that computes the difference between the actual output and the expected output of the model; this difference is propagated to the connections between neurons in the neural network through a back-propagation algorithm, and the difference signal propagated to each connection represents that connection's contribution to the overall error. The original weights are then updated and modified using a gradient descent algorithm.
The training module 902 may use a machine learning method to train an initial model (which may include a convolutional neural network, a cyclic neural network, or the like, for example) by using, for a certain training sample image pair set, a panorama included in a training sample image pair in the training sample image pair set as an input and using a depth map corresponding to the input panorama as an expected output, and may obtain an actual output for each training input panorama. And the actual output is data actually output by the initial model and is used for representing the depth corresponding to each pixel point. Then, the training module 902 may adjust parameters of the initial model based on the actual output and the expected output by using a gradient descent method and a back propagation method, take the model obtained after each parameter adjustment as the initial model for the next training, and terminate the training when a preset training termination condition is met, thereby obtaining the depth map prediction model corresponding to the training sample image pair set through training.
It should be noted that the preset training end condition may include, but is not limited to, at least one of the following: the training time exceeds the preset time; the training times exceed the preset times; the loss value calculated using a predetermined loss function (e.g., a cross entropy loss function) is less than a predetermined loss value threshold.
In some optional implementations, the initial model is used to perform a circular padding operation on the input panorama to obtain a padded image, and to perform convolution operations on the padded image.
In some optional implementations, the apparatus further comprises: a second obtaining module (not shown in the figure) for obtaining a set of initial image pairs, wherein an initial image pair in the set of initial image pairs includes a panoramic image of a preset latitude span and a corresponding depth image; a third obtaining module (not shown in the figure) for obtaining a preset latitude span set; a generating module (not shown in the figure) for, for each latitude span in the set of latitude spans, performing the following steps based on the latitude span: intercepting a sub-panorama and a sub-depth map corresponding to the latitude span from the panorama and the depth map included in each initial image pair in the initial image pair set; performing pixel compensation on the intercepted sub panoramic image and sub depth image to obtain a panoramic image and a depth image with preset proportion as a training sample image pair; and determining each obtained training sample image pair as a training sample image pair set corresponding to the latitude span.
In some optional implementations, the generation module is further configured to: randomly intercept a sub-panorama and a sub-depth map corresponding to the latitude span from the panorama and the depth map included in each initial image pair in the initial image pair set.
The depth map prediction model generation device provided by the above embodiment of the present disclosure obtains the depth map prediction model corresponding to each latitude span by setting a plurality of training sample image pair sets, where each training sample image pair set corresponds to one latitude span, and then training the model by using each training sample image pair set, so that the depth map prediction models can be used to perform depth map prediction on panoramic maps of various latitude spans, thereby improving generalization capability of the models and accuracy of the prediction.
Fig. 10 is a schematic structural diagram of a depth map prediction apparatus according to an exemplary embodiment of the present disclosure. The present embodiment can be applied to an electronic device, and as shown in fig. 10, the depth map prediction apparatus includes: a fourth obtaining module 1001, configured to obtain a panorama to be predicted; a first determining module 1002, configured to determine a latitude span of an effective area in a panorama to be predicted; a selecting module 1003, configured to select a depth map prediction model matched with a latitude span from a depth map prediction model set trained in advance, where the depth map prediction model set is obtained by training in advance based on the depth map prediction model generation method; and the prediction module 1004 is used for inputting the panoramic image to be predicted into the selected depth image prediction model to obtain a predicted depth image.
In this embodiment, the fourth obtaining module 1001 may obtain the panorama to be predicted from a local or remote location. The panorama can be an equirectangular projection image, so each point in the panorama has a longitude value and a latitude value. The upper and lower parts of the panorama need not cover the complete viewing angle; the vertical viewing angle (i.e., the latitude range) can be as low as half of 180° or even lower. The aspect ratio of the panorama can generally be set to 2:1, and if the vertical viewing angle is less than 180°, the panorama can be padded with supplementary pixels (e.g., pixels with RGB values of 0).
In this embodiment, the first determining module 1002 may determine the latitude span of the effective area in the panorama to be predicted. As an example, the latitude span of the entire panorama is typically 180°, so the ratio of the height of the effective area to the height of the entire image may be multiplied by 180° to obtain the latitude span of the effective area.
In this embodiment, the selection module 1003 may select a depth map prediction model matching the latitude span from a set of depth map prediction models trained in advance. The depth map prediction model set is obtained by training in advance based on the method described in the embodiment corresponding to fig. 2.
Each depth map prediction model in the depth map prediction model set corresponds to a latitude span, and the selection module 1003 may match the latitude span c obtained in step 602 against the latitude span corresponding to each depth map prediction model. As an example, assume each model m_i corresponds to a latitude span c_i, with i = 1, 2, 3. If c = 180°, model m_0 is selected; if c_i ≤ c < c_{i-1}, model m_i is selected; if c < c_3, model m_3 is selected.
In this embodiment, the prediction module 1004 may input the panorama to be predicted into the selected depth map prediction model to obtain a predicted depth map. The predicted depth map may be output in various ways, for example, the obtained depth values of the respective pixel points are stored in a memory as the predicted depth map in a matrix form. Alternatively, the obtained depth data is converted into an image and displayed on a display.
The calculated depth map can be used to assist operations such as high-precision three-dimensional model alignment and stitching. The depth map can also be converted into a point cloud around the single shooting point for subsequent three-dimensional reconstruction of an entire indoor or outdoor scene, such as triangular meshing, texture mapping, and the like.
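A minimal sketch of the depth-map-to-point-cloud conversion, assuming an equirectangular depth map with a full 180° vertical field of view (the coordinate convention and function name are our assumptions):

```python
import numpy as np

def depth_map_to_point_cloud(depth: np.ndarray) -> np.ndarray:
    """Back-project an H x W equirectangular depth map into an (N, 3)
    point cloud around the single shooting point. Each pixel corresponds
    to a (longitude, latitude) direction; depth is the ray length."""
    h, w = depth.shape
    lon = (np.arange(w) + 0.5) / w * 2.0 * np.pi - np.pi   # [-pi, pi)
    lat = np.pi / 2.0 - (np.arange(h) + 0.5) / h * np.pi   # (pi/2, -pi/2)
    lon, lat = np.meshgrid(lon, lat)                        # (h, w) grids
    x = depth * np.cos(lat) * np.sin(lon)
    y = depth * np.sin(lat)
    z = depth * np.cos(lat) * np.cos(lon)
    return np.stack((x, y, z), axis=-1).reshape(-1, 3)
```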
In some optional implementations, the apparatus further comprises: a fifth obtaining module (not shown in the figure) for obtaining an initial panorama, where the initial panorama comprises an image area; a second determining module (not shown in the figure) for determining whether the image area is a rectangular area; a third determining module (not shown in the figure) for determining the rectangular area as the effective area if the image area is a rectangular area; a fourth determining module (not shown in the figure) for cutting out the largest inscribed rectangular area from the image area as the effective area if the image area is not a rectangular area; and a filling module (not shown in the figure) for, in response to determining that the aspect ratio of the effective area is not the preset ratio, performing pixel filling on the region outside the effective area to obtain the panorama to be predicted with the preset ratio.
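The non-rectangular case amounts to the classic maximal-rectangle problem on a binary mask of valid pixels. One possible sketch follows; the mask-based formulation and all names are our assumptions, not the patent's prescription.

```python
import numpy as np

def largest_inscribed_rectangle(mask: np.ndarray) -> tuple:
    """Largest axis-aligned rectangle of True cells in a binary mask of
    the image area. Returns (top, left, height, width). Standard
    histogram-stack scan, O(rows * cols)."""
    n_rows, n_cols = mask.shape
    heights = np.zeros(n_cols, dtype=int)
    best = (0, 0, 0, 0)  # top, left, height, width
    for row in range(n_rows):
        # column heights of consecutive valid pixels ending at this row
        heights = np.where(mask[row], heights + 1, 0)
        stack = []  # (start_col, bar_height), heights strictly increasing
        for col, bar in enumerate(np.append(heights, 0)):  # 0 = sentinel
            start = col
            while stack and stack[-1][1] >= bar:
                start, top_h = stack.pop()
                if top_h * (col - start) > best[2] * best[3]:
                    best = (row - top_h + 1, start, top_h, col - start)
            stack.append((start, bar))
    return best
```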
In some optional implementations, the depth map prediction model is further configured to output a confidence corresponding to each pixel point in the predicted depth map, and the apparatus further comprises a correction module (not shown in the figure) for determining, for each pixel point in the predicted depth map, whether the confidence corresponding to the pixel point is greater than or equal to a preset threshold; if the confidence is greater than or equal to the preset threshold, the depth value corresponding to the pixel point is kept unchanged; if the confidence is smaller than the preset threshold, the depth value corresponding to the pixel point is modified to a preset depth value.
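A short sketch of this correction (the threshold and preset depth values are illustrative):

```python
import numpy as np

def correct_depth(depth: np.ndarray, confidence: np.ndarray,
                  threshold: float = 0.5, preset: float = 0.0) -> np.ndarray:
    """Keep depth values whose confidence reaches the preset threshold;
    replace the rest with a preset depth value."""
    return np.where(confidence >= threshold, depth, preset)
```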
The depth map prediction device provided by the above embodiment of the present disclosure selects, from multiple depth map prediction models, the model corresponding to the latitude span of the effective area in the panorama to be predicted. This exploits the fact that each model has high prediction accuracy for panoramas within a certain latitude span, improving the accuracy of the obtained depth information and, in turn, the accuracy of subsequent operations such as alignment and stitching of three-dimensional models.
Exemplary electronic device
Next, an electronic device according to an embodiment of the present disclosure is described with reference to fig. 11. The electronic device may be either or both of the terminal device 101 and the server 103 shown in fig. 1, or a stand-alone device separate from them that can communicate with the terminal device 101 and the server 103 to receive collected input signals from them.
FIG. 11 illustrates a block diagram of an electronic device in accordance with an embodiment of the disclosure.
As shown in fig. 11, the electronic device 1100 includes one or more processors 1101 and memory 1102.
The processor 1101 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 1100 to perform desired functions.
Memory 1102 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, Random Access Memory (RAM), cache memory, or the like. Non-volatile memory may include, for example, Read Only Memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on a computer-readable storage medium and executed by the processor 1101 to implement the methods of the various embodiments of the present disclosure described above and/or other desired functions. Various content such as panoramas and depth maps may also be stored in the computer-readable storage medium.
In one example, the electronic device 1100 may further include: an input device 1103 and an output device 1104, which are interconnected by a bus system and/or other form of connection mechanism (not shown).
For example, when the electronic device is the terminal device 101 or the server 103, the input device 1103 may be a device such as a camera, a mouse, or a keyboard, and is used to input a panorama or the like. When the electronic device is a stand-alone device, the input device 1103 may be a communication network connector for receiving the inputted panorama or the like from the terminal device 101 and the server 103.
The output device 1104 can output various information to the outside, including the generated depth map prediction model, the predicted depth map, and the like. The output device 1104 may include, for example, a display, speakers, a printer, and a remote output device connected to a communication network, among others.
Of course, for simplicity, only some of the components of the electronic device 1100 relevant to the present disclosure are shown in fig. 11, omitting components such as buses, input/output interfaces, and the like. In addition, electronic device 1100 may include any other suitable components depending on the particular application.
Exemplary computer program product and computer-readable storage Medium
In addition to the methods and apparatus described above, embodiments of the present disclosure may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform steps in methods according to various embodiments of the present disclosure as described in the "exemplary methods" section of this specification above.
The program code for carrying out operations of embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform steps in methods according to various embodiments of the present disclosure as described in the "exemplary methods" section above of this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present disclosure in conjunction with specific embodiments. Note, however, that the advantages, effects, and the like mentioned in the present disclosure are merely examples, not limitations, and should not be considered essential to the various embodiments. The specific details disclosed above are for the purposes of illustration and description only and are not limiting, since the disclosure is not limited to those details.
In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts in the embodiments are referred to each other. For the system embodiment, since it basically corresponds to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The block diagrams of devices, apparatuses, and systems referred to in this disclosure are given only as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown. These devices, apparatuses, and systems may be connected, arranged, and configured in any manner, as will be appreciated by those skilled in the art. Words such as "including", "comprising", and "having" are open-ended words that mean "including, but not limited to" and are used interchangeably therewith. The word "or" as used herein means, and is used interchangeably with, the word "and/or" unless the context clearly dictates otherwise. The word "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".
The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
It is also noted that in the devices, apparatuses, and methods of the present disclosure, each component or step can be decomposed and/or recombined. These decompositions and/or recombinations are to be considered equivalents of the present disclosure.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the disclosure to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (10)

1. A depth map prediction model generation method comprises the following steps:
acquiring a preset number of training sample image pair sets, wherein the training sample image pairs in each training sample image pair set comprise a panoramic image and a corresponding depth image with the same effective latitude span, and the effective latitude span is the latitude span of an effective area in the panoramic image and the depth image;
and for each training sample image pair set in the preset number of training sample image pair sets, taking a panoramic image included in the training sample image pair set as the input of an initial model, taking a depth map corresponding to the input panoramic image as the expected output of the initial model, and training the initial model to obtain a depth map prediction model corresponding to the training sample image pair set.
2. The method of claim 1, wherein the initial model is used to perform a circular fill operation on the input panorama to obtain a filled image, and perform a convolution operation on the filled image.
3. The method of claim 1 or 2, wherein prior to said obtaining a preset number of training sample image pair sets, the method further comprises:
acquiring an initial image pair set, wherein an initial image pair in the initial image pair set comprises a panoramic image with a preset latitude span and a corresponding depth image;
acquiring a preset latitude span set;
for each latitude span in the set of latitude spans, performing the following steps based on the latitude span:
intercepting a sub-panorama and a sub-depth map corresponding to the latitude span from the panorama and the depth map included in each initial image pair in the initial image pair set;
performing pixel compensation on the intercepted sub panoramic image and sub depth image to obtain a panoramic image and a depth image with preset proportion as a training sample image pair;
and determining each obtained training sample image pair as a training sample image pair set corresponding to the latitude span.
4. The method of claim 3, wherein the truncating the sub-panorama and the sub-depth map corresponding to the latitude span from the panorama and the depth map included in each initial image pair of the set of initial image pairs comprises:
randomly intercepting a sub-panorama and a sub-depth map corresponding to the latitude span from the panorama and the depth map included in each initial image pair in the initial image pair set.
5. A depth map prediction method, comprising:
acquiring a panorama to be predicted;
determining the latitude span of the effective area in the panoramic image to be predicted;
selecting a depth map prediction model matched with the latitude span from a pre-trained depth map prediction model set, wherein the depth map prediction model set is obtained by pre-training based on the method of any one of claims 1 to 4;
and inputting the panoramic image to be predicted into the selected depth image prediction model to obtain a predicted depth image.
6. The method of claim 5, wherein prior to said obtaining a panorama to predict, the method further comprises:
acquiring an initial panoramic image, wherein the initial panoramic image comprises an image area;
determining whether the image area is a rectangular area;
if the area is a rectangular area, determining the rectangular area as an effective area;
if the image area is not a rectangular area, intercepting the largest inscribed rectangular area from the image area as an effective area;
and in response to the fact that the length-width ratio of the effective area is not the preset proportion, pixel filling is carried out on the area outside the effective area, and the panorama to be predicted with the preset proportion is obtained.
7. A depth map prediction model generation apparatus comprising:
the device comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a preset number of training sample image pair sets, the training sample image pairs in each training sample image pair set comprise a panoramic image and a corresponding depth image with the same effective latitude span, and the effective latitude span is the latitude span of an effective area in the panoramic image and the depth image;
and the training module is used for taking a panoramic image included in the training sample image pair set as the input of an initial model and taking a depth map corresponding to the input panoramic image as the expected output of the initial model for each training sample image pair set in the preset number of training sample image pair sets, and training the initial model to obtain a depth map prediction model corresponding to the training sample image pair set.
8. A depth map prediction apparatus comprising:
the fourth acquisition module is used for acquiring the panoramic image to be predicted;
a first determining module, configured to determine a latitude span of an effective area in the panorama to be predicted;
a selection module, configured to select a depth map prediction model matching the latitude span from a pre-trained depth map prediction model set, where the depth map prediction model set is pre-trained based on the method of any one of claims 1 to 4;
and the prediction module is used for inputting the panoramic image to be predicted into the selected depth image prediction model to obtain a predicted depth image.
9. A computer-readable storage medium, the storage medium storing a computer program for performing the method of any of the preceding claims 1-6.
10. An electronic device, the electronic device comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to read the executable instructions from the memory and execute the instructions to implement the method of any one of claims 1-6.
CN202010218038.1A 2019-11-19 2020-03-25 Depth map prediction model generation method and device and depth map prediction method and device Pending CN111429501A (en)

Priority Applications (3)

- CN202010218038.1A (CN111429501A): priority date 2020-03-25, filing date 2020-03-25, title: Depth map prediction model generation method and device and depth map prediction method and device
- US17/033,129 (US11055835B2): priority date 2019-11-19, filing date 2020-09-25, title: Method and device for generating virtual reality data
- US17/338,008 (US11721006B2): priority date 2019-11-19, filing date 2021-06-03, title: Method and device for generating virtual reality data

Applications Claiming Priority (1)

- CN202010218038.1A (CN111429501A): priority date 2020-03-25, filing date 2020-03-25, title: Depth map prediction model generation method and device and depth map prediction method and device

Publications (1)

Publication Number Publication Date
CN111429501A (en) 2020-07-17

Family

ID=71548675

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010218038.1A Pending CN111429501A (en) 2019-11-19 2020-03-25 Depth map prediction model generation method and device and depth map prediction method and device

Country Status (1)

Country Link
CN (1) CN111429501A (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105554506A (en) * 2016-01-19 2016-05-04 北京大学深圳研究生院 Panorama video coding, decoding method and device based on multimode boundary filling
CN107204010A (en) * 2017-04-28 2017-09-26 中国科学院计算技术研究所 A kind of monocular image depth estimation method and system
CN107767413A (en) * 2017-09-20 2018-03-06 华南理工大学 A kind of image depth estimation method based on convolutional neural networks
US20190197667A1 (en) * 2017-12-26 2019-06-27 Facebook, Inc. Computing high-resolution depth images using machine learning techniques
CN109003297A (en) * 2018-07-18 2018-12-14 亮风台(上海)信息科技有限公司 A kind of monocular depth estimation method, device, terminal and storage medium
CN109191514A (en) * 2018-10-23 2019-01-11 北京字节跳动网络技术有限公司 Method and apparatus for generating depth detection model
CN109740515A (en) * 2018-12-29 2019-05-10 科大讯飞股份有限公司 One kind reading and appraising method and device
CN109741388A (en) * 2019-01-29 2019-05-10 北京字节跳动网络技术有限公司 Method and apparatus for generating binocular depth estimation model
CN109902678A (en) * 2019-02-12 2019-06-18 北京奇艺世纪科技有限公司 Model training method, character recognition method, device, electronic equipment and computer-readable medium
CN110111244A (en) * 2019-05-08 2019-08-09 北京奇艺世纪科技有限公司 Image conversion, depth map prediction and model training method, device and electronic equipment
CN110390690A (en) * 2019-07-11 2019-10-29 Oppo广东移动通信有限公司 Depth map treating method and apparatus

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329634A (en) * 2020-11-05 2021-02-05 华中师范大学 Classroom behavior recognition method and device, electronic equipment and storage medium
CN112329634B (en) * 2020-11-05 2024-04-02 华中师范大学 Classroom behavior identification method and device, electronic equipment and storage medium
CN112581451A (en) * 2020-12-21 2021-03-30 北京科技大学 Warehouse coil position detection system and method based on laser radar
CN112581451B (en) * 2020-12-21 2024-02-23 北京科技大学 Warehouse coil position detection system and method based on laser radar
CN114742703A (en) * 2022-03-11 2022-07-12 影石创新科技股份有限公司 Method, device and equipment for generating binocular stereoscopic panoramic image and storage medium
WO2023169283A1 (en) * 2022-03-11 2023-09-14 影石创新科技股份有限公司 Method and apparatus for generating binocular stereoscopic panoramic image, device, storage medium, and product
CN114895796A (en) * 2022-07-15 2022-08-12 杭州易绘科技有限公司 Space interaction method and device based on panoramic image and application

Similar Documents

Publication Publication Date Title
CN111429501A (en) Depth map prediction model generation method and device and depth map prediction method and device
US20220230338A1 (en) Depth image generation method, apparatus, and storage medium and electronic device
CN109191554B (en) Super-resolution image reconstruction method, device, terminal and storage medium
CN111445518B (en) Image conversion method and device, depth map prediction method and device
CN112489114B (en) Image conversion method, image conversion device, computer readable storage medium and electronic equipment
CN113256742B (en) Interface display method and device, electronic equipment and computer readable medium
CN111402404B (en) Panorama complementing method and device, computer readable storage medium and electronic equipment
CN115690382B (en) Training method of deep learning model, and method and device for generating panorama
US11010948B2 (en) Agent navigation using visual inputs
CN111612842A (en) Method and device for generating pose estimation model
WO2020092051A1 (en) Rolling shutter rectification in images/videos using convolutional neural networks with applications to sfm/slam with rolling shutter images/videos
CN114399588A (en) Three-dimensional lane line generation method and device, electronic device and computer readable medium
CN111402136A (en) Panorama generation method and device, computer readable storage medium and electronic equipment
CN114119748A (en) Method and device for determining installation pose of vehicle-mounted all-around camera
CN113673446A (en) Image recognition method and device, electronic equipment and computer readable medium
WO2024060708A1 (en) Target detection method and apparatus
CN115620264B (en) Vehicle positioning method and device, electronic equipment and computer readable medium
WO2023086398A1 (en) 3d rendering networks based on refractive neural radiance fields
WO2023038990A1 (en) Methods and apparatuses for calculating building heights from mono imagery
CN115457202A (en) Method and device for updating three-dimensional model and storage medium
CN112465716A (en) Image conversion method and device, computer readable storage medium and electronic equipment
CN116630436B (en) Camera external parameter correction method, camera external parameter correction device, electronic equipment and computer readable medium
CN112257653A (en) Method and device for determining space decoration effect graph, storage medium and electronic equipment
US11055835B2 (en) Method and device for generating virtual reality data
CN117351306B (en) Training method, determining method and device for three-dimensional point cloud projection pose solver

Legal Events

- PB01: Publication
- SE01: Entry into force of request for substantive examination
- TA01: Transfer of patent application right. Effective date of registration: 20200921. Address after: 100085, Floor 102-1, Building No. 35, Xierqi West Road, Haidian District, Beijing. Applicant after: Seashell Housing (Beijing) Technology Co.,Ltd. Address before: 300457, Room 112, Floor 1, Unit 5, Office Building C, Nangang Industrial Zone, Binhai New Area Economic and Technological Development Zone, Tianjin. Applicant before: BEIKE TECHNOLOGY Co.,Ltd.
- TA01: Transfer of patent application right. Effective date of registration: 20220328. Address after: 100085, 8th Floor, Building 1, Hongyuan Shouzhu Building, Shangdi 6th Street, Haidian District, Beijing. Applicant after: As you can see (Beijing) Technology Co.,Ltd. Address before: 100085, Floor 101, 102-1, Building No. 35, Yard No. 2, Xierqi West Road, Haidian District, Beijing. Applicant before: Seashell Housing (Beijing) Technology Co.,Ltd.