WO2023005169A1 - 深度图像生成方法和装置 (Depth image generation method and device) - Google Patents

深度图像生成方法和装置 (Depth image generation method and device)

Info

Publication number
WO2023005169A1
Authority
WO
WIPO (PCT)
Prior art keywords
depth
image
color
depth image
target area
Prior art date
Application number
PCT/CN2022/072963
Other languages
English (en)
French (fr)
Inventor
顾晓东
潘慈辉
Original Assignee
贝壳技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 贝壳技术有限公司 filed Critical 贝壳技术有限公司
Publication of WO2023005169A1 publication Critical patent/WO2023005169A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/77 Retouching; Inpainting; Scratch removal
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G06T 7/55 Depth or shape recovery from multiple images
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 5/60 Image enhancement or restoration using machine learning, e.g. neural networks
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 7/90 Determination of colour characteristics
    • G06T 2200/00 Indexing scheme for image data processing or generation, in general
    • G06T 2200/04 Indexing scheme for image data processing or generation, in general, involving 3D image data
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10024 Color image
    • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30168 Image quality inspection

Definitions

  • the present disclosure relates to image processing technology, in particular to a depth image generation method, device, electronic equipment and storage medium.
  • the accuracy of the depth values of the pixels of the depth image obtained in the above manner may be low.
  • Embodiments of the present disclosure provide a depth image generation method, device, electronic device, and storage medium, so as to improve the accuracy of depth values of pixels in the depth image.
  • a method for generating a depth image including:
  • the re-determining the depth values of the pixels in the target area based on the color image includes:
  • Re-determine depth values of pixels in the target area based on the color image and the position of the target area in the depth image.
  • re-determining the depth values of the pixels in the target area based on the color image and the position of the target area in the depth image to generate a new depth image includes:
  • the color image and the marked depth image are input to a pre-trained depth model, and a new depth image is generated through the depth model, wherein the depth model is used to generate a new depth image based on the color image and the marked depth image.
  • the depth model includes an encoding module, a decoding module and a downsampling module
  • Said inputting the color image and the marked depth image into a pre-trained depth model, and generating a new depth image through the depth model includes:
  • generating, through the encoding module, feature data of the color image based on the color image; generating, through the downsampling module, feature data of the marked depth image based on the marked depth image; and generating, through the decoding module, a new depth image based on the feature data generated by the encoding module and the feature data generated by the downsampling module.
  • the encoding module is configured to perform downsampling operations on the color image; the number of downsampling layers included in the downsampling module is one less than the number of downsampling layers included in the encoding module.
  • the depth model is obtained by training in the following manner:
  • training samples in the training sample set include input data and expected output data
  • the input data includes color sample images and marked depth sample images corresponding to the color sample images
  • the expected output data includes an expected output depth sample image
  • a machine learning algorithm is used, with the input data included in the training samples in the training sample set as the input and the expected output data corresponding to the input data as the expected output, to train the depth model.
  • the accuracy of the depth value of the pixel in the expected output depth sample image included in the training sample set is greater than the preset accuracy threshold.
  • the loss function of the depth model is determined based on at least one of the following:
  • the mean value of the relative error between the actual output data and the expected output data, the mean value of the relative error between the gradients of the actual output data and the expected output data, and the structural similarity between the actual output data and the expected output data.
  • the position indicated by the position information in the training sample set is: a randomly determined rectangular area in the depth sample image for which depth values have been obtained.
  • the color sample images in the training sample set and the depth sample images corresponding to the color sample images are captured by the user's mobile terminal; or
  • the color sample images in the training sample set and the depth sample images corresponding to the color sample images are generated by the mobile terminal of the user based on the captured images.
  • the color image is a color panoramic image
  • the depth image is a depth panoramic image
  • the color sample images in the training sample set are color panoramic images
  • the depth sample images in the training sample set are depth panoramic images.
  • the color sample image in the training sample set and the depth sample image corresponding to the color sample image indicate the same scene, and the acquired color image and the acquired depth image indicate the same scene.
  • a device for generating a depth image including:
  • an acquisition unit configured to acquire a color image and a depth image, wherein the scene indicated by the color image matches the scene indicated by the depth image;
  • a generating unit configured to, in response to determining that a target area exists in the depth image, re-determine the depth values of the pixels in the target area based on the color image, and generate a new depth image, wherein the accuracy of the depth values of the pixels in the target area is less than or equal to the preset accuracy threshold.
  • the generating unit includes:
  • the determination subunit is configured to re-determine the depth values of the pixels in the target area based on the color image and the position of the target area in the depth image.
  • the determination subunit includes:
  • a marking module configured to mark the position of the target area of the depth image to obtain a marked depth image
  • the input module is configured to input the color image and the marked depth image into a pre-trained depth model, and generate a new depth image through the depth model, wherein the depth model is used to generate a new depth image based on the color image and the marked depth image.
  • the depth model includes an encoding module, a decoding module, and a downsampling module
  • the input module is specifically configured to:
  • generate, through the encoding module, feature data of the color image based on the color image; generate, through the downsampling module, feature data of the marked depth image based on the marked depth image; and generate, through the decoding module, a new depth image based on the feature data generated by the encoding module and the feature data generated by the downsampling module.
  • the encoding module is configured to perform downsampling operations on the color image; the number of downsampling layers included in the downsampling module is one less than the number of downsampling layers included in the encoding module.
  • the depth model is obtained by training a training unit, and the training unit includes:
  • the acquisition subunit is configured to acquire a training sample set, wherein the training samples in the training sample set include input data and expected output data, the input data includes a color sample image and a marked depth sample image corresponding to the color sample image, and the expected output data includes an expected output depth sample image;
  • the training subunit is configured to use a machine learning algorithm to train the depth model, with the input data included in the training samples in the training sample set as the input and the expected output data corresponding to the input data as the expected output.
  • the depth value accuracy of the pixels in the expected output depth sample image included in the training sample set is greater than the preset accuracy threshold.
  • the loss function of the depth model is determined based on at least one of the following:
  • the mean value of the relative error between the actual output data and the expected output data, the mean value of the relative error between the gradients of the actual output data and the expected output data, and the structural similarity between the actual output data and the expected output data.
  • the position indicated by the position information in the training sample set is: a randomly determined rectangular area in the depth sample image for which a depth value has been obtained.
  • the color sample images in the training sample set and the depth sample images corresponding to the color sample images are captured by the user's mobile terminal; or
  • the color sample images in the training sample set and the depth sample images corresponding to the color sample images are generated by the mobile terminal of the user based on the captured images.
  • the color image is a color panoramic image
  • the depth image is a depth panoramic image
  • the color sample images in the training sample set are color panoramic images
  • the depth sample images in the training sample set are depth panoramic images.
  • the color sample image in the training sample set and the depth sample image corresponding to the color sample image indicate the same scene, and the acquired color image and the acquired depth image indicate the same scene.
  • an electronic device including:
  • a memory configured to store a computer program; and a processor configured to execute the computer program stored in the memory, wherein, when the computer program is executed, the method described in any one of the above embodiments of the present disclosure is implemented.
  • a computer-readable medium storing a computer program which, when executed by a processor, implements the method of any embodiment of the depth image generation method according to the first aspect above.
  • a computer program includes computer-readable code which, when run on a device, causes a processor in the device to execute instructions for implementing each step of the method of any embodiment of the depth image generation method according to the first aspect above.
  • a color image and a depth image can be acquired first, wherein the scene indicated by the color image matches the scene indicated by the depth image; then, when it is determined that a target area exists in the depth image, the depth values of the pixels in the target area are re-determined based on the color image to generate a new depth image, wherein the accuracy of the depth values of the pixels in the target area is less than or equal to the preset accuracy threshold.
  • Fig. 1 is a flow chart of the first embodiment of the method for generating a depth image in the present disclosure.
  • Fig. 2 is a flow chart of the second embodiment of the depth image generation method of the present disclosure.
  • FIG. 3A is a schematic diagram of the first structure of the depth model in the depth image generation method of the present disclosure.
  • FIG. 3B is a second structural schematic diagram of a depth model in the depth image generation method of the present disclosure.
  • Figs. 4A-4B are schematic diagrams of application scenarios of an embodiment of the method for generating a depth image in the present disclosure.
  • Fig. 5 is a schematic structural diagram of an embodiment of a depth image generating device of the present disclosure.
  • Fig. 6 is a structural diagram of an electronic device provided by an exemplary embodiment of the present disclosure.
  • "plural" may refer to two or more than two, and "at least one" may refer to one, two or more than two.
  • the term "and/or" in the present disclosure is only an association relationship describing associated objects, indicating that there may be three relationships, for example, A and/or B may indicate: A exists alone, and A and B exist simultaneously , there are three cases of B alone.
  • the character "/" in the present disclosure generally indicates that the contextual objects are an "or" relationship.
  • Embodiments of the present disclosure may be applied to at least one electronic device of a terminal device, a computer system, and a server, which may operate with numerous other general purpose or special purpose computing system environments or configurations.
  • Examples of well-known computing systems, environments and/or configurations suitable for use with at least one electronic device among terminal devices, computer systems and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing technology environments including any of the foregoing.
  • At least one electronic device of a terminal device, a computer system, and a server may be described in the general context of computer system-executable instructions, such as program modules, being executed by the computer system.
  • program modules may include routines, programs, objects, components, logic, data structures, etc., that perform particular tasks or implement particular abstract data types.
  • the computer system/server can be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network.
  • program modules may be located in both local and remote computing system storage media including storage devices.
  • FIG. 1 shows a flow 100 of a first embodiment of a method for generating a depth image according to the present disclosure.
  • the depth image generation method includes:
  • the execution subject of the depth image generation method (such as a server, a terminal device, an image processing unit with an image processing function, etc.) can acquire a color image and a depth image from another electronic device or locally, through a wired or wireless connection.
  • the scene indicated by the color image matches the scene indicated by the depth image.
  • the scene indicated by the color image and the scene indicated by the depth image may be the same.
  • the scene indicated by the color image may also include the same part as the scene indicated by the depth image.
  • the depth image may be used as the depth image corresponding to the color image.
  • the color sample images in the training sample set and the depth sample images corresponding to the color sample images are captured by the mobile terminal of the user.
  • the user's mobile terminal may include, but is not limited to, a mobile phone, a tablet computer, and the like.
  • therefore, a depth image captured by the user's mobile terminal particularly needs the depth image generation method of the present disclosure to improve the accuracy of the depth values of the pixels in the depth image.
  • the color sample images in the training sample set and the depth sample images corresponding to the color sample images are generated by the mobile terminal of the user based on the captured images.
  • the user's mobile terminal may include, but is not limited to, a mobile phone, a tablet computer, and the like.
  • the original depth image acquired by the mobile terminal of the user with the lidar is relatively sparse.
  • a software method can be used to complement the acquired original sparse depth image, so as to obtain a denser depth image.
  • the accuracy of the depth values of the pixels in the depth image obtained by using the above software method is relatively low. Therefore, the depth image generated by the user's mobile terminal based on the captured depth image also needs to use the depth image generation method in the present disclosure to improve the accuracy of the depth values of the pixels in the depth image.
  • the color image and the depth image corresponding to the color image may also be captured, simultaneously or separately, by any one or more devices capable of acquiring color images and depth images.
  • the color image in 101 is a color panoramic image
  • the depth image in 101 is a depth panoramic image
  • the color sample images in the training sample set are color panoramic images
  • the depth sample images in the training sample set are depth panoramic images.
  • the above optional implementation can generate more accurate depth values for the pixels in the depth panoramic image based on the color panoramic image, which contains richer information.
  • the color sample image in the training sample set and the depth sample image corresponding to the color sample image indicate the same scene, and the acquired color image and the acquired depth image indicate the same scene.
  • in this case, for every image region of the depth sample image in which the accuracy of the depth values is low, the corresponding color image region can be determined from the color sample image; thus, the accuracy of the generated new depth image can be further improved through subsequent steps.
  • the image area corresponding to the depth values in the depth image whose accuracy is less than or equal to the preset accuracy threshold may be used as the target area, and further, the execution subject may re-determine the depth values of the pixels in the target area based on the color image to generate a new depth image.
  • the above-mentioned execution subject may determine the accuracy of the depth value of each pixel in the depth image in various ways.
  • some mobile phones and other devices can give a confidence level while shooting or generating a depth image through software.
  • the value of the confidence level is 0, 1 or 2, and the larger the value, the more accurate the depth value of the corresponding pixel.
  • a machine learning algorithm may also be used to determine the accuracy of the depth value of each pixel in the depth image.
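  • As an illustrative sketch only (not part of the original disclosure), the following Python/NumPy snippet shows one way to turn a per-pixel confidence map of the kind described above (values 0, 1 or 2) into a target-area mask; the function name, the threshold convention and the array shapes are assumptions made for the example.

```python
import numpy as np

def find_target_area(confidence, accuracy_threshold=1):
    """Return a boolean mask of the target area: pixels whose depth-value
    accuracy (here approximated by a device confidence of 0, 1 or 2) is
    less than or equal to the preset accuracy threshold.

    Returns None when every pixel is accurate enough, i.e. when no target
    area exists in the depth image.
    """
    confidence = np.asarray(confidence)
    target_mask = confidence <= accuracy_threshold
    return target_mask if target_mask.any() else None
```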
  • the above execution subject may re-determine the depth values of the pixels in the target area in the following manner:
  • the color image region matching the target region is determined from the color image.
  • the scene indicated by the color image area matching the target area may be the same as the scene indicated by the target area.
  • the determined color image area is input into a pre-trained depth value determination model to obtain the depth values of the pixels in the target area.
  • the above-mentioned depth value determination model may be used to determine the depth values of pixels in the depth image area matching the input color image area.
  • the above-mentioned depth value determination model may be a convolutional neural network trained by using a machine learning algorithm based on a training sample set.
  • the training samples in the above training sample set may include a color sample image and a depth sample image matched with the color sample image.
  • each new depth value generated may have a one-to-one correspondence with a depth value in the target area. Therefore, the execution subject may update each depth value of the target area to the new depth value corresponding to that depth value, thus generating a new depth image.
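  • A minimal sketch of this update step, assuming the newly determined depth values are available as an array of the same shape as the depth image and the target area is given as a boolean mask (both names are illustrative):

```python
import numpy as np

def update_depth(depth, target_mask, new_depth_values):
    """Replace the depth values inside the target area with the newly
    determined values; pixels outside the target area keep their values."""
    new_depth = np.array(depth, copy=True)
    new_depth[target_mask] = new_depth_values[target_mask]  # one-to-one update
    return new_depth
```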
  • the depth image generation method can first acquire a color image and a depth image, wherein the scene indicated by the color image matches the scene indicated by the depth image, and then, when it is determined that there is a target area in the depth image Next, redetermine the depth values of the pixels in the target area based on the color image to generate a new depth image, wherein the accuracy of the depth values of the pixels in the target area is less than or equal to a preset accuracy threshold.
  • in this way, based on the color image, the depth values of the pixels in the lower-accuracy image regions of the depth image corresponding to the color image can be re-determined to generate a new depth image, thereby improving the accuracy of the depth values of the pixels in the depth image.
  • FIG. 2 is a flowchart of a second embodiment of the depth image generation method of the present disclosure.
  • the depth image generation process 200 includes:
  • the execution subject of the depth image generation method (such as a server, a terminal device, an image processing unit with an image processing function, etc.) can acquire a color image and a depth image from another electronic device or locally, through a wired or wireless connection, wherein the scene indicated by the color image matches the scene indicated by the depth image.
  • step 201 is basically the same as step 101 in the embodiment corresponding to FIG. 1 , and will not be repeated here.
  • the image area corresponding to the depth values in the depth image whose accuracy is less than or equal to the preset accuracy threshold is taken as the target area, and, based on the color image and the position of the target area in the depth image, the depth values of the pixels in the target area are re-determined, so as to obtain new depth values for the target area.
  • the above execution subject may execute 202 in the following manner:
  • the position of the target area of the depth image is marked to obtain the marked depth image.
  • the color image and the marked depth image are input to the pre-trained depth model, and a new depth image is generated through the depth model.
  • the depth model is used to determine a new depth image based on the color image and the marked depth image.
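  • The two steps above can be sketched as follows; this is only an illustration, and `depth_model` stands for any pre-trained model that maps a color image and a marked depth image to a new depth image (the zero-marking convention is borrowed from the application scenario described later):

```python
import numpy as np

def generate_new_depth_image(color, depth, target_mask, depth_model):
    """Step 1: mark the position of the target area in the depth image.
    Step 2: feed the color image and the marked depth image into the
    pre-trained depth model to obtain the new depth image."""
    marked_depth = np.array(depth, copy=True)
    marked_depth[target_mask] = 0.0          # mark the low-accuracy pixels
    return depth_model(color, marked_depth)  # model outputs the new depth image
```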
  • the depth model includes an encoding module, a decoding module and a downsampling module. Based on this, a new depth image can be generated in the following manner: through the encoding module, based on the color image, the feature data of the color image is generated; through the downsampling module, based on the marked depth image, the feature data of the marked depth image is generated; Through the decoding module, a new depth image is generated based on the feature data generated by the encoding module and the feature data generated by the down-sampling module.
  • the output depth image may be a desired output depth image, or an actually output depth image.
  • during training of the depth model, the expected output depth image is provided to the depth model as the expected output data; in the process of using the depth model to generate a new depth image, the depth image actually output by the depth model is its output data (that is, the actual output data).
  • the actual output data of the depth model can be used as the generated new depth image.
  • the encoding module may be used to perform one or more down-sampling operations.
  • the encoding module may include at least one of the following: a residual network (ResNet) 50 architecture, a residual network (ResNet) 101 architecture, and the like, and each downsampling operation may perform a downsampling in which the length and width are divided by 2 (or another value).
  • the decoding module can be used to perform one or more upsampling operations. For example, each upsampling operation may multiply the length and width by 2 (or another value).
  • the downsampling module can be used to perform one or more downsampling operations. For example, each downsampling operation may perform a downsampling in which the length and width are divided by 2 (or other values).
  • FIG. 3A is a schematic diagram of the first structure of the depth model in the depth image generation method of the present disclosure.
  • the depth model includes a decoding module 303 , an encoding module 301 and a downsampling module 302 .
  • the input data of the decoding module 303 is the output data of the encoding module 301 and the output data of the down-sampling module 302 .
  • the input data of the encoding module 301 includes a color image
  • the input data of the downsampling module 302 includes a marked depth image
  • the output data of the decoding module 303 includes depth values of pixels in the target area.
  • the encoding module can be used to perform downsampling operations on the color image; the number of downsampling layers included in the downsampling module is one less than the number of downsampling layers included in the encoding module; that is, when a single determination of the depth values of the pixels in the target area is completed based on the depth model, the number of downsampling operations performed by the downsampling module is one less than the number of downsampling operations performed by the encoding module.
  • each downsampling layer can reduce the image to 1/4 of the size of the original image, that is, a downsampling operation in which the length and width are each divided by 2.
  • the depth model is trained in the following ways:
  • the training samples in the training sample set include input data and expected output data
  • the input data includes color sample images and marked depth sample images corresponding to the color sample images
  • the expected output data includes expected output depth sample images.
  • the number of training samples in the training sample set, the size of color sample images, and the size of depth sample images can be set according to requirements. For example, 500,000 color panoramic images and depth panoramic images corresponding to color panoramic images can be prepared.
  • the panoramic image and the depth panoramic image may both be images with a length and a width of 640 pixels and 320 pixels, respectively.
  • a machine learning algorithm is used, with the input data included in the training samples in the training sample set as the input and the expected output data corresponding to the input data as the expected output, to train the depth model.
  • a depth model can be obtained through machine learning algorithm training, so as to generate a new depth image through the depth model, thus further improving the accuracy of the depth value of the pixel in the depth image.
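  • A schematic PyTorch-style training loop for such a depth model is sketched below; it is not the original training code, and the dataset interface, batch size, optimizer and the placeholder L1 loss (which could be replaced by the combined loss discussed further below) are all assumptions.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader

def train_depth_model(model, train_set, epochs=10, lr=1e-4, device="cpu"):
    """Each training sample yields the input data (color image, marked depth
    sample image) and the expected output depth sample image."""
    model.to(device).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loader = DataLoader(train_set, batch_size=8, shuffle=True)
    for _ in range(epochs):
        for color, marked_depth, expected_depth in loader:
            color = color.to(device)
            marked_depth = marked_depth.to(device)
            expected_depth = expected_depth.to(device)
            actual_depth = model(color, marked_depth)       # actual output data
            loss = F.l1_loss(actual_depth, expected_depth)  # placeholder loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```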
  • the accuracy of the depth values of each pixel in all expected output depth sample images included in the training sample set is greater than the above preset accuracy threshold.
  • the desired output depth sample image included in the training sample set may be obtained by means of a depth camera, a laser radar, or the like.
  • using depth images with highly accurate depth values to train the depth model can improve the accuracy of the depth values of the pixels in the target area output by the depth model, thereby further improving the accuracy of the finally generated depth image.
  • the loss function of the depth model is determined based on at least one of the following: the mean value of the relative error between the actual output data and the expected output data, the mean value of the relative error between their gradients, and the structural similarity between the actual output data and the expected output data.
  • structural similarity is an index to measure the similarity between two images.
  • from the perspective of image composition, the structural similarity index defines structural information as being independent of brightness and contrast, reflecting the properties of the object structure in the scene, and models distortion as a combination of three different factors: brightness, contrast and structure. The mean is used as an estimate of brightness, the standard deviation as an estimate of contrast, and the covariance as a measure of structural similarity.
  • the loss function L can be expressed as: L = k1*l_depth + k2*l_edge + k3*l_ssim, where:
  • k1, k2, and k3 can be three predetermined constants, whose values are chosen so that the combined contribution of the second term k2*l_edge and the third term k3*l_ssim to L exceeds that of the first term k1*l_depth;
  • l_depth is the mean value of the relative error between the actual output data and the expected output data;
  • l_edge is the mean value of the relative error of the gradient between the actual output data and the expected output data;
  • l_ssim is the structural similarity between the actual output data and the expected output data.
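  • As an illustration of how the three terms could be combined, a simplified PyTorch implementation is given below; the weights k1-k3 (chosen so that the edge and SSIM terms together outweigh the depth term, as suggested above), the epsilon, the finite-difference gradients and the use of 1 - SSIM computed from global statistics are assumptions rather than the exact formulation of the disclosure.

```python
import torch

def depth_loss(actual, expected, k1=0.5, k2=1.0, k3=1.0, eps=1e-6):
    """Illustrative combination L = k1*l_depth + k2*l_edge + k3*l_ssim."""
    # l_depth: mean relative error between actual and expected output data
    l_depth = torch.mean(torch.abs(actual - expected) / (torch.abs(expected) + eps))

    # l_edge: mean relative error of the horizontal and vertical gradients
    def grads(x):
        return x[..., :, 1:] - x[..., :, :-1], x[..., 1:, :] - x[..., :-1, :]
    (agx, agy), (egx, egy) = grads(actual), grads(expected)
    l_edge = 0.5 * (torch.mean(torch.abs(agx - egx) / (torch.abs(egx) + eps)) +
                    torch.mean(torch.abs(agy - egy) / (torch.abs(egy) + eps)))

    # l_ssim: structural similarity from global statistics (mean ~ brightness,
    # standard deviation ~ contrast, covariance ~ structure); 1 - SSIM is penalized
    mu_a, mu_e = actual.mean(), expected.mean()
    var_a, var_e = actual.var(), expected.var()
    cov = ((actual - mu_a) * (expected - mu_e)).mean()
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    ssim = ((2 * mu_a * mu_e + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_e ** 2 + c1) * (var_a + var_e + c2))
    l_ssim = 1.0 - ssim

    return k1 * l_depth + k2 * l_edge + k3 * l_ssim
```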
  • loss functions different from the above-mentioned loss functions may be determined for the depth model, which will not be repeated here.
  • determining the loss function for the depth model based on at least one of the above three items can improve the accuracy of the depth values of the pixels in the target area output by the depth model, thereby further improving the accuracy of the finally generated depth image.
  • the position indicated by the position information in the training sample set is: a randomly determined rectangular area in the depth sample image for which depth values have been obtained.
  • for a depth image acquisition device such as a mobile phone, this can further improve the accuracy of the depth values of the pixels in the target area output by the depth model, and improve the training speed of the depth model.
  • the position of the predetermined rectangular area in the depth sample image may also be used as the position indicated by the position information in the training sample set.
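  • A possible way to create such marked depth sample images for training is sketched in NumPy below; the rectangle-size limits and the zero-marking convention are assumptions, and in practice the rectangle would be drawn from regions for which depth values have already been obtained.

```python
import numpy as np

def make_marked_depth_sample(depth_sample, rng=None):
    """Randomly choose a rectangular area in the depth sample image and mark
    it (here by setting its depth values to 0), returning the marked image
    and the rectangle used as the position information."""
    rng = rng or np.random.default_rng()
    h, w = depth_sample.shape
    rect_h = int(rng.integers(h // 8, h // 2))
    rect_w = int(rng.integers(w // 8, w // 2))
    top = int(rng.integers(0, h - rect_h))
    left = int(rng.integers(0, w - rect_w))
    marked = depth_sample.copy()
    marked[top:top + rect_h, left:left + rect_w] = 0.0
    return marked, (top, left, rect_h, rect_w)
```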
  • in addition to being used to generate depth images based on color images and marked depth images, the depth model can also be used to perform the following operations:
  • Operation 1: marking the position of the target area of the depth image to obtain the marked depth image.
  • Operation 2: determining whether there are depth values in the depth image whose accuracy is less than or equal to the preset accuracy threshold, and, if so, taking the image area corresponding to those depth values as the target area.
  • FIG. 3B is a second structural schematic diagram of a depth model in the depth image generation method of the present disclosure.
  • the depth model includes a decoding module 313 , an encoding module 311 and a downsampling module 312 .
  • the input data of the decoding module 313 is the output data of the encoding module 311 and the output data of the down-sampling module 312 .
  • the input data of the encoding module 311 includes a color image
  • the input data of the downsampling module 312 includes a marked depth image
  • the output data of the decoding module 313 includes depth values of pixels in the target area.
  • the coding module 311 can use ResNet50.
  • Enc1-5 in ResNet50 each perform a downsampling in which the length and width are divided by 2. That is, if the scale of the color image input to Enc1 is 1, then the scale of the feature data output by Enc1 is 1/2, the scale of the feature data output by Enc2 is 1/4, ..., and the scale of the feature data output by Enc5 is 1/32.
  • the feature data output by Enc5 is used as the input of decoding module 313 .
  • the feature data output by Enc5 and Enc4 can be used as the input of Dec4 included in the decoding module 313
  • the feature data output by Enc3 can be used as the input of Dec3 included in the decoding module 313
  • the feature data output by Enc2 can be used as the input of Dec2 included in the decoding module 313, and the feature data output by Enc1 can be used as the input of Dec1 included in the decoding module 313.
  • the decoding module 313 may perform, through Dec4-1, four upsampling operations in which the length and width are multiplied by 2. If the input scale of Dec4 is 1/32 and its output scale is 1/16, then the output scale of Dec3 is 1/8, ..., and the output scale of Dec1 is 1/2.
  • the marked depth image is downsampled by the downsampling modules Dwn1-4, each dividing the length and width by 2, to obtain depth images with scales of 1/2 to 1/16, respectively.
  • the feature data output by Dwn1-4 can be used as the input of Dec1-4 included in the decoding module 313 respectively.
  • after the feature data output by Enc5 is input to Dec4, an upsampling multiplying its length and width by 2 can be performed first; the upsampled result can then be spliced with the feature data output by Enc4 and the feature data output by Dwn4, and then upsampled by multiplying the length and width by 2.
  • after the feature data output by Enc3 and Dwn3 are input to Dec3, the two can be spliced first, and then an upsampling multiplying the length and width by 2 is performed.
  • after the feature data output by Enc2 and Dwn2 are input to Dec2, the two can be spliced first, and then upsampled by multiplying the length and width by 2. After the feature data output by Enc1 and Dwn1 are input to Dec1, the two can be spliced first and then upsampled by multiplying the length and width by 2.
  • Dec1-4 may also perform a convolution operation after performing the upsampling operation.
  • two convolutional layers may be included in the structure of Dec1-4 respectively.
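  • The following PyTorch sketch mirrors the overall structure described above: an encoder that halves the color image five times, a depth branch that halves the marked depth image four times, and a decoder that upsamples while concatenating the matching-scale features. It is a simplified stand-in, not the disclosed network: a plain strided-convolution encoder is used instead of ResNet50, and all channel counts and layer choices are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def double_conv(cin, cout):
    # two 3x3 convolutions, as suggested for the Dec blocks in the text
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True),
                         nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU(inplace=True))

class DepthModel(nn.Module):
    def __init__(self, ch=(16, 32, 64, 128, 256)):
        super().__init__()
        cins = (3,) + ch[:-1]
        # "Enc1-5": each stage halves the spatial size of the color image
        self.encs = nn.ModuleList([
            nn.Sequential(nn.Conv2d(ci, co, 3, stride=2, padding=1), nn.ReLU(inplace=True))
            for ci, co in zip(cins, ch)])
        # "Dec4-1": concatenate upsampled deep features with the matching-scale
        # encoder features and the downsampled marked depth, then convolve
        self.decs = nn.ModuleList([
            double_conv(ch[i + 1] + ch[i] + 1, ch[i]) for i in range(3, -1, -1)])
        self.head = nn.Conv2d(ch[0], 1, 1)   # final 1-channel depth prediction

    def forward(self, color, marked_depth):
        feats, x = [], color
        for enc in self.encs:                # Enc1..Enc5
            x = enc(x)
            feats.append(x)                  # scales 1/2 .. 1/32
        dwn = [F.avg_pool2d(marked_depth, 2 ** k) for k in range(1, 5)]  # Dwn1..Dwn4
        x = feats[-1]                        # Enc5 output, scale 1/32
        for dec, skip, d in zip(self.decs, reversed(feats[:-1]), reversed(dwn)):
            x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
            x = dec(torch.cat([x, skip, d], dim=1))
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        return self.head(x)                  # new depth image at full resolution
```

  • As a usage note for the sketch above, `DepthModel()(torch.randn(1, 3, 320, 640), torch.randn(1, 1, 320, 640))` returns a tensor of shape (1, 1, 320, 640), i.e. a full-resolution depth prediction for a 640x320 panoramic input.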
  • the above execution subject may also re-determine the depth values of the pixels in the target area in the following manner:
  • the color image area corresponding to the target area in the color image is determined.
  • the depth values of the pixels in the target area are determined based on the color image area.
  • the above steps may be performed in the following manner:
  • the color image and depth image are obtained by shooting with a mobile phone that has a built-in lidar. Due to power consumption constraints, the depth image acquired by the lidar configured on the mobile phone is very sparse, so the captured depth image is completed by software to obtain a denser depth image, and a confidence level is given at the same time.
  • the value of the confidence level is 0, 1, 2, where the greater the value of the confidence, the more accurate the depth value of the corresponding pixel.
  • the pixels whose confidence value is not 2 in the depth panoramic image are marked; for example, the depth values of the pixels whose confidence value is not 2 in the above depth panoramic image are set to 0.
  • our expected output is: the depth value of the pixel points in the unmarked image area remains unchanged; while the depth value of the pixel points in the marked image area needs to be re-determined, and there can be a smooth transition at the boundary.
  • the above depth model may be used to obtain a new depth panorama image, which will not be repeated here.
  • the depth panoramic image itself may contain pixels whose confidence value is 0, which means that there is no measurement value, i.e., the instrument failed to provide a depth measurement at that point.
  • pixels whose confidence value is 0 do not require re-determining their depth values; only the depth values of pixels in the image area whose depth values are set to 0 need to be re-determined.
  • FIGS. 4A-4B are schematic diagrams of application scenarios of an embodiment of the method for generating a depth image in the present disclosure.
  • the embodiment of the present application may also include the same or similar features and effects as those of the embodiment corresponding to FIG. 1 , which will not be repeated here.
  • the process 200 of the depth image generation method in this embodiment can determine the depth values of the pixels in the target area based on the color image and the position of the target area in the depth image, so that more accurate depth values of the pixels in the target area can be determined with reference to the position of the target area in the depth image.
  • the present disclosure provides an embodiment of a device for generating a depth image, which corresponds to the method embodiment shown in FIG. 1. In addition to the features described below, the device embodiment may also include the same or corresponding features as the method embodiment shown in FIG. 1, and produce the same or corresponding effects as the method embodiment shown in FIG. 1.
  • the device can be specifically applied to various electronic devices.
  • the depth image generation device 500 of this embodiment includes: an acquisition unit 501 configured to acquire a color image and a depth image, wherein the scene indicated by the color image matches the scene indicated by the depth image; the generation unit 502 , configured to, in response to determining that there is a target region in the depth image, redetermine the depth values of the pixels in the target region based on the color image, and generate a new depth image, wherein the accuracy of the depth values of the pixels in the target region is less than or equal to the preset accuracy threshold.
  • the acquiring unit 501 of the depth image generating apparatus 500 can acquire a color image and a depth image. Wherein, the scene indicated by the color image matches the scene indicated by the depth image.
  • the generation unit 502 may take, as the target area, the image area of the depth image acquired by the acquisition unit 501 that corresponds to depth values whose accuracy is less than or equal to the preset accuracy threshold, and re-determine the depth values of the pixels in the target area based on the color image to generate a new depth image.
  • the generating unit 502 includes:
  • the determination subunit (not shown in the figure) is configured to re-determine the depth value of the pixel in the target area based on the color image and the position of the target area in the depth image.
  • the determining subunit includes:
  • a marking module (not shown in the figure), configured to mark the position of the target region of the depth image, and obtain the marked depth image;
  • the input module (not shown in the figure) is configured to input the color image and the marked depth image into the pre-trained depth model, and generate a new depth image through the depth model, wherein the depth model is used to determine a new depth image based on the color image and the marked depth image.
  • the depth model includes an encoding module, a decoding module, and a downsampling module
  • the above input module is specifically configured to:
  • generate, through the encoding module, feature data of the color image based on the color image; generate, through the downsampling module, feature data of the marked depth image based on the marked depth image; and generate, through the decoding module, a new depth image based on the feature data generated by the encoding module and the feature data generated by the downsampling module.
  • the encoding module is configured to perform a downsampling operation on the color image; the number of downsampling layers included in the downsampling module is one less than the number of downsampling layers included in the encoding module.
  • the depth model is obtained by training a training unit (not shown in the figure), and the training unit includes:
  • the acquisition subunit (not shown in the figure) is configured to acquire a training sample set, wherein the training samples in the training sample set include input data and expected output data, the input data includes a color sample image and a marked depth sample image corresponding to the color sample image, and the expected output data includes an expected output depth sample image;
  • the training subunit (not shown in the figure) is configured to adopt a machine learning algorithm, use the input data included in the training samples in the training sample set as the input, and use the expected output data corresponding to the input data as the expected output, to train the depth model.
  • the accuracy of the depth values of each pixel in all desired output depth sample images included in the training sample set is greater than a preset accuracy threshold.
  • the loss function of the depth model is determined based on at least one of the following:
  • the mean value of the relative error between the actual output data and the expected output data, the mean value of the relative error between the gradients of the actual output data and the expected output data, and the structural similarity between the actual output data and the expected output data.
  • the position indicated by the position information in the training sample set is: a randomly determined rectangular area in the depth sample image for which the depth value has been obtained.
  • the color sample images in the training sample set and the depth sample images corresponding to the color sample images are captured by the user's mobile terminal; or
  • the color sample images in the training sample set and the depth sample images corresponding to the color sample images are generated by the mobile terminal of the user based on the captured images.
  • the color image is a color panoramic image
  • the depth image is a depth panoramic image
  • the color sample images in the training sample set are color panoramic images
  • the depth sample images in the training sample set are depth panoramic images .
  • the color sample image in the training sample set and the depth sample image corresponding to the color sample image indicate the same scene, and the acquired color image and the acquired depth image indicate the same scene.
  • the acquisition unit 501 can acquire a color image and a depth image, wherein the scene indicated by the color image matches the scene indicated by the depth image; then, if the generation unit 502 determines that a target area exists in the depth image, it re-determines the depth values of the pixels in the target area based on the color image to generate a new depth image, wherein the accuracy of the depth values of the pixels in the target area is less than or equal to the preset accuracy threshold. In this way, based on the color image, the depth values of the pixels in the lower-accuracy image regions of the depth image corresponding to the color image can be re-determined to generate a new depth image, thereby improving the accuracy of the depth values of the pixels in the depth image.
  • the apparatus for generating a depth image may include: a processor; a memory for storing instructions executable by the processor; the processor is used for reading the executable instructions from the memory, and The instructions are executed to implement the depth image generation method provided by the exemplary embodiments of the present disclosure.
  • the electronic device may be either or both of the first device and the second device, or a stand-alone device independent of them, and the stand-alone device may communicate with the first device and the second device to receive collected input signals from them.
  • FIG. 6 illustrates a block diagram of an electronic device according to an embodiment of the disclosure.
  • the electronic device 6 includes one or more processors 601 and memory 602 .
  • the processor 601 may be a central processing unit (CPU) or other forms of processing units having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device to perform desired functions.
  • Memory 602 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory.
  • the volatile memory may include, for example, random access memory (RAM) and/or cache memory (cache).
  • the non-volatile memory may include, for example, a read-only memory (ROM), a hard disk, a flash memory, and the like.
  • One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 601 may execute the program instructions to implement the depth image generation method of the various embodiments of the present disclosure described above and/or other desired functions.
  • Various contents such as input signal, signal component, noise component, etc. may also be stored in the computer-readable storage medium.
  • the electronic device may further include: an input device 603 and an output device 604, and these components are interconnected through a bus system and/or other forms of connection mechanisms (not shown).
  • the input device 603 may be the above-mentioned microphone or microphone array for capturing the input signal of the sound source.
  • the input device 603 may be a communication network connector for receiving collected input signals from the first device and the second device.
  • the input device 603 may also include, for example, a keyboard, a mouse, and the like.
  • the output device 604 can output various information to the outside, including determined distance information, direction information, and the like.
  • the output device 604 may include, for example, a display, a speaker, a printer, a communication network and remote output devices connected thereto, and the like.
  • the electronic device may also include any other suitable components according to specific applications.
  • embodiments of the present disclosure may also be computer program products, which include computer program instructions that, when executed by a processor, cause the processor to perform the steps in the method for generating a depth image according to various embodiments of the present disclosure described in the above-mentioned "Exemplary Method" section of this specification.
  • the computer program product can be written in any combination of one or more programming languages to execute the program code for performing the operations of the embodiments of the present disclosure; the programming languages include object-oriented programming languages, such as Java and C++, and also include conventional procedural programming languages, such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server to execute.
  • embodiments of the present disclosure may also be a computer-readable storage medium on which computer program instructions are stored; the computer program instructions, when executed by a processor, cause the processor to perform the steps in the method for generating a depth image according to various embodiments of the present disclosure described in the above-mentioned "Exemplary Method" section of this specification.
  • the computer readable storage medium may employ any combination of one or more readable media.
  • the readable medium may be a readable signal medium or a readable storage medium.
  • the readable storage medium may include, but not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, devices, or devices, or any combination thereof. More specific examples (non-exhaustive list) of readable storage media include: electrical connection with one or more conductors, portable disk, hard disk, random access memory (RAM), read only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
  • embodiments of the present disclosure may also be computer programs, and the computer programs may include computer readable codes.
  • when the computer-readable code is run on a device, the processor in the device executes the steps in the method for generating a depth image according to various embodiments of the present disclosure described in the above-mentioned "Exemplary Method" section of this specification.
  • the methods and apparatus of the present disclosure may be implemented in many ways.
  • the methods and apparatuses of the present disclosure may be implemented by software, hardware, firmware or any combination of software, hardware, and firmware.
  • the above sequence of steps for the method is for illustration only, and the steps of the method of the present disclosure are not limited to the sequence specifically described above unless specifically stated otherwise.
  • the present disclosure can also be implemented as programs recorded in recording media, the programs including machine-readable instructions for realizing the method according to the present disclosure.
  • the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

Embodiments of the present disclosure disclose a depth image generation method and device, an electronic device, and a storage medium. The depth image generation method includes: acquiring a color image and a depth image, wherein the scene indicated by the color image matches the scene indicated by the depth image; and in response to determining that a target area exists in the depth image, re-determining the depth values of the pixels in the target area based on the color image to obtain new depth values, wherein the accuracy of the depth values of the pixels in the target area is less than or equal to a preset accuracy threshold. The embodiments of the present disclosure re-determine, based on the color image, the depth values of the pixels in the lower-accuracy image regions of the depth image corresponding to the color image, thereby generating a new depth image and improving the accuracy of the depth values of the pixels in the depth image.

Description

Depth image generation method and device
CROSS-REFERENCE TO RELATED APPLICATIONS
The present disclosure claims the benefit of Chinese patent application 202110849655.6 filed on July 27, 2021, the content of which is incorporated herein by reference.
TECHNICAL FIELD
The present disclosure relates to image processing technology, and in particular to a depth image generation method and device, an electronic device, and a storage medium.
BACKGROUND
At present, being equipped with a lidar system has become a development trend for more and more electronic devices. For example, some user mobile terminals (for example, mobile phones) already have this capability.
However, not all depth images obtained by lidar-equipped electronic devices have highly accurate per-pixel depth values. In practice, limited by factors such as the power consumption of the electronic device, the original depth images acquired by the lidar of some electronic devices are very sparse. Some electronic devices can use a software method to complete the acquired original sparse depth image so as to obtain a denser depth image.
However, the accuracy of the depth values of the pixels of a depth image obtained in the above manner may be low.
SUMMARY
Embodiments of the present disclosure provide a depth image generation method and device, an electronic device, and a storage medium, so as to improve the accuracy of the depth values of the pixels in a depth image.
According to one aspect of the embodiments of the present disclosure, a depth image generation method is provided, including:
acquiring a color image and a depth image, wherein the scene indicated by the color image matches the scene indicated by the depth image; and
in response to determining that a target area exists in the depth image, re-determining the depth values of the pixels in the target area based on the color image to generate a new depth image, wherein the accuracy of the depth values of the pixels in the target area is less than or equal to a preset accuracy threshold.
Optionally, in the method of any embodiment of the present disclosure, re-determining the depth values of the pixels in the target area based on the color image includes:
re-determining the depth values of the pixels in the target area based on the color image and the position of the target area in the depth image.
Optionally, in the method of any embodiment of the present disclosure, re-determining the depth values of the pixels in the target area based on the color image and the position of the target area in the depth image to generate a new depth image includes:
marking the position of the target area of the depth image to obtain a marked depth image; and
inputting the color image and the marked depth image into a pre-trained depth model, and generating a new depth image through the depth model, wherein the depth model is used to generate a new depth image based on a color image and a marked depth image.
Optionally, in the method of any embodiment of the present disclosure, the depth model includes an encoding module, a decoding module and a downsampling module; and
inputting the color image and the marked depth image into the pre-trained depth model and generating a new depth image through the depth model includes:
generating, through the encoding module, feature data of the color image based on the color image;
generating, through the downsampling module, feature data of the marked depth image based on the marked depth image; and
generating, through the decoding module, a new depth image based on the feature data generated by the encoding module and the feature data generated by the downsampling module.
Optionally, in the method of any embodiment of the present disclosure, the encoding module is used to perform downsampling operations on the color image; the number of downsampling layers included in the downsampling module is one less than the number of downsampling layers included in the encoding module.
Optionally, in the method of any embodiment of the present disclosure, the depth model is obtained by training in the following manner:
acquiring a training sample set, wherein the training samples in the training sample set include input data and expected output data, the input data includes a color sample image and a marked depth sample image corresponding to the color sample image, and the expected output data includes an expected output depth sample image; and
using a machine learning algorithm, taking the input data included in the training samples in the training sample set as the input and the expected output data corresponding to the input data as the expected output, to train the depth model.
Optionally, in the method of any embodiment of the present disclosure, the accuracy of the depth values of the pixels in the expected output depth sample images included in the training sample set is greater than the preset accuracy threshold.
Optionally, in the method of any embodiment of the present disclosure, the loss function of the depth model is determined based on at least one of the following:
the mean value of the relative error between the actual output data and the expected output data, the mean value of the relative error between the gradients of the actual output data and the expected output data, and the structural similarity between the actual output data and the expected output data.
Optionally, in the method of any embodiment of the present disclosure, the position indicated by the position information in the training sample set is a randomly determined rectangular area in the depth sample image for which depth values have been obtained.
Optionally, in the method of any embodiment of the present disclosure, the color sample images in the training sample set and the depth sample images corresponding to the color sample images are captured by a user mobile terminal; or
the color sample images in the training sample set and the depth sample images corresponding to the color sample images are generated by the user mobile terminal based on captured images.
Optionally, in the method of any embodiment of the present disclosure, the color image is a color panoramic image, the depth image is a depth panoramic image, the color sample images in the training sample set are color panoramic images, and the depth sample images in the training sample set are depth panoramic images.
Optionally, in the method of any embodiment of the present disclosure, the color sample image in the training sample set and the depth sample image corresponding to the color sample image indicate the same scene, and the acquired color image and the acquired depth image indicate the same scene.
According to a second aspect of the embodiments of the present disclosure, a depth image generation device is provided, including:
an acquisition unit configured to acquire a color image and a depth image, wherein the scene indicated by the color image matches the scene indicated by the depth image; and
a generation unit configured to, in response to determining that a target area exists in the depth image, re-determine the depth values of the pixels in the target area based on the color image to generate a new depth image, wherein the accuracy of the depth values of the pixels in the target area is less than or equal to a preset accuracy threshold.
Optionally, in the device of any embodiment of the present disclosure, the generation unit includes:
a determination subunit configured to re-determine the depth values of the pixels in the target area based on the color image and the position of the target area in the depth image.
Optionally, in the device of any embodiment of the present disclosure, the determination subunit includes:
a marking module configured to mark the position of the target area of the depth image to obtain a marked depth image; and
an input module configured to input the color image and the marked depth image into a pre-trained depth model and generate a new depth image through the depth model, wherein the depth model is used to generate a new depth image based on a color image and a marked depth image.
Optionally, in the device of any embodiment of the present disclosure, the depth model includes an encoding module, a decoding module and a downsampling module; and
the input module is specifically configured to:
generate, through the encoding module, feature data of the color image based on the color image;
generate, through the downsampling module, feature data of the marked depth image based on the marked depth image; and
generate, through the decoding module, a new depth image based on the feature data generated by the encoding module and the feature data generated by the downsampling module.
Optionally, in the device of any embodiment of the present disclosure, the encoding module is used to perform downsampling operations on the color image; the number of downsampling layers included in the downsampling module is one less than the number of downsampling layers included in the encoding module.
Optionally, in the device of any embodiment of the present disclosure, the depth model is obtained by training through a training unit, and the training unit includes:
an acquisition subunit configured to acquire a training sample set, wherein the training samples in the training sample set include input data and expected output data, the input data includes a color sample image and a marked depth sample image corresponding to the color sample image, and the expected output data includes an expected output depth sample image; and
a training subunit configured to use a machine learning algorithm, taking the input data included in the training samples in the training sample set as the input and the expected output data corresponding to the input data as the expected output, to train the depth model.
Optionally, in the device of any embodiment of the present disclosure, the accuracy of the depth values of the pixels in the expected output depth sample images included in the training sample set is greater than the preset accuracy threshold.
Optionally, in the device of any embodiment of the present disclosure, the loss function of the depth model is determined based on at least one of the following:
the mean value of the relative error between the actual output data and the expected output data, the mean value of the relative error between the gradients of the actual output data and the expected output data, and the structural similarity between the actual output data and the expected output data.
Optionally, in the device of any embodiment of the present disclosure, the position indicated by the position information in the training sample set is a randomly determined rectangular area in the depth sample image for which depth values have been obtained.
Optionally, in the device of any embodiment of the present disclosure, the color sample images in the training sample set and the depth sample images corresponding to the color sample images are captured by a user mobile terminal; or
the color sample images in the training sample set and the depth sample images corresponding to the color sample images are generated by the user mobile terminal based on captured images.
Optionally, in the device of any embodiment of the present disclosure, the color image is a color panoramic image, the depth image is a depth panoramic image, the color sample images in the training sample set are color panoramic images, and the depth sample images in the training sample set are depth panoramic images.
Optionally, in the device of any embodiment of the present disclosure, the color sample image in the training sample set and the depth sample image corresponding to the color sample image indicate the same scene, and the acquired color image and the acquired depth image indicate the same scene.
According to a third aspect of the embodiments of the present disclosure, an electronic device is provided, including:
a memory for storing a computer program; and
a processor for executing the computer program stored in the memory, wherein, when the computer program is executed, the method described in any one of the above embodiments of the present disclosure is implemented.
According to a fourth aspect of the embodiments of the present disclosure, a computer-readable medium is provided, storing a computer program which, when executed by a processor, implements the method of any embodiment of the depth image generation method of the first aspect above.
According to a fifth aspect of the embodiments of the present disclosure, a computer program is provided, including computer-readable code which, when run on a device, causes a processor in the device to execute instructions for implementing each step of the method of any embodiment of the depth image generation method of the first aspect above.
Based on the depth image generation method and device, electronic device, and storage medium provided by the above embodiments of the present disclosure, a color image and a depth image can first be acquired, wherein the scene indicated by the color image matches the scene indicated by the depth image; then, when it is determined that a target area exists in the depth image, the depth values of the pixels in the target area are re-determined based on the color image to generate a new depth image, wherein the accuracy of the depth values of the pixels in the target area is less than or equal to a preset accuracy threshold. In this way, the depth values of the pixels in the lower-accuracy image regions of the depth image corresponding to the color image can be re-determined based on the color image to generate a new depth image, thereby improving the accuracy of the depth values of the pixels in the depth image.
The technical solutions of the present disclosure are further described in detail below through the accompanying drawings and embodiments.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which constitute a part of the specification, describe embodiments of the present disclosure and, together with the description, serve to explain the principles of the present disclosure.
The present disclosure can be understood more clearly from the following detailed description with reference to the accompanying drawings, in which:
Fig. 1 is a flowchart of a first embodiment of the depth image generation method of the present disclosure.
Fig. 2 is a flowchart of a second embodiment of the depth image generation method of the present disclosure.
Fig. 3A is a first schematic structural diagram of the depth model in the depth image generation method of the present disclosure.
Fig. 3B is a second schematic structural diagram of the depth model in the depth image generation method of the present disclosure.
Figs. 4A-4B are schematic diagrams of an application scenario of an embodiment of the depth image generation method of the present disclosure.
Fig. 5 is a schematic structural diagram of an embodiment of the depth image generation device of the present disclosure.
Fig. 6 is a structural diagram of an electronic device provided by an exemplary embodiment of the present disclosure.
DETAILED DESCRIPTION
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that, unless otherwise specified, the relative arrangement of components and steps, numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure.
Those skilled in the art will understand that terms such as "first" and "second" in the embodiments of the present disclosure are only used to distinguish different steps, devices, modules, etc., and represent neither any specific technical meaning nor a necessary logical order between them.
It should also be understood that, in the embodiments of the present disclosure, "plural" may refer to two or more, and "at least one" may refer to one, two, or more.
It should also be understood that any component, data, or structure mentioned in the embodiments of the present disclosure may generally be understood as one or more, unless explicitly limited or the context indicates otherwise.
In addition, the term "and/or" in the present disclosure is merely an association relationship describing associated objects, indicating that three relationships may exist; for example, A and/or B may indicate three cases: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" in the present disclosure generally indicates that the associated objects before and after it are in an "or" relationship.
It should also be understood that the description of the embodiments of the present disclosure emphasizes the differences between the embodiments, and their identical or similar parts may be referred to one another; for brevity, they are not repeated one by one.
Meanwhile, it should be understood that, for ease of description, the dimensions of the various parts shown in the accompanying drawings are not drawn according to actual proportional relationships.
The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the present disclosure or its application or use.
Technologies, methods, and devices known to those of ordinary skill in the relevant fields may not be discussed in detail, but where appropriate, such technologies, methods, and devices should be regarded as part of the specification.
It should be noted that similar reference numerals and letters denote similar items in the following accompanying drawings; therefore, once an item is defined in one drawing, it need not be further discussed in subsequent drawings.
Embodiments of the present disclosure may be applied to at least one electronic device among terminal devices, computer systems, and servers, which may operate together with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations suitable for use with at least one electronic device among terminal devices, computer systems, and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, distributed cloud computing technology environments including any of the above systems, and the like.
At least one electronic device among terminal devices, computer systems, and servers may be described in the general context of computer-system-executable instructions (such as program modules) executed by a computer system. Generally, program modules may include routines, programs, target programs, components, logic, data structures, and the like, which perform specific tasks or implement specific abstract data types. The computer system/server may be implemented in a distributed cloud computing environment in which tasks are performed by remote processing devices linked through a communications network. In a distributed cloud computing environment, program modules may be located on local or remote computing system storage media including storage devices.
请参考图1,示出了根据本公开的深度图像生成方法的第一个实施例的流程100。该深度图像生成方法,包括:
101,获取彩色图像和深度图像。
在本实施例中,深度图像生成方法的执行主体(例如服务器、终端设备、具有图像处理功能的图像处理单元等)可以通过有线连接方式或者无线连接方式从其他电子设备或者本地,获取彩色图像和深度图像。
其中,彩色图像指示的场景与深度图像指示的场景相匹配。例如,彩色图像指示的场景与深度图像指示的场景可以相同。可选的,彩色图像指示的场景也可以和深度图像指示的场景包含相同的部分。
在这里,对于一张彩色图像,如果深度图像指示的场景与该彩色图像指示的场景相匹配,则可以将该深度图像作为对应该彩色图像的深度图像。
在本实施例的一些可选的实现方式中,训练样本集中的彩色样本图像和对应彩色样本图像的深度样本图像由用户移动终端拍摄获得。其中,用户移动终端可以包括但不限于:手机、平板电脑等等。
通常情况下,受限于用户移动终端功耗等原因,具有激光雷达的用户移动终端所获取的原始深度图像较为稀疏。因而,由用户移动终端拍摄获得的深度图像更需要采用本公开中的深度图像生成方法来提高深度图像中的像素点的深度值的准确度。
在本实施例的一些可选的实现方式中,训练样本集中的彩色样本图像和对应彩色样本图像的深度样本图像由用户移动终端基于拍摄的图像生成。其中,用户移动终端可以包括但不限于:手机、平板电脑等等。
可以理解,如上所述,具有激光雷达的用户移动终端所获取的原始深度图像较为稀疏。在此情况下,可以采用软件的方法对所获取的原始稀疏深度图像进行补全,从而获得较稠密的深度图像。然而,采用上述软件方法获得的深度图像中的像素点的深度值的准确度较低。因而,由用户移动终端基于拍摄的深度图像生成的深度图像,同样也需要采用本公开中的深度图像生成方法来提高深度图像中的像素点的深度值的准确度。
可选的，彩色图像和对应该彩色图像的深度图像也可以由任何具备彩色图像和深度图像采集功能的一台或多台设备同时或分别拍摄获得。
在本实施例的一些可选的实现方式中，101中的彩色图像为彩色全景图像，101中的深度图像为深度全景图像，训练样本集中的彩色样本图像为彩色全景图像，训练样本集中的深度样本图像为深度全景图像。
可以理解,在上述彩色图像为彩色全景图像,上述深度图像为深度全景图像的情况下,上述可选的实现方式可以基于信息更为丰富的彩色全景图像生成准确度更高的深度全景图像中的像素点的深度值。
在本实施例的一些可选的实现方式中,训练样本集中的彩色样本图像与对应该彩色样本图像的深度样本图像指示相同场景,所获取的彩色图像和所获取的深度图像指示相同场景。
可以理解，在彩色样本图像与对应该彩色样本图像的深度样本图像指示相同场景的情况下，对于深度样本图像中准确度较低（即小于或等于预设准确度阈值）的深度值所在的图像区域，均可以从彩色样本图像中确定出相应的彩色图像区域，由此，通过后续步骤可以进一步提高所生成的新的深度图像的准确度。
102,响应于确定深度图像中存在目标区域,基于彩色图像重新确定目标区域中的像素点的深度值,生成新的深度图像。
在本实施例中,在确定深度图像中存在准确度小于或等于预设准确度阈值的深度值的情况下,可以将深度图像中准确度小于或等于预设准确度阈值的深度值对应的图像区域作为目标区域,进而,上述执行主体可以基于彩色图像重新确定目标区域中的像素点的深度值,生成新的深度图像。
这里,上述执行主体可以采用多种方式确定深度图像中各个像素点的深度值的准确度。例如,目前部分手机等设备可以在拍摄或通过软件生成深度图像的同时,给出置信度(confidence),置信度的取值是0、1或2,数值越大,对应的点的深度值越准确。此外,还可以采用机器学习算法,确定深度图像中各个像素点的深度值的准确度。
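作为一个示意性的参考，下面给出一段基于置信度图确定目标区域的Python代码草图。其中假设置信度取值为0、1、2，且以“置信度小于2”作为深度值准确度小于或等于预设准确度阈值的判据；函数名、阈值等均为示例假设，并非本公开限定的实现方式。
```python
import numpy as np

def find_target_region(confidence: np.ndarray, threshold: int = 1) -> np.ndarray:
    """返回布尔掩码，True 表示该像素的深度值准确度小于或等于预设准确度阈值。"""
    # 是否把置信度为 0（本身没有测量值）的像素也纳入目标区域，可按具体场景调整
    return confidence <= threshold

confidence = np.array([[2, 2, 1],
                       [0, 2, 2],
                       [2, 1, 2]])
target_mask = find_target_region(confidence)  # True 的位置即为目标区域
```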
作为示例，上述执行主体可以采用如下方式重新确定目标区域中的像素点的深度值：
首先,从彩色图像中确定与目标区域相匹配的彩色图像区域。其中,与目标区域相匹配的彩色图像区域指示的场景可以与目标区域指示的场景相同。
然后，将所确定的彩色图像区域输入至预先训练的深度值确定模型，得到目标区域中的像素点的深度值。其中，上述深度值确定模型可以用于确定与所输入的彩色图像区域相匹配的深度图像区域中的像素点的深度值。
示例性的,上述深度值确定模型可以是采用机器学习算法基于训练样本集合训练得到的卷积神经网络。其中,上述训练样本集合中的训练样本可以包括彩色样本图像和与彩色样本图像相匹配的深度样本图像。
可选的,所生成的各个新的深度值可以与目标区域中的各个深度值具有一一对应关系,因而,上述执行主体可以将目标区域的每个深度值,更新为与该深度值相对应的新的深度值,从而生成新的深度图像。
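下面给出一段将重新确定的深度值写回目标区域、从而生成新的深度图像的Python代码草图，仅用于说明上述一一对应的更新关系；其中 new_values 假设为与原深度图像同尺寸的新深度值数组，变量名均为示例假设。
```python
import numpy as np

def merge_depth(depth: np.ndarray, new_values: np.ndarray, target_mask: np.ndarray) -> np.ndarray:
    """仅更新目标区域中的像素点的深度值，其余像素保持原深度值不变。"""
    new_depth = depth.copy()
    new_depth[target_mask] = new_values[target_mask]
    return new_depth
```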
本公开的上述实施例提供的深度图像生成方法，可以首先获取彩色图像和深度图像，其中，彩色图像指示的场景与深度图像指示的场景相匹配，然后，在确定深度图像中存在目标区域的情况下，基于彩色图像重新确定目标区域中的像素点的深度值，生成新的深度图像，其中，目标区域中的像素点的深度值的准确度小于或等于预设准确度阈值。这样，可以基于彩色图像重新确定对应该彩色图像的深度图像中准确度较低的图像区域中的像素点的深度值，从而生成新的深度图像，由此，提高了深度图像中的像素点的深度值的准确度。
进一步参考图2,图2是本公开的深度图像生成方法的第二个实施例的流程图。该深度图像生成流程200,包括:
201,获取彩色图像和深度图像。
在本实施例中,深度图像生成方法的执行主体(例如服务器、终端设备、具有图像处理功能的图像处理单元等)可以通过有线连接方式或者无线连接方式从其他电子设备或者本地,获取彩色图像和深度图像。其中,彩色图像指示的场景与深度图像指示的场景相匹配。
在本实施例中,步骤201与图1对应实施例中的步骤101基本一致,这里不再赘述。
202,响应于确定深度图像中存在目标区域,基于彩色图像和目标区域在深度图像中的位置,重新确定目标区域中的像素点的深度值,得到新的深度值。
在本实施例中,在确定深度图像中存在准确度小于或等于预设准确度阈值的深度值的情况下,将深度图像中准确度小于或等于预设准确度阈值的深度值对应的图像区域作为目标区域,基于彩色图像和目标区域在深度图像中的位置,重新确定该目标区域中的像素点的深度值,从而得到该目标区域的新的深度值。
在本实施例的一些可选的实现方式中,上述执行主体可以采用如下方式执行202:
第一步,对深度图像的目标区域的位置进行标记,得到标记后的深度图像。
第二步,将彩色图像和标记后的深度图像输入至预先训练的深度模型,经该深度模型生成新的深度图像。其中,深度模型用于基于彩色图像和标记后的深度图像确定新的深度图像。
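作为示意，下面用一段Python（PyTorch）代码草图说明“标记目标区域位置、再将彩色图像与标记后的深度图像送入预先训练的深度模型”的调用方式。其中假设以“将目标区域深度值置0”的方式进行标记，depth_model 为示例假设的已加载模型，并非本公开限定的实现方式。
```python
import numpy as np
import torch

def generate_new_depth(color: np.ndarray, depth: np.ndarray,
                       target_mask: np.ndarray, depth_model: torch.nn.Module) -> np.ndarray:
    marked_depth = depth.copy()
    marked_depth[target_mask] = 0.0  # 对目标区域的位置进行标记（示例假设：置0）

    color_t = torch.from_numpy(color).float().permute(2, 0, 1).unsqueeze(0)     # 1x3xHxW
    depth_t = torch.from_numpy(marked_depth).float().unsqueeze(0).unsqueeze(0)  # 1x1xHxW

    with torch.no_grad():
        new_depth = depth_model(color_t, depth_t)  # 经深度模型生成新的深度图像
    return new_depth.squeeze().numpy()
```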
在上述可选的实现方式的一些应用场景中,深度模型包括编码模块、解码模块和降采样模块。基于此,可以采用以下方式生成新的深度图像:通过编码模块,基于彩色图像,生成彩色图像的特征数据;通过降采样模块,基于标记后的深度图像,生成标记后的深度图像的特征数据;通过解码模块,基于编码模块生成的特征数据和降采样模块生成的特征数据,生成新的深度图像。
其中，输出的深度图像可以为期望输出的深度图像，或者实际输出的深度图像。在训练深度模型的过程中，期望输出的深度图像作为对应输入数据的期望输出数据（即监督信息）参与训练；在使用深度模型生成新的深度图像的过程中，实际输出的深度图像作为深度模型的输出数据（即实际输出数据）进行输出。此外，在使用深度模型的过程中，深度模型的实际输出数据即可作为所生成的新的深度图像。
示例性的，编码模块可以用于执行一次或多次降采样操作。例如，编码模块可以包括以下至少一项：残差网络（ResNet）50架构、残差网络（ResNet）101架构等等，每次降采样操作可以进行一次长宽除以2（也可以是其他数值）的降采样。解码模块可以用于执行一次或多次上采样操作。例如，每次上采样操作可以进行一次长宽乘以2（也可以是其他数值）的上采样。降采样模块可以用于执行一次或多次降采样操作。例如，每次降采样操作可以进行一次长宽除以2（也可以是其他数值）的降采样。
作为示例,请参考图3A。图3A为本公开深度图像生成方法中的深度模型的第一个结构示意图。
在图3A中,深度模型包括解码模块303、编码模块301和降采样模块302。其中,解码模块303的输入数据为编码模块301的输出数据和降采样模块302的输出数据。编码模块301的输入数据包括彩色图像,降采样模块302的输入数据包括标记后的深度图像,解码模块303的输出数据包括目标区域中的像素点的深度值。
在上述应用场景的一些使用情况下，编码模块可以用于对彩色图像执行降采样操作；降采样模块包含的降采样层的层数比编码模块包含的降采样层的层数少一，也即在基于深度模型完成单个目标区域中的像素点的深度值的确定的情况下，降采样模块执行的降采样操作的次数比编码模块执行的降采样操作的次数少一。其中，各个降采样层所执行的降采样操作可以是相同的，例如，每个降采样层可以将图像的长宽各除以2，也即将图像的像素数降为原图像的1/4。
可以理解,上述使用情况中,可以提高深度模型所生成的深度图像的准确度。
在上述可选的实现方式的一些应用场景中,深度模型通过以下方式训练得到:
首先,获取训练样本集。其中,训练样本集中的训练样本包括输入数据和期望输出数据,输入数据包括彩色样本图像和对应彩色样本图像的标记后的深度样本图像,期望输出数据包括期望输出的深度样本图像。
这里,训练样本集中训练样本的数量、彩色样本图像的尺寸、深度样本图像的尺寸等可以根据需求进行设定,例如,可以准备50万张彩色全景图像和对应彩色全景图像的深度全景图像,彩色全景图像和深度全景图像可以均是长、宽分别为640像素、320像素的图像。
然后,采用机器学习算法,将训练样本集中的训练样本包括的输入数据作为输入,将对应输入数据的期望输出数据作为期望输出,训练得到深度模型。
可以理解,上述应用场景中,可以采用机器学习算法训练得到深度模型,从而通过深度模型生成新的深度图像,因而进一步提高了深度图像中的像素点的深度值的准确度。
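作为参考，下面给出一段采用机器学习算法训练深度模型的Python（PyTorch）代码草图。其中 dataset 假设逐条返回（彩色样本图像、标记后的深度样本图像、期望输出的深度样本图像）三元组张量，批大小、学习率等超参数均为示例假设；此处以L1损失作为占位，可替换为下文描述的组合损失。
```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader

def train_depth_model(model, dataset, epochs=10, lr=1e-4, device="cpu"):
    loader = DataLoader(dataset, batch_size=8, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.to(device).train()
    for _ in range(epochs):
        for color, marked_depth, target_depth in loader:
            color = color.to(device)
            marked_depth = marked_depth.to(device)
            target_depth = target_depth.to(device)
            pred_depth = model(color, marked_depth)     # 实际输出数据
            loss = F.l1_loss(pred_depth, target_depth)  # 以期望输出数据作为监督（占位损失）
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```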
在上述应用场景的一些使用情况下,训练样本集中包括的所有期望输出的深度样本图像中的各个像素点的深度值的准确度均大于上述预设准确度阈值。其中,训练样本集中包括的期望输出的深度样本图像可以采用深度相机、激光雷达等方式获得。
可以理解,上述使用情况中,采用具有准确度较高的深度值的深度图像来训练深度模型,可以提高深度模型输出的目标区域中的像素点的深度值的准确度,从而进一步提高了最终生成的深度图像的准确度。
在上述应用场景的另一些使用情况下，深度模型的损失函数基于以下至少一项确定：
实际输出数据（即深度模型实际输出的深度样本图像）和期望输出数据（即训练样本集中期望输出的深度样本图像）的相对误差的均值、实际输出数据和期望输出数据的梯度的相对误差的均值、实际输出数据和期望输出数据之间的结构相似性（SSIM，Structural SIMilarity）。其中，结构相似性，是一种衡量两幅图像相似度的指标。作为结构相似性理论的实现，结构相似度指数从图像组成的角度将结构信息定义为独立于亮度、对比度的，反映场景中物体结构的属性，并将失真建模为亮度、对比度和结构三个不同因素的组合。用均值作为亮度的估计，标准差作为对比度的估计，协方差作为结构相似程度的度量。
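本公开未限定结构相似性的具体计算式。作为参考，常用的SSIM定义可以写作（其中μ_x、μ_y为两幅图像或局部窗口内的均值，σ_x^2、σ_y^2为方差，σ_xy为协方差，C1、C2为避免分母为零的小常数）：
SSIM(x,y)=[(2*μ_x*μ_y+C1)*(2*σ_xy+C2)]/[(μ_x^2+μ_y^2+C1)*(σ_x^2+σ_y^2+C2)]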
作为示例,在上述深度模型的损失函数基于以上三者确定的情况下,该损失函数L可以表示为:
L=k1*l_depth+k2*l_edge+k3*l_ssim
其中,k1、k2、k3可以是预先确定的三个常量,取值的依据是使得第二项k2*l_edge、第三项k3*l_ssim两项在L中的总体比重超过第一项k1*l_depth。l_depth是实际输出数据和期望输出数据的相对误差的均值;l_edge是实际输出数据和期望输出数据的梯度的相对误差的均值;l_ssim是实际输出数据和期望输出数据之间的结构相似性。
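下面给出上述损失函数的一段Python（PyTorch）示意实现，仅作为一种可能的写法：其中梯度用相邻像素差分近似，结构相似性采用全图单窗口的简化计算并以1-SSIM形式计入损失（使损失随相似度升高而降低），k1、k2、k3、eps、C1、C2等取值均为示例假设，且假设深度已归一化到[0,1]。
```python
import torch

def depth_loss(pred, target, k1=0.1, k2=1.0, k3=1.0, eps=1e-6):
    # l_depth：实际输出数据和期望输出数据的相对误差的均值
    l_depth = torch.mean(torch.abs(pred - target) / (torch.abs(target) + eps))

    # l_edge：二者梯度（以相邻像素差分近似）的相对误差的均值
    dx_p = pred[..., :, 1:] - pred[..., :, :-1]
    dy_p = pred[..., 1:, :] - pred[..., :-1, :]
    dx_t = target[..., :, 1:] - target[..., :, :-1]
    dy_t = target[..., 1:, :] - target[..., :-1, :]
    l_edge = (torch.mean(torch.abs(dx_p - dx_t) / (torch.abs(dx_t) + eps)) +
              torch.mean(torch.abs(dy_p - dy_t) / (torch.abs(dy_t) + eps))) / 2

    # l_ssim：基于实际输出数据和期望输出数据之间的结构相似性（此处取 1 - SSIM）
    mu_p, mu_t = pred.mean(), target.mean()
    var_p, var_t = pred.var(), target.var()
    cov = ((pred - mu_p) * (target - mu_t)).mean()
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    ssim = ((2 * mu_p * mu_t + c1) * (2 * cov + c2)) / \
           ((mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2))
    l_ssim = 1 - ssim

    return k1 * l_depth + k2 * l_edge + k3 * l_ssim
```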
可选的,还可以基于以上三者中的一项、两项或三项,为深度模型确定出不同于上述给出的损失函数的其他损失函数,在此不再赘述。
可以理解,在上述使用情况下,可以基于以上三者中的至少一项为深度模型确定损失函数,提高深度模型输出的目标区域中的像素点的深度值的准确度,从而进一步提高了最终生成的深度图像的准确度。
在上述应用场景的又一些使用情况下,训练样本集中位置信息指示的位置为:深度样本图像中随机确定的已获得深度值的矩形区域。
可以理解，在深度图像的获取设备（例如手机）功耗受限等场景下，深度样本图像可能存在未能获得深度值的区域。上述使用情况中，通过在深度样本图像中随机确定已获得深度值的矩形区域，并将其作为训练样本集中位置信息指示的位置，可以进一步提高深度模型输出的目标区域中的像素点的深度值的准确度，提高深度模型的训练速度。
可选的,还可以将深度样本图像中预定位置的矩形区域所在的位置作为训练样本集中位置信息指示的位置。
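作为示意，下面给出一段在深度样本图像中随机确定已获得深度值的矩形区域的Python代码草图。其中 valid_mask 假设为“该像素已获得深度值”的布尔掩码，矩形边长的取值范围、最大尝试次数均为示例假设。
```python
import numpy as np

def random_valid_rect(valid_mask: np.ndarray, rng: np.random.Generator, max_tries: int = 100):
    """返回 (top, left, h, w)；若多次尝试仍未找到满足条件的矩形，则返回 None。"""
    H, W = valid_mask.shape
    for _ in range(max_tries):
        h = int(rng.integers(H // 8, H // 2))
        w = int(rng.integers(W // 8, W // 2))
        top = int(rng.integers(0, H - h))
        left = int(rng.integers(0, W - w))
        if valid_mask[top:top + h, left:left + w].all():  # 矩形内的像素均已获得深度值
            return top, left, h, w
    return None
```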
可选的,除用于基于彩色图像和标记后的深度图像生成深度图像之外,深度模型还可以用于执行以下操作:
操作一,对深度图像的目标区域的位置进行标记,得到标记后的深度图像;
操作二,确定深度图像中是否存在准确度小于或等于预设准确度阈值的深度值,若存在,则将深度图像中准确度小于或等于预设准确度阈值的深度值对应的图像区域作为目标区域。
可以理解，本实施例或者本实施例的可选的实现方式中所描述的其他操作（例如上述操作一、操作二中的至少一项）可以通过深度模型来实现，在深度模型用于实现基于彩色图像和标记后的深度图像生成新的深度图像之外的其他操作的情况下，本实施例可以不再重复执行该操作。
作为另一个示例,请参考图3B。图3B为本公开深度图像生成方法中的深度模型的第二个结构示意图。
在图3B中,深度模型包括解码模块313、编码模块311和降采样模块312。其中,解码模块313的输入数据为编码模块311的输出数据和降采样模块312的输出数据。编码模块311的输入数据包括彩色图像,降采样模块312的输入数据包括标记后的深度图像,解码模块313的输出数据包括目标区域中的像素点的深度值。
其中，编码模块311可以使用ResNet50。ResNet50中Enc1-5各自进行一次长宽除以2的降采样。也即，设Enc1输入的彩色图像的尺度是1，则Enc1输出的特征数据的尺度是1/2、Enc2输出的特征数据的尺度是1/4……Enc5输出的特征数据的尺度是1/32。Enc5输出的特征数据作为解码模块313的输入。这里，Enc5和Enc4输出的特征数据可以作为解码模块313包括的Dec4的输入，Enc3输出的特征数据可以作为解码模块313包括的Dec3的输入，Enc2输出的特征数据可以作为解码模块313包括的Dec2的输入，Enc1输出的特征数据可以作为解码模块313包括的Dec1的输入。
解码模块313可以进行Dec4-1四次长宽乘以2的上采样。其中,设Dec4的输入尺度是1/32、输出尺度是1/16,则Dec3的输出尺度是1/8……Dec1的输出尺度是1/2。
标记后的深度图像分别通过长宽除以2的降采样模块Dwn1-4,分别得到尺度为1/2-1/16的深度图像。这里,Dwn1-4输出的特征数据可以分别作为解码模块313包括的Dec1-4的输入。
此外，在Enc5输出的特征数据输入至解码模块313包括的Dec4之后，可以首先对其进行一次长宽乘以2的上采样，然后将上采样的结果与Enc4输出的特征数据以及Dwn4输出的特征数据进行拼接，随后再进行长宽乘以2的上采样；在Enc3、Dwn3输出的特征数据输入Dec3之后，可以先对二者进行拼接，然后再进行长宽乘以2的上采样；在Enc2、Dwn2输出的特征数据输入Dec2之后，可以先对二者进行拼接，然后再进行长宽乘以2的上采样；在Enc1、Dwn1输出的特征数据输入Dec1之后，可以先对二者进行拼接，然后再进行长宽乘以2的上采样。
可选的,Dec1-4除进行长宽乘以2的上采样操作之外,还可以在执行上采样操作后,进行卷积操作。例如,在Dec1-4的结构中可以分别包括两个卷积层。
最终,Dec1的输出即为新的深度图像。
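作为参考，下面给出与图3B所示结构大致对应的一段Python（PyTorch）模型代码草图。为保持自包含，此处用简单的步长为2的卷积块代替ResNet50的Enc1-5，Dwn1-4用平均池化实现，Dec4-1在上采样后与对应的Enc、Dwn特征拼接并经两个卷积层处理；通道数、最终是否再上采样回输入分辨率等均为示例假设，并非本公开限定的实现方式。
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(cin, cout):
    # 两个 3x3 卷积 + ReLU，对应文中 Dec1-4 内部的卷积操作（通道数为示例假设）
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU(inplace=True))

class DepthModel(nn.Module):
    """简化的编码-解码-降采样结构草图：Enc1-5 每级将长宽除以2（此处以普通卷积代替ResNet50），
    Dwn1-4 对标记后的深度图像做4次长宽除以2的降采样（比编码模块少一层），
    Dec4-1 逐级上采样并与对应的 Enc、Dwn 特征拼接后卷积。"""

    def __init__(self, chs=(32, 64, 128, 256, 512)):
        super().__init__()
        cins = (3,) + chs[:-1]
        self.encs = nn.ModuleList([
            nn.Sequential(nn.Conv2d(ci, co, 3, stride=2, padding=1), nn.ReLU(inplace=True))
            for ci, co in zip(cins, chs)])                     # Enc1-5
        self.dwn = nn.AvgPool2d(2)                             # Dwn1-4 复用同一个2倍降采样
        dec_cout = (256, 128, 64, 32)                          # Dec4-1 的输出通道（示例假设）
        dec_cin = (512 + 256 + 1, 256 + 128 + 1, 128 + 64 + 1, 64 + 32 + 1)
        self.decs = nn.ModuleList([conv_block(ci, co) for ci, co in zip(dec_cin, dec_cout)])
        self.head = nn.Conv2d(dec_cout[-1], 1, 3, padding=1)   # 输出1通道的深度

    def forward(self, color, marked_depth):
        enc_feats, x = [], color
        for enc in self.encs:                 # Enc1-5：尺度 1/2 ... 1/32
            x = enc(x)
            enc_feats.append(x)
        dwn_feats, d = [], marked_depth
        for _ in range(4):                    # Dwn1-4：尺度 1/2 ... 1/16
            d = self.dwn(d)
            dwn_feats.append(d)
        y = enc_feats[-1]                     # Enc5 输出，尺度 1/32
        for i, dec in enumerate(self.decs):   # Dec4-1
            y = F.interpolate(y, scale_factor=2, mode="bilinear", align_corners=False)
            skip_enc = enc_feats[3 - i]       # 依次为 Enc4、Enc3、Enc2、Enc1 的输出
            skip_dwn = dwn_feats[3 - i]       # 依次为 Dwn4、Dwn3、Dwn2、Dwn1 的输出
            y = dec(torch.cat([y, skip_enc, skip_dwn], dim=1))
        out = self.head(y)                    # 尺度为 1/2 的新深度图
        # 如需与输入同分辨率，可再做一次上采样（示例假设）
        return F.interpolate(out, scale_factor=2, mode="bilinear", align_corners=False)
```
使用时，该草图可与前文给出的调用草图、训练草图配合，但具体网络结构应以实际需求为准。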
可选的,上述执行主体还可以采用如下方式重新确定目标区域中的像素点的深度值:
首先,基于目标区域在深度图像中的位置,确定彩色图像中与该目标区域相对应的彩色图像区域。
然后,基于该彩色图像区域确定目标区域中的像素点的深度值。
作为本公开实施例的一个示例,可以通过如下方式执行上述步骤:
首先，通过自带激光雷达的手机拍摄得到彩色图像和深度图像。由于功耗原因，手机配置的激光雷达获取的深度图像非常稀疏，因此通过软件的方法补全所拍摄的深度图像，获得较稠密的深度图像，并同时给出置信度，置信度的取值是0、1、2，其中，置信度的数值越大，对应的像素点的深度值越准确。通过连续拍摄多张照片，拼接获得彩色全景图像、深度全景图像和置信度全景图像。
然后,将深度全景图像中置信度取值不是2的像素点进行标记,例如可以将上述深度全景图像中置信度取值不是2的像素点的深度值设置为0。
这里,我们期望的输出是:未标记的图像区域的像素点的深度值保持不变;而标记的图像区域的像素点的深度值则需要进行重新确定,同时在边界能有一个平滑过渡。
作为示例,可以采用上述深度模型来获得新的深度全景图像,在此不再赘述。
需要说明的是，深度全景图像中本身可能存在置信度取值为0的像素点，这表示没有测量值，即仪器未能给出该点的深度的测量值。对于这类本身没有测量值的像素点，并不要求重新确定其深度值；需要重新确定深度值的，仅包括因标记而被设置为0、但原本已获得测量值的图像区域中的像素点。
实验表明,采用上述方式获得的新的深度全景图像中测量得到的较准确的深度值获得了保持,拼缝处的断口被平滑连接,此外,还可以实现对深度(全景)图像的除噪,重新确定后的新的深度值的准确度高于重新确定前的深度值的准确度。
作为示例,请参考图4A-图4B。图4A-图4B为本公开深度图像生成方法的一个实施例的应用场景示意图。
如图4A所示，我们把图4A中置信度不为2的部分（例如图4B中的黑色区域）抠出来，然后补全这个区域。而我们比普通的图像修复问题有一个优势：我们还有彩色全景图作为辅助输入，帮助上述补全过程。我们期望的输出是：非黑色区域保持图4A中的值；而黑色区域的值，我们需要进行推测，同时在边界能有一个平滑过渡。可以通过图3A或图3B中所示的深度模型来实现该过程，在此不再赘述。
这里有一点需要说明,图4A中本身可能是存在黑色区域的,这表示没有测量值,仪器未能给出该点的深度的测量值。在图4A中存在的黑色区域,并不要求补全;需要补全的是图4B中的黑色区域但在图4A中不是黑色区域的部分。
下面返回图2。
需要说明的是,除上面所记载的内容外,本申请实施例还可以包括与图1对应的实施例相同或类似的特征、效果,在此不再赘述。
从图2中可以看出,本实施例中的深度图像生成方法的流程200可以基于彩色图像和目标区域在深度图像中的位置,来确定目标区域中的像素点的深度值,从而可以参考目标区域在深度图像中的位置确定出更为准确的目标区域中的像素点的深度值。
进一步参考图5，作为对上述各图所示方法的实现，本公开提供了一种深度图像生成装置的一个实施例，该装置实施例与图1所示的方法实施例相对应，除下面所记载的特征外，该装置实施例还可以包括与图1所示的方法实施例相同或相应的特征，以及产生与图1所示的方法实施例相同或相应的效果。该装置具体可以应用于各种电子设备中。
如图5所示,本实施例的深度图像生成装置500包括:获取单元501,被配置成获取彩色图像和深度图像,其中,彩色图像指示的场景与深度图像指示的场景相匹配;生成单元502,被配置成响应于确定深度图像中存在目标区域,基于彩色图像重新确定目标区域中的像素点的深度值,生成新的深度图像,其中,目标区域中的像素点的深度值的准确度小于或等于预设准确度阈值。
在本实施例中,深度图像生成装置500的获取单元501可以获取彩色图像和深度图像。其中,彩色图像指示的场景与深度图像指示的场景相匹配。
在本实施例中,在确定深度图像中存在准确度小于或等于预设准确度阈值的深度值的情况下,生成单元502可以将上述获取单元501获取到的深度图像中准确度小于或等于预设准确度阈值的深度值对应的图像区域作为目标区域,基于彩色图像重新确定目标区域中的像素点的深度值,生成新的深度图像。
在本实施例的一些可选的实现方式中,生成单元502包括:
确定子单元(图中未示出),被配置成基于彩色图像和目标区域在深度图像中的位置,重新确定目标区域中的像素点的深度值。
在本实施例的一些可选的实现方式中,确定子单元包括:
标记模块(图中未示出),被配置成对深度图像的目标区域的位置进行标记,得到标记后的深度图像;
输入模块(图中未示出),被配置成将彩色图像和标记后的深度图像输入至预先训练的深度模型,经该深度模型生成新的深度图像,其中,深度模型用于基于彩色图像和标记后的深度图像确定新的深度图像。
在本实施例的一些可选的实现方式中,深度模型包括编码模块、解码模块和降采样模块;以及
上述输入模块，具体被配置成：
通过编码模块,基于彩色图像,生成彩色图像的特征数据;
通过降采样模块,基于标记后的深度图像,生成标记后的深度图像的特征数据;
通过解码模块,基于编码模块生成的特征数据和降采样模块生成的特征数据,生成新的深度图像。
在本实施例的一些可选的实现方式中,编码模块用于对彩色图像执行降采样操作;降采样模块包含的降采样层的层数比编码模块包含的降采样层的层数少一。
在本实施例的一些可选的实现方式中,深度模型通过训练单元(图中未示出)训练得到,训练单元包括:
获取子单元(图中未示出),被配置成获取训练样本集,其中,训练样本集中的训练样本包括输入数据和期望输出数据,输入数据包括彩色样本图像和对应彩色样本图像的标记后的深度样本图像,期望输出数据包括期望输出的深度样本图像;
训练子单元(图中未示出),被配置成采用机器学习算法,将训练样本集中的训练样本包括的输入数据作为输入,将对应输入数据的期望输出数据作为期望输出,训练 得到深度模型。
在本实施例的一些可选的实现方式中,训练样本集中包括的所有期望输出的深度样本图像中的各个像素点的深度值的准确度均大于预设准确度阈值。
在本实施例的一些可选的实现方式中,深度模型的损失函数基于以下至少一项确定:
实际输出数据和期望输出数据的相对误差的均值、实际输出数据和期望输出数据的梯度的相对误差的均值、实际输出数据和期望输出数据之间的结构相似性。
在本实施例的一些可选的实现方式中,训练样本集中位置信息指示的位置为:深度样本图像中随机确定的已获得深度值的矩形区域。
在本实施例的一些可选的实现方式中,训练样本集中的彩色样本图像和对应彩色样本图像的深度样本图像由用户移动终端拍摄获得;或者
训练样本集中的彩色样本图像和对应彩色样本图像的深度样本图像由用户移动终端基于拍摄的图像生成。
在本实施例的一些可选的实现方式中,彩色图像为彩色全景图像,深度图像为深度全景图像,训练样本集中的彩色样本图像为彩色全景图像,训练样本集中的深度样本图像为深度全景图像。
在本实施例的一些可选的实现方式中,训练样本集中的彩色样本图像与对应该彩色样本图像的深度样本图像指示相同场景,所获取的彩色图像和所获取的深度图像指示相同场景。
本公开的上述实施例提供的深度图像生成装置中，获取单元501可以获取彩色图像和深度图像，其中，彩色图像指示的场景与深度图像指示的场景相匹配，然后，生成单元502可以在确定深度图像中存在目标区域的情况下，基于彩色图像重新确定目标区域中的像素点的深度值，生成新的深度图像，其中，目标区域中的像素点的深度值的准确度小于或等于预设准确度阈值。这样，可以基于彩色图像重新确定对应该彩色图像的深度图像中准确度较低的图像区域中的像素点的深度值，从而生成新的深度图像，由此，提高了深度图像中的像素点的深度值的准确度。
示例性实施例提供的深度图像生成装置可以包括:处理器;用于存储所述处理器可执行指令的存储器;所述处理器,用于从所述存储器中读取所述可执行指令,并执行所述指令以实现本公开示例性实施例提供的深度图像生成方法。
下面,参考图6来描述根据本公开实施例的电子设备。该电子设备可以是第一设备和第二设备中的任一个或两者、或与它们独立的单机设备,该单机设备可以与第一设备和第二设备进行通信,以从它们接收所采集到的输入信号。
图6图示了根据本公开实施例的电子设备的框图。
如图6所示,电子设备6包括一个或多个处理器601和存储器602。
处理器601可以是中央处理单元(CPU)或者具有数据处理能力和/或指令执行能力的其他形式的处理单元,并且可以控制电子设备中的其他组件以执行期望的功能。
存储器602可以包括一个或多个计算机程序产品,所述计算机程序产品可以包括各种形式的计算机可读存储介质,例如易失性存储器和/或非易失性存储器。所述易失性存储器例如可以包括随机存取存储器(RAM)和/或高速缓冲存储器(cache)等。所述非易失性存储器例如可以包括只读存储器(ROM)、硬盘、闪存等。在所述计算机可读存储介质上可以存储一个或多个计算机程序指令,处理器601可以运行所述程序指令,以实现上文所述的本公开的各个实施例的深度图像生成方法以及/或者其他期望的功能。在所述计算机可读存储介质中还可以存储诸如输入信号、信号分量、噪声分量等各种内容。
在一个示例中,电子设备还可以包括:输入装置603和输出装置604,这些组件通过总线系统和/或其他形式的连接机构(未示出)互连。
例如,在该电子设备是第一设备或第二设备时,该输入装置603可以是上述的麦克风或麦克风阵列,用于捕捉声源的输入信号。在该电子设备是单机设备时,该输入装置603可以是通信网络连接器,用于从第一设备和第二设备接收所采集的输入信号。
此外,该输入装置603还可以包括例如键盘、鼠标等等。该输出装置604可以向外部输出各种信息,包括确定出的距离信息、方向信息等。该输出装置604可以包括例如显示器、扬声器、打印机、以及通信网络及其所连接的远程输出设备等等。
当然,为了简化,图6中仅示出了该电子设备中与本公开有关的组件中的一些,省略了诸如总线、输入/输出接口等等的组件。除此之外,根据具体应用情况,电子设备还可以包括任何其他适当的组件。
除了上述方法和设备以外,本公开的实施例还可以是计算机程序产品,其包括计算机程序指令,所述计算机程序指令在被处理器运行时使得所述处理器执行本说明书上述“示例性方法”部分中描述的根据本公开各种实施例的深度图像生成方法中的步骤。
所述计算机程序产品可以以一种或多种程序设计语言的任意组合来编写用于执行本公开实施例操作的程序代码,所述程序设计语言包括面向对象的程序设计语言,诸如Java、C++等,还包括常规的过程式程序设计语言,诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算设备上执行、部分地在用户设备上执行、作为一个独立的软件包执行、部分在用户计算设备上部分在远程计算设备上执行、或者完全在远程计算设备或服务器上执行。
此外,本公开的实施例还可以是计算机可读存储介质,其上存储有计算机程序指令,所述计算机程序指令在被处理器运行时使得所述处理器执行本说明书上述“示例性方法”部分中描述的根据本公开各种实施例的深度图像生成方法中的步骤。
所述计算机可读存储介质可以采用一个或多个可读介质的任意组合。可读介质可以是可读信号介质或者可读存储介质。可读存储介质例如可以包括但不限于电、磁、光、电磁、红外线、或半导体的系统、装置或器件，或者任意以上的组合。可读存储介质的更具体的例子（非穷举的列表）包括：具有一个或多个导线的电连接、便携式盘、硬盘、随机存取存储器（RAM）、只读存储器（ROM）、可擦式可编程只读存储器（EPROM或闪存）、光纤、便携式紧凑盘只读存储器（CD-ROM）、光存储器件、磁存储器件、或者上述的任意合适的组合。
另外,本公开的实施例还可以是计算机程序,该计算机程序可以包括计算机可读代码。当上述计算机可读代码在设备上运行时,该设备中的处理器执行本说明书上述“示例性方法”部分中描述的根据本公开各种实施例的深度图像生成方法中的步骤。
以上结合具体实施例描述了本公开的基本原理,但是,需要指出的是,在本公开中提及的优点、优势、效果等仅是示例而非限制,不能认为这些优点、优势、效果等是本公开的各个实施例必须具备的。另外,上述公开的具体细节仅是为了示例的作用和便于理解的作用,而非限制,上述细节并不限制本公开为必须采用上述具体的细节来实现。
本说明书中各个实施例均采用递进的方式描述,每个实施例重点说明的都是与其它实施例的不同之处,各个实施例之间相同或相似的部分相互参见即可。对于系统实施例而言,由于其与方法实施例基本对应,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。
本领域普通技术人员可以理解:实现上述方法实施例的全部或部分步骤可以通过程序指令相关的硬件来完成,前述的程序可以存储于一计算机可读取存储介质中,该程序在执行时,执行包括上述方法实施例的步骤;而前述的存储介质包括:ROM、RAM、磁碟或者光盘等各种可以存储程序代码的介质。
本说明书中各个实施例均采用递进的方式描述,每个实施例重点说明的都是与其它实施例的不同之处,各个实施例之间相同或相似的部分相互参见即可。对于系统实施例而言,由于其与方法实施例基本对应,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。
可能以许多方式来实现本公开的方法和装置。例如,可通过软件、硬件、固件或者软件、硬件、固件的任何组合来实现本公开的方法和装置。用于所述方法的步骤的上述顺序仅是为了进行说明,本公开的方法的步骤不限于以上具体描述的顺序,除非以其它方式特别说明。此外,在一些实施例中,还可将本公开实施为记录在记录介质中的程序,这些程序包括用于实现根据本公开的方法的机器可读指令。因而,本公开还覆盖存储用于执行根据本公开的方法的程序的记录介质。
本公开的描述是为了示例和描述起见而给出的,而并不是无遗漏的或者将本公开限于所公开的形式。很多修改和变化对于本领域的普通技术人员而言是显然的。选择和描述实施例是为了更好说明本公开的原理和实际应用,并且使本领域的普通技术人员能够理解本公开从而设计适于特定用途的带有各种修改的各种实施例。

Claims (21)

  1. 一种深度图像生成方法,其特征在于,所述方法包括:
    获取彩色图像和深度图像,其中,所述彩色图像指示的场景与所述深度图像指示的场景相匹配;
    响应于确定所述深度图像中存在目标区域,基于所述彩色图像重新确定所述目标区域中的像素点的深度值,生成新的深度图像,其中,所述目标区域中的像素点的深度值的准确度小于或等于预设准确度阈值。
  2. 根据权利要求1所述的方法,其特征在于,所述基于所述彩色图像重新确定所述目标区域中的像素点的深度值,包括:
    基于所述彩色图像和所述目标区域在所述深度图像中的位置,重新确定所述目标区域中的像素点的深度值。
  3. 根据权利要求2所述的方法,其特征在于,所述基于所述彩色图像和所述目标区域在所述深度图像中的位置,重新确定所述目标区域中的像素点的深度值,生成新的深度图像,包括:
    对所述深度图像的所述目标区域的位置进行标记,得到标记后的深度图像;
    将所述彩色图像和所述标记后的深度图像输入至预先训练的深度模型,经所述深度模型生成新的深度图像,其中,所述深度模型用于基于彩色图像和标记后的深度图像生成新的深度图像。
  4. 根据权利要求3所述的方法,其特征在于,所述深度模型包括编码模块、解码模块和降采样模块;以及
    所述将所述彩色图像和所述标记后的深度图像输入至预先训练的深度模型,经所述深度模型生成新的深度图像,包括:
    通过所述编码模块,基于所述彩色图像,生成所述彩色图像的特征数据;
    通过所述降采样模块,基于所述标记后的深度图像,生成所述标记后的深度图像的特征数据;
    通过所述解码模块,基于所述编码模块生成的特征数据和所述降采样模块生成的特征数据,生成新的深度图像。
  5. 根据权利要求4所述的方法,其特征在于,所述编码模块用于对所述彩色图像执行降采样操作;所述降采样模块包含的降采样层的层数比所述编码模块包含的降采样层的层数少一。
  6. 根据权利要求3-5之一所述的方法,其特征在于,所述深度模型通过以下方式训练得到:
    获取训练样本集,其中,所述训练样本集中的训练样本包括输入数据和期望输出数据,输入数据包括彩色样本图像和对应彩色样本图像的标记后的深度样本图像,期望输出数据包括期望输出的深度样本图像;
    采用机器学习算法，将所述训练样本集中的训练样本包括的输入数据作为输入，将对应输入数据的期望输出数据作为期望输出，训练得到深度模型。
  7. 根据权利要求6所述的方法,其特征在于,所述训练样本集中包括的期望输出的深度样本图像中的像素点的深度值的准确度,大于所述预设准确度阈值。
  8. 根据权利要求6或7所述的方法,其特征在于,所述深度模型的损失函数基于以下至少一项确定:
    实际输出数据和期望输出数据的相对误差的均值、实际输出数据和期望输出数据的梯度的相对误差的均值、实际输出数据和期望输出数据之间的结构相似性。
  9. 根据权利要求6-8之一所述的方法,其特征在于,所述训练样本集中位置信息指示的位置为:深度样本图像中随机确定的已获得深度值的矩形区域。
  10. 根据权利要求6-9之一所述的方法,其特征在于,所述训练样本集中的彩色样本图像与对应该彩色样本图像的深度样本图像指示相同场景,所获取的彩色图像和所获取的深度图像指示相同场景。
  11. 一种深度图像生成装置,其特征在于,所述装置包括:
    获取单元,被配置成获取彩色图像和深度图像,其中,所述彩色图像指示的场景与所述深度图像指示的场景相匹配;
    生成单元,被配置成响应于确定所述深度图像中存在目标区域,基于所述彩色图像重新确定所述目标区域中的像素点的深度值,生成新的深度图像,其中,所述目标区域中的像素点的深度值的准确度小于或等于预设准确度阈值。
  12. 根据权利要求11所述的装置,其特征在于,所述生成单元包括:
    确定子单元,被配置成基于所述彩色图像和所述目标区域在所述深度图像中的位置,重新确定所述目标区域中的像素点的深度值。
  13. 根据权利要求12所述的装置,其特征在于,所述确定子单元包括:
    标记模块,被配置成对所述深度图像的所述目标区域的位置进行标记,得到标记后的深度图像;
    输入模块,被配置成将所述彩色图像和所述标记后的深度图像输入至预先训练的深度模型,经所述深度模型生成新的深度图像,其中,所述深度模型用于基于彩色图像和标记后的深度图像生成新的深度图像。
  14. 根据权利要求13所述的装置,其特征在于,所述深度模型包括编码模块、解码模块和降采样模块;以及
    所述输入模块,具体被配置成:
    通过所述编码模块,基于所述彩色图像,生成所述彩色图像的特征数据;
    通过所述降采样模块,基于所述标记后的深度图像,生成所述标记后的深度图像的特征数据;
    通过所述解码模块,基于所述编码模块生成的特征数据和所述降采样模块生成的特征数据,生成新的深度图像。
  15. 根据权利要求14所述的装置，其特征在于，所述编码模块用于对所述彩色图像执行降采样操作；所述降采样模块包含的降采样层的层数比所述编码模块包含的降采样层的层数少一。
  16. 根据权利要求13-15之一所述的装置,其特征在于,所述深度模型通过训练单元训练得到,所述训练单元包括:
    获取子单元,被配置成获取训练样本集,其中,所述训练样本集中的训练样本包括输入数据和期望输出数据,输入数据包括彩色样本图像和对应彩色样本图像的标记后的深度样本图像,期望输出数据包括期望输出的深度样本图像;
    训练子单元,被配置成采用机器学习算法,将所述训练样本集中的训练样本包括的输入数据作为输入,将对应输入数据的期望输出数据作为期望输出,训练得到深度模型。
  17. 根据权利要求16所述的装置,其特征在于,所述训练样本集中包括的期望输出的深度样本图像中的像素点的深度值的准确度,大于所述预设准确度阈值。
  18. 根据权利要求16或17所述的装置,其特征在于,所述深度模型的损失函数基于以下至少一项确定:
    实际输出数据和期望输出数据的相对误差的均值、实际输出数据和期望输出数据的梯度的相对误差的均值、实际输出数据和期望输出数据之间的结构相似性。
  19. 根据权利要求16-18之一所述的装置,其特征在于,所述训练样本集中位置信息指示的位置为:深度样本图像中随机确定的已获得深度值的矩形区域。
  20. 根据权利要求16-19之一所述的装置,其特征在于,所述训练样本集中的彩色样本图像与对应该彩色样本图像的深度样本图像指示相同场景,所获取的彩色图像和所获取的深度图像指示相同场景。
  21. 一种计算机可读存储介质,其上存储有计算机程序,其特征在于,该计算机程序被处理器执行时,实现上述权利要求1-10任一所述的方法。
PCT/CN2022/072963 2021-07-27 2022-01-20 深度图像生成方法和装置 WO2023005169A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110849655.6A CN113592935A (zh) 2021-07-27 2021-07-27 深度图像生成方法和装置
CN202110849655.6 2021-07-27

Publications (1)

Publication Number Publication Date
WO2023005169A1 true WO2023005169A1 (zh) 2023-02-02

Family

ID=78250331

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/072963 WO2023005169A1 (zh) 2021-07-27 2022-01-20 深度图像生成方法和装置

Country Status (3)

Country Link
US (1) US20230035477A1 (zh)
CN (1) CN113592935A (zh)
WO (1) WO2023005169A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113592935A (zh) * 2021-07-27 2021-11-02 贝壳技术有限公司 深度图像生成方法和装置
CN113572978A (zh) * 2021-07-30 2021-10-29 北京房江湖科技有限公司 全景视频的生成方法和装置

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109767467A (zh) * 2019-01-22 2019-05-17 Oppo广东移动通信有限公司 图像处理方法、装置、电子设备和计算机可读存储介质
CN112036284A (zh) * 2020-08-25 2020-12-04 腾讯科技(深圳)有限公司 图像处理方法、装置、设备及存储介质
CN112102199A (zh) * 2020-09-18 2020-12-18 贝壳技术有限公司 深度图像的空洞区域填充方法、装置和系统
CN112802081A (zh) * 2021-01-26 2021-05-14 深圳市商汤科技有限公司 一种深度检测方法、装置、电子设备及存储介质
US20210208262A1 (en) * 2018-09-16 2021-07-08 Apple Inc. Calibration of a depth sensing array using color image data
CN113592935A (zh) * 2021-07-27 2021-11-02 贝壳技术有限公司 深度图像生成方法和装置

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10445861B2 (en) * 2017-02-14 2019-10-15 Qualcomm Incorporated Refinement of structured light depth maps using RGB color data
US10748247B2 (en) * 2017-12-26 2020-08-18 Facebook, Inc. Computing high-resolution depth images using machine learning techniques
CN108399610A (zh) * 2018-03-20 2018-08-14 上海应用技术大学 一种融合rgb图像信息的深度图像增强方法
CN112001914B (zh) * 2020-08-31 2024-03-01 三星(中国)半导体有限公司 深度图像补全的方法和装置


Also Published As

Publication number Publication date
CN113592935A (zh) 2021-11-02
US20230035477A1 (en) 2023-02-02

Similar Documents

Publication Publication Date Title
KR102663519B1 (ko) 교차 도메인 이미지 변환 기법
CN108256479B (zh) 人脸跟踪方法和装置
CN108229419B (zh) 用于聚类图像的方法和装置
WO2023005169A1 (zh) 深度图像生成方法和装置
US11132392B2 (en) Image retrieval method, image retrieval apparatus, image retrieval device and medium
US20200175700A1 (en) Joint Training Technique for Depth Map Generation
WO2022105125A1 (zh) 图像分割方法、装置、计算机设备及存储介质
WO2023005386A1 (zh) 模型训练方法和装置
US9025889B2 (en) Method, apparatus and computer program product for providing pattern detection with unknown noise levels
US12008167B2 (en) Action recognition method and device for target object, and electronic apparatus
CN111915480B (zh) 生成特征提取网络的方法、装置、设备和计算机可读介质
US11714921B2 (en) Image processing method with ash code on local feature vectors, image processing device and storage medium
CN113674146A (zh) 图像超分辨率
US11507787B2 (en) Model agnostic contrastive explanations for structured data
CN113850714A (zh) 图像风格转换模型的训练、图像风格转换方法及相关装置
CN112861940A (zh) 双目视差估计方法、模型训练方法以及相关设备
CN113468344A (zh) 实体关系抽取方法、装置、电子设备和计算机可读介质
US9928408B2 (en) Signal processing
CN113762109B (zh) 一种文字定位模型的训练方法及文字定位方法
WO2020134674A1 (zh) 掌纹识别方法、装置、计算机设备和存储介质
CN111815748B (zh) 一种动画处理方法、装置、存储介质及电子设备
CN113516697A (zh) 图像配准的方法、装置、电子设备及计算机可读存储介质
CN116257611B (zh) 问答模型的训练方法、问答处理方法、装置及存储介质
CN114970470B (zh) 文案信息处理方法、装置、电子设备和计算机可读介质
CN116342887A (zh) 用于图像分割的方法、装置、设备和存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22847790

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22847790

Country of ref document: EP

Kind code of ref document: A1