WO2023010559A1 - Depth image acquisition device, fusion method and terminal device - Google Patents

Depth image acquisition device, fusion method and terminal device

Info

Publication number
WO2023010559A1
WO2023010559A1 PCT/CN2021/111293 CN2021111293W WO2023010559A1 WO 2023010559 A1 WO2023010559 A1 WO 2023010559A1 CN 2021111293 W CN2021111293 W CN 2021111293W WO 2023010559 A1 WO2023010559 A1 WO 2023010559A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
rgb
camera
depth
resolution
Prior art date
Application number
PCT/CN2021/111293
Other languages
English (en)
French (fr)
Inventor
秦侠格
Original Assignee
深圳市汇顶科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市汇顶科技股份有限公司
Priority to PCT/CN2021/111293 (WO2023010559A1)
Priority to EP21916643.6A (EP4156085A4)
Priority to US 17/860,579 (US11928802B2)
Publication of WO2023010559A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G06T5/50 - Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/50 - Depth or shape recovery
    • G06T7/521 - Depth or shape recovery from laser ranging, e.g. using interferometry; from the projection of structured light
    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01S - RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S17/00 - Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S17/02 - Systems using the reflection of electromagnetic waves other than radio waves
    • G01S17/06 - Systems determining position data of a target
    • G01S17/46 - Indirect determination of position data
    • G01S17/48 - Active triangulation systems, i.e. using the transmission and reflection of electromagnetic waves other than radio waves
    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01S - RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S17/00 - Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S17/86 - Combinations of lidar systems with systems other than lidar, radar or sonar, e.g. with direction finders
    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01S - RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S17/00 - Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S17/88 - Lidar systems specially adapted for specific applications
    • G01S17/89 - Lidar systems specially adapted for specific applications for mapping or imaging
    • G01S17/894 - 3D imaging with simultaneous measurement of time-of-flight at a 2D array of receiver pixels, e.g. time-of-flight cameras or flash lidar
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/30 - Determination of transform parameters for the alignment of images, i.e. image registration
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10024 - Color image
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10028 - Range image; Depth image; 3D point clouds
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20084 - Artificial neural networks [ANN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20212 - Image combination
    • G06T2207/20221 - Image fusion; Image merging
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 - Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/56 - Cameras or camera modules comprising electronic image sensors; Control thereof provided with illuminating means

Definitions

  • Embodiments of the present invention relate to the field of image processing, and in particular, to a depth image acquisition device, a fusion method, and a terminal device.
  • Terminal devices such as mobile phones use RGB cameras for image acquisition, which can only obtain two-dimensional plane information and cannot obtain accurate depth information; this limits the use scenarios of the terminal devices.
  • Sensors such as lidar can obtain sparse, line-scan depth maps, and these sparse depth maps can then be completed to obtain dense depth maps for application scenarios related to 3D images.
  • Accordingly, one of the technical problems solved by the embodiments of the present invention is to provide a depth image acquisition device, a fusion method and a terminal device.
  • The first aspect of the embodiments of the present invention provides a depth image acquisition device, including: a transmitting module configured to transmit a speckle array toward a target, where the speckle array includes p speckles spaced apart from each other; a receiving module including an image sensor, the image sensor including a sensor array of m*n pixel units, where each pixel unit includes a CMOS photodiode and a photoelectric signal reading circuit, the photodiode being used to receive the speckle array reflected by the target and generate a corresponding photocurrent signal whose current intensity is positively correlated with the received light intensity, and the photoelectric signal reading circuit being used to read the photocurrent signal and output a corresponding pixel signal; and a processing unit used to receive the pixel signal and generate a sparse depth map from it, where the number p of speckles indicates the resolution of the sparse depth map. The processing unit is also used to align an RGB image with a resolution of a*b with the sparse depth map and, using a pre-trained image fusion model, fuse the aligned sparse depth map and RGB image to obtain a dense depth map, where the resolution of the dense depth map is a*b.
  • The second aspect of the embodiments of the present invention provides a depth image fusion method applied to a terminal device that includes a depth camera and an RGB camera. The method includes: using the depth camera to collect a sparse depth map with a resolution of p, and using the RGB camera to collect an RGB image; aligning the sparse depth map with resolution p and the RGB image with resolution a*b; and using a pre-trained image fusion model to fuse the aligned sparse depth map and RGB image to obtain a dense depth map with a resolution of a*b.
  • The third aspect of the embodiments of the present invention provides a terminal device, including: a depth camera that collects a sparse depth map with a resolution of p based on the IToF principle; an RGB camera that collects an RGB image with a resolution of a*b; a memory for storing a pre-trained image fusion model; and a processor that aligns the sparse depth map with resolution p and the RGB image with resolution a*b, and uses the image fusion model to fuse the aligned sparse depth map and RGB image to obtain a dense depth map with a resolution of a*b.
  • Acquiring the sparse depth map based on the IToF principle is conducive to reducing the acquisition cost of image depth information.
  • The depth camera in the embodiment of the present invention collects the sparse depth map based on the IToF principle, which helps reduce the configuration cost of the depth camera, makes such a depth camera more suitable for low-cost terminal equipment such as mobile phones, and reduces the cost of the terminal equipment.
  • the aligned sparse depth map and RGB image are fused, which improves the accuracy of image fusion and improves the efficiency of image processing.
  • the sparse depth map and RGB map are fused to obtain a higher-precision dense depth map, which enriches the usage scenarios of terminal devices and improves user experience.
  • FIG. 1 is a schematic block diagram of a depth image acquisition device according to an embodiment of the present invention;
  • FIG. 2A is a schematic flowchart of a depth image fusion method according to another embodiment of the present invention;
  • FIG. 2B is a schematic block diagram of an example of the depth image fusion method of FIG. 2A;
  • FIG. 3 is a schematic diagram of a speckle distribution map of a depth camera in an example of the depth image fusion method of FIG. 2A;
  • FIG. 4 is a schematic flowchart of a training method of an image fusion model according to another embodiment of the present invention;
  • FIG. 5 is a schematic flowchart of a depth image fusion method according to another embodiment of the present invention;
  • FIG. 6 is a schematic block diagram of a terminal device according to another embodiment of the present invention.
  • the solutions of the embodiments of the present invention can be applied to any computer equipment with data processing capabilities, including but not limited to mobile communication equipment, ultra-mobile personal computer equipment, portable entertainment equipment and other terminal equipment with data interaction functions.
  • mobile communication devices are characterized by having mobile communication functions and mainly aiming at providing voice and data communication, including: smart phones (such as iPhone), multimedia phones, feature phones, and low-end phones.
  • ultra-mobile personal computer devices belong to the category of personal computers, which have computing and processing functions, and generally have the characteristics of mobile Internet access, including: PDA, MID and UMPC devices, such as iPad.
  • portable entertainment devices can display and play multimedia content, including: audio and video players (such as iPod), handheld game consoles, e-books, as well as smart toys and portable car navigation devices.
  • Terminal devices are characterized by portability (for example, miniaturized or wearable devices) and low cost. At the same time, people expect terminal devices to have strong image processing capabilities so as to provide richer functions and a better user experience.
  • terminal equipment has a high penetration rate and a large number, and a relatively mature upstream and downstream industrial chain for the production and assembly of terminal equipment has gradually formed.
  • various sensor hardware required by terminal equipment is provided exclusively by terminal equipment assembly manufacturers or downstream hardware manufacturers of OEMs.
  • The software algorithms (for example, the operating system or neural network models) are typically supplied by software manufacturers. Software manufacturers and hardware manufacturers each provide their high-performance software products or high-performance hardware products to upstream manufacturers, and they often do not integrate each other's products with their own; it is difficult for a downstream supplier to provide both software and hardware products to upstream suppliers. In other words, this technical division of labor not only enables downstream manufacturers to provide higher-performance products, but also ensures the overall production efficiency of terminal equipment, thereby satisfying both the device performance and the shipment volume requirements of terminal equipment.
  • An embodiment of the present invention provides an image fusion solution. A depth image acquisition device according to an embodiment of the present invention will be specifically described below with reference to FIG. 1.
  • The depth image acquisition device of FIG. 1 includes:
  • The transmitting module 110 is configured to emit a speckle array toward the target, where the speckle array includes p speckles spaced apart from each other.
  • The receiving module includes an image sensor, the image sensor includes a sensor array, and the sensor array includes m*n pixel units. Each pixel unit includes a CMOS photodiode and a photoelectric signal reading circuit. The photodiode is used to receive the speckle array reflected by the target and generate a corresponding photocurrent signal; the current intensity indicated by the photocurrent signal is positively correlated with the light intensity of the beam received by the photodiode. The photoelectric signal reading circuit is used to read the photocurrent signal and output a corresponding pixel signal.
  • The processing unit 130 is configured to receive the pixel signals and generate a sparse depth map from them, where the number p of speckles indicates the resolution of the sparse depth map. The processing unit is also used to align an RGB image with a resolution of a*b with the sparse depth map, and to use a pre-trained image fusion model to fuse the aligned sparse depth map and RGB image to obtain a dense depth map, where the resolution of the dense depth map is a*b.
  • Generating the sparse depth map according to the pixel signals includes: emitting, through a point light source array, a speckle light array with a first phase toward the target; obtaining the reflected speckle light array with a second phase; and determining the sparse depth map based at least on the difference between the grayscale image of the first phase of the speckle light array and the grayscale image of the second phase of the reflected speckle light array.
  • CMOS photodiodes have a low cost, and the performance of CMOS photodiodes is sufficient to guarantee the quality of the IToF measurement.
  • The resolution of the sparse depth map refers to the number of depth image points or the number of depth values; that is, the number p of speckles, or a value close to the number of speckles, indicates the resolution of the sparse depth map.
  • the emission module includes a light-emitting array containing q light-emitting points and a light-emitting driving circuit.
  • the resolution of a two-dimensional image is characterized by the number of pixels in two dimensions, for example, an RGB image of a*b.
  • the dense fusion image obtained by adopting the fusion method of the embodiment of the present invention includes a*b pixels, and each pixel has depth information. Therefore, a*b indicates the resolution of the dense fusion image.
  • Image collection of the target area in this embodiment is divided into two parts: using the depth camera to collect the sparse depth map based on the principle of indirect time of flight (Indirect Time of Flight, IToF), and using the RGB camera to collect the RGB image.
  • RGB refers to the three color channels of red, green and blue.
  • the RGB camera is a camera that collects images based on the RGB color mode, and the images collected by the RGB camera are RGB images.
  • the RGB color mode is a color standard in the industry, and various colors are obtained by changing the three color channels of red, green, and blue and superimposing them with each other.
  • the RGB image can be obtained by collecting the target area with an RGB camera, and the pixels of the above three color channels record the imaging result of the target area.
  • the RGB camera in this case covers color cameras in a broad sense, and RGB cameras are not necessarily required to have an RGB filter layer.
  • Similar image sensors containing color filter arrays such as RGGB, RGBW and RYYB are applicable to the depth image fusion method of the embodiment of the present invention.
  • the sparse depth map can be acquired by the depth camera based on the IToF principle for image acquisition.
  • the depth camera of the embodiment of the present invention may be provided with a speckle light source, that is, a light source formed by an array of separated point light sources.
  • a depth camera may also be called a speckle (Spot) IToF camera.
  • the point light source projected by the speckle IToF camera is sparse (speckle).
  • the obtained depth map is sparse, and the degree of sparseness of the depth map collected by the speckle IToF camera depends on the number of points of the speckle light source.
  • a conventional surface light source IToF camera performs image acquisition based on the IToF principle, but the surface light source IToF camera has a very limited detection distance and high power consumption.
  • the speckle IToF camera has lower light emission power, higher energy density, and longer detection distance, and can obtain a depth map with more depth information.
  • the separated point light source array ensures the lower cost of the camera and the quality of the depth information.
  • the depth camera (speckle IToF camera) with speckle light source in the embodiment of the present invention is different from sensors such as lidar.
  • Lidar is based on the principle of direct time of flight (Direct Time of Flight, DToF).
  • the depth camera with speckle light source obtains the depth information of the target object or target area based on the IToF principle, which makes its cost lower.
  • the speckle light source is beneficial to ensure the quality of depth information.
  • The photoelectric signal reading circuit is controlled by a read control signal to output the pixel signal, where the pixel signal of each pixel unit includes a first phase pixel signal, a second phase pixel signal, a third phase pixel signal and a fourth phase pixel signal, and the read control signals corresponding to the first, second, third and fourth phase pixel signals differ from one another by fixed phase offsets.
  • The processing unit generates the sparse depth map according to the first, second, third and fourth phase pixel signals, where the phase of the read control signal corresponding to the first phase pixel signal is the same as the phase of the emission pulse. As a result, IToF detection for the sparse depth map is reliably achieved.
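  • The patent does not state the exact phase offsets of the four read control signals; the sketch below assumes the common IToF scheme in which they are offset by 0, 90, 180 and 270 degrees relative to the emission pulse, and shows how a processing unit could turn the four phase pixel signals into per-speckle depth values.

```python
import numpy as np

C = 299_792_458.0  # speed of light, m/s

def itof_depth(a0, a90, a180, a270, f_mod):
    """Estimate per-speckle depth from four phase-shifted pixel signals.

    a0..a270: arrays of pixel signals read with control signals offset by
              0, 90, 180 and 270 degrees relative to the emission pulse (assumed).
    f_mod:    modulation frequency of the emitted light, in Hz.
    """
    # Phase delay of the reflected light relative to the emitted light.
    phase = np.arctan2(a270 - a90, a0 - a180)
    phase = np.mod(phase, 2 * np.pi)  # wrap into [0, 2*pi)
    # Round-trip phase delay -> one-way distance.
    return C * phase / (4 * np.pi * f_mod)
```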
  • The photoelectric signal reading circuit reads only the pixel units in the pixel rows irradiated by the speckles.
  • the processing unit is specifically configured to: align the sparse depth map and the RGB map.
  • this process can also be called image registration.
  • camera parameters are used to align the sparse depth map and the RGB image, so that the matching degree between the sparse depth map and the RGB image is high, and the fusion accuracy of the trained image fusion model is improved.
  • Alignment reflects the correspondence between the acquisition targets of the sparse depth map and the RGB image.
  • The processing unit is further used to obtain training samples, where the training samples include aligned sparse depth image samples with a resolution of p and RGB image samples with a resolution of a*b, together with dense depth map samples with a resolution of a*b.
  • the processing unit is specifically used to: use the aligned sparse depth image samples and RGB image samples as input, and use the dense depth image samples as supervision conditions to train the target neural network to obtain an image fusion model.
  • the depth camera and the RGB camera that collect the training samples can be calibrated to obtain various camera parameters, and the sparse depth image samples and RGB image samples can be aligned according to each camera parameter.
  • the camera parameters of the depth camera and the RGB camera that collect the training samples may be the same as the camera parameters of the depth camera and the RGB camera that collect the image to be fused, and of course, the parameters of the two cameras may also be different.
  • the above-mentioned training samples can be obtained by collecting the above-mentioned camera modules including a depth camera and an RGB camera.
  • the camera parameters of the depth camera and the RGB camera that collect the training samples may be the same as those of the depth camera and the RGB camera that collect the images to be fused.
  • the processing unit is specifically configured to: input the aligned sparse depth image and RGB image to a pre-trained image fusion model to obtain a dense depth image.
  • the image fusion model as an end-to-end neural network model, improves image fusion efficiency and improves data processing efficiency on the premise of ensuring image fusion accuracy.
  • the processing unit is further configured to: acquire an image acquisition instruction of a three-dimensional image application program installed in the terminal device, the image acquisition instruction instructs the receiving module and the transmitting module to respectively acquire the sparse depth map and the RGB image;
  • the dense depth map is returned to the 3D image application so that the 3D image application acquires 3D image information based on the dense depth map.
  • the 3D image application may include any one of an image background blur application, a 3D image reconstruction application, a virtual reality application or an augmented reality application.
  • an operating system may be installed in the terminal device, and the 3D image application runs on the operating system.
  • Operating systems include but are not limited to embedded operating systems, real-time operating systems, and the like.
  • the three-dimensional image application program may be a system application program or a third-party application program.
  • a camera module including a depth camera and an RGB camera may respond to an image acquisition instruction of a three-dimensional image application program to start image acquisition.
  • the three-dimensional image application program can issue an image acquisition instruction (in response to a user instruction or other associated instructions, etc.).
  • the 3D image application can call the image fusion model to input the aligned sparse depth map and RGB map into the image fusion model to obtain a dense depth map.
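  • As an illustration of the image background blur application mentioned above, the following hypothetical sketch shows how an application could use the fused dense depth map, aligned to the RGB image, to keep a subject at a chosen depth sharp while blurring everything else; the function and parameter names are illustrative assumptions, not part of the patent.

```python
import cv2
import numpy as np

def blur_background(rgb, dense_depth, focus_depth, tolerance=0.3):
    """Blur the background of an RGB image using a fused dense depth map.

    rgb:          (H, W, 3) uint8 color image from the RGB camera.
    dense_depth:  (H, W) float depth map in meters, aligned to the RGB image.
    focus_depth:  depth (m) of the subject to keep sharp.
    tolerance:    half-width (m) of the in-focus depth band.
    """
    blurred = cv2.GaussianBlur(rgb, (31, 31), 0)
    in_focus = np.abs(dense_depth - focus_depth) <= tolerance
    # Keep in-focus pixels from the original image, blurred pixels elsewhere.
    return np.where(in_focus[..., None], rgb, blurred).astype(np.uint8)
```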
  • the depth image fusion method of Fig. 2A includes:
  • S220 Use the depth camera to collect a sparse depth image with a resolution of p based on the IToF principle, and use the RGB camera to collect an RGB image with a resolution of a*b.
  • RGB refers to the three color channels of red, green and blue.
  • the RGB camera is a camera that collects images based on the RGB color mode, and the images collected by the RGB camera are RGB images.
  • the RGB color mode is a color standard in the industry, and various colors are obtained by changing the three color channels of red, green, and blue and superimposing them with each other.
  • the RGB image can be obtained by collecting the target area with an RGB camera, and the pixels of the above three color channels record the imaging result of the target area.
  • The RGB camera in this case covers color cameras in a broad sense, and RGB cameras are not necessarily required to have an RGB filter layer. Similar image sensors containing color filter arrays such as RGGB, RGBW and RYYB are applicable to the depth image fusion method of the embodiment of the present invention.
  • the sparse depth map can be acquired by the depth camera based on the IToF principle for image acquisition.
  • the depth camera of the embodiment of the present invention may be provided with a speckle light source, that is, a light source formed by an array of separated point light sources.
  • a depth camera may also be called a speckle (Spot) IToF camera.
  • the point light source projected by the speckle IToF camera is sparse (speckle).
  • the obtained depth map is sparse, and the degree of sparseness of the depth map collected by the speckle IToF camera depends on the number of points of the speckle light source.
  • a conventional surface light source IToF camera performs image acquisition based on the IToF principle, but the surface light source IToF camera has a very limited detection distance and high power consumption.
  • the speckle IToF camera has lower light emission power, higher energy density, and longer detection distance, and can obtain a depth map with more depth information.
  • the separated point light source array ensures the lower cost of the camera and the quality of the depth information.
  • the depth camera (speckle IToF camera) with speckle light source in the embodiment of the present invention is different from sensors such as lidar.
  • Lidar is based on the principle of direct time of flight (Direct Time of Flight, DToF).
  • the depth camera with speckle light source obtains the depth information of the target object or target area based on the IToF principle, which makes its cost lower.
  • the speckle light source is beneficial to ensure the quality of depth information.
  • The resolution of the sparse depth map refers to the number of depth image points or the number of depth values; that is, the number p of speckles, or a value close to the number of speckles, indicates the resolution of the sparse depth map.
  • the emission module includes a light-emitting array containing q light-emitting points and a light-emitting driving circuit.
  • the resolution of a two-dimensional image is characterized by the number of pixels in two dimensions, for example, an RGB image of a*b.
  • the dense fusion image obtained by adopting the fusion method of the embodiment of the present invention includes a*b pixels, and each pixel has depth information. Therefore, a*b indicates the resolution of the dense fusion image.
  • the purpose of the alignment herein is at least to fuse the depth image collected by the depth camera and the RGB image collected by the RGB camera with respect to the same target collection area.
  • The calibration parameters of the depth camera and the RGB camera can be used to align the sparse depth map and the RGB image, since the depth camera and the RGB camera perform image acquisition in their respective local coordinate systems. When merging multiple (two or more) images, the images need to be aligned to the same coordinate system, so that identical position coordinates in that coordinate system indicate the same spatial position in the world coordinate system, and the images can then be fused based on this positional correspondence.
  • the same coordinate system may be the local coordinate system of any camera, or the world coordinate system.
  • the setting positions or angles (space orientation) of different cameras are different, and the corresponding images collected usually do not correspond to the same coordinate system.
  • The transformation relationship between the local coordinate system of each camera and the world coordinate system can be obtained from the camera parameters (for example, intrinsic and extrinsic parameters) of each camera, so that the images collected by the cameras can be aligned according to these parameters; in other words, the sparse depth map and the RGB image can be aligned.
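  • As a minimal sketch of this alignment step (assuming pinhole camera models, known intrinsics for both cameras, and known extrinsics from the depth camera to the RGB camera; all names are illustrative), each sparse depth sample can be back-projected from the depth camera, transformed into the RGB camera frame, and re-projected onto the RGB image grid:

```python
import numpy as np

def align_sparse_depth_to_rgb(depth_points, K_d, K_rgb, R, t, rgb_shape):
    """Project sparse depth samples from the depth camera into the RGB image.

    depth_points: (N, 3) array of (u, v, z) pixel coordinates and depths (m)
                  in the depth camera image.
    K_d, K_rgb:   3x3 intrinsic matrices of the depth and RGB cameras.
    R, t:         rotation (3x3) and translation (3,) from the depth camera
                  frame to the RGB camera frame (extrinsics).
    rgb_shape:    (height, width) of the RGB image.
    Returns an (H, W) sparse depth map registered to the RGB image,
    with zeros where no speckle sample projects.
    """
    h, w = rgb_shape
    aligned = np.zeros((h, w), dtype=np.float32)
    K_d_inv = np.linalg.inv(K_d)
    for u, v, z in depth_points:
        # Back-project the pixel into the depth camera's 3D frame.
        xyz_d = z * (K_d_inv @ np.array([u, v, 1.0]))
        # Transform into the RGB camera frame and project onto its image plane.
        xyz_rgb = R @ xyz_d + t
        uvw = K_rgb @ xyz_rgb
        u_r, v_r = int(round(uvw[0] / uvw[2])), int(round(uvw[1] / uvw[2]))
        if xyz_rgb[2] > 0 and 0 <= v_r < h and 0 <= u_r < w:
            aligned[v_r, u_r] = xyz_rgb[2]
    return aligned
```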
  • Image information can also be aligned based on the content of the sparse depth map and the RGB image. For example, the position features at which the sparse depth map and the RGB image correspond to the same target area can be determined, and image fusion can be performed according to those position features.
  • the training samples of the image fusion model are aligned sparse depth image samples and RGB image samples.
  • Sparse depth image samples and RGB image samples can be acquired by depth camera and RGB camera respectively.
  • the depth camera can be the same camera or the same type of camera as the camera that collects the sparse depth map (image to be fused), and the RGB camera can also be the same camera or the same type of camera as the camera that collects the RGB image (image to be fused).
  • the matching degree between the training sample data and the data to be fused is high, which can improve the image fusion effect of the model.
  • the image fusion model in this embodiment of the present invention may be an end-to-end neural network model.
  • the input of the image fusion model is the image to be fused
  • the output of the image fusion model is the fused image.
  • the image to be fused includes a sparse depth map with depth information and an RGB image with different color channel information. Through image fusion, the above image information can be image complemented to obtain a dense depth map.
  • The neural network in this embodiment includes, but is not limited to, convolutional neural networks (CNN), feedforward neural networks, generative adversarial networks (GAN), and encoder-decoder networks such as transformers.
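  • The patent does not specify a concrete architecture; the following is a minimal sketch (in PyTorch, chosen here only for illustration) of an end-to-end encoder-decoder CNN that takes the aligned RGB image and sparse depth map as a 4-channel input and outputs a dense depth map of the same spatial size, assuming the height and width are divisible by 4:

```python
import torch
import torch.nn as nn

class FusionNet(nn.Module):
    """Minimal encoder-decoder CNN: aligned RGB (3 ch) + sparse depth (1 ch)
    in, dense depth (1 ch) out. Architecture is illustrative only."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, rgb, sparse_depth):
        x = torch.cat([rgb, sparse_depth], dim=1)  # (N, 4, a, b)
        return self.decoder(self.encoder(x))       # (N, 1, a, b)
```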
  • the training methods of various embodiments of the present invention include but not limited to supervised learning, unsupervised learning and semi-supervised learning.
  • The depth camera can collect a sparse depth map, and the cost of collecting a sparse depth map based on the IToF principle is low, which reduces the acquisition cost of image depth information and allows the approach to be applied to low-cost terminal equipment such as mobile phones.
  • the aligned sparse depth map and RGB image are fused, which improves the accuracy of image fusion and improves the efficiency of image processing.
  • the sparse depth map and RGB map are fused to obtain a higher-precision dense depth map, which enriches the usage scenarios of terminal devices and improves user experience.
  • Due to the high popularity of RGB cameras, the RGB camera can be reused in terminal devices already equipped with one. In other words, in application scenarios that do not require depth images, the RGB camera can still be used to perform conventional image collection.
  • Because the solution of the embodiment of the present invention realizes a low-cost depth camera, within the industrial chain of terminal equipment the depth camera, as a high-performance hardware product, and the image fusion model, as a high-performance software product, can be combined; in other words, the same downstream manufacturer can provide the depth camera and the image fusion model together as a high-performance image processing solution to upstream manufacturers, while ensuring the production efficiency of the entire industrial chain.
  • Fig. 2B shows a schematic block diagram of an example of the depth image fusion method in Fig. 2A.
  • The RGB image of the target area can be obtained from the RGB camera data, for example as a two-dimensional color image.
  • a sparse depth map can be collected by a depth camera, for example, image depth processing can be performed according to a speckle distribution map collected by a depth camera to obtain a sparse depth map.
  • a pre-trained image fusion model is used to perform image fusion processing on the RGB image and the sparse depth image to obtain a dense depth image.
  • the depth camera herein may include a transmitting module, a receiving module and a processing unit.
  • the emission module can be used to emit the speckle array (array of point light sources) to the target.
  • the speckle array may include p speckles spaced apart from each other.
  • The receiving module may include an image sensor, the image sensor may include a sensor array, and the sensor array may include m*n pixel units, where each pixel unit includes a CMOS photodiode and a photoelectric signal reading circuit. The photodiode is used to receive the speckle array reflected by the target and generate a corresponding photocurrent signal, and the current intensity indicated by the photocurrent signal is positively correlated with the light intensity of the beam received by the photodiode.
  • the photoelectric signal reading circuit is used to read the photocurrent signal and output the corresponding pixel signal; the processing unit is used to receive the pixel signal and generate a sparse depth map according to the pixel signal, and the number p of the speckles indicates the resolution of the sparse depth map, The processing unit is also used to align the RGB image with a resolution of a*b and the sparse depth map, and use the pre-trained image fusion model to fuse the aligned sparse depth map and RGB image to obtain a dense depth map, where The resolution of the dense depth map is a*b.
  • The photoelectric signal reading circuit can be controlled by a read control signal to output the pixel signal, where the pixel signal of each pixel unit includes a first phase pixel signal, a second phase pixel signal, a third phase pixel signal and a fourth phase pixel signal, and the read control signals corresponding to the four phase pixel signals differ from one another by fixed phase offsets.
  • The processing unit generates the sparse depth map according to the first, second, third and fourth phase pixel signals, where the phase of the read control signal corresponding to the first phase pixel signal is the same as the phase of the emission pulse.
  • Fig. 3 shows a schematic diagram of a speckle distribution map.
  • the speckle distribution map is an image collected by a point light source array set in the depth camera.
  • the distribution diagram of the light rays reflected by the point light source array in the image via the target area or the target object corresponds to the speckle distribution diagram.
  • the concentration of light emitted by each point light source in such a point light source array is far superior to that of a surface light source.
  • A depth camera can thus obtain a sparse depth map with high-quality depth information based on a low-cost IToF processing module.
  • Aligning the sparse depth map with resolution p and the RGB image with resolution a*b includes: aligning the sparse depth map with resolution p and the RGB image with resolution a*b according to the camera parameters calibrated for the depth camera and the RGB camera.
  • This process can also be called image registration.
  • camera parameters are used to align the sparse depth map and the RGB image, so that the matching degree between the sparse depth map and the RGB image is high, and the fusion accuracy of the trained image fusion model is improved. From an intuitive point of view, the alignment reflects the correspondence between the acquisition targets of the sparse depth map and the RGB image.
  • After alignment, the parts (e.g., pixels) of each image participating in the fusion (the aligned sparse depth map and RGB image) correspond to one another, so that each part brings together depth information from the sparse depth map and non-depth information from the RGB image, resulting in a reliable fused depth map.
  • the calibration parameters indicate the transformation relationship between the camera coordinate system and the world coordinate system.
  • the calibration parameters include camera intrinsic parameters and camera extrinsic parameters.
  • camera extrinsic parameters indicate the mapping from the world coordinate system to the camera coordinate system
  • camera intrinsic parameters indicate the mapping from the camera coordinate system to the image coordinate system.
  • the calibration of the parameters of the depth camera and the RGB camera can be performed before performing image acquisition. The obtained calibration parameters may be stored in advance, and then the pre-stored calibration parameters may be obtained.
  • the depth camera and the RGB camera are set in the camera module, and the camera parameters are obtained through camera calibration based on the camera module.
  • the depth camera and the RGB camera can be combined or assembled into a camera module, and then the camera module can be assembled into the terminal device as an integral part to improve the efficiency of device assembly.
  • the camera module can be installed in different devices as an independent component, and the calibration parameters of the camera module do not change with the device where it is located, which improves the setting flexibility of the camera module as a collection device.
  • Once the camera module provided with the depth camera and the RGB camera is assembled, the relative arrangement of the two cameras is also determined.
  • the calibration parameters can be stored in the storage module of the camera module.
  • the internal parameters and external parameters of the depth camera and RGB camera can be calibrated for the camera module respectively. It is also possible to calibrate the internal parameters of the depth camera and RGB camera before assembling them into a camera module, and calibrate the external parameters of the depth camera and RGB camera after assembling them into a camera module.
  • RGB cameras can obtain their own internal parameters after they leave the factory and before they are assembled into a camera module. In this way, after assembly, only the external parameters that indicate the relative orientation of each camera need to be calibrated to improve the efficiency of parameter calibration after assembly.
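  • As a hedged sketch of this calibration workflow (using OpenCV; it assumes both cameras can image a common calibration target and that matching corner detections are already available, which the patent does not specify), the intrinsics can be calibrated per camera and the extrinsics calibrated after the module is assembled:

```python
import cv2
import numpy as np

def calibrate_module(obj_pts, img_pts_depth, img_pts_rgb, image_size):
    """Calibrate intrinsics of each camera separately, then the extrinsics
    (R, T) between them after assembly.

    obj_pts:        list of (N, 3) calibration-target corner coordinates.
    img_pts_depth:  matching (N, 2) corner detections in depth camera images.
    img_pts_rgb:    matching (N, 2) corner detections in RGB camera images.
    image_size:     (width, height) of the calibration images.
    """
    # Per-camera intrinsic calibration (can be done before module assembly).
    _, K_d, dist_d, _, _ = cv2.calibrateCamera(obj_pts, img_pts_depth, image_size, None, None)
    _, K_rgb, dist_rgb, _, _ = cv2.calibrateCamera(obj_pts, img_pts_rgb, image_size, None, None)

    # Extrinsic calibration of the assembled module: R, T map points from the
    # depth camera frame to the RGB camera frame; intrinsics are kept fixed.
    _, _, _, _, _, R, T, _, _ = cv2.stereoCalibrate(
        obj_pts, img_pts_depth, img_pts_rgb,
        K_d, dist_d, K_rgb, dist_rgb, image_size,
        flags=cv2.CALIB_FIX_INTRINSIC)
    return K_d, dist_d, K_rgb, dist_rgb, R, T
```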
  • A point light source array is provided in the depth camera. Correspondingly, using the depth camera to collect a sparse depth map with a resolution of p based on the IToF principle includes: emitting, through the point light source array, detection light with a first phase toward the target area and acquiring the reflected light with a second phase; and determining the sparse depth map with resolution p based at least on the difference between the grayscale image of the first phase of the detection light and the grayscale image of the second phase of the reflected light.
  • The depth camera can obtain the phase change between the emitted light and the reflected light by collecting the light emitted by the separated point light sources and reflected by the target area or object. A depth map can then be obtained by performing depth processing based on this phase change. For example, the phase change yields the time gap between emission of the light and reception of its reflection, and based on this time gap the depth information of the target area or target object can be determined to obtain the depth map.
  • The photoelectric signal reading circuit is controlled by a read control signal to output the pixel signal, where the pixel signal of each pixel unit includes a first phase pixel signal, a second phase pixel signal, a third phase pixel signal and a fourth phase pixel signal, and the read control signals corresponding to the four phase pixel signals differ from one another by fixed phase offsets.
  • The processing unit generates the sparse depth map according to the first, second, third and fourth phase pixel signals, where the phase of the read control signal corresponding to the first phase pixel signal is the same as the phase of the emission pulse.
  • the depth camera with the speckle light source obtains the depth information of the target object or the target area based on the IToF principle, so that its cost is low.
  • the speckle light source is beneficial to ensure the quality of the depth information.
  • speckle IToF cameras have lower light emission power, higher energy density, and longer detection distances. In other words, although the depth map collected by the speckle IToF camera is sparse, the separated point light source array ensures the low cost of the camera and the quality of the depth information at the same time.
  • the speckle IToF camera may include a transmitting module, a receiving module and a processing unit.
  • the emission module can be used to emit the speckle array (array of point light sources) to the target.
  • the speckle array may include p speckles spaced apart from each other.
  • The receiving module may include an image sensor, the image sensor may include a sensor array, and the sensor array may include m*n pixel units, where each pixel unit includes a CMOS photodiode and a photoelectric signal reading circuit. The photodiode is used to receive the speckle array reflected by the target and generate a corresponding photocurrent signal, and the current intensity indicated by the photocurrent signal is positively correlated with the light intensity of the beam received by the photodiode.
  • the photoelectric signal reading circuit is used to read the photocurrent signal and output the corresponding pixel signal;
  • the processing unit is used to receive the pixel signal and generate a sparse depth map according to the pixel signal.
  • the number p of speckles indicates the resolution of the sparse depth map.
  • The processing unit is also used to align the RGB image with a resolution of a*b with the sparse depth map, and to use the pre-trained image fusion model to fuse the aligned sparse depth map and RGB image to obtain a dense depth map, where the resolution of the dense depth map is a*b.
  • CMOS photodiodes have a low cost, and the performance of the CMOS photodiode is sufficient to guarantee the quality of the IToF measurement.
  • The resolution of the sparse depth map refers to the number of depth image points or the number of depth values; that is, the number p of speckles, or a value close to the number of speckles, indicates the resolution of the sparse depth map.
  • the emission module includes a light-emitting array containing q light-emitting points and a light-emitting driving circuit.
  • the resolution of a two-dimensional image is characterized by the number of pixels in two dimensions, for example, an RGB image of a*b.
  • the dense fusion image obtained by adopting the fusion method of the embodiment of the present invention includes a*b pixels, and each pixel has depth information. Therefore, a*b indicates the resolution of the dense fusion image.
  • The image fusion model is trained as follows: obtain training samples, where the training samples include aligned sparse depth image samples with resolution p, RGB image samples with resolution a*b, and dense depth image samples with resolution a*b; then take the aligned sparse depth image samples and RGB image samples as input and use the dense depth image samples as the supervision condition to train the target neural network, obtaining the image fusion model.
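  • A minimal supervised training loop for such a model might look as follows; this is a sketch only, since the optimizer, loss, network and the toy tensors standing in for the captured samples are assumptions not specified in the patent:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in data: RGB (3 ch), aligned sparse depth (1 ch), dense ground truth (1 ch).
# In practice these would come from the calibrated camera module and a reference
# dense depth source (assumption; the patent does not say how they are captured).
N, H, W = 16, 64, 64
rgb = torch.rand(N, 3, H, W)
sparse = torch.zeros(N, 1, H, W)
dense_gt = torch.rand(N, 1, H, W)
loader = DataLoader(TensorDataset(rgb, sparse, dense_gt), batch_size=4, shuffle=True)

# Any dense-prediction network works here; a single conv keeps the sketch short.
model = nn.Conv2d(4, 1, kernel_size=3, padding=1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.L1Loss()  # the dense depth samples act as the supervision condition

for epoch in range(5):
    for rgb_b, sparse_b, gt_b in loader:
        pred = model(torch.cat([rgb_b, sparse_b], dim=1))
        loss = loss_fn(pred, gt_b)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```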
  • the depth camera and the RGB camera that collect the training samples can be calibrated to obtain various camera parameters, and the sparse depth image samples and RGB image samples can be aligned according to each camera parameter.
  • the camera parameters of the depth camera and the RGB camera that collect the training samples may be the same as the camera parameters of the depth camera and the RGB camera that collect the image to be fused, and of course, the parameters of the two cameras may also be different.
  • the above-mentioned training samples can be obtained by collecting the above-mentioned camera modules including a depth camera and an RGB camera.
  • the camera parameters of the depth camera and the RGB camera that collect the training samples may be the same as those of the depth camera and the RGB camera that collect the images to be fused.
  • Fusing the aligned sparse depth map and RGB image to obtain a dense depth map with a resolution of a*b may include: inputting the aligned sparse depth map and RGB image into the pre-trained image fusion model to obtain the dense depth map with a resolution of a*b.
  • the image fusion model as an end-to-end neural network model, improves image fusion efficiency and improves data processing efficiency on the premise of ensuring image fusion accuracy.
  • The depth image fusion method further includes: acquiring an image acquisition instruction of a three-dimensional image application program installed in the terminal device, where the image acquisition instruction instructs the depth camera and the RGB camera to respectively acquire the sparse depth map and the RGB image; and returning the dense depth map to the 3D image application.
  • the 3D image application may include any one of an image background blur application, a 3D image reconstruction application, a virtual reality application or an augmented reality application.
  • an operating system may be installed in the terminal device, and the 3D image application runs on the operating system.
  • Operating systems include but are not limited to embedded operating systems, real-time operating systems, and the like.
  • the three-dimensional image application program may be a system application program or a third-party application program.
  • a camera module including a depth camera and an RGB camera may respond to an image acquisition instruction of a three-dimensional image application program to start image acquisition.
  • the three-dimensional image application program can issue an image acquisition instruction (in response to a user instruction or other associated instructions, etc.).
  • the 3D image application can call the image fusion model to input the aligned sparse depth map and RGB map into the image fusion model to obtain a dense depth map.
  • The 3D image application of this example leverages the depth image fusion method to provide a richer 3D image user experience.
  • the depth image fusion solution of an embodiment of the present invention has been described and illustrated in detail and generally in conjunction with FIGS. 1-3 above.
  • the depth image fusion method according to other embodiments of the present invention will be described and illustrated below with reference to FIG. 4 and FIG. 5 .
  • FIG. 4 is a schematic flowchart of a method for training an image fusion model according to another embodiment of the present invention.
  • S410 Calibrate the parameters of the depth camera and the RGB camera to obtain calibration parameters.
  • the calibration parameters indicate the transformation relationship between the camera coordinate system and the world coordinate system.
  • the calibration parameters include camera intrinsic parameters and camera extrinsic parameters.
  • camera extrinsic parameters indicate the mapping from the world coordinate system to the camera coordinate system
  • camera intrinsic parameters indicate the mapping from the camera coordinate system to the image coordinate system.
  • the calibration of the parameters of the depth camera and the RGB camera can be performed before performing image acquisition. The obtained calibration parameters may be stored in advance, and then the pre-stored calibration parameters may be obtained.
  • S420 Collect sparse depth image samples and RGB image samples.
  • the depth camera and RGB camera that collect training samples can be calibrated to obtain the parameters of each camera, and the sparse depth image samples and RGB image samples can be aligned according to each camera parameter.
  • the camera parameters of the depth camera and the RGB camera that collect the training samples may be the same as the camera parameters of the depth camera and the RGB camera that collect the image to be fused, and of course, the parameters of the two cameras may also be different.
  • the above-mentioned training samples can be obtained by collecting the above-mentioned camera modules including a depth camera and an RGB camera.
  • the camera parameters of the depth camera and the RGB camera that collect the training samples may be the same as those of the depth camera and the RGB camera that collect the images to be fused.
  • camera parameters are used to align sparse depth image samples and RGB image samples, so that the matching degree between sparse depth image samples and RGB image samples is high, and the fusion accuracy of the trained image fusion model is improved.
  • the aligned sparse depth image samples and RGB image samples can be used as input, and the dense depth image samples can be used as supervision conditions to train the target neural network to obtain an image fusion model.
  • Fig. 5 is a schematic flowchart of a depth image fusion method according to another embodiment of the present invention.
  • S510 Use the depth camera to collect a sparse depth image based on the IToF principle, and use the RGB camera to collect the RGB image.
  • the RGB image can be obtained by collecting the target area through an RGB camera, and the pixels of the above three color channels are used to record the imaging results of the target area.
  • a sparse depth map may have resolution p and an RGB map may have resolution a*b.
  • The resolution of the sparse depth map refers to the number of depth image points or the number of depth values; that is, the number p of speckles, or a value close to the number of speckles, indicates the resolution of the sparse depth map. For example, the emission module includes a light-emitting array containing q light-emitting points and a light-emitting driving circuit.
  • the resolution of a two-dimensional image is characterized by the number of pixels in two dimensions, for example, an RGB image of a*b.
  • the dense fusion image obtained by adopting the fusion method of the embodiment of the present invention includes a*b pixels, and each pixel has depth information. Therefore, a*b indicates the resolution of the dense fusion image.
  • the sparse depth map can be obtained through the image acquisition of the depth camera based on the IToF principle.
  • the depth camera may be provided with a speckle light source, ie a light source formed by an array of separate point light sources.
  • The point light source array can be used to emit detection light toward the target area and obtain the reflected light of that detection light; the change between the reflected light and the detection light is then applied to the IToF principle to obtain the sparse depth map.
  • S520 Align the sparse depth image and the RGB image according to the camera parameters calibrated for the depth camera and the RGB camera.
  • the calibration parameters indicate the transformation relationship between the camera coordinate system and the world coordinate system.
  • the calibration parameters include camera intrinsic parameters and camera extrinsic parameters.
  • camera extrinsic parameters indicate the mapping from the world coordinate system to the camera coordinate system
  • camera intrinsic parameters indicate the mapping from the camera coordinate system to the image coordinate system.
  • the calibration of the parameters of the depth camera and the RGB camera can be performed before performing image acquisition. The obtained calibration parameters may be stored in advance, and then the pre-stored calibration parameters may be obtained.
  • the aligned sparse depth map and RGB map are input to a pre-trained image fusion model to obtain a dense depth map.
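  • As a hedged inference sketch (the model file name, packaging format and tensor layout are assumptions; the patent only states that the aligned images are fed to the pre-trained end-to-end model), the fusion step could look like this:

```python
import torch

a, b = 480, 640                    # example RGB resolution (illustrative values)
rgb = torch.rand(1, 3, a, b)       # image from the RGB camera
sparse = torch.zeros(1, 1, a, b)   # aligned sparse depth, zeros between speckles

# "image_fusion_model.pt" is a hypothetical file name for the pre-trained
# end-to-end fusion network saved as TorchScript.
model = torch.jit.load("image_fusion_model.pt")
model.eval()

with torch.no_grad():
    dense_depth = model(torch.cat([rgb, sparse], dim=1))  # (1, 1, a, b)
```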
  • FIG. 6 is a schematic block diagram of a terminal device; the actions and steps of its components correspond to the descriptions given with reference to FIGS. 1-3.
  • The terminal device in FIG. 6 includes:
  • the depth camera 610 collects a sparse depth map based on the IToF principle.
  • the sparse depth map can be obtained by image acquisition by the depth camera based on the IToF principle.
  • the depth camera may be provided with a speckle light source, ie a light source formed by an array of separate point light sources.
  • a depth camera may also be called a speckle (Spot) IToF camera.
  • the point light source projected by the speckle IToF camera is sparse (speckle).
  • the obtained depth map is sparse, and the degree of sparseness of the depth map collected by the speckle IToF camera depends on the number of points of the speckle light source.
  • the RGB camera 620 collects RGB images.
  • RGB refers to the three color channels of red, green and blue.
  • the RGB camera is a camera that collects images based on the RGB color mode, and the images collected by the RGB camera are RGB images.
  • the RGB color mode is a color standard in the industry, and various colors are obtained by changing the three color channels of red, green, and blue and superimposing them with each other.
  • the RGB image can be obtained by collecting the target area with an RGB camera, and the pixels of the above three color channels record the imaging result of the target area.
  • the memory 630 stores a pre-trained image fusion model.
  • The memory may store an operating system and application programs running on the operating system.
  • the depth camera and the RGB camera can obtain the image acquisition instruction of the operating system or the application program through the processor, execute the corresponding image acquisition function and call the image fusion model.
  • the processor 640 aligns the sparse depth map and the RGB image, and uses the image fusion model to fuse the aligned sparse depth map and the RGB image to obtain a dense depth map.
  • the training samples of the image fusion model are aligned sparse depth image samples and RGB image samples.
  • Sparse depth image samples and RGB image samples can be acquired by depth camera and RGB camera respectively.
  • the depth camera can be the same camera or the same type of camera as the camera that collects the sparse depth map (image to be fused), and the RGB camera can also be the same camera or the same type of camera as the camera that collects the RGB image (image to be fused).
  • the matching degree between the training sample data and the data to be fused is high, which can improve the image fusion effect of the model.
  • the depth camera may include a transmitting module, a receiving module and a processing unit.
  • the emission module can be used to emit the speckle array (array of point light sources) to the target.
  • the speckle array may include p speckles spaced apart from each other.
  • the receiving module may include an image sensor comprising a sensor array of m*n pixel units, where each pixel unit includes a CMOS photodiode and a photoelectric signal reading circuit; the photodiode receives the speckle array reflected by the target object and generates a corresponding photocurrent signal, whose indicated current intensity is positively correlated with the light intensity of the beam received by the photodiode.
  • the photoelectric signal reading circuit reads the photocurrent signal and outputs a corresponding pixel signal; the processing unit receives the pixel signals and generates a sparse depth map from them, the number p of speckles indicating the resolution of the sparse depth map; the processing unit also aligns an RGB image of resolution a*b with the sparse depth map and, using the pre-trained image fusion model, fuses the aligned sparse depth map and RGB image to obtain a dense depth map of resolution a*b.
  • the photoelectric signal reading circuit can be controlled by a read control signal to output the pixel signal, where the pixel signal of each pixel unit includes a first-phase pixel signal, a second-phase pixel signal, a third-phase pixel signal and a fourth-phase pixel signal.
  • the read control signals corresponding to the first-, second-, third- and fourth-phase pixel signals differ in phase by the amounts given in the formula of the published application (a four-phase read-out; see the four-phase IToF sketch after this list).
  • the processing unit generates the sparse depth map from the first-, second-, third- and fourth-phase pixel signals; the read control signal corresponding to the first-phase pixel signal has the same phase as the emission pulse.
  • the depth camera can collect a sparse depth map at low cost: acquiring a sparse depth map based on the IToF principle is inexpensive, which reduces the cost of obtaining image depth information and makes the scheme suitable for low-cost terminal devices such as mobile phones.
  • fusing the aligned sparse depth map and RGB image improves the accuracy of image fusion and the efficiency of image processing.
  • fusing the sparse depth map and the RGB image yields a higher-precision dense depth map, which enriches the usage scenarios of terminal devices and improves user experience.
  • because RGB cameras are widely deployed, the RGB camera can be reused in terminal devices that already have one; in other words, in application scenarios that do not require depth images, the RGB camera can still be used for conventional image capture.
  • since the solution of the embodiment realizes a low-cost depth camera, the depth camera (a high-performance hardware product) and the image fusion model (a high-performance software product) can be combined within the terminal-device supply chain; in other words, the same downstream supplier can provide the depth camera and the image fusion model to upstream manufacturers as a single high-performance image-processing solution while maintaining the production efficiency of the whole chain.
  • the processor is specifically configured to: align the sparse depth image and the RGB image according to the camera parameters calibrated for the depth camera and the RGB camera.
  • the depth camera and the RGB camera are set in the camera module, and the camera parameters are obtained through camera calibration based on the camera module.
  • a point light source array is provided in the depth camera.
  • the depth camera is specifically configured to: use the point light source array to emit detection light with a first phase toward the target area, obtain the reflected light with a second phase, and determine the sparse depth map at least from the difference between the grayscale image of the first phase of the detection light and the grayscale image of the second phase of the reflected light.
  • the photoelectric signal reading circuit is controlled by a read control signal to output the pixel signal, where the pixel signal of each pixel unit includes a first-phase, a second-phase, a third-phase and a fourth-phase pixel signal.
  • the read control signals corresponding to the four phase pixel signals differ in phase by the amounts given in the formula of the published application (a four-phase read-out).
  • the processing unit generates the sparse depth map from the four phase pixel signals; the read control signal corresponding to the first-phase pixel signal has the same phase as the emission pulse.
  • the image fusion model is obtained by training as follows: obtain training samples, which include aligned sparse depth map samples and RGB image samples together with dense depth map samples; using the aligned sparse depth map and RGB image samples as input and the dense depth map samples as supervision, train the target neural network to obtain the image fusion model (see the training sketch after this list).
  • the processor is specifically configured to: input the aligned sparse depth image and RGB image to a pre-trained image fusion model to obtain a dense depth image.
  • the processor is further used to: acquire an image acquisition instruction from a three-dimensional image application installed in the terminal device, the instruction instructing the depth camera and the RGB camera to acquire the sparse depth map and the RGB image respectively; and return the dense depth map to the three-dimensional image application (see the end-to-end sketch after this list).
  • the 3D image application includes any one of an image background blurring application, a 3D image reconstruction application, a virtual reality application or an augmented reality application.
  • the terminal device in this embodiment is used to implement the corresponding methods in the foregoing method embodiments and has the beneficial effects of those embodiments, so details are not repeated here.
  • for the functional implementation of each module of the apparatus in this embodiment, reference may be made to the description of the corresponding parts in the foregoing method embodiments; details are likewise not repeated here.
  • the improvement of a technology can be clearly distinguished as an improvement in hardware (for example, improvements in circuit structures such as diodes, transistors, and switches) or improvements in software (improvement in method flow).
  • the improvement of many of today's method flows can be regarded as a direct improvement of the hardware circuit structure; designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit, so it cannot be said that an improvement of a method flow cannot be realized by hardware entity modules.
  • a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user programming the device.
  • designers program a digital system onto a single PLD themselves, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip; such programming is now mostly done with "logic compiler" software, and the source code must be written in a Hardware Description Language (HDL).
  • there is more than one HDL, for example ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most widely used.
  • the controller may be implemented in any suitable way; for example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20 and Silicone Labs C8051F320. A memory controller may also be implemented as part of the control logic of the memory.
  • in addition to implementing the controller purely as computer-readable program code, the method steps can be logically programmed so that the controller realizes the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller can therefore be regarded as a hardware component, and the means included in it for realizing various functions can also be regarded as structures within the hardware component; or the means for realizing various functions can even be regarded as both software modules implementing the method and structures within the hardware component.
  • a typical implementing device is a computer.
  • the computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
  • the embodiments of the present invention may be provided as methods, systems, or computer program products. Accordingly, the present invention can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • these computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • Memory may include non-permanent storage in computer-readable media, in the form of random access memory (RAM) and/or nonvolatile memory such as read-only memory (ROM) or flash RAM. Memory is an example of computer readable media.
  • computer-readable media include permanent and non-permanent, removable and non-removable media, and can implement information storage by any method or technology.
  • Information may be computer readable instructions, data structures, modules of a program, or other data.
  • examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • computer-readable media excludes transitory computer-readable media, such as modulated data signals and carrier waves.
  • the embodiments of the present invention may be provided as methods, systems or computer program products. Accordingly, the present invention can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • the invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer.
  • generally, program modules include routines, programs, objects, components, data structures and the like that perform particular tasks or implement particular abstract data types.
  • the invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices linked through a communications network.
  • program modules may be located in both local and remote computer storage media including storage devices.
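
The sketches below illustrate, in Python, the main steps summarized in the list above; they are editorial illustrations under stated assumptions, not the implementation disclosed in the application. First, the four-phase read-out: the formula image giving the exact phase offsets is not reproduced in this extract, so the sketch assumes the standard four-phase IToF scheme in which the read control signals are offset by 0, π/2, π and 3π/2 from the emission pulse (consistent with the statement that the first read signal is in phase with the emitted pulse); the modulation frequency and the sign convention of the arctangent are likewise assumptions.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s

def itof_depth(q0, q90, q180, q270, f_mod=20e6):
    """Per-speckle depth from the four phase pixel signals (illustrative only).

    q0..q270: pixel signals accumulated with read control signals offset by
    0, pi/2, pi and 3*pi/2 from the emission pulse (assumed four-phase scheme).
    f_mod: modulation frequency of the emitted light, an assumed value.
    """
    # Phase delay of the reflected speckle relative to the emitted pulse.
    phase = np.mod(np.arctan2(q90 - q270, q0 - q180), 2.0 * np.pi)
    # Phase -> fraction of a modulation period -> one-way distance.
    return (phase / (2.0 * np.pi)) * C / (2.0 * f_mod)
```

With the assumed f_mod of 20 MHz, the unambiguous range of this sketch is about 7.5 m; a real sensor would additionally correct for offsets, gain mismatch and ambient light.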
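Second, the alignment of the sparse depth map with the a*b RGB image using the calibration parameters (intrinsics and extrinsics). This is a minimal sketch: the function name, the pinhole model without lens distortion, and the specific matrix shapes are assumptions, not the calibration procedure of the application.

```python
import numpy as np

def align_sparse_depth_to_rgb(depth_pts, K_d, K_rgb, R, t, rgb_hw):
    """Project sparse depth samples (u, v, z) from the depth camera into the
    RGB image plane, producing an RGB-aligned sparse depth channel.

    depth_pts: (N, 3) array of pixel coords (u, v) and depth z in metres.
    K_d, K_rgb: 3x3 intrinsic matrices of the depth and RGB cameras.
    R, t: rotation (3x3) and translation (3,) from depth to RGB camera frame.
    rgb_hw: (height, width) of the RGB image, i.e. b x a.
    """
    u, v, z = depth_pts[:, 0], depth_pts[:, 1], depth_pts[:, 2]
    # Back-project the speckle samples to 3D points in the depth camera frame.
    pix = np.stack([u, v, np.ones_like(u)], axis=0)        # (3, N)
    xyz_d = np.linalg.inv(K_d) @ pix * z                    # (3, N)
    # Transform into the RGB camera frame using the extrinsics between cameras.
    xyz_rgb = R @ xyz_d + t[:, None]
    # Project with the RGB intrinsics and scatter into an a*b depth channel.
    uvw = K_rgb @ xyz_rgb
    u_r = np.round(uvw[0] / uvw[2]).astype(int)
    v_r = np.round(uvw[1] / uvw[2]).astype(int)
    h, w = rgb_hw
    aligned = np.zeros((h, w), dtype=np.float32)            # 0 = no speckle sample
    ok = (u_r >= 0) & (u_r < w) & (v_r >= 0) & (v_r < h) & (xyz_rgb[2] > 0)
    aligned[v_r[ok], u_r[ok]] = xyz_rgb[2, ok]
    return aligned
```

The zeros in the returned map mark RGB pixels that received no speckle sample; the fusion model sketched next densifies them.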
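Third, training the image fusion model. The application does not fix a particular network architecture (the description lists CNNs, feedforward networks, GANs and encoder-decoder networks as options), so the tiny CNN, the L1 loss and the layer sizes below are assumptions chosen only to show the input/supervision arrangement: aligned sparse depth plus RGB as input, dense depth samples as supervision.

```python
import torch
import torch.nn as nn

class FusionNet(nn.Module):
    """Minimal RGB + sparse-depth -> dense-depth network (illustrative only)."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, rgb, sparse_depth):
        # rgb: (B, 3, b, a); sparse_depth: (B, 1, b, a), zeros where no speckle.
        return self.body(torch.cat([rgb, sparse_depth], dim=1))

def train_step(model, optimizer, rgb, sparse_depth, dense_gt):
    """One supervised step: aligned sparse depth + RGB in, dense depth as label."""
    pred = model(rgb, sparse_depth)
    loss = nn.functional.l1_loss(pred, dense_gt)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```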
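Finally, a synthetic end-to-end run tying the sketches together, mirroring the flow triggered by a 3D application's image acquisition instruction: capture, align, fuse, and return a dense depth map at the RGB resolution a*b. All sizes (640*480 RGB, 1200 speckles) and the shared intrinsic matrix are invented for illustration; the example reuses align_sparse_depth_to_rgb and FusionNet from the sketches above.

```python
import numpy as np
import torch

rng = np.random.default_rng(0)
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
# Simulated speckle samples: pixel coordinates plus depth in metres.
pts = np.column_stack([rng.uniform(0, 640, 1200),
                       rng.uniform(0, 480, 1200),
                       rng.uniform(0.3, 5.0, 1200)])
sparse = align_sparse_depth_to_rgb(pts, K, K, np.eye(3), np.zeros(3), (480, 640))

model = FusionNet()  # untrained here; in practice load the pre-trained weights
rgb = torch.rand(1, 3, 480, 640)
dense = model(rgb, torch.from_numpy(sparse)[None, None].float())
print(dense.shape)  # torch.Size([1, 1, 480, 640]) -- same a*b resolution as the RGB image
```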

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Electromagnetism (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Optics & Photonics (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present invention provide a depth image acquisition apparatus, a fusion method and a terminal device. The depth image acquisition apparatus includes a transmitting module, a receiving module and a processing unit. The transmitting module is configured to emit a speckle array onto a target object, the speckle array including p speckles spaced apart from one another. The receiving module includes an image sensor. The processing unit is configured to receive pixel signals and generate a sparse depth map from them, the number of speckles p indicating the resolution of the sparse depth map; the processing unit is further configured to align an RGB image of resolution a*b with the sparse depth map and, using a pre-trained image fusion model, fuse the aligned sparse depth map and RGB image to obtain a dense depth map of resolution a*b. The solution of the embodiments reduces the cost of the terminal device while yielding a higher-precision dense depth map and enriching the usage scenarios of the terminal device.

Description

深度图像采集装置、融合方法和终端设备 技术领域
本发明实施例涉及图像处理领域,尤其涉及一种深度图像采集装置、融合方法和终端设备。
背景技术
一般而言,诸如手机的终端设备采用RGB相机进行图像采集,只能得到二维的平面信息,无法得到准确的深度信息,限制了终端设备的使用场景。
诸如激光雷达的传感器能够获得线扫描的稀疏深度图,进一步地,可以对稀疏深度图进行补全得到稠密深度图,以满足与三维图像相关的应用场景。
但是,激光雷达这样的传感器由于成本过高,对于手机这样的终端设备,需要一种更低成本的深度图像方案。
发明内容
有鉴于此,本发明实施例所解决的技术问题之一在于提供一种深度图像采集装置、融合方法和终端设备。
本发明实施例的第一方面提供了一种深度图像采集装置,包括:发射模组,用于发射散斑阵列到目标物,其中所述散斑阵列包括p个互相间隔的散斑;接收模组,所述接收模组包括图像传感器,所述图像传感器包括传感器阵列,所述传感器阵列包括m*n个像素单元,其中每个像素单元包括CMOS光电二极管和光电信号读取电路,所述光电二极管用于接收经所述目标物反射的所述散斑阵列,并根据所述散斑阵列生成对应的光电流信号,所述光电流信号指示的电流强度与所述光电二极管所接收光束照射的光强正相关,所述光电信号读取电路用于读取所述光电流信号并输出对应的像素信号;处理单元,用于接收所述像素信号并根据所述像素信号生成稀疏深度图,散斑的个数p指示所述稀疏深度图的分辨率,所述处理单元还用于将分辨率为a*b的RGB图像与所述稀疏深度图进行对齐,并利用预先训练的图像融合模型,将对齐后的稀疏深度图和RGB图进行融合,得到稠密深度图,其中所述稠密深度图的分辨率为a*b。
本发明实施例的第二方面提供了一种深度图像融合方法,所述深度图像融合方法应用于包括深度相机和RGB相机的终端设备,所述方法包括:利用所述深度相机基于IToF原理采集分辨率为p的稀疏深度图,并且利用所述RGB相机采集RGB图;对所述分辨率为p的稀疏深度图和所述分辨率为a*b的RGB图进行对齐;利用预先训练的图像融合模型,将对齐后的稀疏深度图和RGB图进行融合,得到分辨率为a*b的稠密深度图。
本发明实施例的第三方面提供了一种终端设备,包括:深度相机,基于IToF原理采集分辨率为p的稀疏深度图;RGB相机,采集分辨率为a*b的RGB图;存储器,存储预先训练的图像融合模型;处理器,对所述分辨率为p的稀疏深度图和所述分辨率为a*b的RGB图进行对齐,并且利用所述图像融合模型,将对齐后的稀疏深度图和RGB图进行融合,得到分辨率为a*b的稠密深度图。
在本发明实施例的方案中,基于IToF原理进行图像采集与稀疏深度图的获取两者都有利于降低图像深度信息的获取成本,换言之,本发明实施例的深度相机基于IToF原理采集稀疏深度图,有利于降低深度相机的配置成本,使这样的深度相机更适用于诸如手机的低成本终端设备,降低了终端设备的成本。此外,通过预先训练的图像融合模型,将对齐后的稀疏深度图和RGB图进行融合,提高了图像融合的精度,提高了图像处理的效率。此外,对稀疏深度图和RGB图进行融合,得到了更高精度的稠密深度图,丰富了终端设备的使用场景,改善了用户体验。
附图说明
后文将参照附图以示例性而非限制性的方式详细描述本发明实施例的一些具体实施例。附图中相同的附图标记标示了相同或类似的部件或部分。本领域技术人员应该理解,这些附图未必是按比值绘制的。附图中:
图1为本发明的一个实施例的深度图像采集装置的示意性框图;
图2A为本发明的另一实施例的深度图像融合方法的示意性流程图;
图2B为图2A的深度图像融合方法的一个示例的示意性框图;
图3为图2A的深度图像融合方法的一个示例的深度相机的散斑分布图的示意图;
图4为本发明的另一实施例的图像融合模型的训练方法的示意性流程图;
图5为本发明的另一实施例的深度图像融合方法的示意性流程图;以及
图6为本发明的另一实施例的终端设备的示意性框图。
具体实施方式
下面结合本发明实施例附图进一步说明本发明实施例具体实现。
本发明实施例的方案可以适用于任何具有数据处理能力的计算机设备,包括但不限于移动通信设备、超移动个人计算机设备、便携式娱乐设备和其他具有数据交互功能的终端设备。
一般而言,移动通信设备的特点是具备移动通信功能,并且以提供话音、数据通信为主要目标,包括:智能手机(例如iPhone)、多媒体手机、功能性手机,以及低端手机等。另外,超移动个人计算机设备属于个人计算机的范畴,有计算和处理功能,一般也具备移动上网特性,包括:PDA、MID和UMPC设备等,例如iPad。另外,便携式娱乐设备可以显示和播放多媒体内容,包括:音频、视频播放器(例如iPod),掌上游戏机,电子书,以及智能玩具和便携式车载导航设备。
一方面,各种终端设备都具有便携性(例如,设备小型化或设备可穿戴)和低成本的特点,同时人们期望终端设备还能够具有较强的图像处理能力,以提供更丰富的功能和更好的用户体验。
另一方面,终端设备的普及率较高,数量较大,并且已经逐渐形成了较成熟的终端设备生产和组装的上下游产业链。例如,终端设备所需的各种传感器硬件由终端设备组装厂商或代工厂商的下游硬件厂商专门提供。终端设备中具有较强的数据处理能力的软件算法(例如,操作系统或神经网络模型等)也由相应的下游软件厂商专门提供。这样一来,软件厂商与硬件厂商由于均向上游厂商提供相应的高性能软件产品或高性能硬件产品,并且软件厂商或硬件厂商自身往往也不会将对方的产品与自身的产品进行整合,同一下游供应商难以向上游供应商提供软件产品与硬件产品两者。换言之,这种技术上专业分工既使下游厂商能够提供更高性能的产品,又保证了终端设备整体的生产效率,进而满足了终端设备的设备性能和出货量。
在这样的技术背景下,本发明实施例提供了一种图像融合方案,下面将结合图1具体说明本发明的一个实施例的深度图像采集装置。
图1的深度图像采集装置,包括:
发射模组110,用于发射散斑阵列到目标物,其中散斑阵列包括p个互相间隔的散斑;
接收模组120,接收模组包括图像传感器,图像传感器包括传感器阵列,传感器阵列包括m*n个像素单元,其中每个像素单元包括CMOS光电二极管和光电信号读取电路,光电二极管用于接收经目标物反射的散斑阵列,并根据散斑阵列生成对应的光电流信号,光电流信号指示的电流强度与光电二极管所接收光束照射的光强正相关,光电信号读取电路用于读取光电流信号并输出对应的像素信号;
处理单元130,用于接收像素信号并根据像素信号生成稀疏深度图,稀疏深度图的分辨率指示散斑的个数p,处理单元还用于将分辨率为a*b的RGB图像与稀疏深度图进行对齐,并利用预先训练的图像融合模型,将对齐后的稀疏深度图和RGB图进行融合,得到稠密深度 图,其中稠密深度图的分辨率为a*b。
应理解,根据像素信号生成稀疏深度图,包括:通过点光源阵列针对目标物发出具有第一相位的散斑光线阵列,并且获取探测光线的具有第二相位的反射散斑光线阵列,并且至少基于散斑光线阵列的第一相位的灰度图和反射散斑光线阵列的第二相位的灰度图之间的差异,确定稀疏深度图。
还应理解,传统激光雷达(例如,dToF相机和LiDAR)通常采用雪崩式二极管(Avalanche Photo Diode,APD),例如,单光子雪崩式二极管,而本发明实施例采用了CMOS光电二极管,其成本较低,并且CMOS光电二极管的性能能够保证IToF测量的效果。
还应理解,稀疏深度图的分辨率是指深度图像点的数目或者有深度值的数目,即,散斑的个数p或者与散斑的个数近似的值指示稀疏深度图的分辨率,例如,发射模组包括含有q个发光点的发光阵列和发光驱动电路,发光驱动电路受控于发射脉冲信号而驱动q个发光点发光以产生p个散斑,其中p=s*q,s为大于或等于1的整数。但是通常二维图像的分辨率采用两个维度的像素数进行表征,例如,a*b的RGB图像。采用本发明实施例的融合方法得到的稠密融合图包括a*b个像素,每个像素都具有深度信息,因此,a*b指示稠密融合图具有的分辨率。
还应理解,本实施例中对目标区域的图像采集分为两个部分,即,利用深度相机基于间接测量飞行时间(Indirect Time of Flight,IToF)原理进行稀疏深度图的采集、以及利用RGB相机进行RGB图的采集。RGB即是代表红(red)、绿(green)、蓝(blue)这三个颜色通道的颜色。RGB相机为基于RGB色彩模式进行图像采集的相机,利用RGB相机采集到的图像为RGB图。具体而言,RGB色彩模式是工业界的一种颜色标准,是通过对红、绿、蓝这三个颜色通道的变化以及它们相互之间的叠加来得到各种颜色。RGB图可以通过RGB相机对目标区域采集得到,并且上述三个颜色通道的像素来记录目标区域的成像结果。本案RGB相机涵盖广义上的彩色相机,不一定要求RGB相机具有RGB滤光层,类似含有RGGB、RGBW、RYYB等彩色滤光阵列的图像传感器均适用于本发明实施例的深度图像融合方法。
还应理解,稀疏深度图可以通过深度相机基于IToF原理进行图像采集获得。本发明实施例的深度相机可以设置有散斑式光源,即,由分离的点光源阵列形成的光源。这种深度相机也可以被称为散斑(Spot)IToF相机。散斑IToF相机投射的点光源是稀疏的(散斑),相应地,获得的深度图是稀疏的,且散斑IToF相机采集到的深度图的稀疏程度取决于散斑光源的点数。
还应理解,一般而言,常规的面光源IToF相机基于IToF原理进行图像采集,但是面光源IToF相机探测距离十分有限,并且功耗较大。与面光源IToF相机不同,散斑IToF相机的光发射功率更低、能量密度更高、探测距离更远,能够得到更多深度信息的深度图。换言之,虽然散斑IToF相机采集到的深度图是稀疏,但是分离的点光源阵列更保证了这种相机的较低成本与深度信息的质量。
本发明实施例中的具有散斑式光源的深度相机(散斑IToF相机)与诸如激光雷达的传感器不同,例如,激光雷达基于直接测量飞行时间(Direct Time of Flight,DToF)原理获得用于诸如目标测距和目标跟踪等目的的深度信息,因此,这样的传感器的成本较高,传感器的物理尺寸较大,不适于低成本的终端设备,也不适用于便携式设备或可穿戴设备。具有散斑式光源的深度相机基于IToF原理获得目标物体或目标区域的深度信息,使得其成本较低,另外,散斑式光源有利于保证深度信息的质量。
在另一些示例中,光电信号读取电路受控于读取控制信号以输出像素信号,其中每个像素单元的像素信号包括第一相位像素信号、第二相位像素信号、第三相位像素信号和第四相位像素信号,其中得到第一相位像素信号、第二相位像素信号、第三相位像素信号和第四相位像素信号分别所对应的读取控制信号之间相位差依次为
Figure PCTCN2021111293-appb-000001
处理单元根据第一相位像素信号、第二相位像素信号、第三相位像素信号和第四相位像素信号生成稀疏深度图,得到第 一相位像素信号所对应的读取控制信号的相位与发射脉冲的相位相同。由此,可靠地实现了对稀疏深度图的IToF探测。
在另一些示例中,光电信号读取电路仅读取散斑照射到的像素行的所有像素单元。
在另一些示例中,处理单元具体用于:对稀疏深度图和RGB图进行对齐。
具体而言,这一过程也可以被称为图像配准。基于这样的配置,利用相机参数对稀疏深度图和RGB图进行对齐(alignment),使得稀疏深度图和RGB图之间的匹配度较高,提高了训练得到的图像融合模型的融合精度。对齐反映了稀疏深度图和RGB图各自的采集目标的对应性,将对齐后的图像进行融合时,参与融合的各个图像(对齐后的稀疏深度图和RGB图)的各个部分(例如,像素)是对应的,从而使得每个部分汇集了稀疏深度图的深度信息、以及RGB图中的非深度信息,得到可靠的融合深度图。
在另一些示例中,处理单元还用于:获取训练样本,训练样本包括对齐的分辨率为p的稀疏深度图样本和分辨率为a*b的RGB图样本、以及分辨率为a*b的稠密深度图样本。处理单元具体用于:以对齐的稀疏深度图样本和RGB图样本作为输入,以稠密深度图样本作为监督条件,对目标神经网络进行训练,得到图像融合模型。
应理解,应理解,可以对采集训练样本的深度相机和RGB相机进行标定,得到各个相机参数,可以根据各个相机参数,将稀疏深度图样本和RGB图样本进行对齐。采集训练样本的深度相机和RGB相机的相机参数可以与采集待融合图像的深度相机和RGB相机的相机参数相同,当然,两个相机参数也可以不同。
在一个具体的示例中,可以通过上述的包括深度相机和RGB相机在内的相机模组采集获得上述训练样本。在这种情况下,采集训练样本的深度相机和RGB相机的相机参数可以与采集待融合图像的深度相机和RGB相机的相机参数相同。
在另一些示例中,处理单元具体用于:将对齐后的稀疏深度图和RGB图输入到预先训练的图像融合模型,得到稠密深度图。
应理解,图像融合模型作为一种端到端的神经网络模型,提高了图像融合效率,在保证图像融合精度的前提下提高了数据处理效率。
在另一些示例中,处理单元还用于:获取终端设备中安装的三维图像应用程序的图像获取指令,图像获取指令指示接收模组和发射模组分别对稀疏深度图和RGB图进行采集;将稠密深度图返回到三维图像应用程序,以使三维图像应用程序基于稠密深度图获取三维图像信息。
应理解,三维图像应用程序可以包括图像背景虚化应用、三维图像重建应用、虚拟现实应用或增强现实应用中的任一者。
例如,终端设备中可以安装有操作系统,三维图像应用程序运行与操作系统上。操作系统包括但不限于嵌入式操作系统、实时操作系统等。三维图像应用程序可以为系统应用程序,也可以为第三方应用程序。例如,包括深度相机和RGB相机的相机模组可以响应三维图像应用程序的图像获取指令开始执行图像的采集。
三维图像应用程序可以(响应用户指令或其他关联指令等)发出图像获取指令。三维图像应用程序可以调用图像融合模型将对齐的稀疏深度图和RGB图输入到图像融合模型中,得到稠密深度图。
下面将结合图2A具体说明本发明的一个实施例的深度图像融合方法。图2A的深度图像融合方法包括:
S220:利用深度相机基于IToF原理采集分辨率为p的稀疏深度图,并且利用RGB相机采集分辨率为a*b的RGB图。
应理解,RGB即是代表红(red)、绿(green)、蓝(blue)这三个颜色通道的颜色。RGB相机为基于RGB色彩模式进行图像采集的相机,利用RGB相机采集到的图像为RGB图。具体而言,RGB色彩模式是工业界的一种颜色标准,是通过对红、绿、蓝这三个颜色通 道的变化以及它们相互之间的叠加来得到各种颜色。RGB图可以通过RGB相机对目标区域采集得到,并且上述三个颜色通道的像素来记录目标区域的成像结果。本案RGB相机涵盖广义上的彩色相机,不一定要求RGB相机具有RGB滤光层,类似含有RGGB、RGBW、RYYB等彩色滤光阵列的图像传感器均适用于本发明实施例的深度图像融合方法。
还应理解,稀疏深度图可以通过深度相机基于IToF原理进行图像采集获得。本发明实施例的深度相机可以设置有散斑式光源,即,由分离的点光源阵列形成的光源。这种深度相机也可以被称为散斑(Spot)IToF相机。散斑IToF相机投射的点光源是稀疏的(散斑),相应地,获得的深度图是稀疏的,且散斑IToF相机采集到的深度图的稀疏程度取决于散斑光源的点数。
还应理解,一般而言,常规的面光源IToF相机基于IToF原理进行图像采集,但是面光源IToF相机探测距离十分有限,并且功耗较大。与面光源IToF相机不同,散斑IToF相机的光发射功率更低、能量密度更高、探测距离更远,能够得到更多深度信息的深度图。换言之,虽然散斑IToF相机采集到的深度图是稀疏,但是分离的点光源阵列更保证了这种相机的较低成本与深度信息的质量。
本发明实施例中的具有散斑式光源的深度相机(散斑IToF相机)与诸如激光雷达的传感器不同,例如,激光雷达基于直接测量飞行时间(Direct Time of Flight,DToF)原理获得用于诸如目标测距和目标跟踪等目的的深度信息,因此,这样的传感器的成本较高,传感器的物理尺寸较大,不适于低成本的终端设备,也不适用于便携式设备或可穿戴设备。具有散斑式光源的深度相机基于IToF原理获得目标物体或目标区域的深度信息,使得其成本较低,另外,散斑式光源有利于保证深度信息的质量。
还应理解,稀疏深度图的分辨率是指深度图像点的数目或者有深度值的数目,即,散斑的个数p或者与散斑的个数近似的值指示稀疏深度图的分辨率,例如,发射模组包括含有q个发光点的发光阵列和发光驱动电路,发光驱动电路受控于发射脉冲信号而驱动q个发光点发光以产生p个散斑,其中p=s*q,s为大于或等于1的整数。但是通常二维图像的分辨率采用两个维度的像素数进行表征,例如,a*b的RGB图像。采用本发明实施例的融合方法得到的稠密融合图包括a*b个像素,每个像素都具有深度信息,因此,a*b指示稠密融合图具有的分辨率。
S220:对分辨率为p的稀疏深度图和分辨率为a*b的RGB图进行对齐。
应理解,文中的对齐的目的至少为使得深度相机采集到的深度图与RGB相机采集的RGB图关于同一目标采集区域进行融合。
还应理解,可以采用深度相机和RGB相机的标定参数,对稀疏深度图和RGB图进行对齐。由于深度相机和RGB相机基于各自的本地坐标系进行图像采集。在对多个(两个或以上)图像的融合时,需要将多个图像对齐到同一坐标系,并且认为各个图像在同一坐标系下的位置坐标指示且对应于世界坐标系下的同一空间位置,从而基于对应的位置关系,将多个图像进行融合。此外,该同一坐标系可以为任一相机的本地坐标系,也可以为世界坐标系。此外,不同相机的设置位置或角度(空间方位)不同,采集到相应的图像通常不对应于同一坐标系,通过各个相机的相机参数(例如,内参和外参)可以获得各个相机的本地坐标系与世界坐标系的变换关系,从而根据各个相机的相机参数能够使各个相机采集到的图像对齐,换言之,可以使稀疏深度图和RGB图进行对齐。
此外,也可以基于稀疏深度图和RGB图基于图像信息进行对齐。例如,可以确定稀疏深度图和RGB图与相同目标区域对应各自的位置特征,并且根据各自的位置特征进行图像融合。
S230:利用预先训练的图像融合模型,将对齐后的稀疏深度图和RGB图进行融合,得到分辨率为a*b的稠密深度图。
应理解,图像融合模型的训练样本为对齐后的稀疏深度图样本和RGB图样本。稀疏深度图样本和RGB图样本可以分别由深度相机和RGB相机采集获得。深度相机可以与采集稀疏 深度图(待融合图像)的相机为同一相机或同一类型的相机,RGB相机也可以与采集RGB图(待融合图像)的相机为同一相机或同一类型的相机。当采集训练样本的相机与采集待融合图像的相机为同一类型的相机时,训练样本数据与待融合数据匹配度较高,能够使提高模型的图像融合效果。
还应理解,本发明实施例的图像融合模型可以为端到端的神经网络模型,换言之,图像融合模型的输入为待融合图像,图像融合模型的输出为融合后的图像。待融合图像包括具有深度信息的稀疏深度图以及具有不同颜色通道信息的RGB图,通过图像融合能够将上述图像信息进行图像补全,得到稠密深度图。
还应理解,利用上述训练样本可以对神经网络进行训练,即可得到本发明的各个实施例的图像融合模型。本实施例的神经网络包括但不限于卷积神经网络(Convolutional Neural Networks,CNN)、前馈神经网络(feedforward neural network)、生成对抗网络(Generative Adversarial Networks,GAN)、诸如变换器transformer的编码器解码器(encoder-decoder)网络。另外,本发明的各个实施例的训练方式包括但不限于监督学习、非监督学习和半监督学习。
在本发明实施例的方案中,由于深度相机能够采集稀疏深度图,基于IToF原理采集稀疏深度图深度相机的成本较低,降低了图像深度信息的获取成本,能够适用于诸如手机的低成本终端设备。此外,通过预先训练的图像融合模型,将对齐后的稀疏深度图和RGB图进行融合,提高了图像融合的精度,提高了图像处理的效率。此外,对稀疏深度图和RGB图进行融合,得到了更高精度的稠密深度图,丰富了终端设备的使用场景,改善了用户体验。
此外,由于RGB相机的普及率较高,能够在设置有RGB相机的终端设备中实现了RGB相机的复用,换言之,在不需要深度图像的应用场景中,仍然可以采用RGB相机执行常规的图像采集。
此外,由于本发明实施例的方案实现了低成本的深度相机,使得在终端设备的产业链条中,作为高性能硬件产品的深度相机与高性能软件产品的图像融合模型能够融合到一起,换言之,可以由同一下游厂商将深度相机和图像融合模型作为高性能图像处理方案一起提供到上游厂商,同时保证了整个产业链条的生产效率。
下面结合图2B对深度图像融合方法进一步说明。图2B示出了图2A的深度图像融合方法的一个例子的示意性框图。如图2B所示,可以通过RGB相机数据,得到目标区域的RGB图,例如,作为二维彩色图像的RGB图。此外,可以通过深度相机采集稀疏深度图,例如,可以根据深度相机采集到的散斑分布图进行图像深度处理,得到稀疏深度图。然后,利用预先训练的图像融合模型对RGB图和稀疏深度图进行图像融合处理,得到稠密深度图。
应理解,文中的深度相机可以包括发射模组、接收模组和处理单元。发射模组可以用于发射散斑阵列(点光源阵列)到目标物。散斑阵列可以包括p个互相间隔的散斑。此外,接收模组可以包括图像传感器,图像传感器可以包括传感器阵列,传感器阵列可以包括m*n个像素单元,其中每个像素单元包括CMOS光电二极管和光电信号读取电路,光电二极管用于接收经目标物反射的散斑阵列,并根据散斑阵列生成对应的光电流信号,光电流信号指示的电流强度与光电二极管所接收光束照射的光强正相关。光电信号读取电路用于读取光电流信号并输出对应的像素信号;处理单元,用于接收像素信号并根据像素信号生成稀疏深度图,散斑的个数p指示稀疏深度图的分辨率,处理单元还用于将分辨率为a*b的RGB图像与稀疏深度图进行对齐,并利用预先训练的图像融合模型,将对齐后的稀疏深度图和RGB图进行融合,得到稠密深度图,其中稠密深度图的分辨率为a*b。
还应理解,发射模组可以包括含有q个发光点的发光阵列和发光驱动电路,发光驱动电路受控于发射脉冲信号而驱动q个发光点发光以产生p个散斑,其中p=s*q,s为大于或等于1的整数。
还应理解,光电信号读取电路可以受控于读取控制信号以输出像素信号,其中每个像素 单元的像素信号包括第一相位像素信号、第二相位像素信号、第三相位像素信号和第四相位像素信号,其中得到第一相位像素信号、第二相位像素信号、第三相位像素信号和第四相位像素信号分别所对应的读取控制信号之间相位差依次为
Figure PCTCN2021111293-appb-000002
处理单元根据第一相位像素信号、第二相位像素信号、第三相位像素信号和第四相位像素信号生成稀疏深度图,得到第一相位像素信号所对应的读取控制信号的相位与发射脉冲的相位相同。
图3示出了散斑分布图的示意图。该散斑分布图为深度相机中设置的点光源阵列采集的图像。另外,图像中点光源阵列的光线经由目标区域或目标物体反射的光线的分布图与该散斑分布图对应。这样的点光源阵列中的每个点光源射出的光线汇聚性远远优于面光源,深度相机利用这样的点光源阵列基于低成本的IToF处理模块能够得到具有高质量的深度信息的稀疏深度图。
在另一些示例中,对分辨率为p的稀疏深度图和所述分辨率为a*b的RGB图进行对齐,包括:根据针对深度相机和RGB相机标定的相机参数,对分辨率为p的稀疏深度图和分辨率为a*b的RGB图进行对齐,这一过程也可以被称为图像配准。基于这样的配置,利用相机参数对稀疏深度图和RGB图进行对齐(alignment),使得稀疏深度图和RGB图之间的匹配度较高,提高了训练得到的图像融合模型的融合精度。从直观角度来看,对齐反映了稀疏深度图和RGB图各自的采集目标的对应性,将对齐后的图像进行融合时,参与融合的各个图像(对齐后的稀疏深度图和RGB图)的各个部分(例如,像素)是对应的,从而使得每个部分汇集了稀疏深度图的深度信息、以及RGB图中的非深度信息,得到可靠的融合深度图。
具体而言,标定参数指示相机坐标系与世界坐标系之间的变换关系,通过标定参数,对稀疏深度图和RGB图进行对齐,能够提高稀疏深度图和RGB图的匹配度。标定参数包括相机内参和相机外参,一般而言,相机外参指示从世界坐标系到相机坐标系的映射,相机内参指示从相机坐标系到图像坐标系的映射。另外,对深度相机和RGB相机的参数的标定可以在执行图像采集之前执行。可以预先存储获得的标定参数,然后获得预先存储的标定参数。
在另一些示例中,深度相机和RGB相机设置在相机模组中,相机参数基于相机模组进行相机标定得到。
具体而言,深度相机和RGB相机可以组合或组装成一个相机模组,继而相机模组可以作为一个整体部件组装到终端设备中,以提高设备组装效率。换言之,相机模组可以作为独立的组件设置在不同的设备中,相机模组的标定参数不随着所处的设备而变化,提高了作为采集装置的相机模组的设置灵活性。
另外,一旦深度相机和RGB相机的标定参数是确定的,设置有深度相机和RGB相机的相机模组也是确定的。此外,可以将标定参数存储在相机模组的存储模块中。具体而言,可以针对相机模组分别对深度相机和RGB相机的内参和外参进行标定。也可以在深度相机和RGB相机组装成相机模组之前,分别对深度相机和RGB相机的内参进行标定,在组装成相机模组之后,对深度相机和RGB相机的外参进行标定,深度相机和RGB相机在出厂后并且组装成相机模组前的情况下,可以获得各自的内参,这样在组装后只需要标定指示各个相机相对方位关系的外参,以提高组装后的参数标定效率。
在另一些示例中,深度相机中设置有点光源阵列,相应地,利用深度相机基于IToF原理采集分辨率为p的稀疏深度图,包括:通过点光源阵列针对目标区域发出具有第一相位的探测光线,并且获取探测光线的具有第二相位的反射光线;至少基于探测光线的第一相位的灰度图和反射光线的第二相位的灰度图之间的差异,确定分辨率为p的稀疏深度图。
换言之,作为应用IToF原理采集深度图像的一个示例,深度相机通过采集分离的点光源射出的光线由目标区域或目标物体反射的光线,深度相机可以获得射出光线和反射光线之间的相位变化信息,进一步地,基于相位变化信息进行深度处理,能够得到深度图。例如,可以基于相比变化信息,可以发射射出关系和接收反射光线之间的时间间隙信息。基于该时间间隙信息,能够确定目标区域或目标物体的深度信息,得到深度图。
具体而言,光电信号读取电路受控于读取控制信号以输出像素信号,其中每个像素单元的像素信号包括第一相位像素信号、第二相位像素信号、第三相位像素信号和第四相位像素信号,其中得到第一相位像素信号、第二相位像素信号、第三相位像素信号和第四相位像素信号分别所对应的读取控制信号之间相位差依次为
Figure PCTCN2021111293-appb-000003
处理单元根据第一相位像素信号、第二相位像素信号、第三相位像素信号和第四相位像素信号生成稀疏深度图,得到第一相位像素信号所对应的读取控制信号的相位与发射脉冲的相位相同。由此,可靠地实现了对稀疏深度图的IToF探测。
应理解,具有散斑式光源的深度相机基于IToF原理获得目标物体或目标区域的深度信息,使得其成本较低,另外,散斑式光源有利于保证深度信息的质量。还应理解,与面光源IToF相机不同,散斑IToF相机的光发射功率更低、能量密度更高、探测距离更远。换言之,虽然散斑IToF相机采集到的深度图是稀疏,但是分离的点光源阵列同时保证了这种相机的较低成本与深度信息的质量。
更具体地,散斑IToF相机可以包括发射模组、接收模组和处理单元。
发射模组可以用于发射散斑阵列(点光源阵列)到目标物。散斑阵列可以包括p个互相间隔的散斑。
此外,接收模组可以包括图像传感器,图像传感器可以包括传感器阵列,传感器阵列可以包括m*n个像素单元,其中每个像素单元包括CMOS光电二极管和光电信号读取电路,光电二极管用于接收经目标物反射的散斑阵列,并根据散斑阵列生成对应的光电流信号,光电流信号指示的电流强度与光电二极管所接收光束照射的光强正相关。光电信号读取电路用于读取光电流信号并输出对应的像素信号;
处理单元,用于接收像素信号并根据像素信号生成稀疏深度图,散斑的个数p指示稀疏深度图的分辨率,处理单元还用于将分辨率为a*b的RGB图像与稀疏深度图进行对齐,并利用预先训练的图像融合模型,将对齐后的稀疏深度图和RGB图进行融合,得到稠密深度图,其中稠密深度图的分辨率为a*b。
应理解,传统激光雷达(例如,dToF相机和LiDAR)通常采用雪崩式二极管(Avalanche Photo Diode,APD),例如,单光子雪崩式二极管,而本发明实施例采用了CMOS光电二极管,其成本较低,并且CMOS光电二极管的性能能够保证IToF测量的效果。
还应理解,稀疏深度图的分辨率是指深度图像点的数目或者有深度值的数目,即,散斑的个数p或者与散斑的个数近似的值指示稀疏深度图的分辨率,例如,发射模组包括含有q个发光点的发光阵列和发光驱动电路,发光驱动电路受控于发射脉冲信号而驱动q个发光点发光以产生p个散斑,其中p=s*q,s为大于或等于1的整数。但是通常二维图像的分辨率采用两个维度的像素数进行表征,例如,a*b的RGB图像。采用本发明实施例的融合方法得到的稠密融合图包括a*b个像素,每个像素都具有深度信息,因此,a*b指示稠密融合图具有的分辨率。
在另一些示例中,图像融合模型通过如下方式训练得到:获取训练样本,训练样本包括对齐的分辨率为p的稀疏深度图样本和分辨率为a*b的RGB图样本、以及分辨率为a*b的稠密深度图样本;以对齐的稀疏深度图样本和RGB图样本作为输入,以稠密深度图样本作为监督条件,对目标神经网络进行训练,得到图像融合模型。
应理解,可以对采集训练样本的深度相机和RGB相机进行标定,得到各个相机参数,可以根据各个相机参数,将稀疏深度图样本和RGB图样本进行对齐。采集训练样本的深度相机和RGB相机的相机参数可以与采集待融合图像的深度相机和RGB相机的相机参数相同,当然,两个相机参数也可以不同。
在一个具体的示例中,可以通过上述的包括深度相机和RGB相机在内的相机模组采集获得上述训练样本。在这种情况下,采集训练样本的深度相机和RGB相机的相机参数可以与采集待融合图像的深度相机和RGB相机的相机参数相同。
相应地,利用预先训练的图像融合模型,将对齐后的稀疏深度图和RGB图进行融合,得到分辨率为a*b的稠密深度图,可以包括:将对齐后的稀疏深度图和RGB图输入到预先训练的图像融合模型,得到分辨率为a*b的稠密深度图。
应理解,图像融合模型作为一种端到端的神经网络模型,提高了图像融合效率,在保证图像融合精度的前提下提高了数据处理效率。
在另一些示例中,深度图像融合方法还包括:获取终端设备中安装的三维图像应用程序的图像获取指令,图像获取指令指示深度相机和RGB相机分别对稀疏深度图和RGB图进行采集;将稠密深度图返回到三维图像应用程序。
应理解,三维图像应用程序可以包括图像背景虚化应用、三维图像重建应用、虚拟现实应用或增强现实应用中的任一者。
例如,终端设备中可以安装有操作系统,三维图像应用程序运行与操作系统上。操作系统包括但不限于嵌入式操作系统、实时操作系统等。三维图像应用程序可以为系统应用程序,也可以为第三方应用程序。例如,包括深度相机和RGB相机的相机模组可以响应三维图像应用程序的图像获取指令开始执行图像的采集。
三维图像应用程序可以(响应用户指令或其他关联指令等)发出图像获取指令。三维图像应用程序可以调用图像融合模型将对齐的稀疏深度图和RGB图输入到图像融合模型中,得到稠密深度图。
本示例的三维图像应用程序利用深度图像融合方法提供了更丰富的三维图像用户体验。
上面结合图1-图3对本发明一个实施例的深度图像融合方案进行了详细且一般性的描述和说明。下面将结合图4和图5对本发明的其他实施例的深度图像融合方法进行示例性的描述和说明。
图4为本发明的另一实施例的图像融合模型的训练方法的示意性流程图。
S410:对深度相机和RGB相机进行参数标定,得到标定参数。
具体而言,标定参数指示相机坐标系与世界坐标系之间的变换关系,通过标定参数,对稀疏深度图和RGB图进行对齐,能够提高稀疏深度图和RGB图的匹配度。标定参数包括相机内参和相机外参,一般而言,相机外参指示从世界坐标系到相机坐标系的映射,相机内参指示从相机坐标系到图像坐标系的映射。另外,对深度相机和RGB相机的参数的标定可以在执行图像采集之前执行。可以预先存储获得的标定参数,然后获得预先存储的标定参数。
S420:采集稀疏深度图样本和RGB图样本。
具体而言,可以对采集训练样本的深度相机和RGB相机进行标定,得到各个相机参数,可以根据各个相机参数,将稀疏深度图样本和RGB图样本进行对齐。采集训练样本的深度相机和RGB相机的相机参数可以与采集待融合图像的深度相机和RGB相机的相机参数相同,当然,两个相机参数也可以不同。
在一个具体的示例中,可以通过上述的包括深度相机和RGB相机在内的相机模组采集获得上述训练样本。在这种情况下,采集训练样本的深度相机和RGB相机的相机参数可以与采集待融合图像的深度相机和RGB相机的相机参数相同。
S430:利用标定参数,将稀疏深度图样本和RGB图样本对齐,生成训练样本。
具体而言,利用相机参数对稀疏深度图样本和RGB图样本进行对齐,使得稀疏深度图样本和RGB图样本之间的匹配度较高,提高了训练得到的图像融合模型的融合精度。
S440:通过训练样本,对目标神经网络进行训练,得到图像融合模型。
具体而言,可以以对齐的稀疏深度图样本和RGB图样本作为输入,以稠密深度图样本作为监督条件,对目标神经网络进行训练,得到图像融合模型。
图5为本发明的另一实施例的深度图像融合方法的示意性流程图。
S510:利用深度相机基于IToF原理采集稀疏深度图,并且利用RGB相机采集RGB图。
具体而言,RGB图可以通过RGB相机对目标区域采集得到,并且上述三个颜色通道的 像素来记录目标区域的成像结果。稀疏深度图可以具有分辨率为p,RGB图可以具有分辨率为a*b。稀疏深度图的分辨率是指深度图像点的数目或者有深度值的数目,即,散斑的个数p或者与散斑的个数近似的值指示稀疏深度图的分辨率,例如,发射模组包括含有q个发光点的发光阵列和发光驱动电路,发光驱动电路受控于发射脉冲信号而驱动q个发光点发光以产生p个散斑,其中p=s*q,s为大于或等于1的整数。但是通常二维图像的分辨率采用两个维度的像素数进行表征,例如,a*b的RGB图像。采用本发明实施例的融合方法得到的稠密融合图包括a*b个像素,每个像素都具有深度信息,因此,a*b指示稠密融合图具有的分辨率。
此外,稀疏深度图可以通过深度相机基于IToF原理进行图像采集获得。该深度相机可以设置有散斑式光源,即,由分离的点光源阵列形成的光源。进一步地,可以通过点光源阵列针对目标区域发出探测光线,并且获取探测光线的反射光线;将反射光线和探测光线之间的光线变化应用于IToF原理,得到稀疏深度图。
S520:根据针对深度相机和RGB相机标定的相机参数,对稀疏深度图和RGB图进行对齐。
具体而言,标定参数指示相机坐标系与世界坐标系之间的变换关系,通过标定参数,对稀疏深度图和RGB图进行对齐,能够提高稀疏深度图和RGB图的匹配度。标定参数包括相机内参和相机外参,一般而言,相机外参指示从世界坐标系到相机坐标系的映射,相机内参指示从相机坐标系到图像坐标系的映射。另外,对深度相机和RGB相机的参数的标定可以在执行图像采集之前执行。可以预先存储获得的标定参数,然后获得预先存储的标定参数。
S530:利用预先训练的图像融合模型,将对齐后的稀疏深度图和RGB图进行融合,得到稠密深度图。
具体而言,将对齐后的稀疏深度图和RGB图输入到预先训练的图像融合模型,得到稠密深度图。
下面将结合图6具体描述和说明本发明的另一实施例的终端设备。图6为一种终端设备的示意性框图,其各个部件的动作与步骤与图1-图3中的描述方案对应。换言之,图1-图3中的描述的方案以及各种实现方式和效果均适用于本实施例的终端设备。图6的终端设备包括:
深度相机610,基于IToF原理采集稀疏深度图。
应理解,稀疏深度图可以通过深度相机基于IToF原理进行图像采集获得。该深度相机可以设置有散斑式光源,即,由分离的点光源阵列形成的光源。这种深度相机也可以被称为散斑(Spot)IToF相机。散斑IToF相机投射的点光源是稀疏的(散斑),相应地,获得的深度图是稀疏的,且散斑IToF相机采集到的深度图的稀疏程度取决于散斑光源的点数。
RGB相机620,采集RGB图。
应理解,RGB即是代表红(red)、绿(green)、蓝(blue)这三个颜色通道的颜色。RGB相机为基于RGB色彩模式进行图像采集的相机,利用RGB相机采集到的图像为RGB图。具体而言,RGB色彩模式是工业界的一种颜色标准,是通过对红、绿、蓝这三个颜色通道的变化以及它们相互之间的叠加来得到各种颜色。RGB图可以通过RGB相机对目标区域采集得到,并且上述三个颜色通道的像素来记录目标区域的成像结果。
存储器630,存储预先训练的图像融合模型。
应理解,存储器可以安装有操作系统以及运行与该操作系统上的应用程序。深度相机和RGB相机可以经由处理器获取操作系统或应用程序的图像采集指令,执行相应的图像采集功能以及对图像融合模型的调用。
处理器640,对稀疏深度图和RGB图进行对齐,并且利用图像融合模型,将对齐后的稀疏深度图和RGB图进行融合,得到稠密深度图。
应理解,图像融合模型的训练样本为对齐后的稀疏深度图样本和RGB图样本。稀疏深度图样本和RGB图样本可以分别由深度相机和RGB相机采集获得。深度相机可以与采集稀疏 深度图(待融合图像)的相机为同一相机或同一类型的相机,RGB相机也可以与采集RGB图(待融合图像)的相机为同一相机或同一类型的相机。当采集训练样本的相机与采集待融合图像的相机为同一类型的相机时,训练样本数据与待融合数据匹配度较高,能够使提高模型的图像融合效果。
还应理解,深度相机可以包括发射模组、接收模组和处理单元。发射模组可以用于发射散斑阵列(点光源阵列)到目标物。散斑阵列可以包括p个互相间隔的散斑。此外,接收模组可以包括图像传感器,图像传感器可以包括传感器阵列,传感器阵列可以包括m*n个像素单元,其中每个像素单元包括CMOS光电二极管和光电信号读取电路,光电二极管用于接收经目标物反射的散斑阵列,并根据散斑阵列生成对应的光电流信号,光电流信号指示的电流强度与光电二极管所接收光束照射的光强正相关。光电信号读取电路用于读取光电流信号并输出对应的像素信号;处理单元,用于接收像素信号并根据像素信号生成稀疏深度图,散斑的个数p指示稀疏深度图的分辨率,处理单元还用于将分辨率为a*b的RGB图像与稀疏深度图进行对齐,并利用预先训练的图像融合模型,将对齐后的稀疏深度图和RGB图进行融合,得到稠密深度图,其中稠密深度图的分辨率为a*b。
还应理解,发射模组可以包括含有q个发光点的发光阵列和发光驱动电路,发光驱动电路受控于发射脉冲信号而驱动q个发光点发光以产生p个散斑,其中p=s*q,s为大于或等于1的整数。
还应理解,光电信号读取电路可以受控于读取控制信号以输出像素信号,其中每个像素单元的像素信号包括第一相位像素信号、第二相位像素信号、第三相位像素信号和第四相位像素信号,其中得到第一相位像素信号、第二相位像素信号、第三相位像素信号和第四相位像素信号分别所对应的读取控制信号之间相位差依次为
Figure PCTCN2021111293-appb-000004
处理单元根据第一相位像素信号、第二相位像素信号、第三相位像素信号和第四相位像素信号生成稀疏深度图,得到第一相位像素信号所对应的读取控制信号的相位与发射脉冲的相位相同。
在本发明实施例的方案中,由于深度相机能够采集稀疏深度图,基于IToF原理采集稀疏深度图深度相机的成本较低,降低了图像深度信息的获取成本,能够适用于诸如手机的低成本终端设备。此外,通过预先训练的图像融合模型,将对齐后的稀疏深度图和RGB图进行融合,提高了图像融合的精度,提高了图像处理的效率。此外,对稀疏深度图和RGB图进行融合,得到了更高精度的稠密深度图,丰富了终端设备的使用场景,改善了用户体验。
此外,由于RGB相机的普及率较高,能够在设置有RGB相机的终端设备中实现了RGB相机的复用,换言之,在不需要深度图像的应用场景中,仍然可以采用RGB相机执行常规的图像采集。
此外,由于本发明实施例的方案实现了低成本的深度相机,使得在终端设备的产业链条中,作为高性能硬件产品的深度相机与高性能软件产品的图像融合模型能够融合到一起,换言之,可以由同一下游厂商将深度相机和图像融合模型作为高性能图像处理方案一起提供到上游厂商,同时保证了整个产业链条的生产效率。
在另一些示例中,处理器具体用于:根据针对深度相机和RGB相机标定的相机参数,对稀疏深度图和RGB图进行对齐。
在另一些示例中,深度相机和RGB相机设置在相机模组中,相机参数基于相机模组进行相机标定得到。
在另一些示例中,深度相机中设置有点光源阵列,相应地,深度相机具体用于:通过所述点光源阵列针对目标区域发出具有第一相位的探测光线,并且获取所述探测光线的具有第二相位的反射光线,并且至少基于所述探测光线的第一相位的灰度图和所述反射光线的第二相位的灰度图之间的差异,确定所述稀疏深度图。
具体而言,光电信号读取电路受控于读取控制信号以输出像素信号,其中每个像素单元的像素信号包括第一相位像素信号、第二相位像素信号、第三相位像素信号和第四相位像素 信号,其中得到第一相位像素信号、第二相位像素信号、第三相位像素信号和第四相位像素信号分别所对应的读取控制信号之间相位差依次为
Figure PCTCN2021111293-appb-000005
处理单元根据第一相位像素信号、第二相位像素信号、第三相位像素信号和第四相位像素信号生成稀疏深度图,得到第一相位像素信号所对应的读取控制信号的相位与发射脉冲的相位相同。由此,可靠地实现了对稀疏深度图的IToF探测。
在另一些示例中,图像融合模型通过如下方式训练得到:获取训练样本,训练样本包括对齐的稀疏深度图样本和RGB图样本、以及稠密深度图样本;以对齐的稀疏深度图样本和RGB图样本作为输入,以稠密深度图样本作为监督条件,对目标神经网络进行训练,得到图像融合模型。
在另一些示例中,处理器具体用于:将对齐后的稀疏深度图和RGB图输入到预先训练的图像融合模型,得到稠密深度图。
在另一些示例中,处理器还用于:获取终端设备中安装的三维图像应用程序的图像获取指令,图像获取指令指示深度相机和RGB相机分别对稀疏深度图和RGB图进行采集;将稠密深度图返回到三维图像应用程序。
在另一些示例中,三维图像应用程序包括图像背景虚化应用、三维图像重建应用、虚拟现实应用或增强现实应用中的任一者。
本实施例的终端设备用于实现前述多个方法实施例中相应的方法,并具有相应的方法实施例的有益效果,在此不再赘述。此外,本实施例的装置中的各个模块的功能实现均可参照前述方法实施例中的相应部分的描述,在此亦不再赘述。
至此,已经对本主题的特定实施例进行了描述。其它实施例在所附权利要求书的范围内。在一些情况下,在权利要求书中记载的动作可以按照不同的顺序来执行并且仍然可以实现期望的结果。另外,在附图中描绘的过程不一定要求示出的特定顺序或者连续顺序,以实现期望的结果。在某些实施方式中,多任务处理和并行处理可以是有利的。
在20世纪90年代,对于一个技术的改进可以很明显地区分是硬件上的改进(例如,对二极管、晶体管、开关等电路结构的改进)还是软件上的改进(对于方法流程的改进)。然而,随着技术的发展,当今的很多方法流程的改进已经可以视为硬件电路结构的直接改进。设计人员几乎都通过将改进的方法流程编程到硬件电路中来得到相应的硬件电路结构。因此,不能说一个方法流程的改进就不能用硬件实体模块来实现。例如,可编程逻辑器件(Programmable Logic Device,PLD)(例如现场可编程门阵列(Field Programmable Gate Array,FPGA))就是这样一种集成电路,其逻辑功能由用户对器件编程来确定。由设计人员自行编程来把一个数字系统“集成”在一片PLD上,而不需要请芯片制造厂商来设计和制作专用的集成电路芯片。而且,如今,取代手工地制作集成电路芯片,这种编程也多半改用“逻辑编译器(logic compiler)”软件来实现,它与程序开发撰写时所用的软件编译器相类似,而要编译之前的原始代码也得用特定的编程语言来撰写,此称之为硬件描述语言(Hardware Description Language,HDL),而HDL也并非仅有一种,而是有许多种,如ABEL(Advanced Boolean Expression Language)、AHDL(Altera Hardware Description Language)、Confluence、CUPL(Cornell University Programming Language)、HDCal、JHDL(Java Hardware Description Language)、Lava、Lola、MyHDL、PALASM、RHDL(Ruby Hardware Description Language)等,目前最普遍使用的是VHDL(Very-High-Speed Integrated Circuit Hardware Description Language)与Verilog。本领域技术人员也应该清楚,只需要将方法流程用上述几种硬件描述语言稍作逻辑编程并编程到集成电路中,就可以很容易得到实现该逻辑方法流程的硬件电路。
控制器可以按任何适当的方式实现,例如,控制器可以采取例如微处理器或处理器以及存储可由该(微)处理器执行的计算机可读程序代码(例如软件或固件)的计算机可读介质、逻辑门、开关、专用集成电路(Application Specific Integrated Circuit,ASIC)、可编程逻辑控制器和嵌入微控制器的形式,控制器的例子包括但不限于以下微控制器:ARC 625D、Atmel  AT91SAM、Microchip PIC18F26K20以及Silicone Labs C8051F320,存储器控制器还可以被实现为存储器的控制逻辑的一部分。本领域技术人员也知道,除了以纯计算机可读程序代码方式实现控制器以外,完全可以通过将方法步骤进行逻辑编程来使得控制器以逻辑门、开关、专用集成电路、可编程逻辑控制器和嵌入微控制器等的形式来实现相同功能。因此这种控制器可以被认为是一种硬件部件,而对其内包括的用于实现各种功能的装置也可以视为硬件部件内的结构。或者甚至,可以将用于实现各种功能的装置视为既可以是实现方法的软件模块又可以是硬件部件内的结构。
上述实施例阐明的系统、装置、模块或单元,具体可以由计算机芯片或实体实现,或者由具有某种功能的产品来实现。一种典型的实现设备为计算机。具体的,计算机例如可以为个人计算机、膝上型计算机、蜂窝电话、相机电话、智能电话、个人数字助理、媒体播放器、导航设备、电子邮件设备、游戏控制台、平板计算机、可穿戴设备或者这些设备中的任何设备的组合。
为了描述的方便,描述以上装置时以功能分为各种单元分别描述。当然,在实施本发明时可以把各单元的功能在同一个或多个软件和/或硬件中实现。
本领域内的技术人员应明白,本发明的实施例可提供为方法、系统、或计算机程序产品。因此,本发明可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本发明可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
本发明是参照根据本发明实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
在一个典型的配置中,计算设备包括一个或多个处理器(CPU)、输入/输出接口、网络接口和内存。
内存可能包括计算机可读介质中的非永久性存储器,随机存取存储器(RAM)和/或非易失性内存等形式,如只读存储器(ROM)或闪存(flash RAM)。内存是计算机可读介质的示例。
计算机可读介质包括永久性和非永久性、可移动和非可移动媒体可以由任何方法或技术来实现信息存储。信息可以是计算机可读指令、数据结构、程序的模块或其他数据。计算机的存储介质的例子包括,但不限于相变内存(PRAM)、静态随机存取存储器(SRAM)、动态随机存取存储器(DRAM)、其他类型的随机存取存储器(RAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、快闪记忆体或其他内存技术、只读光盘只读存储器(CD-ROM)、数字多功能光盘(DVD)或其他光学存储、磁盒式磁带,磁带磁磁盘存储或其他磁性存储设备或任何其他非传输介质,可用于存储可以被计算设备访问的信息。按照本文中的界定,计算机可读介质不包括暂存电脑可读媒体(transitory media),如调制的数据信号和载波。
还需要说明的是,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、商品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、商品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括所述要素的过程、方法、商品或者设备中还存在另外的相同要素。
本领域技术人员应明白,本发明的实施例可提供为方法、系统或计算机程序产品。因此,本发明可采用完全硬件实施例、完全软件实施例或结合软件和硬件方面的实施例的形式。而且,本发明可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
本发明可以在由计算机执行的计算机可执行指令的一般上下文中描述,例如程序模块。一般地,程序模块包括执行特定事务或实现特定抽象数据类型的例程、程序、对象、组件、数据结构等等。也可以在分布式计算环境中实践本发明,在这些分布式计算环境中,由通过通信网络而被连接的远程处理设备来执行事务。在分布式计算环境中,程序模块可以位于包括存储设备在内的本地和远程计算机存储介质中。
本说明书中的各个实施例均采用递进的方式描述,各个实施例之间相同相似的部分互相参见即可,每个实施例重点说明的都是与其他实施例的不同之处。尤其,对于系统实施例而言,由于其基本相似于方法实施例,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。
以上所述仅为本发明的实施例而已,并不用于限制本发明。对于本领域技术人员来说,本发明可以有各种更改和变化。凡在本发明的精神和原理之内所作的任何修改、等同替换、改进等,均应包含在本发明的权利要求范围之内。

Claims (16)

  1. 一种深度图像采集装置,其特征在于,包括:
    发射模组,用于发射散斑阵列到目标物,其中所述散斑阵列包括p个互相间隔的散斑;
    接收模组,所述接收模组包括图像传感器,所述图像传感器包括传感器阵列,所述传感器阵列包括m*n个像素单元,其中每个像素单元包括CMOS光电二极管和光电信号读取电路,所述光电二极管用于接收经所述目标物反射的所述散斑阵列,并根据所述散斑阵列生成对应的光电流信号,所述光电流信号指示的电流强度与所述光电二极管所接收光束照射的光强正相关,所述光电信号读取电路用于读取所述光电流信号并输出对应的像素信号;
    处理单元,用于接收所述像素信号并根据所述像素信号生成稀疏深度图,散斑的个数p指示所述稀疏深度图的分辨率,所述处理单元还用于将分辨率为a*b的RGB图像与所述稀疏深度图进行对齐,并利用预先训练的图像融合模型,将对齐后的稀疏深度图和RGB图进行融合,得到稠密深度图,其中所述稠密深度图的分辨率为a*b。
  2. 根据权利要求1所述的深度图像采集装置,其特征在于,所述发射模组包括含有q个发光点的发光阵列和发光驱动电路,所述发光驱动电路受控于发射脉冲信号而驱动所述q个发光点发光以产生所述p个互相间隔的散斑,其中p=s*q,s为大于或等于1的整数。
  3. 根据权利要求2所述的深度图像采集装置,其特征在于,所述光电信号读取电路受控于读取控制信号以输出所述像素信号,其中每个像素单元的像素信号包括第一相位像素信号、第二相位像素信号、第三相位像素信号和第四相位像素信号,其中得到所述第一相位像素信号、所述第二相位像素信号、所述第三相位像素信号和所述第四相位像素信号分别所对应的所述读取控制信号之间相位差依次为
    Figure PCTCN2021111293-appb-100001
    所述处理单元根据所述第一相位像素信号、所述第二相位像素信号、所述第三相位像素信号和所述第四相位像素信号生成所述稀疏深度图,所述得到所述第一相位像素信号所对应的读取控制信号的相位与所述发射脉冲的相位相同。
  4. 根据权利要求3所述的深度图像采集装置,其特征在于,所述光电信号读取电路仅读取所述散斑照射到的像素行的所有像素单元。
  5. 根据权利要求3所述的深度图像采集装置,其特征在于,所述处理单元具体用于:对所述稀疏深度图和所述RGB图进行对齐。
  6. 根据权利要求3所述的深度图像采集装置,其特征在于,所述处理单元还用于:获取训练样本,所述训练样本包括对齐的分辨率为p的稀疏深度图样本和分辨率为a*b的RGB图样本、以及分辨率为a*b的稠密深度图样本,
    相应地,所述处理单元具体用于:以对齐的稀疏深度图样本和RGB图样本作为输入,以所述稠密深度图样本作为监督条件,对目标神经网络进行训练,得到所述图像融合模型。
  7. 根据权利要求6所述的深度图像采集装置,其特征在于,所述处理单元具体用于:将对齐后的稀疏深度图和RGB图输入到预先训练的图像融合模型,得到稠密深度图。
  8. 根据权利要求1所述的深度图像采集装置,其特征在于,所述处理单元还用于:
    获取终端设备中安装的三维图像应用程序的图像获取指令,所述图像获取指令指示所述接收模组和所述发射模组分别对所述稀疏深度图和所述RGB图进行采集;
    将所述稠密深度图返回到所述三维图像应用程序,以使所述三维图像应用程序基于所述稠密深度图获取三维图像信息。
  9. 根据权利要求8所述的深度图像采集装置,其特征在于,所述三维图像应用程序包括图像背景虚化应用、三维图像重建应用、虚拟现实应用或增强现实应用中的任一者。
  10. 一种终端设备,其特征在于,包括:
    深度相机,基于IToF原理采集分辨率为p的稀疏深度图;
    RGB相机,采集分辨率为a*b的RGB图;
    存储器,存储预先训练的图像融合模型;
    处理器,对所述分辨率为p的稀疏深度图和所述分辨率为a*b的RGB图进行对齐,并且 利用所述图像融合模型,将对齐后的稀疏深度图和RGB图进行融合,得到分辨率为a*b的稠密深度图。
  11. 根据权利要求10所述的终端设备,其特征在于,所述深度相机设置有形成为m*n个像素单元的点光源阵列,所述深度相机具体用于:通过所述形成为m*n个像素单元的点光源阵列针对目标区域发出具有第一相位的探测光线,并且获取所述探测光线的具有第二相位的反射光线,并且至少基于所述探测光线的第一相位的灰度图和所述反射光线的第二相位的灰度图之间的差异,确定所述分辨率为p的稀疏深度图。
  12. 一种深度图像融合方法,其特征在于,应用于包括深度相机和RGB相机的终端设备,所述方法包括:
    利用所述深度相机基于IToF原理采集分辨率为p的稀疏深度图,并且利用所述RGB相机采集分辨率为a*b的RGB图;
    对所述分辨率为p的稀疏深度图和所述分辨率为a*b的RGB图进行对齐;
    利用预先训练的图像融合模型,将对齐后的稀疏深度图和RGB图进行融合,得到分辨率为a*b的稠密深度图。
  13. 根据权利要求12所述的方法,其特征在于,所述对所述分辨率为p的稀疏深度图和所述分辨率为a*b的RGB图进行对齐,包括:
    根据针对所述深度相机和所述RGB相机标定的相机参数,对所述分辨率为p的稀疏深度图和所述分辨率为a*b的RGB图进行对齐。
  14. 根据权利要求12所述的方法,其特征在于,所述深度相机中设置有点光源阵列,相应地,所述利用所述深度相机基于IToF原理采集分辨率为p的稀疏深度图,包括:
    通过所述点光源阵列针对目标区域发出具有第一相位的探测光线,并且获取所述探测光线的具有第二相位的反射光线;
    至少基于所述探测光线的第一相位的灰度图和所述反射光线的第二相位的灰度图之间的差异,确定所述分辨率为p的稀疏深度图。
  15. 根据权利要求12所述的方法,其特征在于,所述图像融合模型通过如下方式训练得到:获取训练样本,所述训练样本包括对齐的分辨率为p的稀疏深度图样本和分辨率为a*b的RGB图样本、以及分辨率为a*b的稠密深度图样本;
    以对齐的稀疏深度图样本和RGB图样本作为输入,以所述稠密深度图样本作为监督条件,对目标神经网络进行训练,得到所述图像融合模型。
  16. 根据权利要求15所述的方法,其特征在于,所述利用预先训练的图像融合模型,将对齐后的稀疏深度图和RGB图进行融合,得到分辨率为a*b的稠密深度图,包括:
    将对齐后的稀疏深度图和RGB图输入到预先训练的图像融合模型,得到分辨率为a*b的稠密深度图。
PCT/CN2021/111293 2021-08-06 2021-08-06 深度图像采集装置、融合方法和终端设备 WO2023010559A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
PCT/CN2021/111293 WO2023010559A1 (zh) 2021-08-06 2021-08-06 深度图像采集装置、融合方法和终端设备
EP21916643.6A EP4156085A4 (en) 2021-08-06 2021-08-06 DEPTH IMAGE COLLECTION DEVICE, DEPTH IMAGE FUSION METHOD AND TERMINAL DEVICE
US17/860,579 US11928802B2 (en) 2021-08-06 2022-07-08 Apparatus for acquiring depth image, method for fusing depth images, and terminal device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/111293 WO2023010559A1 (zh) 2021-08-06 2021-08-06 深度图像采集装置、融合方法和终端设备

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/860,579 Continuation US11928802B2 (en) 2021-08-06 2022-07-08 Apparatus for acquiring depth image, method for fusing depth images, and terminal device

Publications (1)

Publication Number Publication Date
WO2023010559A1 true WO2023010559A1 (zh) 2023-02-09

Family

ID=85151844

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/111293 WO2023010559A1 (zh) 2021-08-06 2021-08-06 深度图像采集装置、融合方法和终端设备

Country Status (3)

Country Link
US (1) US11928802B2 (zh)
EP (1) EP4156085A4 (zh)
WO (1) WO2023010559A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170127036A1 (en) * 2015-10-29 2017-05-04 Samsung Electronics Co., Ltd. Apparatus and method for acquiring depth information
CN108716983A (zh) * 2018-04-28 2018-10-30 Oppo广东移动通信有限公司 光学元件检测方法和装置、电子设备、存储介质
CN109685842A (zh) * 2018-12-14 2019-04-26 电子科技大学 一种基于多尺度网络的稀疏深度稠密化方法
CN110992271A (zh) * 2020-03-04 2020-04-10 腾讯科技(深圳)有限公司 图像处理方法、路径规划方法、装置、设备及存储介质

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102312273B1 (ko) * 2014-11-13 2021-10-12 삼성전자주식회사 거리영상 측정용 카메라 및 그 동작방법
CN108269238B (zh) * 2017-01-04 2021-07-13 浙江舜宇智能光学技术有限公司 深度图像采集装置和深度图像采集系统及其图像处理方法
EP3621293B1 (en) * 2018-04-28 2022-02-09 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Image processing method, apparatus and computer-readable storage medium
CN112950694A (zh) * 2021-02-08 2021-06-11 Oppo广东移动通信有限公司 图像融合的方法、单颗摄像头模组、拍摄装置及存储介质

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170127036A1 (en) * 2015-10-29 2017-05-04 Samsung Electronics Co., Ltd. Apparatus and method for acquiring depth information
CN108716983A (zh) * 2018-04-28 2018-10-30 Oppo广东移动通信有限公司 光学元件检测方法和装置、电子设备、存储介质
CN109685842A (zh) * 2018-12-14 2019-04-26 电子科技大学 一种基于多尺度网络的稀疏深度稠密化方法
CN110992271A (zh) * 2020-03-04 2020-04-10 腾讯科技(深圳)有限公司 图像处理方法、路径规划方法、装置、设备及存储介质

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4156085A4 *

Also Published As

Publication number Publication date
US20230042846A1 (en) 2023-02-09
EP4156085A4 (en) 2023-04-26
US11928802B2 (en) 2024-03-12
EP4156085A1 (en) 2023-03-29

Similar Documents

Publication Publication Date Title
US11888002B2 (en) Dynamically programmable image sensor
US20190281276A1 (en) Time-resolving sensor using shared ppd+spad pixel and spatial-temporal correlation for range measurement
CN108307180A (zh) 图像传感器中的像素、成像单元、用于测距的系统及方法
US10110881B2 (en) Model fitting from raw time-of-flight images
US20170289515A1 (en) High dynamic range depth generation for 3d imaging systems
US20160138910A1 (en) Camera for measuring depth image and method of measuring depth image
CN103731611A (zh) 深度传感器、图像捕获方法和图像处理系统
US20220392359A1 (en) Adaptive object detection
CN106256124B (zh) 结构化立体
CN113344839B (zh) 深度图像采集装置、融合方法和终端设备
CN112189147B (zh) 一种飞行时间ToF相机和一种ToF方法
WO2019241238A1 (en) Pixel cell with multiple photodiodes
US10795022B2 (en) 3D depth map
US20210044742A1 (en) Dynamically programmable image sensor
EP3170025B1 (en) Wide field-of-view depth imaging
US20240104744A1 (en) Real-time multi-view detection of objects in multi-camera environments
US20190045169A1 (en) Maximizing efficiency of flight optical depth sensors in computing environments
WO2023010559A1 (zh) 深度图像采集装置、融合方法和终端设备
US10989800B2 (en) Tracking using encoded beacons
KR20200092197A (ko) 증강 현실 영상 처리 장치, 증강 현실 영상 처리 방법, 전자 기기, 컴퓨터 프로그램 및 컴퓨터 판독 가능한 기록 매체
Chen Capturing fast motion with consumer grade unsynchronized rolling-shutter cameras
TW202206849A (zh) 用於測量的裝置及用以判定環境中兩點間之距離之方法
CN112987022A (zh) 测距方法及装置、计算机可读介质和电子设备
KR20230078675A (ko) 다수의 광 스펙트럼들을 캡처하는 카메라들을 사용하는 동시 위치측정 및 맵핑
US20240070886A1 (en) Mixed-mode depth imaging

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2021916643

Country of ref document: EP

Effective date: 20220713

NENP Non-entry into the national phase

Ref country code: DE