WO2022142596A1 - Image processing method, apparatus, and storage medium (一种图像处理方法,装置及储存介质) - Google Patents

Image processing method, apparatus, and storage medium

Info

Publication number
WO2022142596A1
Authority
WO
WIPO (PCT)
Prior art keywords: image, depth, depth image, sky, light value
Application number: PCT/CN2021/123940
Other languages: English (en), French (fr)
Inventors: 曹旗, 张小莉, 张迪, 张海航
Original assignee: 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to EP21913341.0A (published as EP4254320A1)
Publication of WO2022142596A1
Priority to US18/341,617 (published as US20230342883A1)

Classifications

    • G06T5/73: Deblurring; Sharpening
    • G06T5/00: Image enhancement or restoration
    • G06N3/0455: Auto-encoder networks; Encoder-decoder networks
    • G06N3/045: Combinations of networks
    • G06N3/084: Backpropagation, e.g. using gradient descent
    • G06T3/4038: Image mosaicing, e.g. composing plane images from plane sub-images
    • G06T5/60: Image enhancement or restoration using machine learning, e.g. neural networks
    • G06T5/77: Retouching; Inpainting; Scratch removal
    • G06T7/136: Segmentation; Edge detection involving thresholding
    • G06T7/33: Determination of transform parameters for the alignment of images (image registration) using feature-based methods
    • G06T7/50: Depth or shape recovery
    • G06T7/90: Determination of colour characteristics
    • G06T9/002: Image coding using neural networks
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G06T2200/32: Indexing scheme involving image mosaicing
    • G06T2207/10024: Color image
    • G06T2207/10028: Range image; Depth image; 3D point clouds
    • G06T2207/20081: Training; Learning
    • G06T2207/20084: Artificial neural networks [ANN]
    • G06T2207/30192: Weather; Meteorology

Definitions

  • the invention relates to an image processing method, a device and a storage medium, and belongs to the technical field of image processing.
  • In formula (1), I(x) is the fog/haze/smoke image, that is, the image to be repaired; t(x) is the transmittance; and A is the atmospheric light value.
  • The defogging/dehazing/desmoking algorithm based on the atmospheric degradation model solves for the transmittance t(x) and the atmospheric light value A when the fog/haze/smoke image I(x) is known, and then repairs I(x) according to formula (1) to obtain the repaired fog/haze/smoke-free image J(x).
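  • Formula (1) itself does not appear in this extract; from the variable definitions above it is evidently the standard atmospheric degradation (scattering) model, reproduced here for reference together with the inversion used in the repair step:

```latex
% Presumed formula (1): the atmospheric degradation (scattering) model
I(x) = J(x)\,t(x) + A\,\bigl(1 - t(x)\bigr) \tag{1}
% Once t(x) and A are estimated, the repaired image follows by inverting (1):
J(x) = \frac{I(x) - A}{t(x)} + A
```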
  • The traditional method mainly determines the transmittance from the fog/haze/smoke color image to be repaired.
  • Alternatively, the powerful fitting ability of a neural network combined with a large amount of data is used to train a mapping function, and the fog/haze/smoke color image to be repaired is mapped to the transmittance t(x) through this mapping function. Since a large amount of data is required to train and fit the mapping function, the fitted mapping function is limited by the scenes contained in the data, which restricts how widely the mapping function can be applied and limits its accuracy; this leads to low accuracy of the calculated transmittance, so the effect of the restored fog/haze/smoke-free image is not good.
  • The embodiments of the present invention provide an image processing method that integrates the completed depth information into the color-image dehazing algorithm, which solves the problem that images shot by electronic equipment under bad weather conditions with low visibility are not clear.
  • the embodiments of the present invention provide the following technical solutions:
  • an embodiment of the present invention provides an image processing method, and the image processing method includes the following steps:
  • acquiring a first depth image and a color image, where the first depth image and the color image are images captured by a depth sensor and a camera, respectively, for the same scene;
  • Depth completion is performed on the first depth image to obtain a second depth image, and transmittance is determined according to the second depth image;
  • the color image is inpainted according to the atmospheric light value and the transmittance.
  • Because the transmittance is determined from the depth image, and moreover from the depth image after depth completion, the accuracy of the transmittance can be improved, which further improves the image processing effect.
  • performing depth completion on the first depth image to obtain the second depth image includes:
  • providing the color image and the second intermediate feature image to the first deep neural network to obtain a third depth image, where the second intermediate feature image is a feature image generated by an intermediate layer of the second deep neural network;
  • providing the first depth image and the first intermediate feature image to the second deep neural network to obtain a fourth depth image, where the first intermediate feature image is a feature image generated by an intermediate layer of the first deep neural network;
  • a fusion operation is performed on the third depth image and the fourth depth image to obtain the second depth image.
  • In this way, the first deep neural network can fully mine and use the color image and the feature image generated by the intermediate layer of the second deep neural network to obtain a third depth image with sufficient depth information.
  • Similarly, the second deep neural network performs depth inference on the first depth image and the feature image generated by the intermediate layer of the first deep neural network, and can fully mine and utilize the first depth image to obtain a fourth depth image with sufficient depth information. By fusing (or merging) the third depth image and the fourth depth image, both of which carry sufficient depth information, a high-quality completed depth image, that is, the second depth image, can be obtained.
  • The first deep neural network includes a first preprocessing network, a first encoder and a first decoder. The first preprocessing network is used to transform the color image input to the first deep neural network into a first feature image suitable for processing by the first encoder; the first encoder is used to perform feature encoding on the first feature image; and the first decoder is used to perform feature decoding on the second feature image output by the first encoder.
  • the first intermediate feature image includes a feature image generated by a convolutional layer in a coding unit of the first encoder and a feature image generated by an upsampling layer in a decoding unit of the first decoder.
  • The second deep neural network includes a second preprocessing network, a second encoder and a second decoder. The second preprocessing network is used to transform the first depth image input to the second deep neural network into a third feature image suitable for processing by the second encoder; the second encoder is used to perform feature encoding on the third feature image; and the second decoder is used to perform feature decoding on the fourth feature image output by the second encoder.
  • the second intermediate feature image includes a feature image generated by a convolutional layer in a coding unit of the second encoder and a feature image generated by an upsampling layer in a decoding unit of the second decoder.
  • determining the atmospheric light value includes:
  • the atmospheric light value is determined from the first depth image.
  • determining the atmospheric light value according to the first depth image includes:
  • the atmospheric light value is determined from the sky depth image and the color image.
  • The sky depth image segmented from the first depth image by using the depth information of the first depth image can identify the sky area more accurately and avoid color interference from measured objects, especially white areas, so that the atmospheric light value can be calculated effectively.
  • determining a sky depth image for indicating a sky region from the first depth image includes:
  • the sky depth image is determined from the first depth image by a sliding window method.
  • the shape of the sliding window adopted by the sliding window method is a rectangle with the size of a row of pixels, and the step size of the sliding window is 1.
  • determining the atmospheric light value according to the sky depth image and the color image includes:
  • a sky color image for indicating a sky area is determined from the color image by using the sky depth image; the atmospheric light value is determined according to the sky color image.
  • using the sky depth image to determine, from the color image, a sky color image for indicating a sky area includes:
  • the first depth image is binarized by using the sky depth image to obtain a binary map, and a sky color image indicating the sky area is determined from the binary map and the color image.
  • determining the atmospheric light value according to the sky color image includes:
  • the pixel points of the brightest part are determined from the sky color image, and the average value of those pixel points is determined as the atmospheric light value.
  • an embodiment of the present application further provides an image processing method, where the first depth image and the color image are images captured by a depth sensor and a camera for the same scene, respectively;
  • the color image is inpainted according to the atmospheric light value and the transmittance.
  • determining the atmospheric light value according to the first depth image includes:
  • the atmospheric light value is determined from the sky depth image and the color image.
  • determining a sky depth image for indicating a sky region from the first depth image includes:
  • the sky depth image is determined from the first depth image by a sliding window method.
  • determining the atmospheric light value according to the sky depth image and the color image includes:
  • a sky color image for indicating a sky area is determined from the color image using the sky depth image.
  • the atmospheric light value is determined from the sky color image.
  • the embodiments of the present application further provide an image processing apparatus, including:
  • the image acquisition module is used to acquire a first depth image and a color image, the first depth image and the color image are respectively images shot by the depth sensor and the camera for the same scene;
  • an atmospheric light value calculation module, used to calculate the atmospheric light value; a transmittance calculation module, used to perform depth completion on the first depth image to obtain a second depth image and to determine the transmittance according to the second depth image;
  • The transmittance calculation module is specifically configured to: provide the color image and the second intermediate feature image to the first deep neural network to obtain a third depth image, where the second intermediate feature image is a feature image generated by an intermediate layer of the second deep neural network; provide the first depth image and the first intermediate feature image to the second deep neural network to obtain a fourth depth image, where the first intermediate feature image is a feature image generated by an intermediate layer of the first deep neural network; and perform a fusion operation on the third depth image and the fourth depth image to obtain the second depth image.
  • the atmospheric light value calculation module is specifically configured to: determine the atmospheric light value according to the first depth image.
  • the atmospheric light value calculation module is specifically configured to: determine a sky depth image for indicating a sky area from the first depth image; determine the atmospheric light value according to the sky depth image and the color image .
  • the atmospheric light value calculation module is specifically configured to: determine the sky depth image from the first depth image through a sliding window method.
  • the atmospheric light value calculation module is specifically configured to: determine a sky color image for indicating a sky area from the color image by using the sky depth image; determine the atmospheric light value according to the sky color image .
  • an embodiment of the present application further provides an image processing apparatus, including:
  • an image acquisition module for acquiring a first depth image and a color image, where the first depth image and the color image are respectively images captured by a depth sensor and a camera for the same scene; an atmospheric light value calculation module for The depth image determines the atmospheric light value; the transmittance calculation module is used to determine the transmittance according to the first depth image; the image repair module is used to determine the restored image according to the atmospheric light value and the transmittance.
  • the atmospheric light value calculation module is specifically configured to: determine a sky depth image for indicating a sky area from the first depth image; determine the atmospheric light value according to the sky depth image and the color image .
  • the atmospheric light value calculation module is specifically configured to: determine the sky depth image from the first depth image through a sliding window method.
  • the atmospheric light value calculation module is specifically configured to: determine a sky color image for indicating a sky area from the color image by using the sky depth image; determine the atmospheric light value according to the sky color image .
  • an embodiment of the present application further provides an image processing apparatus, where the image processing apparatus may be a processor.
  • the image processing apparatus may be a graphics processor (Graphics Processing Unit, GPU), a neural network processor (Neural-network Processing Unit, NPU), an advanced reduced instruction set processor (Advanced RISC Machines, ARM), etc.
  • The image processing apparatus also includes a memory.
  • the memory is used to store computer programs or instructions, and the processor is coupled to the memory.
  • When the processor executes the computer program or instructions, the image processing apparatus executes the method performed by the processor in the above method embodiments.
  • The present application provides a computer-readable storage medium in which a computer program is stored; when the computer program is executed by the processor, the method of the above-mentioned first aspect or second aspect is performed.
  • a computer program product comprising instructions, when the computer program product runs on a computer, causing the computer to execute the method in any one of the implementations of the first aspect or the second aspect.
  • In an eighth aspect, a chip is provided, which includes a processor and a data interface; the processor reads instructions stored in a memory through the data interface and executes the method in any one of the implementations of the first aspect or the second aspect above.
  • FIG. 1 is a schematic functional block diagram of a vehicle 100 according to an embodiment of the present application.
  • FIG. 2 is a structural diagram of a mobile phone 200 provided by an embodiment of the present application.
  • FIG. 4 is an implementation manner of performing depth completion on a depth image according to an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of an image processing apparatus according to another embodiment of the present invention.
  • Depth image: a depth image can also be called a range image, and each pixel value in the depth image reflects the distance between the depth sensor and the object.
  • Color images are mainly divided into two types: RGB and CMYK.
  • the RGB color image is composed of three different color components, one is red, one is green, and the other is blue;
  • CMYK images are composed of four color components: cyan C, magenta M, yellow Y, and black K. CMYK type images are mainly used in the printing industry.
  • Registration in this application refers to image registration, which is a process of converting different images of the same scene into the same coordinate system. These images can be taken at different times (multi-temporal registration), from different sensors (multi-modal registration), and from different perspectives.
  • The spatial relationships between these images may be rigid (translations and rotations), affine (e.g. shears), homographies, or complex large deformation models.
  • the color image and the depth image are usually registered, so there is a one-to-one correspondence between the pixels between the color image and the depth image.
  • Depth completion generally refers to the completion of a sparse depth image into a dense depth image, and the number of non-zero pixels contained in the completed dense depth image is greater than the number of non-zero pixels contained in the sparse depth image.
  • the present application mainly relates to an image processing method.
  • The image processing method mainly uses depth information in a depth image to perform restoration processing on a color image to obtain an enhanced color image. For example, for a color image containing interference information such as smoke, fog or haze, smoke removal, fog removal or haze removal can be realized by using the image processing method of the present application; that is, the enhanced color image is the color image after smoke, fog or haze removal.
  • FIG. 1 is a schematic functional block diagram of a vehicle 100 provided by an embodiment of the present application.
  • the vehicle 100 may be configured in a fully or partially autonomous driving mode.
  • the vehicle 100 can obtain the surrounding environment information through the perception system 120, and obtain an automatic driving strategy based on the analysis of the surrounding environment information to realize fully automatic driving, or present the analysis result to the user to realize partial automatic driving.
  • Vehicle 100 may include various subsystems, such as infotainment system 110 , perception system 120 , decision control system 130 , drive system 140 , and computing platform 150 .
  • vehicle 100 may include more or fewer subsystems, and each subsystem may include multiple components. Additionally, each of the subsystems and components of the vehicle 100 may be interconnected by wired or wireless means.
  • infotainment system 110 may include communication system 111 , entertainment system 112 , and navigation system 113 .
  • Communication system 111 may include a wireless communication system that may wirelessly communicate with one or more devices, either directly or via a communication network.
  • the wireless communication system may use 3G cellular communications, such as CDMA, EVDO, GSM/GPRS, or 4G cellular communications, such as LTE. Or 5G cellular communications.
  • a wireless communication system can communicate with a wireless local area network (WLAN) using WiFi.
  • the wireless communication system may communicate directly with the device using an infrared link, Bluetooth, or ZigBee.
  • Other wireless protocols may also be used, such as various vehicle communication systems; for example, the wireless communication system may include one or more dedicated short range communications (DSRC) devices, which may include public and/or private data communication between vehicles and/or roadside stations.
  • the entertainment system 112 may include a central control screen, a microphone and a speaker.
  • The user can listen to the radio and play music in the car based on the entertainment system, or connect a mobile phone to the vehicle to project the mobile phone's screen onto the central control screen. The central control screen may be a touch screen, and the user can operate it by touching the screen.
  • the user's voice signal can be acquired through a microphone, and some controls on the vehicle 100 by the user, such as adjusting the temperature in the vehicle, can be implemented according to the analysis of the user's voice signal. In other cases, music may be played to the user through the speaker.
  • the navigation system 113 may include a map service provided by a map provider, so as to provide navigation of the driving route for the vehicle 100, and the navigation system 113 may cooperate with the global positioning system 121 and the inertial measurement unit 122 of the vehicle.
  • the map service provided by the map provider can be a two-dimensional map or a high-precision map.
  • the perception system 120 may include several types of sensors that sense information about the environment surrounding the vehicle 100 .
  • the perception system 120 may include a global positioning system 121 (the global positioning system may be a GPS system, a Beidou system or other positioning systems), an inertial measurement unit (IMU) 122, a lidar 123, a millimeter wave radar 124, an ultrasonic radar 125, and a camera device 126, wherein the camera device 126 may also be referred to as a camera.
  • the perception system 120 may also include sensors that monitor the internal systems of the vehicle 100 (eg, an in-vehicle air quality monitor, a fuel gauge, an oil temperature gauge, etc.). Sensor data from one or more of these sensors can be used to detect objects and their corresponding characteristics (position, shape, orientation, velocity, etc.). This detection and identification is a critical function for the safe operation of the vehicle 100 .
  • the global positioning system 121 may be used to estimate the geographic location of the vehicle 100 .
  • the inertial measurement unit 122 is used to sense position and orientation changes of the vehicle 100 based on inertial acceleration.
  • the inertial measurement unit 122 may be a combination of an accelerometer and a gyroscope.
  • the lidar 123 may utilize laser light to sense objects in the environment in which the vehicle 100 is located.
  • lidar 123 may include one or more laser sources, laser scanners, and one or more detectors, among other system components.
  • the millimeter wave radar 124 may utilize radio signals to sense objects within the surrounding environment of the vehicle 100 .
  • In addition, the millimeter wave radar 124 may be used to sense the speed and/or heading of objects.
  • the ultrasonic radar 125 may sense objects around the vehicle 100 using ultrasonic signals.
  • the camera 126 may be used to capture image information of the surrounding environment of the vehicle 100 .
  • the camera 126 may include a monocular camera, a binocular camera, a structured light camera, a panoramic camera, etc.
  • the image information acquired by the camera 126 may include still images or video stream information.
  • the depth image and the color image can be acquired by the sensors deployed in the perception system 120 , for example, the depth image and the color image in the same scene can be acquired by the lidar 123 and the camera 126 respectively.
  • The decision control system 130 includes a computing system 131 for analyzing and making decisions based on the information obtained by the perception system 120, and the decision control system 130 further includes a vehicle controller 132 for controlling the power system of the vehicle 100, as well as a steering system 133, a throttle 134 and a braking system 135 for controlling the motion of the vehicle 100.
  • Computing system 131 may be operable to process and analyze various information acquired by perception system 120 in order to identify targets, objects, and/or features in the environment surrounding vehicle 100.
  • the target may include pedestrians or animals, and the objects and/or features may include traffic signals, road boundaries, and obstacles.
  • the computing system 131 may use technologies such as object recognition algorithms, Structure from Motion (SFM) algorithms, and video tracking.
  • computing system 131 may be used to map the environment, track objects, estimate the speed of objects, and the like.
  • the computing system 131 can analyze the obtained various information and derive a control strategy for the vehicle.
  • the vehicle controller 132 may be used to coordinately control the power battery and the engine 141 of the vehicle to improve the power performance of the vehicle 100 .
  • the steering system 133 is operable to adjust the heading of the vehicle 100 .
  • it may be a steering wheel system.
  • the throttle 134 is used to control the operating speed of the engine 141 and thus the speed of the vehicle 100 .
  • the braking system 135 is used to control the deceleration of the vehicle 100 .
  • the braking system 135 may use friction to slow the wheels 144 .
  • the braking system 135 may convert the kinetic energy of the wheels 144 into electrical current.
  • the braking system 135 may also take other forms to slow the wheels 144 to control the speed of the vehicle 100 .
  • Drive system 140 may include components that provide powered motion for vehicle 100 .
  • drive system 140 may include engine 141 , energy source 142 , driveline 143 , and wheels 144 .
  • the engine 141 may be an internal combustion engine, an electric motor, an air compression engine, or other types of engine combinations, such as a hybrid engine composed of a gasoline engine and an electric motor, and a hybrid engine composed of an internal combustion engine and an air compression engine.
  • Engine 141 converts energy source 142 into mechanical energy.
  • Examples of energy sources 142 include gasoline, diesel, other petroleum-based fuels, propane, other compressed gas-based fuels, ethanol, solar panels, batteries, and other sources of electricity.
  • the energy source 142 may also provide energy to other systems of the vehicle 100 .
  • Transmission 143 may transmit mechanical power from engine 141 to wheels 144 .
  • Transmission 143 may include a gearbox, a differential, and a driveshaft.
  • transmission 143 may also include other devices, such as clutches.
  • The drive shaft may include one or more axles that may be coupled to one or more wheels 144.
  • Computing platform 150 may include at least one processor 151 that may execute instructions 153 stored in a non-transitory computer-readable medium such as memory 152 .
  • computing platform 150 may also be multiple computing devices that control individual components or subsystems of vehicle 100 in a distributed fashion.
  • Processor 151 may be any conventional processor, such as a commercially available CPU.
  • The processor 151 may further include, for example, a graphics processing unit (GPU), a field programmable gate array (FPGA), a system on a chip (SoC), an application-specific integrated circuit (ASIC), or a combination thereof.
  • Although FIG. 1 functionally illustrates the processor, memory, and other elements of the computer 110 in the same block, one of ordinary skill in the art will understand that the processor, computer, or memory may actually include multiple processors, computers, or memories that may or may not be stored within the same physical enclosure.
  • the memory may be a hard drive or other storage medium located within an enclosure other than computer 110 .
  • reference to a processor or computer will be understood to include reference to a collection of processors or computers or memories that may or may not operate in parallel.
  • some components such as the steering and deceleration components, may each have its own processor that only performs computations related to component-specific functions .
  • a processor may be located remotely from the vehicle and in wireless communication with the vehicle. In other aspects, some of the processes described herein are performed on a processor disposed within the vehicle while others are performed by a remote processor, including taking steps necessary to perform a single maneuver.
  • the memory 152 may contain instructions 153 (eg, program logic) that may be executed by the processor 151 to perform various functions of the vehicle 100 .
  • Memory 152 may also contain additional instructions, including instructions to send data to, receive data from, interact with, and/or control one or more of the infotainment system 110, the perception system 120, the decision control system 130 and the drive system 140.
  • memory 152 may store data such as road maps, route information, vehicle location, direction, speed, and other such vehicle data, among other information. Such information may be used by the vehicle 100 and the computing platform 150 during operation of the vehicle 100 in autonomous, semi-autonomous, and/or manual modes.
  • Computing platform 150 may control the functions of vehicle 100 based on inputs received from various subsystems (eg, drive system 140 , perception system 120 , and decision control system 130 ). For example, computing platform 150 may utilize input from decision control system 130 in order to control steering system 133 to avoid obstacles detected by perception system 120 . In some embodiments, computing platform 150 is operable to provide control over many aspects of vehicle 100 and its subsystems.
  • the computing platform 150 may acquire a depth image and a color image from the perception system 120, and perform repair processing on the color image by using the depth information in the depth image to obtain an enhanced color image.
  • The repair processing may be implemented in the form of software stored in the memory 152, and the processor 151 invokes the instructions 153 in the memory 152 to perform the repair processing.
  • The computing platform 150 may output the enhanced color image to other systems for further processing, such as outputting the enhanced color image to the infotainment system 110 for the driver to view, or outputting the enhanced color image to the decision control system 130 for relevant decision processing.
  • one or more of these components described above may be installed or associated with the vehicle 100 separately.
  • memory 152 may exist partially or completely separate from vehicle 100 .
  • the above-described components may be communicatively coupled together in a wired and/or wireless manner.
  • FIG. 1 should not be construed as a limitation on the embodiments of the present application.
  • a self-driving car traveling on a road can recognize objects within its surroundings to determine adjustments to the current speed.
  • the objects may be other vehicles, traffic control equipment, or other types of objects.
  • Each identified object may be considered independently, and the object's respective characteristics, such as its current speed, acceleration and distance from the vehicle, may be used to determine the speed to which the autonomous vehicle is to adjust.
  • The vehicle 100, or the sensing and computing devices associated with the vehicle 100, may predict the behavior of an identified object based on the characteristics of the identified object and the state of the surrounding environment (e.g., traffic, rain, ice on the road, etc.).
  • each identified object is dependent on the behavior of the other, so it is also possible to predict the behavior of a single identified object by considering all identified objects together.
  • the vehicle 100 can adjust its speed based on the predicted behavior of the identified object.
  • the self-driving car can determine what steady state the vehicle will need to adjust to (eg, accelerate, decelerate, or stop) based on the predicted behavior of the object.
  • other factors may also be considered to determine the speed of the vehicle 100, such as the lateral position of the vehicle 100 in the road being traveled, the curvature of the road, the proximity of static and dynamic objects, and the like.
  • The computing device may also provide instructions to modify the steering angle of the vehicle 100 so that the self-driving car follows a given trajectory and/or maintains safe lateral and longitudinal distances from objects in the vicinity of the self-driving car (e.g., cars in adjacent lanes on the road).
  • The above-mentioned vehicle 100 may be a car, a truck, a motorcycle, a bus, a boat, an airplane, a helicopter, a lawn mower, a recreational vehicle, an amusement park vehicle, construction equipment, a tram, a golf cart, a train, etc.; this is not specially limited in the embodiments of the present application.
  • Scenario 2: Taking pictures with a terminal
  • FIG. 2 is a structural diagram of a mobile phone 200 according to an embodiment of the present application.
  • The mobile phone 200 is only an example of a terminal; the mobile phone 200 may have more or fewer components than those shown in FIG. 2, may combine two or more components, or may have a different component configuration.
  • the various components shown in Figure 2 may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
  • The mobile phone 200 includes components such as a radio frequency (RF) circuit 210, a memory 220, an input unit 230, a display unit 240, a sensor 250, an audio circuit 260, a wireless fidelity (Wi-Fi) module 270, a processor 280, and a power supply.
  • the structure of the mobile phone shown in FIG. 2 does not constitute a limitation on the mobile phone, and may include more or less components than shown, or combine some components, or arrange different components.
  • the RF circuit 210 can be used for receiving and sending signals during transmission and reception of information or during a call. After receiving the downlink information of the base station, it can be processed by the processor 280; in addition, it can send uplink data to the base station.
  • RF circuits include, but are not limited to, antennas, at least one amplifier, transceivers, couplers, low noise amplifiers, duplexers, and the like.
  • the RF circuit 210 may also communicate with the network and other mobile devices via wireless communication.
  • the wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile Communications, General Packet Radio Service, Code Division Multiple Access, Wideband Code Division Multiple Access, Long Term Evolution, email, short message service, and the like.
  • Memory 220 may be used to store software programs and data.
  • the processor 280 executes various functions and data processing of the mobile phone 200 by running the software programs and data stored in the memory 220 .
  • The memory 220 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, application programs required for at least one function (such as a sound playback function, an image playback function, etc.), and the like, and the data storage area may store data created by the use of the mobile phone 200 (such as audio data, a phone book, etc.) and the like.
  • Memory 220 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage devices.
  • The memory 220 stores an operating system that enables the mobile phone 200 to run, such as the operating system developed by Apple Inc., the open-source operating system developed by Google, or the operating system developed by Microsoft.
  • the input unit 230 may be used to receive input numerical or character information, and generate signal input related to user settings and function control of the mobile phone 200 .
  • The input unit 230 may include a touch panel 231 disposed on the front of the mobile phone 200 as shown in FIG. 2, which may collect the user's touch operations on or near it (for example, operations performed by the user on or near the touch panel 231 with a finger, a stylus or any other suitable object or accessory) and drive the corresponding connection device according to a preset program.
  • the touch panel 231 may include two parts, a touch detection device and a touch controller (not shown in FIG. 2 ).
  • The touch detection device detects the position touched by the user, detects the signal generated by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, and sends them to the processor 280, and it can also receive commands sent by the processor 280 and execute them.
  • the touch panel 231 can be implemented in various types such as resistive, capacitive, infrared, and surface acoustic waves.
  • the display unit 240 (that is, the display screen) can be used to display the information input by the user or the information provided to the user and a graphical user interface (Graphical User Interface, GUI) of various menus of the mobile phone 200.
  • the display unit 240 may include a display panel 241 disposed on the front of the mobile phone 200 .
  • the display panel 241 may be configured in the form of a liquid crystal display, a light emitting diode, or the like.
  • Cell phone 200 may also include at least one sensor 250, such as a camera device, depth sensor, light sensor, motion sensor, and other sensors.
  • the camera device may be a color camera for capturing color images
  • the depth sensor may be used to determine the depth information from the mobile phone 200 to the object
  • the light sensor may include an ambient light sensor and a proximity sensor.
  • The accelerometer sensor can detect the magnitude of acceleration in all directions (usually three axes), and can detect the magnitude and direction of gravity when stationary; it can be used for applications that recognize the attitude of the mobile phone (such as landscape/portrait switching, related games, magnetometer attitude calibration) and for vibration-recognition related functions (such as a pedometer or tapping). Other sensors that can be configured on the mobile phone 200, such as a gyroscope, a barometer, a hygrometer, a thermometer and an infrared sensor, are not described here again.
  • the audio circuit 260 , the speaker 261 , and the microphone 262 can provide an audio interface between the user and the mobile phone 200 .
  • The audio circuit 260 can transmit the electrical signal converted from received audio data to the speaker 261, and the speaker 261 converts it into a sound signal for output; on the other hand, the microphone 262 converts a collected sound signal into an electrical signal, which the audio circuit 260 receives and converts into audio data, and the audio data is then output to the RF circuit 210 for transmission to, for example, another mobile phone, or output to the memory 220 for further processing.
  • Wi-Fi is a short-distance wireless transmission technology.
  • the mobile phone 200 can help users to send and receive emails, browse web pages, and access streaming media through the Wi-Fi module 270, which provides users with wireless broadband Internet access.
  • The processor 280 is the control center of the mobile phone 200; it uses various interfaces and lines to connect the various parts of the entire mobile phone, and performs the various functions of the mobile phone 200 and processes data by running or executing the software programs stored in the memory 220 and calling the data stored in the memory 220, thereby monitoring the mobile phone as a whole.
  • The processor 280 may include one or more processing units; the processor 280 may also integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface and application programs, and the modem processor mainly handles wireless communication. It can be understood that the above-mentioned modem processor may also not be integrated into the processor 280.
  • the processor 280 obtains the depth image and the color image from the sensor 250, and uses the depth information in the depth image to perform restoration processing on the color image to obtain an enhanced color image.
  • The restoration processing may be implemented in the form of software stored in the memory 220, and the processor 280 invokes the instructions in the memory 220 to perform the restoration processing.
  • the processor 280 may output the enhanced color image to other systems for further processing, such as outputting the enhanced color image to the display unit 240 for the user to view the enhanced color image.
  • the Bluetooth module 281 is used to exchange information with other devices through a short-distance communication protocol such as Bluetooth.
  • the mobile phone 200 can establish a Bluetooth connection with a wearable electronic device (such as a smart watch) that also has a Bluetooth module through the Bluetooth module 281, so as to perform data interaction.
  • a wearable electronic device such as a smart watch
  • Cell phone 200 also includes a power source 290 (eg, a battery) for powering the various components.
  • the power supply can be logically connected to the processor 280 through a power management system, so that functions such as managing charging, discharging, and power consumption are implemented through the power management system. It can be understood that, in the following embodiments, the power supply 290 can be used to supply power to the display panel 241 and the touch panel 231 .
  • FIG. 3 is an image processing method provided by an embodiment of the present invention.
  • the image processing method mainly uses depth information in a depth image to perform repair processing on a color image to obtain an enhanced color image.
  • the image processing method includes the following steps:
  • S301 Acquire a color image and a first depth image.
  • The depth sensor acquires the first depth image, and the camera acquires the color image; the color image may be an original color image.
  • The color image and the first depth image can be obtained by simultaneously photographing the same scene at the same location with a paired and calibrated color camera and depth sensor and then registering the two obtained images; alternatively, they can be obtained as required from local storage or a local database, or received from an external data source (e.g., the Internet, a server, a database, etc.) through an input device or transmission medium, and the like.
  • the color image and the first depth image are images corresponding to each other.
  • the color image collected by the sensor and the first depth image can be projected into the same coordinate system through image registration, so that the pixels of the two images correspond one-to-one. Considering that registration is a prior art, it is not described in detail in the present invention.
  • The depth sensor is used to sense the depth information of the environment; the technical solution used may be one of monocular stereo vision, binocular stereo vision, structured light, time of flight (ToF), lidar, camera arrays or other depth sensing technologies, which is not specifically limited here.
  • a specific embodiment of the present invention uses a laser radar as a depth sensor.
  • The lidar emits laser light into space through its laser transmitter, measures the time it takes for the light reflected by an object to return to the laser receiver, and from this infers the distance between the object and the lidar; in this process it collects the three-dimensional coordinates of a large number of dense points on the surface of the target object, which are used to quickly determine the orientation, height and even shape of the object. When the laser transmitter emits laser light toward distant areas such as the sky, the light does not return to the laser receiver, and no points can be obtained for the corresponding area.
  • S302: Determine the atmospheric light value. Optionally, the atmospheric light value may be determined according to the first depth image.
  • determining the atmospheric light value according to the first depth image includes S3021 and S3022.
  • S3021 Determine a sky depth image for indicating a sky area from the first depth image.
  • a sliding window method may be used to determine the sky depth image from the first depth image.
  • First, the shape of the sliding window can be determined. For example, the shape of the sliding window can be set as a rectangle with the size of one row of pixels and the step size of the sliding window can be set to 1, where a rectangle with the size of one row of pixels refers to a rectangle whose width is the width of the first depth image and whose height is 1 pixel.
  • Then, the sky depth image can be determined from the first depth image in a top-down sliding manner. Specifically, the starting position of the sliding window is at the top of the first depth image, the starting row is taken as the upper bound of the sky area, and the top of the sliding window is aligned with the top of the first depth image. The sum of the pixel values within the current sliding window and the current row number are recorded, and it is judged whether the sum of the pixel values in the sliding window is zero. If the sum is zero, the sliding window slides down to the next row according to the step size, the sum of the pixel values of the current sliding window and the current row number are recorded again, and the sliding window continues to slide down until the sum of the pixel values of some row is non-zero; that row is taken as the lower bound of the sky area, and the area between the upper bound and the lower bound of the sky area is the sky depth image in the first depth image.
  • the present application does not specifically limit the shape of the sliding window, which may be a rectangle, a circle, or other shapes and combinations of other shapes.
  • the present invention also does not specifically limit the starting position and moving order of the sliding window.
  • the starting position of the sliding window can be any area of the image, and the moving order of the sliding window can be top-to-bottom, bottom-to-top , left-to-right, or right-to-left.
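  • The following is a minimal Python/NumPy sketch of the one-row, top-down sliding window described above; the function name and the zero-sum test for sky rows are illustrative (a lidar gets no return from the sky, so sky pixels stay zero) and are not the patent's own code.

```python
import numpy as np

def find_sky_lower_bound(depth: np.ndarray) -> int:
    """Top-down sliding window over a first depth image of shape (H, W).
    The window is one full row of pixels and the step size is 1; the first row
    whose pixel values do not sum to zero is the lower bound of the sky area."""
    for row in range(depth.shape[0]):
        if depth[row].sum() != 0:          # lidar returns nothing from the sky, so sky rows sum to 0
            return row
    return depth.shape[0]                  # no valid returns anywhere: treat the whole image as sky

# Hypothetical usage: rows [0, lower) of the first depth image form the sky depth image.
# lower = find_sky_lower_bound(first_depth)
# sky_depth_image = first_depth[:lower]
```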
  • S3022 Determine the atmospheric light value according to the sky depth image and the color image.
  • Optionally, the first depth image is binarized by using the sky depth image to obtain a binary map of the first depth image; then the sky color image used to indicate the sky area is determined according to the binary map and the color image; and finally the atmospheric light value is determined according to the sky color image.
  • The binary map of the first depth image is a map composed of 0 and 1 values, where pixels in the sky area of the depth image correspond to the value 1 in the binary map and pixels in the non-sky area correspond to the value 0.
  • The sky color image used to indicate the sky region is determined according to the binary map and the color image: a new color image can be obtained by multiplying the binary map of the first depth image with the color image at corresponding pixel positions, and this new color image is the sky color image used to indicate the sky area.
  • Determining the atmospheric light value according to the sky color image specifically includes: first determining the pixel points of the brightest part of the sky color image, for example the brightest 0.1% of the pixels, and then determining the average value of those brightest pixel points as the atmospheric light value.
  • The atmospheric light value is a gray value in the range of 0 to 255; the larger the value, the higher the brightness of the atmospheric light.
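  • A sketch of the atmospheric light computation described in the last few paragraphs, assuming an 8-bit color image and using the per-pixel mean over the three channels as the brightness measure (an assumption; the extract does not say how "brightest" is measured):

```python
import numpy as np

def atmospheric_light(color: np.ndarray, sky_mask: np.ndarray, top_fraction: float = 0.001) -> float:
    """color: (H, W, 3) uint8 color image; sky_mask: (H, W) binary map, 1 for sky pixels.
    Multiplies the binary map with the color image to obtain the sky color image, then
    averages the brightest 0.1% of sky pixels to obtain A as a gray value in [0, 255]."""
    sky_color = color * sky_mask[..., None].astype(color.dtype)   # element-wise product at corresponding positions
    brightness = sky_color.mean(axis=2)                           # per-pixel brightness (assumed: channel mean)
    sky_values = brightness[sky_mask > 0]
    if sky_values.size == 0:
        return 255.0                                              # fallback if no sky was found (assumption)
    k = max(1, int(top_fraction * sky_values.size))               # number of brightest pixels to keep (0.1%)
    return float(np.sort(sky_values)[-k:].mean())
```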
  • S303 Perform depth completion on the first depth image to obtain a second depth image, and determine the transmittance according to the second depth image.
  • The first depth image is generally a sparse depth image, while the second depth image obtained through depth completion is a dense depth image, and the transmittance is determined from the second depth image.
  • a sparse depth image generally refers to a depth image containing a small number of non-zero pixels
  • A dense depth image refers to a depth image containing many non-zero pixels, that is, more non-zero pixels than the sparse depth image contains.
  • Transmittance, which can also be called the transmission coefficient or transmissivity, refers to the ratio of the transmitted luminous flux to the incident luminous flux, and its value ranges from 0 to 100%.
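  • The extract does not spell out how the transmittance is computed from the completed depth image. A common choice in depth-based dehazing, shown here purely as an illustrative assumption, is the Beer-Lambert relation t(x) = exp(-beta * d(x)) with a scattering coefficient beta:

```python
import numpy as np

def transmittance_from_depth(depth: np.ndarray, beta: float = 0.1) -> np.ndarray:
    """Map a (dense) depth image to transmittance in (0, 1] via t = exp(-beta * d).
    beta is a hand-tuned scattering coefficient; both the formula and the default
    value are assumptions, not taken from the patent text."""
    return np.exp(-beta * depth.astype(np.float32))
```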
  • S304 Perform repair processing on the color image according to the atmospheric light value and the transmittance.
  • The color image is repaired to obtain a repaired image J(x), where repair processing refers to performing related image processing on the image to be repaired, that is, the original color image, to obtain a clear color image; for example, a color image containing fog/haze/smoke is repaired into a fog/haze/smoke-free color image.
  • I(x) is the color image, i.e. the image to be repaired, such as an image containing fog/haze/smoke
  • t(x) is the transmittance
  • A is the atmospheric light value
  • J(x) is the repaired image, such as an image from which fog/haze/smoke has been removed.
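  • A minimal sketch of the repair step, inverting the atmospheric degradation model with the estimated A and t(x); the lower bound on the transmittance is a common numerical safeguard and is an assumption, not something stated in the extract:

```python
import numpy as np

def restore(color: np.ndarray, t: np.ndarray, A: float, t_min: float = 0.1) -> np.ndarray:
    """color: (H, W, 3) image to be repaired I(x); t: (H, W) transmittance; A: atmospheric light value.
    Returns J(x) = (I(x) - A) / t(x) + A, clipped back to the 8-bit range."""
    t = np.clip(t, t_min, 1.0)[..., None]            # avoid division by very small t; broadcast over channels
    J = (color.astype(np.float32) - A) / t + A
    return np.clip(J, 0, 255).astype(np.uint8)
```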
  • Because the transmittance is determined from the depth image, and moreover from the depth image after depth completion, the accuracy of the transmittance can be improved, which further improves the image processing effect.
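  • Putting steps S301 to S304 together, the following sketch chains the illustrative helpers sketched above (find_sky_lower_bound, atmospheric_light, transmittance_from_depth, restore); the names and the depth-completion callback are hypothetical, not the patent's:

```python
import numpy as np

def process(color: np.ndarray, first_depth: np.ndarray, complete_depth) -> np.ndarray:
    """color: (H, W, 3) image to be repaired; first_depth: (H, W) sparse depth image;
    complete_depth: a callable performing depth completion (e.g. the dual-network model of FIG. 4)."""
    lower = find_sky_lower_bound(first_depth)             # S302: sky region from the first depth image
    sky_mask = np.zeros(first_depth.shape, dtype=np.uint8)
    sky_mask[:lower] = 1
    A = atmospheric_light(color, sky_mask)                # S302: atmospheric light value
    second_depth = complete_depth(color, first_depth)     # S303: depth completion
    t = transmittance_from_depth(second_depth)            # S303: transmittance from the completed depth
    return restore(color, t, A)                           # S304: repair the color image
```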
  • FIG. 4 is an implementation manner of depth completion for a depth image provided by an embodiment of the present invention.
  • As shown in FIG. 4, the color image and the first depth image are respectively input to the first deep neural network 310 and the second deep neural network 320 to obtain a third depth image and a fourth depth image, and then the third depth image and the fourth depth image are input to the fusion module 330 for fusion processing to obtain the second depth image, that is, the completed dense depth image.
  • the first deep neural network 310 and the second deep neural network 320 have the same structure.
  • the first deep neural network 310 includes a first preprocessing network 311 , a first encoder 312 and a first decoder 313 .
  • The first preprocessing network 311 is used to transform the color image input to the first deep neural network 310 into a first feature image suitable for processing by the first encoder 312, and the first encoder 312 is used to perform feature encoding on the first feature image.
  • the first decoder 313 is configured to perform feature decoding on the second feature image output by the first encoder 312 .
  • the first encoder 312 includes N downsampling actions, and the first decoder 313 includes N upsampling actions.
  • the second deep neural network 320 includes a second preprocessing network 321 , a second encoder 322 and a second decoder 323 .
  • the second preprocessing network 321 is used to transform the first depth image input to the second deep neural network 320 into a third feature image suitable for processing by the second encoder 322, the second encoder 322 is used to perform feature encoding on the third feature image, and the second decoder 323 is configured to perform feature decoding on the fourth feature image output by the second encoder 322.
  • the second encoder 322 includes N downsampling actions, and the second decoder 323 includes N upsampling actions.
  • the first deep neural network 310 obtains a third depth image from the color image and some intermediate layer feature images in the second deep neural network 320 . Therefore, the input of the first deep neural network 310 includes two inputs, one input is a color image, and the other input is a feature image output by some intermediate layers in the second deep neural network 320. The output is a third depth image.
  • the second deep neural network 320 obtains a fourth depth image according to the first depth image and some intermediate layer feature images in the first deep neural network 310. Therefore, the input of the second deep neural network 320 includes two inputs: one input is the first depth image, and the other input is the feature image output by the intermediate layers of the first deep neural network 310. The output of the second deep neural network 320 is the fourth depth image.
  • the fusion module 330 may be configured to fuse the third depth image output by the first deep neural network 310 with the fourth depth image output by the second deep neural network 320 to generate a completed second depth image.
  • the specific implementation of the fusion operation may include: performing concat processing on the third depth image and the fourth depth image, and then performing at least one convolution operation on the concatenated image to obtain a completed second depth image .
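  • A PyTorch-style sketch of the fusion operation is shown below. The number and size of the convolution layers are illustrative assumptions, since the description only requires a concat followed by at least one convolution.
```python
import torch
import torch.nn as nn

class FusionModule(nn.Module):
    """Fuse the third and fourth depth images into the completed second depth image."""
    def __init__(self, mid_channels=16):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(2, mid_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, 1, kernel_size=3, padding=1),
        )

    def forward(self, depth3, depth4):
        # depth3, depth4: (B, 1, H, W) outputs of the two deep neural networks.
        x = torch.cat([depth3, depth4], dim=1)   # concat along the channel axis
        return self.fuse(x)                      # completed (dense) second depth image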
  • The specific implementation steps of the first deep neural network 310 and the second deep neural network 320 are further described as follows, starting with the first deep neural network 310.
  • S11 The first preprocessing network 311 preprocesses the color image to obtain the first feature image.
  • the first preprocessing network 311 can be used to transform the input color image into a first feature image suitable for processing by the first encoder 312 , and input the first feature image to the first encoder 312 .
  • the first preprocessing network 311 may be composed of at least one convolutional layer.
  • the first preprocessing network 311 performs convolution processing on the color image, so that the number of channels of the color image is changed without changing the size.
  • the color image input to the first deep neural network is in RGB format. Then the number of channels of the color image is 3, which are R (red), G (green), and B (blue). For example, if the size of the input color image is h*w*3, after the color image passes through the first preprocessing network 311, the size of the output first feature image is h*w*16.
  • the first feature image output after the color image passes through the first preprocessing network 311 has the same number of channels as the third feature image.
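  • A minimal sketch of such a preprocessing network is given below (PyTorch-style). The 3*3 kernel and the single convolution-plus-activation layer are illustrative assumptions; the description only requires at least one convolutional layer that changes the channel count (3 for the color branch, 1 for the depth branch) to 16 without changing the spatial size.
```python
import torch.nn as nn

class PreprocessNet(nn.Module):
    """Preprocessing network: change the channel count, keep the spatial size."""
    def __init__(self, in_channels, out_channels=16):
        super().__init__()
        # A 3x3 convolution with padding 1 leaves h and w unchanged.
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.conv(x)   # (B, in_channels, h, w) -> (B, 16, h, w)
```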
  • S12 Input the first feature image into the first encoder 312 to obtain the second feature image.
  • the first encoder 312 may be composed of N coding units. Each coding unit has the same structure and the same coding steps.
  • the first encoding unit 314 in the first encoder 312 includes at least one convolution layer and a downsampling layer, the convolution layer is used to extract features, and the operation of the convolution layer does not change the input feature map size.
  • the feature image output by the at least one convolutional layer can be saved, and this feature image is spliced with the feature image generated by the intermediate layer in the second deep neural network 320 (the splicing symbol in FIG. 4 indicates this concatenation).
  • the feature image generated by the intermediate layer in the second deep neural network 320 may be the feature map generated after the convolution layer operation in the encoding unit 324 of the second encoder 322 in the second deep neural network 320 .
  • the number of feature image channels output after the stitching operation is doubled.
  • the spliced feature image is then passed through a convolution layer (eg, a 1*1 convolution layer) for channel dimension reduction to reduce the number of channels of the spliced feature image to the number of channels before splicing.
  • the feature image output after being processed by the convolution layer is down-sampled.
  • the structure of the first encoding unit 314 shown in FIG. 4 is just an example. In practice, the structure of the encoding unit can be adjusted to a certain extent, for example, the number of convolutional layers may be adjusted, which is not limited in this application.
  • This application does not limit the number N of coding units in the first encoder 312, and does not specifically limit the number of convolutional layers and downsampling coefficients in each coding unit.
  • a first feature image with a size of h*w*16 is input to the first encoder 312, where 16 is the number of channels, h is the pixel length, and w is the pixel width.
  • after the feature image passes through all the convolutional layers for feature extraction, the output feature image has a size of h*w*16.
  • the feature image output in the previous step is spliced with the feature map generated by the intermediate layer in the second deep neural network 320, and the number of channels of the output feature image is doubled, that is, the size of the image is h*w*32.
  • the feature image output in the previous step passes through a 1*1 convolution layer, which is used to reduce the number of channels of the spliced feature image to 16 again, that is, the size becomes h*w*16 again.
  • the feature image output in the previous step is subjected to a downsampling process with a downsampling coefficient of 1/2, the pixel length h becomes 1/2 of the original, the pixel width w becomes 1/2 of the original, and the number of channels is doubled , that is, a feature image with a size of 1/2h*1/2w*32 is obtained.
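  • Following the worked example above, one encoding unit can be sketched as below (PyTorch-style). The two 3*3 convolutions and the strided convolution used for downsampling are illustrative assumptions; the channel doubling on downsampling follows the h*w*16 to 1/2h*1/2w*32 example.
```python
import torch
import torch.nn as nn

class EncodingUnit(nn.Module):
    """One encoding unit of the first (or second) encoder; layer counts are illustrative."""
    def __init__(self, channels):
        super().__init__()
        # Feature-extraction convolutions: spatial size and channel count unchanged.
        self.convs = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )
        # 1x1 convolution: reduce the spliced channels back to the pre-splice count.
        self.reduce = nn.Conv2d(2 * channels, channels, kernel_size=1)
        # Downsampling: halve h and w, double the channel count.
        self.down = nn.Conv2d(channels, 2 * channels, kernel_size=3, stride=2, padding=1)

    def forward(self, x, cross_feat):
        feat = self.convs(x)                        # saved and shared with the other branch
        spliced = torch.cat([feat, cross_feat], 1)  # splice with the other network's feature
        reduced = self.reduce(spliced)
        return self.down(reduced), feat             # downsampled output + skip feature
```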
  • S13 Input the second feature image into the first decoder 313 to obtain a third depth image.
  • the first decoder network 313 may be composed of N decoding units, where N is an integer greater than 1, and the value of N is not limited in this application.
  • each decoding unit in the first decoder 313 may be the same.
  • the first decoding unit 315 in the first decoder 313 includes an upsampling layer and at least one convolution layer.
  • First, the second feature image is subjected to an up-sampling process to obtain the up-sampled output feature image, and then the feature image generated after up-sampling, the feature map generated by the intermediate layer in the first encoder 312, and the feature map generated by the intermediate layer of the second decoder 323 in the second deep neural network 320 are concatenated.
  • After that, the spliced feature image is first passed through a convolution layer that realizes channel dimension reduction, such as a 1*1 convolution layer, and then the output feature image after channel dimension reduction is passed through at least one convolution layer, where the convolutional layers included in the first decoding unit 315 are all used to extract features without changing the size of the input feature map.
  • the feature image generated after up-sampling will be provided to the decoding unit of the second decoder 323 of the second deep neural network 320 for splicing processing.
  • the structure of the first decoding unit 315 shown in FIG. 4 is only an example. In practice, the structure of the decoding unit can be adjusted to a certain extent, for example, the number of convolutional layers may be adjusted, which is not limited in this application.
  • the present invention does not limit the number N of decoding units in the first decoder 313, and does not specifically limit the number of convolutional layers and upsampling coefficients in each decoding unit.
  • a specific example is used for further description here.
  • a second feature image 317 with a size of 1/8h*1/8w*128 is input to the first decoder 313 .
  • after the feature image passes through an upsampling convolutional layer with an upsampling coefficient of 1/2, the pixel length 1/8h becomes 1/4h, the pixel width 1/8w becomes 1/4w, and the number of channels becomes half of the original, that is, a feature image with a size of 1/4h*1/4w*64 is obtained.
  • Next, the feature image output in the previous step, the feature map generated by the intermediate layer of the first encoder 312 in the first deep neural network 310, and the feature map generated by the intermediate layer of the second decoder 323 in the second deep neural network 320 are spliced; the number of channels of the spliced feature image is tripled, that is, the size of the output feature image is 1/4h*1/4w*192.
  • Then, the feature image output in the previous step is passed through a 1*1 convolution layer, which is used to reduce the number of channels of the spliced feature image to the number of channels before splicing, that is, the size becomes 1/4h*1/4w*64 again.
  • Finally, after feature extraction by all the convolutional layers, the output feature image size is 1/4h*1/4w*64.
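  • A corresponding sketch of one decoding unit is shown below, again with illustrative layer choices; the transposed convolution used for upsampling is an assumption, and any upsampling layer with the stated effect on size and channels (for example 1/8h*1/8w*128 to 1/4h*1/4w*64) would fit.
```python
import torch
import torch.nn as nn

class DecodingUnit(nn.Module):
    """One decoding unit; layer counts are illustrative."""
    def __init__(self, in_channels):
        super().__init__()
        out_channels = in_channels // 2
        # Upsampling: double h and w, halve the channel count.
        self.up = nn.ConvTranspose2d(in_channels, out_channels, kernel_size=2, stride=2)
        # After splicing three feature maps the channel count is tripled;
        # the 1x1 convolution brings it back to the pre-splice count.
        self.reduce = nn.Conv2d(3 * out_channels, out_channels, kernel_size=1)
        self.convs = nn.Sequential(
            nn.Conv2d(out_channels, out_channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x, encoder_feat, other_decoder_feat):
        up = self.up(x)                                   # also shared with the other branch
        spliced = torch.cat([up, encoder_feat, other_decoder_feat], dim=1)
        return self.convs(self.reduce(spliced)), up
```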
  • The specific implementation of the second deep neural network 320 is described next. S21 The second preprocessing network 321 preprocesses the first depth image to obtain a third feature image.
  • the second preprocessing network 321 can be used to transform the input first depth image into a third feature image suitable for processing by the second encoder 322, and input the third feature image to the second encoder 322.
  • the second preprocessing network 321 may be composed of at least one convolutional layer, and the second preprocessing network 321 performs convolution processing on the first depth image, so that the number of channels of the first depth image is changed without changing the size.
  • the first depth image input to the second neural network has only one channel number, that is, a gray value.
  • the gray value represents the distance from a certain pixel in the scene to the depth sensor.
  • For example, if the size of the input first depth image is h*w*1, then after the first depth image passes through the second preprocessing network 321, the size of the output third feature image is h*w*16.
  • the third feature image output by the second preprocessing network 321 has the same number of channels as the first feature image.
  • S22 Input the third feature image into the second encoder 322 to obtain the fourth feature image.
  • the third feature image is input to the second encoder 322 .
  • the second encoder 322 may be composed of N coding units. Each coding unit has the same structure and the same coding steps.
  • Taking the second encoding unit 324 in the second encoder 322 as an example, it includes at least one convolution layer and a downsampling layer; the convolution layer is used to extract features, and the operation of the convolution layer does not change the size of the input feature map.
  • the feature image output by the at least one convolution layer can be saved, and this feature image is spliced with the feature image generated by the intermediate layer in the first deep neural network 310.
  • Specifically, the feature image generated by the intermediate layer in the first deep neural network 310 may be the feature map generated after the convolutional layer operation in the encoding unit 314 of the first encoder 312 in the first deep neural network 310.
  • the number of feature image channels output after the stitching operation is doubled.
  • Then, the spliced feature image is passed through a convolution layer for channel dimension reduction (for example, a 1*1 convolution layer) to reduce the number of channels of the spliced feature image to the number of channels before splicing.
  • the feature image output after being processed by the convolution layer is processed by downsampling.
  • the structure of the second encoding unit 324 shown in FIG. 5 is just an example. In practice, the structure of the encoding unit can be adjusted to a certain extent, for example, the number of convolutional layers may be adjusted, which is not limited in this application.
  • This application does not limit the number N of coding units of the second encoder 322, and does not specifically limit the number of convolutional layers and downsampling coefficients in each coding unit.
  • a third feature image of size h*w*16 is input to the second encoder 322, where 16 is the number of channels, h is the pixel length, and w is the pixel width.
  • after the third feature image passes through all the convolutional layers for feature extraction, the output feature image has a size of h*w*16.
  • the feature image output in the previous step is spliced with the feature map generated by the intermediate layer in the first deep neural network 310, and the number of channels of the output feature image is doubled, that is, the size of the image is h*w*32.
  • the feature image output in the previous step passes through a 1*1 convolution layer, which is used to reduce the number of channels of the spliced feature image to 16 again, that is, the size becomes h*w*16 again.
  • the feature image output in the previous step is subjected to a downsampling process with a downsampling coefficient of 1/2, the pixel length h becomes 1/2 of the original, the pixel width w becomes 1/2 of the original, and the number of channels is doubled , that is, a feature image with a size of 1/2h*1/2w*32 is obtained.
  • S23 Input the fourth feature image into the second decoder 323 to obtain a fourth depth image.
  • the fourth feature image is input to the second decoder 323, and the second decoder 323 outputs the fourth depth image.
  • the second decoder 323 may be composed of N decoding units, where N is an integer greater than 1, and the value of N is not limited in this application.
  • each decoding unit in the second decoder 323 may be the same.
  • the second decoding unit 325 in the second decoder 323 includes an upsampling layer and at least one convolutional layer.
  • First, the fourth feature image is subjected to an up-sampling process to obtain the up-sampled output feature image, and then the feature image generated after up-sampling, the feature map generated by the intermediate layer in the second encoder 322, and the feature map generated by the intermediate layer of the first decoder 313 in the first deep neural network 310 are concatenated.
  • After that, the spliced feature image is first passed through a convolution layer that realizes channel dimension reduction, such as a 1*1 convolution layer, and then the output feature image after channel dimension reduction is passed through at least one convolution layer, where the convolutional layers included in the second decoding unit 325 are all used to extract features without changing the size of the input feature map.
  • the feature image generated after up-sampling will be provided to the decoding unit of the first decoder 313 of the first deep neural network 310 for splicing processing.
  • the structure of the second decoding unit 325 shown in FIG. 4 is only an example. In practice, the structure of the decoding unit can be adjusted to a certain extent, for example, the number of convolutional layers may be adjusted, which is not limited in this application.
  • the present invention does not limit the number N of decoding units in the second decoder 323, and does not specifically limit the number of convolutional layers and upsampling coefficients in each decoding unit.
  • In order to make the above steps clearer, a specific example is used for further description here.
  • a fourth feature image with a size of 1/8h*1/8w*128 is input to the second decoder 323.
  • after the feature image passes through an upsampling convolutional layer with an upsampling coefficient of 1/2, the pixel length 1/8h becomes 1/4h, the pixel width 1/8w becomes 1/4w, and the number of channels becomes half of the original, that is, a feature image with a size of 1/4h*1/4w*64 is obtained.
  • Next, the feature image output in the previous step, the feature map generated by the intermediate layer of the first decoder 313 in the first deep neural network 310, and the feature map generated by the intermediate layer of the second encoder 322 in the second deep neural network 320 are spliced; the number of channels of the spliced feature image is tripled, that is, the size of the output feature image is 1/4h*1/4w*192.
  • Then, the feature image output in the previous step is passed through a 1*1 convolution layer, which is used to reduce the number of channels of the spliced feature image to the number of channels before splicing, that is, the size becomes 1/4h*1/4w*64 again.
  • Finally, after feature extraction by all the convolutional layers, the output feature image size is 1/4h*1/4w*64.
  • the above-mentioned depth completion is mainly realized by using the first deep neural network and the second deep neural network for inference.
  • the following will further describe how to train the first deep neural network and the second deep neural network.
  • the training of the first deep neural network is taken as an example below; the training method of the second deep neural network is similar.
  • an initial first deep neural network is built, and parameters of the initial first deep neural network are initialized, for example, the initialization parameters may adopt random values.
  • Next, training samples are obtained; the training samples are color images and registered depth images.
  • the training samples can be obtained from an actual scene through cameras and depth sensors, or open-source color images and registered depth images can be obtained from a database. If the number of training samples is insufficient, the training set can also be augmented by blurring or cropping the samples, adding noise, or generating training samples with an adversarial network.
  • Then, the loss function and the optimizer are determined. The value of the loss function can be calculated using the mean squared error, that is, the sum of the squared differences between the depth image output by the network and the real depth image, to evaluate how close the predicted depth image is to the ground-truth depth image.
  • the optimizer can be the Adam optimizer, used for backpropagation in neural network training to update the parameters in the network. By iteratively updating the parameters in the network, the predicted depth image approaches the real depth image. The training ends when the value of the loss function no longer changes.
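  • A minimal sketch of this training procedure is shown below (PyTorch-style). The dataloader, the number of epochs and the learning rate are illustrative assumptions; in practice training stops once the loss value no longer changes.
```python
import torch
import torch.nn as nn

def train_depth_network(model, dataloader, epochs=50, lr=1e-3):
    """Sketch of the training loop: MSE loss against the ground-truth depth,
    Adam optimizer with backpropagation. `dataloader` is assumed to yield
    (network_input, ground_truth_depth) batches."""
    criterion = nn.MSELoss()                 # mean squared depth error
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(epochs):
        for inputs, gt_depth in dataloader:
            pred_depth = model(inputs)       # predicted (completed) depth image
            loss = criterion(pred_depth, gt_depth)
            optimizer.zero_grad()
            loss.backward()                  # backpropagation
            optimizer.step()                 # update the network parameters
```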
  • FIG. 5 and FIG. 6 provide schematic structural diagrams of possible image processing apparatuses according to embodiments of the present application. These image processing apparatuses can be used to achieve the beneficial effects of the foregoing method embodiments.
  • the image processing apparatus 500 includes an image acquisition module 510 , an atmospheric light value calculation module 520 , a transmittance calculation module 530 , and an image restoration module 540 .
  • When the image processing apparatus 500 is used to implement the functions of the method embodiment shown in FIG. 3, the image acquisition module 510 is used to execute S301, the atmospheric light value calculation module 520 is used to execute S302, the transmittance calculation module 530 is used to execute S303, and the image repairing module 540 is used to execute S304. More detailed descriptions of the image acquisition module 510, the atmospheric light value calculation module 520, the transmittance calculation module 530, and the image restoration module 540 can be obtained directly from the relevant descriptions in the method embodiment shown in FIG. 3, and will not be repeated here.
  • FIG. 6 is a schematic diagram of a hardware structure of an image processing apparatus according to an embodiment of the present application.
  • the image processing apparatus 600 shown in FIG. 6 includes a memory 601 , a processor 602 , a communication interface 603 , and a bus 604 .
  • the memory 601 , the processor 602 , and the communication interface 603 are connected to each other through the bus 604 for communication.
  • The memory 601 may be a ROM, a static storage device, or a RAM.
  • the memory 601 may store a program, and when the program stored in the memory 601 is executed by the processor 602, the processor 602 and the communication interface 603 are used to execute each step of the image processing method of the embodiment of the present application.
  • the processor 602 may be a general-purpose CPU, a microprocessor, an ASIC, a GPU, or one or more integrated circuits, and is used to execute related programs to realize the functions required to be performed by the units in the image processing apparatus of the embodiments of the present application, or to execute the image processing method of the method embodiments of the present application.
  • the processor 602 may also be an integrated circuit chip with signal processing capability.
  • each step of the image processing method in the embodiment of the present application may be completed by an integrated logic circuit of hardware in the processor 602 or an instruction in the form of software.
  • the above-mentioned processor 602 may also be a general-purpose processor, DSP, ASIC, FPGA or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.
  • the methods, steps, and logic block diagrams disclosed in the embodiments of this application can be implemented or executed.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the method disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
  • the storage medium is located in the memory 601; the processor 602 reads the information in the memory 601 and, in combination with its hardware, completes the functions required to be performed by the units included in the image processing apparatus of the embodiments of the present application, or performs the image processing method of the method embodiments of the present application.
  • the communication interface 603 implements communication between the apparatus 600 and other devices or a communication network using a transceiving device such as, but not limited to, a transceiver.
  • the image to be processed can be acquired through the communication interface 603 .
  • Bus 604 may include a pathway for communicating information between various components of device 600 (eg, memory 601, processor 602, communication interface 603).
  • processor in the embodiments of the present application may be a central processing unit (central processing unit, CPU), and the processor may also be other general-purpose processors, digital signal processors (digital signal processors, DSP), application-specific integrated circuits (application specific integrated circuit, ASIC), off-the-shelf programmable gate array (field programmable gate array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the memory in the embodiments of the present application may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically programmable Erase programmable read-only memory (electrically EPROM, EEPROM) or flash memory.
  • Volatile memory may be random access memory (RAM), which acts as an external cache.
  • By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
  • the above embodiments may be implemented in whole or in part by software, hardware, firmware or any other combination.
  • the above-described embodiments may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, all or part of the processes or functions described in the embodiments of the present application are generated.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless manner (for example, infrared, wireless, or microwave).
  • the computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device such as a server, a data center, or the like that contains one or more sets of available media.
  • the usable media may be magnetic media (eg, floppy disks, hard disks, magnetic tapes), optical media (eg, DVDs), or semiconductor media.
  • the semiconductor medium may be a solid state drive.
  • "At least one" means one or more, and "plurality" means two or more.
  • "At least one of the following items" or similar expressions refer to any combination of these items, including any combination of a single item or plural items.
  • For example, at least one of a, b, or c can represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may be singular or plural.
  • the size of the sequence numbers of the above-mentioned processes does not mean the order of execution; the execution order of each process should be determined by its functions and internal logic, and should not constitute any limitation on the implementation processes of the embodiments of the present application.
  • the disclosed system, apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or may be integrated into another system, or some features may be ignored or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the functions, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium.
  • the technical solutions of the present application essentially, or the part that contributes to the prior art, or a part of the technical solutions, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes: U disk, mobile hard disk, read-only memory (ROM), random access memory (RAM), magnetic disk or optical disk and other media that can store program codes .

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

本申请实施例提供一种图像处理方法,涉及计算机视觉技术领域,能够在雾/霾/烟极端天气情况下对彩色图像进行修复处理。该图像处理方法包括获取第一深度图像和彩色图像,该第一深度图像和该彩色图像分别是由深度传感器和相机针对同一场景拍摄的图像;确定大气光值;对该第一深度图像进行深度补全以得到第二深度图像,并且根据该第二深度图像确定透射率;根据该大气光值和该透射率对该彩色图像进行修复处理。由于第二深度图像经过深度补全,相较于第一深度图像有更多的深度信息,而第二深度图像用于确定透射率,因此,在得到更高精度的透射率,进而提高图像处理的效果。

Description

一种图像处理方法,装置及储存介质 技术领域
本发明涉及一种图像处理方法、装置及存储介质,属于图像处理技术领域。
背景技术
在日常生活中,雾、霾、烟、雨等天气现象会使来自物体的光信号减弱,使得摄像头获取的图像出现模糊不清的现象,极大的影响了图像在各个领域的实用价值。因此,提出一种消除雾、霾、烟,雨等天气现象对图像质量的影响,增加图像的可视度的方法是很有必要的。
目前大多数的去雾/霾/烟算法基于大气退化模型:
I(x)=J(x)t(x)+A(1-t(x))  (1)
其中,I(x)是有雾/霾/烟图像,是待修复的去雾/霾/烟图像,t(x)是透射率,A是大气光值。
基于大气退化模型的去雾/霾/烟算法就是已知有雾/霾/烟图像I(x)去求解透射率t(x)和大气光值A,最后根据上述公式(1)对I(x)进行修复以得到修复后的无雾/霾/烟图像J(x)。
针对透射率t(x)的计算,传统方法主要根据待修复的去雾/霾/烟的彩色图像来确定透射率,具体地,首先利用神经网络强大的拟合能力和大量的数据去训练拟合一个映射函数,将待修复的去雾/霾/烟的彩色图像通过映射函数映射至透射率t(x)。由于在训练拟合该映射函数时需要大量的数据来进行训练,这导致拟合出来的该映射函数受数据中场景的限制,进而影响该映射函数的广泛使用性,并且由于拟合出来的该映射函数准确性有限,导致计算出的透射率的精度低,使得修复后的无雾/霾/烟图像的效果不好。
发明内容
本发明实施例提供一种图像处理方法,将补全后的深度信息融入彩色图像去雾/霾算法中,很好解决了在可视度低的恶劣天气状况下,电子设备拍摄不清楚的问题。为达到上述目的,本发明实施例提供如下技术方案:
第一方面,本发明实施例提供了一种图像处理方法,该图像处理方法包括以下步骤:
获取第一深度图像和彩色图像,该第一深度图像和该彩色图像分别是由深度传感器和相机针对同一场景拍摄的图像;
确定大气光值;
对该第一深度图像进行深度补全以得到第二深度图像,并且根据该第二深度图像 确定透射率;
根据该大气光值和该透射率对该彩色图像进行修复处理。
由于传统的采用深度学习算法来确定透射率的方式泛化能力差,导致复原后的图像效果不太理想。本实施例中采用深度图像来确定透射率,并且是经过深度补全后的深度图像,可以提高透射率的精度,进而提高图像的处理效果。
在一种可能实现方式中,第一深度图像进行深度补全以得到第二深度图像包括:
将该彩色图像和第二中间特征图像提供给第一深度神经网络以得到第三深度图像,该第二中间特征图像为第二深度神经网络的中间层产生的特征图像;
将该第一深度图像和第一中间特征图像提供给第二深度神经网络以得到第四深度图像,该第一中间特征图像为第一神经网络的中间层产生的特征图像;
将该第三深度图像和该第四深度图像进行融合操作以得到该第二深度图像。
由上可知,通过第一深度神经网络对该彩色图像和第二神经网络的中间层产生的特征图像进行深度推理可以充分挖掘并利用彩色图像信息以获取深度信息充分的第三深度图像,以及通过第二深度神经网络对该第一深度图像和第一神经网络的中间层产生的特征图像进行深度推理可以充分挖掘并利用该第一深度图像以获取深度信息充分的第四深度图像,再将两个深度信息充分的第三深度图像和第四深度图像进行融合(或者说合并)可以得到质量高的完整深度图像,即第二深度图像。
在一种可能实现方式中,该第一深度神经网络包括第一预处理网络、第一编码器和第一解码器,该第一预处理网络用于将输入给该第一深度神经网络的彩色图像变换为适用于该第一编码器处理的第一特征图像,该第一编码器用于对该第一特征图像进行特征编码,该第一解码器用于对该第一编码器输出的第二特征图像进行特征解码。
在一种可能实现方式中,第一中间特征图像包括第一编码器的编码单元内的卷积层所产生的特征图像和第一解码器的解码单元内的上采样层所产生的特征图像。
在一种可能实现方式中,该第二深度神经网络包括第二预处理网络、第二编码器和第二解码器,该第二预处理网络用于将输入给该第二深度神经网络的第一深度图像变换为适用于该第二编码器处理的第三特征图像,该第二编码器用于对该第三特征图像进行特征编码,该第二解码器用于对该第二编码器输出的第四特征图像进行特征解码。
在一种可能实现方式中,第二中间特征图像包括第二编码器的编码单元内的卷积层所产生的特征图像和第二解码器的解码单元内的上采样层所产生的特征图像。
在一种可能实现方式中,确定大气光值包括:
根据该第一深度图像确定该大气光值。
在一种可能实现方式中,根据该第一深度图像确定该大气光值包括:
从该第一深度图像中确定用于指示天空区域的天空深度图像;
根据该天空深度图像和该彩色图像确定该大气光值。
采用第一深度图像的深度信息从第一深度图像中分割出来的天空深度图像更能准确的识别天空区域,避免测量对象的颜色干扰,特别是白色区域,从而有效的计算出大气光值。
在一种可能实现方式中,从该第一深度图像中确定用于指示天空区域的天空深度图像包括:
通过滑窗法从该第一深度图像中确定该天空深度图像。
在一种可能实现方式中,该滑窗法所采用的滑窗的形状为一行像素大小的长方形,该滑窗的步长为1。
在一种可能实现方式中,根据该天空深度图像和该彩色图像确定该大气光值包括:
利用该天空深度图像从该彩色图像中确定出用于指示天空区域的天空彩色图像;根据该天空彩色图像确定该大气光值。
在一种可能的实现方式中,利用该天空深度图像从该彩色图像中确定出用于指示天空区域的天空彩色图像包括:
利用该天空深度图像对第一深度图像进行二值化操作以得到二值图;
根据该二值图和该彩色图像确定用于指示该天空区域的天空彩色图像。
在一种可能实现方式中,根据该天空彩色图像确定该大气光值包括:
从该天空彩色图像中确定最亮部分的像素点;
将该最亮部分的像素点的平均值确定为该大气光值。
第二方面,本申请实施例还提供了一种图像处理方法,该第一深度图像和所述彩色图像分别是由深度传感器和相机针对同一场景拍摄的图像;
根据该第一深度图像确定大气光值;
根据该第一深度图像确定透射率;
根据该大气光值和该透射率对所述彩色图像进行修复处理。
在一种可能实现方式中,根据该第一深度图像确定大气光值包括:
从所述第一深度图像中确定用于指示天空区域的天空深度图像;
根据所述天空深度图像和所述彩色图像确定所述大气光值。
在一种可能实现方式中,从该第一深度图像中确定用于指示天空区域的天空深度图像包括:
通过滑窗法从该第一深度图像中确定该天空深度图像。
在一种可能实现方式中,根据该天空深度图像和该彩色图像确定该大气光值包括:
利用该天空深度图像从该彩色图像中确定出用于指示天空区域的天空彩色图像根据该天空彩色图像确定该大气光值。
第三方面,本申请实施例还提供了一种图像处理装置,包括:
图像获取模块,用于获取第一深度图像和彩色图像,该第一深度图像和该彩色图像分别是由深度传感器和相机针对同一场景拍摄的图像;大气光值计算模块,用于计算大气光值;透射率计算模块,用于对该第一深度图像进行深度补全以得到第二深度图像,并且根据该第二深度图像确定透射率;
在一种可能实现方式中,该透射率计算模块具体用于:将该彩色图像和第二中间特征图像提供给第一深度神经网络以得到第三深度图像,该第二中间特征图像为第二 深度神经网络的中间层产生的特征图像;将该第一深度图像和第一中间特征图像提供给该第二深度神经网络以得到第四深度图像,该第一中间特征图像为第一神经网络的中间层产生的特征图像;将该第三深度图像和该第四深度图像进行融合操作以得到该第二深度图像。
在一种可能实现方式中,该大气光值计算模块具体用于:根据该第一深度图像确定该大气光值。
在一种可能实现方式中,该大气光值计算模块具体用于:从该第一深度图像中确定用于指示天空区域的天空深度图像;根据该天空深度图像和该彩色图像确定该大气光值。
在一种可能实现方式中,该大气光值计算模块具体用于:通过滑窗法从该第一深度图像中确定该天空深度图像。
在一种可能实现方式中,该大气光值计算模块具体用于:利用该天空深度图像从该彩色图像中确定出用于指示天空区域的天空彩色图像;根据该天空彩色图像确定该大气光值。
第四方面,本申请实施例还提供了一种图像处理装置,包括:
图像获取模块,用于获取第一深度图像和彩色图像,该第一深度图像和该彩色图像分别是由深度传感器和相机针对同一场景拍摄的图像;大气光值计算模块,用于根据该第一深度图像确定大气光值;透射率计算模块,用于根据该第一深度图像确定透射率;图像修复模块,用于根据大气光值和透射率确定修复后的图像。
在一种可能实现方式中,该大气光值计算模块具体用于:从该第一深度图像中确定用于指示天空区域的天空深度图像;根据该天空深度图像和该彩色图像确定该大气光值。
在一种可能实现方式中,该大气光值计算模块具体用于:通过滑窗法从该第一深度图像中确定该天空深度图像。
在一种可能实现方式中,该大气光值计算模块具体用于:利用该天空深度图像从该彩色图像中确定出用于指示天空区域的天空彩色图像;根据该天空彩色图像确定该大气光值。
第五方面,本申请实施例还提供了一种图像处理装置,该图像处理装置可以为处理器。例如,图形处理器(Graphics Processing Unit,GPU)、神经网络处理器(Neural-network Processing Unit,NPU)、高级精简指令集处理器(Advanced RISC Machines,ARM)等,可选的,该图像处理装置还包括储存器。其中,该存储器用于存储计算机程序或指令,处理器与存储器耦合,当处理器执行该计算机程序或指令时,使图像处理装置执行上述方法实施例中由处理器所执行的方法。
第六方面,本申请提供了一种计算机可读存储介质,该计算机可读存储介质存储有计算机程序,当该计算机程序被运行时,实现上述第一方面或第二方面中由处理器执行的方法。
第七方面,提供一种包含指令的计算机程序产品,当该计算机程序产品在计算机上运行时,使得计算机执行上述第一方面或第二方面中的任意一种实现方式中的方 法。
第八方面,提供一种芯片,所述芯片包括处理器与数据接口,所述处理器通过所述数据接口读取存储器上存储的指令,执行上述第一方面或第二方面中的任意一种实现方式中的方法。
本申请中,处理器和图像处理装置的名字对设备本身不构成限定,在实际实现中,这些设备可以以其他名称出现。只要各个设备的功能和本申请类似,属于本申请权利要求及其等同技术的范围之内。
附图说明
图1为本申请实施例提供的车辆100的一个功能框图示意;
图2为本申请实施例提供的手机200的结构图;
图3为本发明实施例提供的一种图像处理方法;
图4为本发明实施例提供的一种对深度图像进行深度补全的实现方式;
图5为本发明实施例提供的一种图像处理装置结构示意图;
图6为本发明又一个实施例提供的一种图像处理装置结构示意图。
具体实施方式
下面将结合本发明实施例中的附图,对本发明实施例进行描述。
本申请的说明书和权利要求书及所述附图中的术语“第一”、“第二”、“第三”和“第四”等是用于区别不同对象,而不是用于描述特定顺序。此外,术语“包括”和“具有”以及它们任何变形,意图在于覆盖不排他的包含。例如包含了一系列步骤或单元的过程、方法、系统、产品或设备没有限定于已列出的步骤或单元,而是可选地还包括没有列出的步骤或单元,或可选地还包括对于这些过程、方法、产品或设备固有的其它步骤或单元。
在本文中提及“实施例”意味着,结合实施例描述的特定特征、结构或特性可以包含在本申请的至少一个实施例中。在说明书中的各个位置出现该短语并不一定均是指相同的实施例,也不是与其它实施例互斥的独立的或备选的实施例。本领域技术人员显式地和隐式地理解的是,本文所描述的实施例可以与其它实施例相结合。
首先,对本申请中的部分用语进行解释说明,以便于本领域技术人员理解。
1、深度图像:深度图像也可以称作距离图像,深度图像中每个像素值反映深度传感器距离物体的距离。
2、彩色图像:彩色图像主要分为两种类型:RGB及CMYK。其中RGB的彩色图像是由三种不同颜色成分组合而成,一个为红色,一个为绿色,另一个为蓝色;
CMYK类型的图像则由四个颜色成分组成:青C、品M、黄Y、黑K。CMYK类型的图像主要用于印刷行业。
3、配准:本申请中的配准指的是图像配准,图像配准是将同一个场景的不同图像转换到同样的坐标系统中的过程。这些图像可以是不同时间拍摄的(多时间配准),可以是不同传感器拍摄的(多模配准),可以是不同视角拍摄的。这些图像之间的空 间关系可能是刚体的(平移和旋转)、仿射的(例如错切),也有可能是单应性的,或者是复杂的大型形变模型。
在本申请中,彩色图像和深度图像通常是配准的,因此,彩色图像和深度图像之间的像素点之间具有一对一的对应关系。
4、深度补全:一般是指将稀疏深度图像补全为稠密深度图像,补全后的稠密深度图像中包含的非零像素的数量要多于稀疏深度图像中包含的非零像素的数量。
本申请主要涉及一种图像处理方法,该图像处理方法主要是利用深度图像中的深度信息对彩色图像做修复处理以得到增强的彩色图像,比如,针对包含烟、雾或霾等干扰信息的彩色图像,利用本申请的图像处理方法可以实现去烟、去雾或去霾,即增强的彩色图像为去烟、去雾或去霾之后的彩色图像。
在介绍具体的图像处理方法之前,先对本申请技术方案所适用的场景进行相关介绍。
场景一:车载感知
图1是本申请实施例提供的车辆100的一个功能框图示意。可以将车辆100配置为完全或部分自动驾驶模式。例如:车辆100可以通过感知系统120获取其周围的环境信息,并基于对周边环境信息的分析得到自动驾驶策略以实现完全自动驾驶,或者将分析结果呈现给用户以实现部分自动驾驶。
车辆100可包括各种子系统,例如信息娱乐系统110、感知系统120、决策控制系统130、驱动系统140以及计算平台150。可选地,车辆100可包括更多或更少的子系统,并且每个子系统都可包括多个部件。另外,车辆100的每个子系统和部件可以通过有线或者无线的方式实现互连。
在一些实施例中,信息娱乐系统110可以包括通信系统111,娱乐系统112以及导航系统113。
通信系统111可以包括无线通信系统,无线通信系统可以直接地或者经由通信网络来与一个或多个设备无线通信。例如,无线通信系统可使用3G蜂窝通信,例如CDMA、EVD0、GSM/GPRS,或者4G蜂窝通信,例如LTE。或者5G蜂窝通信。无线通信系统可利用WiFi与无线局域网(wireless local area network,WLAN)通信。在一些实施例中,无线通信系统可利用红外链路、蓝牙或ZigBee与设备直接通信。其他无线协议,例如各种车辆通信系统,例如,无线通信系统可包括一个或多个专用短程通信(dedicated short range communications,DSRC)设备,这些设备可包括车辆和/或路边台站之间的公共和/或私有数据通信。
娱乐系统112可以包括中控屏,麦克风和音响,用户可以基于娱乐系统在车内收听广播,播放音乐;或者将手机和车辆联通,在中控屏上实现手机的投屏,中控屏可以为触控式,用户可以通过触摸屏幕进行操作。在一些情况下,可以通过麦克风获取用户的语音信号,并依据对用户的语音信号的分析实现用户对车辆100的某些控制,例如调节车内温度等。在另一些情况下,可以通过音响向用户播放音乐。
导航系统113可以包括由地图供应商所提供的地图服务,从而为车辆100提供行驶路线的导航,导航系统113可以和车辆的全球定位系统121、惯性测量单元122配 合使用。地图供应商所提供的地图服务可以为二维地图,也可以是高精地图。
感知系统120可包括感测关于车辆100周边的环境的信息的若干种传感器。例如,感知系统120可包括全球定位系统121(全球定位系统可以是GPS系统,也可以是北斗系统或者其他定位系统)、惯性测量单元(inertial measurement unit,IMU)122、激光雷达123、毫米波雷达124、超声雷达125以及摄像装置126,其中,摄像装置126也可以称作摄像头。感知系统120还可包括被监视车辆100的内部系统的传感器(例如,车内空气质量监测器、燃油量表、机油温度表等)。来自这些传感器中的一个或多个的传感器数据可用于检测对象及其相应特性(位置、形状、方向、速度等)。这种检测和识别是车辆100的安全操作的关键功能。
全球定位系统121可用于估计车辆100的地理位置。
惯性测量单元122用于基于惯性加速度来感测车辆100的位置和朝向变化。在一些实施例中,惯性测量单元122可以是加速度计和陀螺仪的组合。
激光雷达123可利用激光来感测车辆100所位于的环境中的物体。在一些实施例中,激光雷达123可包括一个或多个激光源、激光扫描器以及一个或多个检测器,以及其他系统组件。
毫米波雷达124可利用无线电信号来感测车辆100的周边环境内的物体。在一些实施例中,除了感测物体以外,雷达126还可用于感测物体的速度和/或前进方向。
超声雷达125可以利用超声波信号来感测车辆100周围的物体。
摄像装置126可用于捕捉车辆100的周边环境的图像信息。摄像装置126可以包括单目相机、双目相机、结构光相机以及全景相机等,摄像装置126获取的图像信息可以包括静态图像,也可以包括视频流信息。
通过感知系统120中部署的传感器可以获取深度图像和彩色图像,比如,通过激光雷达123和摄像装置126分别获取同一个场景下的深度图像和彩色图像。
决策控制系统130包括基于感知系统120所获取的信息进行分析决策的计算系统131,决策控制系统130还包括对车辆100的动力系统进行控制的整车控制器132,以及用于控制车辆100的转向系统133、油门134和制动系统135
计算系统131可以操作来处理和分析由感知系统120所获取的各种信息以便识别车辆100周边环境中的目标、物体和/或特征。该目标可以包括行人或者动物,该物体和/或特征可包括交通信号、道路边界和障碍物。计算系统131可使用物体识别算法、运动中恢复结构(Structure from Motion,SFM)算法、视频跟踪等技术。在一些实施例中,计算系统131可以用于为环境绘制地图、跟踪物体、估计物体的速度等等。计算系统131可以将所获取的各种信息进行分析并得出对车辆的控制策略。
整车控制器132可以用于对车辆的动力电池和引擎141进行协调控制,以提升车辆100的动力性能。
转向系统133可操作来调整车辆100的前进方向。例如在一个实施例中可以为方向盘系统。
油门134用于控制引擎141的操作速度并进而控制车辆100的速度。
制动系统135用于控制车辆100减速。制动系统135可使用摩擦力来减慢车轮144。 在一些实施例中,制动系统135可将车轮144的动能转换为电流。制动系统135也可采取其他形式来减慢车轮144转速从而控制车辆100的速度。
驱动系统140可包括为车辆100提供动力运动的组件。在一个实施例中,驱动系统140可包括引擎141、能量源142、传动系统143和车轮144。引擎141可以是内燃机、电动机、空气压缩引擎或其他类型的引擎组合,例如汽油发动机和电动机组成的混动引擎,内燃引擎和空气压缩引擎组成的混动引擎。引擎141将能量源142转换成机械能量。
能量源142的示例包括汽油、柴油、其他基于石油的燃料、丙烷、其他基于压缩气体的燃料、乙醇、太阳能电池板、电池和其他电力来源。能量源142也可以为车辆100的其他系统提供能量。
传动装置143可以将来自引擎141的机械动力传送到车轮144。传动装置143可包括变速箱、差速器和驱动轴。在一个实施例中,传动装置143还可以包括其他器件,比如离合器。其中,驱动轴可包括可耦合到一个或多个车轮121的一个或多个轴。
车辆100的部分或所有功能受计算平台150控制。计算平台150可包括至少一个处理器151,处理器151可以执行存储在例如存储器152这样的非暂态计算机可读介质中的指令153。在一些实施例中,计算平台150还可以是采用分布式方式控制车辆100的个体组件或子系统的多个计算设备。
处理器151可以是任何常规的处理器,诸如商业可获得的CPU。替选地,处理器151还可以包括诸如图像处理器(Graphic Process Unit:GPU),现场可编程门阵列(Field Programmable Gate Array:FPGA)、片上系统(Sysem on Chip:SOC)、专用集成芯片(Application Specific Integrated Circuit:ASIC)或它们的组合。尽管图1功能性地图示了处理器、存储器、和在相同块中的计算机110的其它元件,但是本领域的普通技术人员应该理解该处理器、计算机、或存储器实际上可以包括可以或者可以不存储在相同的物理外壳内的多个处理器、计算机、或存储器。例如,存储器可以是硬盘驱动器或位于不同于计算机110的外壳内的其它存储介质。因此,对处理器或计算机的引用将被理解为包括对可以或者可以不并行操作的处理器或计算机或存储器的集合的引用。不同于使用单一的处理器来执行此处所描述的步骤,诸如转向组件和减速组件的一些组件每个都可以具有其自己的处理器,所述处理器只执行与特定于组件的功能相关的计算。
在此处所描述的各个方面中,处理器可以位于远离该车辆并且与该车辆进行无线通信。在其它方面中,此处所描述的过程中的一些在布置于车辆内的处理器上执行而其它则由远程处理器执行,包括采取执行单一操纵的必要步骤。
在一些实施例中,存储器152可包含指令153(例如,程序逻辑),指令153可被处理器151执行来执行车辆100的各种功能。存储器152也可包含额外的指令,包括向信息娱乐系统110、感知系统120、决策控制系统130驱动系统140中的一个或多个发送数据、从其接收数据、与其交互和/或对其进行控制的指令。
除了指令153以外,存储器152还可存储数据,例如道路地图、路线信息,车辆的位置、方向、速度以及其它这样的车辆数据,以及其他信息。这种信息可在车辆100 在自主、半自主和/或手动模式中操作期间被车辆100和计算平台150使用。
计算平台150可基于从各种子系统(例如,驱动系统140、感知系统120和决策控制系统130)接收的输入来控制车辆100的功能。例如,计算平台150可利用来自决策控制系统130的输入以便控制转向系统133来避免由感知系统120检测到的障碍物。在一些实施例中,计算平台150可操作来对车辆100及其子系统的许多方面提供控制。
本实施例中,计算平台150可以从感知系统120中获取深度图像和彩色图像,利用深度图像中的深度信息对彩色图像进行修复处理以获得增强的彩色图像,具体地,该修复处理的实现可以软件的形式存储在存储器152中,由处理器151调用存储器152中的指令153来执行该修复处理。在获取增强的彩色图像之后,计算平台150可以将增强的彩色图像输出到其他系统以做进一步处理,比如将增强的彩色图像输出到信息娱乐娱乐系统110以供驾驶员观察到增强后的彩色图像,或者将增强的彩色图像系统输出到决策控制系统130以做相关决策处理。
可选地,上述这些组件中的一个或多个可与车辆100分开安装或关联。例如,存储器152可以部分或完全地与车辆100分开存在。上述组件可以按有线和/或无线方式来通信地耦合在一起。
可选地,上述组件只是一个示例,实际应用中,上述各个模块中的组件有可能根据实际需要增添或者删除,图1不应理解为对本申请实施例的限制。
在道路行进的自动驾驶汽车,如上面的车辆100,可以识别其周围环境内的物体以确定对当前速度的调整。所述物体可以是其它车辆、交通控制设备、或者其它类型的物体。在一些示例中,可以独立地考虑每个识别的物体,并且基于物体的各自的特性,诸如它的当前速度、加速度、与车辆的间距等,可以用来确定自动驾驶汽车所要调整的速度。
可选地,车辆100或者与车辆100相关联的感知和计算设备(例如计算系统131、计算平台150)可以基于所识别的物体的特性和周围环境的状态(例如,交通、雨、道路上的冰、等等)来预测所述识别的物体的行为。可选地,每一个所识别的物体都依赖于彼此的行为,因此还可以将所识别的所有物体全部一起考虑来预测单个识别的物体的行为。车辆100能够基于预测的所述识别的物体的行为来调整它的速度。换句话说,自动驾驶汽车能够基于所预测的物体的行为来确定车辆将需要调整到(例如,加速、减速、或者停止)什么稳定状态。在这个过程中,也可以考虑其它因素来确定车辆100的速度,诸如,车辆100在行驶的道路中的横向位置、道路的曲率、静态和动态物体的接近度等等。
除了提供调整自动驾驶汽车的速度的指令之外,计算设备还可以提供修改车辆100的转向角的指令,以使得自动驾驶汽车遵循给定的轨迹和/或维持与自动驾驶汽车附近的物体(例如,道路上的相邻车道中的轿车)的安全横向和纵向距离。
上述车辆100可以为轿车、卡车、摩托车、公共汽车、船、飞机、直升飞机、割草机、娱乐车、游乐场车辆、施工设备、电车、高尔夫球车、火车等,本申请实施例不做特别的限定。
场景二:终端拍照
图2为本申请实施例提供的手机200的结构图,手机200仅仅是终端的一个范例,并且手机200可以具有比图2中所示出的更过的或者更少的部件,可以组合两个或更多的部件,或者可以具有不同的部件配置。图2中所示出的各种部件可以在包括一个或多个信号处理和/或专用集成电路在内的硬件、软件、或硬件和软件的组合中实现。
如图2所示,手机200包括:RF(Radio Frequency,射频)电路210、存储器220、输入单元230、显示单元240、传感器250、音频电路260、无线保真(Wireless Fidelity,Wi-Fi)模块270、处理器280、以及电源等部件。本领域技术人员可以理解,图2中示出的手机结构并不构成对手机的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置。
下面结合图2对手机200的各个构成部件进行具体的介绍:
RF电路210可用于收发信息或通话过程中,信号的接收和发送,可以将基站的下行信息接收后,给处理器280处理;另外,将涉及上行的数据发送给基站。通常,RF电路包括但不限于天线、至少一个放大器、收发信机、耦合器、低噪声放大器、双工器等器件。此外,RF电路210还可以通过无线通信与网络和其他移动设备通信。所述无线通信可以使用任一通信标准或协议,包括但不限于全球移动通讯系统、通用分组无线服务、码分多址、宽带码分多址、长期演进、电子邮件、短消息服务等。
存储器220可用于存储软件程序及数据。处理器280通过运行存储在存储器220的软件程序及数据,从而执行手机200的各种功能以及数据处理。存储器220可主要包括存储程序区和存储数据区,其中,存储程序区可存储操作系统、至少一个功能所需的应用程序(比如声音播放功能、图像播放功能等)等;存储数据区可存储根据手机200的使用所创建的数据(比如音频数据、电话本等)等。此外,存储器220可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件、闪存器件、或其他易失性固态存储器件。在以下实施例中,存储器220存储有使得手机200能运行的操作系统,例如苹果公司所开发的
iOS操作系统，谷歌公司所开发的Android开源操作系统，微软公司所开发的Windows操作系统等。
输入单元230(如触摸屏)可用于接收输入的数字或字符信息,以及产生与手机200的用户设置以及功能控制有关的信号输入。具体地,输入单元230可以包括如图1所示设置在手机200正面的触控面板231,可收集用户在其上或附近的触摸操作(比如用户使用手指、触笔等任何适合的物体或附件在触控面板231上或在触控面板231附近的操作),并根据预先设定的程式驱动相应的连接装置。可选的,触控面板231可包括触摸检测装置和触摸控制器两个部分(图2中未示出)。其中,触摸检测装置检测用户的触摸方位,并检测触摸操作带来的信号,将信号传送给触摸控制器;触摸控制器从触摸检测装置上接收触摸信息,并将它转换成触点坐标,再送给处理器280,并能接收处理器280发来的指令并加以执行。此外,可以采用电阻式、电容式、红外线以及表面声波等多种类型实现触控面板231。
显示单元240(即显示屏)可用于显示由用户输入的信息或提供给用户的信息以及手机200的各种菜单的图形用户界面(Graphical User Inter face,GUI)。显示单元240可包括设置在手机200正面的显示面板241。其中,显示面板241可以采用液晶 显示器、发光二极管等形式来配置。
手机200还可以包括至少一种传感器250,比如摄像头装置、深度传感器、光传感器、运动传感器以及其他传感器。具体地,摄像装置可以为彩色摄像头,用来拍摄彩色图像,深度传感器可以用来确定手机200到物体的深度信息,光传感器可包括环境光传感器及接近传感器。作为运动传感器的一种,加速计传感器可检测各个方向上(一般为三轴)加速度的大小,静止时可检测出重力的大小及方向,可用于识别手机姿态的应用(比如横竖屏转化、相关游戏、磁力计姿态校准)、振动识别相关功能(比如计步器、敲击)等;至于手机200还可配置的陀螺仪、气压计、湿度计、温度计、红外线传感器等其他传感器,在此不再赘述。
音频电路260、扬声器261,麦克风262可提供用户与手机200之间的音频接口。音频电路260可将接收到的音频数据转换后的电信号,传输到扬声器261,由扬声器261转换为声音信号输出;另一方面,麦克风262将收集的声音信号转换为电信号,由音频电路260接收后转换为音频数据,再将音频数据输出至RF电路210以发送给比如另一手机,或者将音频数据输出至存储器220以便进一步处理。
Wi-Fi属于短距离无线传输技术,手机200可以通过Wi-Fi模块270帮助用户收发电子邮件、浏览网页和访问流媒体等,它为用户提供了无线的宽带互联网访问。
处理器280是手机200的控制中心,利用各种接口和线路连接整个手机的各个部分,通过运行或执行存储在存储器220内的软件程序,以及调用存储在存储器220内的数据,执行手机200的各种功能和处理数据,从而对手机进行整体监控。在一些实施例中,处理器280可包括一个或多个处理单元;处理器280还可以集成应用处理器和调制解调处理器,其中,应用处理器主要处理操作系统、用户界面和应用程序等,调制解调处理器主要处理无线通信。可以理解的是,上述调制解调处理器也可以不集成到处理器180中。
本实施例中,处理器280从传感器250中获取深度图像和彩色图像,利用深度图像中的深度信息对彩色图像进行修复处理以获得增强的彩色图像,具体地,该修复处理的实现可以软件的形式存储在存储器220中,由处理器280调用存储器220中的指令来执行该修复处理。在获取增强的彩色图像之后,处理器280可以将增强的彩色图像输出到其他系统以做进一步处理,比如将增强的彩色图像输出到显示单元240以供用户观看增强后的彩色图像。
蓝牙模块281,用于通过蓝牙这种短距离通讯协议来与其他设备进行信息交互。例如,手机200可以通过蓝牙模块281与同样具备蓝牙模块的可穿戴电子设备(例如智能手表)建立蓝牙连接,从而进行数据交互。
手机200还包括给各个部件供电的电源290(比如电池)。电源可以通过电源管理系统与处理器280逻辑相连,从而通过电源管理系统实现管理充电、放电、以及功耗等功能。可以理解的是,在以下实施例中,电源290可以用于给显示面板241及触控面板231供电。
需要说明的是,以上只是示意性的介绍了两个场景,本申请对于未在本申请中介绍的其他适应的场景不做限定,比如,本申请还可以应用于安防监控场景等其他计算 机视觉场景。
图3为本发明实施例提供的一种图像处理方法,该图像处理方法主要是利用深度图像中的深度信息对彩色图像做修复处理以得到增强的彩色图像。具体地,该图像方法包括以下步骤:
S301:获取彩色图像和第一深度图像。
具体地,深度传感器获取第一深度图像,相机(或称“摄像头”)获取彩色图像,该彩色图像可以是原始彩色图像。
值得一提的是,彩色图像与第一深度图像可通过配对并校准的彩色相机与深度传感器在同一位置同时对同一场景进行拍摄,然后将所得的两种图像进行配准而得到,或者根据需要从本地存储器或本地数据库被获取,或者通过输入装置或传输媒介而从外部数据源(例如,互联网、服务器、数据库等)被接收,等等。彩色图像和第一深度图像是相互对应的图像,例如,可通过图像配准,将传感器采集到的彩色图像与第一深度图像投影到相同的坐标系中,使得两种图像像素一一对应。考虑到配准是一个现有技术,在本发明中不做具体描述。
该深度传感器用于感测环境的深度信息,采用的技术方案可以是单目立体视觉,双目立体视觉(Stereo),结构光(Structure Light),飞行时间(Time of Flight),激光雷达(LiDAR),相机阵列(Camera arrays)或是其他深度感测技术中的一种,在此不做具体限定。为了更清楚阐述本发明的具体实施例,本发明的一种具体实施例采用激光雷达作为深度传感器。
激光雷达通过激光发射器向空间发射激光,随后计算物体反射回激光接收器的相应时间,据此推断物体与激光雷达之间的距离;并通过此过程中收集到的目标对象表面大量密集的点的三维坐标,以此快速判断物体的方位、高度甚至是形状。但是,激光发射器向天空等其他较远的区域发射激光并不会返回到激光接收器,也就无法得到对应区域的点。
S302:确定大气光值。
具体地,可以根据第一深度图像确定该大气光值。
进一步,根据第一深度图像确定该大气光值包括S3021和S3022。
S3021:从第一深度图像中确定用于指示天空区域的天空深度图像。
具体地,可以采用滑窗法从第一深度图像中确定该天空深度图像。
首先可以确定滑窗的形状,本申请实施例中可以将滑窗的形状设置为一行像素大小的长方形,滑窗的步长为1,一行像素大小的长方形是指宽度为第一深度图像的宽度且长度为1个像素的长方形。
在确定出滑窗的形状之后,可以采用自上而下的滑动方式从第一深度图像中确定该天空深度图像,具体地,滑窗的起始位置处于第一深度图像的最上方并且将起始行作为天空区域的上限,该滑窗的最上端与第一深度图像的最上端贴合,记录当前滑窗内的像素值之和以及当前行数,判断滑窗内的像素值之和是否为零,若像素值之和为零,则滑窗按照步长往下一行滑动,再次记录当前滑窗的像素值之和以及当前行数,一直往下滑动,直至找到某一行滑窗的像素值之和为非零,然后将该行作为天空区域 的下限,由天空区域上限和天空区域下限组成的区域即为第一深度图像中的天空深度图像。
需要说明的是,本申请对滑窗的形状不做具体限定,可以是长方形,也可以是圆形,或是其他形状以及其他形状的组合。本发明对滑窗的起始位置和移动顺序也不做具体限定,该滑窗的起始位置可以是图像的任意一个区域,该滑窗的移动顺序可以是自上而下、自下而上、从左到右或从右到左。
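A minimal sketch of the top-down sliding-window search described above, assuming the sparse first depth image is a 2-D NumPy array in which regions without laser returns (such as the sky) are zero; the window is one row high with the full image width and a stride of 1, and the function name is illustrative:
```python
import numpy as np

def detect_sky_rows(sparse_depth):
    """Top-down sliding-window search for the sky region in a sparse depth image.

    The window is a one-row rectangle (full image width), stride 1, scanned
    from the top. Rows are treated as sky while the sum of pixel values inside
    the window is zero; the first row with a non-zero sum is the lower bound of
    the sky region. Returns (top_row, bottom_row), or None if the first row
    already contains depth returns.
    """
    h = sparse_depth.shape[0]
    top = 0                                   # upper bound: the starting row
    for row in range(h):
        if sparse_depth[row].sum() != 0:      # first non-zero row ends the sky region
            return (top, row) if row > top else None
    return (top, h - 1)                       # whole image treated as sky
```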
S3022:根据该天空深度图像和该彩色图像确定该大气光值。
具体地,利用该天空深度图像对第一深度图像进行二值化操作以得到第一深度图像的二值图,然后根据该二值图和该彩色图像确定用于指示所述天空区域的天空彩色图像,最后根据该天空彩色图像确定该大气光值。
其中,第一深度图像的二值图为由0值和1值组成的图,其中,深度图像中天空区域的像素值在二值图中对应的值为1,深度图像中非天空区域的像素值在二值图中对应的值为0。
根据该二值图和该彩色图像确定用于指示所述天空区域的天空彩色图像可以通过将第一深度图像的二值图和彩色图像中的像素点的对应位置进行相乘以得到新的彩色图像,该新的彩色图像即为用于指示天空区域的天空彩色图像。
根据该天空彩色图像确定该大气光值具体包括:首先从该天空彩色图像中确定最亮部分的像素点,最亮部分的像素点为该天空彩色图像中亮度靠前的部分像素点,即最亮部分的像素点,比如最亮的0.1%的像素点,然后将最亮部分的像素点的平均值确定为该大气光值。一般地,该大气光值的范围为灰度值0~255,取值越大,代表大气光亮度越高。
需要说明的是,本申请也可以采用其他计算大气光值的方法来和本申请实施例提供的确定透射率的方式进行结合以确定最终修复的图像。
S303:对第一深度图像进行深度补全以得到第二深度图像,并且根据第二深度图像确定透射率。
具体地,由于第一深度图像一般是稀疏深度图像,因此,需要对第一深度图像进行深度补全以得到一个稠密深度图像,即第二深度图像,在确定出第二深度图像之后,再根据第二深度图像确定该透射率。其中,稀疏深度图像一般是指包含少量非零像素的深度图像,稠密深度图像是指包含较多非零图像的深度图像,即比稀疏深度图像包含更多的非零像素。
为方便描述,此处先不具体描述如何将第一深度图像(即稀疏深度图像)进行深度补全以得到第二深度图像(即稠密深度图像),下文图4所对应的实施例中将对深度补全技术做具体介绍。
如下将对如何根据第二深度图像确定透射率做进一步说明:
首先选取不同的大气散射系数β∈{0.4,0.6,0.8,1.0,1.4,1.6},然后将第二深度图像作为深度信息d(x)代入如下公式(2)即可得到透视率t(x)。
t(x)=e -βd(x)  (2)
其中,透射率又可以被称为透射系数或透射比,指的是透射光通量与入射光通量 之比,取值范围为0~100%。
S304:根据该大气光值和该透射率对该彩色图像进行修复处理。
具体地,根据大气光值和透视率t(x),利用如下大气退化模型公式,对该彩色图像进行修复处理以得到修复后的图像J(x),其中,修复处理是指对待修复的图像,即原始彩色图像,进行相关图像处理以得到一个清楚的彩色图像,例如,将包含有雾/霾/烟的彩色图像修复成去雾/霾/烟的彩色图像。
J(x)=(I(x)-A)/t(x)+A
其中,I(x)是彩色图像,即待修复的图像,比如包含有雾/霾/烟的图像,t(x)是透射率,A是大气光值,J(x)是修改后的图像,比如已经去雾/霾/烟的图像。
本实施例中采用深度图像来确定透射率,并且是经过深度补全后的深度图像,可以提高透射率的精度,进而提高图像的处理效果。
图4为本发明实施例提供的一种对深度图像进行深度补全的实现方式。
具体地,将彩色图像和第一深度图像输入深度图像分别输入给第一深度神经网络310和第二深度神经网络320以分别得到第三深度图像和第四深度图像,然后将第三深度图像和第四深度图像输入给融合模块330进行融合处理以得到第二深度图像,即补全后的稠密深度图像。
第一深度神经网路310和第二深度神经网络320结构相同。
第一深度神经网络310包括第一预处理网络311、第一编码器312和第一解码器313。第一预处理网络311用于将输入给第一深度神经网络310的彩色图像变换为适用于第一编码器312处理的第一特征图像,第一编码器312用于对第一特征图像进行特征编码,第一解码器313用于对第一编码器312输出的第二特征图像进行特征解码。第一编码器312包括N次下采样动作,第一解码器313包括N次上采样动作。
第二深度神经网络320包括第二预处理网络321、第二编码器322和第二解码器323。第二预处理网络321用于将输入给第二深度神经网络320的第一深度图像变换为适用于第二编码器322处理的第三特征图像,第二编码器322用于对第三特征图像进行特征编码,第二解码器323用于对第二编码器322网络输出的第四特征图像进行特征解码。第二编码器322包括N次下采样动作,第二解码器323包括N次上采样动作。
第一深度神经网络310根据彩色图像以及第二深度神经网络320中的一些中间层特征图像以获得第三深度图像。因此,第一深度神经网络310的输入包括两处输入,一处输入为彩色图像,另一处输入为第二深度神经网络320中的一些中间层输出的特征图像,第一深度神经网络310的输出为第三深度图像。
第二深度神经网络320根据第一深度图像以及第一深度神经网络310中的一些中间层特征图像以获得第四深度图像。因此,第二深度神经网络320的输入包括两处输入,一处输入为第一深度图像,另一处输入为第一深度神经网络310的中间层输出的特征图像,第二深度神经网络320的输出为深度图像。
融合模块330可用于将第一深度神经网络310输出的第三深度图像与第二深度神经网络320输出的第四深度图像进行融合以产生补全后的第二深度图像。
其中,融合操作的具体实现可包括:将第三深度图像和第四深度图像进行拼接(concat)处理,然后将拼接后的图像进行至少一个卷积操作即可得到补全后的第二深度图像。
如下对上述第一深度神经网络310和第二深度神经网络320的具体实现步骤做进一步说明。
首先对第一深度神经网络310的具体实现过程进行说明:
S11:第一预处理网络311对彩色图像进行预处理以得到第一特征图像。
具体地,第一预处理网络311可用于将输入的彩色图像变换为适用于第一编码器312处理的第一特征图像,并将所述第一特征图像输入到第一编码器312。第一预处理网络311可由至少一层卷积层构成。第一预处理网络311对彩色图像进行卷积处理,使该彩色图像的通道数改变,而不改变尺寸。
可理解是,输入第一深度神经网络的彩色图像为RGB格式。则该彩色图像的通道数为3,分别是R(红色),G(绿色),B(蓝色)。例如,输入的彩色图像的尺寸为h*w*3,则该彩色图像经过第一预处理网络311后,输出的第一特征图像的尺寸为h*w*16。
彩色图像通过第一预处理网络311后输出的第一特征图像和第三特征图像的通道数保持一致。
S12:将第一特征图像输入第一编码器312以得到第二特征图像。
具体地,将第一特征图像输入第一编码器312。第一编码器312可由N个编码单元构成。每个编码单元结构相同,编码步骤也均相同。
以第一编码器312内的第一编码单元314为例,第一编码单元314包括至少一个卷积层和下采样层,卷积层用于提取特征,卷积层的操作不改变输入特征图的尺寸大小。至少一个卷积层输出的特征图像可保存下来,并且至少一个卷积层输出的特征图像与第二深度神经网络320中的中间层产生的特征图像进行拼接,图4中符合
Figure PCTCN2021123940-appb-000006
表示拼接。具体地,第二深度神经网络320中的中间层产生的特征图像可以为第二深度神经网络320中第二编码器322的编码单元324中卷积层操作后产生的特征图。拼接操作后输出的特征图像通道数增加一倍。之后,将拼接后的特征图像再经过一个用于实现通道降维的卷积层(例如1*1的卷积层)以实现将拼接后的特征图像的通道数降为拼接前的通道数量。最后,将经过卷积层处理之后输出的特征图像进行下采样处理。
需要说明的是,图4所示的第一编码单元314的结构只是一种示例,实际中,编码单元的结构可以做一定程度的调整,比如卷积层的数量有调整,本申请对此不做限定。
本申请对第一编码器312内的编码单元数量N不作限制,对每个编码单元内的卷积层数量和下采样系数不作具体限制。
为了使以上步骤更加清楚,这里以一个具体的实例进行说明。
例如,将一个尺寸为h*w*16的第一特征图像输入到第一编码器312,其中16为通道数,h为像素长度,w为像素宽度。该特征图像在经过所有的卷积层提取特征后,输出的特征图像的尺寸为h*w*16。接着上一步输出的特征图像与第二深度神经网络 320中的中间层产生的特征图进行拼接,输出的特征图像通道数增加一倍,即图像的尺寸为h*w*32。然后上一步输出的特征图像经由一个1*1的卷积层,该卷积层用于将拼接后的特征图像的通道数重新降为16,即尺寸重新变为h*w*16。最后,上一步输出的特征图像经由一个下采样系数为1/2的下采样处理后,像素长度h变为原来的1/2,像素宽度w变为原来的1/2,通道数增加一倍,即得到一个尺寸为1/2h*1/2w*32的特征图像。
S13:将第二特征图像输入第一解码器313以得到第三深度图像。
具体地,将第一编码器312输出的第二特征图像输入第一解码器313,第一解码器313输出第三深度图像。第一解码器网络313可由N个解码单元构成,其中,N为大于1的整数,本申请对N的值不作限制。
第一解码器313中每个解码单元结构和解码步骤可以相同,以第一解码器313内的第一解码单元315为例,第一解码单元315包括一个上采样层和至少一个卷积层。首先,第二特征图像经过一个上采样处理以得到上采样后的输出特征图像,然后将上采样后产生的特征图像、第一编码器312中的中间层产生的特征图和第二深深度神经网络320中第二解码器323的中间层产生的特征图进行拼接。之后,将拼接后的特征图像首先经过一个可以实现通道降维的卷积层,比如1*1的卷积层,再将通道降维后的输出特征图像经由至少一层卷积层,其中,第一解码单元315中所包括的卷积层均用于提取特征且不改变输入特征图的尺寸大小。另外,上采样后产生的特征图像会提供给第二深度神经网络320的第二解码器323的解码单元做拼接处理。
需要说明的是,图4所示的第一解码单元315的结构只是一种示例,实际中,解码单元的结构可以做一定程度的调整,比如卷积层的数量有调整,本申请对此不做限定。
本发明对第一解码器313内的解码单元数量N不作限制,对每个解码单元内的卷积层数量和上采样系数不作具体限制。为了使以上步骤更加清楚,这里以一个具体的实例进一步说明。
例如,将一个尺寸为1/8h*1/8w*128的第二特征图像317输入到第一解码器313。该该特征图像经由一个上采样系数为1/2的上采样卷积层后,像素长度1/8h变为1/4h,像素宽度1/8w变为原来的1/4h,通道数变为原来的一半,即得到一个尺寸为
1/4h*1/4w*64的特征图像。接着,将上一步输出的特征图像、第一深度神经网络310中第一编码器312的中间层产生的特征图和第二深度神经网络320中第二解码器323的中间层产生的特征图进行拼接,输出拼接后的特征图像通道数增加两倍,即输出的特征图像的尺寸为1/4h*1/4w*192。然后,上一步输出的特征图像经由一个1*1的卷积层,该卷积层用于将拼接后的特征图像的通道数重新降为拼接前的通道数量,即尺寸重新变为1/4h*1/4w*64。最后,经过所有卷积层提取特征后,输出的特征图像尺寸为1/4h*1/4w*64。
接着对第二深度神经网络320的具体实现过程进行说明:
S21:第二预处理网络321对第一深度图像进行预处理以得到第三特征图像。
具体地,第二预处理网络321可用于将输入的第一深度图像变换为适于第二编码 器322处理的第三特征图像,并将第三特征图像输入到第二编码器322。第二预处理网络321可由至少一层卷积层构成,第二预处理网络321对第一深度图像进行卷积处理,使该第一深度图像的通道数改变,而不改变尺寸。
可理解是,输入第二神经网络的第一深度图像只有一个通道数,即灰度值。该灰度值表示场景中的某一个像素点距离深度传感器的距离。例如,输入的第一深度图像的尺寸为h*w*1,则该第一深度图像经过第二预处理网络321后,输出的第三特征图像的尺寸为h*w*16。
第一深度图像通过第二预处理网络321后输出的第三特征图像和第一特征图像的通道数保持一致。
S22:将第三特征图像输入第二编码器322以得到第四特征图像。
具体地,将第三特征图像输入第二编码器322。第二编码器322可由N个编码单元构成。每个编码单元结构相同,编码步骤也均相同。
以第二编码器322内的第二编码单元324为例,第二编码器322包括至少一个卷积层和采样层,卷积层用于提取特征,卷积层的操作不改变输入特征图的尺寸大小。至少一个卷积层输出的特征图像可保存下来,并且至少一个卷积层输出的特征图像与第一深度神经网络310中的中间层产生的特征图像进行拼接,具体地,第一深度神经网络310中的中间层产生的特征图像可以为第一深度神经网络310中第一编码器312的编码单元314中卷积层操作后产生的特征图。拼接操作后输出的特征图像通道数增加一倍。之后,将拼接后的特征图像经由一个用于实现通道降维的卷积层(例如1*1的卷积层)以实现将拼接后的特征图像的通道数降为拼接前的通道数量。最后,将经过卷积层处理之后输出的特征图像经由下采样处理。需要说明的是,图5所示的第二编码单元324的结构只是一种示例,实际中,编码单元的结构可以做一定程度的调整,比如卷积层的数量有调整,本申请对此不做限定。
本申请对第二编码器322的编码单元数量N不作限制,对每个编码单元内的卷积层数量和下采样系数不作具体限制。
为了使以上步骤更加清楚,这里以一个具体的实例进行说明。
例如,将一个尺寸为h*w*16的第三特征图像输入到第二编码器322,其中16为通道数,h为像素长度,w为像素宽度。该第三特征图像在经过所有的卷积层提取特征后,输出的特征图像的尺寸为h*w*16。接着上一步输出的特征图像与第一深度神经网络310中的中间层产生的特征图进行拼接,输出的特征图像通道数增加一倍,即图像的尺寸为h*w*32。然后上一步输出的特征图像经由一个1*1的卷积层,该卷积层用于将拼接后的特征图像的通道数重新降为16,即尺寸重新变为h*w*16。最后,上一步输出的特征图像经由一个下采样系数为1/2的下采样处理后,像素长度h变为原来的1/2,像素宽度w变为原来的1/2,通道数增加一倍,即得到一个尺寸为1/2h*1/2w*32的特征图像。
S23:将第四特征图像输入第二解码器323以得到第四深度图像。
具体地,将第四特征图像输入第二解码器323,第二解码器323输出第四深度图像。第二解码器323可由N个解码单元构成,其中,N为大于1的整数,本申请对N 的值不作限制。
第二解码器323中每个解码单元结构和解码步骤可以相同。以第二解码器323内的第二解码单元325为例,第二解码单元325包括一个上采样层和至少一个卷积层。首先,第四特征图像经过一个上采样处理以得到上采样后的输出特征图像,然后将上采样后产生的特征图像、第二编码器322中的中间层产生的特征图和第一深深度神经网络310中第一解码器313中的中间层产生的特征图进行拼接。之后,将拼接后的特征图像首先经过一个可以实现通道降维的卷积层,比如1*1的卷积层,再将通道降维后的输出特征图像经由至少一层卷积层,其中,第二解码单元325中所包括卷积层均用于提取特征且不改变输入特征图的尺寸大小。另外,上采样后产生的特征图像会提供给第一深度神经网络310的第一解码器313的解码单元做拼接处理。
需要说明的是,图4所示的第二解码单元325的结构只是一种示例,实际中,解码单元的结构可以做一定程度的调整,比如卷积层的数量有调整,本申请对此不做限定。
本发明对第二解码器323内的解码单元数量N不作限制,对每个解码单元内的卷积层数量和上采样系数不作具体限制。为了使以上步骤更加清楚,这里以一个具体的实例进一步说明。
例如,将一个尺寸为1/8h*1/8w*128的第四特征图像输入到第二解码器323。该该特征图像经由一个上采样系数为1/2的上采样卷积层后,像素长度1/8h变为1/4h,像素宽度1/8w变为原来的1/4h,通道数变为原来的一半,即得到一个尺寸为1/4h*1/4w*64的特征图像。接着,将上一步输出的特征图像、第一深度神经网络310中第一解码器313的中间层产生的特征图和第二深度神经网络320中第二编码器322的中间层产生的特征图进行拼接,输出的拼接后的特征图像通道数增加两倍,即输出的特征图像的尺寸为1/4h*1/4w*192。然后,上一步输出的特征图像经由一个1*1的卷积层,该卷积层用于将拼接后的特征图像的通道数重新降为拼接前的通道数量,即尺寸重新变为1/4h*1/4w*64。最后,经过所有卷积层提取特征后,输出的特征图像尺寸为1/4h*1/4w*64。
进一步,上述深度补全主要是利用第一深度神经网络和第二深度神经网络进行推理来实现,下面将对如何训练第一深度神经网络和第二深度神经网络做进一步说明,如下将以训练第一深度神经网络为例,第二深度神经网络的训练方式类似。
首先,搭建一个初始的第一深度神经网络,对该初始的第一深度神经网络的参数进行初始化,例如,初始化参数可以采用随机值。
其次,获取训练样本,训练样本为彩色图像和已配准深度信息图像。该训练样本可以从实际场景通过摄像头和深度传感器获取、或者从数据库上获取开源的彩色图像和配准的深度图像。若训练样本数量不足,该训练样本也可以通过对样本模糊化,裁剪,添加噪声,使用对抗网络生成训练样本等操作来对训练集进行数据增强。
然后,确定损失函数和优化器。可以使用均方误差计算损失函数的值,即网络输出的深度图像与真实的深度图像的差值的平方和。以评估预测深度图像与真值深度图像的接近程度。优化器可以使用adam优化器,用于神经网络训练中反向传播更新网 络中的参数。通过不断的迭代更新网络中的参数使预测的深度图像逼近真实深度图像。在损失函数的值不再改变时,训练结束。
图5和图6为本申请的实施例提供可能的图像处理装置的结构示意图。这些图像处理装置可以用于实现上述方法实施例所具备的有益效果。
如图5所示,图像处理装置500包括图像获取模块510、大气光值计算模块520、透射率计算模块530、图像修复模块540。图像处理装置500用于实现上述图3中所示的方法实施例的功能时;图像获取模块510用于执行S301;大气光值计算模块520用于执行S302;透射率计算模块530用于执行S303;图像修复模块540用于执行S304。有关上述图像获取模块510、大气光值计算模块520、透射率计算模块530、和图像修复模块540更详细的描述可以直接参考图3所示的方法实施例中相关描述直接得到,这里不加赘述。
应理解,以上图像处理装置500的单元的划分仅仅是一种逻辑功能的划分。
图6是本申请实施例的图像处理装置的硬件结构示意图。图6所示的图像处理装置600包括存储器601、处理器602、通信接口603以及总线604。其中,存储器601、处理器602、通信接口603通过总线604实现彼此之间的通信连接。
存储器601可以是ROM,静态存储设备和RAM。存储器601可以存储程序,当存储器601中存储的程序被处理器602执行时,处理器602和通信接口603用于执行本申请实施例的图像处理方法的各个步骤。
处理器602可以采用通用的,CPU,微处理器,ASIC,GPU或者一个或多个集成电路,用于执行相关程序,以实现本申请实施例的图像处理装置中的单元所需执行的功能,或者执行本申请方法实施例的图像处理方法。
处理器602还可以是一种集成电路芯片,具有信号的处理能力。在实现过程中,本申请实施例的图像处理方法的各个步骤可以通过处理器602中的硬件的集成逻辑电路或者软件形式的指令完成。
上述处理器602还可以是通用处理器、DSP、ASIC、FPGA或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器4001,处理器602读取存储器601中的信息,结合其硬件完成本申请实施例的图像处理装置中包括的单元所需执行的功能,或者执行本申请方法实施例的图像处理方法。
通信接口603使用例如但不限于收发器一类的收发装置,来实现装置600与其他设备或通信网络之间的通信。例如,可以通过通信接口603获取待处理图像。
总线604可包括在装置600各个部件(例如,存储器601、处理器602、通信接口603)之间传送信息的通路。
应理解,本申请实施例中的处理器可以为中央处理单元(central processing unit, CPU),该处理器还可以是其他通用处理器、数字信号处理器(digital signal processor,DSP)、专用集成电路(application specific integrated circuit,ASIC)、现成可编程门阵列(field programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。
还应理解,本申请实施例中的存储器可以是易失性存储器或非易失性存储器,或可包括易失性和非易失性存储器两者。其中,非易失性存储器可以是只读存储器(read-only memory,ROM)、可编程只读存储器(programmable ROM,PROM)、可擦除可编程只读存储器(erasable PROM,EPROM)、电可擦除可编程只读存储器(electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(random access memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的随机存取存储器(random access memory,RAM)可用,例如静态随机存取存储器(static RAM,SRAM)、动态随机存取存储器(DRAM)、同步动态随机存取存储器(synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(double data rate SDRAM,DDR SDRAM)、增强型同步动态随机存取存储器(enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(synchlink DRAM,SLDRAM)和直接内存总线随机存取存储器(direct rambus RAM,DR RAM)。
上述实施例,可以全部或部分地通过软件、硬件、固件或其他任意组合来实现。当使用软件实现时,上述实施例可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令或计算机程序。在计算机上加载或执行所述计算机指令或计算机程序时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以为通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集合的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质。半导体介质可以是固态硬盘。
应理解,本文中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况,其中A,B可以是单数或者复数。另外,本文中字符“/”,一般表示前后关联对象是一种“或”的关系,但也可能表示的是一种“和/或”的关系,具体可参考前后文进行理解。
本申请中,“至少一个”是指一个或者多个,“多个”是指两个或两个以上。“以下至少一项(个)”或其类似表达,是指的这些项中的任意组合,包括单项(个)或复数项(个)的任意组合。例如,a,b,或c中的至少一项(个),可以表示:a,b,c,a-b,a-c,b-c,或a-b-c,其中a,b,c可以是单个,也可以是多个。
应理解,在本申请的各种实施例中,上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (30)

  1. 一种图像处理方法,其特征在于,包括:
    获取第一深度图像和彩色图像,所述第一深度图像和所述彩色图像分别是由深度传感器和相机针对同一场景拍摄的图像;
    确定大气光值;
    对所述第一深度图像进行深度补全以得到第二深度图像,并且根据所述第二深度图像确定透射率;
    根据所述大气光值和所述透射率对所述彩色图像进行修复处理。
  2. 根据权利要求1所述的图像处理方法,其特征在于,所述对所述第一深度图像进行深度补全以得到第二深度图像包括:
    将所述彩色图像和第二中间特征图像提供给第一深度神经网络以得到第三深度图像,所述第二中间特征图像为第二深度神经网络的中间层产生的特征图像;
    将所述第一深度图像和第一中间特征图像提供给所述第二深度神经网络以得到第四深度图像,所述第一中间特征图像为所述第一深度神经网络的中间层产生的特征图像;
    将所述第三深度图像和所述第四深度图像进行融合操作以得到所述第二深度图像。
  3. 根据权利要求2所述的图像处理方法,其特征在于,所述第一深度神经网络包括第一预处理网络、第一编码器和第一解码器,所述第一预处理网络用于将输入给所述第一深度神经网络的彩色图像变换为适用于所述第一编码器处理的第一特征图像,所述第一编码器用于对所述第一特征图像进行特征编码,所述第一解码器用于对所述第一编码器输出的第二特征图像进行特征解码。
  4. 根据权利要求3所述的图像处理方法,其特征在于,
    所述第一中间特征图像包括所述第一编码器的编码单元内的卷积层所产生的特征图像和所述第一解码器的解码单元内的上采样层所产生的特征图像。
  5. 根据权利要求2所述的图像处理方法,其特征在于,所述第二深度神经网络包括第二预处理网络、第二编码器和第二解码器,所述第二预处理网络用于将输入给所述第二深度神经网络的第一深度图像变换为适用于所述第二编码器处理的第三特征图像,所述第二编码器用于对所述第三特征图像进行特征编码,所述第二解码器用于对所述第二编码器输出的第四特征图像进行特征解码。
  6. The image processing method according to claim 5, wherein the second intermediate feature image comprises a feature image generated by a convolutional layer in a coding unit of the second encoder and a feature image generated by an upsampling layer in a decoding unit of the second decoder.
  7. The image processing method according to any one of claims 1 to 6, wherein the determining an atmospheric light value comprises:
    determining the atmospheric light value based on the first depth image.
  8. The image processing method according to claim 7, wherein the determining the atmospheric light value based on the first depth image comprises:
    determining, from the first depth image, a sky depth image indicating a sky region; and
    determining the atmospheric light value based on the sky depth image and the color image.
  9. The image processing method according to claim 8, wherein the determining, from the first depth image, a sky depth image indicating a sky region comprises:
    determining the sky depth image from the first depth image by using a sliding-window method.
  10. The image processing method according to claim 9, wherein a sliding window used in the sliding-window method has a rectangular shape with a size of one row of pixels, and a stride of the sliding window is 1.
  11. The image processing method according to any one of claims 8 to 10, wherein the determining the atmospheric light value based on the sky depth image and the color image comprises:
    determining, from the color image by using the sky depth image, a sky color image indicating the sky region; and
    determining the atmospheric light value based on the sky color image.
  12. The image processing method according to claim 11, wherein the determining, from the color image by using the sky depth image, a sky color image indicating the sky region comprises:
    performing a binarization operation on the first depth image by using the sky depth image to obtain a binary map; and
    determining, based on the binary map and the color image, the sky color image indicating the sky region.
  13. The image processing method according to claim 11 or 12, wherein the determining the atmospheric light value based on the sky color image comprises:
    determining pixels of a brightest part from the sky color image; and
    determining an average value of the pixels of the brightest part as the atmospheric light value.
  14. An image processing method, comprising:
    obtaining a first depth image and a color image, wherein the first depth image and the color image are images of a same scene captured by a depth sensor and a camera, respectively;
    determining an atmospheric light value based on the first depth image;
    determining a transmittance based on the first depth image; and
    performing restoration processing on the color image based on the atmospheric light value and the transmittance.
  15. The image processing method according to claim 14, wherein the determining an atmospheric light value based on the first depth image comprises:
    determining, from the first depth image, a sky depth image indicating a sky region; and
    determining the atmospheric light value based on the sky depth image and the color image.
  16. The image processing method according to claim 15, wherein the determining, from the first depth image, a sky depth image indicating a sky region comprises:
    determining the sky depth image from the first depth image by using a sliding-window method.
  17. The image processing method according to claim 15 or 16, wherein the determining the atmospheric light value based on the sky depth image and the color image comprises:
    determining, from the color image by using the sky depth image, a sky color image indicating the sky region; and
    determining the atmospheric light value based on the sky color image.
  18. An image processing apparatus, comprising:
    an image obtaining module, configured to obtain a first depth image and a color image, wherein the first depth image and the color image are images of a same scene captured by a depth sensor and a camera, respectively;
    an atmospheric light value calculation module, configured to calculate an atmospheric light value;
    a transmittance calculation module, configured to perform depth completion on the first depth image to obtain a second depth image, and determine a transmittance based on the second depth image; and
    an image restoration module, configured to determine a restored image based on the atmospheric light value and the transmittance.
  19. The image processing apparatus according to claim 18, wherein the transmittance calculation module is specifically configured to:
    provide the color image and a second intermediate feature image to a first deep neural network to obtain a third depth image, wherein the second intermediate feature image is a feature image generated by an intermediate layer of a second deep neural network;
    provide the first depth image and a first intermediate feature image to the second deep neural network to obtain a fourth depth image, wherein the first intermediate feature image is a feature image generated by an intermediate layer of the first deep neural network; and
    perform a fusion operation on the third depth image and the fourth depth image to obtain the second depth image.
  20. The image processing apparatus according to claim 18 or 19, wherein the atmospheric light value calculation module is specifically configured to:
    determine the atmospheric light value based on the first depth image.
  21. The image processing apparatus according to claim 20, wherein the atmospheric light value calculation module is specifically configured to:
    determine, from the first depth image, a sky depth image indicating a sky region; and
    determine the atmospheric light value based on the sky depth image and the color image.
  22. The image processing apparatus according to claim 21, wherein the atmospheric light value calculation module is specifically configured to:
    determine the sky depth image from the first depth image by using a sliding-window method.
  23. The image processing apparatus according to claim 21 or 22, wherein the atmospheric light value calculation module is specifically configured to:
    determine, from the color image by using the sky depth image, a sky color image indicating the sky region; and
    determine the atmospheric light value based on the sky color image.
  24. An image processing apparatus, comprising:
    an image obtaining module, configured to obtain a first depth image and a color image, wherein the first depth image and the color image are images of a same scene captured by a depth sensor and a camera, respectively;
    an atmospheric light value calculation module, configured to determine an atmospheric light value based on the first depth image;
    a transmittance calculation module, configured to determine a transmittance based on the first depth image; and
    an image restoration module, configured to determine a restored image based on the atmospheric light value and the transmittance.
  25. The image processing apparatus according to claim 24, wherein the atmospheric light value calculation module is specifically configured to:
    determine, from the first depth image, a sky depth image indicating a sky region; and
    determine the atmospheric light value based on the sky depth image and the color image.
  26. The image processing apparatus according to claim 25, wherein the atmospheric light value calculation module is specifically configured to:
    determine the sky depth image from the first depth image by using a sliding-window method.
  27. The image processing apparatus according to claim 25 or 26, wherein the atmospheric light value calculation module is specifically configured to:
    determine, from the color image by using the sky depth image, a sky color image indicating the sky region; and
    determine the atmospheric light value based on the sky color image.
  28. An image processing apparatus, comprising:
    at least one processor; and
    at least one memory storing a computer program, wherein when the computer program is executed by the processor, the image processing method according to any one of claims 1 to 13 or 14 to 17 is implemented.
  29. A computer-readable storage medium storing instructions, wherein the storage medium stores a computer program or instructions, and when the computer program or instructions are executed by an image processing apparatus, the image processing method according to any one of claims 1 to 13 or 14 to 17 is implemented.
  30. A computer program product, comprising computer program code, wherein when the computer program code is run, a processor is caused to perform the method according to any one of claims 1 to 13 or 14 to 17.
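
For illustration only, the following Python sketch mirrors the pipeline recited in claims 1 and 8-13: a sky region is taken from the raw (first) depth image, the atmospheric light value is the mean of the brightest sky pixels in the color image, a transmittance map is derived from the completed (second) depth image, and the color image is restored with the standard atmospheric scattering model I = J·t + A·(1 − t). This is a minimal, hedged reconstruction, not the patented implementation: the helper names, the threshold-based sky test (the claims instead use a one-pixel-high sliding window of stride 1), the exponential transmittance model with an assumed scattering coefficient beta, and the depth-completion step (represented here only by a precomputed completed_depth argument rather than the two coupled encoder-decoder networks of claims 2-6) are all assumptions introduced here.

import numpy as np


def sky_mask_from_depth(raw_depth, invalid_value=0.0):
    # Assumed stand-in for the sliding-window sky detection of claims 9-10:
    # pixels with no depth return (== invalid_value) are treated as sky.
    return raw_depth == invalid_value


def estimate_atmospheric_light(color, sky_mask, brightest_fraction=0.001):
    # Mean of the brightest sky pixels (claims 11-13, simplified).
    sky_pixels = color[sky_mask].reshape(-1, 3).astype(np.float64)
    if sky_pixels.shape[0] == 0:
        # No sky detected: fall back to the whole image (an assumption).
        sky_pixels = color.reshape(-1, 3).astype(np.float64)
    brightness = sky_pixels.mean(axis=1)
    k = max(1, int(round(sky_pixels.shape[0] * brightest_fraction)))
    brightest = sky_pixels[np.argsort(brightness)[-k:]]
    return brightest.mean(axis=0)  # atmospheric light A, shape (3,)


def depth_to_transmittance(completed_depth, beta=0.8):
    # t(x) = exp(-beta * d(x)); beta is an assumed scattering coefficient.
    return np.exp(-beta * completed_depth.astype(np.float64))


def dehaze(color, raw_depth, completed_depth, t_min=0.1):
    # Restore J from I = J * t + A * (1 - t), the atmospheric scattering model.
    A = estimate_atmospheric_light(color, sky_mask_from_depth(raw_depth))
    t = np.clip(depth_to_transmittance(completed_depth), t_min, 1.0)[..., None]
    J = (color.astype(np.float64) - A) / t + A
    return np.clip(J, 0.0, 255.0).astype(np.uint8)


if __name__ == "__main__":
    h, w = 120, 160
    rng = np.random.default_rng(0)
    color = rng.integers(0, 256, size=(h, w, 3), dtype=np.uint8)
    raw_depth = rng.uniform(0.0, 50.0, size=(h, w))
    raw_depth[:20, :] = 0.0  # pretend the top rows are sky (no depth return)
    completed_depth = np.where(raw_depth == 0.0, 80.0, raw_depth)
    print(dehaze(color, raw_depth, completed_depth).shape)  # (120, 160, 3)

In a fuller implementation, completed_depth would be produced by the depth-completion networks of claims 2-6, and beta would be tuned or estimated rather than fixed.
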
PCT/CN2021/123940 2020-12-30 2021-10-14 Image processing method and apparatus, and storage medium WO2022142596A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21913341.0A EP4254320A1 (en) 2020-12-30 2021-10-14 Image processing method and apparatus, and storage medium
US18/341,617 US20230342883A1 (en) 2020-12-30 2023-06-26 Image processing method and apparatus, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011629186.9A CN114693536A (zh) 2020-12-30 2020-12-30 一种图像处理方法,装置及储存介质
CN202011629186.9 2020-12-30

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/341,617 Continuation US20230342883A1 (en) 2020-12-30 2023-06-26 Image processing method and apparatus, and storage medium

Publications (1)

Publication Number Publication Date
WO2022142596A1 true WO2022142596A1 (zh) 2022-07-07

Family

ID=82133504

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/123940 WO2022142596A1 (zh) 2020-12-30 2021-10-14 一种图像处理方法,装置及储存介质

Country Status (4)

Country Link
US (1) US20230342883A1 (zh)
EP (1) EP4254320A1 (zh)
CN (1) CN114693536A (zh)
WO (1) WO2022142596A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115170443B (zh) * 2022-09-08 2023-01-13 荣耀终端有限公司 Image processing method, photographing method, and electronic device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160196637A1 (en) * 2015-01-06 2016-07-07 The Regents Of The University Of California Raw sensor image and video de-hazing and atmospheric light analysis methods and systems
CN110097589A (zh) * 2019-04-29 2019-08-06 广东工业大学 一种应用于稀疏地图稠密化的深度补全方法
CN110136079A (zh) * 2019-05-05 2019-08-16 长安大学 基于场景深度分割的图像去雾方法
CN110910327A (zh) * 2019-11-26 2020-03-24 福州大学 一种基于掩模增强网络模型的无监督深度补全方法
CN111091501A (zh) * 2018-10-24 2020-05-01 天津工业大学 一种大气散射去雾模型的参数估计方法

Also Published As

Publication number Publication date
US20230342883A1 (en) 2023-10-26
CN114693536A (zh) 2022-07-01
EP4254320A1 (en) 2023-10-04

Similar Documents

Publication Publication Date Title
US11580754B2 (en) System and method for large-scale lane marking detection using multimodal sensor data
US11967140B2 (en) System and method for vehicle wheel detection
CN110543814B (zh) 一种交通灯的识别方法及装置
US10528851B2 (en) System and method for drivable road surface representation generation using multimodal sensor data
US10528823B2 (en) System and method for large-scale lane marking detection using multimodal sensor data
CN112955897A (zh) 用于三维(3d)对象检测的系统和方法
CN111071152B (zh) 一种鱼眼图像处理系统和方法
US10712556B2 (en) Image information processing method and augmented reality AR device
JPWO2019003953A1 (ja) 画像処理装置および画像処理方法
US20230342883A1 (en) Image processing method and apparatus, and storage medium
AU2019241892B2 (en) System and method for vehicle wheel detection
WO2021063012A1 (zh) 视频通话人脸呈现方法、视频通话装置及汽车
WO2021217575A1 (zh) 用户感兴趣对象的识别方法以及识别装置
CN115170630B (zh) 地图生成方法、装置、电子设备、车辆和存储介质
CN115164910B (zh) 行驶路径生成方法、装置、车辆、存储介质及芯片
CN114842455B (zh) 障碍物检测方法、装置、设备、介质、芯片及车辆
CN114973178A (zh) 模型训练方法、物体识别方法、装置、车辆及存储介质
WO2024092559A1 (zh) 一种导航方法及相应装置
CN115082886B (zh) 目标检测的方法、装置、存储介质、芯片及车辆
CN114842454B (zh) 障碍物检测方法、装置、设备、存储介质、芯片及车辆
CN115221260B (zh) 数据处理方法、装置、车辆及存储介质
CN114821511B (zh) 杆体检测方法、装置、车辆、存储介质及芯片
CN115221261A (zh) 地图数据融合方法、装置、车辆及存储介质
CN115223122A (zh) 物体的三维信息确定方法、装置、车辆与存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21913341

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021913341

Country of ref document: EP

Effective date: 20230630

NENP Non-entry into the national phase

Ref country code: DE