CN115619666A - Image processing method, image processing apparatus, storage medium, and electronic device - Google Patents
- Publication number: CN115619666A
- Application number: CN202211242710.6A
- Authority: CN (China)
- Prior art keywords: image, processed, gain, information, features
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T5/90—Dynamic range modification of images or parts thereof (G06T5/00 Image enhancement or restoration)
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
- G06N3/08—Learning methods (G06N3/02 Neural networks; G06N3/00 Computing arrangements based on biological models)
- G06V10/52—Scale-space analysis, e.g. wavelet analysis (G06V10/40 Extraction of image or video features)
- G06V10/56—Extraction of image or video features relating to colour
- G06V10/806—Fusion of extracted features (G06V10/80 Fusion at the sensor, preprocessing, feature extraction or classification level)
- G06V10/82—Image or video recognition or understanding using neural networks (G06V10/70 using pattern recognition or machine learning)
- G06T2207/10004—Still image; Photographic image
- G06T2207/10024—Color image
- G06T2207/20208—High dynamic range [HDR] image processing
- G06T2207/20221—Image fusion; Image merging (G06T2207/20212 Image combination)
Abstract
The disclosure provides an image processing method, an image processing apparatus, a storage medium and an electronic device, and relates to the technical field of image and video processing. The method comprises the following steps: acquiring an image to be processed; extracting prior information from the image to be processed, wherein the prior information comprises one or more of bright channel information, dark channel information and high-frequency information; extracting basic features from the image to be processed by using a pre-trained brightness gain model, extracting prior features from the prior information, and combining the basic features and the prior features to obtain brightness gain data; and performing pixel value mapping on the image to be processed based on the brightness gain data to obtain a high dynamic range image corresponding to the image to be processed. By combining prior information with a machine learning method, the method can apply a high-quality brightness gain to the image and obtain a high-dynamic-range image with a realistic brightness effect.
Description
Technical Field
The present disclosure relates to the field of image and video processing technologies, and in particular, to an image processing method, an image processing apparatus, a computer-readable storage medium, and an electronic device.
Background
The brightness in a real environment has a large dynamic range; for example, a highlight region and a dark region may appear simultaneously in the same scene. In images shot by digital shooting devices such as smartphones and cameras, the number of brightness levels is limited by the bit width of the image; for example, an image with a bit width of 8 bits has only 256 brightness levels. Therefore, when a real environment is shot, the real brightness range is compressed to a limited number of brightness levels, which easily causes loss of detail information and color distortion in highlight or dark areas, so that the well-graduated brightness of the real environment cannot be reproduced; in other words, the brightness effect in the image is unrealistic.
Disclosure of Invention
The present disclosure provides an image processing method, an image processing apparatus, a computer-readable storage medium, and an electronic device to solve, at least to some extent, the problem of unrealistic brightness effects in an image.
According to a first aspect of the present disclosure, there is provided an image processing method including: acquiring an image to be processed; extracting prior information from the image to be processed, wherein the prior information comprises one or more of bright channel information, dark channel information and high-frequency information; extracting basic features of the image to be processed by using a pre-trained brightness gain model, extracting prior features of the prior information, and obtaining brightness gain data by combining the basic features and the prior features; and performing pixel value mapping on the image to be processed based on the brightness gain data to obtain a high dynamic range image corresponding to the image to be processed.
According to a second aspect of the present disclosure, there is provided an image processing apparatus comprising: an image acquisition module configured to acquire an image to be processed; a priori information extraction module configured to extract a priori information from the image to be processed, wherein the priori information comprises one or more of bright channel information, dark channel information and high-frequency information; the brightness gain processing module is configured to extract basic features from the image to be processed by using a pre-trained brightness gain model, extract prior features from the prior information, and obtain brightness gain data by combining the basic features and the prior features; and the pixel value mapping module is configured to perform pixel value mapping on the image to be processed based on the brightness gain data to obtain a high dynamic range image corresponding to the image to be processed.
According to a third aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the image processing method of the first aspect described above and possible implementations thereof.
According to a fourth aspect of the present disclosure, there is provided an electronic apparatus comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the image processing method of the first aspect and possible implementations thereof via execution of the executable instructions.
The technical scheme of the disclosure has the following beneficial effects:
On the one hand, by combining prior information with a machine learning method, the scheme provides a brightness gain model that combines the basic features and the prior features of an image. The model can mine and learn more comprehensive and sufficient image features and output more accurate brightness gain data, based on which pixel value mapping is performed on the image to be processed to obtain a high dynamic range image with a more realistic brightness effect, thereby effectively improving image quality. On the other hand, the scheme realizes brightness gain from a single-frame image and does not need information from adjacent frames, which helps reduce shooting time, the computational cost of image processing, and memory usage; it can be completed with conventional shooting equipment and does not require a professional high-dynamic-range camera. The scheme is therefore inexpensive to implement, suitable for deployment in lightweight scenarios such as mobile terminals or in scenarios with high real-time requirements (such as real-time video processing), and highly practical and general. In addition, it can alleviate adverse phenomena such as "ghosting" that may be caused by fusing multiple frames into a high dynamic range image.
Drawings
FIG. 1 shows a schematic diagram of the runtime environment system architecture of the present exemplary embodiment;
fig. 2 shows a flowchart of an image processing method in the present exemplary embodiment;
fig. 3 shows a schematic diagram of a bright channel image and a dark channel image in the present exemplary embodiment;
fig. 4 shows a schematic diagram of a high-frequency image in the present exemplary embodiment;
fig. 5 shows a flowchart for obtaining luminance gain data in the present exemplary embodiment;
FIG. 6 is a diagram showing a luminance gain model and a process thereof according to the present exemplary embodiment;
FIG. 7 shows a schematic diagram of a multi-scale feature extraction unit in the present exemplary embodiment;
fig. 8 shows a schematic flowchart of an image processing method in the present exemplary embodiment;
FIG. 9 illustrates a flow chart for training a luminance gain model in the present exemplary embodiment;
FIG. 10 shows a schematic diagram of training a luminance gain model in the present exemplary embodiment;
fig. 11 is a diagram showing an example of a processing effect of an image to be processed in the present exemplary embodiment;
fig. 12 is a schematic diagram showing the configuration of an image processing apparatus in the present exemplary embodiment;
fig. 13 shows a schematic configuration diagram of an electronic device in the present exemplary embodiment.
Detailed Description
Exemplary embodiments of the present disclosure will be described more fully hereinafter with reference to the accompanying drawings.
The drawings are schematic illustrations of the present disclosure and are not necessarily drawn to scale. Some of the block diagrams shown in the figures may be functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in hardware modules or integrated circuits, or in a network, processor or microcontroller. Embodiments may be embodied in many different forms and should not be construed as limited to the examples set forth herein. The described features, structures, or characteristics of the disclosure may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough explanation of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that one or more of the specific details can be omitted, or one or more of the specific details can be replaced with other methods, components, devices, steps, etc., in implementing the aspects of the disclosure.
A single-frame image captured by a digital shooting device such as a smartphone or a camera typically has a bit width of 8 bits and therefore 256 brightness levels (or 256 color levels), which can hardly describe the brightness information of a real environment adequately; such an image is referred to as a Low Dynamic Range (LDR) image. In contrast, a High Dynamic Range (HDR) image can represent a larger dynamic range and therefore has a more realistic luminance effect.
In the related art, high dynamic range images are mainly obtained by the following three schemes, each of which has certain problems and shortcomings:
(1) Using a professional high-dynamic-range camera that can directly capture high dynamic range images. However, such cameras are expensive and not portable, and they cannot be integrated into lightweight shooting devices (such as smartphones or portable cameras). The practicability and popularity of this scheme are therefore low.
(2) Multi-exposure fusion: collecting multiple frames of low dynamic range images at different exposures and fusing them into a high dynamic range image. However, this scheme requires a long shooting time, has a high computational and memory cost, and is expensive to implement; moreover, if the camera moves while the multiple low dynamic range frames are being collected, the fused high dynamic range image may exhibit adverse phenomena such as "ghosting".
(3) Deep learning: training a neural network model to convert a low dynamic range image into a high dynamic range image. However, current related models have difficulty mining image features fully and easily lose details in poorly exposed regions, so the expansion of the image dynamic range is limited and the quality of the resulting image is not high.
In view of one or more of the above problems, exemplary embodiments of the present disclosure provide an image processing method for processing an image to be processed to generate a high dynamic range image.
The operating environment system architecture of the present exemplary embodiment is described below with reference to fig. 1.
Referring to fig. 1, a system architecture 100 may include a terminal 110 and a server 120. The terminal 110 may be an electronic device such as a desktop computer, a smart phone, a tablet computer, and an intelligent wearable device. The server 120 generally refers to a background system that provides image processing-related services in the exemplary embodiment, and may be a single server or a cluster of multiple servers. The terminal 110 and the server 120 may form a connection through a wired or wireless communication link for data interaction.
In one embodiment, the image processing method in the present exemplary embodiment may be performed by the terminal 110. Illustratively, the brightness gain model may be configured and trained by the server 120 and then deployed to the terminal 110; alternatively, the brightness gain model may be configured and trained locally by the terminal 110; still alternatively, the brightness gain model may be configured by the server 120 and deployed to the terminal 110, with the model trained locally by the terminal 110. The terminal 110 obtains an image to be processed, which may be, for example, a currently shot image, an image stored locally (such as an image in an album), an image transmitted by another device through a network, or an image downloaded from the Internet. The terminal 110 extracts prior information from the image to be processed, obtains brightness gain data by using the brightness gain model, and further processes the image to be processed to obtain a high dynamic range image.
In one embodiment, the image processing method in the present exemplary embodiment may be performed by the server 120. For example, the brightness gain model may be configured and trained by the server 120, the terminal 110 uploads the image to be processed to the server 120, and the server 120 executes an image processing method to obtain a high dynamic range image. The high dynamic range image may also be subsequently returned to the terminal 110.
As can be seen from the above, in the present exemplary embodiment, the execution subject of the image processing method may be the terminal 110 or the server 120, which is not limited by the present disclosure.
The flow of the image processing method will be described with reference to fig. 2.
Referring to fig. 2, the flow of the image processing method may include the following steps S210 to S240:
step S210, acquiring an image to be processed;
step S220, extracting prior information from the image to be processed, wherein the prior information comprises one or more of bright channel information, dark channel information and high-frequency information;
step S230, extracting basic features from the image to be processed by using a pre-trained brightness gain model, extracting prior features from prior information, and obtaining brightness gain data by combining the basic features and the prior features;
step S240, performing pixel value mapping on the image to be processed based on the brightness gain data to obtain a high dynamic range image corresponding to the image to be processed.
Based on the above method, on the one hand, a brightness gain model that combines the basic features and the prior features of an image is provided by combining prior information with a machine learning method, so that comprehensive and sufficient image features can be mined and learned and accurate brightness gain data can be output; pixel value mapping is then performed on the image to be processed to obtain a high dynamic range image with a realistic brightness effect, which effectively improves image quality. On the other hand, the scheme realizes brightness gain from a single-frame image and does not need information from adjacent frames, which helps reduce shooting time, the computational cost of image processing, and memory usage; it can be completed with conventional shooting equipment and does not require a professional high-dynamic-range camera. The scheme is therefore inexpensive to implement, suitable for deployment in lightweight scenarios such as mobile terminals or in scenarios with high real-time requirements (such as real-time video processing), and highly practical and general. In addition, it can alleviate adverse phenomena such as "ghosting" that may be caused by fusing multiple frames into a high dynamic range image.
Each step in fig. 2 is explained in detail below.
Referring to fig. 2, in step S210, an image to be processed is acquired.
The image to be processed is an image for which improvement of the luminance effect is required, and may be a low dynamic range image. The source of the image to be processed is not limited in the present disclosure, for example, the image to be processed may be a currently photographed image, or a locally stored image (such as an image in an album), or an image transmitted by other devices through a network, or an image downloaded from the internet, etc. Further, the image to be processed may be any type of image such as an RGB image, a YUV image, a grayscale image, or the like.
In one embodiment, the image to be processed may be any frame image in the video.
With continued reference to fig. 2, in step S220, a priori information is extracted from the image to be processed, where the a priori information includes one or more of bright channel information, dark channel information, and high frequency information.
In the present exemplary embodiment, the prior information refers to guiding or supplementary information introduced for the brightness gain model from the perspective of human vision. Without prior information, the brightness gain model, as a machine learning model, learns the information in the image only from the machine's perspective, and its learning of some aspects of that information may be insufficient, which affects the output result. By introducing prior information, the brightness gain model can be guided to focus on learning one or more specific aspects of the image, or the information learned by the model can be supplemented, so that the model can complete the brightness gain task with high quality.
Each of the prior information and how to extract the prior information will be described in detail below.
1. Bright channel information
In an image, a pixel point within a local area has the largest value on a certain color channel, and this value represents the maximum illumination intensity of the local area. The color channel carrying this maximum illumination intensity is the bright channel. Bright channel information may include, but is not limited to: the value of the bright channel, the distribution of the bright channel, the size of the local area, and the like.
There may be underexposed areas (e.g., dark areas) in an image; since the brightness in these areas usually falls to the lowest brightness level, it is difficult for the image to represent the true brightness there. The bright channel information can be used to estimate the illumination distribution in the image, especially in underexposed areas, so as to present richer exposure or brightness information.
In an embodiment, the extracting the prior information from the image to be processed may include:
traversing each pixel point in the image to be processed, and taking the maximum value among the channel values of the pixel points within a local area centered on the current pixel point as the bright channel value of the current pixel point, so as to obtain the bright channel information of the image to be processed.
The maximum value of the channel values of the pixel points in the local area can represent the maximum illumination intensity of the local area. By traversing each pixel point in the image to be processed, the bright channel value of each pixel point can be determined, and the bright channel values of all the pixel points form a set, namely the bright channel information of the image to be processed. Furthermore, the bright channel values of all the pixel points can be arranged according to the positions of the pixel points in the image to be processed, so that a matrix formed by the bright channel values is obtained and is represented in the form of an image, and the image is a bright channel image of the image to be processed. That is, the bright channel information may be in the form of a bright channel image.
Illustratively, the bright channel information may be obtained by the following formula:

I_bright(x) = max_{y∈Ω(x)} max_{c∈{R,G,B}} I_c(y) (1)

where x denotes any pixel point and I_bright(x) denotes the value of point x in the bright channel image (i.e., the bright channel value). Ω(x) denotes a local area centered on x in the image to be processed; the size of the local area is not limited in the present disclosure. For example, the local area may be set to a 1 × 1 size to keep more image detail, or set to a 3 × 3 or 5 × 5 size according to the size of the image to be processed. I_c denotes each of the RGB channels of the image to be processed. Formula (1) means that, within the local area centered on x in the image to be processed, the maximum value among all channel values of all pixel points is selected as the bright channel value of point x.
Referring to fig. 3, a bright channel image is extracted from an image to be processed. Compared with the image to be processed, the details of the underexposed area in the bright channel image are clearer and the tonal layering is more obvious, so richer brightness detail information can be provided for the brightness gain model, which makes it easier for the model to extract image features and learn image information.
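As a concrete illustration of formula (1), the following NumPy sketch computes a bright channel image; the window size, the edge padding and the plain double loop are implementation assumptions made here for clarity, not part of the claimed method.

```python
import numpy as np

def bright_channel(img, window=3):
    """img: H x W x 3 RGB array; returns the H x W bright channel image."""
    per_pixel_max = img.max(axis=2)          # maximum over the RGB channels at each pixel
    if window == 1:                          # a 1 x 1 local area keeps the most detail
        return per_pixel_max
    pad = window // 2
    padded = np.pad(per_pixel_max, pad, mode='edge')
    out = np.empty_like(per_pixel_max)
    h, w = per_pixel_max.shape
    for i in range(h):
        for j in range(w):
            # maximum over the local area Omega(x) centered on the current pixel
            out[i, j] = padded[i:i + window, j:j + window].max()
    return out
```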
2. Dark channel information
The dark channel is the opposite concept to the bright channel. In an image, a pixel point within a local area has the smallest value on a certain color channel, and this value represents the minimum illumination intensity of the local area. The color channel carrying this minimum illumination intensity is the dark channel. Dark channel information may include, but is not limited to: the value of the dark channel, the distribution of the dark channel, the size of the local area, and the like.
There may be over-exposed areas (e.g., highlight areas) in the image, and since the brightness of the image in these areas can usually reach the highest brightness level, it is difficult to represent the real brightness level in the image. The dark channel information can be used to estimate the illumination distribution in the image, especially in the over-exposed areas, to present more abundant exposure or brightness information.
In an embodiment, the extracting the prior information from the image to be processed may include:
traversing each pixel point in the image to be processed, and taking the minimum value among all channel values of all pixel points within a local area centered on the current pixel point as the dark channel value of the current pixel point, so as to obtain the dark channel information of the image to be processed.
The minimum value of the channel values of the pixel points in the local area can represent the minimum illumination intensity of the local area. By traversing each pixel point in the image to be processed, the dark channel value of each pixel point can be determined, and the dark channel values of all the pixel points form a set, namely the dark channel information of the image to be processed. Furthermore, the dark channel values of all the pixel points can be arranged according to the positions of the pixel points in the image to be processed, so that a matrix composed of the dark channel values is obtained and expressed in the form of an image, and the image is a dark channel image of the image to be processed. That is, the dark channel information may be in the form of a dark channel image.
Illustratively, the dark channel information may be obtained by the following formula:

I_dark(x) = min_{y∈Ω(x)} min_{c∈{R,G,B}} I_c(y) (2)

where x denotes any pixel point and I_dark(x) denotes the value of point x in the dark channel image (i.e., the dark channel value). Ω(x) denotes a local area centered on x in the image to be processed; the size of the local area is not limited in the present disclosure. For example, the local area may be set to a 1 × 1 size to retain more image detail, or set to a 3 × 3 or 5 × 5 size according to the size of the image to be processed. I_c denotes each of the RGB channels of the image to be processed. Formula (2) means that, within the local area centered on x in the image to be processed, the minimum value among all channel values of all pixel points is selected as the dark channel value of point x.
Referring to fig. 3, the dark channel image is extracted from the image to be processed. Compared with the image to be processed, the details of the overexposed region in the dark channel image are clearer and the tonal layering is more obvious, so richer brightness detail information can be provided for the brightness gain model, which makes it easier for the model to extract image features and learn image information.
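Analogously to the bright-channel sketch above, the following NumPy sketch illustrates formula (2); again, the window size and edge padding are illustrative assumptions.

```python
import numpy as np

def dark_channel(img, window=3):
    """img: H x W x 3 RGB array; returns the H x W dark channel image."""
    per_pixel_min = img.min(axis=2)          # minimum over the RGB channels at each pixel
    if window == 1:
        return per_pixel_min
    pad = window // 2
    padded = np.pad(per_pixel_min, pad, mode='edge')
    out = np.empty_like(per_pixel_min)
    h, w = per_pixel_min.shape
    for i in range(h):
        for j in range(w):
            # minimum over the local area Omega(x) centered on the current pixel
            out[i, j] = padded[i:i + window, j:j + window].min()
    return out
```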
3. High frequency information
The high frequency information is image information corresponding to a portion of the frequency domain where the frequency of the image signal is high, and is usually image detail information, such as edges, dense texture, and the like.
In one embodiment, the high frequency information includes horizontal direction high frequency information, vertical direction high frequency information, and diagonal direction high frequency information. The foregoing extracting of the prior information from the image to be processed may include the following steps:
acquiring a pre-configured mean value basis function, a pre-configured horizontal difference value basis function and a pre-configured vertical difference value basis function;
performing wavelet transformation on the image to be processed by using the mean basis function and the horizontal difference basis function to obtain high-frequency information in the horizontal direction;
performing wavelet transformation on the image to be processed by using the mean value basis function and the vertical difference value basis function to obtain high-frequency information in the vertical direction;
and performing wavelet transformation on the image to be processed by using the horizontal difference value basis function and the vertical difference value basis function to obtain high-frequency information in the diagonal direction.
The wavelet transform has excellent time-frequency locality and multi-scale characteristics, and can decompose an image into a high-frequency part and a low-frequency part so that the information of the high-frequency part can be extracted. The image to be processed, the bright channel information and the dark channel information are mainly information in the image spatial domain. Through the wavelet transform, high-frequency information can be extracted at the frequency-domain level of the image; this information supplements the spatial-domain information, so that the subsequent brightness gain model can learn sufficient information.
The basis function is a function for extracting image information in the wavelet transform, and may be regarded as a filter. The mean basis function keeps low-frequency information by calculating a mean; the horizontal difference basis function keeps high-frequency information in the horizontal direction by calculating differences in the horizontal direction; the vertical difference basis function keeps high-frequency information in the vertical direction by calculating differences in the vertical direction. Illustratively, the three basis functions may be written as:

L1 = 1/2 [1, 1], L2 = 1/2 [1, -1], L3 = 1/2 [1, -1]^T (3)

where L1 denotes the mean basis function, L2 denotes the horizontal difference basis function, and L3 denotes the vertical difference basis function. In the wavelet transform, the image to be processed can be row-filtered with L2 and column-filtered with L1 to obtain the high-frequency information in the horizontal direction; row-filtered with L1 and column-filtered with L3 to obtain the high-frequency information in the vertical direction; and row-filtered with L2 and column-filtered with L3 to obtain the high-frequency information in the diagonal direction.
The horizontal direction high frequency information, the vertical direction high frequency information, and the diagonal direction high frequency information may be represented in the form of an image. Referring to fig. 4, three kinds of high-frequency information are extracted from the image to be processed by using a combination of the three kinds of basis functions in formula (3), and a horizontal high-frequency image (denoted as LH), a vertical high-frequency image (denoted as HL), and a diagonal high-frequency image (denoted as HH) are obtained.
In one embodiment, a non-decimated Haar wavelet transform may be employed to extract the high-frequency information. In the non-decimated Haar wavelet transform, no down-sampling of the image to be processed is required, so the extracted high-frequency information matches the size (resolution) of the image to be processed and loss of detail information is avoided. For example, when the horizontal, vertical and diagonal high-frequency images are extracted with the non-decimated Haar wavelet transform, their sizes are the same as that of the image to be processed, so the detail information of the image to be processed can be represented more accurately.
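A minimal sketch of this non-decimated Haar extraction, using the separable mean/difference filters of formula (3); the filter normalisation, the wrap-around border handling and the axis conventions are assumptions made here for illustration.

```python
import numpy as np

def haar_filter(img, kernel, axis):
    """Filter a 2-D image along one axis with a 2-tap kernel, keeping the original size."""
    neighbour = np.roll(img, -1, axis=axis)          # next pixel along the chosen axis
    return kernel[0] * img + kernel[1] * neighbour

def haar_high_frequency(lum):
    """lum: H x W luminance image; returns the LH, HL and HH high-frequency images."""
    low = np.array([0.5, 0.5])     # L1, mean basis function
    high = np.array([0.5, -0.5])   # L2 / L3, difference basis function
    lum = lum.astype(np.float32)
    lh = haar_filter(haar_filter(lum, high, axis=1), low, axis=0)   # horizontal high frequency
    hl = haar_filter(haar_filter(lum, low, axis=1), high, axis=0)   # vertical high frequency
    hh = haar_filter(haar_filter(lum, high, axis=1), high, axis=0)  # diagonal high frequency
    return lh, hl, hh
```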
In one embodiment, the image to be processed may be down-sampled and the down-sampled image then up-sampled so that the up-sampled image has the same resolution as the image to be processed; the difference between the image to be processed and the up-sampled image is then taken to obtain the high-frequency information of the image to be processed.
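This down-sample/up-sample alternative could be sketched as follows; cv2.resize, bilinear interpolation and the scale factor 2 are illustrative choices, not requirements of the scheme.

```python
import cv2
import numpy as np

def high_frequency_residual(lum):
    """lum: H x W luminance image; returns its high-frequency residual."""
    h, w = lum.shape
    small = cv2.resize(lum, (w // 2, h // 2), interpolation=cv2.INTER_LINEAR)  # down-sample
    up = cv2.resize(small, (w, h), interpolation=cv2.INTER_LINEAR)             # back to the original resolution
    return lum.astype(np.float32) - up.astype(np.float32)                      # the difference keeps the high frequencies
```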
It should be noted that, if the image to be processed is a color image such as an RGB image, high-frequency information may be extracted from each channel of the image to be processed; for example, a horizontal high-frequency image, a vertical high-frequency image and a diagonal high-frequency image may be extracted from the R channel, and the G channel and the B channel are processed in the same manner, giving 9 high-frequency images in total. Alternatively, the luminance component may first be extracted from the image to be processed, where the luminance component may be the gray value of the image to be processed; for example, the luminance component of each pixel point in the image to be processed may be calculated by the following formula:
L=0.299×R+0.587×G+0.114×B (4)
where L represents the luminance component value (i.e., the gray value) and R, G and B are the channel values of the pixel point. High-frequency information is then extracted from the luminance component (i.e., the grayscale image of the image to be processed); for example, a horizontal high-frequency image, a vertical high-frequency image and a diagonal high-frequency image can be extracted from the luminance component, giving 3 high-frequency images. In the brightness gain, only the brightness of the image needs to be optimized and the chromaticity does not need to be changed; by extracting the luminance component, the luminance component and the chrominance component of the image to be processed are separated, and the prior information and the basic features are extracted from the luminance component, which simplifies the image processing, reduces the complexity of the brightness gain model, and improves processing efficiency.
Three kinds of prior information are explained above; they provide richer image information from three different aspects. In the exemplary embodiment, any one or more of them can be selected according to specific requirements, and the corresponding kinds of prior information are then extracted from the image to be processed.
In one embodiment, whether to use the bright channel information, the dark channel information, or both may be determined according to the exposure condition of the image to be processed. The exposure condition may be determined from the shooting parameters of the image to be processed (such as exposure time and sensitivity) or from its brightness distribution. For example, the brightness values of the pixel points in the image to be processed may be counted, e.g., into a brightness histogram, and whether the image is under-exposed or over-exposed is judged from the proportion of pixel points falling in the maximum brightness interval (or maximum brightness level) or the minimum brightness interval (or minimum brightness level): if the proportion of pixel points in the maximum brightness interval exceeds a predetermined proportion threshold, the image to be processed is over-exposed; if the proportion of pixel points in the minimum brightness interval exceeds a predetermined proportion threshold, the image to be processed is under-exposed. Generally, if the image to be processed is under-exposed, the bright channel information is used; if it is over-exposed, the dark channel information is used; and if it is both under-exposed and over-exposed, both the bright channel information and the dark channel information are used.
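A possible sketch of this histogram-based decision: the luminance is computed with formula (4), and the fractions of pixels in the highest and lowest brightness intervals decide which prior information to use. The interval width of 8 levels and the 5% proportion threshold are illustrative assumptions only.

```python
import numpy as np

def select_prior_channels(img, ratio_threshold=0.05, interval=8):
    """img: H x W x 3 RGB uint8 image; returns (use_bright_channel, use_dark_channel)."""
    r = img[..., 0].astype(np.float32)
    g = img[..., 1].astype(np.float32)
    b = img[..., 2].astype(np.float32)
    lum = 0.299 * r + 0.587 * g + 0.114 * b          # luminance component, formula (4)
    n = lum.size
    under_ratio = (lum <= interval - 1).sum() / n    # share of pixels in the minimum brightness interval
    over_ratio = (lum >= 256 - interval).sum() / n   # share of pixels in the maximum brightness interval
    use_bright = under_ratio > ratio_threshold       # under-exposure -> use bright channel information
    use_dark = over_ratio > ratio_threshold          # over-exposure  -> use dark channel information
    return use_bright, use_dark
```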
In one embodiment, in order for the luminance gain model to sufficiently learn the image information, all three kinds of prior information may be used in the image processing process, i.e., the prior information extracted in step S220 may include bright channel information, dark channel information, and high frequency information.
With reference to fig. 2, in step S230, a pre-trained luminance gain model is used to extract basic features from the image to be processed, prior features are extracted from the prior information, and luminance gain data is obtained by combining the basic features and the prior features.
The luminance gain model may be any type of machine learning model, such as a neural network model. The basic features are the features that the brightness gain model extracts for processing the image to be processed when no prior information is used; they reflect the information the model learns from the image to be processed itself. The prior features are the features the brightness gain model extracts from the prior information. For example, if the prior information includes bright channel information, dark channel information and high-frequency information, the corresponding prior features may include bright channel features, dark channel features and high-frequency features. Because the prior information is extracted based on human experience, the model can further learn it from the machine's perspective to obtain the prior features. In one embodiment, the prior features may also be taken to be the prior information itself; that is, the prior information may be used directly as the prior features and combined with the basic features, so that the model learns the prior information during the combination and the subsequent processing.
The brightness gain data is used for pixel value mapping of the image to be processed to achieve brightness gain. For example, the brightness gain data may include a brightness gain coefficient of each pixel point in the image to be processed, a brightness gain index for the whole image to be processed, and the like. It should be noted that the brightness gain in this document refers to optimizing the brightness distribution in the image, and may include increasing the brightness or decreasing the brightness, and the "gain" should not be understood as merely increasing the brightness.
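As a simple illustration of how such gain data could drive the pixel value mapping of step S240, the sketch below assumes the brightness gain data is a per-pixel multiplicative coefficient; the patent also allows other forms, such as a single gain index for the whole image, so this is only one possible reading.

```python
import numpy as np

def map_pixel_values(img, gain):
    """img: H x W x C image with values in [0, 1]; gain: H x W brightness gain coefficients."""
    hdr = img.astype(np.float32) * gain[..., None]   # per-pixel brightness gain
    return hdr                                       # result spans a wider (high dynamic) value range
```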
The luminance gain model may have at least two input layers, one for inputting the image to be processed and another for inputting a priori information. If the image to be processed is a color image, such as an RGB image, the image to be processed may be input to the luminance gain model for processing, or a luminance component (i.e., a gray image) of the image to be processed may be input to the luminance gain model for processing. This is related to the specific structure of the luminance gain model, for example, if an input layer for inputting an image to be processed is a single channel, a luminance component of the image to be processed may be input to the input layer, and if the input layer is three channels, three channel images of R, G, and B of the image to be processed may be input to the input layer. In one embodiment, the luminance component of the image to be processed may be substituted for the image to be processed for input into the luminance gain model, considering that the luminance component of the image to be processed is already able to provide image luminance information for the luminance gain model, and the model does not need to learn chrominance information of the image to be processed. The brightness gain model can extract basic features from the brightness component of the image to be processed and combine with the prior features to obtain brightness gain data. Hereinafter, the luminance gain model and its processing, training process, and the like will be described by taking as an example a case where a luminance component of an image to be processed is input. It should be understood that the implementation principle of the scheme is the same for the case of inputting the image to be processed, except for the difference in model structure.
In one embodiment, the luminance gain model may include a first gain submodel, a second gain submodel, a third gain submodel, and a gain combining layer. Referring to fig. 5, the extracting basic features from the image to be processed by using the pre-trained luminance gain model, extracting prior features from the prior information, and obtaining luminance gain data by combining the basic features and the prior features may include the following steps S510 to S540:
step S510, extracting basic features of an image to be processed through a first gain sub-model, extracting high-frequency features of high-frequency information, fusing the basic features and the high-frequency features, and mapping the fused basic features and the fused high-frequency features to a brightness gain space to obtain a first gain component;
step S520, bright channel characteristics are extracted from the bright channel information through the second gain submodel, and the bright channel characteristics are mapped to a brightness gain space to obtain a second gain component;
step S530, dark channel characteristics are extracted from the dark channel information through a third gain submodel, and the dark channel characteristics are mapped to a brightness gain space to obtain a third gain component;
in step S540, the first gain component, the second gain component, and the third gain component are combined by the gain combining layer to obtain luminance gain data.
The first gain submodel, the second gain submodel and the third gain submodel may be three parallel parts of the luminance gain model, respectively used to output the first gain component, the second gain component and the third gain component. The gain combining layer can be connected after the three submodels and combines the three gain components to obtain the final brightness gain data.
The first gain submodel is a main branch and can extract basic features from the image to be processed or the brightness component of the image to be processed, extract high-frequency features from high-frequency information and then fuse the basic features and the high-frequency features. The image to be processed and the high-frequency information can represent the original brightness information in the image to be processed from the spatial domain and the frequency domain, so the basic characteristics and the high-frequency characteristics are fused before the fusion with the bright channel characteristics and the dark channel characteristics, and the fusion result can contain the main characteristics of the image to be processed. And mapping the fusion result to a brightness gain space to obtain a first gain component. The luminance gain space is a feature space in which the luminance gain data is located, and may be an output feature space of the luminance gain model. The first gain component is a luminance gain component calculated based on original luminance information in the image to be processed, and can cover the entire area of the image to be processed.
In the second gain submodel, bright channel features may be extracted for the bright channel information. The bright channel information breaks through the range of the original brightness information in the image to be processed, and particularly can represent the brightness information of the underexposed area in the image to be processed to a certain extent, so that the bright channel characteristics can include some characteristics originally missing in the image to be processed, such as the characteristics of the underexposed area. And mapping the bright channel characteristics to a brightness gain space to obtain a second gain component. The second gain component is a luminance gain component calculated based on the underexposed luminance information missing in the image to be processed, which can cover the underexposed area of the image to be processed.
In the third gain submodel, dark channel features may be extracted for the dark channel information. The dark channel information breaks through the range of the original brightness information in the image to be processed, and particularly can represent the brightness information of the overexposed area in the image to be processed to a certain extent, so that the dark channel characteristics can include some characteristics originally missing in the image to be processed, such as the characteristics of the overexposed area. And mapping the dark channel characteristics to a brightness gain space to obtain a third gain component. The third gain component is a luminance gain component calculated based on the over-exposed luminance information missing in the image to be processed, which can cover the over-exposed area of the image to be processed.
The second gain submodel and the third gain submodel are auxiliary branches, and the bright channel information and the dark channel information enable the brightness gain model to reconstruct the information of the poor exposure area in the image to be processed, so that the obtained second gain component and the third gain component can supplement the brightness gain data of the poor exposure area into the first gain component. By combining the first gain component, the second gain component and the third gain component, more complete brightness gain data can be obtained.
Based on the method of fig. 5, the three parallel submodels respectively process the basic feature and the high-frequency feature, the bright channel feature, and the dark channel feature, and there is no intersection between them before the gain components are fused. Each submodel can therefore be dedicated to learning and processing image features of a specific aspect, which reduces the structural and gradient complexity of the model and accelerates its training.
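A minimal PyTorch sketch of the three-branch structure of steps S510 to S540. All channel counts and layer choices are assumptions, the high-frequency images are fused here by simple concatenation at the input of the first branch, and the gain combining layer is modelled as a 1 × 1 convolution; the patent does not fix these details at this point.

```python
import torch
import torch.nn as nn

class LuminanceGainModel(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        def branch(in_ch):
            # placeholder feature extractor plus mapping to a one-channel gain component
            return nn.Sequential(
                nn.Conv2d(in_ch, ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(ch, 1, 3, padding=1))
        self.first_gain = branch(1 + 3)    # luminance component + three high-frequency images
        self.second_gain = branch(1)       # bright channel image
        self.third_gain = branch(1)        # dark channel image
        self.combine = nn.Conv2d(3, 1, 1)  # gain combining layer

    def forward(self, lum, high_freq, bright, dark):
        g1 = self.first_gain(torch.cat([lum, high_freq], dim=1))  # first gain component
        g2 = self.second_gain(bright)                              # second gain component
        g3 = self.third_gain(dark)                                 # third gain component
        return self.combine(torch.cat([g1, g2, g3], dim=1))        # luminance gain data
```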
In one embodiment, the high frequency information extracted in step S220 includes high frequency information in at least two directions, such as horizontal direction high frequency information and vertical direction high frequency information. The extracting basic features from the image to be processed through the first gain sub-model, extracting high-frequency features from the high-frequency information, fusing the basic features and the high-frequency features, and mapping the fused basic features and the fused high-frequency features to the brightness gain space to obtain the first gain component may include the following steps:
extracting basic features from the image to be processed through the first gain sub-model, extracting a corresponding high-frequency feature from the high-frequency information in each direction, sequentially fusing the basic features with the high-frequency features of the respective directions, and mapping the fused features to the brightness gain space to obtain the first gain component.
That is, the high-frequency features of the different directions are not fused into the basic features all at once, but one after another. Generally, after each high-frequency feature is fused, the resulting features may be rearranged in dimension or learned further; for example, one or more convolutions may be performed after each fusion before the next high-frequency feature is fused. In this way, the brightness gain model can learn the high-frequency features of the different directions step by step, which facilitates sufficient learning, makes the model easier to converge, and accelerates the training process.
Exemplarily, referring to fig. 6, the luminance component of the image to be processed is input into the first gain sub-model, and the basic features are extracted by a multi-scale feature extraction unit; at the same time, the horizontal, vertical and diagonal high-frequency images of the image to be processed are input into the first gain sub-model, and the horizontal, vertical and diagonal high-frequency features are extracted by convolution units, respectively. Following the order horizontal, vertical, diagonal, the basic features are first fused with the horizontal high-frequency features and further processed by a multi-scale feature extraction unit; the result is then fused with the vertical high-frequency features and processed by another multi-scale feature extraction unit; finally it is fused with the diagonal high-frequency features and processed by a further multi-scale feature extraction unit. The resulting features are then mapped to the brightness gain space by a feature mapping layer to obtain the first gain component. In this way, the high-frequency information of the different directions is merged into the basic features in turn, which ensures the comprehensiveness of the features and the accuracy of the first gain component.
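The sequential fusion of fig. 6 could be sketched as follows; here plain 3 × 3 convolution blocks stand in for the multi-scale feature extraction units and convolution units, and concatenation followed by convolution stands in for the fusion operation, all of which are assumptions for illustration.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch=64):
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))

class FirstGainSubmodel(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.base = conv_block(1, ch)        # stands in for a multi-scale feature extraction unit
        self.conv_lh = conv_block(1, ch)     # convolution unit for the horizontal high-frequency image
        self.conv_hl = conv_block(1, ch)     # convolution unit for the vertical high-frequency image
        self.conv_hh = conv_block(1, ch)     # convolution unit for the diagonal high-frequency image
        self.fuse1 = conv_block(2 * ch, ch)  # further extraction after each fusion
        self.fuse2 = conv_block(2 * ch, ch)
        self.fuse3 = conv_block(2 * ch, ch)
        self.map = nn.Conv2d(ch, 1, 1)       # feature mapping layer -> first gain component

    def forward(self, lum, lh, hl, hh):
        f = self.base(lum)
        f = self.fuse1(torch.cat([f, self.conv_lh(lh)], dim=1))  # fuse the horizontal HF features first
        f = self.fuse2(torch.cat([f, self.conv_hl(hl)], dim=1))  # then the vertical HF features
        f = self.fuse3(torch.cat([f, self.conv_hh(hh)], dim=1))  # finally the diagonal HF features
        return self.map(f)                                       # first gain component
```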
It should be understood that the sequence among the horizontal direction high-frequency information, the vertical direction high-frequency information and the diagonal direction high-frequency information may also be changed. For example, the basic features can be fused with the diagonal high-frequency features, then fused with the vertical high-frequency features, and finally fused with the horizontal high-frequency features, so that high-frequency information in different directions can be fused into the basic features, and the comprehensiveness of the features and the accuracy of the first gain component are ensured.
In an embodiment, the first gain sub-model may also extract the basic features from the image to be processed, extract a corresponding high-frequency feature from the high-frequency information in each direction, fuse the basic features with the high-frequency features of all directions at once, and map the fused features to the brightness gain space to obtain the first gain component. Compared with sequential fusion, this one-shot fusion helps simplify the first gain sub-model and reduce the complexity of the whole brightness gain model, thereby reducing the amount of computation and improving efficiency.
In fig. 6, the convolution unit may extract features at a conventional scale. The convolution unit can comprise one or more convolution layers connected in sequence, and can also comprise matching pooling layers, activation layers and the like. Illustratively, a convolution unit may include two 3 × 3 × 64 convolution layers. Because the scale of the detail information in the horizontal, vertical or diagonal high-frequency image usually does not fluctuate greatly, the corresponding high-frequency features can be extracted by the convolution unit; the processing is simple and the extracted high-frequency features are sufficient.
The multi-scale feature extraction unit may extract features at different scales. For example, the multi-scale feature extraction unit comprises a pixel feature branch, a local feature branch, a global feature branch and a feature fusion layer; the three branches extract features at a pixel scale, a local scale and a global scale, respectively, and the results are then fused by the feature fusion layer to obtain comprehensive and sufficient feature information. As shown in fig. 6, the first gain sub-model may include one or more multi-scale feature extraction units, which may be used to extract basic features from the luminance component of the image to be processed, and may also be used to extract features from the intermediate features obtained after the high-frequency features are fused, so that comprehensive and sufficient feature information can be extracted at each step.
There may be situations where the local exposure is inconsistent with the overall brightness of the image to be processed. For example, the entire image is underexposed but individual areas such as light sources are very bright, or the overall brightness of the image is high but some areas are underexposed. The multi-scale feature extraction unit can extract features based on comparisons between image information at different scales, and thus learn such image information.
In one embodiment, in the multi-scale feature extraction unit, the pixel feature branch comprises a non-expansion convolutional layer, the local feature branch comprises an expansion convolutional layer with a first expansion coefficient, and the global feature branch comprises an expansion convolutional layer with a second expansion coefficient; the first coefficient of expansion is less than the second coefficient of expansion, e.g., the first coefficient of expansion is (2, 2) and the second coefficient of expansion is (4, 4). The extracting basic features from the image to be processed by the first gain sub-model may include the following steps:
extracting pixel characteristics of the image to be processed through the pixel characteristic branch;
extracting local features of the image to be processed through the local feature branches;
extracting global features of the image to be processed through the global feature branch;
and fusing the pixel characteristics, the local characteristics and the global characteristics of the image to be processed through the characteristic fusion layer to obtain the basic characteristics of the image to be processed.
Here, the non-expansion convolutional layer corresponds to a convolutional layer having an expansion coefficient of 1 (the minimum expansion coefficient). From the expansion-free convolutional layer, to the expansion convolutional layer with the first expansion coefficient, to the expansion convolutional layer with the second expansion coefficient, the receptive field of the convolution gradually increases, so the scale of the extracted features gradually increases: the pixel feature branch extracts features at the smallest scale, namely pixel features; the local feature branch extracts features at a slightly larger scale, namely local features; and the global feature branch extracts features at the largest scale, namely global features. The features of the three scales are fused to obtain the basic features.
By providing the expansion convolution layers, a larger receptive field can be achieved without adding extra convolution parameters or computation, so that information over a larger range of the image can be captured and larger-scale features extracted. This helps improve the quality of the basic features and of the intermediate features obtained after the subsequent fusion of high-frequency features, and thus the accuracy of the first gain component.
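As a hedged illustration of this point (not part of the patented model), the PyTorch layers below all use 3 × 3 kernels and therefore have identical parameter counts, while the dilation (expansion) coefficients 1, 2 and 4 give a single-layer span of dilation·(k − 1) + 1 pixels; the channel count of 64 follows Table 1.

```python
import torch.nn as nn

# Three 3x3 convolutions with increasing dilation; the parameter counts are identical,
# only the spatial span covered by a single layer grows with the dilation.
conv_plain = nn.Conv2d(64, 64, kernel_size=3, dilation=1, padding=1)
conv_dil2  = nn.Conv2d(64, 64, kernel_size=3, dilation=2, padding=2)  # first expansion coefficient
conv_dil4  = nn.Conv2d(64, 64, kernel_size=3, dilation=4, padding=4)  # second expansion coefficient

for m in (conv_plain, conv_dil2, conv_dil4):
    params = sum(p.numel() for p in m.parameters())
    span = m.dilation[0] * (m.kernel_size[0] - 1) + 1
    print(f"dilation={m.dilation}, single-layer span={span}x{span}, params={params}")
```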
Fig. 7 shows a schematic structure of a multi-scale feature extraction unit. The local feature branch can also comprise a non-expansion convolution layer besides the expansion convolution layer with the first expansion coefficient; the global feature branch may include, in addition to the expansion convolution layer of the second expansion coefficient, an expansion-free convolution layer and an expansion convolution layer of the first expansion coefficient. The processing procedure of the multi-scale feature extraction unit is explained below with reference to fig. 7.
In the pixel feature branch, expansion-free convolution can be performed on the image to be processed to extract features within the smallest receptive field. Illustratively, Conv-1 denotes the expansion-free convolution layer in the pixel feature branch, whose parameters can be found in Table 1. The convolution kernel size is 1 × 1; the expansion coefficient is (1, 1), i.e. no expansion; the number of channels is 64, meaning that pixel features of 64 channels can be output (the number of channels can be set according to specific requirements); the activation function is ReLU (Rectified Linear Unit), although other activation functions such as sigmoid and tanh may also be used. Conv-1 is a single-pixel convolution that processes individual pixels and extracts pixel-level high-frequency detail features, yielding the pixel features.
In the local feature branch, expansion-free convolution and expansion convolution with the first expansion coefficient may be performed sequentially on the image to be processed to obtain the local features of the image to be processed. That is, convolution is carried out successively by convolution layers with gradually increasing receptive fields, so that the scale of the extracted features gradually increases; compared with directly adopting expansion convolution, this reduces the loss of image information (particularly detail information). Illustratively, Conv-2 denotes the expansion-free convolutional layer in the local feature branch, and Conv-3 denotes the expansion convolutional layer with the first expansion coefficient in the local feature branch; their parameters can be found in Table 1. The convolution kernels are all 3 × 3 in size, and the first expansion coefficient is (2, 2), so that Conv-3 can extract neighborhood features over a 5 × 5 receptive field; the number of channels is 64; the activation functions are all ReLU. The local features are obtained by processing with Conv-2 and Conv-3 in sequence.
In the global feature branch, the image to be processed can be subjected to expansion-free convolution to obtain a first feature image, then expansion convolution with the first expansion coefficient to obtain a second feature image, and then expansion convolution with the second expansion coefficient to obtain a third feature image. That is, convolution is carried out successively by convolution layers with gradually increasing receptive fields, so that the scale of the extracted features gradually increases from the first feature image to the second feature image and then to the third feature image. Illustratively, Conv-4 denotes the non-expansion convolutional layer in the global feature branch, Conv-5 denotes the expansion convolutional layer with the first expansion coefficient, and Conv-6 denotes the expansion convolutional layer with the second expansion coefficient; their parameters can be found in Table 1. The convolution kernels are all 3 × 3 in size, the first expansion coefficient is (2, 2), and the second expansion coefficient is (4, 4), so that Conv-6 can extract neighborhood features over a 7 × 7 receptive field; the number of channels is 64; the activation functions are all ReLU. The third feature image is obtained by processing with Conv-4, Conv-5 and Conv-6 in sequence.
Further, each channel in the third feature image may be weighted by using the global average pooling value of each channel in the third feature image, so as to obtain the global feature of the image to be processed. The global average pooling value of each channel in the third feature image can represent the overall image characteristics to a certain extent, and each channel in the third feature image is weighted by taking the global average pooling value as a weight, so that the obtained global features have information of the overall image characteristics. For example, the third feature image is a 5 × 5 × 64 feature image, and global average pooling may be performed on each of the 64 channels to obtain a weight of 1 × 1 × 64, and then the weights are used to weight the 5 × 5 feature images of the 64 channels, so as to obtain a 5 × 5 × 1 global feature.
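A hedged sketch of this channel weighting is given below; whether the weighted channels are subsequently summed into a single map (as the 5 × 5 × 1 example suggests) is an assumption made explicit in the code rather than something stated above.

```python
import torch

third_feature = torch.rand(1, 64, 5, 5)                  # (batch, channels, H, W) third feature image
weights = third_feature.mean(dim=(2, 3), keepdim=True)   # global average pooling value per channel: (1, 64, 1, 1)
weighted = third_feature * weights                       # weight each channel by its own pooled value
global_feature = weighted.sum(dim=1, keepdim=True)       # assumed collapse to (1, 1, 5, 5), i.e. 5 x 5 x 1
print(global_feature.shape)
```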
TABLE 1
Convolutional layer | Convolution kernel size | Expansion coefficient | Number of output channels | Activation function |
Conv-1 | 1×1 | (1,1) | 64 | ReLU |
Conv-2 | 3×3 | (1,1) | 64 | ReLU |
Conv-3 | 3×3 | (2,2) | 64 | ReLU |
Conv-4 | 3×3 | (1,1) | 64 | ReLU |
Conv-5 | 3×3 | (2,2) | 64 | ReLU |
Conv-6 | 3×3 | (4,4) | 64 | ReLU |
The pixel features, local features and global features are fused to obtain multi-granularity basic features. In the present exemplary embodiment, fusion, combination and the like may each include operations such as addition and concatenation.
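Putting the three branches and Table 1 together, the following is a minimal PyTorch sketch of a multi-scale feature extraction unit. It is not the patented implementation; the concatenation-plus-1 × 1 fusion layer and the single-channel input are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiScaleFeatureUnitSketch(nn.Module):
    """Pixel / local / global branches per Table 1 (Conv-1 ... Conv-6) plus an assumed fusion layer."""
    def __init__(self, in_ch: int = 1, ch: int = 64):
        super().__init__()
        self.pixel_branch = nn.Sequential(                                  # Conv-1
            nn.Conv2d(in_ch, ch, kernel_size=1), nn.ReLU())
        self.local_branch = nn.Sequential(                                  # Conv-2, Conv-3
            nn.Conv2d(in_ch, ch, 3, padding=1, dilation=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=2, dilation=2), nn.ReLU())
        self.global_branch = nn.Sequential(                                 # Conv-4, Conv-5, Conv-6
            nn.Conv2d(in_ch, ch, 3, padding=1, dilation=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=4, dilation=4), nn.ReLU())
        self.fusion = nn.Conv2d(3 * ch, ch, kernel_size=1)                  # assumed concatenation fusion

    def forward(self, x):
        p = self.pixel_branch(x)
        l = self.local_branch(x)
        g = self.global_branch(x)
        g = g * g.mean(dim=(2, 3), keepdim=True)                            # GAP-weighted global features
        return self.fusion(torch.cat([p, l, g], dim=1))

basic_features = MultiScaleFeatureUnitSketch()(torch.rand(1, 1, 128, 128))  # (1, 64, 128, 128)
```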
In one embodiment, the second gain submodel and the third gain submodel may each include one or more multi-scale feature extraction units. The bright channel information is a bright channel image, and the dark channel information is a dark channel image. Correspondingly, bright channel features can be extracted from the bright channel image through a multi-scale feature extraction unit in the second gain sub-model; and extracting dark channel features from the dark channel image by a multi-scale feature extraction unit in the third gain submodel.
Therefore, the bright channel features and the dark channel features of different scales can be comprehensively and fully extracted through the multi-scale feature extraction unit, and the richness of the bright channel features and the dark channel features is improved.
Fig. 6 shows that the second gain submodel and the third gain submodel each include two multi-scale feature extraction units, which are not limited by the present disclosure. The continuous feature extraction is performed through the multi-scale feature extraction units, so that the model can learn deeper features, and the feature quality is improved.
In addition to the multi-scale feature extraction unit, a feature mapping layer may be disposed in each of the first gain sub-model, the second gain sub-model, and the third gain sub-model, and is used for mapping a basic feature or a prior feature (including a bright channel feature or a dark channel feature) to a luminance gain space. Generally, the feature mapping layer may perform a function of dimension conversion, for example, if the basic feature or the prior feature is a multi-channel feature, and the luminance gain data is single-channel data, the multi-channel feature may be converted into the single-channel data through the feature mapping layer, so as to implement the dimension matching. Illustratively, the feature mapping layer may be a 3 × 3 × 1 convolutional layer.
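As a small hedged sketch of such a feature mapping layer (the dimensions follow the 3 × 3 × 1 example above; the variable names are illustrative):

```python
import torch
import torch.nn as nn

feature_mapping = nn.Conv2d(in_channels=64, out_channels=1, kernel_size=3, padding=1)  # 3x3x1 mapping layer
gain_component = feature_mapping(torch.rand(1, 64, 128, 128))  # 64-channel features -> single-channel gain data
print(gain_component.shape)  # torch.Size([1, 1, 128, 128])
```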
It should be understood that in the luminance gain model, the multi-scale feature extraction unit may be disposed at different positions to extract different features. The parameters of the multi-scale feature extraction units at different positions are generally different, which is related to the training result of the model, but of course, may be the same, and this disclosure does not limit this. Similarly, the parameters of the convolution units at different positions in the luminance gain model may be the same or different, and the parameters of the feature mapping layers at different positions may be the same or different.
The brightness gain model and the processing thereof are explained above. The luminance gain data is output by processing of the luminance gain model.
With continued reference to fig. 2, in step S240, the pixel value mapping is performed on the image to be processed based on the luminance gain data, so as to obtain a high dynamic range image corresponding to the image to be processed.
The brightness gain data may include a brightness gain coefficient for each pixel point in the image to be processed (this coefficient is generally a linear coefficient), and may also include a brightness gain index for the whole image to be processed. Pixel value mapping refers to mapping the original pixel values in the image to be processed to new pixel values through a linear function, power function or similar relationship, thereby converting the image to be processed into a high dynamic range image. For example, the pixel values of the image to be processed may be multiplied by the linear coefficients in the luminance gain data to complete the pixel value mapping.
In one embodiment, the image to be processed is an RGB image; the brightness gain data comprises a brightness gain coefficient of each pixel point in the image to be processed. The pixel value mapping of the image to be processed based on the brightness gain data to obtain the high dynamic range image corresponding to the image to be processed may include the following steps:
and based on the brightness gain coefficient of each pixel point, mapping the pixel values of all channels of each pixel point to the same degree, and forming a high dynamic range image by the pixel values of all channels mapped by each pixel point.
That is, for a given pixel, the luminance gain coefficients of the R, G, B channels are the same. This is because, when the luminance gain is performed, adjusting only the luminance component changes the ratio between the luminance component and the chrominance components, which easily causes color distortion and deviation from the real scene color. When two pixel values (R1, G1, B1) and (R2, G2, B2) in RGB space are proportional, the image chromaticity is unchanged, i.e., the following relationship is satisfied:

R1/R2 = G1/G2 = B1/B2 = λ

where λ may be considered as the luminance gain coefficient. Thus, the pixel value mapping may satisfy the following relationship:

R_HDR / R_LDR = G_HDR / G_LDR = B_HDR / B_LDR = L_HDR / L_LDR = λ (6)

where R_HDR, G_HDR, B_HDR denote the R, G, B channel pixel values in the mapped high dynamic range image, R_LDR, G_LDR, B_LDR denote the R, G, B channel pixel values in the image to be processed before mapping, and L_HDR and L_LDR denote the luminance of the high dynamic range image and of the image to be processed, respectively. L_HDR and L_LDR do not need to be obtained explicitly: the luminance gain model outputs the luminance gain data, which includes the luminance gain coefficient λ of each pixel point, so the mapped pixel value of each channel of each pixel point can be calculated by formula (6), thereby forming the high dynamic range image. This ensures that the gain applied to the different color channels is the same, so the chromaticity of the image is not changed.
In the pixel value mapping process shown in formula (6), the pixel values before and after mapping are in a linear relationship, so that the original bit width of the image can be maintained, and for example, both the image to be processed and the high dynamic range image can be 8-bit or 10-bit images. Compared with the image to be processed, the data volume of the high dynamic range image is not obviously increased, so that the memory occupation is lower.
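The following NumPy sketch illustrates the linear mapping of formula (6): every channel of a pixel is multiplied by the same per-pixel gain coefficient, so the R:G:B ratio (chromaticity) is preserved; the clipping to an 8-bit range is an illustrative assumption.

```python
import numpy as np

def apply_luminance_gain(image_ldr: np.ndarray, gain: np.ndarray) -> np.ndarray:
    """image_ldr: (H, W, 3) RGB image; gain: (H, W) per-pixel luminance gain coefficients (lambda)."""
    hdr = image_ldr.astype(np.float32) * gain[..., None]   # the same lambda multiplies R, G and B
    return np.clip(hdr, 0, 255).astype(image_ldr.dtype)    # keep the original 8-bit width

image = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)
gain = np.random.uniform(0.8, 2.0, (128, 128)).astype(np.float32)
hdr_image = apply_luminance_gain(image, gain)
```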
In one embodiment, the luminance gain data may include, in addition to the luminance gain coefficient, a luminance gain index, denoted a, and the pixel value mapping may satisfy the following relationship:
the linear and power function superposition relationship is adopted between the pixel values before and after mapping, the image bit width is usually changed, if the image to be processed is an image with 8bit width, the brightness gain index is greater than 1, and the mapped high dynamic range image can be an image with 10bit width, so that the high dynamic range image has a larger dynamic range, and the reality effect of brightness and color is favorably improved.
Fig. 8 shows a schematic flow of an image processing method. First, the image to be processed is an RGB image and can be separated into R, G and B channel images. Then, referring to step S220 above, the luminance component, the bright channel image, the dark channel image and the high-frequency images in three directions are extracted from the image to be processed. Next, referring to step S230, the luminance component and the high-frequency images are processed by the first gain sub-model in the luminance gain model to obtain the first gain component, the bright channel image is processed by the second gain sub-model to obtain the second gain component, the dark channel image is processed by the third gain sub-model to obtain the third gain component, and the first, second and third gain components are combined by the gain combination layer in the luminance gain model to obtain the luminance gain data. Finally, the luminance gain data is multiplied with each channel image (pixel values at the same position being multiplied), which is equivalent to performing the pixel value mapping on each channel image; the mapped channel images together form a high dynamic range image in RGB format.
In one embodiment, the processing of step S240 may be incorporated into the brightness gain model. For example, an output layer may be added to the brightness gain model shown in fig. 6 for performing the pixel value mapping on the image to be processed (or on each channel image of the image to be processed) according to the luminance gain data and outputting the high dynamic range image, which is then taken as the final output of the brightness gain model. In this way, both step S230 and step S240 are performed by the brightness gain model, realizing end-to-end processing and reducing the deployment difficulty of the scheme. Of course, the output layer introduces no new learnable parameters, so whether it is regarded as part of the brightness gain model or as separate from it does not affect the training of the brightness gain model, and the present disclosure does not limit this.
Exemplary embodiments of the present disclosure also provide a method of training a luminance gain model. Referring to fig. 9, the image processing method may further include the following steps S910 to S940:
step S910, acquiring a first sample image and a high dynamic range reference image corresponding to the first sample image;
step S920, inputting the first sample image and the prior information of the first sample image into a brightness gain model to be trained, and outputting sample brightness gain data through the brightness gain model;
step S930, performing pixel value mapping on the first sample image by using the sample brightness gain data to obtain a second sample image;
in step S940, the parameters of the luminance gain model are updated by comparing the first sample image, the second sample image, and the high dynamic range reference image, so as to train the luminance gain model.
The first sample image is a sample image requiring luminance gain, and the high dynamic range reference image is a high-quality real HDR image corresponding to the first sample image, which can be used as the label (ground truth). The first sample image and the high dynamic range reference image form a paired supervised data set. The present disclosure does not limit the origin of the data set. Illustratively, a multi-exposure data set can be obtained, and a low dynamic range image selected for each scene; a square local block with a side length of 64, 128, 256 or the like is cropped from the low dynamic range image centered on a random pixel point, randomly rotated, resized to 128 × 128, and Gaussian noise with a variance of 5-12 is added to form the first sample image. The selected low dynamic range image and images of the same scene at different exposures are fused over multiple frames to form an HDR image; a higher-quality HDR image is taken and, after manual processing and the like, serves as the final high dynamic range reference image.
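A hedged sketch of these sample-construction steps (random crop, random rotation, resize to 128 × 128, additive Gaussian noise with variance in [5, 12]) is given below; OpenCV is used for convenience, and the crop size, rotation by multiples of 90° and interpolation choice are assumptions.

```python
import cv2
import numpy as np

def make_first_sample(ldr_image: np.ndarray, crop_size: int = 256) -> np.ndarray:
    h, w = ldr_image.shape[:2]
    cy = np.random.randint(crop_size // 2, h - crop_size // 2)
    cx = np.random.randint(crop_size // 2, w - crop_size // 2)
    patch = ldr_image[cy - crop_size // 2: cy + crop_size // 2,
                      cx - crop_size // 2: cx + crop_size // 2]          # square local block
    patch = np.rot90(patch, k=np.random.randint(0, 4)).copy()            # random rotation
    patch = cv2.resize(patch, (128, 128), interpolation=cv2.INTER_AREA)  # adjust size to 128 x 128
    var = np.random.uniform(5.0, 12.0)                                   # Gaussian noise, variance 5-12
    noisy = patch.astype(np.float32) + np.random.normal(0.0, np.sqrt(var), patch.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)
```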
The prior information of the first sample image may be the same as the type of the prior information of the image to be processed, and the content of step S220 is referred to in the extraction process, which is not described herein again.
The processing procedure of step S920 is the same as step S230, and the processing procedure of step S930 is the same as step S240. For ease of distinction, the luminance gain data obtained during training is referred to as sample luminance gain data. The second sample image is a high dynamic range image corresponding to the first sample image, and of course, since the luminance gain model is not trained or not sufficiently trained, the accuracy of the sample luminance gain data may be low, resulting in a possible poor quality of the second sample image. Therefore, the first sample image, the second sample image and the high dynamic range reference image can be compared to construct a loss function, the parameter update gradient of the brightness gain model is calculated through the loss function value, and then the parameters are updated, so that the brightness gain model is trained.
In one embodiment, since the high dynamic range reference image is a label, the loss function value, such as L1 loss, L2 loss, etc., may be calculated according to the difference between the high dynamic range reference image and the second sample image, so as to update the parameters of the luminance gain model.
In one embodiment, the first sample image and the second sample image may be compared to ensure that the second sample image is not color-distorted compared to the first sample image, and the loss function value may be calculated according to the degree of color distortion, thereby updating the parameters of the brightness gain model.
In one embodiment, the first sample image, the second sample image, and the high dynamic range reference image are all RGB images. The above updating the parameters of the luminance gain model by comparing the first sample image, the second sample image and the high dynamic range reference image may include the following steps:
representing the first sample image and the second sample image as vectors containing three dimensions, and calculating cosine similarity between the first sample image and the second sample image; the three dimensions comprise an R channel image, a G channel image and a B channel image;
determining a first loss function value according to the structural similarity between the second sample image and the high dynamic range reference image and the cosine similarity between the first sample image and the second sample image;
parameters of the brightness gain model are updated based on the first loss function value.
The training targets of the brightness gain model may include minimizing the difference between the second sample image and the high dynamic range reference image. To preserve the image structure characteristics when performing the brightness gain, the structural similarity (SSIM) can be used, for example in its standard form:

SSIM(X, Y) = [(2·μ_X·μ_Y + C1)(2·σ_XY + C2)] / [(μ_X^2 + μ_Y^2 + C1)(σ_X^2 + σ_Y^2 + C2)] (8)

where μ_X and μ_Y are the means of X and Y, σ_X^2 and σ_Y^2 are their variances, σ_XY is their covariance, and C1 and C2 are small stabilizing constants.
since the RGB color image contains both luminance information and chrominance information, in order to keep the hue unchanged when performing luminance gain, a cosine similarity function can be introduced to prevent the proportion imbalance of the RGB three channels, as follows:
where X is the second sample image, Y is the high dynamic range reference image, and Z is the first sample image. X i And Z i Representing the ith channel component of images X and Z respectively, c representing the total number of image channels, c being equal to 3 in RGB space. By the constraint of the cosine similarity, it can be ensured that the images X and Z share the same hue.
Considering that the numerical range of SSIM and cosine similarity is between (0, 1), a first loss function can be constructed as follows:
Loss_1(X, Y, Z) = 1 - SSIM(X, Y) + k1·[1 - cos-similarity(X, Z)] (10)
where k1 is a weight for controlling the ratio of the two components, and may be set to 0.3, for example.
And updating the parameters of the brightness gain model through the first loss function value, so that the brightness gain model can keep the tone of the image unchanged and avoid color distortion while realizing high-quality brightness gain.
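A hedged PyTorch sketch of this first loss follows; the SSIM implementation is assumed to be supplied externally (any differentiable SSIM function returning a scalar), the per-pixel cosine similarity over the RGB channel dimension is averaged over the image, and k1 = 0.3 follows the example above.

```python
import torch
import torch.nn.functional as F

def first_loss(x_pred, y_ref, z_input, ssim_fn, k1: float = 0.3):
    """x_pred (second sample), y_ref (HDR reference), z_input (first sample): (B, 3, H, W) tensors."""
    ssim_term = 1.0 - ssim_fn(x_pred, y_ref)                            # structure term of formula (10)
    cos = F.cosine_similarity(x_pred, z_input, dim=1, eps=1e-8).mean()  # each pixel treated as an RGB 3-vector
    return ssim_term + k1 * (1.0 - cos)                                 # hue-preserving term weighted by k1
```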
In one embodiment, the image processing method may further include the steps of:
gray value mapping is carried out on the gray image of the first sample image by utilizing the sample brightness gain data to obtain a gray image corresponding to the second sample image;
the above updating the parameters of the luminance gain model by comparing the first sample image, the second sample image and the high dynamic range reference image includes:
determining a first loss function value by comparing the first sample image, the second sample image and the high dynamic range reference image;
determining a second loss function value by comparing the gray level image corresponding to the second sample image with the gray level image corresponding to the high dynamic range reference image;
parameters of the brightness gain model are updated based on the first loss function value and the second loss function value.
The calculation process of the first loss function value may refer to equations (8) to (10).
The grayscale image corresponding to the second sample image may be a grayscale image into which the second sample image is converted (i.e., the luminance component of the second sample image), or may be an image obtained by numerically mapping the grayscale image of the first sample image (i.e., the luminance component of the first sample image) using the sample luminance gain data. The grayscale image corresponding to the high dynamic range reference image may be the luminance component of the high dynamic range reference image. By comparing the grayscale image corresponding to the second sample image with the grayscale image corresponding to the high dynamic range reference image, a second loss function can be constructed to ensure that the former stays close to the latter. Illustratively, the second loss function may be constructed using the structural similarity between the two grayscale images, as follows:
Loss_2(X, Y, Z) = 1 - SSIM(L_X, L_Y) (11)
The parameters of the brightness gain model are updated based on the first and second loss function values; for example, the first and second loss function values may be weighted and combined into a composite loss function value used to update the model parameters. This can improve the training effect and accelerate the training process.
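A hedged sketch of the second loss (formula (11)) and of the weighted combination is shown below; the weight w2 is an assumption, since only the fact that the two loss values are weighted and combined is stated above.

```python
def second_loss(gray_pred, gray_ref, ssim_fn):
    """Formula (11): structural-similarity loss between the two grayscale (luminance) images."""
    return 1.0 - ssim_fn(gray_pred, gray_ref)

def total_loss(loss1, loss2, w2: float = 0.5):
    """Composite loss: weighted combination of the first and second loss function values."""
    return loss1 + w2 * loss2
```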
Fig. 10 shows a schematic diagram of training the luminance gain model. After the first sample image is obtained, the luminance component, bright channel image, dark channel image and high-frequency images are extracted from the first sample image; the extracted information is input into the brightness gain model, which outputs the sample brightness gain data; pixel value mapping is performed on the R, G, B channel images of the first sample image using the sample brightness gain data, and the second sample image is generated from the mapped channel images; the first loss function value is determined by comparing the second sample image with the high dynamic range reference image; numerical mapping is performed on the luminance component of the first sample image using the sample brightness gain data to obtain a grayscale sample image, which can also be regarded as the grayscale image corresponding to the second sample image; the second loss function value is determined by comparing the grayscale sample image with a grayscale reference image (i.e., the grayscale image corresponding to the high dynamic range reference image, which is also its luminance component); a composite loss function value is calculated by weighting the first and second loss function values; and the parameters of the brightness gain model are updated based on the composite loss function value.
The present disclosure does not limit the training hyper-parameters. Illustratively, the brightness gain model may be optimized using stochastic gradient descent with a batch size of 64, with momentum and weight decay set to 0.9 and 0.0001, respectively. Furthermore, the learning rate may be initialized to 0.01 and reduced to one tenth every 30 rounds (epochs).
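These hyper-parameters can be expressed with standard PyTorch optimizer and scheduler classes, as in the hedged sketch below, where the placeholder module stands in for the luminance gain model.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(1, 1, 3, padding=1)  # placeholder for the luminance gain model

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=0.0001)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)  # divide lr by 10 every 30 epochs
# Training loop outline: optimizer.step() per batch (batch size 64), scheduler.step() once per epoch.
```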
The brightness gain model is trained until a predetermined training target is reached, for example the test indicators of the brightness gain model become qualified, the loss function value (the first loss function value or the composite loss function value) converges, or a predetermined number of rounds is reached; training is then considered complete. The trained model can be deployed in an actual application environment, for example to execute the method flow of fig. 2 and accomplish the actual image processing.
Fig. 11 shows an example of processing effects of two images to be processed. As shown in fig. 11, the left side is the image to be processed, the middle is the high dynamic range image processed by the present scheme, and the right side is the high dynamic range reference image (i.e. the expected image). In the image to be processed, the dynamic range is low, and due to the existence of a highlight sky background, the ground part is underexposed, and the details are difficult to see. In the high dynamic range image processed by the scheme, the image contrast is high, the brightness of an under-exposed area is effectively improved while a high-light area is ensured, visitors on a grassland are clearly visible, the high-light sky area is well layered, the visual effect is good, and the high-dynamic range image is very close to a high dynamic range reference image.
Table 2 shows a performance comparison between this scheme and existing inverse tone mapping models. It can be seen that the image processed by this scheme has the largest PU-PSNR (perceptually uniform peak signal-to-noise ratio), PU-SSIM (perceptually uniform structural similarity) and Q_score (Q score), and the smallest NIQE (Natural Image Quality Evaluator), indicating that the image processed by this scheme has rich image details and structure and higher image quality.
TABLE 2
Model name | PU-SSIM | PU-PSNR | Q_score | NIQE |
EUI | 0.714 | 32.545 | 52.502 | 3.390 |
LAN | 0.837 | 35.826 | 57.689 | 3.301 |
KOV | 0.826 | 38.534 | 57.684 | 3.414 |
HDRT | 0.637 | 30.817 | 48.074 | 8.188 |
ExpandNet | 0.828 | 38.232 | 52.419 | 3.019 |
MS-ATCNN | 0.911 | 40.450 | 57.740 | 3.019 |
This scheme | 0.924 | 41.642 | 58.863 | 2.949 |
Exemplary embodiments of the present disclosure also provide an image processing apparatus. Referring to fig. 12, the image processing apparatus 1200 may include:
an image acquisition module 1210 configured to acquire an image to be processed;
a priori information extraction module 1220 configured to extract a priori information from the image to be processed, the priori information including one or more of bright channel information, dark channel information, and high frequency information;
the brightness gain processing module 1230 is configured to extract basic features from the image to be processed by using a pre-trained brightness gain model, extract prior features from the prior information, and obtain brightness gain data by combining the basic features and the prior features;
the pixel value mapping module 1240 is configured to perform pixel value mapping on the image to be processed based on the brightness gain data to obtain a high dynamic range image corresponding to the image to be processed.
In one embodiment, the a priori information includes bright channel information, dark channel information, high frequency information; the prior characteristics comprise bright channel characteristics, dark channel characteristics and high-frequency characteristics.
In one embodiment, the luminance gain model includes a first gain sub-model, a second gain sub-model, a third gain sub-model, and a gain combination layer; the above-mentioned luminance gain data that is obtained by extracting the basic feature to the image to be processed by using the pre-trained luminance gain model, extracting the prior feature from the prior information, and combining the basic feature and the prior feature includes:
extracting basic features of an image to be processed through a first gain sub-model, extracting high-frequency features of high-frequency information, fusing the basic features and the high-frequency features, and mapping the fused basic features and the fused high-frequency features to a brightness gain space to obtain a first gain component;
extracting bright channel characteristics from the bright channel information through a second gain submodel, and mapping the bright channel characteristics to a brightness gain space to obtain a second gain component;
extracting dark channel characteristics from the dark channel information through a third gain submodel, and mapping the dark channel characteristics to a brightness gain space to obtain a third gain component;
and combining the first gain component, the second gain component and the third gain component through the gain combination layer to obtain brightness gain data.
In one embodiment, the high frequency information includes high frequency information in at least two directions; the extracting basic features of the image to be processed through the first gain sub-model, extracting high-frequency features of the high-frequency information, and mapping the basic features and the high-frequency features to the brightness gain space after being fused to obtain a first gain component includes:
extracting basic features from the image to be processed through a first gain sub-model, extracting corresponding high-frequency features from the high-frequency features in each direction, sequentially fusing the basic features and the high-frequency features in each direction, and mapping the fused features to a brightness gain space to obtain a first gain component.
In one embodiment, the first gain submodel includes one or more multi-scale feature extraction units; the multi-scale feature extraction unit comprises a pixel feature branch, a local feature branch, a global feature branch and a feature fusion layer; the pixel characteristic branch comprises a non-expansion convolution layer, the local characteristic branch comprises an expansion convolution layer with a first expansion coefficient, and the global characteristic branch comprises an expansion convolution layer with a second expansion coefficient; the first coefficient of expansion is less than the second coefficient of expansion; the extracting of the basic features of the image to be processed through the first gain submodel includes:
extracting pixel characteristics of the image to be processed through the pixel characteristic branch;
extracting local features of the image to be processed through the local feature branches;
extracting global features of the image to be processed through the global feature branch;
and fusing the pixel characteristics, the local characteristics and the global characteristics of the image to be processed through the characteristic fusion layer to obtain the basic characteristics of the image to be processed.
In one embodiment, the local feature branches further comprise an unexpanded convolutional layer; the above extracting the local features of the image to be processed through the local feature branch includes:
sequentially carrying out expansion convolution without expansion convolution and expansion convolution of a first expansion coefficient on the image to be processed through the local characteristic branch to obtain the local characteristic of the image to be processed;
the global feature branch further comprises an expansion-free convolutional layer and an expansion convolutional layer with a first expansion coefficient; the above extracting global features of the image to be processed through the global feature branch includes:
and sequentially carrying out expansion-free convolution, expansion convolution of the first expansion coefficient and expansion convolution of the second expansion coefficient on the image to be processed through the global feature branch to obtain a third feature image, and weighting each channel in the third feature image by using the global average pooling value of each channel in the third feature image to obtain the global feature of the image to be processed.
In one embodiment, the second gain submodel and the third gain submodel each include one or more multi-scale feature extraction units; the bright channel information is a bright channel image, and the dark channel information is a dark channel image; the above-mentioned bright passageway characteristic of bright passageway information extraction through the second gain submodel includes:
extracting bright channel characteristics from the bright channel image through a multi-scale characteristic extraction unit in the second gain submodel;
the above dark channel feature extraction for the dark channel information by the third gain submodel includes:
and extracting dark channel features from the dark channel image by a multi-scale feature extraction unit in the third gain submodel.
In an embodiment, the extracting the prior information from the image to be processed includes:
traversing each pixel point in the image to be processed, and respectively taking the maximum value and the minimum value in each channel value of each pixel point in a local area taking the pixel point as the center as the bright channel value and the dark channel value of the pixel point to obtain the bright channel information and the dark channel information of the image to be processed.
In one embodiment, the high frequency information includes horizontal direction high frequency information, vertical direction high frequency information, and diagonal direction high frequency information; the extracting of the prior information from the image to be processed includes:
acquiring a pre-configured mean value basis function, a pre-configured horizontal difference value basis function and a pre-configured vertical difference value basis function;
performing wavelet transformation on the image to be processed by using the mean basis function and the horizontal difference basis function to obtain high-frequency information in the horizontal direction;
performing wavelet transformation on the image to be processed by using the mean value basis function and the vertical difference value basis function to obtain high-frequency information in the vertical direction;
and performing wavelet transformation on the image to be processed by using the horizontal difference value basis function and the vertical difference value basis function to obtain high-frequency information in the diagonal direction.
In one embodiment, the image processing apparatus 1200 may further include a model training module configured to:
acquiring a first sample image and a high dynamic range reference image corresponding to the first sample image;
inputting the first sample image into a brightness gain model to be trained, and outputting sample brightness gain data through the brightness gain model;
performing pixel value mapping on the first sample image by using the sample brightness gain data to obtain a second sample image;
and updating parameters of the brightness gain model by comparing the first sample image, the second sample image and the high dynamic range reference image so as to train the brightness gain model.
In one embodiment, the first sample image, the second sample image, and the high dynamic range reference image are RGB images; the above updating the parameters of the luminance gain model by comparing the first sample image, the second sample image and the high dynamic range reference image includes:
representing the first sample image and the second sample image as vectors containing three dimensions, and calculating cosine similarity between the first sample image and the second sample image; the three dimensions comprise an R channel image, a G channel image and a B channel image;
determining a first loss function value according to the structural similarity between the second sample image and the high dynamic range reference image and the cosine similarity between the first sample image and the second sample image;
parameters of the brightness gain model are updated based on the first loss function value.
In one embodiment, the model training module is further configured to:
gray value mapping is carried out on the gray image of the first sample image by utilizing the sample brightness gain data to obtain a gray image corresponding to the second sample image;
the updating of the parameter of the luminance gain model by comparing the first sample image, the second sample image and the high dynamic range reference image includes:
determining a first loss function value by comparing the first sample image, the second sample image and the high dynamic range reference image;
determining a second loss function value by comparing the grayscale image corresponding to the second sample image with the grayscale image corresponding to the high dynamic range reference image;
parameters of the brightness gain model are updated based on the first loss function value and the second loss function value.
In one embodiment, the image to be processed is an RGB image; the brightness gain data comprises a brightness gain coefficient of each pixel point in the image to be processed; the above mapping the pixel value of the image to be processed based on the brightness gain data to obtain the high dynamic range image corresponding to the image to be processed includes:
and based on the brightness gain coefficient of each pixel point, mapping the pixel values of all channels of each pixel point to the same degree, and forming a high dynamic range image by the pixel values of all channels mapped by each pixel point.
The specific details of each part in the above device have been described in detail in the method part embodiments, and details that are not disclosed may be referred to in the method part embodiments, and thus are not described again.
Exemplary embodiments of the present disclosure also provide a computer-readable storage medium, which may be implemented in the form of a program product, including program code for causing an electronic device to perform the steps according to various exemplary embodiments of the present disclosure described in the above-mentioned "exemplary method" section of this specification, when the program product is run on the electronic device. In an alternative embodiment, the program product may be embodied as a portable compact disc read only memory (CD-ROM) and include program code, and may be run on an electronic device, such as a personal computer. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In situations involving remote computing devices, the remote computing devices may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to external computing devices (e.g., through the internet using an internet service provider).
Exemplary embodiments of the present disclosure also provide an electronic device, such as the terminal 110 or the server 120 described above. The electronic device may include a processor and a memory. The memory stores executable instructions of the processor, such as may be program code. The processor executes the executable instructions to perform the image processing method in the present exemplary embodiment.
The following takes the mobile terminal 1300 in fig. 13 as an example, and exemplifies the configuration of the electronic device. It will be appreciated by those skilled in the art that the configuration of figure 13 can also be applied to fixed type devices, in addition to components specifically intended for mobile purposes.
As shown in fig. 13, the mobile terminal 1300 may specifically include: a processor 1301, a memory 1302, a bus 1303, a mobile communication module 1304, an antenna 1, a wireless communication module 1305, an antenna 2, a display screen 1306, a camera module 1307, an audio module 1308, a power module 1309, and a sensor module 1310.
The memory 1302 may be used to store computer-executable program code, which includes instructions. The processor 1301 executes various functional applications and data processing of the mobile terminal 1300 by executing instructions stored in the memory 1302. The memory 1302 may also store application data, such as files for storing images, videos, and the like.
The communication function of the mobile terminal 1300 may be implemented by the mobile communication module 1304, the antenna 1, the wireless communication module 1305, the antenna 2, the modem processor, the baseband processor, and the like. The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. The mobile communication module 1304 may provide mobile communication solutions such as 3G, 4G, 5G, etc. applied to the mobile terminal 1300. The wireless communication module 1305 may provide a wireless communication solution such as wireless lan, bluetooth, near field communication, etc. applied to the mobile terminal 1300.
The display 1306 is used to implement display functions such as displaying user interfaces, images, videos, and the like.
The camera module 1307 is used to implement a shooting function. In one embodiment, an ISP may be provided in the camera module 1307, and the image to be processed is output by the ISP.
The audio module 1308 is used for implementing audio functions, such as playing audio, collecting voice, and the like.
The power module 1309 is used to implement power management functions, such as charging a battery, supplying power to a device, monitoring a battery status, and the like.
The sensor module 1310 may include one or more sensors for implementing corresponding sensing functions.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit, according to exemplary embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
As will be appreciated by one skilled in the art, aspects of the present disclosure may be embodied as a system, method or program product. Accordingly, various aspects of the present disclosure may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," module "or" system. Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements that have been described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is to be limited only by the terms of the appended claims.
Claims (16)
1. An image processing method, comprising:
acquiring an image to be processed;
extracting prior information from the image to be processed, wherein the prior information comprises one or more of bright channel information, dark channel information and high-frequency information;
extracting basic features of the image to be processed by using a pre-trained brightness gain model, extracting prior features of the prior information, and obtaining brightness gain data by combining the basic features and the prior features;
and mapping the pixel value of the image to be processed based on the brightness gain data to obtain a high dynamic range image corresponding to the image to be processed.
2. The method of claim 1, wherein the a priori information comprises bright channel information, dark channel information, high frequency information; the prior characteristics comprise bright channel characteristics, dark channel characteristics and high-frequency characteristics.
3. The method of claim 2, wherein the brightness gain model comprises a first gain sub-model, a second gain sub-model, a third gain sub-model, and a gain combination layer;
the method for extracting basic features of the image to be processed by using a pre-trained brightness gain model, extracting prior features of the prior information, and obtaining brightness gain data by combining the basic features and the prior features comprises the following steps:
extracting basic features from the image to be processed through the first gain sub-model, extracting high-frequency features from the high-frequency information, fusing the basic features and the high-frequency features, and mapping the fused basic features and the fused high-frequency features to a brightness gain space to obtain a first gain component;
extracting bright channel characteristics from the bright channel information through the second gain submodel, and mapping the bright channel characteristics to the brightness gain space to obtain a second gain component;
extracting dark channel characteristics from the dark channel information through the third gain submodel, and mapping the dark channel characteristics to the brightness gain space to obtain a third gain component;
and combining the first gain component, the second gain component and the third gain component through the gain combination layer to obtain the brightness gain data.
4. The method according to claim 3, wherein the high frequency information comprises high frequency information in at least two directions;
extracting basic features from the image to be processed through the first gain sub-model, extracting high-frequency features from the high-frequency information, fusing the basic features and the high-frequency features, and mapping the fused basic features and the fused high-frequency features to a brightness gain space to obtain a first gain component, wherein the method comprises the following steps:
extracting basic features from the image to be processed through the first gain sub-model, extracting corresponding high-frequency features from the high-frequency features in each direction, sequentially fusing the basic features and the high-frequency features in each direction, and mapping the fused features to the brightness gain space to obtain the first gain component.
5. The method of claim 3, wherein the first gain submodel includes one or more multi-scale feature extraction units; the multi-scale feature extraction unit comprises a pixel feature branch, a local feature branch, a global feature branch and a feature fusion layer; the pixel feature branch comprises a non-expansion convolutional layer, the local feature branch comprises an expansion convolutional layer with a first expansion coefficient, and the global feature branch comprises an expansion convolutional layer with a second expansion coefficient; the first coefficient of expansion is less than the second coefficient of expansion;
the extracting basic features from the image to be processed through the first gain sub-model comprises:
extracting pixel features of the image to be processed through the pixel feature branch;
extracting local features of the image to be processed through the local feature branch;
extracting global features of the image to be processed through the global feature branch;
and fusing the pixel features, the local features and the global features of the image to be processed through the feature fusion layer to obtain the basic features of the image to be processed.
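The three-branch multi-scale feature extraction unit of claim 5 could be sketched as follows. PyTorch is assumed, and the channel counts and concrete expansion (dilation) coefficients are illustrative; the claim only requires the first coefficient to be smaller than the second.

```python
import torch
import torch.nn as nn

class MultiScaleFeatureExtraction(nn.Module):
    """Sketch of the multi-scale feature extraction unit in claim 5."""

    def __init__(self, in_ch: int = 3, out_ch: int = 16,
                 first_dilation: int = 2, second_dilation: int = 4):
        super().__init__()
        # Pixel feature branch: non-expansion (non-dilated) convolution.
        self.pixel_branch = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        # Local feature branch: expansion convolution with the first (smaller) coefficient.
        self.local_branch = nn.Conv2d(in_ch, out_ch, 3, padding=first_dilation,
                                      dilation=first_dilation)
        # Global feature branch: expansion convolution with the second (larger) coefficient.
        self.global_branch = nn.Conv2d(in_ch, out_ch, 3, padding=second_dilation,
                                       dilation=second_dilation)
        # Feature fusion layer: merge the three branches into the basic features.
        self.fuse = nn.Conv2d(3 * out_ch, out_ch, 1)

    def forward(self, x):
        feats = torch.cat([self.pixel_branch(x),
                           self.local_branch(x),
                           self.global_branch(x)], dim=1)
        return self.fuse(feats)
```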
6. The method of claim 5, wherein the local feature branch further comprises a non-expansion convolutional layer; and the extracting local features of the image to be processed through the local feature branch comprises:
sequentially performing non-expansion convolution and expansion convolution with the first expansion coefficient on the image to be processed through the local feature branch to obtain the local features of the image to be processed;
the global feature branch further comprises a non-expansion convolutional layer and an expansion convolutional layer with the first expansion coefficient; and the extracting global features of the image to be processed through the global feature branch comprises:
sequentially performing non-expansion convolution, expansion convolution with the first expansion coefficient and expansion convolution with the second expansion coefficient on the image to be processed through the global feature branch to obtain a third feature image, and weighting each channel in the third feature image by the global average pooling value of that channel to obtain the global features of the image to be processed.
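The extended global feature branch of claim 6, including the channel re-weighting by global average pooling, might look like the following sketch (again with assumed channel counts and dilation coefficients).

```python
import torch
import torch.nn as nn

class GlobalFeatureBranch(nn.Module):
    """Sketch of the global branch in claim 6: stacked convolutions followed by
    weighting each channel with its own global average pooling value."""

    def __init__(self, in_ch: int = 3, out_ch: int = 16,
                 first_dilation: int = 2, second_dilation: int = 4):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),            # non-expansion convolution
            nn.Conv2d(out_ch, out_ch, 3, padding=first_dilation,
                      dilation=first_dilation),                 # first expansion coefficient
            nn.Conv2d(out_ch, out_ch, 3, padding=second_dilation,
                      dilation=second_dilation),                # second expansion coefficient
        )

    def forward(self, x):
        third = self.convs(x)                             # "third feature image"
        weights = third.mean(dim=(2, 3), keepdim=True)    # per-channel global average pooling
        return third * weights                            # channel-wise weighting
```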
7. The method of claim 5, wherein the second gain sub-model and the third gain sub-model each comprise one or more multi-scale feature extraction units; the bright channel information is a bright channel image, and the dark channel information is a dark channel image;
the extracting bright channel features from the bright channel information through the second gain sub-model comprises:
extracting the bright channel features from the bright channel image through a multi-scale feature extraction unit in the second gain sub-model;
the extracting dark channel features from the dark channel information through the third gain sub-model comprises:
and extracting the dark channel features from the dark channel image through a multi-scale feature extraction unit in the third gain sub-model.
8. The method of claim 2, wherein the extracting prior information from the image to be processed comprises:
traversing each pixel point in the image to be processed, and taking the maximum value and the minimum value among the channel values of the pixel points within a local area centered on the pixel point as the bright channel value and the dark channel value of the pixel point, respectively, so as to obtain the bright channel information and the dark channel information of the image to be processed.
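A possible realisation of the bright/dark channel extraction in claim 8, using NumPy and SciPy; the local window size is an assumption, since the claim does not fix it.

```python
import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter

def bright_dark_channels(image: np.ndarray, window: int = 15):
    """Per-pixel bright and dark channel priors (claim 8).

    `image` is an H x W x 3 array. Taking the per-pixel maximum (minimum)
    across channels first and then a spatial max (min) filter over the local
    window gives the same result as taking the max (min) over all channel
    values of all pixels inside the window.
    """
    bright = maximum_filter(image.max(axis=2), size=window)
    dark = minimum_filter(image.min(axis=2), size=window)
    return bright, dark
```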
9. The method according to claim 2, wherein the high-frequency information comprises horizontal-direction high-frequency information, vertical-direction high-frequency information and diagonal-direction high-frequency information; and the extracting prior information from the image to be processed comprises:
acquiring a pre-configured mean basis function, a pre-configured horizontal difference basis function and a pre-configured vertical difference basis function;
performing wavelet transformation on the image to be processed by using the mean basis function and the horizontal difference basis function to obtain the horizontal-direction high-frequency information;
performing wavelet transformation on the image to be processed by using the mean basis function and the vertical difference basis function to obtain the vertical-direction high-frequency information;
and performing wavelet transformation on the image to be processed by using the horizontal difference basis function and the vertical difference basis function to obtain the diagonal-direction high-frequency information.
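Combining a mean basis function with horizontal/vertical difference basis functions, as in claim 9, corresponds to a single-level 2-D Haar wavelet decomposition. The sketch below uses the PyWavelets package; the choice of the Haar basis and the mapping of the detail sub-bands to the patent's "horizontal"/"vertical" directions are assumptions of this illustration.

```python
import numpy as np
import pywt  # PyWavelets

def directional_high_frequency(gray: np.ndarray):
    """Single-level Haar decomposition of a single-channel image: the Haar
    scaling function plays the role of the mean basis function and the Haar
    wavelet the role of the difference basis function (claim 9)."""
    _, (horizontal, vertical, diagonal) = pywt.dwt2(gray.astype(np.float64), "haar")
    return horizontal, vertical, diagonal
```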
10. The method of claim 1, further comprising:
acquiring a first sample image and a high dynamic range reference image corresponding to the first sample image;
inputting the first sample image and the prior information of the first sample image into the brightness gain model to be trained, and outputting sample brightness gain data through the brightness gain model;
performing pixel value mapping on the first sample image by using the sample brightness gain data to obtain a second sample image;
updating parameters of the brightness gain model by comparing the first sample image, the second sample image, and the high dynamic range reference image to train the brightness gain model.
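The training flow of claim 10, reduced to one illustrative gradient step with PyTorch assumed; the model signature, the element-wise application of the gain and the loss function are placeholders for whatever concrete choices an implementation makes.

```python
def train_step(model, optimizer, loss_fn, first_image, prior_info, hdr_reference):
    """One illustrative training iteration for claim 10: predict gains, map
    the pixel values, then compare the first sample image, the mapped second
    sample image and the high dynamic range reference."""
    optimizer.zero_grad()
    gain = model(first_image, prior_info)    # sample brightness gain data
    second_image = first_image * gain        # pixel value mapping (gain broadcast per pixel)
    loss = loss_fn(first_image, second_image, hdr_reference)
    loss.backward()
    optimizer.step()
    return loss.item()
```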
11. The method of claim 10, wherein the first sample image, the second sample image, and the high dynamic range reference image are all RGB images; the updating the parameters of the brightness gain model by comparing the first sample image, the second sample image and the high dynamic range reference image comprises:
representing the first sample image and the second sample image as vectors containing three dimensions, and calculating cosine similarity between the first sample image and the second sample image; the three dimensions comprise an R channel image, a G channel image and a B channel image;
determining a first loss function value according to the structural similarity between the second sample image and the high dynamic range reference image and the cosine similarity between the first sample image and the second sample image;
updating parameters of the brightness gain model based on the first loss function value.
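One way the first loss function of claim 11 could be assembled: cosine similarity between the per-pixel RGB vectors of the first and second sample images, plus a structural-similarity term against the reference. The weighting and the specific SSIM implementation are assumptions.

```python
import torch.nn.functional as F

def first_loss(first_image, second_image, hdr_reference, ssim_fn, alpha=1.0):
    """Illustrative first loss for claim 11. `ssim_fn` stands for any
    structural-similarity implementation (e.g. from torchmetrics); `alpha`
    and the exact combination of the two terms are assumptions."""
    # Cosine similarity between per-pixel RGB vectors: dim=1 is the channel
    # axis of an N x 3 x H x W tensor.
    colour_similarity = F.cosine_similarity(first_image, second_image, dim=1).mean()
    structural_similarity = ssim_fn(second_image, hdr_reference)
    # Higher similarity should mean lower loss.
    return (1.0 - structural_similarity) + alpha * (1.0 - colour_similarity)
```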
12. The method of claim 10, further comprising:
performing gray value mapping on the gray image of the first sample image by using the sample brightness gain data to obtain a gray image corresponding to the second sample image;
the updating the parameters of the brightness gain model by comparing the first sample image, the second sample image, and the high dynamic range reference image comprises:
determining a first loss function value by comparing the first sample image, the second sample image, and the high dynamic range reference image;
determining a second loss function value by comparing the grayscale image corresponding to the second sample image with the grayscale image corresponding to the high dynamic range reference image;
updating a parameter of the brightness gain model based on the first loss function value and the second loss function value.
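Claim 12 adds a second loss computed on grayscale images; a minimal sketch follows. The L1 distance and the weight between the two loss terms are assumed here, since the claim only requires that the two grayscale images be compared.

```python
import torch.nn.functional as F

def second_loss(second_gray, reference_gray):
    """Compare the gain-mapped grayscale image with the grayscale image of
    the high dynamic range reference (claim 12); L1 is an assumed choice."""
    return F.l1_loss(second_gray, reference_gray)

def combined_loss(first_loss_value, second_loss_value, beta=0.5):
    # Weighted sum of the two loss terms; the weight is an assumption.
    return first_loss_value + beta * second_loss_value
```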
13. The method according to claim 1, wherein the image to be processed is an RGB image; the brightness gain data comprises a brightness gain coefficient of each pixel point in the image to be processed; and the performing pixel value mapping on the image to be processed based on the brightness gain data to obtain a high dynamic range image corresponding to the image to be processed comprises:
and mapping the pixel values of all channels of each pixel point to the same degree based on the brightness gain coefficient of the pixel point, and forming the high dynamic range image from the mapped pixel values of the channels of the pixel points.
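The equal mapping of all channels in claim 13 amounts to multiplying each pixel's R, G and B values by that pixel's gain coefficient; a small NumPy sketch, where the clipping range is an assumption about the output format.

```python
import numpy as np

def apply_gain(rgb_image: np.ndarray, gain: np.ndarray) -> np.ndarray:
    """Claim 13: map the R, G and B values of every pixel by that pixel's
    brightness gain coefficient to the same degree. `rgb_image` is H x W x 3
    and `gain` is H x W."""
    mapped = rgb_image * gain[..., np.newaxis]   # one gain per pixel, shared by all channels
    return np.clip(mapped, 0.0, 1.0)
```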
14. An image processing apparatus characterized by comprising:
an image acquisition module configured to acquire an image to be processed;
a prior information extraction module configured to extract prior information from the image to be processed, wherein the prior information comprises one or more of bright channel information, dark channel information and high-frequency information;
a brightness gain processing module configured to extract basic features from the image to be processed by using a pre-trained brightness gain model, extract prior features from the prior information, and obtain brightness gain data by combining the basic features and the prior features;
and a pixel value mapping module configured to perform pixel value mapping on the image to be processed based on the brightness gain data to obtain a high dynamic range image corresponding to the image to be processed.
15. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 1 to 13.
16. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method of any of claims 1 to 13 via execution of the executable instructions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211242710.6A CN115619666A (en) | 2022-10-11 | 2022-10-11 | Image processing method, image processing apparatus, storage medium, and electronic device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115619666A true CN115619666A (en) | 2023-01-17 |
Family
ID=84862145
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211242710.6A Pending CN115619666A (en) | 2022-10-11 | 2022-10-11 | Image processing method, image processing apparatus, storage medium, and electronic device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115619666A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117094895A (en) * | 2023-09-05 | 2023-11-21 | 杭州一隅千象科技有限公司 | Image panorama stitching method and system |
CN117094895B (en) * | 2023-09-05 | 2024-03-26 | 杭州一隅千象科技有限公司 | Image panorama stitching method and system |
Similar Documents
Publication | Title
---|---
CN111741211B (en) | Image display method and apparatus
CN113284054B (en) | Image enhancement method and image enhancement device
CN102341825B (en) | Multi-modal tone-mapping of images
CN111915526A (en) | Photographing method based on brightness attention mechanism low-illumination image enhancement algorithm
CN111598776B (en) | Image processing method, image processing device, storage medium and electronic apparatus
CN114119378A (en) | Image fusion method, and training method and device of image fusion model
CN109785252B (en) | Night image enhancement method based on multi-scale residual error dense network
CN112348747A (en) | Image enhancement method, device and storage medium
CN111372006B (en) | High dynamic range imaging method and system for mobile terminal
CN113962859B (en) | Panorama generation method, device, equipment and medium
CN113284055A (en) | Image processing method and device
CN111242860A (en) | Super night scene image generation method and device, electronic equipment and storage medium
CN115984570A (en) | Video denoising method and device, storage medium and electronic device
CN112927162A (en) | Low-illumination image oriented enhancement method and system
CN116547694A (en) | Method and system for deblurring blurred images
CN115619666A (en) | Image processing method, image processing apparatus, storage medium, and electronic device
CN116309116A (en) | Low-dim-light image enhancement method and device based on RAW image
CN116957948A (en) | Image processing method, electronic product and storage medium
CN115311149A (en) | Image denoising method, model, computer-readable storage medium and terminal device
CN115063301A (en) | Video denoising method, video processing method and device
CN111861940A (en) | Image toning enhancement method based on condition continuous adjustment
CN114638764B (en) | Multi-exposure image fusion method and system based on artificial intelligence
WO2023110878A1 (en) | Image processing methods and systems for generating a training dataset for low-light image enhancement using machine learning models
CN115330633A (en) | Image tone mapping method and device, electronic equipment and storage medium
CN115601792A (en) | Cow face image enhancement method
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |