CN115037915B - Video processing method and processing device - Google Patents

Video processing method and processing device

Info

Publication number
CN115037915B
Authority
CN
China
Prior art keywords
processed
image
position information
video frame
mth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110245949.8A
Other languages
Chinese (zh)
Other versions
CN115037915A (en)
Inventor
李�瑞
张俪耀
陆洋
刘蒙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202110245949.8A priority Critical patent/CN115037915B/en
Publication of CN115037915A publication Critical patent/CN115037915A/en
Application granted granted Critical
Publication of CN115037915B publication Critical patent/CN115037915B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N9/00Details of colour television systems
    • H04N9/64Circuits for processing colour signals
    • H04N9/646Circuits for processing colour signals for image enhancement, e.g. vertical detail restoration, cross-colour elimination, contour correction, chrominance trapping filters
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Studio Devices (AREA)

Abstract

The application provides a video processing method and a video processing device. In the technical solution provided by the application, position information of the shooting device at the imaging moment of one or more to-be-processed images in each to-be-processed video frame is obtained; an ideal motion curve of the shooting device is estimated from the position information corresponding to a plurality of to-be-processed video frames; the position information of the shooting device on the ideal motion curve corresponding to the imaging moment of the one or more to-be-processed images in each to-be-processed video frame is taken as the ideal position information of the shooting device for that frame; a transformation matrix capable of correcting the position information at the imaging moments of all to-be-processed images in the frame is then calculated; and finally the target HDR image of each to-be-processed video frame is fused from all its to-be-processed images and the corresponding transformation matrices, so as to generate the target video. The method avoids amplification of ghosting, improves the precision of HDR fusion, and reduces the probability of ghosting.

Description

Video processing method and processing device
Technical Field
The present application relates to the field of digital image processing, and in particular, to a video processing method and a processing apparatus.
Background
With the rapid development of video image technology, the requirements for the video viewing experience are also increasing, and high dynamic range (HDR) video has gradually been applied in film and video special effects. Compared with low dynamic range (LDR) images, HDR video can present a wider brightness range and more colors, enabling better presentation of visual effects.
In one method for obtaining HDR video, the HDR image of each frame is fused from LDR images with different exposure durations, the HDR video is then generated from the HDR images of all frames, and video stabilization is finally performed on the HDR video so that the resulting HDR video is stable.
However, unacceptable floating artifacts often appear in HDR video obtained by this method, which affects the visual experience of users.
Disclosure of Invention
The application provides a video processing method and a processing device, which can avoid amplification of ghosting in the images of the resulting HDR video, improve the precision of HDR fusion, and reduce the probability of ghosting.
In a first aspect, the present application provides a video processing method. The method includes: obtaining a to-be-processed video, where the to-be-processed video includes J to-be-processed video frames, each of the J to-be-processed video frames includes K to-be-processed images, the K to-be-processed images are in one-to-one correspondence with K exposure durations, and J and K are integers greater than 1; acquiring position information of a shooting device of the to-be-processed video at the imaging moment of a first to-be-processed image among the K to-be-processed images included in the mth to-be-processed video frame of the J to-be-processed video frames, where m is an integer taken from 1 to J; estimating a motion curve of the shooting device when shooting the to-be-processed video according to the position information of the shooting device at the imaging moments of the first to-be-processed images respectively corresponding to all of the J to-be-processed video frames; determining the position information, on the motion curve, of the shooting device at the imaging moment of the first to-be-processed image in the mth to-be-processed video frame as ideal position information of the shooting device at the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame, where n is an integer taken from 1 to K; determining, according to the ideal position information of the shooting device at the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame and the position information of the shooting device at that imaging moment, a first transformation matrix from the position information at the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame to the corresponding ideal position information; and performing, according to the first transformation matrix, high dynamic range (HDR) fusion processing on the K to-be-processed images in the mth to-be-processed video frame to obtain a target video frame corresponding to the mth to-be-processed video frame, where the J target video frames corresponding to the J to-be-processed video frames form a target video.
According to the video processing method provided by the application, before HDR fusion is performed, the first transformation matrix between the position information at the imaging moments of all to-be-processed images in each to-be-processed video frame and the ideal position information is calculated; that is, video stabilization is placed before HDR fusion, so that ghosting is not amplified by the stabilization step. Through the first transformation matrix, all to-be-processed images in each to-be-processed video frame can be corrected to the ideal position; in other words, before HDR fusion, all to-be-processed images in each to-be-processed video frame are registered to images shot by the shooting device at the same ideal position. This provides a first, coarse registration of the to-be-processed images, and the subsequent HDR fusion module can then perform further registration and fusion, forming a coarse-to-fine registration between the to-be-processed images. Therefore, the method also improves the precision of HDR fusion and reduces the probability of ghosting.
With reference to the first aspect, in a possible implementation manner, each to-be-processed image includes h rows of pixels, and the method further includes: determining, according to the ideal position information of the shooting device at the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame and the position information of the shooting device at the imaging moment of the ith row of the nth to-be-processed image in the mth to-be-processed video frame, a second transformation matrix from the position information at the imaging moment of the ith row of the nth to-be-processed image in the mth to-be-processed video frame to the corresponding ideal position information, where h is an integer greater than 1, and i is an integer taken from 1 to h. Correspondingly, performing the high dynamic range HDR fusion processing on the K to-be-processed images in the mth to-be-processed video frame to obtain the target video frame corresponding to the mth to-be-processed video frame includes: performing HDR fusion processing on the K to-be-processed images in the mth to-be-processed video frame according to the first transformation matrix and the second transformation matrix corresponding to the ith row of the nth to-be-processed image in the mth to-be-processed video frame, to obtain the target video frame corresponding to the mth to-be-processed video frame, where the J target video frames corresponding to the J to-be-processed video frames form the target video.
The video processing method provided by the application not only calculates the first transformation matrix for correcting each to-be-processed image to the ideal position, but also calculates the second transformation matrix for correcting each row of each to-be-processed image to the ideal position. Thus, while the to-be-processed image as a whole is corrected to the ideal position, each row within the image is also corrected to the ideal position, completing registration between the rows within each to-be-processed image. This improves the accuracy of each to-be-processed image and further improves the degree of registration between the to-be-processed images in each video frame, thereby greatly reducing ghosting during fusion.
With reference to the first aspect, in one possible implementation manner, the acquiring of the position information of the shooting device of the to-be-processed video at the imaging moment of the first to-be-processed image among the K to-be-processed images in the mth to-be-processed video frame of the J to-be-processed video frames includes: obtaining, according to information recorded by a motion sensor and through an interpolation function, sensor information of the shooting device at the imaging moment of the first to-be-processed image among the K to-be-processed images in the mth to-be-processed video frame of the J to-be-processed video frames; and integrating the sensor information at the imaging moment of the first to-be-processed image to obtain the position information at the imaging moment of the first to-be-processed image.
With reference to the first aspect, in one possible implementation manner, the motion sensor includes a gyroscope or an inertial measurement unit.
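The following is a minimal sketch of how gyroscope data could be interpolated and integrated into a camera pose at an imaging instant, as described above. It assumes Python with NumPy, a simple angle-vector pose representation, trapezoidal integration, and placeholder timestamps and sample data; the patent does not prescribe this particular implementation.

```python
import numpy as np

def integrate_gyro(gyro_t, gyro_w):
    """Trapezoidal integration of 3-axis angular rate into accumulated rotation angles."""
    dt = np.diff(gyro_t)
    steps = 0.5 * (gyro_w[1:] + gyro_w[:-1]) * dt[:, None]
    return np.vstack([np.zeros(3), np.cumsum(steps, axis=0)])

def camera_pose_at(t_img, gyro_t, gyro_w):
    """Interpolate the integrated orientation at the imaging instant t_img."""
    angles = integrate_gyro(gyro_t, gyro_w)
    return np.array([np.interp(t_img, gyro_t, angles[:, k]) for k in range(3)])

# placeholder gyroscope log: timestamps (s) and angular rates (rad/s)
gyro_t = np.linspace(0.0, 1.0, 500)
gyro_w = np.random.normal(0.0, 0.02, size=(500, 3))
pose_first_image = camera_pose_at(0.37, gyro_t, gyro_w)  # rotation-angle vector at the imaging moment
```

A production system would typically use quaternions and handle gyroscope bias and drift, which are omitted here for brevity.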
With reference to the first aspect, in one possible implementation manner, determining, according to the ideal position information of the shooting device at the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame and the position information of the shooting device at that imaging moment, the first transformation matrix from the position information at the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame to the corresponding ideal position information includes: calculating a transformation matrix R between the ideal position information of the shooting device at the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame and the position information of the shooting device at that imaging moment; and transforming the matrix R according to the formula T·R·T⁻¹ to obtain the first transformation matrix, where T is a parameter matrix of the shooting device.
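As an illustration of the T·R·T⁻¹ formula above, the following sketch builds a pixel-level correction matrix from a corrective rotation R, assuming that the parameter matrix T is the camera intrinsic matrix; the intrinsics and the rotation here are hypothetical values, not taken from the patent.

```python
import numpy as np
import cv2

def first_transform(R, T):
    """Map pixels of the actually captured image to the ideal-pose image via T·R·T⁻¹."""
    return T @ R @ np.linalg.inv(T)

# hypothetical intrinsics and a small corrective rotation about the y-axis
T = np.array([[1200.0, 0.0, 960.0],
              [0.0, 1200.0, 540.0],
              [0.0, 0.0, 1.0]])
R, _ = cv2.Rodrigues(np.array([0.0, 0.01, 0.0]))  # axis-angle -> 3x3 rotation matrix
H = first_transform(R, T)                         # 3x3 "first transformation matrix"
```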
With reference to the first aspect, in one possible implementation manner, according to a first transformation matrix between actual position information corresponding to imaging time of an nth to-be-processed image in the mth to-be-processed video frame and corresponding ideal position information, performing high dynamic range HDR fusion processing on K to-be-processed images in the mth to-be-processed video frame to obtain a target video frame corresponding to the mth to-be-processed video frame, including: carrying out affine transformation on an nth to-be-processed image in an mth to-be-processed video frame through a corresponding first transformation matrix to obtain an image after affine transformation of the nth to-be-processed image in the mth to-be-processed video frame; inputting the affine transformed image into an HDR fusion module to generate a target video frame corresponding to the mth video frame to be processed; or, inputting the nth to-be-processed image and the corresponding first transformation matrix in the mth to-be-processed video frame to the HDR fusion module at the same time to generate a target video frame corresponding to the mth to-be-processed video frame.
With reference to the first aspect, in one possible implementation manner, each image to be processed is a native RAW image.
With reference to the first aspect, in one possible implementation manner, the K images to be processed include a first exposure image, a second exposure image, and a third exposure image, where the first exposure image, the second exposure image, and the third exposure image are in one-to-one correspondence with a first exposure duration, a second exposure duration, and a third exposure duration, where the first exposure duration is longer than the second exposure duration, the second exposure duration is longer than the third exposure duration, and the second exposure image is the first image to be processed.
In a second aspect, the present application provides a video processing apparatus, including: a to-be-processed video acquisition module, configured to acquire a to-be-processed video, where the to-be-processed video includes J to-be-processed video frames, each of the J to-be-processed video frames includes K to-be-processed images, the K to-be-processed images are in one-to-one correspondence with K exposure durations, and J and K are integers greater than 1; a first to-be-processed image position information acquisition module, configured to acquire position information of a shooting device of the to-be-processed video at the imaging moment of a first to-be-processed image among the K to-be-processed images in the mth to-be-processed video frame of the J to-be-processed video frames, where m is an integer taken from 1 to J; an estimation module, configured to estimate a motion curve of the shooting device when shooting the to-be-processed video according to the position information of the shooting device at the imaging moments of the first to-be-processed images respectively corresponding to all of the J to-be-processed video frames; an ideal position information determining module, configured to determine the position information, on the motion curve, of the shooting device at the imaging moment of the first to-be-processed image in the mth to-be-processed video frame as ideal position information of the shooting device at the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame, where n is an integer taken from 1 to K; a first transformation matrix determining module, configured to determine, according to the ideal position information of the shooting device at the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame and the position information of the shooting device at that imaging moment, a first transformation matrix from the position information at the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame to the corresponding ideal position information; and a fusion module, configured to perform high dynamic range HDR fusion processing on the K to-be-processed images in the mth to-be-processed video frame according to the first transformation matrix, to obtain a target video frame corresponding to the mth to-be-processed video frame, where the J target video frames corresponding to the J to-be-processed video frames form a target video.
With reference to the second aspect, in a possible implementation manner, the image to be processed includes h rows of pixels, and the apparatus further includes: a second transformation matrix determining module, configured to determine a second transformation matrix between position information of imaging time of an ith row of an nth to-be-processed image in the mth to-be-processed video frame and ideal position information of imaging time of an ith row of the nth to-be-processed image in the mth to-be-processed video frame according to ideal position information of imaging time of the nth to-be-processed image in the mth to-be-processed video frame of the photographing device and position information of imaging time of an ith row of the nth to-be-processed image in the mth to-be-processed video frame of the photographing device, where h is an integer greater than 1, i is an integer, and is taken from 1 to h; correspondingly, the fusion module is further configured to: and performing high dynamic range HDR fusion processing on K to-be-processed images in the m to-be-processed video frames according to a first transformation matrix between the position information of the imaging moment of the n to-be-processed images in the m to-be-processed video frames and ideal position information of the imaging moment of the i row of the n to-be-processed images in the m to-be-processed video frames and a second transformation matrix between the position information of the imaging moment of the i row of the n to-be-processed images in the m to-be-processed video frames and ideal position information of the imaging moment of the i row of the n to-be-processed images in the m to-be-processed video frames, so as to obtain target video frames corresponding to the m to-be-processed video frames, wherein J target video frames corresponding to the J to-be-processed video frames form target videos.
With reference to the second aspect, in one possible implementation manner, the first to-be-processed image position information acquisition module is specifically configured to: obtain, according to information recorded by a motion sensor and through an interpolation function, sensor information of the shooting device at the imaging moment of the first to-be-processed image among the K to-be-processed images in the mth to-be-processed video frame of the J to-be-processed video frames; and integrate the sensor information at the imaging moment of the first to-be-processed image to obtain the position information at the imaging moment of the first to-be-processed image.
With reference to the second aspect, in one possible implementation manner, the motion sensor includes a gyroscope or an inertial measurement unit.
With reference to the second aspect, in one possible implementation manner, the first transformation matrix determining module is specifically configured to: calculate a transformation matrix R between the ideal position information of the shooting device at the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame and the position information of the shooting device at that imaging moment; and transform the matrix R according to the formula T·R·T⁻¹ to obtain the first transformation matrix, where T is a parameter matrix of the shooting device.
With reference to the second aspect, in one possible implementation manner, the fusion module is specifically configured to: carrying out affine transformation on an nth to-be-processed image in an mth to-be-processed video frame through a corresponding first transformation matrix to obtain an image after affine transformation of the nth to-be-processed image in the mth to-be-processed video frame; inputting the affine transformed image into an HDR fusion module to generate a target video frame corresponding to the mth video frame to be processed; or, inputting the nth to-be-processed image and the corresponding first transformation matrix in the mth to-be-processed video frame to the HDR fusion module at the same time to generate a target video frame corresponding to the mth to-be-processed video frame.
With reference to the second aspect, in one possible implementation manner, each image to be processed is a RAW image.
With reference to the second aspect, in one possible implementation manner, the K images to be processed include a first exposure image, a second exposure image, and a third exposure image, where the first exposure image, the second exposure image, and the third exposure image are in one-to-one correspondence with a first exposure duration, a second exposure duration, and a third exposure duration, the first exposure duration is longer than the second exposure duration, the second exposure duration is longer than the third exposure duration, and the second exposure image is the first image to be processed.
In a third aspect, the present application provides a video processing apparatus comprising: a memory and a processor; the memory is used for storing program instructions; the processor is configured to invoke program instructions in the memory to perform the video processing method according to the first aspect or any of the possible implementation manners of the first aspect.
In a fourth aspect, the present application provides a chip comprising at least one processor and a communication interface, the communication interface and the at least one processor being interconnected by a line, the at least one processor being configured to execute a computer program or instructions to perform a video processing method as described in the first aspect or any one of the possible implementations thereof.
In a fifth aspect, the present application provides a computer readable medium storing program code for execution by a device, the program code comprising instructions for performing the video processing method of the first aspect or any one of the possible implementations thereof.
In a sixth aspect, the present application provides a computer program product comprising instructions, the computer program product comprising computer program code which, when run on a computer, causes the computer to perform the video processing method according to the first aspect or any one of the possible implementations thereof.
Drawings
FIG. 1 is a schematic diagram of an imaging system according to one embodiment of the present application;
fig. 2 is a schematic structural diagram of a video processing method according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of a video processing method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of obtaining images to be processed with different exposure durations according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an estimated motion curve according to one embodiment of the present application;
FIG. 6 is a schematic flow chart of a video processing method according to another embodiment of the present application;
fig. 7 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a video processing apparatus according to another embodiment of the present application.
Detailed Description
For ease of understanding, the terms relevant to the present application are first described.
1. Dynamic range
Dynamic range in a digital imaging system generally refers to the ratio between the maximum and minimum luminance values in an image. In most application scenarios, the larger the dynamic range of the digital imaging system, the wider the range of illumination intensity the system can detect, and the richer the scene details in the captured image.
In general, the amount of charge that the photosensitive element of an image sensor can store determines the dynamic range of the sensor device. When the image sensor is in a saturated exposure state, it reaches its maximum saturation capacity: no more electrons can be accepted no matter how much the exposure is increased, and the photosensitive element is saturated at its full charge capacity. The minimum exposure corresponds to the noise exposure, which is equivalent to the exposure produced by the image sensor's dark current alone in a completely dark environment; at this point, the current through the photosensitive element is the dark current. The images of a digital imaging system are output in digital form, so the information is ultimately stored digitally. An important component of the system is the analog-to-digital converter (ADC). An important indicator of an ADC is its number of bits: for an 8-bit ADC, the minimum signal it can record is 1 and the maximum is 255, so its output dynamic range is 255.
The dynamic range of real scenes in nature spans a very wide range (about 10⁻⁶ to 10⁶). However, the dynamic range that human eyes can perceive covers only a part of it, and traditional digital images have even more limited brightness and dynamic range information due to limitations of the acquisition method and the image information storage format.
2. High dynamic range imaging techniques
When a picture is taken with a camera or a common photographing device such as a mobile phone, the dynamic range of the obtained picture is far lower than that contained in the real scene. The former is a low dynamic range (LDR) image, generally about 256 levels, while the latter is a high dynamic range (HDR) scene, which can reach about 10⁶ levels. Therefore, when an image is photographed, brighter or darker areas in the real scene will exhibit saturation in the photographed image, i.e., appear fully black or fully white, causing loss of image information.
The high dynamic range imaging technology can solve the problem of the gap between the real scene and the dynamic range of the shot image, better grasp details in the real scene and mainly can be divided into hardware and software. The hardware method is generally implemented by adopting an imaging device with a special sensor or adopting a plurality of imaging devices at the same time, and the dynamic range of the imaging device is obviously improved compared with that of a common camera, but the imaging device cannot be compared with that of a natural scene. These methods are limited by hardware, exposure speed, resolution, etc., are highly demanding for hardware, and are too expensive for most people.
The multi-exposure image fusion technique is the most common software-based method of obtaining HDR images. Without changing the hardware, the same scene is exposed several times with different settings by adjusting the aperture and exposure time of the camera, and the multiple exposures are then fused appropriately to obtain an HDR image that reproduces the dynamic range of the target scene. Exposure fusion algorithms can be divided into two types according to the domain in which they operate: radiance-domain fusion algorithms and image-domain fusion algorithms. The classical fusion algorithm is the radiance-domain algorithm, which first estimates the camera response function (CRF) from information such as aperture and exposure time to recover the true radiance values of all pixels in the imaged scene, i.e., generates an HDR image corresponding to the LDR images obtained by the camera, and then applies tone mapping to nonlinearly map the obtained HDR image so that it can be displayed on common LDR devices. Image-domain exposure fusion directly fuses the pixel values of the images, obtaining an ordinary image that can be displayed directly on LDR devices without recovering the camera response function or the radiance values. Radiance-domain fusion algorithms can restore the dynamic range of a scene more faithfully and have been widely applied in image processing software. However, since CRF estimation is sensitive to image noise and image sequence registration errors, it is not easy to operate. Image-domain fusion algorithms bypass CRF estimation and fuse pixel values directly, so their performance is more stable, the fusion process is simpler, and the computational cost is relatively low.
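As a concrete illustration of image-domain fusion, the sketch below uses OpenCV's Mertens exposure-fusion implementation, which blends differently exposed LDR frames directly in the pixel domain without recovering the camera response function. The synthetic "exposures" are placeholders standing in for real captures; this is one possible image-domain method, not the specific fusion used in this application.

```python
import numpy as np
import cv2

# three synthetic "exposures" of the same scene (placeholders for real LDR captures)
base = np.tile(np.linspace(0, 1, 640, dtype=np.float32), (480, 1))
exposures = [np.dstack([np.clip(base * g, 0, 1)] * 3) for g in (0.5, 1.0, 2.0)]
ldr_8bit = [(img * 255).astype(np.uint8) for img in exposures]

fusion = cv2.createMergeMertens()
fused = fusion.process(ldr_8bit)                   # float32 result, roughly in [0, 1]
cv2.imwrite("fused.png", np.clip(fused * 255, 0, 255).astype(np.uint8))
```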
3. Ghost image
When the HDR image is obtained using a multi-exposure image fusion technique, the target scene or camera that needs to be photographed remains stationary. Once the scene changes during shooting, for example, moving objects break in or camera shake occurs, blurred or semitransparent images, called ghosts, appear in the area where the scene changes in the finally obtained fused image. Since most of outdoor photographed scenes are dynamic scenes, moving objects are hardly avoided, and thus ghosting is very easy to occur.
4. Video image stabilizing technology
With the rapid development of electronic technology, users can shoot video with mobile terminals such as mobile phones, tablet computers, digital cameras, and hand-held video cameras. However, during shooting, factors such as shooting skill and shooting environment may cause the shooting device to shake, so that the captured video has an unstable picture and normal viewing by the user is affected; therefore, the video needs to be stabilized. Video stabilization is a technique that re-aligns and corrects an image sequence acquired by a randomly jittering or randomly moving camera so that it is displayed more smoothly. It eliminates or reduces irregular distortions such as translation, rotation, and scaling between frames of the image sequence, improves picture quality, and makes the video more suitable for processing operations such as target detection, tracking, and recognition in intelligent video analysis.
Currently, video stabilization methods include mechanical, optical, and digital stabilization methods.
Mechanical video stabilization uses motion detected by special sensors (such as gyroscopes and accelerometers) to move the image sensor to compensate for camera motion.
Optical video stabilization is accomplished by moving portions of the lens. This approach does not move the entire camera, but rather uses a movable lens assembly that can variably adjust the path length of the light as it passes through the camera's lens system.
Digital video stabilization does not require special sensors to estimate camera motion and mainly includes three steps: motion estimation, motion smoothing, and jitter correction. Motion estimation estimates the motion information of the video images. Motion smoothing smooths the estimated motion information to obtain a new, smooth motion trajectory of the video images. Jitter correction obtains compensation information for the current video frame from the estimated motion trajectory and the smoothed trajectory, and corrects the current video frame accordingly.
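The following is a minimal sketch of the motion-smoothing step, assuming a one-dimensional per-frame camera trajectory smoothed with a moving average; real stabilizers smooth multi-dimensional trajectories and use more elaborate filters, so the signal and the window radius here are illustrative assumptions only.

```python
import numpy as np

def smooth_trajectory(raw, radius=15):
    """Moving-average smoothing of a per-frame camera trajectory."""
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
    padded = np.pad(raw, radius, mode="edge")
    return np.convolve(padded, kernel, mode="valid")

raw_trajectory = np.cumsum(np.random.normal(0.0, 1.0, 300))  # placeholder camera motion
smoothed = smooth_trajectory(raw_trajectory)
correction_per_frame = smoothed - raw_trajectory              # jitter-correction term per frame
```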
5. Original image
A RAW image is an image generated by a digital camera: light passes through the lens onto the photosensitive element, and the resulting electronic signal carrying the image data is stored directly as an image file without any processing. The RAW image therefore records the most original data obtained by the photosensitive element, rather than an image file generated after encoding and compression.
Because the data in such an image has not yet been processed for printing or editing, and typically covers a relatively wide internal color range, precise adjustments or some simple modifications can still be made before conversion.
From traditional film cameras to today's digital cameras, images, as one of the most important carriers through which people obtain information about the external environment, are also rapidly entering the digital era. An image that is stored and processed digitally is generally referred to as a digital image.
Fig. 1 is a schematic diagram of a digital imaging system according to an embodiment of the present application. As shown in fig. 1, the digital imaging system of the present application may include a lens 102, a photosensor 103, an analog-to-digital-converter (ADC) 104, an image signal processor 105, and a memory 106.
For the digital imaging system shown in fig. 1, the imaging process mainly consists of the following stages: first, the lens 102 focuses the light ray 101 and transmits it to the photosensor 103; the photosensor 103 then converts the information of the light ray 101 into an analog electrical signal, and the ADC converter 104 converts the analog electrical signal into a digital signal. The image after ADC conversion is called a RAW image; at this point the pixel values of the RAW image are essentially linear with the intensity of the ambient light, which is quite close to an HDR image, except that it does not have a sufficiently high dynamic range. After the RAW image is obtained, a series of image signal processing (ISP) 105 operations are still required, finally yielding an LDR image that can be stored in the memory 106 and displayed on a display screen.
The ISP methods may include the following: white balance, the most basic correction, whose purpose is to make white as perceived by human eyes appear accurately as white in the image, based on the gray-world assumption and the white-world assumption; demosaicing, since a typical photosensitive element records only a single channel per pixel, with the information of the three RGB channels arranged in a Bayer pattern, so that the specific values of the corresponding RGB channels are recovered by interpolation; noise suppression, where, after the three RGB channels are completed, the various kinds of noise introduced during camera imaging need to be further suppressed; color space conversion, converting the image information from the sensor's RGB color space to a standard color space (standard red green blue, sRGB), from which it can then be converted to various display-related RGB systems; tone mapping, where, after obtaining an image in the standard sRGB system, the pixel values that were originally linear with the ambient light are compressed or stretched nonlinearly through a tone mapping curve so that the image is displayed more naturally on a display device; and image compression and quantization, compressing the image file into JPEG format and outputting it to obtain the LDR image that can finally be displayed.
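As a minimal sketch of two of the ISP steps listed above, the code below applies gray-world white balance and a simple gamma curve standing in for tone mapping, assuming a demosaiced linear RGB image normalized to [0, 1]. Real ISP pipelines involve many more stages and tuned parameters; the input here is a placeholder.

```python
import numpy as np

def gray_world_white_balance(rgb):
    """Scale each channel so its mean matches the overall mean (gray-world assumption)."""
    means = rgb.reshape(-1, 3).mean(axis=0)
    gains = means.mean() / np.maximum(means, 1e-6)
    return np.clip(rgb * gains, 0.0, 1.0)

def gamma_tone_map(rgb, gamma=1.0 / 2.2):
    """Nonlinearly compress linear pixel values for display (a stand-in for tone mapping)."""
    return np.power(rgb, gamma)

linear_rgb = np.random.rand(1080, 1920, 3)          # placeholder demosaiced linear image
display_rgb = gamma_tone_map(gray_world_white_balance(linear_rgb))
```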
The digital imaging system shown in fig. 1 may also be used to capture video comprising a plurality of video frames. For example, the digital imaging system can shoot LDR images with different exposure time lengths for each video frame, then utilizes a multi-exposure fusion algorithm to fuse the LDR images with different exposure time lengths to obtain an initial HDR image corresponding to each video frame, then can obtain an enhanced HDR image through ISP methods such as denoising, white balance, color correction and the like, and finally carries out video image stabilizing operation on an HDR video composed of a plurality of HDR images corresponding to the plurality of video frames to obtain the HDR video with certain stability.
However, in the above method for obtaining HDR video, for each frame of HDR image to capture most of the information in the scene, it must be assumed that the camera and the photographed scene are stationary. In actual shooting, movement of the camera and of objects in the scene is unavoidable, which causes ghosting in the fused HDR image. If the HDR image of a certain frame contains ghosting, the ghosting may be further amplified in the subsequent video stabilization operation, so that unacceptable floating artifacts appear in the final HDR video and the user's visual experience is affected.
As an example, assume that when an HDR video is generated, the LDR images of different exposure durations include one long-exposure image and one short-exposure image, which are fused into one HDR image with the long-exposure image or the short-exposure image as the reference image. When the whole HDR video sequence is subsequently corrected, each frame is the fused HDR image; if one pixel point in the image corresponds to a pixel value from the short-exposure image while another pixel point corresponds to a value from the long-exposure image, the transformation applied at such a point is increased, and if that point is a ghost point, the ghosting at that point in the image is amplified.
In view of this, the present application proposes a new video processing scheme. In the technical scheme provided by the application, the video image stabilizing operation is put before the HDR fusion. As shown in fig. 2, the image data of the video includes J frames, each frame includes a plurality of exposure images, and then video image stabilization operation is performed, which includes camera motion vector estimation, camera motion curve smoothing and calculation of an image transformation matrix. After the video image stabilizing operation, performing HDR fusion on the plurality of exposure images in each frame to obtain an HDR image. Optionally, after the HDR image is obtained, ISP processing may be further performed on each HDR image, and the specific implementation process is not described herein.
In the technical scheme of the application, one implementation mode of video image stabilizing operation is as follows: acquiring actual camera position information corresponding to a reference image in each video frame in a video; estimating an ideal motion curve of the camera based on actual camera position information corresponding to reference images of all video frames in the video, and acquiring ideal camera position information corresponding to each video frame from the ideal motion curve; and calculating a transformation matrix capable of correcting the position information of the imaging moments of all the images to be processed in the video to be processed according to the actual camera position information of the images with different exposure time lengths in each video frame and the ideal camera position information of the video frame, and performing HDR fusion on the images with different exposure time lengths based on the transformation matrix.
Furthermore, in the technical solution of the application, intra-frame alignment of the video frames can also be performed. One implementation of intra-frame alignment is as follows: for each exposure image in each video frame, a transformation matrix capable of correcting the position information at the imaging moment of each row of the exposure image is calculated from the position information of the camera at the imaging moment of that row and the ideal position information of the camera for the video frame, so that through this transformation matrix each row of each exposure image can be aligned to the image photographed by the camera at the ideal position of the video frame.
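The sketch below illustrates such a per-row correction under the same T·R·T⁻¹ structure used for whole images, assuming T is the camera intrinsic matrix, that row and ideal poses are available as world-to-camera rotation matrices, and hypothetical numeric values; none of these specifics come from the patent itself.

```python
import numpy as np
import cv2

def row_homography(T, R_ideal, R_row):
    """Map pixels of one image row from the row's actual pose to the frame's ideal pose."""
    R = R_ideal @ R_row.T                  # rotation taking the row pose to the ideal pose
    return T @ R @ np.linalg.inv(T)

# hypothetical intrinsics, ideal frame pose, and a slightly different pose for row i
T = np.array([[1200.0, 0.0, 960.0],
              [0.0, 1200.0, 540.0],
              [0.0, 0.0, 1.0]])
R_ideal, _ = cv2.Rodrigues(np.array([0.0, 0.0, 0.0]))
R_row, _ = cv2.Rodrigues(np.array([0.0, 0.002, 0.0]))
H_row = row_homography(T, R_ideal, R_row)   # 3x3 per-row ("second transformation") correction
```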
Fig. 3 is a schematic flowchart of a video processing method according to an embodiment of the present application. As shown in fig. 3, the method of the present embodiment may include S301, S302, S303, S304, S305, and S306. The video processing method may be performed by the digital imaging system shown in fig. 1.
S301, acquiring a video to be processed, wherein the video to be processed comprises J video frames to be processed, each video frame to be processed in the J video frames to be processed comprises K images to be processed, the K images to be processed are in one-to-one correspondence with K exposure time lengths, and J and K are integers larger than 1.
It should be understood that the K to-be-processed images are in one-to-one correspondence with the K exposure durations; that is, to-be-processed images with different exposure durations are different images.
In this embodiment, the to-be-processed video is the video before each of its frames has been fused into an HDR image. The to-be-processed video may include J to-be-processed video frames, and each to-be-processed video frame includes K to-be-processed images with different exposure durations. For example, the to-be-processed video includes 30 to-be-processed video frames, and each to-be-processed video frame includes 3 to-be-processed images with different exposure durations.
In one implementation, K images of different exposure durations may be obtained by HDR capable sensors. It is noted that, the HDR-capable sensor is capable of obtaining an LDR image for HDR fusion, and the specific implementation process thereof may be described with reference to the related art, which is not described herein.
As an example, fig. 4 shows how images of 2 different exposure durations are obtained based on an HDR-capable sensor with alternating long and short exposures, according to one embodiment of the present application. As shown in fig. 4, each frame contains h image rows; when obtaining images of different exposure durations, each row is first given the long exposure and then the short exposure, and when the long and short exposures of the last row are completed, the long-exposure image and the short-exposure image of the frame are finally obtained.
Alternatively, the format of the long-exposure image and the short-exposure image may be Bayer (Bayer) format, which is not limited by the embodiment of the present application.
It is noted that the implementation process of obtaining images with different exposure durations using a sensor with HDR capability may be described with reference to the related art, and will not be described herein.
S302, acquiring position information of imaging time of a first to-be-processed image in K to-be-processed images in an mth to-be-processed video frame in J to-be-processed video frames by a shooting device of the to-be-processed video, wherein m is an integer and is taken from 1 to J.
For example, the capturing device of the video to be processed may be a camera, or other capturing devices capable of capturing the video to be processed, which is not limited in the embodiment of the present application.
In this embodiment, m is taken from 1 to J, that is, the position information of the imaging device of the video to be processed at the imaging time of the first image to be processed of the K images to be processed is acquired in each frame of the video to be processed.
It should be understood that each video frame to be processed in this embodiment includes K images to be processed, and each image to be processed also has its corresponding imaging moment, and the first image to be processed is one of the K images to be processed. For example, each video frame to be processed includes an image of a first exposure period, an image of a second exposure period, and an image of a third exposure period, with the image of the second exposure period as a first image to be processed.
It should also be understood that when capturing K images to be processed in each video frame to be processed, each image to be processed has the corresponding positional information of the capturing device at the imaging time, so that the positional information of the capturing device at the imaging time of the first image to be processed can also be obtained.
S303, estimating a motion curve of the shooting device when shooting the video to be processed according to the position information of the shooting device at the imaging moment of the first image to be processed, wherein the first image to be processed corresponds to all the video frames to be processed in the J video frames to be processed respectively.
In this embodiment, each of the J to-be-processed video frames includes a corresponding first to-be-processed image. After the position information of the shooting device at the imaging moments of the first to-be-processed images of all the to-be-processed video frames is obtained, the motion curve of the shooting device when shooting the to-be-processed video can be estimated from the position information corresponding to all the first to-be-processed images.
It should be appreciated that the motion profile is estimated from the position information of the camera at the imaging instant of the first image to be processed in all the video frames to be processed, and thus the motion profile may be considered as an ideal motion profile of the camera at different imaging instants.
In one implementation manner, position information corresponding to imaging moments of first to-be-processed images of all to-be-processed video frames in the J to-be-processed video frames by the shooting device can be fitted, and a motion curve of the shooting device when shooting the to-be-processed video is obtained.
As an example, as shown in fig. 5, the abscissa represents the imaging moment (denoted Ti) of the first to-be-processed image in each to-be-processed video frame, and the ordinate represents the position information of the shooting device. The black dots in the figure represent the position information of the shooting device corresponding to the imaging moment of the first to-be-processed image in each to-be-processed video frame: for example, the position information of the shooting device corresponding to the imaging moment T1 of the first to-be-processed image in the first to-be-processed video frame is P1, the position information corresponding to the imaging moment T2 of the first to-be-processed image in the second to-be-processed video frame is P2, the position information corresponding to the imaging moment T3 of the first to-be-processed image in the third to-be-processed video frame is P3, and so on, up to the position information PJ corresponding to the imaging moment TJ of the first to-be-processed image in the Jth to-be-processed video frame. Then, by fitting the position information corresponding to the imaging moments of the first to-be-processed images in all the to-be-processed video frames, the motion curve of the shooting device over all the to-be-processed video frames is estimated; that is, the motion curve is estimated by fitting P1, P2, P3, ..., PJ in the figure.
It should be noted that, in the fitting process, the position information of the shooting device corresponding to the first to-be-processed image in each to-be-processed video frame may first be denoised, the position information may then be modeled, and the model parameters may be estimated to obtain the motion curve.
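The following is a minimal sketch of estimating the smooth motion curve by fitting the per-frame camera positions P1..PJ sampled at imaging moments T1..TJ. A low-order polynomial fit per position component is used purely for illustration; the patent does not prescribe a specific fitting or smoothing model, and the sample data are placeholders.

```python
import numpy as np

def fit_motion_curve(times, positions, degree=3):
    """Fit one smooth curve per position component and return an evaluator for ideal positions."""
    coeffs = [np.polyfit(times, positions[:, k], degree)
              for k in range(positions.shape[1])]
    def ideal_position(t):
        return np.array([np.polyval(c, t) for c in coeffs])
    return ideal_position

times = np.arange(30) / 30.0                                         # imaging moments T1..TJ
positions = np.cumsum(np.random.normal(0, 0.01, (30, 3)), axis=0)    # placeholder positions P1..PJ
ideal_position = fit_motion_curve(times, positions)
q_m = ideal_position(times[10])   # ideal position Q on the curve for one frame's imaging moment
```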
S304, determining the position information of the imaging moment of the first to-be-processed image in the mth to-be-processed video frame on the motion curve as ideal position information of the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame, wherein n is an integer and is taken from 1 to K.
In this embodiment, n is an integer and is taken from 1 to K, that is, the position information of the imaging time of the first to-be-processed image in the mth to-be-processed video frame of the photographing device on the motion curve is determined as the ideal position information of the imaging time of the K to-be-processed images of the photographing device in the mth to-be-processed video frame.
When the motion curve of the shooting device has been obtained, each to-be-processed video frame also has, on the motion curve, corresponding position information of the shooting device at the imaging moment of its first to-be-processed image. For example, as shown in fig. 5, the white points in the figure represent the position information, on the motion curve, of the shooting device corresponding to the imaging moment of the first to-be-processed image in each to-be-processed video frame: the position information on the motion curve corresponding to the imaging moment T1 of the first to-be-processed image in the first to-be-processed video frame is Q1, the position information corresponding to the imaging moment T2 of the first to-be-processed image in the second to-be-processed video frame is Q2, the position information corresponding to the imaging moment T3 of the first to-be-processed image in the third to-be-processed video frame is Q3, and so on, up to the position information QJ on the motion curve corresponding to the imaging moment TJ of the first to-be-processed image in the Jth to-be-processed video frame.
In this embodiment, the position information, on the motion curve, of the shooting device at the imaging moment of the first to-be-processed image in the mth to-be-processed video frame is determined as the ideal position information of the shooting device at the imaging moments of the K to-be-processed images in the mth to-be-processed video frame; that is, this is the ideal position information of the shooting device at the imaging moments of all K to-be-processed images in that to-be-processed video frame. For example, the position information Q1, on the motion curve, of the shooting device at the imaging moment T1 of the first to-be-processed image in the 1st to-be-processed video frame is determined as the ideal position information of the shooting device corresponding to the imaging moments of all the to-be-processed images in the first to-be-processed video frame.
S305, determining a first transformation matrix between the position information of the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame and the ideal position information of the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame according to the ideal position information of the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame and the position information of the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame.
As can be seen from S304, the ideal position information of the imaging moment of all the images to be processed in the mth video frame to be processed by the camera is the same, and the ideal position information is the position information of the imaging moment of the first image to be processed in the mth video frame to be processed by the camera on the motion curve.
It should be understood that the imaging moment of each to-be-processed image in each to-be-processed video frame has corresponding position information of the shooting device, and this position information is inevitably affected by shake of the shooting device, so that the shooting device deviates from the ideal position information during shooting. Therefore, in order to correct the position information corresponding to the imaging moment of each to-be-processed image to the position at which the shooting device would shoot under the ideal position information, it is necessary to calculate, for each to-be-processed image in the mth to-be-processed video frame, the first transformation matrix from the position information at its imaging moment to the ideal position information of the mth to-be-processed video frame. This first transformation matrix can align an image shot under shake to the image that would be shot at the ideal position of the shooting device, thereby completing inter-frame alignment.
Here, the calculation of the first transformation matrix between the position information corresponding to each image to be processed and the ideal position information of the image to be processed may be described with reference to related art, which is not described herein again.
S306, performing high dynamic range HDR fusion processing on K to-be-processed images in the m-th to-be-processed video frame according to a first transformation matrix between the position information of the imaging moment of the n-th to-be-processed image in the m-th to-be-processed video frame and the ideal position information of the imaging moment of the n-th to-be-processed image in the m-th to-be-processed video frame, so as to obtain a target video frame corresponding to the m-th to-be-processed video frame, wherein J target video frames corresponding to the J to-be-processed video frames form a target video.
In this embodiment, for each video frame to be processed, after a first transformation matrix between position information corresponding to imaging moments of K images to be processed and corresponding ideal position information is obtained, HDR fusion processing may be performed on the K images to be processed through the K images to be processed and the first transformation matrix corresponding to the K images to be processed, and the K images to be processed are fused into one HDR image, so as to obtain a target video frame corresponding to each video frame to be processed, and further, J target video frames form a target video.
In one implementation manner, for each video frame to be processed, affine transformation can be performed on the corresponding images to be processed by using the K first transformation matrices to obtain affine-transformed versions of the K images to be processed; then, the affine-transformed images are input into an HDR fusion deep learning network to generate a fused HDR image, and the HDR image is used as the target video frame. The specific implementation thereof may be described with reference to the related art and is not repeated here.
In this implementation, for each image to be processed in each video frame to be processed, affine transformation is first performed on the image to be processed by using the corresponding first transformation matrix, so that an image that deviates from the ideal position information due to shake of the photographing device at capture time can be aligned to the ideal position information, and thus a first alignment of the image to be processed is completed.
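Purely as an illustration of this implementation, the following is a minimal sketch in Python; NumPy and OpenCV, the function name, the 3x3 homography convention and the call to a separate fusion network are assumptions of this sketch rather than part of the embodiment. Each of the K images to be processed is warped by its first transformation matrix so that it appears to have been captured at the ideal position, and the warped images are then handed to whatever HDR fusion network is used.

```python
import cv2

def align_to_ideal_position(images, first_matrices):
    """Warp each image to be processed to the ideal camera position.

    images: list of K images (H x W [x C]) of one video frame to be processed.
    first_matrices: list of K 3x3 first transformation matrices, one per image.
    Returns the K aligned images, ready for HDR fusion.
    """
    aligned = []
    for img, H_1 in zip(images, first_matrices):
        h, w = img.shape[:2]
        # Map the image captured at the actual (shaken) position to the image
        # that would have been captured at the ideal position on the curve.
        aligned.append(cv2.warpPerspective(img, H_1, (w, h)))
    return aligned

# Hypothetical usage: warp the K images, then feed them to the HDR fusion network.
# fused = hdr_fusion_network(align_to_ideal_position(images, first_matrices))
```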
In another implementation manner, for each video frame to be processed, K images to be processed and the corresponding first transformation matrix may be input into the HDR fusion network at the same time, to generate a fused HDR image, and take the HDR image as the target video frame. The specific implementation thereof may be described with reference to the related art, and will not be described herein.
In the video processing method provided by the embodiment of the application, an ideal motion curve of the shooting device is first estimated from the position information of the shooting device at the imaging time of the first to-be-processed image in all the to-be-processed video frames, and the position information of the shooting device on the ideal motion curve corresponding to the imaging time of the first to-be-processed image is used as the ideal position information of the shooting device for each to-be-processed video frame. Then the first transformation matrix that corrects the position information at the imaging time of every to-be-processed image in each to-be-processed video frame to the ideal position information is calculated. Finally, the target HDR image of each to-be-processed video frame is fused from all the to-be-processed images in that video frame and the corresponding first transformation matrices, and the target video is generated therefrom.
In the technical scheme of the application, the first transformation matrix between the position information at the imaging times of all the to-be-processed images in each to-be-processed video frame and the ideal position information is calculated before HDR fusion is carried out; that is, video stabilization is placed before HDR fusion, so that amplification of ghosts by video stabilization is avoided. Through the first transformation matrix, all the to-be-processed images in each to-be-processed video frame can be corrected to the ideal position; in other words, before HDR fusion, all the to-be-processed images in each to-be-processed video frame are registered to images shot by the shooting device at the same ideal position. This provides one coarse registration of all the to-be-processed images, and further registration and fusion can then be performed by the subsequent HDR fusion module, forming a coarse-to-fine registration between the to-be-processed images. Therefore, the method also improves the precision of HDR fusion and reduces the probability of ghost occurrence.
As an alternative embodiment, when the image to be processed includes h rows of pixels, S305 may further include: determining a second transformation matrix from the position information of the imaging moment of the ith row of the nth to-be-processed image in the mth to-be-processed video frame to the ideal position information of the imaging moment of the ith row of the nth to-be-processed image in the mth to-be-processed video frame, according to the ideal position information of the shooting device at the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame and the position information of the shooting device at the imaging moment of the ith row of the nth to-be-processed image in the mth to-be-processed video frame, wherein h is an integer greater than 1, and i is an integer taken from 1 to h. Accordingly, S306 includes: performing high dynamic range HDR fusion processing on the K to-be-processed images in the mth to-be-processed video frame according to the first transformation matrix between the position information of the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame and the ideal position information of that imaging moment, and the second transformation matrix between the position information of the imaging moment of the ith row of the nth to-be-processed image in the mth to-be-processed video frame and the ideal position information of that imaging moment, so as to obtain the target video frame corresponding to the mth to-be-processed video frame, wherein the J target video frames corresponding to the J to-be-processed video frames form the target video.
The ideal position information of the imaging time of the nth image to be processed in the mth frame of video to be processed may refer to the related description of the embodiment shown in fig. 3, and will not be described herein again.
It should be understood that, for each image to be processed in each video frame to be processed, the photographing device also has corresponding position information at the imaging time of each row of the image to be processed, and this position information is likewise inevitably affected by the shake of the photographing device, so that each row of the image obtained by the photographing device deviates from the ideal position information. Therefore, in order to correct the position information corresponding to the imaging time of each row of each image to be processed to the image photographed with the photographing device at the same ideal position information, it is also necessary to calculate a second transformation matrix between the position information corresponding to the imaging time of the ith row of each image to be processed in the mth to-be-processed video frame and the ideal position information.
After the second transformation matrices are obtained, high dynamic range HDR fusion processing can be performed on the K to-be-processed images in the mth to-be-processed video frame according to the first transformation matrix between the position information of the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame and the corresponding ideal position information, and according to the second transformation matrix between the position information of the imaging moment of the ith row of the nth to-be-processed image in the mth to-be-processed video frame and the corresponding ideal position information, so as to obtain the target video frame corresponding to the mth to-be-processed video frame.
In this embodiment, when each image to be processed includes h rows, each row corresponds to one second transformation matrix, so the h rows correspond to h second transformation matrices, whereby each row of the image to be processed is corrected to the ideal position.
The specific implementation process of performing the high dynamic range HDR fusion processing on the K to-be-processed images in the mth to-be-processed video frame, according to the first transformation matrix between the position information at the imaging time of the nth to-be-processed image in the mth to-be-processed video frame and the ideal position information, and the second transformation matrix between the position information corresponding to the imaging time of the ith row of the nth to-be-processed image in the mth to-be-processed video frame and the ideal position information, may be described with reference to the related art and will not be repeated herein.
In the technical scheme provided by the embodiment of the application, besides the first transformation matrix for correcting each image to be processed to the ideal position, the second transformation matrix for correcting each row of each image to be processed to the ideal position is also calculated. Therefore, in addition to correcting each image to be processed as a whole to the ideal position, each row of the image to be processed is also corrected to the ideal position, completing the row-wise registration within each image to be processed. This improves the accuracy of each captured image to be processed and, further, the degree of registration between the images to be processed in each video frame, thereby reducing the ghost problem in fusion.
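As a further illustration only (not the related art referred to above), a per-row correction of one image to be processed could be sketched as follows; the 3x3 per-row matrix convention, the sampling direction and the use of OpenCV's remap are assumptions of this sketch.

```python
import numpy as np
import cv2

def correct_rows_to_ideal(img, second_matrices):
    """Warp every row of one image to be processed with its own 3x3 matrix.

    img: H x W [x C] image to be processed.
    second_matrices: h = H matrices, where second_matrices[i] maps the actual
        row-i coordinates to the ideal (shake-free) coordinates.
    """
    h, w = img.shape[:2]
    map_x = np.zeros((h, w), np.float32)
    map_y = np.zeros((h, w), np.float32)
    xs = np.arange(w, dtype=np.float32)
    for i in range(h):
        # For destination row i, sample the source image at the coordinates
        # obtained by applying the inverse of that row's correction matrix.
        inv = np.linalg.inv(second_matrices[i])
        pts = np.stack([xs, np.full(w, i, np.float32), np.ones(w, np.float32)])
        src = inv @ pts
        map_x[i] = src[0] / src[2]
        map_y[i] = src[1] / src[2]
    return cv2.remap(img, map_x, map_y, interpolation=cv2.INTER_LINEAR)
```

Building a remap grid row by row keeps the whole per-row correction as a single resampling pass, so the image is not resampled h separate times.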
As an alternative embodiment, one implementation of S302 includes: according to the information recorded by the motion sensor, obtaining sensor information of imaging time of a first to-be-processed image in K to-be-processed images in an mth to-be-processed video frame in J to-be-processed video frames by the shooting device through an interpolation function; and integrating the sensor information of the imaging moment of the first image to be processed to obtain the position information of the imaging moment of the first image to be processed.
Wherein the information recorded by the motion sensor can represent motion information when shooting the video to be processed.
As an example, let y(n) denote the information recorded by the motion sensor at the n-th sampling time, and let t_{j,k} denote the imaging time corresponding to the image imaging center of the image with the k-th exposure duration when the j-th video frame to be processed is shot. The sensor information ŷ(t_{j,k}) of the motion sensor at this imaging time is obtained by interpolation:

ŷ(t_{j,k}) = f(y(1), y(2), …, y(N), t_{j,k})

where N represents the number of points sampled by the motion sensor and f(·) represents the interpolation function. Taking a 100 Hz gyroscope as an example, the gyroscope records the rotation information of the camera 100 times per second, and the information recorded by the gyroscope at those sampling times can be used to interpolate the sensor information at any required time.

Finally, the sensor information is integrated to obtain the pose of the camera corresponding to the imaging time t_{j,k}.
The specific implementation process of integrating the sensor information may be described with reference to related technologies, which are not described herein.
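As a rough illustration of the interpolation-then-integration described above, consider the following sketch; the axis-wise linear interpolation and the simple numerical integration are assumptions of this sketch, since the patent leaves the interpolation function f(·) and the integration scheme to the related art.

```python
import numpy as np

def camera_pose_at(t_img, t_samples, gyro_samples):
    """Estimate the camera pose (accumulated rotation angles) at an imaging instant.

    t_samples: (N,) timestamps of the motion-sensor samples (e.g. a 100 Hz gyroscope).
    gyro_samples: (N, 3) angular velocities recorded at those timestamps.
    t_img: imaging instant of the first image to be processed in one video frame.
    """
    # Keep the samples up to the imaging instant and append the interpolated
    # sensor information at the imaging instant itself (per axis).
    mask = t_samples <= t_img
    t = np.append(t_samples[mask], t_img)
    w = np.vstack([gyro_samples[mask],
                   [np.interp(t_img, t_samples, gyro_samples[:, k]) for k in range(3)]])
    # Integrate the angular velocity over time (trapezoidal rule) to obtain the
    # rotation angles, used here as the position information of the camera.
    return np.array([np.trapz(w[:, k], t) for k in range(3)])
```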
Here, the information recorded by the motion sensor in the embodiment of the present application may include information such as an angular velocity and an angle when the photographing device rotates, which is not limited in the embodiment of the present application.
Optionally, the motion sensor comprises a gyroscope or an inertial measurement unit.
As an alternative embodiment, S305 includes: calculating a transformation matrix R between the ideal position information of the shooting device at the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame and the position information of the shooting device at the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame; and transforming the transformation matrix R according to the formula TRT^-1 to obtain the first transformation matrix, wherein T is a parameter matrix of the shooting device.
Taking fig. 5 as an example, for the first to-be-processed video frame, the position information of the shooting device corresponding to the imaging moment of the first to-be-processed image is P1, the ideal position information of the shooting device corresponding to that imaging moment is Q1, and the transformation matrix between this position information and the ideal position information is R1; the first transformation matrix, which maps the first to-be-processed image to the image that would be shot with the shooting device at the ideal position, is then obtained through TR1T^-1.
It is noted here that the manner of calculating the transformation matrix R and of transforming it according to the formula TRT^-1 may be described with reference to the related art, and is not described in detail herein.
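To make the formula concrete, a minimal sketch follows; representing the position information as 3x3 rotation matrices and taking T to be the camera intrinsic parameter matrix are assumptions of this sketch, since the exact parameterization is left to the related art.

```python
import numpy as np

def first_transformation_matrix(R_actual, R_ideal, T):
    """Build the first transformation matrix T @ R @ inv(T).

    R_actual: 3x3 rotation of the photographing device at the imaging instant.
    R_ideal:  3x3 rotation at the ideal position on the motion curve.
    T: 3x3 parameter (intrinsic) matrix of the photographing device.
    """
    # R maps the actual orientation of the device to the ideal orientation.
    R = R_ideal @ R_actual.T
    # Conjugating by the camera parameter matrix turns the rotation between
    # camera poses into a pixel-to-pixel transformation of the image.
    return T @ R @ np.linalg.inv(T)
```

With this convention, applying the returned matrix to homogeneous pixel coordinates of the nth to-be-processed image maps them to where they would appear had the image been shot at the ideal position.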
Alternatively, each image to be processed in the above-described embodiments may be a RAW image.
In some implementations of this embodiment, the K images to be processed include three images to be processed, which are respectively referred to as a first exposure image, a second exposure image, and a third exposure image, where the first exposure time period is longer than the second exposure time period, the second exposure time period is longer than the third exposure time period, and the second exposure image is used as the first image to be processed. The first exposure image may also be referred to as a long exposure image, the second exposure image may also be referred to as a medium exposure image, and the third exposure image may also be referred to as a short exposure image.
The video processing method according to the embodiment of the present application will be described in detail with reference to fig. 6 by taking an example that K images to be processed include three images to be processed, each of which is a RAW image.
As shown in fig. 6, the method of the present embodiment may include S601, S602, S603, S604, S605, S606, and S607. The video processing method may be performed by the digital imaging system shown in fig. 1.
S601, acquiring a video to be processed, wherein the video to be processed comprises J video frames to be processed, each video frame to be processed in the J video frames to be processed comprises 3 images to be processed, the 3 images to be processed are in one-to-one correspondence with 3 exposure time lengths, and J and K are integers larger than 1.
In this embodiment, each video frame to be processed includes 3 images to be processed, which are respectively a long exposure image, a medium exposure image and a short exposure image, and the detailed implementation process of this step may refer to S301, which is not described herein again.
S602, acquiring gyroscope data.
In this embodiment, the gyroscope data represents information recorded by the motion sensor.
S603, acquiring the camera poses of the 3 images to be processed in each video frame to be processed at the corresponding imaging moments according to the gyroscope data.
In this embodiment, the camera pose may be regarded as the position information of the photographing device; details may be found in the above embodiments and are not repeated here.
That is, the position information of the camera at the imaging moments of the long-exposure image, the medium-exposure image and the short-exposure image in each video frame to be processed is acquired.
In one implementation, the gyroscope data may be interpolated and integrated to obtain the camera poses of the 3 images to be processed in each video frame to be processed at their corresponding imaging moments.
In this step, a specific implementation process of how to obtain the camera pose may be described with reference to the above related embodiments, which are not described herein.
S604, taking the medium-exposure images in all the video frames to be processed as reference frames, and estimating a motion curve of the camera when shooting the video to be processed according to the camera poses at the imaging moments of the medium-exposure images in all the frames.
In this embodiment, the motion curve of the camera when shooting the video to be processed is estimated with reference to the position information of the camera at the imaging moment of the medium-exposure image in each video frame to be processed.
In this step, for estimating the motion curve of the camera when shooting the video to be processed, reference may be made to the description of the embodiment shown in fig. 5, which is not repeated here.
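The embodiment does not fix a particular curve-estimation method; one simple possibility, shown here purely as an assumption, is to low-pass filter the per-frame camera poses of the medium-exposure images with a moving average and treat the filtered values as the points on the motion curve.

```python
import numpy as np

def estimate_motion_curve(mid_exposure_poses, window=9):
    """Smooth the per-frame camera poses of the medium-exposure images.

    mid_exposure_poses: (J, D) array, one pose (e.g. rotation angles) per
        video frame to be processed, taken at its mid-exposure imaging moment.
    window: moving-average length, assumed odd.
    Returns a (J, D) array of ideal poses lying on the smoothed motion curve.
    """
    kernel = np.ones(window) / window
    # Pad with the edge values so that the curve keeps J samples, one per frame.
    padded = np.pad(mid_exposure_poses, ((window // 2, window // 2), (0, 0)), mode='edge')
    return np.stack([np.convolve(padded[:, d], kernel, mode='valid')
                     for d in range(mid_exposure_poses.shape[1])], axis=1)
```

Any other smoothing or curve-fitting scheme could be substituted; the only requirement of the embodiment is that the curve yields one ideal position per video frame to be processed.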
S605, taking the position information, on the motion curve, of the imaging moment of the medium-exposure image in each video frame to be processed as the ideal position information of the camera at the imaging moments of the 3 images to be processed in that video frame.
In this embodiment, for each video frame to be processed, the ideal position information of the camera refers to the position information of the imaging moment of the medium-exposure image on the motion curve; that is, in each video frame to be processed, when the long-exposure image, the medium-exposure image and the short-exposure image are captured, the ideal position information of the camera is the position information of the imaging moment of the medium-exposure image on the motion curve.
The relevant implementation process of this step may refer to the relevant description of S304, which is not described herein.
S606, calculating a first transformation matrix corresponding to the long exposure image, the medium exposure image and the short exposure image in each video frame to be processed respectively, and calculating a second transformation matrix corresponding to each line in each image to be processed in each video frame to be processed.
In this embodiment, in each video frame to be processed, the ideal position information of the camera is the position information, on the motion curve, corresponding to the imaging moment of the medium-exposure image among the images to be processed.
It should be understood that, because the camera inevitably shakes, the position information of the camera when shooting the long-exposure image, the medium-exposure image and the short-exposure image may differ between the three images, whereas each video frame to be processed has one corresponding piece of ideal position information of the camera. Therefore, in the embodiment of the application, for each video frame to be processed, the first transformation matrices corresponding to the long-exposure image, the medium-exposure image and the short-exposure image are calculated from the actual position information and the ideal position information of the camera at the imaging moments of the 3 images to be processed; the first transformation matrix can align an image shot under shake to the image that would be shot at the ideal position of the camera, thereby completing the inter-frame alignment.
It should also be appreciated that, because of the unavoidable jitter of the camera, the position of the camera may also differ between the individual lines when capturing each line of the long-exposure image, the medium-exposure image and the short-exposure image. Therefore, in this embodiment, the second transformation matrix corresponding to each line of each image to be processed in each video frame to be processed is also calculated from the actual position information and the ideal position information of the camera at the imaging moment of that line; the second transformation matrix can align each line of an image shot under shake to the corresponding line shot at the ideal position of the capturing device, thereby completing the intra-frame alignment.
The specific implementation process of calculating the first transformation matrix and the second transformation matrix may be described with reference to the related embodiments, which will not be described herein.
S607, according to the first transformation matrix corresponding to the 3 to-be-processed images in each to-be-processed video frame and the second transformation matrix corresponding to each line of each to-be-processed image in each to-be-processed video frame, performing high dynamic range HDR fusion processing on the 3 to-be-processed images in each to-be-processed video frame to obtain a target video frame corresponding to each to-be-processed video frame, and further, forming a target video by J target video frames.
In this embodiment, each video frame to be processed includes 3 images to be processed, each of which corresponds to one first transformation matrix; in addition, each line of each of the 3 images to be processed corresponds to one second transformation matrix, so that if an image to be processed includes h lines, it corresponds to h second transformation matrices. That is, each image to be processed corresponds to one first transformation matrix and h second transformation matrices.
After the first transformation matrix and the corresponding second transformation matrices of each image to be processed are obtained, the 3 images to be processed in the video frame to be processed can be subjected to HDR fusion processing through the first transformation matrix and the second transformation matrices corresponding to each image to be processed.
The specific implementation process of the HDR fusion processing may refer to S306 or to the related art, and is not repeated here.
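Tying the pieces together, the per-frame processing of S607 could look like the sketch below; it reuses the hypothetical helpers align_to_ideal_position and correct_rows_to_ideal from the earlier sketches, the order of intra-frame and inter-frame alignment is an assumption, and the exposure-weighted averaging at the end is only a stand-in for the HDR fusion network of the embodiment.

```python
import numpy as np

def process_frame(images, exposures, first_matrices, second_matrices):
    """Fuse the 3 images to be processed of one video frame into a target frame.

    images: [long, medium, short] exposure images of the frame.
    exposures: their exposure durations.
    first_matrices: one 3x3 first transformation matrix per image.
    second_matrices: per image, one 3x3 second transformation matrix per row.
    """
    aligned = []
    for img, H_1, row_mats in zip(images, first_matrices, second_matrices):
        img = correct_rows_to_ideal(img, row_mats)      # intra-frame (per-row) alignment
        img = align_to_ideal_position([img], [H_1])[0]  # inter-frame alignment
        aligned.append(img.astype(np.float32))
    # Stand-in fusion: normalize by exposure and average; a real system would
    # use the HDR fusion network described in the embodiment instead.
    radiance = [img / e for img, e in zip(aligned, exposures)]
    return np.mean(radiance, axis=0)
```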
In the video processing method provided by the embodiment of the application, an ideal motion curve of the shooting device is first estimated from the position information of the shooting device at the imaging moments of the medium-exposure images in all the video frames to be processed, and the position information of the shooting device on the ideal motion curve corresponding to the imaging moment of the medium-exposure image is used as the ideal position information for each video frame to be processed. Then, a first transformation matrix for correcting the position information at the imaging moment of every image to be processed to the ideal position information is calculated for each video frame to be processed, together with a second transformation matrix for correcting each line of each image to be processed to the ideal position. Finally, according to the first transformation matrices and the second transformation matrices, all the images to be processed in each video frame to be processed are fused to obtain the target HDR image of that video frame, and the target video is generated therefrom.
Alternatively, the gyroscope in the present embodiment may be replaced with an inertial measurement unit.
Fig. 7 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present application. The video processing apparatus shown in fig. 7 may be used to perform the video processing method described in any of the foregoing embodiments.
As shown in fig. 7, the video processing apparatus 700 of the present embodiment includes: a video to be processed acquisition module 701, a position information acquisition module 702 of a first image to be processed, an estimation module 703, an ideal position information determination module 704, a first transformation matrix determination module 705 and a fusion module 706. The video to be processed acquisition module 701 is configured to acquire a video to be processed.
The first to-be-processed image position information obtaining module 702 is configured to obtain position information of an imaging moment of a first to-be-processed image of K to-be-processed images included in an mth to-be-processed video frame of J to-be-processed video frames by a shooting device of the to-be-processed video, where m is an integer and is taken from 1 to J.
The estimating module 703 is configured to estimate a motion curve when the capturing device captures the video to be processed according to position information of imaging moments of the first image to be processed, where the imaging moments correspond to all the video frames to be processed in the J video frames to be processed.
The ideal position information determining module 704 is configured to determine, as ideal position information of imaging time of the camera at the nth to-be-processed image in the mth to-be-processed video frame, position information of imaging time of the first to-be-processed image in the mth to-be-processed video frame on the motion curve, where n is an integer and is taken from 1 to K.
The first transformation matrix determining module 705 is configured to determine a first transformation matrix between the position information of the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame and the ideal position information of the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame according to the ideal position information of the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame by the photographing device and the position information of the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame.
The fusion module 706 is configured to perform high dynamic range HDR fusion processing on K to-be-processed images in the m-th to-be-processed video frame according to a first transformation matrix between the position information of the imaging time of the n-th to-be-processed image in the m-th to-be-processed video frame and the ideal position information of the imaging time of the n-th to-be-processed image in the m-th to-be-processed video frame, so as to obtain a target video frame corresponding to the m-th to-be-processed video frame, where J target video frames corresponding to the J to-be-processed video frames form a target video.
As an example, the pending video acquisition module 701 may be configured to perform the steps of acquiring a pending video in the video processing method described in any one of fig. 3 or fig. 6. For example, the pending video acquisition module 701 is configured to execute S301.
As another example, the estimation module 703 may be used to perform the step of estimating a motion curve when the photographing device photographs the video to be processed in the video processing method described in any one of fig. 3 or 6. The estimation module 703 is used to perform S303 or S604, for example.
As yet another example, the fusion module 706 may be configured to perform the step of performing HDR fusion processing on the to-be-processed image in each to-be-processed video frame in the video processing method described in any of fig. 3 or 6. For example, the fusion module 706 is used to perform S306 or S607.
In a possible implementation manner, the image to be processed includes h rows of pixels, and the apparatus further includes: a second transformation matrix determining module 707, configured to determine a second transformation matrix between the position information of the imaging moment of the ith row of the nth to-be-processed image in the mth to-be-processed video frame and the ideal position information of the imaging moment of the ith row of the nth to-be-processed image in the mth to-be-processed video frame, according to the ideal position information of the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame and the position information of the imaging moment of the ith row of the nth to-be-processed image in the mth to-be-processed video frame, where h is an integer greater than 1, and i is an integer taken from 1 to h. Accordingly, the fusion module 706 is further configured to: perform high dynamic range HDR fusion processing on the K to-be-processed images in the mth to-be-processed video frame according to the first transformation matrix between the position information of the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame and the ideal position information of that imaging moment, and the second transformation matrix between the position information of the imaging moment of the ith row of the nth to-be-processed image in the mth to-be-processed video frame and the ideal position information of that imaging moment, so as to obtain the target video frame corresponding to the mth to-be-processed video frame, wherein the J target video frames corresponding to the J to-be-processed video frames form the target video.
In one possible implementation manner, the location information obtaining module 702 of the first image to be processed is specifically configured to: according to the information recorded by the motion sensor, obtaining sensor information of imaging time of a first to-be-processed image in K to-be-processed images in an mth to-be-processed video frame of the J to-be-processed video frames by the shooting device through an interpolation function; and integrating the sensor information of the imaging moment of the first image to be processed to obtain the position information of the imaging moment of the first image to be processed.
In one possible implementation, the motion sensor comprises a gyroscope or an inertial measurement unit.
In one possible implementation, the first transformation matrix determining module 705 is specifically configured to: calculate a transformation matrix R between the ideal position information of the shooting device at the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame and the position information of the shooting device at the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame; and transform the transformation matrix R according to the formula TRT^-1 to obtain the first transformation matrix, wherein T is a parameter matrix of the shooting device.
In one possible implementation, the fusion module 706 is specifically configured to: carrying out affine transformation on an nth to-be-processed image in an mth to-be-processed video frame through a corresponding first transformation matrix to obtain an image after affine transformation of the nth to-be-processed image in the mth to-be-processed video frame; inputting the affine transformed image into an HDR fusion module to generate a target video frame corresponding to the mth video frame to be processed; or, inputting the nth to-be-processed image and the corresponding first transformation matrix in the mth to-be-processed video frame to the HDR fusion module at the same time to generate a target video frame corresponding to the mth to-be-processed video frame.
In one possible implementation, each of the images to be processed is a native RAW image.
In one possible implementation manner, the K images to be processed include a first exposure image, a second exposure image and a third exposure image, where the first exposure image, the second exposure image and the third exposure image are in one-to-one correspondence with a first exposure duration, a second exposure duration and a third exposure duration, the first exposure duration is longer than the second exposure duration, the second exposure duration is longer than the third exposure duration, and the second exposure image is the first image to be processed.
Fig. 8 is a schematic structural diagram of a video processing apparatus according to another embodiment of the present application. The apparatus shown in fig. 8 may be used to perform the video processing method described in any of the foregoing embodiments.
As shown in fig. 8, the apparatus 800 of the present embodiment includes: a memory 801, a processor 802, a communication interface 803, and a bus 804. Wherein the memory 801, the processor 802, and the communication interface 803 are communicatively connected to each other through a bus 804.
The memory 801 may be a Read Only Memory (ROM), a static storage device, a dynamic storage device, or a random access memory (random access memory, RAM). The memory 801 may store a program, and the processor 802 is configured to perform the steps of the method shown in fig. 3 when the program stored in the memory 801 is executed by the processor 802.
The processor 802 may employ a general-purpose central processing unit (central processing unit, CPU), microprocessor, application specific integrated circuit (application specific integrated circuit, ASIC), or one or more integrated circuits for executing associated programs to perform the methods of the various embodiments of the present application.
The processor 802 may also be an integrated circuit chip with signal processing capabilities. In implementation, various steps of methods of various embodiments of the application may be performed by integrated logic circuitry in hardware or by instructions in software in processor 802.
The processor 802 may also be a general purpose processor, a digital signal processor (digital signal processing, DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf programmable gate array (field programmable gate array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components. The disclosed methods, steps, and logic blocks in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The steps of the method disclosed in connection with the embodiments of the present application may be embodied directly as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software modules may be located in a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or another storage medium well known in the art. The storage medium is located in the memory 801, and the processor 802 reads the information in the memory 801 and, in combination with its hardware, performs the functions that the units comprised in the video processing apparatus of the present application need to perform; for example, the steps/functions of the embodiments shown in fig. 3 or fig. 6 can be performed.
The communication interface 803 uses a transceiver apparatus such as, but not limited to, a transceiver to enable communication between the apparatus 800 and other devices or communication networks.
Bus 804 may include a path for transferring information between components of apparatus 800 (e.g., memory 801, processor 802, communication interface 803).
It should be understood that the apparatus 800 shown in the embodiment of the present application may be an electronic device, or may be a chip configured in an electronic device.
It is to be appreciated that the processor in embodiments of the application may be a central processing unit (central processing unit, CPU), but may also be other general purpose processors, digital signal processors (digital signal processor, DSP), application specific integrated circuits (application specific integrated circuit, ASIC), off-the-shelf programmable gate arrays (field programmable gate array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
It should also be appreciated that the memory in embodiments of the present application may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an electrically Erasable EPROM (EEPROM), or a flash memory. The volatile memory may be random access memory (random access memory, RAM) which acts as an external cache. By way of example but not limitation, many forms of random access memory (random access memory, RAM) are available, such as Static RAM (SRAM), dynamic Random Access Memory (DRAM), synchronous Dynamic Random Access Memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced Synchronous Dynamic Random Access Memory (ESDRAM), synchronous Link DRAM (SLDRAM), and direct memory bus RAM (DR RAM).
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any other combination. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. When the computer instructions or computer program are loaded or executed on a computer, the processes or functions described in accordance with embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be transmitted from one website site, computer, server, or data center to another website site, computer, server, or data center by wired (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains one or more sets of available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. The semiconductor medium may be a solid state disk.
It should be understood that the term "and/or" is merely an association relationship describing the associated object, and means that three relationships may exist, for example, a and/or B may mean: there are three cases, a alone, a and B together, and B alone, wherein a, B may be singular or plural. In addition, the character "/" herein generally indicates that the associated object is an "or" relationship, but may also indicate an "and/or" relationship, and may be understood by referring to the context.
In the present application, "at least one" means one or more, and "a plurality" means two or more. "at least one of" or the like means any combination of these items, including any combination of single item(s) or plural items(s). For example, at least one (one) of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, wherein a, b, c may be single or plural.
It should be understood that, in various embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a U disk, a mobile hard disk, a read-only memory, a random access memory, a magnetic disk or an optical disk.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (12)

1. A video processing method, comprising:
obtaining a video to be processed, wherein the video to be processed comprises J video frames to be processed, each video frame to be processed in the J video frames to be processed comprises K images to be processed, the K images to be processed are in one-to-one correspondence with K exposure time lengths, and the J and the K are integers larger than 1;
acquiring position information of imaging time of a first to-be-processed image in K to-be-processed images contained in an mth to-be-processed video frame in the J to-be-processed video frames by a shooting device of the to-be-processed video, wherein m is an integer and is taken from 1 to J;
estimating a motion curve of the shooting device when shooting the video to be processed according to the position information of the shooting device at the imaging moment of the first image to be processed, which corresponds to all the video frames to be processed in the J video frames to be processed respectively;
determining the position information of the shooting device at the imaging moment of a first to-be-processed image in the mth to-be-processed video frame on a motion curve as ideal position information of the shooting device at the imaging moment of an nth to-be-processed image in the mth to-be-processed video frame, wherein n is an integer and is taken from 1 to K;
determining a first transformation matrix between the position information of the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame and the ideal position information of the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame according to the ideal position information of the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame by the shooting device and the position information of the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame by the shooting device;
according to a first transformation matrix between the position information of the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame and the ideal position information of the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame, carrying out High Dynamic Range (HDR) fusion processing on K to-be-processed images in the mth to-be-processed video frame to obtain a target video frame corresponding to the mth to-be-processed video frame, wherein J target video frames corresponding to the J to-be-processed video frames form a target video.
2. The method of claim 1, wherein the image to be processed comprises h rows of pixels, the method further comprising:
determining a second transformation matrix from the position information of the imaging moment of the ith row of the nth to-be-processed image in the mth to-be-processed video frame to the ideal position information of the imaging moment of the ith row of the nth to-be-processed image in the mth to-be-processed video frame according to the ideal position information of the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame and the position information of the imaging moment of the ith row of the nth to-be-processed image in the mth to-be-processed video frame, wherein h is an integer greater than 1, i is an integer and is taken from 1 to h;
correspondingly, according to a first transformation matrix between the position information of the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame and the ideal position information of the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame, performing High Dynamic Range (HDR) fusion processing on the K to-be-processed images in the mth to-be-processed video frame to obtain a target video frame corresponding to the mth to-be-processed video frame, wherein J target video frames corresponding to the J to-be-processed video frames form a target video, and the method comprises the following steps:
carrying out High Dynamic Range (HDR) fusion processing on the K to-be-processed images in the mth to-be-processed video frame according to the first transformation matrix between the position information of the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame and the ideal position information of the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame, and the second transformation matrix between the position information of the imaging moment of the ith row of the nth to-be-processed image in the mth to-be-processed video frame and the ideal position information of the imaging moment of the ith row of the nth to-be-processed image in the mth to-be-processed video frame, so as to obtain the target video frame corresponding to the mth to-be-processed video frame, wherein J target video frames corresponding to the J to-be-processed video frames form a target video.
3. The method according to claim 1 or 2, wherein the acquiring the position information of the imaging time of the first to-be-processed image in the K to-be-processed images included in the mth to-be-processed video frame in the J to-be-processed video frames by the capturing device of the to-be-processed video includes:
according to the information recorded by the motion sensor, obtaining sensor information of imaging time of a first to-be-processed image in K to-be-processed images contained in an mth to-be-processed video frame in the J to-be-processed video frames by the shooting device through an interpolation function;
and integrating the sensor information of the imaging moment of the first image to be processed to obtain the position information of the imaging moment of the first image to be processed.
4. A method according to claim 3, wherein the motion sensor comprises a gyroscope or an inertial measurement unit.
5. The method according to any one of claims 1 to 4, wherein determining a first transformation matrix between the positional information of the imaging timing of the nth to-be-processed image in the mth to-be-processed video frame to the ideal positional information of the imaging timing of the nth to-be-processed image in the mth to-be-processed video frame based on the ideal positional information of the imaging timing of the nth to-be-processed image in the mth to-be-processed video frame by the photographing device and the positional information of the imaging timing of the nth to-be-processed image in the mth to-be-processed video frame, includes:
calculating a transformation matrix R between ideal position information of the shooting device at the imaging moment of an nth to-be-processed image in the mth to-be-processed video frame and position information of the shooting device at the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame;
transforming the transformation matrix R according to the formula TRT^-1 to obtain the first transformation matrix, wherein T is a parameter matrix of the shooting device.
6. The method according to any one of claims 1 to 4, wherein performing High Dynamic Range (HDR) fusion processing on the K to-be-processed images in the mth to-be-processed video frame according to a first transformation matrix between actual position information corresponding to the imaging time of the nth to-be-processed image in the mth to-be-processed video frame and corresponding ideal position information to obtain a target video frame corresponding to the mth to-be-processed video frame, includes:
carrying out affine transformation on an nth to-be-processed image in the mth to-be-processed video frame through a corresponding first transformation matrix to obtain an image after affine transformation of the nth to-be-processed image in the mth to-be-processed video frame;
inputting the affine transformed image into an HDR fusion module to generate a target video frame corresponding to the mth video frame to be processed; or,

and simultaneously inputting an nth to-be-processed image in the mth to-be-processed video frame and a corresponding first transformation matrix into an HDR fusion module to generate a target video frame corresponding to the mth to-be-processed video frame.
7. The method of any one of claims 1 to 4, wherein each of the images to be processed is a RAW image.
8. The method according to any one of claims 1 to 4, wherein the K images to be processed include a first exposure image, a second exposure image, and a third exposure image, the first exposure image, the second exposure image, and the third exposure image being in one-to-one correspondence with a first exposure period, a second exposure period, and a third exposure period, the first exposure period being longer than the second exposure period, the second exposure period being longer than the third exposure period, and the second exposure image being the first image to be processed.
9. A video processing apparatus, characterized in that the apparatus comprises functional modules for performing the method according to any of claims 1 to 8.
10. A video processing apparatus, comprising: a memory and a processor;
the memory is used for storing program instructions;
the processor is configured to invoke program instructions in the memory to perform the video processing method of any of claims 1 to 8.
11. A chip comprising at least one processor and a communication interface, the communication interface and the at least one processor being interconnected by a wire, the at least one processor being configured to execute instructions to perform the method of any one of claims 1 to 8.
12. A computer readable medium, characterized in that the computer readable medium stores a program code for computer execution, the program code comprising instructions for performing the method of any of claims 1 to 8.
CN202110245949.8A 2021-03-05 2021-03-05 Video processing method and processing device Active CN115037915B (en)

Publications (2)

Publication Number Publication Date
CN115037915A CN115037915A (en) 2022-09-09
CN115037915B true CN115037915B (en) 2023-11-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant