CN115546042A - Video processing method and related equipment - Google Patents

Video processing method and related equipment

Info

Publication number
CN115546042A
Authority
CN
China
Prior art keywords
frame
image
video stream
original
determining
Prior art date
Legal status
Granted
Application number
CN202210334216.6A
Other languages
Chinese (zh)
Other versions
CN115546042B (en)
Inventor
张田田
王宇
李智琦
王宁
朱聪超
Current Assignee
Honor Device Co Ltd
Original Assignee
Honor Device Co Ltd
Priority date
Filing date
Publication date
Application filed by Honor Device Co Ltd filed Critical Honor Device Co Ltd
Priority to CN202210334216.6A
Publication of CN115546042A
Application granted
Publication of CN115546042B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G06T5/73
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G06T3/60 Rotation of a whole image or part thereof
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras

Abstract

The application provides a video processing method and a related device, relating to the field of video processing. The video processing method includes: acquiring an original video stream, where the original video stream includes multiple frames of original images; determining, from the multiple frames of original images, a reference frame corresponding to an nth frame of original image, where n ≥ 1, n is a positive integer, and the reference frame is clearer than the nth frame of original image; replacing the nth frame of original image with the reference frame to generate a first video stream; and performing electronic image stabilization on the first video stream to generate a target video stream. By combining the replacement of blurred images with relatively clear images and the anti-shake processing, the application improves the inter-frame definition of the anti-shake video while maintaining visual consistency.

Description

Video processing method and related device
Technical Field
The present application relates to the field of video processing, and in particular, to a video processing method and related device.
Background
With the widespread use of electronic devices, recording video with an electronic device has become part of people's daily life. Taking a mobile phone as an example of the electronic device, when people record videos with a mobile phone, hand shake, movement of the phone, and the like cause inconsistent inter-frame definition in the recorded video, that is, motion blur.
To improve the definition of recorded video and reduce the influence of motion blur, various techniques for improving video quality have been developed. For example, when a video is recorded in dark light that would otherwise require a long exposure, the exposure time can be relatively reduced, or an OIS controller can be added to the electronic device, to mitigate the effects of motion blur. However, because the reduction in exposure time is limited and may introduce noise, and because the compensation range of the OIS controller is limited, the above prior art cannot completely remove the motion blur that occurs during video recording.
Therefore, how to remove motion blur completely and efficiently is a problem that urgently needs to be solved.
Disclosure of Invention
The application provides a video processing method and a related device, which combine the replacement of a blurred image with a relatively clear image and the anti-shake processing, thereby improving the inter-frame definition of an anti-shake video while maintaining visual consistency.
To achieve this purpose, the following technical solutions are adopted:
in a first aspect, a video processing method is provided, including:
acquiring an original video stream, wherein the original video stream comprises a plurality of frames of original images;
determining a reference frame corresponding to the nth frame of original image from the multiple frames of original images, wherein n is more than or equal to 1, n is a positive integer, and the reference frame is clearer than the nth frame of original image;
replacing the original image of the nth frame with the reference frame to generate a first video stream;
and carrying out electronic image stabilization processing on the first video stream to generate a target video stream.
It should be understood that the electronic image stabilization processing calculates the motion between image frames in an image sequence from data acquired by a motion sensor during the exposure of each image frame.
In the embodiments of the application, a first video stream is generated by determining a relatively clear reference frame corresponding to an original image in an original video stream and then replacing that original image with the reference frame; electronic image stabilization is then performed on the first video stream to generate a target video stream. Because the blurred frames in the original video stream are replaced with relatively clear reference frames, the definition of the target video stream can be improved; and because the pose transformation of the images in the target video stream has undergone electronic image stabilization, the inter-frame definition of the anti-shake video can be improved, visual consistency is maintained, and user experience is improved.
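For orientation, the following is a minimal sketch of this flow. The helper names find_reference and stabilize_eis are assumptions for illustration only; they stand for the reference-frame selection and electronic image stabilization steps described in the implementation manners below.

```python
# A minimal sketch of the flow described above; find_reference and stabilize_eis
# are assumed helper names (not from the patent text): the first returns a clearer
# neighbouring frame for a blurred frame (or None), the second applies EIS.
def process_video(original_frames, find_reference, stabilize_eis):
    first_stream = []
    for n, frame in enumerate(original_frames):
        ref = find_reference(original_frames, n)       # reference frame, if frame n is blurred
        first_stream.append(ref if ref is not None else frame)
    return stabilize_eis(first_stream)                 # electronic image stabilization -> target video stream
```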
In a possible implementation manner of the first aspect, performing electronic image stabilization on the first video stream to generate a target video stream includes:
performing electronic image stabilization on the first video stream, and determining a first homography transformation matrix and a target translation amount which respectively correspond to a first image in the first video stream, wherein the first homography transformation matrix is used for representing an image rotation relationship, and the target translation amount is used for representing an image translation relationship;
and generating the target video stream according to the first video stream, the first homography transformation matrix and the target translation amount.
In this implementation, since electronic image stabilization is performed on the first video stream, the pose of the first image in the first video stream is transformed, and therefore, a first homography transformation matrix corresponding to the first image before and after the anti-shake process and a target translation amount need to be determined to represent a rotation relationship and a translation relationship corresponding to the first image in the first video stream before and after the anti-shake process.
In a possible implementation manner of the first aspect, the performing electronic image stabilization on the first video stream to generate a target video stream further includes:
determining target scaling factors respectively corresponding to the first images in the first video stream according to the original video stream and the first video stream, wherein the target scaling factors are used for representing image scaling relations;
determining a target video stream according to the first video stream, the first homography transformation matrix and the target translation amount, including:
and generating the target video stream according to the first video stream, the first homography transformation matrix, the target translation amount and the target scaling factor.
In this implementation, since there is also a scaling relationship between the original image of the original video stream and the first image of the first video stream, the scaling relationship needs to be considered when correcting the first video stream to maintain the consistency between video frames.
In a possible implementation form of the first aspect, the electronic device includes a gyroscope sensor and an Optical Image Stabilization (OIS) controller;
performing electronic image stabilization on the first video stream, and determining first homography transformation matrices respectively corresponding to first images in the first video stream, including:
acquiring a first gyroscope data stream with the gyroscope sensor and acquiring a first OIS data stream with the OIS controller; gyroscope data in the first gyroscope data stream corresponds to the first image in the first video stream one to one, and OIS data in the first OIS data stream corresponds to the first image in the first video stream one to one;
determining a first jitter path according to the first gyroscope data stream;
performing path smoothing processing on the first jitter path to determine a second jitter path;
determining a second gyroscope data stream according to the first gyroscope data stream and the second jitter path;
and determining first homography transformation matrixes respectively corresponding to the first images in the first video stream according to the first gyroscope data stream, the second gyroscope data stream and the first OIS data stream.
In this implementation, performing path smoothing on the first jitter path of the first video stream is equivalent to performing anti-shake. Based on the first gyroscope data (before anti-shake), the second gyroscope data (after anti-shake), and the first OIS data (before anti-shake), the first homography transformation matrix corresponding to each first image in the first video stream before and after anti-shake can be determined, that is, the rotation relationship of the first image before and after anti-shake.
In a possible implementation manner of the first aspect, determining, according to the first gyroscope data stream, the second gyroscope data stream, and the first OIS data stream, first homography transformation matrices respectively corresponding to the first image in the first video stream includes:
determining a rotation matrix corresponding to each frame of the first image according to the first gyroscope data stream and the second gyroscope data stream;
determining a first camera intrinsic matrix corresponding to each frame of the first image according to the first OIS data stream, where the first camera intrinsic matrix indicates the camera intrinsic matrix that applies when the OIS controller is enabled;
determining, according to the rotation matrix and the first camera intrinsic matrix, the first homography transformation matrix corresponding to each frame of the first image by using the formula H = K·R·K_ois⁻¹;
where H represents the first homography transformation matrix, K represents the standard camera intrinsic matrix, R represents the rotation matrix, and K_ois⁻¹ represents the inverse of the first camera intrinsic matrix.
In the implementation mode, the rotation relationship corresponding to the first image before and after the anti-shake is determined based on the first gyroscope data and the second gyroscope data before and after the anti-shake of the first video stream and the first OIS data before the anti-shake.
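As a concrete illustration, the sketch below evaluates H = K·R·K_ois⁻¹ for one frame. Modelling K_ois as the standard intrinsic matrix with its principal point shifted by the OIS lens offset is an assumption made here for illustration; the text above only states that K_ois is the intrinsic matrix in effect when the OIS controller is enabled.

```python
import numpy as np

def first_homography(K, R, ois_shift):
    """Compute H = K @ R @ inv(K_ois) for one first image.

    K         : 3x3 standard camera intrinsic matrix.
    R         : 3x3 rotation between the pose before and after anti-shake.
    ois_shift : (dx, dy) principal-point offset caused by the OIS lens shift
                (an assumed parameterization of K_ois).
    """
    K_ois = K.copy()
    K_ois[0, 2] += ois_shift[0]   # shift principal point cx
    K_ois[1, 2] += ois_shift[1]   # shift principal point cy
    return K @ R @ np.linalg.inv(K_ois)
```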
In a possible implementation manner of the first aspect, performing electronic image stabilization on the first video stream, and determining target translation amounts respectively corresponding to the first images in the first video stream includes:
performing feature point detection on the first image in the first video stream, and determining first coordinates corresponding to a plurality of feature points in the first image respectively;
transforming the first coordinates corresponding to the plurality of feature points respectively by using the first homography transformation matrix, and determining second coordinates corresponding to the plurality of transformed feature points respectively;
in the first video stream, except for the first image of the 1 st frame, determining the translation amount corresponding to the first image of each frame according to second coordinates respectively corresponding to a plurality of feature points in the first image of each frame and the first image of the adjacent previous frame;
and performing path smoothing processing on all the translation amounts, and determining the target translation amounts respectively corresponding to the first images.
In this implementation, because the path smoothing of the first jitter path affects the translation relationship, the first coordinates of the feature points of each first image are first transformed with the first homography transformation matrix, which eliminates the rotation of the first image itself before and after anti-shake; the translation amount is then determined from the rotation-free second coordinates of the current first image and those of the previous first image. On this basis, path smoothing is performed on all the translation amounts to achieve continuity of translation between frames.
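A sketch of this step follows. Tracking feature points with Lucas-Kanade optical flow, averaging the point displacements, and smoothing with a moving average are assumptions made for illustration; the implementation manner above only requires that the first homography remove the rotation before the translation is measured and that the translation amounts then be path-smoothed.

```python
import numpy as np
import cv2

def target_translations(first_images, homographies, smooth=5):
    """Per-frame target translation amounts for the first video stream (a sketch)."""
    grays = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in first_images]
    raw = [np.zeros(2)]                                       # the 1st frame has no predecessor
    for i in range(1, len(grays)):
        p0 = cv2.goodFeaturesToTrack(grays[i - 1], 200, 0.01, 10)
        p1, ok, _ = cv2.calcOpticalFlowPyrLK(grays[i - 1], grays[i], p0, None)
        p0, p1 = p0[ok == 1].reshape(-1, 1, 2), p1[ok == 1].reshape(-1, 1, 2)
        q0 = cv2.perspectiveTransform(p0, homographies[i - 1]).reshape(-1, 2)  # remove rotation
        q1 = cv2.perspectiveTransform(p1, homographies[i]).reshape(-1, 2)
        raw.append(np.mean(q1 - q0, axis=0))                  # translation of frame i vs. frame i-1
    raw = np.asarray(raw)
    kernel = np.ones(smooth) / smooth                         # moving-average path smoothing
    return np.stack([np.convolve(raw[:, k], kernel, mode='same') for k in range(2)], axis=1)
```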
In a possible implementation manner of the first aspect, determining, according to the original video stream and the first video stream, target scaling factors corresponding to the first images in the first video stream, respectively includes:
performing feature point detection on the nth frame of original image and on the corresponding reference frame respectively, where the feature points detected in the reference frame are reference feature points and the feature points detected in the nth frame of original image are original feature points;
matching the feature points detected in the nth frame of original image with those detected in the corresponding reference frame to determine a plurality of feature point pairs, where each feature point pair includes 1 reference feature point and 1 original feature point;
and determining the target scaling factor according to a plurality of pairs of the characteristic point pairs.
In this implementation, since the scaling relationship is not affected by the anti-shake processing, the scaling relationship corresponding to the reference frame may be determined by using the feature point pairs of the n-th frame original image and the reference frame (i.e. the first image in the first video stream).
In a possible implementation manner of the first aspect, determining the target scaling factor according to the plurality of feature point pairs includes:
determining a second homography transformation matrix according to the nth frame of original image and the corresponding reference frame;
for each feature point pair, determining the original coordinates of the original feature point in the nth frame of original image and the first reference coordinates of the reference feature point in the reference frame;
transforming the first reference coordinates of the reference feature point by using the second homography transformation matrix, and determining the second reference coordinates of the transformed reference feature point;
determining 1 scaling factor by a least squares method from any two of the feature point pairs, based on the original coordinates and the second reference coordinates of each of the two pairs;
repeating this a plurality of times to determine a plurality of scaling factors;
determining the average of the plurality of scaling factors as the target scaling factor.
In this implementation, although the definition of the reference frame is higher than that of the n-th frame original image, the reference frame and the n-th frame original image still have a certain rotation relationship, so that the rotation relationship between the reference frame and the n-th frame original image needs to be eliminated by using the second homography transformation matrix, and then the scaling amount is calculated by using the transformed reference frame and the n-th frame original image.
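A sketch of this step follows, under stated assumptions: ORB detection with brute-force matching stands in for the unspecified feature detector and matcher, the per-sample distance ratio stands in for the least-squares solve over two feature point pairs, and the second homography H2 (which removes the residual rotation between the two frames) is assumed to be computed elsewhere, for example from gyroscope data as for the first homography.

```python
import numpy as np
import cv2

def target_scale(orig_img, ref_img, H2, trials=50, seed=0):
    """Target scaling factor between the nth original image and its reference frame (a sketch)."""
    rng = np.random.default_rng(seed)
    orb = cv2.ORB_create(500)
    k1, d1 = orb.detectAndCompute(orig_img, None)             # original feature points
    k2, d2 = orb.detectAndCompute(ref_img, None)              # reference feature points
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    p_orig = np.float32([k1[m.queryIdx].pt for m in matches])
    p_ref = np.float32([k2[m.trainIdx].pt for m in matches])
    p_ref2 = cv2.perspectiveTransform(p_ref.reshape(-1, 1, 2), H2).reshape(-1, 2)  # remove rotation
    scales = []
    for _ in range(trials):                                    # repeat with randomly chosen pairs of matches
        i, j = rng.choice(len(matches), size=2, replace=False)
        d_o = np.linalg.norm(p_orig[i] - p_orig[j])
        d_r = np.linalg.norm(p_ref2[i] - p_ref2[j])
        if d_r > 1e-6:
            scales.append(d_o / d_r)                           # one scaling factor per sampled pair
    return float(np.mean(scales))                              # average as the target scaling factor
```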
In a possible implementation manner of the first aspect, the method further includes:
determining whether the nth frame of original image in the original video stream is a clear frame or a blurred frame;
if the nth frame of original image is a blurred frame, determining the reference frame corresponding to the nth frame of original image from the other original images, excluding the nth frame of original image, within a preset frame number range;
where the preset frame number range spans from the (n-k)th frame of original image to the (n+k)th frame of original image, the reference frame is the clearest frame within the preset frame number range, k ≥ 1, and k is a positive integer.
In this implementation, the clearest frame within the preset frame number range around the nth frame of original image is selected as the reference frame, so that the definition at the position of the nth frame of original image can be improved to the greatest extent.
In a second aspect, an electronic device is provided, which includes units for performing the method in the first aspect or any implementation manner of the first aspect.
In a third aspect, an electronic device is provided, which includes a camera module, a processor and a memory;
the camera module is used for collecting a video stream, and the video stream comprises a plurality of frames of original images;
the memory for storing a computer program operable on the processor;
the processor is configured to perform the steps of the method in the first aspect or any implementation manner of the first aspect.
In a fourth aspect, a chip is provided, including a processor configured to call and run a computer program from a memory, so that a device on which the chip is installed performs the steps of the method in the first aspect or any implementation manner of the first aspect.
In a fifth aspect, a computer-readable storage medium is provided, which stores a computer program including program instructions that, when executed by a processor, cause the processor to perform the steps of the method in the first aspect or any implementation manner of the first aspect.
In a sixth aspect, a computer program product is provided, including computer program code that, when run by an electronic device, causes the electronic device to perform the steps of the method in the first aspect or any implementation manner of the first aspect.
The embodiments of the application provide a video processing method and a related device. A blurred frame in an original video stream and a relatively clear reference frame within its neighbouring range are determined, and the blurred frame is replaced with the reference frame to generate a first video stream. Path smoothing is performed on the first video stream; by determining the rotation relationship between the first video stream before and after path smoothing, the feature points of the first images are rotated accordingly. A translation relationship is then determined from the rotation-corrected coordinates of the feature points of the first images in the first video stream, and a scaling relationship is determined by using the original video stream and the first video stream, so that, according to the rotation, translation, and scaling relationships, the first video stream can be processed into a target video stream that is consistent in scale with the original video stream and combined with the anti-shake processing. Because the blurred frames in the first video stream are replaced with clear frames, the definition of the first video stream can be improved; and because the image pose transformation in the first video stream is combined with the anti-shake processing, the inter-frame definition of the anti-shake video can be improved while visual consistency is maintained, further improving the user experience.
Drawings
FIG. 1 shows 2 frames of a video recorded using the prior art;
fig. 2 is a schematic diagram of an application scenario provided in an embodiment of the present application;
fig. 3 is a schematic flowchart of a video processing method according to an embodiment of the present application;
fig. 4 is a schematic diagram of a first video stream provided by an embodiment of the present application;
fig. 5 is a schematic diagram of a first gyroscope data stream and a first OIS data stream provided in an embodiment of the present application;
fig. 6 is a schematic flowchart of determining a first homography transformation matrix corresponding to a first image according to an embodiment of the present disclosure;
FIG. 7 is a diagram illustrating a dithering path provided by an embodiment of the present application;
fig. 8 is a schematic flowchart of determining a first homography transformation matrix according to an embodiment of the present application;
FIG. 9 is a schematic flow chart for determining the amount of translation according to an embodiment of the present application;
FIG. 10 is a schematic diagram for determining an amount of translation provided by an embodiment of the present application;
fig. 11 is a schematic flowchart of another video processing method provided in the embodiment of the present application;
FIG. 12 is a schematic flowchart illustrating a process of determining a target scaling factor corresponding to a first image according to an embodiment of the present disclosure;
FIG. 13 is a schematic diagram of determining a target image in a target video stream according to an embodiment of the present application;
fig. 14 is a schematic effect diagram of a video processing method provided by an embodiment of the present application;
FIG. 15 is a schematic diagram of a hardware system suitable for use in the apparatus of the present application;
fig. 16 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present application;
fig. 17 is a schematic structural diagram of a chip according to an embodiment of the present application.
Detailed Description
The technical solution in the present application will be described below with reference to the accompanying drawings.
In the description of the embodiments of the present application, "/" means "or" unless otherwise specified; for example, A/B may mean A or B. "And/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, in the description of the embodiments of the present application, "a plurality" means two or more.
In the following, the terms "first", "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present embodiment, "a plurality" means two or more unless otherwise specified.
First, some terms in the embodiments of the present application are explained so as to be easily understood by those skilled in the art.
1. The RGB (red, green, blue) color space, or RGB domain, refers to a color model that is related to the structure of the human visual system. All colors are considered as different combinations of red, green and blue depending on the structure of the human eye.
2. The pixel value refers to a set of color components corresponding to each pixel in a color image located in an RGB color space. For example, each pixel corresponds to a set of three primary color components, wherein the three primary color components are a red color component R, a green color component G, and a blue color component B, respectively.
3. YUV color space, or YUV domain, refers to a color coding method, Y denotes luminance, and U and V denote chrominance. The RGB color space emphasizes the color sensing of human eyes, the YUV color space emphasizes the sensitivity of vision to brightness, and the RGB color space and the YUV color space can be converted with each other.
4. Motion blur
During shooting by an electronic device, each frame of image is generated by accumulating photons over an exposure time, converting the photons into electrons through photoelectric conversion, and further converting the electrons into an image that can be recognized by human eyes. If the electronic device moves significantly during this time, the motion information is also accumulated and recorded, and the generated image is accompanied by strong motion blur.
5. Optical Image Stabilization (OIS)
OIS technology means that, during exposure, a motion sensor (e.g., a gyroscope or an accelerometer) detects the shake of the electronic device, and an OIS controller, according to the shake data detected by the motion sensor, controls a motor to move the lens or the image sensor (CCD), so that the optical path is kept as stable as possible throughout the exposure and a clearly exposed image is obtained.
Optical anti-shake includes two modes: the first is lens-shift optical anti-shake, and the second is sensor-shift optical anti-shake. The principle of the first mode is that a gyroscope sensor in the lens detects tiny movements and transmits a signal to a microprocessor, which immediately calculates the displacement to be compensated; a compensating lens group then compensates according to the shake direction and displacement of the lens, thereby effectively overcoming image blur caused by camera vibration. The second mode uses an offset of the photosensitive element (image sensor) to achieve anti-shake; its principle is that the CCD is mounted on a support that can move up, down, left, and right, and when the gyroscope sensor detects shaking, the shake direction, speed, amount of movement, and other parameters are processed to calculate the CCD movement needed to counteract the shake.
Optionally, the OIS controller may be a two-axis or a three-axis optical image stabilizer. In the embodiments of the present application, a two-axis OIS controller, which involves horizontal-direction data and vertical-direction data, is taken as an example for description; this is not repeated below.
6. Electronic anti-shake (EIS)
Electronic anti-shake is also called electronic image stabilization. EIS technology refers to anti-shake processing based on motion sensor data: the motion between image frames in an image sequence is calculated from the data acquired by the motion sensor during the exposure of each image frame, and the motion between image frames is then corrected to produce a relatively stable image sequence.
The whole process of electronic anti-shake involves no additional optical or mechanical element; the anti-shake is realized by digital processing. Current electronic anti-shake mainly falls into two categories: "natural anti-shake", realized by increasing the light sensitivity (ISO) of the camera, and "digital anti-shake", realized by pixel compensation or other operations.
7. Timestamp (time stamp)
A timestamp records, by technical means, the time at which a piece of data is generated.
The foregoing is a brief introduction to the terms used in the embodiments of the present application; they are not described again below.
With the widespread use of electronic devices, recording video with an electronic device has become part of people's daily life. Taking a mobile phone as an example of the electronic device, when people record videos with a mobile phone, hand shake, movement of the phone, and the like cause inconsistent inter-frame definition in the recorded video, that is, motion blur.
Illustratively, fig. 1 shows 2 images in a video recorded using the prior art.
Fig. 1 (a) shows 1 frame of the video, which is an image frame with motion blur. Fig. 1 (b) shows another frame of the same video.
To improve the definition of recorded video and reduce the influence of motion blur, various techniques for improving video quality have been developed. For example, when recording a video, the electronic device generally increases the exposure time to improve the imaging effect if the ambient light is dark, but it can relatively decrease the exposure time to reduce motion blur and the resulting disparity in sharpness between frames. In addition, an OIS controller may be added to the electronic device to reduce motion blur through anti-shake.
However, because reducing the exposure time introduces noise into the recorded video, and because the compensation range of the OIS controller is limited, the above two prior-art approaches cannot completely remove motion blur.
In view of this, an embodiment of the present application provides a video processing method: a relatively clear reference frame corresponding to an original image in an original video stream is determined, and the original image is replaced with the reference frame to generate a first video stream; electronic image stabilization is then performed on the first video stream to generate a target video stream. Because the blurred frames in the original video stream are replaced with relatively clear reference frames, the definition of the target video stream can be improved; and because the pose transformation of the images in the target video stream has undergone electronic image stabilization, the inter-frame definition of the anti-shake video can be improved, visual consistency is maintained, and user experience can be improved.
First, an application scenario of the embodiment of the present application is briefly described.
Fig. 2 is a schematic diagram of an application scenario provided in an embodiment of the present application. The video processing method provided by the application can be applied to removing motion blur on the images in the video.
In one example, the electronic device is illustrated as a cell phone. As shown in fig. 2 (a), a Graphical User Interface (GUI) of the electronic device is shown. When the electronic device detects an operation of clicking an icon of the camera application on the interface by the user, the camera application may be launched, and another GUI as shown in (b) of fig. 2 is displayed, and the GUI may be referred to as a preview interface.
A viewfinder window 21 may be included on the preview interface. In the preview state, a preview image can be displayed in real time in the finder window 21. The preview interface may also include a plurality of shooting mode options and a first control, i.e., a shooting key 11. The multiple shooting mode options include, for example: a photographing mode, a recording mode, etc., and the photographing key 11 is used to indicate whether the current photographing mode is the photographing mode, the recording mode, or another mode. Wherein the camera application is typically in a photographing mode by default when opened.
Illustratively, as shown in fig. 2 (b), after the electronic device starts the camera application, the electronic device runs a program corresponding to the video processing method, and acquires and stores a video in response to a user clicking the shooting key 11.
It should be understood that, during shooting, the photographer's hands may shake for physiological reasons, for example because the body moves with the chest during breathing; moreover, the photographer may be walking, in which case stationary objects in the scene to be photographed, such as buildings and trees, are moving relative to the photographer. Both cause inconsistent definition between frames in the recorded video, and this cannot be effectively avoided by the related art. However, the video processing method of the application can determine a relatively clear reference frame corresponding to an original image in the original video stream and then replace that original image with the reference frame to generate a first video stream, and then perform electronic image stabilization on the first video stream to generate a target video stream. Because the blurred frames in the original video stream are replaced with relatively clear reference frames and the pose transformation of the images in the target video stream has undergone electronic image stabilization, the video obtained by the video processing method provided by the embodiment of the application is clearer, and consistency between the images can be maintained.
It should be understood that the scenario shown in fig. 2 is an illustration of an application scenario, and does not limit the application scenario of the present application in any way. The video processing method provided by the embodiment of the application can be applied to, but is not limited to, the following scenes:
the system comprises a video call, a video conference application, a long and short video application, a video live broadcast application, a video network course application, an intelligent mirror-moving application scene, a system camera video recording function video recording, a video monitoring, an intelligent cat eye and other shooting scenes.
The following describes in detail a video processing method provided in an embodiment of the present application with reference to the drawings of the specification.
The video processing method provided by the embodiments of the present application can be used in a video mode, where the video mode may instruct the electronic device to shoot video; alternatively, the video mode may instruct the electronic device to perform live video streaming.
For example, the video processing method provided by the embodiment of the present application may be applied to a video mode at night or in a dark environment.
Fig. 3 is a schematic flowchart of a video processing method according to an embodiment of the present disclosure. As shown in fig. 3, the video processing method 1 includes the following S11 to S17.
And S11, acquiring an original video stream. The original video stream includes a plurality of frames of original images.
The multi-frame original image may include two or more frames of original images, and the number of the original images is not limited in this embodiment.
It should be understood that the original video stream includes a plurality of original images that are chronologically ordered, for example, a video stream stored in a memory or a buffer includes a plurality of original images that are chronologically ordered. The multiple frames of original images may all be images located in a RAW domain, images located in an RGB domain, or all be images located in a YUV domain, which is not limited in this embodiment of the present application.
It should be understood that the original video stream may be captured by the electronic device with its own camera or obtained from another device; this is not limited in the embodiments of the present application. When the electronic device captures the data with its own camera, that camera includes an OIS controller. When the original video stream is obtained from another device, the other device needs to include an OIS controller.
For example, when the electronic device uses a camera including the OIS controller to capture data, the process may include: the electronic device starts the camera and displays a preview interface as shown in (b) of fig. 2, wherein the preview interface includes a first control, and the first control may be a shooting key 11. After the electronic device detects a first operation of the user on the shooting key 11, in response to the first operation, a camera including an OIS controller collects multiple frames of original images to generate an original video stream, and the OIS controller is configured to perform optical image stabilization in a collection process, that is, the obtained multiple frames of original images are original images subjected to optical image stabilization, or the obtained original video stream is a video stream subjected to optical image stabilization.
It should also be understood that the plurality of frames of raw images included in the raw video stream may be raw images generated directly by a camera including the OIS controller or may be images resulting from one or more processing operations performed on the raw images.
S12, determining whether the nth frame original image in the original video stream is a clear frame or a blurred frame, wherein n ≥ 1 and n is a positive integer.
Illustratively, the degree of blur corresponding to the original image of the nth frame can be determined; then, the original image of the nth frame is determined to be a clear frame or a blurred frame by setting a blurring degree threshold. The blurred frame indicates an image with a relatively low degree of sharpness, and the sharp frame indicates an image with a relatively high degree of sharpness.
It should be understood that blur and sharpness are two opposite but interrelated concepts describing the clarity of an image. The sharper the image, the higher its quality, the greater its sharpness, and the smaller its blur degree. The less sharp, i.e. more blurred, the image, the lower its quality, the smaller its sharpness, and the greater its blur degree.
If the electronic device is provided with a gyroscope sensor and/or an acceleration sensor, the blur degree corresponding to the nth frame of original image can be determined from the angle information acquired by the gyroscope sensor and/or the acceleration information acquired by the acceleration sensor. Alternatively, other algorithms may be used to determine the blur degree corresponding to the nth frame of original image; this is not limited in the embodiments of the present application. The blur degree threshold may also be set and adjusted as needed, which is likewise not limited in the embodiments of the present application.
Based on this, if the blur degree corresponding to the nth frame original image is greater than the blur degree threshold, the nth frame original image is a blurred frame. If the blur degree corresponding to the nth frame original image is less than or equal to the blur degree threshold, the nth frame original image is a clear frame.
By analogy, the subsequent (n+1)th frame original image, the (n+2)th frame original image, and so on can each be determined to be a clear frame or a blurred frame; the specific determination process is similar to the above process and is not repeated here.
It should be understood that the above process is mainly for performing a preliminary screening on the original image in the original video stream, so as to perform different processing on the determined sharp frame and the determined blurred frame.
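A sketch of this screening step follows. Using the variance of the Laplacian as the sharpness measure and the particular threshold value are assumptions for illustration; as noted above, the blur degree may instead be derived from gyroscope or accelerometer data or from another algorithm.

```python
import cv2

def is_blurred(frame_bgr, sharpness_threshold=100.0):
    """Classify one original image as a blurred frame or a clear frame (a sketch)."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()   # low variance -> high blur degree
    return sharpness < sharpness_threshold
```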
And S13, if the original image of the nth frame is a clear frame, not processing the nth frame.
It should be understood that when the n-th frame original image is a clear frame, which indicates that it is already a relatively clear image, in order to reduce the amount of calculation, emphasis is placed on the processing of the blurred frame, so that the original image determined to be a clear frame may not be processed. Of course, other processing may be performed on the clear frame, which is not limited in this embodiment.
And S14, if the nth frame original image is a blurred frame, determining the clearest frame among the other original images, excluding the nth frame original image, within the preset frame number range, and taking that clearest frame as the reference frame corresponding to the nth frame original image.
Within the preset frame number range, the blur degree of the other original images, excluding the nth frame original image, also needs to be calculated, and each is judged to be a clear frame or a blurred frame according to how its blur degree compares with the blur degree threshold.
When all the original images within the preset frame number range, other than the nth frame original image, are blurred frames, the blur of all the original images in the range (including the nth frame) is severe; in this case, the preset frame number range may be expanded until a clear frame is found to serve as the clearest frame. Alternatively, even though all the other original images within the preset frame number range are blurred frames, if some of them have a smaller blur degree than the nth frame original image, the blurred frame with the smallest blur degree among them is clearer than the nth frame original image and may therefore also be used as the clearest frame.
When only 1 clear frame exists among the original images, excluding the nth frame original image, within the preset frame number range, and all the other frames are blurred frames, that clear frame can be used as the clearest frame.
When a plurality of clear frames exist among the other original images, excluding the nth frame original image, within the preset frame number range, the clearest frame can be determined from the blur degrees respectively corresponding to the plurality of clear frames. For example, among the plurality of clear frames, the clear frame with the smallest blur degree is the clearest frame.
The preset frame number range is from the (n-k)th frame original image to the (n+k)th frame original image, where k ≥ 1 and k is a positive integer.
It should be understood that when the value of n is different, the original images included in the preset frame number range corresponding to the original image of the frame are different. When the values of k are different, the original images included in the corresponding preset frame number range are different for the same original image.
It should be understood that the clearest frame referred to herein is a frame of the original image that is relatively clear within the respective range. With the change of n and k, the original images included in the preset frame number range corresponding to the original image of the nth frame are different, and correspondingly, the clearest frames determined from the preset frame number range are not necessarily the same.
For example, assuming that n = 5 and k = 1, and it is determined that the blur degree corresponding to the 5th frame original image is greater than the preset blur degree threshold, the 5th frame original image is a blurred frame; thus, it needs to be determined whether the 5th frame original image is the clearest frame in the range from the 4th frame original image to the 6th frame original image.
If the 5th frame original image is determined to be a blurred frame but is still the clearest frame within the range of the two adjacent frames of original images, the 5th frame original image does not need to be processed again, because the other two frames of original images are more blurred than the 5th frame original image.
If the 5th frame original image is determined to be a blurred frame, and meanwhile the blur degree of the 4th frame original image is determined to be smaller than the blur degrees of the 5th and 6th frame original images, the 4th frame original image is the clearest frame in the range from the 4th frame original image to the 6th frame original image. Therefore, the 4th frame original image can be used as the reference frame corresponding to the 5th frame original image.
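A sketch of this reference-frame selection follows. The per-frame blur scores (smaller meaning clearer) and the clamping of the window at the ends of the stream are assumptions for illustration; returning None mirrors the case above in which frame n is itself the clearest frame and therefore needs no replacement.

```python
def find_reference(blur_scores, n, k=1):
    """Pick the reference frame for frame n from frames n-k..n+k (a sketch; 0-based indices)."""
    lo, hi = max(0, n - k), min(len(blur_scores) - 1, n + k)
    candidates = [i for i in range(lo, hi + 1) if i != n]
    best = min(candidates, key=lambda i: blur_scores[i])           # clearest neighbouring frame
    return best if blur_scores[best] < blur_scores[n] else None    # None: keep frame n unchanged
```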
And S15, if the nth frame original image is a blurred frame, replacing the blurred frame with the corresponding reference frame. A first video stream is generated from all the clear frames and all the substituted reference frames.
It should be understood that, through the above S12 to S14, each frame of original image in the original video stream can be judged in turn to be a clear frame or a blurred frame, and, when it is a blurred frame, its corresponding reference frame can be determined. Thus, all original images judged to be clear frames in the original video stream are retained, all original images judged to be blurred frames are replaced with their corresponding reference frames, and the first video stream is generated from the clear frames and the substituted reference frames.
Since the first video stream replaces the blurred frame with respect to the original video stream, the image definition is relatively improved, and the overall video quality is relatively improved accordingly.
Fig. 4 is a schematic diagram of a first video stream according to an embodiment of the present application.
As shown in fig. 4 (a), the acquired original video stream includes 6 frames of original images. The blur degree of each frame of original image is compared with the blur degree threshold, and it is determined that the 2nd frame original image and the 5th frame original image are blurred frames, while the other original images are all clear frames. Then, for example, for the 2nd frame original image, it is determined that, within the corresponding preset frame number range (for example, the two adjacent frames of original images), the 3rd frame original image is clearer than the 2nd frame original image and is the clearest frame, so the 3rd frame original image can be used as the reference frame corresponding to the 2nd frame original image.
Similarly, for the 5th frame original image, it is determined that, within the corresponding preset frame number range (for example, the two adjacent frames of original images), the 4th frame original image is clearer than the 6th frame original image and is the clearest frame, so the 4th frame original image can be used as the reference frame corresponding to the 5th frame original image.
As shown in fig. 4 (b), since the 1st, 3rd, 4th, and 6th frame original images in the original video stream are all judged to be clear frames, these original images can be retained without processing, while the 3rd frame original image is substituted for the 2nd frame original image and the 4th frame original image is substituted for the 5th frame original image; the first video stream can thus be generated from the retained clear frames and the substituted reference frames. It will be appreciated that the first video stream will contain the 3rd frame original image twice and the 4th frame original image twice.
Because the definition of the reference frame is higher, the definition of the image can be relatively improved after the reference frame is used for replacing the corresponding fuzzy frame, and further the definition of the whole video is improved.
In addition, other processing can be performed on the reference frame, and the processed reference frame is used for replacing the corresponding blurred frame. The embodiment of the present application does not limit the processing manner.
And S16, carrying out electronic image stabilization on the first video stream, and determining a first homography transformation matrix and a translation amount respectively corresponding to a first image in the first video stream.
In the embodiment of the present application, the first homography transformation matrix calculated based on gyroscope data is used to represent an image rotation relationship, and a translation amount is used to represent an image translation relationship. Each frame of the first image of the first video stream corresponds to a first homography transformation matrix and a translation quantity.
It should be understood that electronic image stabilization refers to motion correction, in a three-dimensional coordinate system, of a plurality of image frames in a video stream. The motion correction mainly compensates for three-dimensional rotation so that a plurality of consecutive image frames have smooth poses; at the same time, the deviation of the optical center coordinates caused by the OIS controller needs to be compensated.
And S17, determining the target video stream according to the first video stream, the first homography transformation matrix and the target translation amount.
Because the first video stream is subjected to electronic image stabilization, images in the first video stream undergo some pose transformation, the pose transformation can be decomposed into rotation and translation, a first homography transformation matrix is used for representing the rotation relation, and a target translation amount is used for representing the translation relation, so that the first images can be subjected to pose transformation based on the first homography transformation matrix corresponding to the first images in the first video stream and the target translation amount, and the transformed first images can indicate the first images after electronic image stabilization.
Then, each frame of the first image in the first video stream is transformed according to the corresponding first homography transformation matrix and the target translation amount, and all the transformed first images are combined together, which may indicate the first video stream after the electronic image stabilization, and may be referred to as a target video stream herein.
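The sketch below applies the two corrections to one first image to produce the corresponding target image. Composing the target translation into the warp as a left-multiplied shift matrix is an assumption made for illustration about how the rotation and translation corrections are combined.

```python
import numpy as np
import cv2

def warp_to_target(first_image, H, t_target):
    """Transform one first image with its first homography H and target translation (a sketch)."""
    T = np.array([[1.0, 0.0, t_target[0]],
                  [0.0, 1.0, t_target[1]],
                  [0.0, 0.0, 1.0]])
    h, w = first_image.shape[:2]
    return cv2.warpPerspective(first_image, T @ H, (w, h))   # one frame of the target video stream
```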
It should be understood that, compared with the original video stream, the definition of the target video stream is improved because the blurred frame is replaced by the relatively clear reference frame, and the anti-shake quality and the visual continuity of the target video stream are improved because the target video stream is subjected to the electronic image stabilization.
The embodiment of the application provides a video processing method, which comprises the steps of determining a fuzzy frame in an original video stream and a relatively clear reference frame in an adjacent range of the fuzzy frame, and replacing the fuzzy frame with the reference frame to generate a first video stream; and performing electronic image stabilization on the first video stream, and determining a rotation relation and a translation relation between the first video stream and the first video stream subjected to the electronic image stabilization, so that the first video stream can be processed into a target video stream consistent with the pose transformation relation of the first video stream subjected to the electronic image stabilization according to the rotation relation and the translation relation. Because the fuzzy frame in the original video stream is replaced by the relatively clear reference frame, the definition of the target video stream can be improved, and because the pose transformation relation of the image in the target video stream is consistent with the first video stream after the electronic image stabilization processing, the inter-frame definition of the anti-shake video can be improved, the visual consistency is kept, and the user experience is further improved.
With reference to fig. 6, a detailed description will be given below of a process of determining a first homography transformation matrix corresponding to a first image.
As shown in fig. 6, the above S16 may include S161 to S165.
S161, acquiring a first gyroscope data stream with the gyroscope sensor, and acquiring a first OIS data stream with the OIS controller.
It should be noted that, when the electronic device acquires the original video stream, the electronic device also acquires the gyroscope data by using the gyroscope sensor and acquires the OIS data by using the OIS controller.
The gyroscope data refers to angular velocity information measured when the gyroscope sensor detects shaking of the electronic device, for example, when the gyroscope sensor is a three-axis gyroscope sensor in the embodiment of the present application, the gyroscope data refers to three-axis angular velocity information measured by the gyroscope sensor. The angular velocity may be integrated in units of time, and the subsequent processing may be performed using the obtained angle information.
The OIS data is the movement amount by which the OIS controller, according to the gyroscope data detected by the gyroscope sensor, controls the motor to move the lens or the image sensor so as to offset the shake when the electronic device performs optical anti-shake. For example, when the OIS controller in the embodiments of the present application includes a two-axis optical image stabilizer, the movement amount for offsetting the shake includes a horizontal movement amount and a vertical movement amount, that is, the OIS data includes a horizontal movement amount and a vertical movement amount.
It should also be understood that the moment at which the gyroscope sensor acquires the gyroscope data each time, i.e., the corresponding timestamp of the gyroscope data, is stored with the gyroscope data. The time when the OIS controller acquires the OIS data each time, that is, the timestamp corresponding to the OIS data, is stored with the OIS data.
Here, it should be further noted that, because the frequency of data collected by the gyro sensor and the OIS controller is not consistent with the frequency of the captured image frame, and the frequency of data collected between the gyro sensor and the OIS controller may also be inconsistent, the data collected by the gyro sensor may be utilized to calculate the gyro data corresponding to the same timestamp by interpolation according to the timestamp corresponding to the original image in the original video stream, and the gyro data is referred to as the original gyro data corresponding to the original image in the frame. Similarly, the OIS data corresponding to the same timestamp may be calculated by interpolation using the data acquired by the OIS controller according to the timestamp corresponding to the original image in the original video stream, and the OIS data is referred to as the original OIS data corresponding to the original image of the frame.
It should be understood that the raw gyroscope data and the raw OIS data refer to the gyroscope data and the OIS data after interpolation calculation, and of course, the above is only an example, and other algorithms may also be used to determine the raw gyroscope data and the raw OIS data corresponding to each frame of the raw image, which is not limited in this embodiment of the present application.
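A sketch of this timestamp alignment follows. Per-axis linear interpolation is an assumption for illustration; the description above only states that interpolation, or another algorithm, may be used to obtain the sensor data corresponding to each frame's timestamp.

```python
import numpy as np

def align_to_frames(frame_timestamps, sensor_timestamps, sensor_values):
    """Interpolate gyroscope or OIS samples onto the image timestamps (a sketch)."""
    sensor_values = np.asarray(sensor_values, dtype=np.float64)   # shape (num_samples, num_axes)
    return np.stack(
        [np.interp(frame_timestamps, sensor_timestamps, sensor_values[:, a])
         for a in range(sensor_values.shape[1])],
        axis=1)                                                   # shape (num_frames, num_axes)
```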
Based on the obtained original gyroscope data and original OIS data, in S15, when the electronic device replaces the blurred frame with the reference frame, the electronic device replaces not only the image itself, but also the gyroscope data and the OIS data. That is, for each blurred frame, the original gyroscope data corresponding to the blurred frame is replaced by the original gyroscope data corresponding to the reference frame, and the original OIS data corresponding to the blurred frame is replaced by the original OIS data corresponding to the reference frame.
Here, the original gyroscope data of all the clear frames and the original gyroscope data of all the replaced reference frames may be referred to as a first gyroscope data stream, and the gyroscope data in the first gyroscope data stream corresponds to the first image in the first video stream one to one; and the original OIS data of all the clear frames and the original OIS data of all the replaced reference frames are called as a first OIS data stream, and the OIS data in the first OIS data stream is in one-to-one correspondence with the first image in the first video stream.
Fig. 5 is a schematic diagram of a first gyroscope data stream and a first OIS data stream provided in an embodiment of the present application.
As shown in fig. 5 (a), when 6 frames of original images of an original video stream are acquired, original gyroscope data and original OIS data corresponding to the 6 frames of original images are acquired at the same time, where the original gyroscope data corresponding to the 6 frames of original images are original gyroscope data 1 to original gyroscope data 6, respectively, and the original OIS data corresponding to the 6 frames of original images are original OIS data 1 to original OIS data 6, respectively.
As shown in (b) of fig. 5, while the 2nd frame original image is replaced with the 3rd frame original image serving as its reference frame and the 5th frame original image is replaced with the 4th frame original image serving as its reference frame, the original gyroscope data 2 corresponding to the 2nd frame original image is replaced with the original gyroscope data 3 corresponding to the 3rd frame original image, and the original gyroscope data 5 corresponding to the 5th frame original image is replaced with the original gyroscope data 4 corresponding to the 4th frame original image. Then, a first gyroscope data stream is generated from the retained original gyroscope data 1, original gyroscope data 3, original gyroscope data 4 and original gyroscope data 6, together with the replacing original gyroscope data 3 and original gyroscope data 4. It will be appreciated that the first gyroscope data stream will therefore include two copies of original gyroscope data 3 and two copies of original gyroscope data 4.
Similarly, the original OIS data 3 corresponding to the 3 rd frame original image is used to replace the original OIS data 2 corresponding to the 2 nd frame original image, the original OIS data 4 corresponding to the 4 th frame original image is used to replace the OIS data 5 corresponding to the 5 th frame original image, and then the first OIS data stream is generated according to the reserved original OIS data 1, original OIS data 3, original OIS data 4 and original OIS data 6, and the replaced original OIS data 3 and original OIS data 4. It will be appreciated that in the first OIS data stream, two original OIS data 3 and two original OIS data 4 will be included.
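As an illustrative aid for the above replacement of per-frame sensor data, a minimal Python sketch is given below; the function name and the dictionary-based replacement map are assumptions of this sketch and are not mandated by the embodiment of the present application.

def build_first_data_stream(raw_per_frame, replacement_map):
    # raw_per_frame:   list of per-frame raw sensor data (gyroscope or OIS), index = frame number
    # replacement_map: {blurred_frame_index: reference_frame_index}
    first_stream = list(raw_per_frame)
    for blurred_idx, ref_idx in replacement_map.items():
        # The blurred frame's sensor data is replaced by that of its reference frame.
        first_stream[blurred_idx] = raw_per_frame[ref_idx]
    return first_stream

# In the example of fig. 5 (with 0-based indices), replacing the 2nd and 5th frames with
# the 3rd and 4th frames corresponds to replacement_map = {1: 2, 4: 3}.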
And S162, determining a first jitter path according to the first gyroscope data stream.
The first gyroscope data stream refers to a gyroscope data stream composed of gyroscope data corresponding to the first images in the first video stream respectively.
The three-axis angular velocities respectively indicated by the gyroscope data in the first gyroscope data stream can be integrated to determine the corresponding angle, so that the first jitter path corresponding to the whole first gyroscope data stream can be determined according to the determined angle corresponding to each gyroscope data.
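A minimal Python sketch of the above integration step is given below, assuming the angular velocities have already been aligned to the frame timestamps; the simple rectangular integration used here is only one possible choice and is an assumption of this sketch.

import numpy as np

def angular_path(frame_ts, gyro_stream):
    # frame_ts:    (M,) timestamps of the first images
    # gyro_stream: (M, 3) three-axis angular velocity aligned to the first images
    # returns:     (M, 3) cumulative angles, i.e. one point of the first shake path per frame
    frame_ts = np.asarray(frame_ts, dtype=np.float64)
    gyro_stream = np.asarray(gyro_stream, dtype=np.float64)
    dt = np.diff(frame_ts, prepend=frame_ts[0])           # time step per frame
    return np.cumsum(gyro_stream * dt[:, None], axis=0)   # rectangular integration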
S163, perform a path smoothing process on the first jitter path, and determine a second jitter path.
The second shake path may be considered as a virtual path that the camera is expected to reach after the path smoothing process.
The path smoothing processing optimizes the motion trajectory of the whole image sequence in the video, so that the motion between the images is smoother and better connected. In other words, the path smoothing processing is to obtain, within the anti-shake constraint range, the smoothest path, i.e., the shortest path, from one vertex to another vertex in the curve formed by the acquired gyroscope data, i.e., the first shake path. The path smoothing algorithm may be, for example, low-pass filtering, etc., and this is not limited in any way by the embodiment of the present application.
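Purely as an illustration of one possible smoothing choice mentioned above (low-pass filtering), the following Python sketch applies a moving-average filter to the shake path; the window length and the edge padding are assumptions of this sketch.

import numpy as np

def smooth_path(path, window=15):
    # path: (M, C) shake path, e.g. the cumulative three-axis angles of the first shake path
    path = np.asarray(path, dtype=np.float64)
    kernel = np.ones(window) / window
    # Pad at both ends so the smoothed path keeps the same length as the input.
    padded = np.pad(path, ((window // 2, window - 1 - window // 2), (0, 0)), mode="edge")
    return np.stack(
        [np.convolve(padded[:, c], kernel, mode="valid") for c in range(path.shape[1])],
        axis=1,
    )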
And S164, determining a second gyroscope data stream according to the first gyroscope data stream and the second jitter path.
The second gyroscope data stream includes gyroscope data corresponding to the second shake path, that is, the second gyroscope data stream is gyroscope data corresponding to the virtual pose expected to be reached by the camera after the path smoothing processing. And the second gyroscope data stream corresponds to the first image subjected to the path smoothing one by one.
For example, fig. 7 is a schematic diagram of a jitter path provided in an embodiment of the present application.
As shown in (a) of fig. 7, from the original gyroscope data corresponding to each acquired original image, a shake path of the original video stream can be determined, which may be referred to as an original shake path, and since the amplitude of the camera shake is relatively large, the waveform in the original shake path is relatively steep.
As shown in (b) of fig. 7, if the original gyroscope data of the corresponding blurred frame is replaced based on the original gyroscope data of the reference frame, the original jitter path may also be changed accordingly, so that the jitter path corresponding to the first video stream may be determined according to the first gyroscope data stream, and may be referred to as a first jitter path.
As shown in (c) of fig. 7, after the path smoothing processing is performed on the first shake path, the shake amplitude of the camera becomes small, and here, the shake path corresponding to the path smoothing processing may be referred to as a second shake path, and the waveform of the second shake path becomes more gentle with respect to the first shake path.
It should be understood that, since the first jitter path is subjected to the path smoothing processing, and the second jitter path is changed relative to the first jitter path, the gyroscope data is also changed correspondingly, so that the corresponding gyroscope data after the path smoothing processing can be determined according to the first gyroscope data stream and the second jitter path, and is referred to as the second gyroscope data stream herein.
And S165, determining first homography transformation matrixes respectively corresponding to the first images in the first video stream according to the first gyroscope data stream, the second gyroscope data stream and the first OIS data stream.
The first homography transformation matrix H may be used to represent a rotation relationship between the images, and in this embodiment, may be used to represent a rotation relationship between the first image before the path smoothing processing and the first image after the path smoothing processing, so that the rotation relationship between the two frames of images may be determined by determining the first homography transformation matrix H between the first image and the first image after the path smoothing processing.
Each frame of the first image in the first video stream corresponds to gyroscope data before the path smoothing processing and after the path smoothing processing, and corresponds to OIS data before the path smoothing processing; after the path smoothing processing, the OIS data is taken as (0, 0). Therefore, for each frame of the first image, the first homography transformation matrix H corresponding to the first image can be determined according to the gyroscope data corresponding to the first image before the path smoothing processing, the gyroscope data corresponding to the first image after the path smoothing processing, and the OIS data before the path smoothing processing. By determining the corresponding first homography transformation matrix H, the rotational relationship between the first image and the first image after the path smoothing processing can be determined.
As shown in fig. 8, the determining of the first homography transformation matrix corresponding to the first image of each frame in the first video stream in S165 may include the following S1651 to S1653.
S1651, determining a rotation matrix R corresponding to each frame of the first image according to the first gyroscope data stream and the second gyroscope data stream.
For each frame of the first image, a rotation matrix R corresponding to the first image and the first image after the path smoothing processing may be determined according to the gyroscope data corresponding to the first image in the first gyroscope data stream and the gyroscope data corresponding to the first image after the path smoothing processing in the second gyroscope data stream, where the rotation matrix R is the rotation matrix R corresponding to the first image. Thus, the same number of rotation matrices R may be determined for all first images in the first video stream, wherein the first images and the rotation matrices R have a one-to-one correspondence.
Here, since the gyroscope data is the three-axis angular velocity information, the determined rotation matrix R is a three-dimensional rotation matrix.
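As an illustration of how the rotation matrix R for one frame might be obtained from the two gyroscope data streams, a Python sketch is given below; treating the integrated angle difference as a small rotation vector and converting it with the Rodrigues formula is an assumption of this sketch, since the embodiment does not prescribe a particular parameterization.

import cv2
import numpy as np

def rotation_matrix(angles_real, angles_smoothed):
    # angles_real:     (3,) integrated three-axis angles of one frame, from the first gyroscope data stream
    # angles_smoothed: (3,) integrated three-axis angles of the same frame, from the second gyroscope data stream
    delta = (np.asarray(angles_smoothed, dtype=np.float64)
             - np.asarray(angles_real, dtype=np.float64))
    # Interpret the (small) angle difference as a rotation vector and convert it to a 3x3 matrix.
    R, _ = cv2.Rodrigues(delta.reshape(3, 1))
    return R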
S1652, determining a first camera internal reference matrix K_ois corresponding to each frame of first image according to the first OIS data stream.

The first camera internal reference matrix K_ois represents the camera intrinsic parameter matrix used when the OIS controller is enabled.

For each frame of the first image, the first camera internal reference matrix K_ois corresponding to the first image and the first image after the path smoothing processing may be determined according to the OIS data corresponding to the first image in the first OIS data stream. Thus, the same number of first camera internal reference matrices K_ois can be determined for all first images in the first video stream, wherein the first images and the first camera internal reference matrices K_ois have a one-to-one correspondence.
It should be understood that the OIS data includes the positions of the optical centers of the image sensors on the x-axis and the y-axis, respectively, in the image coordinate system, and displacement data for achieving anti-shake when performing optical anti-shake, for example, when the displacement data is two-dimensional data, the displacement data may include an offset amount in the x-axis direction and an offset amount in the y-axis direction.
Wherein, the first camera internal reference matrix K_ois can be expressed as:

K_{ois} = \begin{bmatrix} f & 0 & center_x - ois_x \\ 0 & f & center_y - ois_y \\ 0 & 0 & 1 \end{bmatrix}

wherein f represents the focal length of the camera; center_x - ois_x represents the coordinate position of the optical center of the image sensor on the x-axis after the shift; center_x represents the position of the optical center of the image sensor on the x-axis; ois_x represents the offset of the optical center of the image sensor on the x-axis; center_y - ois_y represents the coordinate position of the optical center of the image sensor on the y-axis after the shift; center_y represents the position of the optical center of the image sensor on the y-axis; ois_y represents the offset of the optical center of the image sensor on the y-axis.
S1653, according to the rotation matrix and the first camera internal reference matrix, determining the first homography transformation matrix H corresponding to each frame of first image by using the formula H = K R K_{ois}^{-1}.

Wherein H represents the first homography transformation matrix; K represents the standard camera intrinsic parameter matrix; R represents the rotation matrix; K_{ois}^{-1} represents the inverse of the first camera internal reference matrix corresponding to the optical image stabilization controller.
The standard camera intrinsic parameter matrix K may be expressed as:

K = \begin{bmatrix} f & 0 & center_x \\ 0 & f & center_y \\ 0 & 0 & 1 \end{bmatrix}

wherein f represents the focal length of the camera; center_x represents the position of the optical center of the image sensor on the x-axis; center_y represents the position of the optical center of the image sensor on the y-axis.
For each frame of the first image, a first homography transformation matrix H corresponding to the first image and the first image after the path smoothing processing can be determined according to the above formula. Thus, for all first images in the first video stream, a corresponding number of first homography transformation matrices H may be determined, wherein the first images and the first homography transformation matrices H have a one-to-one correspondence.
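To make the formula H = K R K_{ois}^{-1} concrete, a minimal Python sketch is given below; the way the intrinsic matrices are assembled from the focal length, the optical center and the OIS offsets follows the matrices above, while the function and parameter names are assumptions of this sketch.

import numpy as np

def intrinsics(f, cx, cy, ois_x=0.0, ois_y=0.0):
    # With OIS offsets this is K_ois; with zero offsets it is the standard intrinsic matrix K.
    return np.array([[f, 0.0, cx - ois_x],
                     [0.0, f, cy - ois_y],
                     [0.0, 0.0, 1.0]])

def first_homography(R, f, cx, cy, ois_x, ois_y):
    # First homography transformation matrix H = K @ R @ inv(K_ois) for one frame of the first image.
    K = intrinsics(f, cx, cy)                    # after path smoothing, the OIS data is (0, 0)
    K_ois = intrinsics(f, cx, cy, ois_x, ois_y)  # before path smoothing, with the recorded OIS shift
    return K @ R @ np.linalg.inv(K_ois)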
In the following, a flow of determining the translation amount corresponding to the first image will be described in detail with reference to fig. 9.
As shown in fig. 9, S16 may further include S166 to S169.
And S166, performing feature point detection on the first image in the first video stream, and determining first coordinates respectively corresponding to a plurality of feature points in the first image.
It should be understood that feature point detection is a concept in computer vision and image processing; the feature point detection is an arithmetic process performed on an image, and the feature point is a point where the image gradation value changes drastically or a point where the curvature is large on the edge of the image. Such as contour points, bright points in darker areas, dark points in lighter areas, etc.
For example, the algorithm used in the above feature point detection on the image may be: ORB algorithm, SIFT, SURF, etc. Of course, other detection methods may be used, and the embodiment of the present application does not limit this.
Optionally, as an achievable manner, the feature points of the first image may be screened, and some feature points in the image that do not meet the requirement or have a large error are removed, so as to improve the accuracy of the calculated translation amount.
It should also be understood that the first coordinate of the feature point in the first image refers to a coordinate of the feature point in the first image, and the embodiment of the present application does not set any limit to the number of feature points determined in the first image.
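As an illustration of the feature point detection mentioned above, a Python sketch using the ORB algorithm from OpenCV is given below; choosing ORB and the maximum number of feature points are assumptions of this sketch, and any of the other algorithms mentioned above could be used instead.

import cv2

def detect_feature_points(first_image_gray, max_points=500):
    # first_image_gray: one frame of the first image as a grayscale (uint8) array
    orb = cv2.ORB_create(nfeatures=max_points)
    keypoints, descriptors = orb.detectAndCompute(first_image_gray, None)
    # First coordinates of the feature points in the two-dimensional image coordinate system.
    first_coords = [kp.pt for kp in keypoints]
    return first_coords, descriptors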
And S167, transforming the first coordinates corresponding to the plurality of feature points respectively by using the first homography transformation matrix, and determining second coordinates corresponding to the plurality of transformed feature points respectively.
It should be understood that, by multiplying the first coordinates of the feature points of the first image by the first homography transformation matrix, a pose transformation can be performed on the first coordinates according to the camera motion information in the camera coordinate system; in other words, the feature points of the first image before the path smoothing processing can be rotated, according to the rotation relationship, into feature points consistent with the pose of the first image after the path smoothing processing.
The two-dimensional coordinate system may include a two-dimensional image coordinate system, which refers to a two-dimensional coordinate system in units of pixels. For example, the image captured by the camera may be stored in the electronic device as an array, and the value of each element (pixel) in the array is the brightness (gray scale) of the image point; a rectangular coordinate system u-v is defined on the image, and the coordinate (u, v) of each pixel can respectively represent the column number and the row number of the pixel in the array.
The three-dimensional coordinate system may include a three-dimensional camera coordinate system, which refers to a three-dimensional coordinate system with the optical center as an origin.
Since the first coordinates of the feature points in the first image are values in the two-dimensional image coordinate system, the first coordinates may first be multiplied by the inverse K_{ois}^{-1} of the first camera internal reference matrix, which corresponds to transforming them from the two-dimensional image coordinate system to the three-dimensional camera coordinate system; the result is then multiplied by the rotation matrix R, that is, the rotation transformation is performed; finally, the rotated result is multiplied by K, which corresponds to restoring the motion in the three-dimensional camera coordinate system back to the two-dimensional image coordinate system. In this way, new feature points whose rotation relationship has been corrected relative to the feature points before the path smoothing processing can be obtained; the content of the new feature points corresponds to that of the original feature points, but their rotation relationship is consistent with the first image after the path smoothing processing.
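A minimal Python sketch of applying the first homography transformation matrix to the first coordinates is given below; the homogeneous-coordinate handling is standard, and the function name is an assumption of this sketch.

import numpy as np

def transform_points(first_coords, H):
    # first_coords: (N, 2) first coordinates of feature points in one frame of the first image
    # H:            (3, 3) first homography transformation matrix of that frame
    pts = np.hstack([np.asarray(first_coords, dtype=np.float64),
                     np.ones((len(first_coords), 1))])       # to homogeneous coordinates
    mapped = (H @ pts.T).T
    # Back to pixel coordinates; these are the second coordinates of the transformed feature points.
    return mapped[:, :2] / mapped[:, 2:3]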
S168, in the first video stream, except for the first image of the 1 st frame, determining the translation amount corresponding to each frame of the first image according to the second coordinates respectively corresponding to the plurality of feature points in each frame of the first image and the first image of the adjacent previous frame.
It should be understood that, due to the smoothing process of the dithering path performed on the first video stream, the coordinate difference of the first coordinates corresponding to the first images of two adjacent frames is not enough to represent the translation amount of the first images. Therefore, in order to improve the accuracy, the translation amount corresponding to the transformed first image may be determined by using the second coordinates corresponding to the first image transformed according to the first homography transformation matrix after the first video stream is subjected to the smoothing processing of the dithering path.
It should be understood that, when calculating the translation amount corresponding to a frame of the first image, the second coordinate of a feature point in the frame of the first image and the coordinate difference of the second coordinate of a feature point in the previous frame of the first image may be calculated, and each set of coordinate difference may include a difference in the x-axis direction and a difference in the y-axis direction, where the x-axis and the y-axis are perpendicular to each other. Then, the coordinate difference values of the plurality of groups of feature points in the first image and the first image of the previous frame are determined to obtain a plurality of coordinate difference values, and then the average value of the plurality of coordinate difference values is determined as the translation amount corresponding to the first image, where the average value of the plurality of coordinate difference values may include the average value of the difference values in the x-axis direction and the average value of the difference values in the y-axis direction, that is, the translation amount corresponding to the first image includes the average value of the difference values in the x-axis direction and the average value of the difference values in the y-axis direction. The feature points in the first image may be matched with the feature points in the first image of the previous frame, and then the coordinate difference of the second coordinates of the two matched feature points may be determined.
Optionally, as an implementation manner, the coordinate difference values may be filtered to eliminate some abnormal coordinate difference values, and then an average value of the coordinate difference values after elimination is determined to serve as the translation amount corresponding to the first image.
For example, some coordinate differences which are larger than other coordinate differences can be eliminated to reduce errors and improve the accuracy of the obtained translation amount.
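The following Python sketch illustrates the translation amount calculation described above, including the optional elimination of abnormal coordinate differences; the sigma-based rejection rule is an assumption of this sketch, since the embodiment does not specify how the abnormal values are selected.

import numpy as np

def translation_amount(second_coords_cur, second_coords_prev, reject_sigma=2.0):
    # second_coords_cur, second_coords_prev: (N, 2) second coordinates of matched feature point
    # pairs in the current first image and the adjacent previous first image (same row = same match).
    diffs = (np.asarray(second_coords_cur, dtype=np.float64)
             - np.asarray(second_coords_prev, dtype=np.float64))   # per-pair (dx, dy)
    # Discard coordinate differences that deviate too much from the others.
    dist = np.linalg.norm(diffs - diffs.mean(axis=0), axis=1)
    keep = dist <= reject_sigma * (dist.std() + 1e-9)
    # The translation amount is the mean difference in the x-axis and y-axis directions.
    return diffs[keep].mean(axis=0)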
For example, fig. 10 is a schematic diagram for determining a translation amount according to an embodiment of the present application.
(a) of fig. 10 shows a frame of first image provided by the present application; (b) of fig. 10 shows the first image after the path smoothing processing provided by the present application.
A rotation matrix corresponding to the first image and the first image after the path smoothing processing is determined according to the gyroscope data respectively corresponding to the two; a first camera internal reference matrix is further determined according to the OIS data corresponding to the first image; then, using the formula H = K R K_{ois}^{-1}, the first homography transformation matrix H corresponding to the first image and the first image after the path smoothing processing can be determined.
Thus, the first coordinates of the feature point of the first image and the first homography transformation matrix H can be multiplied to obtain the second coordinates shown in (c) of fig. 10. The second coordinates are corrected with respect to the first coordinates in terms of the rotational relationship, and the corrected second coordinates are kept consistent with the rotational relationship of the first image after the path smoothing processing, for example, the vertical directions of the heart-shaped patterns in the two images are kept consistent.
Here, since the feature points in the first image are transformed by the first homography transformation matrix H, the content shown in (c) of fig. 10 is considered to be only a plurality of feature points and not complete image information.
For example, as shown in (a) of fig. 10, the first image includes a feature point c1 and a feature point d1. The first coordinates of the feature point c1 in the first image are (cx1, cy1), and the first coordinates of the feature point d1 in the first image are (dx1, dy1).

The feature point c1 and the feature point d1 are subjected to rotation transformation by using the first homography transformation matrix. For example, as shown in (c) of fig. 10, the feature point c1' is the feature point c1 after the rotation transformation, and the coordinates of the feature point c1' are the second coordinates of the transformed feature point c1; the feature point d1' is the feature point d1 after the rotation transformation, and the coordinates of the feature point d1' are the second coordinates of the transformed feature point d1.
As shown in fig. 10 (d), since the first coordinates of the feature points of the first image of the previous frame are also transformed by the first homography transformation matrix corresponding to the first coordinates, the content shown in fig. 10 (d) may be considered to be only a plurality of feature points included in the first image of the previous frame, and not complete image information.
Then, the feature point c1 ″ of the first image of the previous frame is matched with the feature point c1 of the current first image, so that the coordinate difference value of the second coordinate corresponding to the feature point c1 ″ and the second coordinate corresponding to the feature point c1 of the current first image can be calculated; similarly, the feature point d1 ″ of the first image of the previous frame is matched with the feature point d1 of the current first image, so that the coordinate difference value between the second coordinate corresponding to the feature point d1 ″ and the second coordinate corresponding to the feature point d1 of the current first image can be determined; and determining the average value of the difference values of the two coordinates, and taking the average value as the translation amount corresponding to the first image.
And S169, performing path smoothing treatment on all the translation amounts, and determining a target translation amount corresponding to the first image of each frame.
According to the steps S166 to S168, the translation amount corresponding to each frame of the first image in the first video stream may be determined. Based on this, in order to make the translation transformation of the entire first video stream smoother and free of sticking, the translation amounts corresponding to all the first images may be subjected to path smoothing processing, and the translation amount after the path smoothing processing is determined as the target translation amount corresponding to the first image. The path smoothing processing may be performed in a filtering manner, which is not limited in this embodiment of the present application.
It should be understood that, since the calculated translation amount is calculated based on the first image of frame 1, the first image of frame 1 may be considered not to be translated, or the target translation amount corresponding to the first image of frame 1 may be 0.
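As one possible filtering manner for the above path smoothing of the translation amounts, a Python sketch using a first-order low-pass (exponential) filter is given below; the filter type and the coefficient alpha are assumptions of this sketch.

import numpy as np

def target_translations(translations, alpha=0.2):
    # translations: (M, 2) per-frame translation amounts; the value for the 1st frame is 0.
    t = np.asarray(translations, dtype=np.float64)
    out = np.zeros_like(t)                 # the 1st frame target translation amount stays 0
    for i in range(1, len(t)):
        out[i] = (1.0 - alpha) * out[i - 1] + alpha * t[i]
    return out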
Optionally, as an implementation manner, fig. 11 is a schematic flowchart of another video processing method provided in this embodiment of the present application. As shown in fig. 11, the video processing method 1 may further include the following S18.
And S18, when the reference frame corresponding to the blurred frame is determined in the S14, determining a target scaling factor corresponding to the blurred frame according to the blurred frame and the corresponding reference frame.
Correspondingly, the determination of the target video stream may include: determining the target video stream according to the first video stream, the first homography transformation matrix, the target translation amount and the target scaling factor.
It should be understood that when a blurred frame is replaced by a relatively clear reference frame, a corresponding scaling relationship exists between the reference frame and the blurred frame, and the influence of subsequent electronic image stabilization processing on the scaling relationship is negligible, that is, the scaling relationship can be considered to be not influenced by the electronic image stabilization processing, so that a target scaling factor representing the scaling relationship can be calculated by the blurred frame and the reference frame, and then the replaced reference frame is transformed according to the target scaling factor, so that the scaling relationship of the obtained image is consistent with the blurred frame, and thus, the content of the replaced reference frame is prevented from appearing suddenly between the previous and next image frames, and the whole video stream is more coherent and natural.
The target scaling factor corresponding to the clear frames retained in the first video stream may be determined correspondingly.
Fig. 12 is a schematic flowchart of determining a target scaling factor corresponding to a first image according to an embodiment of the present application. It should be understood that the first image here refers to a first image that serves as a reference frame.
Here, in the above S18, when determining the target scaling factor corresponding to the blurred frame from the blurred frame and the corresponding reference frame, as shown in fig. 12, the following S181 to S188 may be included.
S181, respectively detecting the feature points of the blurred frame and the corresponding reference frame, determining the feature points in the reference frame as reference feature points, and determining the feature points of the blurred frame as original feature points.
It should be understood that since the original video stream includes the blurred frame and the corresponding reference frame, the feature point detection performed on the original video stream is equivalent to the feature point detection performed on both the blurred frame and the corresponding reference frame, and thus the above S181 may also be referred to as performing the feature point detection on the original video stream to determine the feature point of the original image of each frame.
In addition, it should be understood that, if feature point detection is performed on the original video stream first, feature point detection does not need to be performed on the first image in the first video stream any more, because a part of the first image in the first video stream is an original image, and a part of the first image in the first video stream is a reference frame corresponding to the original image. However, if feature point detection is performed on the first image in the first video stream, feature point detection needs to be performed on the blurred frame and the reference frame, because the first video stream does not include the replaced blurred frame.
And S182, matching the characteristic points respectively included in the fuzzy frame and the corresponding reference frame to generate characteristic point pairs. Each pair of feature points includes 1 reference feature point and 1 original feature point.
And S183, determining a second homography transformation matrix according to the fuzzy frame and the corresponding reference frame.
A rotation matrix corresponding to the blurred frame and the reference frame is determined according to the gyroscope data respectively corresponding to the blurred frame and the reference frame, and a first camera internal reference matrix is further determined according to the OIS data respectively corresponding to the blurred frame and the reference frame; then, using the formula H = K R K_{ois}^{-1}, a second homography transformation matrix corresponding to the blurred frame and the reference frame may be determined.
And S184, determining the original coordinates of the original characteristic points in the blurred frame and determining the first reference coordinates of the reference characteristic points in the reference frame for each pair of characteristic points.
And S185, transforming the first reference coordinate of the reference characteristic point by using the second homography transformation matrix, and determining a second reference coordinate corresponding to the transformed reference characteristic point.
And S186, determining a scaling factor by using a least square method according to any two pairs of feature point pairs in the multiple pairs of feature point pairs and based on the original coordinates and the second reference coordinates of each pair of feature point pairs.
S187, repeating S186 a plurality of times, and determining a plurality of scaling factors.
And S188, determining an average value of the multiple scaling factors, and taking the average value of the multiple scaling factors as a target scaling factor corresponding to the reference frame. Since the reference frame is also the first image, the determined target scaling factor is also the target scaling factor of the first image.
For example, assuming that the original coordinates of an original feature point in a blurred frame are (x', y', 1), and the second reference coordinates of the matched reference feature point in the reference frame are (x, y, 1), the two feature points should satisfy a scaling relationship and a translation relationship, which can be expressed together by a 3 × 3 matrix H:

H = \begin{bmatrix} s & 0 & t_x \\ 0 & s & t_y \\ 0 & 0 & 1 \end{bmatrix}

wherein s represents the scaling relationship, that is, s is the scaling factor; t_x represents the difference in the x-axis direction, t_y represents the difference in the y-axis direction, and t_x and t_y represent the translation relationship, that is, t_x and t_y are the translation amounts.
Thus, the following equation (one) may be listed:

\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} s & 0 & t_x \\ 0 & s & t_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}

The original coordinates and the second reference coordinates of any two pairs of feature point pairs are selected from the plurality of pairs of feature points matched between the blurred frame and the reference frame and are substituted into the above equation, so that a set of equations can be obtained:

\begin{bmatrix} x_1' \\ y_1' \\ 1 \end{bmatrix} = \begin{bmatrix} s & 0 & t_x \\ 0 & s & t_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix}, \qquad \begin{bmatrix} x_2' \\ y_2' \\ 1 \end{bmatrix} = \begin{bmatrix} s & 0 & t_x \\ 0 & s & t_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_2 \\ y_2 \\ 1 \end{bmatrix}

wherein (x_1', y_1', 1) are the original coordinates of one original feature point in the blurred frame, (x_1, y_1, 1) are the second reference coordinates of the matched reference feature point in the reference frame, and the original feature point and the reference feature point form a pair of matched feature points; (x_2', y_2', 1) are the original coordinates of another original feature point in the blurred frame, (x_2, y_2, 1) are the second reference coordinates of the matched reference feature point in the reference frame, and the original feature point and the reference feature point form another pair of matched feature points.
By expanding the above set of equations, the following equations can be obtained:

x_1' = s x_1 + t_x
y_1' = s y_1 + t_y
x_2' = s x_2 + t_x
y_2' = s y_2 + t_y
Further, equation (two) can be derived:

\begin{bmatrix} x_1' \\ y_1' \\ x_2' \\ y_2' \end{bmatrix} = \begin{bmatrix} x_1 & 1 & 0 \\ y_1 & 0 & 1 \\ x_2 & 1 & 0 \\ y_2 & 0 & 1 \end{bmatrix} \begin{bmatrix} s \\ t_x \\ t_y \end{bmatrix}

Suppose that \begin{bmatrix} x_1' \\ y_1' \\ x_2' \\ y_2' \end{bmatrix} is denoted by B, \begin{bmatrix} x_1 & 1 & 0 \\ y_1 & 0 & 1 \\ x_2 & 1 & 0 \\ y_2 & 0 & 1 \end{bmatrix} is denoted by A, and \begin{bmatrix} s \\ t_x \\ t_y \end{bmatrix} is denoted by M.

Thus, the above equation (two) can be simplified to B = AM.

Then, the matrix M representing the scaling relationship and the translation relationship can be solved by M = (A^T A)^{-1} A^T B.
Here, since the translation amount calculated in the above manner is not accurate enough, only the scaling factor may be retained, in other words, the target scaling factor s representing the scaling relationship may be calculated only in the above manner.
The above steps are repeatedly executed: each time, any two pairs of feature point pairs are selected from the pairs of feature points matched between the blurred frame and the reference frame, and a corresponding scaling factor is determined in the above manner; a plurality of repetitions yield a plurality of scaling factors, and the average value of the plurality of scaling factors is then taken as the target scaling factor corresponding to the reference frame (which is a first image in the first video stream).
Optionally, as an implementation manner, the scaling factors may be filtered to remove some abnormal scaling factors, and then an average value of the removed multiple other scaling factors is determined as the target scaling factor.
For example, some scaling factors with a larger difference from other scaling factors may be eliminated to reduce errors and improve the accuracy of the obtained target scaling factor.
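The least-squares solution and the averaging over repeated samples described above can be sketched in Python as follows; the number of repetitions, the random sampling of pairs and the seed are assumptions of this sketch.

import numpy as np

def scaling_factor(pair1, pair2):
    # Each pair is ((x', y'), (x, y)): the original coordinates in the blurred frame and the
    # second reference coordinates in the reference frame of one matched feature point.
    (x1p, y1p), (x1, y1) = pair1
    (x2p, y2p), (x2, y2) = pair2
    A = np.array([[x1, 1.0, 0.0],
                  [y1, 0.0, 1.0],
                  [x2, 1.0, 0.0],
                  [y2, 0.0, 1.0]])
    B = np.array([x1p, y1p, x2p, y2p])
    M = np.linalg.solve(A.T @ A, A.T @ B)   # M = (A^T A)^-1 A^T B = [s, t_x, t_y]
    return M[0]                             # only the scaling factor s is retained

def target_scaling_factor(pairs, trials=50, seed=0):
    # pairs: list of matched feature point pairs of one blurred frame and its reference frame.
    rng = np.random.default_rng(seed)
    factors = []
    for _ in range(trials):
        i, j = rng.choice(len(pairs), size=2, replace=False)
        factors.append(scaling_factor(pairs[i], pairs[j]))
    return float(np.mean(factors))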
Based on the above, after the target scaling factor is determined, for each frame of the first image in the first video stream, the target image corresponding to the first image may be determined according to the first homography transformation matrix corresponding to the first image, the target translation amount, and the target scaling factor.
And according to the target image corresponding to the first image of each frame, forming a target video stream. Compared with the original image stream, the definition of the target video stream is improved, the optical image stabilization processing and the electronic image stabilization processing are combined, and the anti-shake quality is high.
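To illustrate how the first homography transformation matrix, the target translation amount and the target scaling factor might be applied jointly to one frame of the first image, a Python sketch is given below; combining them into a single warp and scaling about the image center are assumptions of this sketch, since the embodiment only specifies that the three relationships are applied.

import cv2
import numpy as np

def warp_to_target(first_image, H, target_translation, target_scale):
    # first_image:        one frame of the first image (H x W [x C] array)
    # H:                  (3, 3) first homography transformation matrix (rotation correction)
    # target_translation: (t_x, t_y) target translation amount of this frame
    # target_scale:       target scaling factor s of this frame
    h, w = first_image.shape[:2]
    cx, cy = w / 2.0, h / 2.0
    tx, ty = target_translation
    s = target_scale
    # Scale about the image center, then translate by the target translation amount.
    S = np.array([[s, 0.0, (1.0 - s) * cx + tx],
                  [0.0, s, (1.0 - s) * cy + ty],
                  [0.0, 0.0, 1.0]])
    return cv2.warpPerspective(first_image, S @ H, (w, h))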
Illustratively, as shown in fig. 13, fig. 13 is a schematic diagram of continuing to determine the target image in the target video stream based on fig. 10.
As shown in (d) of fig. 13, a 9th frame original image provided by the present application is shown, in which the heart-shaped pattern is blurred. The calculated fuzziness of the 9th frame original image is greater than the fuzziness threshold, so the 9th frame original image is regarded as a blurred frame and needs to be subjected to deblurring processing. Then, a corresponding reference frame, for example the 10th frame original image shown in (a) of fig. 13, is determined within the preset frame number range corresponding to the 9th frame original image.
The blurred frame is replaced with the reference frame to generate the first video stream, and the reference frame is a first image in the first video stream. The path smoothing processing is performed on the first image, and the obtained first image after the path smoothing processing is as shown in (b) of fig. 13. According to the first image and the first image after the path smoothing processing, the first homography transformation matrix H corresponding to the first image can be determined.
Thus, the first coordinates of the feature points of the first image and the first homography transformation matrix H can be multiplied to obtain the second coordinates as shown in (c) of fig. 13. The second coordinates are corrected with respect to the first coordinates in terms of the rotational relationship, and the corrected second coordinates coincide with the rotational relationship of the first image after the path smoothing processing, for example, the vertical directions of the heart-shaped patterns in the two figures coincide with each other.
Then, a target translation amount may be determined according to the first coordinate and the second coordinate of the plurality of feature points, and a target scaling factor may be determined according to the 9 th original image and the first image, and then the first image is transformed according to the first homography transformation matrix H, the target scaling factor, and the target translation amount obtained above, so as to obtain a target image as shown in (e) in fig. 13. The rotation, scaling and translation relationships of the target image are corrected relative to the first image. It will be appreciated that the heart-shaped pattern in the target image is sharper than that of the original image of frame 9, due to the replacement with a relatively sharp reference frame, combined with the anti-shake processing.
The above operation is performed on each frame of original image in the original video stream, so that a target image corresponding to each frame of original image can be obtained, and a target video stream can be formed according to all target images. The target video stream may be stored in the electronic device or displayed on a display screen included in the display device, which is not limited in this embodiment of the present application.
The embodiment of the application provides a video processing method, which comprises the steps of: determining a blurred frame in an original video stream and a relatively clear reference frame in an adjacent range of the blurred frame, and replacing the blurred frame with the reference frame to generate a first video stream; performing path smoothing on the first video stream, determining the rotation relationship between the first video stream and the first video stream after the path smoothing, and rotating the feature points of the first images of the first video stream according to the rotation relationship; then determining a translation relationship according to the coordinates of the feature points of the first images in the first video stream and the coordinates of the feature points after the rotation correction, and determining a scaling relationship by using the original video stream and the first video stream, so that the first video stream can be processed, according to the rotation relationship, the translation relationship and the scaling relationship, into a target video stream which is consistent with the scaling relationship of the original video stream and is combined with the anti-shake processing. Because the blurred frame in the original video stream is replaced by a clear frame, the definition of the first video stream can be improved; and because the image pose transformation relationship in the first video stream is combined with the anti-shake processing, the inter-frame definition of the anti-shake video can be improved while visual consistency is maintained, thereby further improving the user experience.
Fig. 14 shows an effect diagram of a video processing method provided by the embodiment of the application.
As shown in fig. 14 (a), the image frame is blurred by 1 frame in the video. As shown in fig. 14 (b), the image shown in fig. 14 (a) is processed by the video processing method provided in the embodiment of the present application, and then the determined target image has higher definition and better anti-shake effect.
The embodiment of the application provides a video processing method, which comprises the steps of determining a relatively clear reference frame corresponding to an original image in an original video stream, and then replacing the original image with the reference frame to generate a first video stream; and performing electronic image stabilization processing on the first video stream to generate a target video stream. Because the fuzzy frame in the original video stream is replaced by the relatively clear reference frame, the definition of the target video stream can be improved, and because the pose transformation relation of the image in the target video stream is processed by electronic image stabilization, the inter-frame definition of the anti-shake video can be improved, the visual continuity is kept, and the user experience can be improved.
It should be understood that the above illustrations are for the purpose of assisting persons skilled in the art in understanding the embodiments of the application, and are not intended to limit the embodiments of the application to the specific values or specific scenarios illustrated. It will be apparent to those skilled in the art from the foregoing description that various equivalent modifications or changes may be made, and such modifications or changes are intended to fall within the scope of the embodiments of the present application.
The video processing method according to the embodiment of the present application is described in detail with reference to fig. 2 to 14, and a hardware system, an apparatus, and a chip of an electronic device to which the present application is applicable are described in detail with reference to fig. 15 to 17. It should be understood that, the hardware system, the apparatus, and the chip in the embodiment of the present application may execute various video processing methods in the foregoing embodiments of the present application, that is, specific working processes of various products below may refer to corresponding processes in the foregoing method embodiments.
The video processing method provided by the embodiment of the application can be applied to various electronic devices, and correspondingly, the video processing device provided by the embodiment of the application can be electronic devices in various forms.
In some embodiments of the present application, the electronic device may be a single lens reflex camera, a card machine, or other various image capturing devices, a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an Augmented Reality (AR)/Virtual Reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a Personal Digital Assistant (PDA), or other devices or devices capable of performing image processing, and the embodiments of the present application are not limited to the specific type of the electronic device.
Taking an electronic device as a mobile phone as an example, fig. 15 shows a schematic structural diagram of an electronic device 100 according to an embodiment of the present application.
The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a Universal Serial Bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a Subscriber Identity Module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
The configuration shown in fig. 15 is not intended to specifically limit the electronic device 100. In other embodiments of the present application, electronic device 100 may include more or fewer components than those shown in FIG. 15, or electronic device 100 may include a combination of some of the components shown in FIG. 15, or electronic device 100 may include sub-components of some of the components shown in FIG. 15. The components shown in fig. 15 may be implemented in hardware, software, or a combination of software and hardware.
Processor 110 may include one or more processing units, such as: the processor 110 may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a neural-Network Processing Unit (NPU), etc. The different processing units may be separate devices or may be integrated into one or more processors.
The controller may be, among other things, a neural center and a command center of the electronic device 100. The controller can generate an operation control signal according to the instruction operation code and the time sequence signal to finish the control of instruction fetching and instruction execution.
A memory may also be provided in processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that have just been used or recycled by the processor 110. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Avoiding repeated accesses reduces the latency of the processor 110, thereby increasing the efficiency of the system.
In an embodiment of the present application, the processor 110 may execute software codes of the video processing method provided in the embodiment of the present application, so as to capture a video with higher definition.
The connection relationship between the modules shown in fig. 15 is only for illustrative purposes and does not limit the connection relationship between the modules of the electronic apparatus 100. Alternatively, the modules of the electronic device 100 may also adopt a combination of the connection manners in the above embodiments.
The wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the electronic device 100 may be used to cover a single or multiple communication bands. Different antennas can also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The electronic device 100 may implement a display function through the GPU, the display screen 194, and the application processor. The GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
For example, in the embodiment of the present application, determining the reference frame corresponding to the nth original image may be performed in the processor 110; replacing the original image of the nth frame by using the reference frame to generate a first video stream; and performing electronic image stabilization processing on the first video stream to generate a target video stream.
The display screen 194 may be used to display images or video.
Illustratively, the display screen 194 may be used to display the target video stream.
The electronic device 100 may implement a photographing function through the ISP, the camera 193, the video codec, the GPU, the display screen 194, the application processor, and the like.
The ISP is used to process the data fed back by the camera 193. For example, when a user takes a picture, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, an optical signal is converted into an electric signal, and the camera photosensitive element transmits the electric signal to the ISP for processing and converting into an image visible to the naked eye. The ISP can optimize the algorithm of the noise, brightness and color of the image, and can also optimize the parameters of exposure, color temperature and the like of the shooting scene. In some embodiments, the ISP may be provided in camera 193.
The camera 193 is used to capture images or video. The starting can be triggered through an application program instruction, so that the shooting function is realized, such as shooting and acquiring the video stream of any scene. The camera may include an imaging lens, a filter, an image sensor, and the like. Light rays emitted or reflected by the object enter the imaging lens, pass through the optical filter and finally converge on the image sensor. The image sensor is mainly used for converging and imaging light emitted or reflected by all objects (also called as a scene to be shot, a target scene, and also understood as a scene image expected to be shot by a user) in a shooting visual angle; the optical filter is mainly used for filtering unnecessary light waves (such as light waves except visible light, such as infrared) in light; the image sensor is mainly used for performing photoelectric conversion on the received optical signal, converting the optical signal into an electrical signal, and inputting the electrical signal to the processor 130 for subsequent processing. The cameras 193 may be located in front of the electronic device 100, or in back of the electronic device 100, and the specific number and arrangement of the cameras may be set according to requirements, which is not limited in this application.
Illustratively, in embodiments of the present application, the camera 193 may acquire a video stream, which includes a plurality of frames of original images.
The digital signal processor is used for processing digital signals, and can process other digital signals besides digital image signals. For example, when the electronic device 100 selects a frequency bin, the digital signal processor is used to perform fourier transform or the like on the frequency bin energy.
Video codecs are used to compress or decompress digital video. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 may play or record video in a variety of encoding formats, such as: moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, and MPEG4.
The gyro sensor 180B may be used to determine the motion attitude of the electronic device 100. In some embodiments, the angular velocity of electronic device 100 about three axes (i.e., the x-axis, y-axis, and z-axis) may be determined by gyroscope sensor 180B. The gyro sensor 180B may be used for photographing anti-shake. For example, when the shutter is pressed, the gyro sensor 180B detects a shake angle of the electronic device 100, calculates a distance to be compensated for by the lens module according to the shake angle, and allows the lens to counteract the shake of the electronic device 100 by a reverse movement, thereby achieving anti-shake. The gyro sensor 180B can also be used in scenes such as navigation and motion sensing games.
For example, in the embodiment of the present application, the gyroscope sensor 180B may be used to collect angle information, and the angle information may be used to determine the degree of blur corresponding to the original image.
Acceleration sensor 180E may detect the magnitude of acceleration of electronic device 100 in various directions, typically the x-axis, y-axis, and z-axis. The magnitude and direction of gravity may be detected when the electronic device 100 is stationary. The acceleration sensor 180E may also be used to identify the attitude of the electronic device 100 as an input parameter for applications such as horizontal and vertical screen switching and pedometers.
The distance sensor 180F is used to measure a distance. The electronic device 100 may measure the distance by infrared or laser. In some embodiments, for example in a shooting scene, the electronic device 100 may utilize the range sensor 180F to range for fast focus.
The ambient light sensor 180L is used to sense the ambient light level. Electronic device 100 may adaptively adjust the brightness of display screen 194 based on the perceived ambient light level. The ambient light sensor 180L may also be used to automatically adjust the white balance when taking a picture. The ambient light sensor 180L may also cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in a pocket to prevent accidental touches.
The fingerprint sensor 180H is used to collect a fingerprint. The electronic device 100 can utilize the collected fingerprint characteristics to implement functions such as unlocking, accessing an application lock, taking a picture, and answering an incoming call.
The touch sensor 180K is also referred to as a touch device. The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, which is also referred to as a touch screen. The touch sensor 180K is used to detect a touch operation applied thereto or in the vicinity thereof. The touch sensor 180K may pass the detected touch operation to the application processor to determine the touch event type. Visual output associated with the touch operation may be provided through the display screen 194. In other embodiments, the touch sensor 180K may be disposed on a surface of the electronic device 100 and at a different location than the display screen 194.
It is to be understood that the illustrated structure of the embodiment of the present application does not specifically limit the electronic device 100. In other embodiments of the present application, the electronic device 100 may include more or fewer components than shown, or combine certain components, or split certain components, or arrange different components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Fig. 16 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present application. As shown in fig. 16, the video processing apparatus 200 includes an acquisition module 210 and a processing module 220.
The video processing apparatus 200 may perform the following scheme:
the obtaining module 210 is configured to obtain a video stream, where the video stream includes multiple frames of original images.
The processing module 220 is used for determining a reference frame corresponding to the nth frame original image from the plurality of frames of original images, wherein n is more than or equal to 1, n is a positive integer, and the reference frame is clearer than the nth frame original image;
the processing module 220 is further configured to replace the nth original image with the reference frame to generate a first video stream; and then, carrying out electronic image stabilization processing on the first video stream to generate a target video stream.
The video processing apparatus 200 is embodied in the form of functional modules. The term "module" herein may be implemented in software and/or hardware, and is not particularly limited thereto.
For example, a "module" may be a software program, a hardware circuit, or a combination of both that implements the functionality described above. The hardware circuitry may include an Application Specific Integrated Circuit (ASIC), an electronic circuit, a processor (e.g., a shared, dedicated, or group processor) and memory that execute one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that support the described functionality.
Accordingly, the modules of the examples described in the embodiments of the present application can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The embodiment of the application also provides another electronic device which comprises a camera module, a processor and a memory.
The camera module is used for acquiring a video stream, and the video stream comprises a plurality of frames of original images.
The memory is used for storing a computer program operable on the processor.
The processor is used for executing the processing steps in the foregoing video processing method.
Optionally, the camera module may include at least one of a wide-angle camera, a main camera, and a tele camera.
The embodiment of the application also provides a computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions; when the computer instructions are run on a video processing apparatus, the video processing apparatus is caused to perform the method described above. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, or Digital Subscriber Line (DSL)) or a wireless manner (e.g., infrared, radio, or microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a server or a data center, integrating one or more available media. The available media may be magnetic media (e.g., a floppy disk, a hard disk, or a magnetic tape), optical media, or semiconductor media (e.g., a Solid State Disk (SSD)), among others.
Embodiments of the present application further provide a computer program product containing computer instructions which, when run on a video processing apparatus, enable the video processing apparatus to execute the foregoing technical solution.
Fig. 17 is a schematic structural diagram of a chip according to an embodiment of the present application. The chip shown in fig. 17 may be a general-purpose processor or a dedicated processor. The chip includes a processor 301. The processor 301 is configured to support the video processing apparatus to implement the foregoing technical solution.
Optionally, the chip further includes a transceiver 302, where the transceiver 302 is configured to be controlled by the processor 301 and to support the communication device in executing the foregoing technical solution.
Optionally, the chip shown in fig. 17 may further include: a storage medium 303.
It should be noted that the chip shown in fig. 17 can be implemented by using the following circuits or devices: one or more Field Programmable Gate Arrays (FPGAs), Programmable Logic Devices (PLDs), controllers, state machines, gate logic, discrete hardware components, any other suitable circuitry, or any combination of circuitry capable of performing the various functions described throughout this application.
The electronic device, the video processing apparatus, the computer storage medium, the computer program product, and the chip provided in the embodiments of the present application are all configured to execute the method provided above. Therefore, for the beneficial effects achievable by them, reference may be made to the beneficial effects corresponding to the method provided above, and details are not repeated herein.
It should be understood that the above description is only intended to help those skilled in the art better understand the embodiments of the present application, and is not intended to limit the scope of the embodiments of the present application. It will be apparent to those skilled in the art that various equivalent modifications or variations can be made in light of the examples given above; for example, in the various embodiments of the above method, some steps may be unnecessary or new steps may be added, or any two or more of the above embodiments may be combined. Such modifications, variations, or combinations also fall within the scope of the embodiments of the present application.
It should also be understood that the foregoing descriptions of the embodiments of the present application focus on the differences between the embodiments; for the parts that are the same or similar and are not mentioned, the embodiments may be referred to one another, and for brevity, details are not repeated herein.
It should also be understood that the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by the function and the inherent logic thereof, and should not constitute any limitation to the implementation process of the embodiments of the present application.
It should also be understood that, in this embodiment of the present application, "preset" and "predefined" may be implemented by saving a corresponding code, table, or other means that can be used to indicate relevant information in advance in a device (for example, including an electronic device), and this application is not limited to the specific implementation manner thereof.
It should also be understood that the division into manners, cases, categories, and embodiments herein is merely for convenience of description and should not be construed as a particular limitation, and the features of the various manners, categories, cases, and embodiments may be combined with one another where no contradiction arises.
It should also be understood that the terms and/or descriptions of the various embodiments herein are consistent with, and may be referred to by, one another unless a specific statement or a logical conflict exists, and the technical features of the various embodiments may be combined to form new embodiments based on their inherent logical relationships.
Finally, it should be noted that: the above description is only an embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions within the technical scope disclosed in the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (12)

1. A video processing method applied to an electronic device, the method comprising:
acquiring an original video stream, wherein the original video stream comprises a plurality of frames of original images;
determining a reference frame corresponding to the nth frame original image from the multiple frames of original images, wherein n is greater than or equal to 1, n is a positive integer, and the reference frame is clearer than the nth frame original image;
replacing the original image of the nth frame with the reference frame to generate a first video stream;
and carrying out electronic image stabilization processing on the first video stream to generate a target video stream.
2. The video processing method of claim 1, wherein performing electronic image stabilization on the first video stream to generate a target video stream comprises:
performing electronic image stabilization on the first video stream, and determining a first homography transformation matrix and a target translation amount which respectively correspond to a first image in the first video stream, wherein the first homography transformation matrix is used for representing an image rotation relation, and the target translation amount is used for representing an image translation relation;
and generating the target video stream according to the first video stream, the first homography transformation matrix and the target translation amount.
3. The video processing method of claim 2, wherein performing electronic image stabilization on the first video stream to generate a target video stream, further comprises:
determining target scaling factors respectively corresponding to the first images in the first video stream according to the original video stream and the first video stream, wherein the target scaling factors are used for representing image scaling relations;
wherein generating the target video stream according to the first video stream, the first homography transformation matrix, and the target translation amount comprises:
and generating the target video stream according to the first video stream, the first homography transformation matrix, the target translation amount and the target scaling factor.
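As a non-limiting illustration of claims 2 and 3, the following sketch applies a first homography transformation matrix, a target translation amount, and a target scaling factor to one first image. Composing the three into a single 3x3 warp, and scaling about the image centre, are assumptions made only for this sketch; the claims do not prescribe how the three quantities are combined.

    import cv2
    import numpy as np

    def warp_first_image(img, H, t, s):
        # img: one frame of the first image; H: 3x3 first homography transformation matrix
        # t: (tx, ty) target translation amount; s: target scaling factor
        h, w = img.shape[:2]
        cx, cy = w / 2.0, h / 2.0
        S = np.array([[s, 0.0, cx * (1 - s)],
                      [0.0, s, cy * (1 - s)],
                      [0.0, 0.0, 1.0]])            # scale about the image centre (assumed)
        T = np.array([[1.0, 0.0, t[0]],
                      [0.0, 1.0, t[1]],
                      [0.0, 0.0, 1.0]])            # target translation
        return cv2.warpPerspective(img, T @ S @ np.asarray(H, dtype=np.float64), (w, h))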
4. The video processing method according to claim 2 or 3, wherein the electronic device comprises a gyro sensor and an Optical Image Stabilization (OIS) controller;
performing electronic image stabilization on the first video stream, and determining first homography transformation matrices respectively corresponding to first images in the first video stream, including:
acquiring a first gyroscope data stream by using the gyroscope sensor, and acquiring a first OIS data stream by using the OIS controller; gyroscope data in the first gyroscope data stream corresponds to the first image in the first video stream one to one, and OIS data in the first OIS data stream corresponds to the first image in the first video stream one to one;
determining a first jitter path according to the first gyroscope data stream;
performing path smoothing processing on the first jitter path to determine a second jitter path;
determining a second gyroscope data stream according to the first gyroscope data stream and the second jitter path;
and determining first homography transformation matrices respectively corresponding to the first images in the first video stream according to the first gyroscope data stream, the second gyroscope data stream and the first OIS data stream.
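A minimal sketch of the path-smoothing chain in claim 4, assuming a moving-average filter for the path smoothing and a simple integration/differentiation model for relating gyroscope data to the jitter path; the claim itself fixes neither choice.

    import numpy as np

    def gyro_paths(first_gyro, dt, win=15):
        # first_gyro: (N, 3) gyroscope samples, one per first image; dt: sampling interval
        first_path = np.cumsum(first_gyro * dt, axis=0)                  # first jitter path
        kernel = np.ones(win) / win
        second_path = np.stack([np.convolve(first_path[:, i], kernel, mode="same")
                                for i in range(3)], axis=1)              # smoothed second jitter path
        second_gyro = np.gradient(second_path, dt, axis=0)               # one reading of the second gyroscope data stream
        return first_path, second_path, second_gyro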
5. The method of claim 4, wherein determining, according to the first gyroscope data stream, the second gyroscope data stream, and the first OIS data stream, first homography transformation matrices respectively corresponding to the first images in the first video stream comprises:
determining a rotation matrix corresponding to each frame of the first image according to the first gyroscope data stream and the second gyroscope data stream;
determining a first camera internal reference matrix corresponding to each frame of the first image according to the first OIS data stream, wherein the first camera internal reference matrix is used for indicating the corresponding camera internal reference matrix when the OIS controller is started;
determining, according to the rotation matrix and the first camera internal reference matrix, the first homography transformation matrix corresponding to each frame of the first image by using the formula H = K · R · K_ois^(-1);
wherein H represents the first homography transformation matrix, K represents a standard camera internal reference matrix, R represents the rotation matrix, and K_ois^(-1) represents the inverse of the first camera internal reference matrix.
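A sketch of the per-frame homography of claim 5. The way the OIS data are turned into a principal-point offset of the first camera internal reference matrix is an assumption made only for illustration; the claim requires only that K_ois correspond to the intrinsics in effect while the OIS controller is enabled.

    import numpy as np

    def first_homography(K, R, ois_offset):
        # K: standard camera internal reference matrix; R: rotation matrix derived from the
        # first and second gyroscope data streams; ois_offset: (dx, dy) taken from the OIS data
        K_ois = K.copy()
        K_ois[0, 2] += ois_offset[0]              # assumed: OIS shifts the principal point
        K_ois[1, 2] += ois_offset[1]
        return K @ R @ np.linalg.inv(K_ois)       # H = K · R · K_ois^(-1)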
6. The video processing method according to any one of claims 2 to 5, wherein performing electronic image stabilization on the first video stream, and determining target translation amounts respectively corresponding to the first images in the first video stream, comprises:
detecting feature points of the first image in the first video stream, and determining first coordinates corresponding to a plurality of feature points in the first image;
transforming the first coordinates corresponding to the plurality of feature points respectively by using the first homography transformation matrix, and determining second coordinates corresponding to the plurality of transformed feature points respectively;
in the first video stream, for each frame of the first image other than the 1st frame of the first image, determining a translation amount corresponding to the frame according to the second coordinates respectively corresponding to the plurality of feature points in the frame and in the adjacent previous frame of the first image;
and performing path smoothing processing on all the translation amounts, and determining the target translation amounts respectively corresponding to the first images.
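A sketch of the translation step in claim 6. Tracking matched feature points with sparse optical flow is an assumption of this sketch; the claim only requires second coordinates of feature points in adjacent first images, each obtained by transforming first coordinates with that frame's first homography transformation matrix.

    import cv2
    import numpy as np

    def frame_translation(prev_gray, cur_gray, H_prev, H_cur):
        # first coordinates of feature points detected in the previous first image
        pts_prev = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                           qualityLevel=0.01, minDistance=8)
        # matched first coordinates in the current first image (assumed tracking method)
        pts_cur, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray, pts_prev, None)
        good = status.ravel() == 1
        # second coordinates: first coordinates transformed by each frame's first homography
        second_prev = cv2.perspectiveTransform(pts_prev[good], np.asarray(H_prev, np.float64)).reshape(-1, 2)
        second_cur = cv2.perspectiveTransform(pts_cur[good], np.asarray(H_cur, np.float64)).reshape(-1, 2)
        return second_cur.mean(axis=0) - second_prev.mean(axis=0)   # translation amount, later path-smoothed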
7. The method according to any of claims 3 to 6, wherein determining the target scaling factors corresponding to the first images in the first video stream according to the original video stream and the first video stream comprises:
performing feature point detection on the nth frame original image and the corresponding reference frame respectively, wherein the feature points detected in the reference frame are reference feature points and the feature points detected in the nth frame original image are original feature points;
matching the feature points detected in the nth frame original image with the feature points detected in the corresponding reference frame to determine a plurality of feature point pairs, wherein each feature point pair comprises 1 reference feature point and 1 original feature point;
and determining the target scaling factor according to the plurality of feature point pairs.
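One way to obtain the feature point pairs of claim 7, sketched with ORB descriptors and brute-force Hamming matching; the claim does not prescribe a particular detector or matcher, so both are assumptions of this sketch.

    import cv2

    def match_feature_pairs(orig_gray, ref_gray, max_pairs=100):
        # orig_gray: nth frame original image; ref_gray: its reference frame (grayscale)
        orb = cv2.ORB_create(500)
        kp_o, des_o = orb.detectAndCompute(orig_gray, None)
        kp_r, des_r = orb.detectAndCompute(ref_gray, None)
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = sorted(matcher.match(des_o, des_r), key=lambda m: m.distance)[:max_pairs]
        original_pts = [kp_o[m.queryIdx].pt for m in matches]   # original feature points
        reference_pts = [kp_r[m.trainIdx].pt for m in matches]  # reference feature points
        return original_pts, reference_pts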
8. The video processing method of claim 7, wherein determining the target scaling factor based on the plurality of feature point pairs comprises:
determining a second homography transformation matrix according to the nth frame original image and the corresponding reference frame;
for each feature point pair, determining the original coordinates of the original feature point in the nth frame original image and the first reference coordinates of the reference feature point in the reference frame;
transforming the first reference coordinates of the reference feature point by using the second homography transformation matrix to determine the second reference coordinates of the transformed reference feature point;
for any two feature point pairs among the plurality of feature point pairs, determining 1 scaling factor by a least squares method based on the original coordinates and the second reference coordinates of each of the two feature point pairs;
repeating the above step a plurality of times to determine a plurality of scaling factors;
and determining an average of the plurality of scaling factors as the target scaling factor.
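A sketch of the scaling-factor estimate in claim 8, under two assumptions: random pairs of feature point pairs are drawn, and for two points the least-squares fit is simplified to a ratio of inter-point distances. Averaging the repeated estimates gives the target scaling factor.

    import numpy as np

    def target_scale(original_coords, second_reference_coords, trials=50, seed=0):
        # original_coords, second_reference_coords: (N, 2) matched coordinates, the latter
        # already transformed by the second homography transformation matrix
        rng = np.random.default_rng(seed)
        scales = []
        for _ in range(trials):
            i, j = rng.choice(len(original_coords), size=2, replace=False)
            d_orig = np.linalg.norm(original_coords[i] - original_coords[j])
            d_ref = np.linalg.norm(second_reference_coords[i] - second_reference_coords[j])
            if d_ref > 1e-6:
                scales.append(d_orig / d_ref)       # one scaling factor per sampled pair of pairs
        return float(np.mean(scales))               # average as the target scaling factor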
9. The video processing method according to any of claims 1 to 8, wherein the method further comprises:
determining whether the nth frame original image in the original video stream is a clear frame or a fuzzy frame;
if the nth frame original image is a fuzzy frame, determining the reference frame corresponding to the nth frame original image from the other frames of original images, other than the nth frame original image, within a preset frame number range;
wherein the preset frame number range comprises the (n-k)th frame original image to the (n+k)th frame original image, the reference frame is the clearest frame within the preset frame number range, k is greater than or equal to 1, and k is a positive integer.
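A sketch of the frame selection in claim 9, assuming variance of the Laplacian as the sharpness measure (the claim does not prescribe how a clear or fuzzy frame is decided), an arbitrary threshold, and k = 2.

    import cv2

    def reference_index(gray_frames, n, k=2, blur_thresh=100.0):
        def sharpness(g):
            return cv2.Laplacian(g, cv2.CV_64F).var()         # assumed sharpness measure
        if sharpness(gray_frames[n]) >= blur_thresh:
            return n                                          # clear frame: no replacement needed
        lo, hi = max(0, n - k), min(len(gray_frames) - 1, n + k)
        candidates = [i for i in range(lo, hi + 1) if i != n]
        return max(candidates, key=lambda i: sharpness(gray_frames[i]))  # clearest frame in the range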
10. An electronic device is characterized by comprising a camera module, a processor and a memory;
the camera module is used for collecting a video stream, and the video stream comprises a plurality of frames of original images;
the memory for storing a computer program operable on the processor;
the processor, for performing the processing steps in the video processing method according to any one of claims 1 to 9.
11. A chip, comprising: a processor for calling and running a computer program from a memory so that a device on which the chip is installed performs the video processing method according to any one of claims 1 to 9.
12. A computer-readable storage medium, characterized in that it stores a computer program comprising program instructions which, when executed by a processor, cause the processor to carry out the video processing method according to any one of claims 1 to 9.
CN202210334216.6A 2022-03-31 2022-03-31 Video processing method and related equipment thereof Active CN115546042B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210334216.6A CN115546042B (en) 2022-03-31 2022-03-31 Video processing method and related equipment thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210334216.6A CN115546042B (en) 2022-03-31 2022-03-31 Video processing method and related equipment thereof

Publications (2)

Publication Number Publication Date
CN115546042A true CN115546042A (en) 2022-12-30
CN115546042B CN115546042B (en) 2023-09-29

Family

ID=84723413

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210334216.6A Active CN115546042B (en) 2022-03-31 2022-03-31 Video processing method and related equipment thereof

Country Status (1)

Country Link
CN (1) CN115546042B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080225127A1 (en) * 2007-03-12 2008-09-18 Samsung Electronics Co., Ltd. Digital image stabilization method for correcting horizontal inclination distortion and vertical scaling distortion
US20120176492A1 (en) * 2011-01-11 2012-07-12 Qualcomm Incorporated Camera-based inertial sensor alignment for pnd
CN107241544A (en) * 2016-03-28 2017-10-10 展讯通信(天津)有限公司 Video image stabilization method, device and camera shooting terminal
WO2018223381A1 (en) * 2017-06-09 2018-12-13 厦门美图之家科技有限公司 Video shake-prevention method and mobile device
CN111275626A (en) * 2018-12-05 2020-06-12 深圳市炜博科技有限公司 Video deblurring method, device and equipment based on ambiguity
CN113269682A (en) * 2021-04-21 2021-08-17 青岛海纳云科技控股有限公司 Non-uniform motion blur video restoration method combined with interframe information

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SHUAICHENG LIU: "DeepOIS: Gyroscope-Guided Deep Optical Image Stabilizer Compensation", IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 5 *
WANG Qicheng; GUO Weihua: "Application of telephoto optical image stabilization lenses in railway video surveillance", China Railway, no. 10 *

Also Published As

Publication number Publication date
CN115546042B (en) 2023-09-29

Similar Documents

Publication Publication Date Title
CN107948519B (en) Image processing method, device and equipment
CN108111749B (en) Image processing method and device
JP5917054B2 (en) Imaging apparatus, image data processing method, and program
CN110572584B (en) Image processing method, image processing device, storage medium and electronic equipment
WO2020037959A1 (en) Image processing method, image processing apparatus, electronic device and storage medium
JP5766077B2 (en) Image processing apparatus and image processing method for noise reduction
CN114339102B (en) Video recording method and equipment
CN110266954A (en) Image processing method, device, storage medium and electronic equipment
CN113542600B (en) Image generation method, device, chip, terminal and storage medium
CN110445986A (en) Image processing method, device, storage medium and electronic equipment
CN110198419A (en) Image processing method, device, storage medium and electronic equipment
CN108574803B (en) Image selection method and device, storage medium and electronic equipment
CN110740266B (en) Image frame selection method and device, storage medium and electronic equipment
CN115701125A (en) Image anti-shake method and electronic equipment
CN110264420B (en) Image processing method and device based on multi-frame images
CN115546043B (en) Video processing method and related equipment thereof
EP3267675B1 (en) Terminal device and photographing method
CN108520036B (en) Image selection method and device, storage medium and electronic equipment
CN115546042B (en) Video processing method and related equipment thereof
WO2023124202A1 (en) Image processing method and electronic device
CN115633262A (en) Image processing method and electronic device
CN110266967A (en) Image processing method, device, storage medium and electronic equipment
CN114339101B (en) Video recording method and equipment
CN115134532A (en) Image processing method, image processing device, storage medium and electronic equipment
CN115835034A (en) White balance processing method and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant