CN114339102A - Video recording method and device - Google Patents


Info

Publication number
CN114339102A
Authority: CN (China)
Prior art keywords: image, camera, original, translation, target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011057718.6A
Other languages
Chinese (zh)
Other versions
CN114339102B (en)
Inventor
孙思佳
朱聪超
王宇
卢圣卿
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202011057718.6A
Publication of CN114339102A
Application granted
Publication of CN114339102B
Legal status: Active
Anticipated expiration

Landscapes

  • Studio Devices (AREA)

Abstract

Embodiments of this application provide a video recording method and device, relating to the field of electronic technology. In a video recording scenario, anti-shake processing can be performed by combining rotation information of a camera with target translation information of the camera obtained from image content, improving the image-stabilization effect of video images and the user's shooting experience. The specific scheme is as follows: after a video recording function is enabled, an electronic device captures original images; the device obtains target translation information of the camera from the image information of multiple frames of captured original images; it obtains rotation information of the camera from attitude-sensor data corresponding to the multiple frames of original images; it calculates an image-stabilization transformation matrix for a first original image, which is one of the multiple frames of original images, from the camera's target translation information and rotation information; and it applies an image transformation to the first original image according to the image-stabilization transformation matrix to obtain a target image. Embodiments of this application are used for video anti-shake.

Description

Video recording method and device
Technical Field
Embodiments of this application relate to the field of electronic technology, and in particular to a video recording method and a video recording device.
Background
With the development of photography technology, users' expectations for video recording quality keep rising. During video recording, hand shake by the user or shaking of the electronic device easily sets the device in motion, so the captured images jitter. Based on gyroscope (gyro) data, the electronic device can remove the inter-frame image jitter caused by the camera's rotational motion. However, anti-shake processing that relies on gyroscope data alone gives poor results and a poor shooting experience.
Disclosure of Invention
Embodiments of this application provide a video recording method and device that, in a video recording scenario, perform anti-shake processing by combining rotation information of the camera with target translation information of the camera obtained from image content. This reduces image jitter caused by the user's hand shake or by shaking of the electronic device, improves the image-stabilization effect of video images, and improves the user's shooting experience.
In order to achieve the above purpose, the embodiment of the present application adopts the following technical solutions:
In one aspect, an embodiment of this application provides a video recording method applied to an electronic device that includes a camera. The method may include: after the video recording function is enabled, the electronic device captures original images. The electronic device obtains target translation information of the camera from the image information of multiple frames of captured original images. The electronic device then obtains rotation information of the camera from attitude-sensor data corresponding to the multiple frames of original images. The electronic device calculates an image-stabilization transformation matrix for a first original image, which is one of the multiple frames of original images, from the camera's target translation information and rotation information. Finally, the electronic device applies an image transformation to the first original image according to the image-stabilization transformation matrix to obtain a target image.
In this scheme, the electronic device can perform anti-shake processing in a video recording scenario by combining the camera's rotation information with target translation information obtained from image content, reducing image jitter caused by the user's hand shake or by shaking of the electronic device, improving the image-stabilization effect of video images, and improving the user's shooting experience.
In one possible design, capturing original images after the video recording function is enabled may include: the electronic device enables the video recording function and, after detecting a shooting operation by the user, captures original images. The method may further include: after detecting that the user stops the shooting operation, the electronic device generates a video file from the video images, where each video image is a target image.
That is, during video shooting the electronic device can perform anti-shake processing on the original images by combining the camera's rotation information with target translation information obtained from image content, so as to generate video images, thereby reducing image jitter caused by the user's hand shake or by shaking of the electronic device, improving the image-stabilization effect of the video images, and improving the user's shooting experience.
In another possible design, capturing original images after the video recording function is enabled may include: the electronic device enables the video recording function and, after detecting a shooting operation by the user, captures original images. The method may further include: the electronic device displays a recorded image on the shooting interface, where the recorded image is a target image.
That is, during video shooting the electronic device can perform anti-shake processing on the original images by combining the camera's rotation information with target translation information obtained from image content, so as to generate recorded images, thereby reducing image jitter caused by the user's hand shake or by shaking of the electronic device, improving the image-stabilization effect of the recorded images on the shooting interface, and improving the user's shooting experience.
In another possible design, the target image is a preview image, and the method may further include: the electronic device displays the preview image on the preview interface.
That is, during video shooting the electronic device can perform anti-shake processing on the original images by combining the camera's rotation information with target translation information obtained from image content, so as to generate preview images, thereby reducing image jitter caused by the user's hand shake or by shaking of the electronic device, improving the image-stabilization effect of the preview images on the preview interface, and improving the user's shooting experience.
In another possible design, the target translation information of the camera is a target translation curve of the camera. Obtaining the target translation information from the image information of the captured multiple frames of original images may include: for every two adjacent frames among the multiple frames of original images, the electronic device obtains a target translation vector from the image information of those two frames. Connecting the target translation vectors across consecutive frames forms the camera's original translation curve, and the electronic device derives the camera's target translation curve from the original translation curve.
That is, the target translation curve represents the translation of the camera. The electronic device can obtain the camera's original translation curve from the image information of the original images and then generate the target translation curve from it.
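As one concrete illustration of how a target (smoothed) translation curve might be derived from the original translation curve described above, the sketch below smooths a one-dimensional curve with a centered moving average. The patent does not specify the smoothing method; the function name, window size, and data here are hypothetical.

```python
def smooth_translation_curve(original_curve, window=5):
    """Derive a target translation curve from the original curve with a
    centered moving average (one possible smoothing; illustrative only)."""
    n = len(original_curve)
    target = []
    for i in range(n):
        # Clamp the window at the ends of the curve.
        lo = max(0, i - window // 2)
        hi = min(n, i + window // 2 + 1)
        seg = original_curve[lo:hi]
        target.append(sum(seg) / len(seg))
    return target

# A jittery 1-D translation curve becomes smoother after filtering.
curve = [0.0, 2.0, -1.0, 3.0, 0.0, 2.0, -1.0]
smoothed = smooth_translation_curve(curve)
```

The smoothed curve keeps the same length as the original, so each original frame still has a corresponding target translation value.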
In another possible design, obtaining the target translation vector for two adjacent frames of original images from their image information may include: the electronic device calculates first translation vectors from feature points on the two adjacent frames, and obtains the target translation vector for the two frames from the first translation vectors.
In this way, the electronic device can obtain the first translation vectors from the image information of the original images, obtain the target translation vector from them, and then build the camera's original translation curve from the target translation vectors.
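The first translation vectors described above can be pictured as per-pair displacements between matched feature points on two adjacent frames. The sketch below assumes the feature matching has already been done by some tracker; all names and coordinates are illustrative, not from the patent.

```python
def first_translation_vectors(points_prev, points_curr):
    """Per-pair displacements between matched feature points on two
    adjacent original frames (the 'first translation vectors').
    points_prev[i] and points_curr[i] are assumed to be a matched pair."""
    return [(x2 - x1, y2 - y1)
            for (x1, y1), (x2, y2) in zip(points_prev, points_curr)]

# Two matched feature-point pairs between adjacent frames (pixel coords).
prev_pts = [(10.0, 20.0), (30.0, 40.0)]
curr_pts = [(12.0, 19.0), (33.0, 41.0)]
vectors = first_translation_vectors(prev_pts, curr_pts)
```

Each vector describes how one feature point moved between the two frames; later steps filter these down to a single target translation vector.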
In another possible design, obtaining the target translation vector for two adjacent frames of original images from the first translation vectors may include: the electronic device calculates a second translation vector from the motion-sensor data corresponding to the two adjacent frames. The electronic device then selects, from the first translation vectors, third translation vectors that lie within a delta neighborhood of the second translation vector, and obtains the target translation vector for the two frames from the third translation vectors.
In this way, the electronic device filters the first translation vectors down to third translation vectors that are close to the second translation vector. This removes most first translation vectors that correspond to mismatched feature-point pairs, as well as those that correspond to the local motion of a moving subject, so the selected third translation vectors more accurately represent the overall translation of the camera caused by the user's hand shake or by shaking of the electronic device.
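The delta-neighborhood filtering step might be sketched as follows, assuming a sensor-derived second translation vector already expressed in pixel units; the Euclidean distance metric and all names are assumptions for illustration.

```python
import math

def filter_by_delta_neighborhood(first_vectors, second_vector, delta):
    """Keep the 'third translation vectors': first translation vectors
    lying within the delta neighborhood of the motion-sensor-derived
    second translation vector."""
    sx, sy = second_vector
    return [(vx, vy) for (vx, vy) in first_vectors
            if math.hypot(vx - sx, vy - sy) <= delta]

# The last vector is far from the sensor estimate (a mismatch or a
# moving subject) and gets filtered out.
vecs = [(2.0, 0.0), (2.1, 0.1), (15.0, -4.0)]
third = filter_by_delta_neighborhood(vecs, second_vector=(2.0, 0.0), delta=1.0)
```

The sensor vector acts only as a coarse gate here; the surviving vectors still come from the image content itself.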
In another possible design, obtaining the target translation vector for two adjacent frames of original images from the third translation vectors may include: the electronic device selects, from the third translation vectors, fourth translation vectors whose similarity is greater than or equal to a preset value, and obtains the target translation vector for the two frames from the feature points corresponding to the fourth translation vectors.
In this way, filtering the third translation vectors into fourth translation vectors removes outliers and translation vectors corresponding to the local motion of a moving subject that happens to resemble translation in the target direction, so the fourth translation vectors more accurately represent the translation between the two adjacent frames of original images. The electronic device can then obtain the target translation vector from these more accurate fourth translation vectors.
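As one illustration of the similarity filtering, the sketch below scores each third translation vector against the group's median vector using a negative-exponential distance. The patent does not fix the similarity measure, so this metric, the preset value, and all names are assumptions.

```python
import math
from statistics import median

def filter_by_similarity(third_vectors, preset_value):
    """Keep the 'fourth translation vectors': those whose similarity to
    the group's median vector is >= preset_value. The similarity measure
    (exp of negative distance) is an assumption, not from the patent."""
    mx = median(v[0] for v in third_vectors)
    my = median(v[1] for v in third_vectors)

    def similarity(v):
        return math.exp(-math.hypot(v[0] - mx, v[1] - my))

    return [v for v in third_vectors if similarity(v) >= preset_value]

# The last vector deviates from the consensus and is dropped.
third = [(2.0, 0.0), (2.2, 0.1), (6.0, 3.0)]
fourth = filter_by_similarity(third, preset_value=0.5)
```

Using the median rather than the mean keeps a single large outlier from dragging the consensus toward itself.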
In another possible design, calculating the image-stabilization transformation matrix of the first original image from the camera's target translation information and rotation information may include: the electronic device calculates a translation compensation amount for the first original image from the target translation information; calculates a rotation compensation amount for the first original image from the rotation information; and then calculates the image-stabilization transformation matrix of the first original image from the translation compensation amount and the rotation compensation amount.
In this way, the electronic device can build the image-stabilization transformation matrix from the translation and rotation compensation amounts, and use it to warp the original image and compensate for motion.
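A minimal 2-D sketch of composing an image-stabilization transform from a translation compensation amount and an in-plane rotation compensation amount follows. A real implementation would work with the camera intrinsics and full 3-D rotations from the attitude sensor; those are omitted here, and all names are illustrative.

```python
import math

def stabilization_matrix(translation_comp, rotation_comp_rad):
    """Compose a 3x3 homogeneous transform from a translation compensation
    (tx, ty) in pixels and an in-plane rotation compensation angle in
    radians. Simplified 2-D sketch; not the patent's exact formulation."""
    tx, ty = translation_comp
    c, s = math.cos(rotation_comp_rad), math.sin(rotation_comp_rad)
    # Rotation followed by translation: M = T @ R.
    return [[c, -s, tx],
            [s,  c, ty],
            [0.0, 0.0, 1.0]]

def warp_point(matrix, point):
    """Apply the homogeneous transform to a single pixel coordinate."""
    x, y = point
    xh = matrix[0][0] * x + matrix[0][1] * y + matrix[0][2]
    yh = matrix[1][0] * x + matrix[1][1] * y + matrix[1][2]
    return (xh, yh)

M = stabilization_matrix((5.0, -3.0), 0.0)
corner = warp_point(M, (100.0, 100.0))
```

Warping every pixel (or the frame's corner grid) through such a matrix is the "image transformation" that produces the target image.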
In another possible design, the method may further include: if the preset conditions are met, the electronic equipment obtains rotation information of the camera according to the attitude sensor data corresponding to the multiple frames of original images; the electronic equipment calculates a stable image transformation matrix of the first original image according to the rotation information of the camera; and carrying out image transformation on the first original image according to the image stabilization transformation matrix to obtain a target image.
That is, the electronic device may exit the five-axis anti-shake mode after a certain condition is satisfied, so that the anti-shake processing is performed according to the rotation information of the camera, and the anti-shake processing does not need to be performed in combination with the target translation information of the camera obtained according to the image content.
In another possible design, the method may further include: if the preset condition is satisfied, prompting the user that the target anti-shake mode has been exited.
This makes it easy for the user to know whether the device is currently in the target anti-shake mode, which may be a five-axis anti-shake mode.
In another possible design, the preset condition includes any of the following: the number of feature points on the two adjacent frames of original images is less than or equal to a second preset value; the proportion of third translation vectors among the first translation vectors for the two adjacent frames is less than or equal to a third preset value; the proportion of fourth translation vectors among the third translation vectors for the two adjacent frames is less than or equal to a fourth preset value; the variance of the translation compensation amounts across P consecutive frames of original images is greater than or equal to a fifth preset value, where P is an integer greater than 1; or the translation amplitude across Q consecutive frames of original images is greater than or equal to a sixth preset value, where Q is an integer greater than 1.
When a preset condition is satisfied, the target translation information determined from the image information is unreliable, or the image cannot or need not be anti-shake processed using it, so the electronic device determines the image-stabilization transformation matrix from the rotation information alone.
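The preset conditions above amount to a simple disjunction. In the sketch below, all threshold values are placeholder stand-ins for the second to sixth preset values, and the argument names are illustrative.

```python
def should_exit_target_mode(n_points, third_ratio, fourth_ratio,
                            comp_variance, trans_amplitude, th):
    """Return True if any preset condition holds, i.e. the image-based
    translation information is deemed unreliable. The 'th' thresholds
    are hypothetical stand-ins for the second to sixth preset values."""
    return (n_points <= th['points']                # too few feature points
            or third_ratio <= th['third']          # few vectors near sensor estimate
            or fourth_ratio <= th['fourth']        # little consensus among vectors
            or comp_variance >= th['variance']     # unstable translation compensation
            or trans_amplitude >= th['amplitude']) # translation too large

th = {'points': 10, 'third': 0.3, 'fourth': 0.3,
      'variance': 4.0, 'amplitude': 50.0}
stay = should_exit_target_mode(100, 0.8, 0.9, 1.0, 5.0, th)   # all healthy
exit_mode = should_exit_target_mode(5, 0.8, 0.9, 1.0, 5.0, th)  # too few points
```

When the check fires, the device falls back to rotation-only anti-shake, as described above.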
In another possible design, obtaining the camera's rotation information from the attitude-sensor data corresponding to the multiple frames of original images may include: the electronic device obtains the rotation information from the attitude-sensor data corresponding to N frames of original images, where N = N1 + I + N2, N is an integer greater than 1, N1 and I are positive integers, and N2 is a non-negative integer. Calculating the image-stabilization transformation matrix of the first original image from the rotation information then includes: the electronic device calculates the image-stabilization transformation matrices of I frames of original images according to the camera target poses, in the camera's rotation information, corresponding to the N frames of original images; each of the I frames is a first original image, its image-stabilization transformation matrix is used to obtain a target image, and the initial frame of the I frames corresponds to the (N1 + 1)-th frame among the N frames of original images.
That is, the electronic device can calculate the camera pose and image-stabilization transformation matrix for a given original image from the original images both before and after it, so that the camera poses corresponding to different original images change more smoothly, improving the image-stabilization effect.
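Reading the window decomposition as N = N1 + I + N2 (N1 look-back frames, I output frames, N2 look-ahead frames), the index bookkeeping might look like this; the function name and 0-based indexing are illustrative, not from the patent.

```python
def window_layout(n1, i, n2):
    """Index layout for a window of N = N1 + I + N2 original frames.
    The first output frame is the (N1+1)-th frame of the window, i.e.
    0-based index n1; the i output frames follow consecutively."""
    n = n1 + i + n2
    first_output = n1  # 0-based index of the (N1+1)-th frame
    output_range = list(range(first_output, first_output + i))
    return n, output_range

# 3 look-back frames, 2 output frames, 3 look-ahead frames.
n, out_idx = window_layout(n1=3, i=2, n2=3)
```

Setting n2 = 0 removes the look-ahead, which matches the real-time preview and recording case described below.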
In another possible design, when the target image is a preview image or a recorded image, N2 is 0.
Thus, on the preview interface and the shooting interface, the electronic device need not wait for original images that come after a given original image in order to calculate its camera pose and image-stabilization transformation matrix, so the preview image and recorded image corresponding to that original image can be processed and displayed in real time.
In another aspect, an embodiment of this application provides a shooting apparatus included in an electronic device. The apparatus has the function of implementing the behavior of the electronic device in any of the above aspects and possible designs, so that the electronic device performs the video recording method of any possible design of the above aspects. The function can be implemented in hardware, or in hardware executing corresponding software. The hardware or software includes at least one module or unit corresponding to the above function. For example, the apparatus may include a capture unit, an obtaining unit, a processing unit, and so on.
In another aspect, an embodiment of the present application provides an electronic device, including: the camera comprises a camera, and the camera is used for acquiring images; a screen for displaying an interface, one or more processors; and a memory having code stored therein. When executed by an electronic device, cause the electronic device to perform the video recording method performed by the electronic device in any of the possible designs of the above aspects.
In another aspect, an embodiment of the present application provides an electronic device, including: one or more processors; and a memory having code stored therein. When executed by an electronic device, cause the electronic device to perform the video recording method performed by the electronic device in any of the possible designs of the above aspects.
In yet another aspect, an embodiment of the present application provides a computer-readable storage medium, which includes computer instructions, when the computer instructions are executed on an electronic device, cause the electronic device to perform the video recording method in any one of the possible designs of the foregoing aspect.
In yet another aspect, the present application provides a computer program product, which when run on a computer, causes the computer to execute the video recording method executed by an electronic device in any one of the possible designs of the above aspect.
In another aspect, an embodiment of the present application provides a chip system, which is applied to an electronic device. The chip system includes one or more interface circuits and one or more processors; the interface circuit and the processor are interconnected through a line; the interface circuit is used for receiving signals from a memory of the electronic equipment and sending the signals to the processor, and the signals comprise computer instructions stored in the memory; the computer instructions, when executed by the processor, cause the electronic device to perform the video recording method of any of the possible designs of the above aspects.
For the advantageous effects of the other aspects, reference may be made to the description of the advantageous effects of the method aspects, which is not repeated herein.
Drawings
Fig. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure;
fig. 2 is a flowchart of a video recording method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a set of interfaces provided by an embodiment of the present application;
FIG. 4A is a diagram illustrating a translation vector filtering according to an embodiment of the present disclosure;
fig. 4B is a flowchart of calculating a translational compensation amount according to an embodiment of the present application;
FIG. 5 is a schematic diagram illustrating an effect of a constraint provided in an embodiment of the present application;
fig. 6 is a schematic timing relationship diagram of an original image and a preview image according to an embodiment of the present disclosure;
FIG. 7A is a schematic diagram of a set of preview interfaces provided by an embodiment of the present application;
FIG. 7B is a schematic diagram of another set of preview interfaces provided by an embodiment of the present application;
FIG. 7C is a schematic diagram of another set of preview interfaces provided by an embodiment of the present application;
FIG. 8A is a schematic diagram of a set of video images provided by an embodiment of the present application;
FIG. 8B is a schematic diagram of another set of video images according to an embodiment of the present application;
FIG. 8C is a schematic diagram of another set of video images provided by an embodiment of the present application;
fig. 9 is a schematic diagram of a prompt interface provided in an embodiment of the present application;
FIG. 10 is a flowchart of another video recording method according to an embodiment of the present application;
fig. 11 is a flowchart of another video recording method according to an embodiment of the present application;
FIG. 12 is a schematic view of another interface provided by an embodiment of the present application;
fig. 13 is a flowchart of another video recording method according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of this application are described below with reference to the accompanying drawings. In the description of these embodiments, "/" means "or" unless otherwise specified; for example, A/B may mean A or B. "And/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A alone, both A and B, or B alone. In addition, in the description of the embodiments of this application, "a plurality" means two or more.
In the following, the terms "first", "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present embodiment, "a plurality" means two or more unless otherwise specified.
In the embodiments of the present application, words such as "exemplary" or "for example" are used to mean serving as examples, illustrations or descriptions. Any embodiment or design described herein as "exemplary" or "e.g.," is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "such as" is intended to present concepts related in a concrete fashion.
When an electronic device is used to record a video, hand shake by the user or shaking of the electronic device easily sets the device and its camera module (hereinafter, the camera) in motion, causing image jitter and blur. The motion may include translational motion and rotational motion. In particular, when the subject is close to the camera or a telephoto camera is used for shooting, the translational motion of the electronic device and camera is more pronounced and has a greater influence on image jitter.
In one prior technical scheme, the electronic device performs image anti-shake processing from gyroscope data alone. This can remove inter-frame image jitter caused by the camera's rotational motion, but it is difficult to remove inter-frame jitter caused by translational motion, so the anti-shake effect and the user's shooting experience are poor.
Embodiments of this application provide a video recording method and device applicable to an electronic device. In a video recording scenario, anti-shake processing can be performed by combining rotation information of the camera, obtained from data of an attitude sensor such as a gyroscope, with target translation information of the camera obtained from image content, reducing image jitter caused by the user's hand shake or by shaking of the electronic device, improving the image-stabilization effect of video images, and improving the user's shooting experience.
In embodiments of this application, the target translation information of the camera that the electronic device obtains from image content represents the translation of the camera caused by the user's hand shake or by shaking of the electronic device; it can represent the global translation trend between adjacent original images, but it does not represent the local relative translation of a moving subject across adjacent original images.
The video recording method provided by the embodiments of this application can be used in rear-camera video recording scenarios as well as front-camera scenarios, without limitation. After the shooting function is enabled, the electronic device can perform anti-shake processing by combining the camera's rotation information with target translation information of the camera obtained from image content.
For example, the electronic device may be a mobile phone, a tablet computer, a wearable device (e.g., a smart watch), an in-vehicle device, an Augmented Reality (AR)/Virtual Reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a Personal Digital Assistant (PDA), or other mobile terminals, or may be a professional camera or other devices.
Fig. 1 shows a schematic structural diagram of an electronic device 100. The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a Universal Serial Bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a key 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a Subscriber Identification Module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It is to be understood that the illustrated structure of the embodiment of the present application does not specifically limit the electronic device 100. In other embodiments of the present application, electronic device 100 may include more or fewer components than shown, or some components may be combined, some components may be split, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 110 may include one or more processing units, such as: the processor 110 may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a memory, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a neural-Network Processing Unit (NPU), etc. The different processing units may be separate devices or may be integrated into one or more processors.
The controller may be, among other things, a neural center and a command center of the electronic device 100. The controller can generate an operation control signal according to the instruction operation code and the timing signal to complete the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache. This memory may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs the instruction or data again, it can be fetched directly from this memory, avoiding repeated accesses, reducing the waiting time of the processor 110, and improving system efficiency.
The electronic device 100 implements display functions via the GPU, the display screen 194, and the application processor. The GPU is a microprocessor for image processing, and is connected to the display screen 194 and an application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
The display screen 194 is used to display images, video, and the like. The display screen 194 includes a display panel. The display panel may adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the electronic device 100 may include 1 or R display screens 194, R being a positive integer greater than 1. In an embodiment of the present application, the display screen 194 may be used to display a preview interface, a shooting interface, and the like in a video recording mode.
The electronic device 100 may implement a shooting function through the ISP, the camera 193, the video codec, the GPU, the display 194, the application processor, and the like.
The ISP is used to process the data fed back by the camera 193. For example, when a photo is taken, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing and converting into an image visible to naked eyes. The ISP can also carry out algorithm optimization on the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in camera 193.
The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image to the photosensitive element. The photosensitive element may be a Charge Coupled Device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The light sensing element converts the optical signal into an electrical signal, which is then passed to the ISP where it is converted into a digital image signal. And the ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into image signal in standard RGB, YUV and other formats. In some embodiments, the electronic device 100 may include 1 or L cameras 193, L being a positive integer greater than 1.
The camera 193 may include cameras with different focal lengths, for example, an ultra-wide-angle camera, a wide-angle camera, and a telephoto camera whose equivalent focal lengths increase from small to large. A camera with a smaller equivalent focal length has a larger field of view and may be used to shoot larger scenes such as landscapes. A camera with a larger equivalent focal length has a smaller field of view and may be used to shoot distant objects within a smaller shooting range.
In addition, the camera 193 may further include a depth camera for measuring an object distance of an object to be photographed, and other cameras. For example, the depth camera may include a three-dimensional (3D) depth-sensing camera, a time of flight (TOF) depth camera, a binocular depth camera, or the like.
The digital signal processor is used to process digital signals, and may process digital image signals as well as other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform Fourier transform or the like on the frequency point energy.
Video codecs are used to compress or decompress digital video. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 may play or record video in a variety of encoding formats, such as: moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, MPEG4, and the like.
The NPU is a neural-network (NN) computing processor that processes input information quickly by using a biological neural network structure, for example, by using a transfer mode between neurons of a human brain, and can also learn by itself continuously. Applications such as intelligent recognition of the electronic device 100 can be realized through the NPU, for example: image recognition, face recognition, speech recognition, text understanding, pattern recognition, machine self-learning, and the like.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. The processor 110 executes various functional applications of the electronic device 100 and data processing by executing instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area. The storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required by at least one function, and the like. The storage data area may store data (such as audio data, phone book, etc.) created during use of the electronic device 100, and the like. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (UFS), and the like.
In the embodiment of the present application, the processor 110 obtains target translation information of the camera according to a translation vector obtained by image information and a translation vector obtained by a motion sensor in a video recording scene by operating an instruction stored in the internal memory 121, and performs anti-shake processing by combining rotation information of the camera and the target translation information of the camera, so as to improve an image stabilization effect of a video image and improve a user shooting experience.
The gyro sensor 180B may be used to determine the motion attitude of the electronic device 100. In some embodiments, the angular velocity of the electronic device 100 about three axes (i.e., the x, y, and z axes) may be determined by the gyro sensor 180B; that is, the rotation information of the electronic device 100, which is also the rotation information of the camera, may be obtained. The gyro sensor 180B may be used for shooting anti-shake. For example, when the shutter is pressed, the gyro sensor 180B detects the shake angle of the electronic device 100, calculates the distance to be compensated for by the lens module according to the shake angle, and allows the lens to counteract the shake of the electronic device 100 through a reverse movement, thereby achieving anti-shake. The gyro sensor 180B may also be used in navigation and somatosensory gaming scenarios. Here, since the camera (module) is fixed to the electronic device 100, the rotation information of the electronic device 100 can be understood as the rotation information of the camera. Moreover, the camera (module) includes the camera 193, and thus the rotation information of the camera can also be understood as the rotation information of the camera 193.
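As a sketch of how rotation information can be derived from gyroscope data, the snippet below integrates angular-velocity samples into per-axis rotation angles. It is an illustration under stated assumptions, not the patent's implementation: the function name is invented, and the per-axis trapezoidal integration ignores the non-commutativity of 3D rotations, which is acceptable only for the small inter-frame angles typical of hand shake.

```python
import numpy as np

def integrate_gyro(angular_velocity, timestamps):
    """Integrate gyroscope angular-velocity samples (rad/s, shape (n, 3)
    for the x/y/z axes) into cumulative per-axis rotation angles (rad).

    Simplification: independent trapezoidal integration per axis, valid
    only for small inter-frame rotations."""
    w = np.asarray(angular_velocity, dtype=float)
    t = np.asarray(timestamps, dtype=float)
    dt = np.diff(t)                               # (n-1,) sample spacing
    mid = 0.5 * (w[1:] + w[:-1])                  # trapezoidal rule
    return np.vstack([np.zeros(3), np.cumsum(mid * dt[:, None], axis=0)])

# Constant 0.1 rad/s about the z axis for 1 s -> about 0.1 rad about z.
w = np.tile([0.0, 0.0, 0.1], (11, 1))
t = np.linspace(0.0, 1.0, 11)
angles = integrate_gyro(w, t)
print(angles[-1])   # ≈ [0.  0.  0.1]
```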
The acceleration sensor 180E may detect the magnitude of acceleration of the electronic device 100 in various directions (typically along three axes). The magnitude and direction of gravity can be detected when the electronic device 100 is stationary. It may also be used to recognize the attitude of the electronic device, and is applied to landscape/portrait switching, pedometers, and other applications. In an embodiment of the present application, the acceleration sensor 180E may be used to obtain a translation vector of the electronic device 100, that is, a translation vector of the camera, so as to subsequently calculate the target translation information of the camera.
The distance sensor 180F is configured to measure a distance. The electronic device 100 may measure the distance by infrared or laser. In some embodiments, in a shooting scene, the electronic device 100 may use the distance sensor 180F to measure distance for fast focusing.
The touch sensor 180K is also referred to as a "touch panel". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, which is also called a "touch screen". The touch sensor 180K is used to detect a touch operation applied thereto or nearby. The touch sensor can communicate the detected touch operation to the application processor to determine the touch event type. Visual output associated with the touch operation may be provided through the display screen 194. In other embodiments, the touch sensor 180K may be disposed on a surface of the electronic device 100, different from the position of the display screen 194.
In the embodiment of the present application, in a video recording scene, the camera 193 may be used to acquire an image; the display screen 194 may be used to display a preview interface, a capture interface, and the like; motion sensors such as acceleration sensor 180E (also called an accelerometer) may be used to obtain translation vectors of the camera; attitude sensors such as a gyro sensor 180B (also referred to as a gyroscope) may be used to obtain rotation information of the camera; the processor 110 may obtain target translation information of the camera according to a translation vector obtained by image information and a translation vector obtained by the motion sensor in a video recording scene by operating an instruction stored in the internal memory 121, and perform anti-shake processing by combining rotation information of the camera and the target translation information of the camera, thereby improving an image stabilization effect of a video image and improving a user shooting experience.
It is understood that the attitude sensor is used to detect the attitude of the electronic device 100, and the rotation information of the electronic device 100 can be obtained. For example, the attitude sensor may be a gyroscope, a three-axis electronic compass, or the like, and the type of the attitude sensor is not limited in the embodiments of the present application.
In the anti-shake processing described in the embodiment of the present application, the mobile phone may perform image anti-shake processing in combination with the target translation information of the camera and the rotation information of the camera to improve the image stabilization effect of the preview image, and thus the adopted anti-shake mode may be referred to as a translation anti-shake mode.
In the anti-shake processing process described in the embodiment of the application, the mobile phone may perform image anti-shake processing by combining the two-axis target translation information and the three-axis rotation information to improve the image stabilization effect of the video image, and thus the adopted anti-shake mode may also be referred to as a five-axis anti-shake mode.
The following describes a video recording method provided in an embodiment of the present application, taking an example in which an electronic device is a mobile phone having a structure shown in fig. 1 and an attitude sensor is a gyroscope. As shown in fig. 2, the video recording method may include:
200. The mobile phone starts the shooting function, then enters the video recording mode, and collects original images at a preset frame rate in the preview state.
When a user wants to use the mobile phone to shoot an image, the shooting function of the mobile phone can be started. For example, the mobile phone may start a camera application, or start another application with a photographing or video recording function (such as Douyin, or an AR application such as Cyberverse), so as to start the shooting function of the mobile phone.
In some embodiments, the video recording method provided by the embodiment of the present application may be applied to a video recording mode, and the mobile phone may enter the video recording mode after starting a shooting function, so as to perform anti-shake processing in the video recording mode by combining rotation information of the camera obtained according to gyroscope data and target translation information of the camera obtained according to image content.
For example, after detecting an operation of clicking the camera icon 301 shown in (a) of fig. 3 by the user, the mobile phone starts a shooting function of the camera application, and displays a preview interface in a shooting mode as shown in (b) of fig. 3. After detecting that the user clicks the control 302 shown in (b) in fig. 3, the mobile phone enters the video recording mode as shown in (c) in fig. 3.
As another example, while displaying the desktop or an interface of a non-camera application, the mobile phone starts the shooting function and enters the video recording mode shown in (c) of fig. 3 after detecting a voice instruction of the user indicating to record a video.
It should be noted that the mobile phone may also enter the video recording mode in response to other operations such as a user's touch operation, a voice instruction, or a shortcut gesture, and the operation of triggering the mobile phone to enter the video recording mode is not limited in the embodiment of the present application.
In other embodiments, in the video recording mode, the mobile phone does not automatically perform anti-shake processing in combination with the rotation information of the camera obtained according to the gyroscope data and the target translation information of the camera obtained according to the image content; and after the mobile phone detects the preset operation 1 of the user, starting anti-shake processing by combining the rotation information of the camera obtained according to the gyroscope data and the target translation information of the camera obtained according to the image content. The preset operation 1 is used for instructing the mobile phone to perform anti-shake processing in combination with target translation information. Illustratively, in the video recording mode, the preview interface includes a five-axis anti-shake control, and after detecting that the user clicks the five-axis anti-shake control, the mobile phone performs anti-shake processing by combining rotation information of the camera obtained according to gyroscope data and target translation information of the camera obtained according to image content.
After the mobile phone enters a video recording mode, an original image is collected according to a preset frame rate in a preview state.
In other embodiments, the video recording method provided in the embodiments of the present application is applied to a specific target shooting mode other than the video recording mode, and the mobile phone may enter the target shooting mode after starting the shooting function, so as to perform anti-shake processing in the target shooting mode by combining the rotation information of the camera obtained according to the gyroscope data and the target translation information of the camera obtained according to the image content.
The following embodiments are described by taking an example in which the mobile phone starts a five-axis anti-shake mode in the video recording mode, so that anti-shake processing is performed by combining the rotation information of the camera obtained according to gyroscope data and the target translation information of the camera obtained according to image content.
In the preview state, after the mobile phone starts the five-axis anti-shake mode, target translation information can be obtained according to the image content of adjacent original images, and anti-shake processing is performed by combining the target translation information and the rotation information of the camera obtained according to gyroscope data. In some embodiments, the process of the mobile phone obtaining the target translation information according to the image content of adjacent original images may include the following steps 201 to 206. The target translation information of the camera may be a target translation curve of the camera, and the mobile phone may obtain the target translation vector corresponding to two adjacent frames of original images according to the image information of the two adjacent frames among the multiple frames of original images. The target translation vectors between consecutive multiple frames of original images are connected to form the original translation curve of the camera, from which the mobile phone can obtain the target translation curve of the camera.
201. In a preview state, after a five-axis anti-shake mode is started, the mobile phone calculates a first translation vector according to feature points on two adjacent frames of original images.
In some embodiments, the mobile phone starts a five-axis anti-shake mode by default in a preview state of the video recording mode; in other embodiments, after detecting that the user starts the five-axis anti-shake mode, the mobile phone starts the five-axis anti-shake mode. For example, a five-axis anti-shake control is included on the preview interface, and a five-axis anti-shake mode is started after the mobile phone detects that the user clicks the control.
In some embodiments of the application, after the mobile phone is turned on/exits from the five-axis anti-shake mode, the user can be prompted in a mode of displaying information, voice broadcasting or vibration and the like.
After the mobile phone starts a five-axis anti-shake mode, a first translation vector is calculated according to feature points on two adjacent frames of original images. The first translation vector corresponding to the two adjacent frames of original images comprises a plurality of vectors, each first translation vector corresponds to one or more feature point pairs on the two adjacent frames of original images, and the first translation vector is used for representing translation conditions such as translation directions, translation distances and the like of the mutually matched feature point pairs between the two adjacent frames of original images.
For example, the mobile phone may perform feature point detection on non-edge regions of the 2 nd frame original image and the 1 st frame original image acquired in the preview state. And then, the mobile phone performs inter-frame feature point matching according to the detected feature points, and determines the matched feature point pairs on the 2 nd frame original image and the 1 st frame original image. And the mobile phone calculates a first translation vector according to the matched characteristic point pairs. That is, the mobile phone obtains the first translation vector according to the image information of the two adjacent original images. For another example, the mobile phone may perform feature point matching on the 3 rd frame original image and the 2 nd frame original image in the preview state, so as to obtain a corresponding first translation vector.
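The per-pair translation vectors of step 201 can be sketched as follows, assuming that feature detection and inter-frame matching (e.g., with ORB or SIFT descriptors restricted to non-edge regions) have already produced matched coordinate pairs; the function and variable names are illustrative, not from the patent.

```python
import numpy as np

def first_translation_vectors(pts_prev, pts_curr):
    """Per-pair translation vectors between matched feature points of two
    adjacent original frames: simply curr - prev for each matched pair.
    Detection and matching are assumed to have been done upstream."""
    return np.asarray(pts_curr, dtype=float) - np.asarray(pts_prev, dtype=float)

# Three matched pairs, all shifted by (3, -2) pixels between frames.
prev_pts = np.array([[100, 100], [200, 150], [50, 300]])
curr_pts = np.array([[103,  98], [203, 148], [53, 298]])
vectors = first_translation_vectors(prev_pts, curr_pts)
print(vectors)   # each row is [3., -2.]
```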
The original images are obtained by shooting through a camera, and the translation of the camera can cause the translation between two adjacent original images, so that the translation between two adjacent original images can be used for representing the translation of the camera and the mobile phone. This first translation vector may be referred to as the original translation vector of the original image, or the original translation vector of the camera.
The translation vector may also be referred to as a displacement vector or a motion vector, and is in the form of a vector. A D-dimensional (i.e., multi-dimensional) vector can represent a point in a D-dimensional space, and thus the first translation vectors may be a plurality of points in the D-dimensional space. Illustratively, the first translation vectors may correspond to all the points in the D-dimensional space shown in fig. 4A.
Then, the mobile phone can perform processing such as filtering on the original translation vectors to obtain the target translation vector between two adjacent frames of original images, and further obtain the target translation information of the camera. The target translation information is used to accurately represent the translation of the camera, and may be, for example, the target translation curve of the camera described below.
202. And the mobile phone calculates a second translation vector according to the motion sensor data corresponding to the two adjacent frames of original images.
The motion sensor can be used for monitoring the translational motion condition of the mobile phone, so that the second translational vector is calculated according to the translational motion condition. For example, the motion sensor may be an accelerometer, and the mobile phone may calculate the second translation vector according to data of the accelerometer corresponding to two adjacent frames of the original image (i.e., data of the accelerometer corresponding to the two adjacent frames of the original image during the acquisition period). The second translation vector is used for representing the translation condition of the mobile phone, namely the translation condition between the camera and the two adjacent original images. Illustratively, the mobile phone calculates a corresponding second translation vector according to data of the accelerometer corresponding to the 1 st to 2 nd frames of original images acquired in the preview state in the acquisition period.
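A hedged sketch of deriving the second translation vector from the accelerometer samples between two frame exposures: double integration of acceleration gives displacement. It assumes gravity has already been removed and the device is at rest at the first sample; real implementations must also manage integration drift. The names are illustrative.

```python
import numpy as np

def second_translation_vector(accel, timestamps):
    """Camera translation over one inter-frame interval, by double
    trapezoidal integration of accelerometer samples (m/s^2, shape (n, 3),
    gravity assumed removed, device at rest at the first sample)."""
    a = np.asarray(accel, dtype=float)
    t = np.asarray(timestamps, dtype=float)
    dt = np.diff(t)
    # velocity: trapezoidal integration of acceleration
    v = np.vstack([np.zeros(3),
                   np.cumsum(0.5 * (a[1:] + a[:-1]) * dt[:, None], axis=0)])
    # displacement: trapezoidal integration of velocity
    return np.sum(0.5 * (v[1:] + v[:-1]) * dt[:, None], axis=0)

# Constant 2 m/s^2 along x for 0.1 s from rest: d = 0.5*a*t^2 = 0.01 m.
t = np.linspace(0.0, 0.1, 11)
a = np.tile([2.0, 0.0, 0.0], (11, 1))
d = second_translation_vector(a, t)
print(d)   # ≈ [0.01 0.   0.  ]
```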
203. The mobile phone selects a third translation vector from the first translation vectors, the third translation vector being located within the δ-neighborhood of the second translation vector.
Wherein the third translation vector comprises one or more vectors. The mobile phone may select, from the first translation vectors, a third translation vector located in the δ-neighborhood of the second translation vector.
Because the translation between two adjacent frames of original images determined according to the feature points should be consistent with the translation of the mobile phone determined according to a motion sensor such as the accelerometer, the mobile phone can select the first translation vector located in the δ-neighborhood of the second translation vector, i.e., the third translation vector. That is, the distance between the third translation vector and the second translation vector is less than or equal to δ.
In this way, the distance between the third translation vector, obtained by filtering the first translation vectors, and the second translation vector is small. The first translation vectors corresponding to most mismatched feature point pairs, as well as those corresponding to the local motion of a photographed moving object, can be filtered out, so that the selected third translation vector more accurately represents the overall translation of the camera caused by the shake of the user's hand or of the mobile phone.
That is, the mobile phone determines the translation information of the camera by combining the data of the motion sensor such as the accelerometer and the image information, so that the accuracy and the robustness of the translation information can be improved.
For example, the points in the D-dimensional space corresponding to the third translation vector may be the points within circle 401 in fig. 4A that remain after filtering.
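The δ-neighborhood filtering of this step reduces to a simple distance test, sketched below. It assumes the two vector sources have been expressed in the same units (e.g., the physical translation projected into pixels), and the names are illustrative.

```python
import numpy as np

def filter_by_accelerometer(first_vectors, second_vector, delta):
    """Keep only the image-derived translation vectors lying within the
    delta-neighborhood (Euclidean distance <= delta) of the
    accelerometer-derived translation vector."""
    fv = np.asarray(first_vectors, dtype=float)
    sv = np.asarray(second_vector, dtype=float)
    keep = np.linalg.norm(fv - sv, axis=1) <= delta
    return fv[keep]

# Two consistent vectors and one gross mismatch around (3, -2).
cand = np.array([[3.0, -2.0], [3.2, -1.9], [40.0, 10.0]])
third = filter_by_accelerometer(cand, np.array([3.0, -2.0]), delta=1.0)
print(third)   # the outlier [40., 10.] is filtered out
```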
Then, the mobile phone can perform processing such as filtering on the third translation vector to obtain a target translation vector between two adjacent frames of original images, and further obtain target translation information of the camera.
204. And the mobile phone selects a fourth translation vector with the similarity greater than or equal to a preset value 1 from the third translation vectors.
Wherein the fourth translation vector comprises one or more vectors. When the camera translates in a target direction, the pixel points with the same content on the two adjacent frames of original images also translate in the target direction, and so do most of the feature point pairs on the two adjacent frames. Therefore, the mobile phone can select, from the third translation vectors, a fourth translation vector whose similarity is greater than or equal to preset value 1; the fourth translation vector corresponds to the feature point pairs that essentially translate in the target direction and can represent the translation of the camera more accurately. In this way, the fourth translation vector obtained by filtering the third translation vectors filters out the outliers and the translation vectors corresponding to the local motion of a photographed moving object that resembles the translation in the target direction.
For example, the mobile phone may use a clustering algorithm such as DBSCAN to screen out the fourth translational vectors that are most similar to each other from the third translational vectors, so as to accurately represent the translational condition of the camera.
For another example, the mobile phone may use other machine learning methods such as Kmeans and RANSAC to remove outlier translation vectors from the third translation vector or select a translation vector with the highest confidence, so as to obtain a fourth translation vector, so as to accurately represent the translation condition of the camera.
Illustratively, the points in the D-dimensional space corresponding to the fourth translation vector may be the points that remain within circle 404 after the points in circle 402 and the points in circle 403 (which lie outside circle 404) are filtered out again.
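One way to realize this selection is a DBSCAN-style density clustering that keeps the largest cluster of mutually similar vectors and discards outliers and small clusters (such as a moving subject's local motion). The minimal self-contained implementation below avoids external clustering libraries; the parameters eps and min_pts are illustrative.

```python
import numpy as np

def dbscan_largest_cluster(vectors, eps, min_pts):
    """Minimal DBSCAN: cluster translation vectors by density and return
    the largest cluster; points in no sufficiently dense region are noise."""
    v = np.asarray(vectors, dtype=float)
    n = len(v)
    dist = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=2)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    labels = np.full(n, -1)            # -1 = noise / unassigned
    n_clusters = 0
    for i in range(n):
        if labels[i] != -1 or len(neighbors[i]) < min_pts:
            continue                   # already assigned, or not a core point
        labels[i] = n_clusters
        queue = list(neighbors[i])     # breadth-first cluster expansion
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = n_clusters
                if len(neighbors[j]) >= min_pts:
                    queue.extend(neighbors[j])
        n_clusters += 1
    if n_clusters == 0:
        return v[:0]
    sizes = [(labels == c).sum() for c in range(n_clusters)]
    return v[labels == int(np.argmax(sizes))]

# Five vectors clustered near (3, -2) plus two outliers.
v = np.array([[3.0, -2.0], [3.1, -2.0], [3.0, -1.9],
              [2.9, -2.0], [3.0, -2.1], [10.0, 10.0], [20.0, -5.0]])
fourth = dbscan_largest_cluster(v, eps=0.5, min_pts=3)
print(len(fourth))   # -> 5
```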
205. And the mobile phone determines a target translation vector according to the feature point corresponding to the fourth translation vector.
The mobile phone calculates average coordinate 1 from the coordinates, on the previous original frame of the two adjacent frames, of the feature points corresponding to the fourth translation vector, and calculates average coordinate 2 from their coordinates on the next original frame; the vector from average coordinate 1 to average coordinate 2 is the target translation vector. The target translation vector is used to represent the translation between the two adjacent frames of original images.
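Step 205 reduces to a centroid difference, sketched below with illustrative names:

```python
import numpy as np

def target_translation_vector(pts_prev, pts_curr):
    """Vector from average coordinate 1 (centroid of the kept feature
    points on the previous frame) to average coordinate 2 (their centroid
    on the next frame)."""
    return np.mean(np.asarray(pts_curr, dtype=float), axis=0) \
         - np.mean(np.asarray(pts_prev, dtype=float), axis=0)

prev_pts = np.array([[0.0, 0.0], [2.0, 2.0]])
curr_pts = np.array([[1.0, 0.0], [3.0, 2.0]])
tv = target_translation_vector(prev_pts, curr_pts)
print(tv)   # -> [1. 0.]
```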
206. And the mobile phone determines the target translation information of the camera according to the target translation vectors of the continuous M frames of original images.
Wherein, the target translation information of the camera is used for representing the translation condition of the camera. The target translation information of the camera is used for calculating an image stabilization transformation matrix of the original image subsequently, and the image stabilization transformation matrix is used for performing motion compensation on the original image through deformation (warp) transformation, so that the anti-shake and image stabilization effects of the image are achieved.
For example, the target translation information of the camera may be a target translation curve of the camera. The target translation vectors between the M consecutive original images are connected to form an original translation curve of the camera, which may also be referred to as an original translation trajectory of the camera or an original translation path of the camera. The mobile phone can perform smoothing processing on the original translation curve of the camera so as to obtain a target translation curve of the camera. The target translation curve of the camera is also called a target translation track or path of the camera, and can accurately represent the translation condition of the camera.
For example, the mobile phone may perform optimal estimation on the original translation curve of the camera through an algorithm such as Kalman (Kalman), so as to obtain a target translation curve of the camera, which can accurately represent the translation condition between consecutive multiple frames of original images.
The mobile phone processes the translation vector through the algorithms such as clustering and Kalman, and the like, so that more accurate and robust target translation information can be obtained.
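A one-dimensional Kalman-style smoother, applied per axis to the cumulative original translation curve, sketches how a target translation curve can be obtained. The constant-position motion model and the noise parameters q and r are assumptions for illustration, not the patent's exact filter.

```python
import numpy as np

def smooth_translation_curve(raw_curve, q=1e-4, r=1e-1):
    """Per-axis 1D Kalman filter with a constant-position model, applied
    to the cumulative original translation curve (shape (n, axes)).
    q = process-noise variance, r = measurement-noise variance; a larger
    r/q ratio smooths more."""
    z = np.asarray(raw_curve, dtype=float)
    x = np.empty_like(z)
    x[0] = z[0]
    p = 1.0
    for k in range(1, len(z)):
        p = p + q                      # predict: position assumed constant
        gain = p / (p + r)             # Kalman gain
        x[k] = x[k - 1] + gain * (z[k] - x[k - 1])
        p = (1.0 - gain) * p
    return x

# Smooth a jittery 2D random-walk translation curve.
rng = np.random.default_rng(0)
raw = np.cumsum(rng.normal(0.0, 1.0, (30, 2)), axis=0)
smooth = smooth_translation_curve(raw)
```

Under the same assumptions, the per-frame translation compensation amount of step 207 could then be taken as the difference between the smoothed and original curves, applied during the warp transformation.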
After the mobile phone first forms the target translation curve of the camera from the target translation vectors between M frames of original images, each original image frame subsequently acquired by the mobile phone can be combined with the most recently acquired M frames to continuously obtain subsequent points on the target translation curve of the camera.
207. And the mobile phone calculates the translation compensation quantity of the original image according to the target translation information of the camera.
The translation compensation amount is used for performing translation motion compensation on the original image during deformation (warp) conversion so as to avoid shaking or blurring of the original image due to translation motion of the camera as much as possible and realize image stabilization and stabilization effects. The mobile phone can determine the translation compensation amount of the original image according to the target translation information of the camera, such as the target translation curve of the camera.
In the embodiment of the present application, when the target translation information of the camera is the target translation curve of the camera, the flowchart of the method for acquiring the translation compensation amount described in steps 201 to 207 above can be seen in fig. 4B. The process includes: the mobile phone performs feature point detection on the non-edge regions of two adjacent frames of original images; performs inter-frame feature point matching; calculates translation vectors from the matched feature point pairs; filters the translation vectors according to the accelerometer information; further filters the translation vectors using the DBSCAN algorithm; obtains the target translation vector from the filtered translation vectors; obtains the original translation curve of the camera from the target translation vectors of consecutive frames of original images; smooths the original translation curve of the camera using the Kalman algorithm to obtain the target translation curve of the camera; and calculates the translation compensation amount of the original image according to the target translation curve of the camera.
208. And the mobile phone obtains the rotation information of the camera according to the gyroscope data corresponding to the N frames of original images.
Wherein the rotation information of the camera is used for representing the rotation condition of the camera. For example, the rotation information of the camera may be a target rotation curve of the camera. The rotation information of the camera and the target translation information of the camera can be used for calculating an image stabilization transformation matrix of the original image, and the image stabilization transformation matrix is used for performing motion compensation on the original image through deformation (warp) transformation, so that the anti-shake and image stabilization effects of the image are achieved.
The following description will be given taking an example in which the target rotation curve of the camera is obtained by the mobile phone from the gyroscope data corresponding to the N frames of original images.
For example, the mobile phone obtains the original rotation curve of the camera according to the gyroscope data corresponding to the N frames of original images, and then processes the original rotation curve of the camera according to constraint conditions so as to obtain a target rotation curve of the camera that satisfies the first condition.
The target rotation curve of the camera satisfies the following first condition:
(1) the target rotation curve of the camera is continuous everywhere. That is, the motion trajectory of the camera is smooth, and the trajectory change cannot be too drastic.
(2) The target rotation curve of the camera is differentiable to the first, second, and third order, and its curvature is as small as possible (e.g., less than or equal to a preset threshold). That is, the speed, acceleration, and rate of change of acceleration (i.e., jerk) of the target rotation curve of the camera change smoothly and cannot change too drastically.
(3) When motion compensation is performed on an original image according to an image stabilization transformation matrix obtained from the target rotation curve of the camera, the compensated original image cannot exceed the preset cropping boundary; otherwise a black edge would appear after cropping.
The mobile phone can adopt various methods to carry out smoothing processing on the original rotation curve of the camera to obtain the target rotation curve of the camera. For example, the mobile phone may obtain a smooth target rotation curve of the camera through a quadratic programming method.
Here, a point on the original rotation curve of the camera, i.e. an original pose of the camera, can be represented by 3 rotation-angle sequences with equal time intervals, as represented by formula 1:

X = (x1, x2, …, xn), Y = (y1, y2, …, yn), Z = (z1, z2, …, zn) (formula 1)
where X, Y, and Z respectively represent the rotation angles in the three directions obtained by integrating the gyroscope data. The mobile phone can determine a smooth target rotation curve of the camera satisfying the first condition by a quadratic programming method, using the objective function represented by formula 2 and the constraint conditions represented by formula 3.
min w1·J1 + w2·J2 + w3·J3 (formula 2)
where Ji in formula 2 represents the i-th order derivative of the target rotation curve of the camera. As shown in formula 3, the constraint conditions may include:
[formula 3: the constraint conditions — shown as an image in the original]
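As a hedged illustration of the formula 2 objective, the following sketch smooths one rotation-angle sequence by minimizing a fidelity term plus weighted squared finite-difference derivatives, solved as an unconstrained least-squares problem via the normal equations. The patent's actual quadratic program also enforces the formula 3 constraints (e.g., the cropping boundary), which are omitted here; the weights and the fidelity term are assumptions.

```python
import numpy as np

def smooth_rotation_curve(raw, w=(10.0, 100.0, 1000.0), fidelity=1.0):
    """raw: 1-D array of integrated gyro angles (one of the X/Y/Z sequences).
    Minimizes fidelity*||r - raw||^2 + sum_i w_i*||D_i r||^2, where D_i is the
    i-th order finite-difference operator, so the first, second, and third
    derivatives of the result stay small, mirroring the J1/J2/J3 terms."""
    n = len(raw)
    A = fidelity * np.eye(n)
    D = np.eye(n)
    for wi in w:
        D = np.diff(D, axis=0)        # raise the difference order by one
        A += wi * D.T @ D             # accumulate w_i * D_i^T D_i
    return np.linalg.solve(A, fidelity * np.asarray(raw, float))
```

Applied to a jittery gyro trace, the output follows the overall motion while suppressing frame-to-frame shake.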
After the first condition is met, the mobile phone uses the algorithm provided in the embodiment of the present application to process, for the first time, the acquired N frames of original images according to the original poses of the camera corresponding to those N frames, and obtains the optimized target poses of the camera corresponding to I frames of original images, where I is a positive integer, so that the target poses of the camera corresponding to the I frames of original images transition smoothly. The optimized target poses of the camera form the target rotation curve of the camera; each optimized target pose of the camera is a point on the target rotation curve of the camera. Then, after acquiring the subsequent I frames of original images, the mobile phone processes those I frames according to the original poses of the camera corresponding to the most recently acquired N historical frames of original images, to obtain the optimized target poses of the camera corresponding to those I frames.
Here N = N1 + I + N2, where N1 and I are positive integers and N2 is a non-negative integer. From the original poses of the camera corresponding to the N frames of original images, the algorithm outputs the optimized target poses of the camera corresponding to the I frames of original images that follow the first N1 frames. For example, with N = 45, N1 = 15, N2 = 27, and I = 3, the algorithm processes the original poses of the camera corresponding to the 1st to 45th frames of original images for the first time, and outputs the optimized target poses of the camera corresponding to 3 frames: the 16th, 17th, and 18th frames of original images. That is to say, the mobile phone obtains the optimized target poses of the camera corresponding to the I frames of original images from the original poses of the camera corresponding to those I frames, the preceding N1 frames, and the following N2 frames of original images.
The sub-constraint condition (1) in equation 6 is used to take the optimized target poses of the camera corresponding to some of the original images as input when optimizing the target poses of the camera corresponding to the adjacent subsequent frames of original images, so that the optimized target poses of the camera transition smoothly overall, rather than only locally among the target poses corresponding to a single batch of I frames of original images. For example, with N = 45, N1 = 15, N2 = 27, and I = 3, the algorithm processes the 4th to 48th frames of original images the second time and outputs the optimized target poses of the camera corresponding to the 19th, 20th, and 21st frames of original images. To ensure that the optimized target poses of the camera corresponding to the 18th and 19th frames of original images do not jump and the transition is smooth, the mobile phone can, based on sub-constraint condition (1), replace the original poses of the camera corresponding to the 17th and 18th frames of original images used in the second optimization with the optimized target poses of the camera corresponding to those frames obtained in the first optimization.
The sub-constraint condition (2) in equation 6 indicates that, after warp transformation based on the target rotation curve of the camera, the original image cannot exceed the preset cropping boundary. For example, referring to fig. 5, box 501 indicates the range of the original image, box 502 indicates the cropping reserved range during anti-shake processing, and the preset cropping boundary may be the boundary defined by Pw and Ph shown in fig. 5. In fig. 5, Pi represents a pixel point in the original image before warp transformation, and Pi' represents that pixel point after warp transformation according to the rotation information of the camera. The four corners of the edge of the original image, i.e., the four vertices of box 501, cannot cross the boundary defined by Pw and Ph after warp transformation; that is, they cannot fall inside the cropping reserved range represented by box 502, so that the image obtained by cropping does not have a black border. As another example, a pixel point Pc on the original image becomes Pc' after warp transformation and must not cross the boundary defined by Pw and Ph, i.e., it cannot enter the cropping reserved range represented by box 503, so that the cropped image does not have a black border.
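The corner check of sub-constraint (2) can be sketched as follows, assuming a 3x3 homography H and treating the cropping reserved range as a rectangle with a uniform margin: a warped corner landing inside that rectangle would leave a black border. This simplified version checks only the four corners, whereas the text also checks interior points such as Pc; the function names are illustrative.

```python
import numpy as np

def warp_point(H, p):
    """Apply homography H to pixel (x, y) in homogeneous coordinates."""
    x, y, s = H @ np.array([p[0], p[1], 1.0])
    return np.array([x / s, y / s])

def violates_crop_boundary(H, width, height, margin):
    """Return True if any warped corner of the original image crosses into the
    retained crop rectangle, which would leave a black border after cropping."""
    crop_min = np.array([margin, margin])
    crop_max = np.array([width - margin, height - margin])
    corners = [(0, 0), (width, 0), (0, height), (width, height)]
    for c in corners:
        q = warp_point(H, c)
        if np.all(q > crop_min) and np.all(q < crop_max):
            return True
    return False
```

A candidate target pose whose stabilization matrix fails this check would be rejected (or the smoothing relaxed) so that cropping never exposes the image edge.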
After the mobile phone obtains the target rotation curve of the camera according to the N frames of original images for the first time, each subsequent I frame of original images can be combined with the N frames of original images which are collected recently to obtain subsequent I points on the target rotation curve of the camera.
209. And the mobile phone calculates the rotation compensation quantity of the original image according to the rotation information of the camera.
The rotation compensation amount is used for performing rotation motion compensation on the original image during warp conversion so as to avoid the original image from shaking or blurring due to the rotation motion of the camera as much as possible and realize the anti-shaking and image stabilization effects of the image. The handset can calculate the rotation compensation amount according to the rotation information of the camera. For example, the handset may calculate the amount of rotation compensation from the target rotation curve of the camera.
The rolling shutter of the mobile phone camera exposes line by line; correspondingly, the rotation compensation amount comprises a compensation amount corresponding to each exposure line. The rotation compensation amounts for different exposure lines may be the same or different.
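A minimal sketch of per-exposure-row compensation, assuming a single rotation axis and linear interpolation of the camera angle between the exposure times of the first and last rows; the patent does not specify the interpolation, so this is illustrative only.

```python
import numpy as np

def per_row_rotation_compensation(angle_start, angle_end, target_angle, rows):
    """angle_start/angle_end: camera angle (one axis) when the first and last
    rows of the frame were exposed; target_angle: the smoothed pose for this
    frame. Each row's compensation is the difference between the target pose
    and the (linearly interpolated) actual pose at that row's exposure time."""
    t = np.linspace(0.0, 1.0, rows)
    actual = (1 - t) * angle_start + t * angle_end
    return target_angle - actual          # one compensation value per row
```

With no rotation during readout (angle_start == angle_end), every row receives the same compensation, matching the note that per-line amounts may be equal.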
For example, when the rotation information of the camera is the target rotation curve of the camera, the mobile phone can calculate the rotation compensation amounts corresponding to the I frames of original images according to the target poses of the camera corresponding to those I frames on the target rotation curve. For example, with N = 45, N1 = 15, N2 = 27, and I = 3, the mobile phone processes the original poses of the camera corresponding to the 1st to 45th frames of original images in the preview state, outputs the optimized target poses of the camera corresponding to the 16th, 17th, and 18th frames of original images, and then calculates the rotation compensation amounts corresponding to the 16th, 17th, and 18th frames of original images from those target poses. Then, the mobile phone processes the original poses of the camera corresponding to the 4th to 48th frames of original images (with the original poses corresponding to the 17th and 18th frames replaced by the optimized target poses of the camera), outputs the optimized target poses of the camera corresponding to the 19th, 20th, and 21st frames of original images, and calculates the rotation compensation amounts corresponding to the 19th, 20th, and 21st frames of original images from those target poses.
210. And the mobile phone calculates the image stabilization transformation matrix of the original image according to the translation compensation quantity and the rotation compensation quantity.
The image stabilization transformation matrix is the homography matrix corresponding to the warp transformation of the original image. The mobile phone can add the rotation compensation amount and the translation compensation amount corresponding to each exposure line to obtain the image stabilization transformation matrix; the translation compensation amounts corresponding to the exposure lines are equal. It is understood that the image stabilization transformation matrix may include not only the rotation compensation amount and the translation compensation amount, but also other motion compensation amounts such as the RS (rolling shutter) compensation amount.
The mobile phone can calculate the image stabilization transformation matrices of the I frames of original images according to the rotation compensation amounts and translation compensation amounts respectively corresponding to those frames. For example, with N = 45, N1 = 15, N2 = 27, and I = 3, after the mobile phone obtains the rotation compensation amounts corresponding to the 16th, 17th, and 18th frames of original images, it may add the rotation compensation amount and the translation compensation amount corresponding to the 16th frame of original image to obtain the image stabilization transformation matrix corresponding to the 16th frame of original image, and likewise add the rotation compensation amount and the translation compensation amount corresponding to the 17th frame of original image to obtain the image stabilization transformation matrix corresponding to the 17th frame of original image.
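One common way to compose such a matrix, offered here only as an assumed sketch since the patent does not give the exact composition, is to turn the rotation compensation into a homography via the camera intrinsics and then add the translation compensation as a pixel offset:

```python
import numpy as np

def stabilization_matrix(R_comp, t_comp, K):
    """R_comp: 3x3 rotation compensation matrix; t_comp: (tx, ty) translation
    compensation in pixels; K: 3x3 camera intrinsic matrix. H_rot = K R K^-1
    is the standard rotation-induced homography; composing it with a pixel
    translation mirrors the 'add the rotation and translation compensation'
    step described above (an illustrative composition, not the patent's)."""
    H_rot = K @ R_comp @ np.linalg.inv(K)
    H_trans = np.array([[1.0, 0.0, t_comp[0]],
                        [0.0, 1.0, t_comp[1]],
                        [0.0, 0.0, 1.0]])
    return H_trans @ H_rot
```

The resulting 3x3 matrix can then drive the warp transformation of step 211, with a distinct rotation part per exposure line if rolling-shutter compensation is included.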
211. And the mobile phone transforms the original image according to the image stabilization transformation matrix to obtain a preview image, and displays the preview image on a preview interface.
The mobile phone can perform warp conversion on the corresponding original image according to the image stabilization transformation matrix corresponding to each frame of the original image in the I frame of the original image so as to obtain a preview image, and the preview image is displayed on a preview interface. For example, N is 45, N1 is 15, N2 is 27, and I is 3, the mobile phone performs warp transformation on the 16 th, 17 th, and 18 th frame original images according to the image stabilization transformation matrices corresponding to each frame original image in the 16 th, 17 th, and 18 th frame original images, so as to obtain 1 st, 2 nd, and 3 rd frame preview images, and sequentially displays the 1 st, 2 nd, and 3 rd frame preview images on the preview interface. The mobile phone performs warp conversion on the 19 th, 20 th and 21 st frame original images according to the image stabilization transformation matrixes respectively corresponding to each frame original image in the 19 th, 20 th and 21 st frame original images to obtain 4 th, 5 th and 6 th frame preview images, and sequentially displays the 4 th, 5 th and 6 th frame preview images on a preview interface.
The sequence number of the original image frame exceeds that of the preview image frame displayed by the mobile phone by N1. That is, the mobile phone does not display preview images for the 1st to N1th frames of original images after warp transformation on the preview interface, but starts displaying from the preview image obtained by warp transformation of the (N1+1)th frame of original image.
It can be understood that the acquisition frame rate of original images in the preview state is relatively high, and switching the mobile phone from another mode to the video recording mode takes a certain amount of time; therefore, even though the mobile phone does not display preview images for the first N1 frames of original images after switching to the video recording mode, the user does not have a poor visual experience such as stuttering or a black screen.
The mobile phone can obtain the target pose of the camera corresponding to the (N1+1)th frame of original image according to the original poses of the camera corresponding to the first N1+I+N2 frames of original images, then obtain the image stabilization transformation matrix corresponding to the (N1+1)th frame of original image according to that target pose, and further obtain the 1st frame of preview image by performing warp transformation on the (N1+1)th frame of original image. That is, the starting frame of the I frames of original images corresponds to the (N1+1)th frame among the N frames of original images, and the preview image displayed by the mobile phone is delayed by at least N2 frames relative to the original images acquired by the mobile phone. For example, when N is 45, N1 is 15, and N2 is 27, the time-sequence correspondence between the original image frames acquired by the mobile phone in the preview state and the displayed preview images can be seen in fig. 6.
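The sliding-window timing described above can be simulated with a small scheduling function; the window arithmetic follows the worked example (N = 45, N1 = 15, I = 3, N2 = 27), and the function name is invented.

```python
def preview_schedule(total_frames, N1=15, I=3, N2=27):
    """Yield (window, output_frames) pairs: each pass consumes a window of
    N = N1 + I + N2 original frames (1-indexed) and outputs the optimized
    poses / preview images for the I frames that sit right after the first
    N1 frames of the window. The window then slides forward by I frames."""
    N = N1 + I + N2
    schedule = []
    start = 1
    while start + N - 1 <= total_frames:
        window = (start, start + N - 1)
        first_out = start + N1
        schedule.append((window, list(range(first_out, first_out + I))))
        start += I
    return schedule
```

Running it reproduces the example: the first pass over frames 1-45 outputs frames 16-18, and the second pass over frames 4-48 outputs frames 19-21, showing the N1-frame display offset.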
In some other embodiments of the present application, in the process of acquiring the first N frames of original images by the mobile phone immediately after entering the preview state, each frame of original image in the first N1 frames of original images is displayed on the preview interface as a preview image. Subsequently, after the mobile phone acquires a new original image, a preview image corresponding to the I-frame original image is generated according to the N-frame original image by using the method described in the above embodiment, and the generated preview image is displayed on the preview interface.
In some other embodiments of the present application, in the preview state, the frame number of the preview image displayed on the preview interface by the mobile phone corresponds to the frame number of the original image, i.e. N2 is 0, and I is 1. That is, after acquiring a frame of original image, the mobile phone displays a preview image corresponding to the frame of original image. N1 may be small, for example, 5, 8, or 10. In the preview state, before the mobile phone acquires N1 frames of original images, the mobile phone generates and displays a preview image corresponding to the current original image in combination with the original images acquired in the preview state.
For example, just after entering the preview state, the mobile phone acquires the 1 st frame of original image, that is, the 1 st frame of original image is displayed on the preview interface as the 1 st frame of preview image. After the mobile phone acquires the 2 nd frame original image, a 2 nd frame preview image corresponding to the 2 nd frame original image is generated according to the 1 st to 2 nd frame original images, and the 2 nd frame preview image is displayed on a preview interface. After the 3 rd frame original image is collected by the mobile phone, a 3 rd frame preview image corresponding to the 3 rd frame original image is generated according to the 1 st to 3 rd frame original images, and the 3 rd frame preview image is displayed on a preview interface. After the mobile phone acquires the N1+1 th original image, the N1+1 th preview image is generated according to the 1 st to N1+1 th original images, and the N1+1 th preview image is displayed on a preview interface. Subsequently, after acquiring a new original image, the mobile phone generates a preview image corresponding to the new original image by combining the N1 frame original image before the new original image and the new original image, and displays the generated preview image on a preview interface.
When the preview image is obtained by image transformation according to an image stabilization transformation matrix calculated from target poses on the smooth target rotation curve of the camera, the smooth target rotation curve satisfies sub-constraint condition (1), so the warp-transformed preview images obtained based on the target rotation curve transition smoothly as a whole. The smooth target rotation curve also satisfies sub-constraint condition (2), so a warp-transformed preview image obtained based on the target rotation curve does not exceed the cropping boundary.
For example, in the preview state, the effect schematic diagram of the preview image of the mobile phone without the anti-shake processing can be seen in (a) - (c) of fig. 7A. As shown in fig. 7A, in the preview process, the subject is translated and rotated, and the original image is shaken. The effect diagrams of the preview image of the mobile phone performing the anti-shake processing according to the rotation information of the camera can be seen in (a) - (c) of fig. 7B. The schematic effect diagrams of the preview image subjected to the anti-shake processing after the five-axis anti-shake mode of the mobile phone is started can be seen in (a) - (C) of fig. 7C.
212. And if the mobile phone determines that the second condition is met, exiting the five-axis anti-shake mode, and obtaining rotation information of the camera according to the gyroscope data corresponding to the N frames of original images.
If the second condition is met, it can be shown that the target translation information determined according to the image information is inaccurate, and the image cannot or does not need to be subjected to anti-shake processing by combining the target translation information, so that the mobile phone can determine the image stabilization transformation matrix only according to the rotation information to perform anti-shake processing on the image. That is, after the mobile phone enters the five-axis anti-shake mode, if the mobile phone determines that the second condition is met, the mobile phone exits the five-axis anti-shake mode, and image anti-shake processing is performed according to the rotation information.
In some embodiments, in the video recording mode, the mobile phone may default to the five-axis anti-shake mode and exit the five-axis anti-shake mode after determining that the second condition is satisfied.
For example, the second condition may include any one of the following sub-conditions 1 to 5:
sub-condition 1: the number of the feature points detected by the mobile phone is less than or equal to a preset value 2. If the sub-condition 1 is met, it can be shown that the number of the feature points on the original image detected by the mobile phone is too small, the first translation vector determined according to the feature points is inaccurate, the determined target translation information is also inaccurate, the anti-shake processing effect performed by combining the target translation information is possibly poor, and thus the five-axis anti-shake mode can be exited.
Sub-condition 2: the proportion of third translation vectors, obtained by filtering the first translation vectors with the second translation vector corresponding to a motion sensor such as the accelerometer, to the first translation vectors is less than or equal to a preset value 3. If sub-condition 2 is satisfied, it may indicate that the first translation vectors obtained from the feature points deviate greatly from the second translation vector, the feature points may be inaccurate, and the determined target translation information may be inaccurate, so the five-axis anti-shake mode can be exited.
Sub-condition 3: the mobile phone selects fourth translation vectors with a similarity greater than or equal to a preset value 1 from the third translation vectors, and the proportion of the fourth translation vectors to the third translation vectors is less than or equal to a preset value 4. If sub-condition 3 is satisfied, it may indicate that there is large local motion between two adjacent frames of original images, and the effect of anti-shake processing performed in combination with the target translation information may be poor, so the five-axis anti-shake mode can be exited.
Sub-condition 4: the variance of the translation compensation amounts between P (an integer greater than 1) consecutive frames of original images is greater than or equal to a preset value 5. If sub-condition 4 is satisfied, it may indicate mismatching between feature points, so that the target translation information determined from the first translation vectors corresponding to the feature points is inaccurate and the effect of anti-shake processing performed in combination with the target translation information may be poor; the five-axis anti-shake mode can therefore be exited.
Sub-condition 5: the translation amplitude between Q (an integer greater than 1) consecutive frames of original images is greater than or equal to a preset value 6. If sub-condition 5 is satisfied, it may indicate that the translation amplitude over consecutive frames is too large, the original images may be blurred or ghosted, the matched feature point pairs on the original images may be inaccurate or hard to match, and the target translation information is difficult to determine from the feature points or inaccurate if determined; the five-axis anti-shake mode can therefore be exited. For example, when the length of the translation path corresponding to Q consecutive frames of original images on the initial translation curve of the camera is greater than or equal to a preset value 7, the mobile phone determines that the translation amplitude is greater than or equal to the preset value 6, and sub-condition 5 is satisfied.
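Taken together, sub-conditions 1 to 5 can be sketched as a single check. All thresholds (the "preset values" 2 to 6) are unspecified in this excerpt, so the defaults below are placeholders, and the function signature is an illustrative assumption.

```python
import numpy as np

def should_exit_five_axis(n_features, ratio_third_to_first, ratio_fourth_to_third,
                          recent_compensations, translation_amplitude,
                          min_features=50, min_keep_ratio=0.5,
                          min_similar_ratio=0.5, max_variance=4.0,
                          max_amplitude=30.0):
    """Return True if any sub-condition of the second condition holds and the
    phone should fall back to rotation-only anti-shake."""
    if n_features <= min_features:                        # sub-condition 1
        return True
    if ratio_third_to_first <= min_keep_ratio:            # sub-condition 2
        return True
    if ratio_fourth_to_third <= min_similar_ratio:        # sub-condition 3
        return True
    if np.var(recent_compensations) >= max_variance:      # sub-condition 4
        return True
    if translation_amplitude >= max_amplitude:            # sub-condition 5
        return True
    return False
```

In steps 212 and 216 the same predicate would be re-evaluated each frame, entering or leaving the five-axis anti-shake mode as it flips.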
For the relevant description of the rotation information of the camera obtained by the mobile phone according to the gyroscope data corresponding to the N frames of original images in step 212, reference may be made to the relevant description in step 208, and details are not repeated here. For example, the rotation information of the camera may be a target rotation curve of the camera.
213. And the mobile phone calculates the rotation compensation quantity of the original image according to the rotation information of the camera.
For the description of step 213, reference may be made to the related description in step 209, which is not described herein again.
214. And the mobile phone calculates an image stabilization transformation matrix of the original image according to the rotation compensation quantity.
After the mobile phone obtains the rotation compensation amount, the rotation compensation amount can be added into the image stabilization transformation matrix. It is understood that the image stabilization transformation matrix may include not only the rotation compensation amount and the translation compensation amount, but also other motion compensation amounts such as the RS compensation amount.
In some embodiments, since the mobile phone calculates the image stabilization transformation matrix of the original image from the rotation compensation amount obtained from the target rotation curve of the camera and the translation compensation amount obtained from the target translation curve of the camera, the points on the target translation curve of the camera should cover the points on the target rotation curve of the camera, that is, M is greater than or equal to N.
215. And the mobile phone transforms the original image according to the image stabilization transformation matrix to obtain a preview image, and displays the preview image on a preview interface.
216. If the mobile phone determines that the second condition is not satisfied, the five-axis anti-shake mode is started, and steps 203 to 211 are executed.
That is to say, after the mobile phone exits from the five-axis anti-shake mode, if the mobile phone determines that the second condition disappears, that is, the second condition is no longer satisfied, the mobile phone re-enters the five-axis anti-shake mode, so that the anti-shake processing is performed by combining the rotation information of the camera obtained according to the gyroscope data and the target translation information of the camera obtained according to the image content, and the image stabilization effect of the video image is improved.
In this way, in the preview state, the mobile phone can perform anti-shake processing by combining the rotation information of the camera obtained according to the gyroscope data and the target translation information of the camera obtained according to the image content, so that the image stabilization effect of the preview image presented to the user in the preview state is improved; and quitting the five-axis anti-shaking mode when a second condition is met, so that image anti-shaking processing is performed according to the rotation information.
Then, the mobile phone enters a shooting process after detecting the shooting operation of the user. The method may further include step 217:
217. after the mobile phone detects the shooting operation of the user, an original image is collected according to a preset frame rate in the shooting process.
For example, after detecting an operation of the user clicking the shooting control 700 shown in (C) in fig. 7C, the mobile phone determines that the shooting operation of the user is detected, thereby entering a video shooting process. For another example, after detecting that the user indicates to start shooting operation by voice, the mobile phone determines that the shooting operation of the user is detected, and then enters a video shooting process.
It can be understood that there may be many other ways for triggering the mobile phone to enter the video shooting process, and the embodiment of the present application is not limited thereto.
It should be noted that the image data stream during shooting includes a preview stream during recording and a video stream. The preview stream during recording is used to present recorded images to the user on the shooting interface during the video recording process. In contrast, the preview stream used to present preview images to the user in the preview state may be referred to as the pre-recording preview stream. The video stream is used to generate the video images in the video file.
In some embodiments, the preview stream during recording is the same data stream as the pre-recording preview stream. After the shooting process starts, the mobile phone continues with the state, the anti-shake mode, and the processing flow used in the preview state to generate recorded images. For example, if the mobile phone is in the five-axis anti-shake mode before the shooting process starts and has generated the 100th frame of preview image, then after the shooting process starts the mobile phone remains in the five-axis anti-shake mode and generates the 101st frame of preview image in combination with the original images acquired before the shooting process started; this 101st frame of preview image is the 1st frame of recorded image on the shooting interface. If the mobile phone has exited the five-axis anti-shake mode when entering the shooting process, it remains out of the five-axis anti-shake mode after entering the shooting process.
In other embodiments, the preview stream during recording is not the same data stream as the pre-recording preview stream. After the shooting process starts, the mobile phone stops the pre-recording preview stream and starts the preview stream during recording. Just as in the preview state, where the mobile phone performs anti-shake processing on the original images in the pre-recording preview stream by combining the rotation information of the camera obtained from the gyroscope data with the target translation information of the camera obtained from the image content so as to generate and display preview images, during the shooting process the mobile phone can perform anti-shake processing on the original images in the preview stream during recording, by combining the rotation information of the camera obtained from the gyroscope data with the target translation information of the camera obtained from the image content, so as to generate and display recorded images. In addition, the mobile phone can restart the five-axis anti-shake mode based on the original images acquired during the shooting process. For example, if the mobile phone is in the five-axis anti-shake mode and has generated the 100th frame of preview image before the shooting process starts, then after the shooting process starts the mobile phone restarts the five-axis anti-shake mode, generates the 1st frame of recorded image from the original images acquired during the shooting process, and displays it on the recording interface. For example, referring to fig. 2, after step 217, for the preview stream during recording, the method may further include the following steps 218 to 233:
218. in the shooting process, the mobile phone calculates a first translation vector according to feature points on two adjacent frames of original images in a five-axis anti-shake mode.
In some embodiments, if the five-axis anti-shake mode is already turned on before the mobile phone enters the shooting process, the five-axis anti-shake mode continues to be turned on after the mobile phone enters the shooting process. In other embodiments, a five-axis anti-shake mode is turned on by default in the shooting process of the mobile phone. In other embodiments, after the mobile phone detects that the user starts the five-axis anti-shake mode in the shooting process, the five-axis anti-shake mode is started. For example, a five-axis anti-shake control is included on the shooting interface, and a five-axis anti-shake mode is started after the mobile phone detects that the user clicks the control.
219. And the mobile phone calculates a second translation vector according to the motion sensor data corresponding to the two adjacent frames of original images.
220. And the mobile phone selects a third translation vector from the first translation vectors, the third translation vector being located within the δ-neighborhood of the second translation vector.
221. And the mobile phone selects a fourth translation vector with the similarity greater than or equal to a preset value 1 from the third translation vectors.
222. And the mobile phone determines a target translation vector according to the feature point corresponding to the fourth translation vector.
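The vector-selection logic of steps 218-222 can be sketched as follows; the distance and similarity measures, the thresholds (`delta`, `sim_threshold`), and all function and variable names are illustrative assumptions rather than details taken from this application:

```python
import numpy as np

def select_target_translation(first_vectors, second_vector, delta=2.0, sim_threshold=0.9):
    """Sketch of steps 218-222: filter per-feature-point translation vectors.

    first_vectors : (K, 2) translation vectors computed from matched feature
        points on two adjacent original images (step 218).
    second_vector : (2,) translation vector derived from motion-sensor data
        for the same two frames (step 219).
    """
    first_vectors = np.asarray(first_vectors, dtype=float)
    second_vector = np.asarray(second_vector, dtype=float)

    # Step 220: keep the vectors inside the δ-neighborhood of the
    # sensor-derived second translation vector.
    dist = np.linalg.norm(first_vectors - second_vector, axis=1)
    third = first_vectors[dist <= delta]
    if len(third) == 0:
        return None

    # Step 221: keep vectors whose similarity is high enough
    # (here: cosine similarity to the per-component median direction).
    median = np.median(third, axis=0)
    norm = np.linalg.norm(third, axis=1) * (np.linalg.norm(median) + 1e-9)
    sim = (third @ median) / (norm + 1e-9)
    fourth = third[sim >= sim_threshold]
    if len(fourth) == 0:
        fourth = third

    # Step 222: determine the target translation vector from the feature
    # points that survived filtering (here: their mean).
    return fourth.mean(axis=0)
```

An outlier vector such as `[9.0, 9.0]` against a sensor estimate near `[1.0, 0.0]` is rejected by the δ-neighborhood check, so only the consistent feature points contribute to the target translation vector.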
223. The mobile phone determines the target translation information of the camera according to the target translation vectors of M′ consecutive frames of original images.
Wherein M' and M may be the same or different.
224. The mobile phone calculates the translation compensation amount of the original image according to the target translation information of the camera.
225. The mobile phone obtains the rotation information of the camera according to the gyroscope data corresponding to N′ frames of original images.
Wherein N′ and N may be the same or different, N1′ and N1 may be the same or different, I′ and I may be the same or different, and N2′ and N2 may be the same or different.
226. The mobile phone calculates the rotation compensation amount of the original image according to the rotation information of the camera.
227. The mobile phone calculates the image stabilization transformation matrix of the original image according to the translation compensation amount and the rotation compensation amount.
228. The mobile phone transforms the original image according to the image stabilization transformation matrix to obtain a recorded image, and displays the recorded image on the shooting interface.
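Steps 226-228 combine the two compensation amounts into one transform and warp the image with it. A minimal two-dimensional sketch follows; the patent's actual matrix would involve the camera intrinsics and a 3D rotation, so the 2D similarity form below, and all names, are assumptions for illustration:

```python
import numpy as np

def stabilization_matrix(rot_deg, tx, ty, center):
    """Sketch of steps 226-227: compose an image stabilization transform
    from a rotation compensation (rot_deg, about the image center) and a
    translation compensation (tx, ty)."""
    cx, cy = center
    a = np.deg2rad(rot_deg)
    c, s = np.cos(a), np.sin(a)
    # Rotate about (cx, cy), then translate by the compensation amount.
    rot = np.array([[c, -s, cx - c * cx + s * cy],
                    [s,  c, cy - s * cx - c * cy],
                    [0.0, 0.0, 1.0]])
    trans = np.array([[1.0, 0.0, tx],
                      [0.0, 1.0, ty],
                      [0.0, 0.0, 1.0]])
    return trans @ rot

def warp_point(m, pt):
    """Step 228, per point: apply the stabilization matrix to a pixel."""
    x, y, w = m @ np.array([pt[0], pt[1], 1.0])
    return (x / w, y / w)
```

In practice the warp transformation is applied to the whole original image (e.g., via a GPU warp), not point by point; the point form is only to keep the sketch self-contained.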
229. If the mobile phone determines that the second condition is met, the mobile phone exits the five-axis anti-shake mode and obtains the rotation information of the camera according to the gyroscope data corresponding to N′ frames of original images.
230. The mobile phone calculates the rotation compensation amount of the original image according to the rotation information of the camera.
231. The mobile phone calculates the image stabilization transformation matrix of the original image according to the rotation compensation amount.
232. The mobile phone transforms the original image according to the image stabilization transformation matrix to obtain a recorded image, and displays the recorded image on the shooting interface.
233. If the mobile phone determines that the second condition is not met, steps 218-228 are performed.
It should be noted that, for the description of steps 218-233, reference may be made to the related description of steps 201-216, which is not repeated herein.
Similar to the preview state, in the shooting process, the mobile phone can obtain the target pose of the camera corresponding to the (N1+1)th frame of original image according to the initial poses of the camera corresponding to the N1+I+N2 frames of original images, so as to obtain the image stabilization transformation matrix corresponding to the (N1+1)th frame of original image according to the target pose of the camera corresponding to the (N1+1)th frame of original image, and further obtain, after warp transformation, the 1st frame of recorded image corresponding to the (N1+1)th frame of original image. That is, the starting frame of the I frames of original images corresponds to the (N1+1)th frame of the N frames of original images, and the recorded image displayed by the mobile phone is delayed by at least N2 frames relative to the original image acquired by the mobile phone.
In other embodiments, in the process of acquiring the first N frames of original images immediately after entering the shooting process, the mobile phone displays each frame of original image in the first N1 frames of original images on the shooting interface as a recorded image. Subsequently, after the mobile phone acquires a new original image, the method described in the above embodiment is used to generate the recorded images corresponding to the I frames of original images according to the N frames of original images, and the generated recorded images are displayed on the shooting interface.
In other embodiments, during the shooting process, the frame number of the recorded image displayed on the shooting interface by the mobile phone corresponds to the frame number of the original image, i.e., N2 = 0 and I = 1. That is, after acquiring a frame of original image, the mobile phone displays the recorded image corresponding to that frame of original image. N1 may be small, for example, 5, 8, or 10. In the shooting process, before the mobile phone has acquired N1 frames of original images, the mobile phone generates and displays the recorded image corresponding to the current original image by combining the original images already acquired in the shooting process.
For example, just after the shooting process is started, the mobile phone acquires the 1st frame of original image and displays it on the shooting interface as the 1st frame of recorded image. After the mobile phone acquires the 2nd frame of original image, the mobile phone generates the 2nd frame of recorded image corresponding to the 2nd frame of original image according to the 1st to 2nd frames of original images, and displays the 2nd frame of recorded image on the shooting interface. After the mobile phone acquires the 3rd frame of original image, the mobile phone generates the 3rd frame of recorded image corresponding to the 3rd frame of original image according to the 1st to 3rd frames of original images, and displays the 3rd frame of recorded image on the shooting interface. After the mobile phone acquires the (N1+1)th frame of original image, the mobile phone generates the (N1+1)th frame of recorded image according to the 1st to (N1+1)th frames of original images, and displays the (N1+1)th frame of recorded image on the shooting interface. Subsequently, after the mobile phone acquires a new original image, the mobile phone combines the N1 frames of original images before the new original image with the new original image to generate the recorded image corresponding to the new original image, and displays the generated recorded image on the shooting interface.
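The warm-up behaviour in this example (with N2 = 0 and I = 1) amounts to a window of original images that first grows and then slides; a sketch under those assumptions, with the helper name invented for illustration:

```python
def stabilization_window(frame_idx, n1):
    """Which original frames (1-based, inclusive) feed the recorded image
    for a given frame index, assuming N2 = 0 and I = 1."""
    if frame_idx <= 1:
        return (1, 1)          # 1st original image is shown directly
    if frame_idx <= n1 + 1:
        return (1, frame_idx)  # grow the window until N1+1 frames exist
    # afterwards: the new frame plus the N1 frames before it
    return (frame_idx - n1, frame_idx)
```

With N1 = 5, frame 3 uses original frames 1-3, frame 6 uses frames 1-6, and frame 7 onward uses a sliding window of the newest frame plus the five frames before it.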
In the case that the recorded images are obtained through image transformation according to the image stabilization transformation matrix calculated from the target poses on the smooth target rotation curve of the camera, the smooth target rotation curve of the camera satisfies sub-constraint condition (1), so that the overall transition between the warp-transformed recorded images obtained based on the target rotation curve of the camera is smooth. The smooth target rotation curve of the camera also satisfies sub-constraint condition (2), so that a warp-transformed recorded image obtained based on the target rotation curve of the camera does not exceed the cropping boundary.
In this way, in the shooting process, the mobile phone can perform anti-shake processing by combining the rotation information of the camera obtained according to the gyroscope data and the target translation information of the camera obtained according to the image content, so as to improve the image stabilization effect of the recorded images presented to the user in the shooting process and the recording effect of the video images subsequently generated according to the recorded images; and the mobile phone exits the five-axis anti-shake mode when the second condition is met, so that image anti-shake processing is performed according to the rotation information.
In the shooting process, the video stream and the preview stream in the video recording are processed in parallel and independently. The processing of the video stream is independent of the state and processing of the mobile phone before the shooting process: the mobile phone determines anew, according to the original images acquired in the shooting process, whether to exit or enter the five-axis anti-shake mode, and after entering the five-axis anti-shake mode, performs anti-shake processing by combining the rotation information of the camera obtained according to the gyroscope data and the target translation information of the camera obtained according to the image content, so as to generate video images. For example, after the shooting process is started, the mobile phone restarts the five-axis anti-shake mode, and in the five-axis anti-shake mode generates and stores the 1st frame of video image according to the original images acquired in the shooting process, so as to generate a video file according to the stored video images after shooting stops. For example, referring to fig. 2, after step 217, for the video stream, the method may further include the following steps 218′-233′:
218′. In the shooting process, the mobile phone calculates a first translation vector according to feature points on two adjacent frames of original images in the five-axis anti-shake mode.
In some embodiments, if the five-axis anti-shake mode is already turned on before the mobile phone enters the shooting process, the five-axis anti-shake mode continues to be turned on after the mobile phone enters the shooting process. In other embodiments, a five-axis anti-shake mode is turned on by default in the shooting process of the mobile phone. In other embodiments, after the mobile phone detects that the user starts the five-axis anti-shake mode in the shooting process, the five-axis anti-shake mode is started. For example, a five-axis anti-shake control is included on the shooting interface, and a five-axis anti-shake mode is started after the mobile phone detects that the user clicks the control.
219′. The mobile phone calculates a second translation vector according to the motion sensor data corresponding to the two adjacent frames of original images.
220′. The mobile phone selects a third translation vector from the first translation vectors, the third translation vector being located within the δ-neighborhood of the second translation vector.
221′. The mobile phone selects, from the third translation vectors, a fourth translation vector whose similarity is greater than or equal to the preset value 1.
222′. The mobile phone determines a target translation vector according to the feature point corresponding to the fourth translation vector.
223′. The mobile phone determines the target translation information of the camera according to the target translation vectors of M″ consecutive frames of original images.
Wherein M "and M may be the same or different.
224′. The mobile phone calculates the translation compensation amount of the original image according to the target translation information of the camera.
225′. The mobile phone obtains the rotation information of the camera according to the gyroscope data corresponding to N″ frames of original images.
Wherein N″ and N may be the same or different, N1″ and N1 may be the same or different, and N2″ and N2 may be the same or different.
226′. The mobile phone calculates the rotation compensation amount of the original image according to the rotation information of the camera.
227′. The mobile phone calculates the image stabilization transformation matrix of the original image according to the translation compensation amount and the rotation compensation amount.
228′. The mobile phone transforms the original image according to the image stabilization transformation matrix to obtain a video image, and stores the video image.
229′. If the mobile phone determines that the second condition is met, the mobile phone exits the five-axis anti-shake mode and obtains the rotation information of the camera according to the gyroscope data corresponding to N″ frames of original images.
230′. The mobile phone calculates the rotation compensation amount of the original image according to the rotation information of the camera.
231′. The mobile phone calculates the image stabilization transformation matrix of the original image according to the rotation compensation amount.
232′. The mobile phone transforms the original image according to the image stabilization transformation matrix to obtain a video image, and stores the video image.
233′. If the mobile phone determines that the second condition is not met, steps 218′-228′ are performed.
It should be noted that, for the description of steps 218′-233′, reference may be made to the related description of steps 201-216, which is not repeated herein.
Similar to the preview images and the recorded images, in the shooting process, the mobile phone can obtain the target pose of the camera corresponding to the (N1+1)th frame of original image according to the initial poses of the camera corresponding to the N1+I+N2 frames of original images, so as to obtain the image stabilization transformation matrix corresponding to the (N1+1)th frame of original image according to the target pose of the camera corresponding to the (N1+1)th frame of original image, and further obtain, after warp transformation, the 1st frame of video image corresponding to the (N1+1)th frame of original image. That is, the starting frame of the I frames of original images corresponds to the (N1+1)th frame of the N frames of original images, and the video image generated by the mobile phone is delayed by at least N2 frames relative to the original image acquired by the mobile phone.
The video images do not need to be presented to the user in real time in the shooting process, so the anti-shake processing duration and the processing delay can be longer, and the number of original image frames used can be larger. To obtain video images of better quality, S″ may be greater than S, M″ may be greater than M, N″ may be greater than N, N1″ may be greater than N1, I″ may be greater than I, and/or N2″ may be greater than N2. For example, for the preview stream, N2 or N2′ may be 0 or a small integer (e.g., 2 or 3) to reduce latency; for the video stream, N2″ may be a larger integer (e.g., 15 or 27) to improve the anti-shake processing effect.
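The latency/quality trade-off described above can be captured as configuration. The dictionary layout and helper below are assumptions for illustration; the numeric values are the examples given in the text:

```python
# Preview stream: keep N2 small so the displayed recorded image lags the
# newest original image as little as possible. Video stream: a larger N2
# gives a longer smoothing window and better stabilization of the stored
# video images.
PREVIEW_PARAMS = {"N1": 10, "I": 1, "N2": 2}   # low-latency display path
VIDEO_PARAMS   = {"N1": 10, "I": 1, "N2": 15}  # stored video-file path

def display_delay_frames(params):
    """The generated image lags the newest original image by at least N2
    frames (see the delay discussion above)."""
    return params["N2"]
```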
In other embodiments, in the process of capturing the first N frames of original images by the mobile phone immediately after entering the shooting process, each frame of original image in the first N1 frames of original images is taken as a video image and stored. Subsequently, after the mobile phone acquires a new original image, the method described in the above embodiment is used to generate a video image corresponding to the I-frame original image according to the N-frame original image.
In other embodiments, during the shooting process, the frame number of the video image generated by the mobile phone corresponds to the frame number of the original image, i.e., N2 = 0 and I = 1. That is, after acquiring a frame of original image, the mobile phone generates the video image corresponding to that frame of original image. N1 may be small, for example, 5, 8, or 10. In the shooting process, before the mobile phone has acquired N1 frames of original images, the mobile phone generates and stores the video image corresponding to the current original image by combining the original images already acquired in the shooting process.
For example, just after the shooting process is started, the mobile phone acquires the 1st frame of original image and takes it as the 1st frame of video image. After the mobile phone acquires the 2nd frame of original image, the mobile phone generates the 2nd frame of video image corresponding to the 2nd frame of original image according to the 1st to 2nd frames of original images. After the mobile phone acquires the 3rd frame of original image, the mobile phone generates the 3rd frame of video image corresponding to the 3rd frame of original image according to the 1st to 3rd frames of original images. After the mobile phone acquires the (N1+1)th frame of original image, the mobile phone generates the (N1+1)th frame of video image according to the 1st to (N1+1)th frames of original images. Subsequently, after the mobile phone acquires a new original image, the mobile phone combines the N1 frames of original images before the new original image with the new original image to generate the video image corresponding to the new original image.
In the case that the video images are obtained through image transformation according to the image stabilization transformation matrix calculated from the target poses on the smooth target rotation curve of the camera, the smooth target rotation curve of the camera satisfies sub-constraint condition (1), so that the overall transition between the warp-transformed video images obtained based on the target rotation curve of the camera is smooth. The smooth target rotation curve of the camera also satisfies sub-constraint condition (2), so that a warp-transformed video image obtained based on the target rotation curve of the camera does not exceed the cropping boundary.
In this way, in the shooting process, the mobile phone can perform anti-shake processing by combining the rotation information of the camera obtained according to the gyroscope data and the target translation information of the camera obtained according to the image content, so as to improve the image stabilization effect of the generated video images and thereby the recording effect of the video file; and the mobile phone exits the five-axis anti-shake mode when the second condition is met, so that image anti-shake processing is performed according to the rotation information.
234. After shooting ends, the mobile phone generates a video file according to the video images.
For example, after detecting that the user clicks the stop-shooting control on the shooting interface, the mobile phone determines that shooting has ended. It can be understood that the stop-shooting operation may also be another gesture operation or a user voice instruction; the embodiment of the present application does not limit the operation that triggers the mobile phone to end the shooting process. After shooting ends, the mobile phone generates the video file according to the video images.
In the case that the five-axis anti-shake mode is adopted for anti-shake processing in the shooting process, the video images in the video file generated after shooting ends have undergone anti-shake processing that combines the rotation information of the camera obtained according to the gyroscope data and the target translation information of the camera obtained according to the image content, so the image stabilization effect is good and the shooting experience of the user can be improved.
For example, schematic effect diagrams of video images without anti-shake processing by the mobile phone are shown in (a)-(c) of fig. 8A. As shown in fig. 8A, the subject translates and rotates in the preview process, and the original images shake. Schematic effect diagrams of video images with anti-shake processing performed by the mobile phone according to the rotation information of the camera are shown in (a)-(c) of fig. 8B. Schematic effect diagrams of video images with anti-shake processing performed after the mobile phone starts the five-axis anti-shake mode are shown in (a)-(c) of fig. 8C.
As described above, after the mobile phone enters or exits the five-axis anti-shake mode, the mobile phone may prompt the user by displaying information, by voice broadcast, by vibration, or the like. For example, referring to fig. 9, in the shooting process, after the mobile phone starts the five-axis anti-shake mode, the user may be prompted on the shooting interface through text information: five-axis anti-shake is started, and translation and rotation anti-shake can be performed!
In the scheme described in steps 200-234, the mobile phone can perform anti-shake processing in combination with the rotation information of the camera obtained according to the gyroscope data and the target translation information of the camera obtained according to the image content in both the preview state and the shooting process, so as to improve the image stabilization effect of the preview images and recorded images presented to the user, improve the image stabilization effect of the generated video images, and improve the shooting experience of the user.
In particular, in scenes where translational motion has a large influence on image shake, such as shooting a close-up subject or recording with a telephoto camera, the video recording method provided by the embodiment of the present application performs anti-shake processing by combining the rotation information of the camera obtained according to the gyroscope data and the target translation information of the camera obtained according to the image content, so the image anti-shake and image stabilization effects are better.
In some other embodiments, in the video recording mode, if the mobile phone determines that the second condition is satisfied, the mobile phone performs image anti-shake processing according to neither the target translation information nor the rotation information.
The above description takes as an example the case in which, in the video recording mode, the method provided by the embodiment of the present application is enabled in both the preview state and the shooting process, and anti-shake processing is performed by combining the rotation information of the camera obtained according to the gyroscope data and the target translation information of the camera obtained according to the image content. In some other embodiments, in the preview state of the video recording mode, the mobile phone does not perform anti-shake processing on the preview stream before video recording by using the method provided by the embodiment of the present application, but performs anti-shake processing only on the preview stream in the video recording during the shooting process. Referring to fig. 10, steps 200-216 shown in fig. 2 may be replaced with the following step 200A:
200A. After the shooting function is started, the mobile phone enters the video recording mode, acquires original images according to a preset frame rate in the preview state, generates preview images according to the original images, and displays the preview images on the preview interface.
In this scheme, the mobile phone can perform anti-shake processing in the shooting process in combination with the rotation information of the camera obtained according to the gyroscope data and the target translation information of the camera obtained according to the image content, so as to improve the image stabilization effect of the recorded images presented to the user and of the generated video images, thereby improving the shooting experience of the user.
In some other embodiments, in the video recording mode, the mobile phone does not perform anti-shake processing on the preview stream before video recording or on the preview stream in the video recording by using the method provided by the embodiment of the present application, but performs anti-shake processing only on the video stream by using the method. Referring to fig. 11, the method may include the above step 200A, the following step 217A, and steps 218′-234.
217A. After the mobile phone detects the shooting operation of the user, the mobile phone acquires original images according to the preset frame rate in the shooting process, generates recorded images according to the original images, and displays the recorded images on the shooting interface.
In this scheme, the mobile phone can perform anti-shake processing in the shooting process in combination with the rotation information of the camera obtained according to the gyroscope data and the target translation information of the camera obtained according to the image content, so as to improve the image stabilization effect of the generated video images and improve the shooting experience of the user.
In some embodiments of the present application, a video file saved in the mobile phone that has undergone five-axis anti-shake processing may be distinguished from other video files and specially identified, so that the user can intuitively know which video files have undergone five-axis anti-shake processing. For example, referring to (a) in fig. 12, a text label 1201 of "wzfd" is displayed on a video file generated through five-axis anti-shake processing. For another example, referring to (b) in fig. 12, a text label 1202 of "pyfd" is displayed on a video file subjected to five-axis anti-shake processing.
In some other embodiments, the mobile phone may perform anti-shake processing without combining the rotation information of the camera, that is, only according to the target translation information of the camera obtained from the image content, so as to suppress image shake caused by translation of the camera due to hand shake of the user or shaking of the mobile phone. For example, the mobile phone may calculate the translation compensation amount according to the target translation information of the camera obtained from the image content, calculate the image stabilization transformation matrix according to the translation compensation amount, and perform warp transformation on the original image according to the image stabilization transformation matrix to realize translation anti-shake.
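A minimal sketch of this translation-only path, using an integer pixel shift with zero padding as a stand-in for the full warp transformation (the function name and the padding choice are assumptions):

```python
import numpy as np

def translation_stabilize(image, dx, dy):
    """Shift the original image by the translation compensation (dx, dy)
    computed from the camera's target translation information. Pixels
    shifted in from outside the frame are zero-filled; a real pipeline
    would instead crop to a stabilization margin."""
    h, w = image.shape[:2]
    out = np.zeros_like(image)
    src_x = slice(max(-dx, 0), min(w - dx, w))
    src_y = slice(max(-dy, 0), min(h - dy, h))
    dst_x = slice(max(dx, 0), min(w + dx, w))
    dst_y = slice(max(dy, 0), min(h + dy, h))
    out[dst_y, dst_x] = image[src_y, src_x]
    return out
```

Sub-pixel compensation would require interpolation (i.e., a true warp); the integer shift keeps the sketch dependency-free.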
With reference to the foregoing embodiments and accompanying drawings, another embodiment of the present application provides a video recording method, which may be implemented in an electronic device having the hardware structure shown in fig. 1, where the electronic device includes a camera. As shown in fig. 13, the method may include:
1301. The electronic device acquires original images after the video recording function is started.
After the video recording function is started, the electronic device can continuously acquire original images according to a preset acquisition frame rate.
1302. The electronic device obtains target translation information of the camera according to the image information of the acquired multiple frames of original images.
The target translation information of the camera is used for representing the translation of the camera. For example, the target translation information of the camera may be a target translation curve of the camera.
1303. The electronic device obtains rotation information of the camera according to the attitude sensor data corresponding to the multiple frames of original images.
The rotation information of the camera is used for representing the rotation of the camera. For example, the rotation information of the camera may be a target rotation curve of the camera.
For example, the attitude sensor may be a gyroscope.
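Both the target translation curve (step 1302) and the target rotation curve (step 1303) are smoothed versions of the camera's raw per-frame motion. A one-dimensional sketch using a centered moving average as a stand-in for the smoothing actually used (the helper name and window size are assumptions):

```python
import numpy as np

def target_translation_curve(raw_positions, window=5):
    """Smooth the per-frame camera translation (one axis) into a target
    translation curve of the same length."""
    raw = np.asarray(raw_positions, dtype=float)
    kernel = np.ones(window) / window
    # Pad at the ends so the smoothed curve keeps the original length.
    padded = np.pad(raw, (window // 2, window // 2), mode="edge")
    return np.convolve(padded, kernel, mode="valid")
```

A constant input stays constant and a linear ramp is preserved in the interior, so the curve deviates from the raw motion only where shake (high-frequency deviation) is present; the compensation amount in step 1304 would then be the difference between the raw curve and the target curve.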
1304. The electronic device calculates an image stabilization transformation matrix of a first original image according to the target translation information of the camera and the rotation information of the camera, where the first original image is one of the multiple frames of original images.
The image stabilization transformation matrix is used for performing motion compensation and warp transformation on the first original image.
1305. The electronic device performs image transformation on the first original image according to the image stabilization transformation matrix to obtain a target image.
For example, the target image may be a preview image, a recorded image, or a video image in the video recording scene.
In this scheme, the electronic device can perform anti-shake processing in a video recording scene in combination with the rotation information of the camera and the target translation information of the camera obtained according to the image content, so as to reduce image shake caused by the user's hand movement or shaking of the electronic device, improve the image stabilization effect of the video images, and improve the shooting experience of the user.
The above description takes a mobile phone as an example of the electronic device. The method is not limited to mobile phones; other electronic devices such as smart watches or tablet computers may also perform anti-shake processing by using the above method, which is not described herein again.
It can be understood that, in order to implement the above functions, the electronic device includes corresponding hardware and/or software modules for performing the respective functions. In combination with the exemplary algorithm steps described in the embodiments disclosed herein, the present application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and the design constraints of the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementations should not be considered as going beyond the scope of the present application.
In this embodiment, the electronic device may be divided into functional modules according to the above method example, for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in the form of hardware. It should be noted that the division of the modules in this embodiment is schematic, and is only a logic function division, and there may be another division manner in actual implementation.
Embodiments of the present application also provide an electronic device including one or more processors and one or more memories. The one or more memories are coupled to the one or more processors, the one or more memories storing computer program code comprising computer instructions which, when executed by the one or more processors, cause the electronic device to perform the associated method steps described above to implement the video recording method of the above embodiments.
Embodiments of the present application further provide a computer-readable storage medium, where computer instructions are stored in the computer-readable storage medium, and when the computer instructions are executed on an electronic device, the electronic device is caused to execute the above related method steps to implement the video recording method in the above embodiments.
Embodiments of the present application further provide a computer program product, which when running on a computer, causes the computer to execute the above related steps, so as to implement the video recording method executed by the electronic device in the above embodiments.
In addition, an embodiment of the present application further provides an apparatus, which may specifically be a chip, a component, or a module, and may include a processor and a memory connected to each other. When the apparatus runs, the processor may execute the computer-executable instructions stored in the memory, so that the chip performs the video recording method performed by the electronic device in the above method embodiments.
The electronic device, the computer-readable storage medium, the computer program product, or the chip provided in this embodiment are all configured to execute the corresponding method provided above, so that the beneficial effects achieved by the electronic device, the computer-readable storage medium, the computer program product, or the chip may refer to the beneficial effects in the corresponding method provided above, and are not described herein again.
Through the description of the above embodiments, those skilled in the art will understand that the division into the above functional modules is merely an example given for convenience and simplicity of description. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into modules or units is only one kind of logical functional division, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another device, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and a part displayed as a unit may be one physical unit or multiple physical units; that is, it may be located in one place or distributed across multiple different places. Some or all of the units may be selected according to actual needs to achieve the purposes of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a readable storage medium. Based on such an understanding, the technical solutions of the embodiments of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for enabling a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above descriptions are merely specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
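The method of the above embodiments combines rotation information obtained from attitude-sensor data with target translation information obtained from image content into a single image stabilization transformation matrix. As a minimal, illustrative sketch (not the claimed implementation), assuming a pinhole intrinsic matrix built from focal lengths `fx`, `fy` and principal point `(cx, cy)`, a rotation-compensation matrix `r_comp`, and a pixel translation compensation `t_comp` — all names and the composition order are assumptions — the two compensations can be composed into one 3x3 homography:

```python
def mat_mul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def stabilization_matrix(fx, fy, cx, cy, r_comp, t_comp):
    """Compose an image-stabilization transform as a 3x3 homography.

    fx, fy, cx, cy : pinhole intrinsics (focal lengths, principal point)
    r_comp         : 3x3 rotation-compensation matrix (from attitude-sensor data)
    t_comp         : (dx, dy) pixel translation compensation (from image content)
    All names and the composition order are illustrative assumptions.
    """
    k = [[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]]
    # Closed-form inverse of the upper-triangular pinhole matrix.
    k_inv = [[1.0 / fx, 0.0, -cx / fx],
             [0.0, 1.0 / fy, -cy / fy],
             [0.0, 0.0, 1.0]]
    # Rotational part: back-project pixels, apply the compensating rotation, re-project.
    h_rot = mat_mul(mat_mul(k, r_comp), k_inv)
    s = h_rot[2][2]
    h_rot = [[v / s for v in row] for row in h_rot]   # normalize the homography
    # Translational part: shift the rotation-corrected image by the compensation.
    h_trans = [[1.0, 0.0, t_comp[0]], [0.0, 1.0, t_comp[1]], [0.0, 0.0, 1.0]]
    return mat_mul(h_trans, h_rot)

def warp_point(h, x, y):
    """Apply the homography to one pixel coordinate."""
    p = [h[i][0] * x + h[i][1] * y + h[i][2] for i in range(3)]
    return p[0] / p[2], p[1] / p[2]

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
h = stabilization_matrix(800.0, 800.0, 320.0, 240.0, identity, (5.0, -3.0))
print(warp_point(h, 320.0, 240.0))   # no rotation: a pure (+5, -3) pixel shift
```

In practice the resulting matrix would be passed to an image-warping routine; the sketch only shows how a rotation compensation and a translation compensation combine into one transform.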

Claims (17)

1. A video recording method applied to an electronic device, wherein the electronic device comprises a camera, and the method comprises:
collecting an original image after a video recording function is started;
acquiring target translation information of the camera according to image information of multiple frames of collected original images;
acquiring rotation information of the camera according to attitude sensor data corresponding to the multiple frames of original images;
calculating an image stabilization transformation matrix of a first original image according to the target translation information of the camera and the rotation information of the camera, wherein the first original image is one of the multiple frames of original images;
and performing image transformation on the first original image according to the image stabilization transformation matrix to obtain a target image.
2. The method according to claim 1, wherein the collecting an original image after a video recording function is started comprises:
starting the video recording function and collecting an original image after a shooting operation of a user is detected;
and the method further comprises:
after a shooting stop operation of the user is detected, generating a video file according to a video image, wherein the video image is the target image.
3. The method according to claim 1, wherein the collecting an original image after a video recording function is started comprises:
starting the video recording function and collecting an original image after a shooting operation of a user is detected;
and the method further comprises:
displaying a recorded image on a shooting interface, wherein the recorded image is the target image.
4. The method according to claim 1, wherein the target image is a preview image, and the method further comprises:
displaying the preview image on a preview interface.
5. The method according to any one of claims 1 to 4, wherein the target translation information of the camera is a target translation curve of the camera, and the acquiring target translation information of the camera according to image information of multiple frames of collected original images comprises:
obtaining a target translation vector corresponding to two adjacent frames of original images according to image information of the two adjacent frames of original images in the multiple frames of original images, wherein target translation vectors between consecutive frames of original images are connected to form an original translation curve of the camera;
and obtaining the target translation curve of the camera according to the original translation curve of the camera.
6. The method according to claim 5, wherein the obtaining a target translation vector corresponding to the two adjacent frames of original images according to image information of the two adjacent frames of original images comprises:
calculating first translation vectors according to feature points on the two adjacent frames of original images;
and obtaining the target translation vector corresponding to the two adjacent frames of original images according to the first translation vectors.
7. The method according to claim 6, wherein the obtaining the target translation vector corresponding to the two adjacent frames of original images according to the first translation vectors comprises:
calculating a second translation vector according to motion sensor data corresponding to the two adjacent frames of original images;
selecting third translation vectors from the first translation vectors, wherein the third translation vectors are located in a δ neighborhood of the second translation vector;
and obtaining the target translation vector corresponding to the two adjacent frames of original images according to the third translation vectors.
8. The method according to claim 7, wherein the obtaining the target translation vector corresponding to the two adjacent frames of original images according to the third translation vectors comprises:
selecting, from the third translation vectors, fourth translation vectors whose similarity is greater than or equal to a first preset value;
and obtaining the target translation vector corresponding to the two adjacent frames of original images according to feature points corresponding to the fourth translation vectors.
9. The method according to any one of claims 1-8, wherein the calculating an image stabilization transformation matrix of a first original image according to the target translation information of the camera and the rotation information of the camera comprises:
calculating a translation compensation amount of the first original image according to the target translation information of the camera;
calculating a rotation compensation amount of the first original image according to the rotation information of the camera;
and calculating the image stabilization transformation matrix of the first original image according to the translation compensation amount and the rotation compensation amount.
10. The method according to any one of claims 1-9, further comprising:
if a preset condition is met, acquiring the rotation information of the camera according to the attitude sensor data corresponding to the multiple frames of original images;
calculating the image stabilization transformation matrix of the first original image according to the rotation information of the camera;
and performing image transformation on the first original image according to the image stabilization transformation matrix to obtain the target image.
11. The method of claim 10, further comprising:
if the preset condition is met, prompting the user that the target anti-shake mode is exited.
12. The method according to claim 10 or 11, wherein the preset condition comprises at least one of the following:
the number of feature points on the two adjacent frames of original images is less than or equal to a second preset value;
or, the proportion of the third translation vectors corresponding to the two adjacent frames of original images in the first translation vectors is less than or equal to a third preset value;
or, the proportion of the fourth translation vectors in the third translation vectors of the two adjacent frames of original images is less than or equal to a fourth preset value;
or, the variance of translation compensation amounts between P consecutive frames of original images is greater than or equal to a fifth preset value, where P is an integer greater than 1;
or, the translation amplitude between Q consecutive frames of original images is greater than or equal to a sixth preset value, where Q is an integer greater than 1.
13. The method according to any one of claims 1-12, wherein the acquiring rotation information of the camera according to the attitude sensor data corresponding to the multiple frames of original images comprises:
obtaining the rotation information of the camera according to attitude sensor data corresponding to N frames of original images, wherein N is an integer greater than 1, N = N1 + I + N2, N1 and I are positive integers, and N2 is a non-negative integer;
and the calculating the image stabilization transformation matrix of the first original image according to the rotation information of the camera comprises:
calculating image stabilization transformation matrices of I frames of original images according to target poses of the camera, in the rotation information of the camera, corresponding to the N frames of original images, wherein the I frames of original images are first original images, the image stabilization transformation matrices of the I frames of original images are used to obtain I frames of target images, and an initial frame of the I frames of original images corresponds to the (N1+1)th frame of original image among the N frames of original images.
14. The method according to claim 13, wherein N2 is 0 when the target image is a preview image or a recorded image.
15. An electronic device, comprising:
a camera, configured to capture images;
a screen, configured to display an interface;
one or more processors;
a memory;
and one or more computer programs, wherein the one or more computer programs are stored in the memory and comprise instructions which, when executed by the electronic device, cause the electronic device to perform the video recording method according to any one of claims 1-14.
16. A computer-readable storage medium comprising computer instructions which, when executed on a computer, cause the computer to perform the video recording method of any one of claims 1-14.
17. A computer program product which, when run on a computer, causes the computer to perform the video recording method according to any one of claims 1-14.
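Claims 6 to 8 narrow per-feature translation candidates down to a single target translation vector for a pair of adjacent frames: first translation vectors come from matched feature points, a second translation vector comes from motion-sensor data, third translation vectors are the candidates inside a δ neighborhood of the second, and fourth translation vectors are those whose similarity clears a preset value. The following is a small illustrative sketch of that filtering chain, assuming 2-D vectors, Euclidean distance for the neighborhood test, and cosine similarity against the surviving candidates' mean; the claims name the thresholds but not these specific measures, so all of them are assumptions here:

```python
import math

def target_translation(first_vecs, sensor_vec, delta=10.0, sim_thresh=0.9):
    """Reduce per-feature translation candidates to one target vector.

    first_vecs : list of (dx, dy) feature-point displacements between two
                 adjacent frames ("first translation vectors")
    sensor_vec : (dx, dy) translation predicted from motion-sensor data
                 ("second translation vector")
    delta, sim_thresh : illustrative thresholds; the claims only require a
                 delta neighborhood and a similarity lower bound.
    """
    # Third translation vectors: candidates in the delta neighborhood of the
    # sensor-predicted vector (rejects displacements from moving objects).
    third = [v for v in first_vecs
             if math.hypot(v[0] - sensor_vec[0], v[1] - sensor_vec[1]) <= delta]
    if not third:
        return None   # e.g. fall back to rotation-only anti-shake (claim 10)
    mean = (sum(v[0] for v in third) / len(third),
            sum(v[1] for v in third) / len(third))

    def cos_sim(a, b):
        na, nb = math.hypot(*a), math.hypot(*b)
        return (a[0] * b[0] + a[1] * b[1]) / (na * nb + 1e-12)

    # Fourth translation vectors: candidates similar enough to the consensus.
    fourth = [v for v in third if cos_sim(v, mean) >= sim_thresh]
    if not fourth:        # degenerate case: keep the neighborhood consensus
        fourth = third
    # Target translation vector for this frame pair: average of survivors.
    return (sum(v[0] for v in fourth) / len(fourth),
            sum(v[1] for v in fourth) / len(fourth))

# Three consistent background displacements and one moving-object outlier.
vecs = [(5.0, 1.0), (5.2, 0.8), (4.8, 1.1), (40.0, -20.0)]
print(target_translation(vecs, sensor_vec=(5.0, 1.0)))   # consensus near (5.0, 0.97)
```

Connecting the per-frame-pair target vectors produced this way would trace the original translation curve of claim 5, which is then smoothed into the target translation curve.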
CN202011057718.6A 2020-09-29 2020-09-29 Video recording method and equipment Active CN114339102B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011057718.6A CN114339102B (en) 2020-09-29 2020-09-29 Video recording method and equipment

Publications (2)

Publication Number Publication Date
CN114339102A (en) 2022-04-12
CN114339102B (en) 2023-06-02

Family

ID=81010961

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011057718.6A Active CN114339102B (en) 2020-09-29 2020-09-29 Video recording method and equipment

Country Status (1)

Country Link
CN (1) CN114339102B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115134534A (en) * 2022-09-02 2022-09-30 深圳前海鹏影数字软件运营有限公司 Video uploading method, device, equipment and storage medium based on e-commerce platform
CN115242981A (en) * 2022-07-25 2022-10-25 维沃移动通信有限公司 Video playing method, video playing device and electronic equipment
CN117135459A (en) * 2023-04-07 2023-11-28 荣耀终端有限公司 Image anti-shake method and electronic equipment
CN117714867A (en) * 2023-05-16 2024-03-15 荣耀终端有限公司 Image anti-shake method and electronic equipment
CN117750203A (en) * 2024-02-20 2024-03-22 荣耀终端有限公司 Electronic device and video processing method thereof

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060017814A1 (en) * 2004-07-21 2006-01-26 Victor Pinto Processing of video data to compensate for unintended camera motion between acquired image frames
CN106331480A (en) * 2016-08-22 2017-01-11 北京交通大学 Video image stabilizing method based on image stitching
US20170257573A1 (en) * 2016-03-07 2017-09-07 Canon Kabushiki Kaisha Image stabilizing apparatus, its control method, image pickup apparatus, and storage medium
CN110610465A (en) * 2019-08-26 2019-12-24 Oppo广东移动通信有限公司 Image correction method and device, electronic equipment and computer readable storage medium
CN111314604A (en) * 2020-02-19 2020-06-19 Oppo广东移动通信有限公司 Video anti-shake method and apparatus, electronic device, computer-readable storage medium

Also Published As

Publication number Publication date
CN114339102B (en) 2023-06-02

Similar Documents

Publication Publication Date Title
CN114339102B (en) Video recording method and equipment
CN113747050B (en) Shooting method and equipment
US10609273B2 (en) Image pickup device and method of tracking subject thereof
CN113630545B (en) Shooting method and equipment
CN110636276B (en) Video shooting method and device, storage medium and electronic equipment
US8400532B2 (en) Digital image capturing device providing photographing composition and method thereof
CN103685940A (en) Method for recognizing shot photos by facial expressions
CN115209057B (en) Shooting focusing method and related electronic equipment
CN114419073B (en) Motion blur generation method and device and terminal equipment
JP2007251429A (en) Moving image imaging unit, and zoom adjustment method
CN115701125B (en) Image anti-shake method and electronic equipment
CN115546043B (en) Video processing method and related equipment thereof
WO2023005355A1 (en) Image anti-shake method and electronic device
CN114520886A (en) Slow-motion video recording method and equipment
WO2022089341A1 (en) Image processing method and related apparatus
CN113747044B (en) Panoramic shooting method and equipment
CN114390186A (en) Video shooting method and electronic equipment
CN114339101B (en) Video recording method and equipment
CN113850709A (en) Image transformation method and device
CN111385481A (en) Image processing method and device, electronic device and storage medium
CN114390191B (en) Video recording method, electronic device and storage medium
CN117278839A (en) Shooting method, electronic equipment and storage medium
US11330166B2 (en) Method of automatically photographing an image, image processing device and image processing system performing the same
CN113891005A (en) Shooting method and device and electronic equipment
EP4280154A1 (en) Image blurriness determination method and device related thereto

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant