CN114339102B - Video recording method and equipment - Google Patents


Info

Publication number: CN114339102B (granted publication of application CN202011057718.6A)
Authority: CN (China)
Prior art keywords: image, camera, translation, original, target
Legal status: Active
Application number: CN202011057718.6A
Other languages: Chinese (zh)
Other versions: CN114339102A
Inventors: 孙思佳, 朱聪超, 王宇, 卢圣卿
Current and original assignee: Huawei Technologies Co., Ltd.
Application filed by Huawei Technologies Co., Ltd.
Priority: CN202011057718.6A
Publications: CN114339102A (application); CN114339102B (grant)

Abstract

Embodiments of this application provide a video recording method and device, relating to the field of electronic technology. In a video recording scene, anti-shake processing can combine the camera's rotation information with target translation information of the camera obtained from image content, improving the image-stabilization effect of video images and the user's shooting experience. The specific scheme is as follows: the electronic device captures original images after the video recording function is started; obtains target translation information of the camera from the image information of multiple frames of original images; obtains rotation information of the camera from attitude-sensor data corresponding to the multiple frames of original images; computes an image-stabilizing transformation matrix for a first original image (an image among the multiple frames of original images) from the camera's target translation information and rotation information; and transforms the first original image with the image-stabilizing transformation matrix to obtain a target image. Embodiments of this application are used for video anti-shake.

Description

Video recording method and equipment
Technical Field
Embodiments of this application relate to the field of electronic technology, and in particular to a video recording method and video recording device.
Background
With the development of photography technology, users' requirements for video recording quality keep rising. During recording, shake of the user's hand or of the electronic device itself easily moves the device, so the captured images jitter. The electronic device can remove inter-frame image shake caused by rotational movement of the camera based on gyroscope (gyro) data, but anti-shake processing that uses gyroscope data alone works poorly, and the user's shooting experience suffers.
Disclosure of Invention
Embodiments of this application provide a video recording method and device that, in a video recording scene, can combine the camera's rotation information with target translation information of the camera obtained from image content to perform anti-shake processing, reducing image shake caused by, for example, shake of the user's hand or of the electronic device, improving the image-stabilization effect of video images, and improving the user's shooting experience.
To achieve the above purpose, embodiments of this application adopt the following technical solutions:
In one aspect, an embodiment of this application provides a video recording method applied to an electronic device that includes a camera. The method may include: the electronic device captures original images after the video recording function is started; the electronic device obtains target translation information of the camera from the image information of multiple captured frames of original images; the electronic device obtains rotation information of the camera from the attitude-sensor data corresponding to the multiple frames of original images; the electronic device computes an image-stabilizing transformation matrix for a first original image (an image among the multiple frames of original images) from the camera's target translation information and rotation information; and the electronic device transforms the first original image with the image-stabilizing transformation matrix to obtain a target image.
In this solution, in a video recording scene the electronic device can combine the camera's rotation information with target translation information of the camera obtained from image content to perform anti-shake processing, reducing image shake caused by shake of the user's hand or of the electronic device, improving the image-stabilization effect of video images, and improving the user's shooting experience.
In one possible design, the electronic device capturing original images after the video recording function is started may include: the electronic device captures original images after starting the video recording function and detecting a user's shooting operation. The method may further include: after detecting that the user stops the shooting operation, the electronic device generates a video file from video images, where a video image is a target image.
That is, during video shooting, the electronic device may perform anti-shake processing on the original images by combining the camera's rotation information with target translation information of the camera obtained from image content, so as to generate video images, reducing image shake caused by shake of the user's hand or of the electronic device, improving the image-stabilization effect of video images, and improving the user's shooting experience.
In another possible design, the electronic device capturing original images after the video recording function is started may include: the electronic device captures original images after starting the video recording function and detecting a user's shooting operation. The method may further include: the electronic device displays a recorded image on the shooting interface, where the recorded image is a target image.
That is, during video shooting, the electronic device can combine the camera's rotation information with target translation information of the camera obtained from image content to perform anti-shake processing on the original images and generate recorded images, reducing image shake caused by shake of the user's hand or of the electronic device, improving the image-stabilization effect of recorded images on the shooting interface, and improving the user's shooting experience.
In another possible design, the target image is a preview image, and the method may further include: the electronic device displays the preview image on the preview interface.
That is, during video shooting, the electronic device may combine the camera's rotation information with target translation information of the camera obtained from image content to perform anti-shake processing on the original images and generate preview images, reducing image shake caused by shake of the user's hand or of the electronic device, improving the image-stabilization effect of preview images on the preview interface, and improving the user's shooting experience.
In another possible design, the target translation information of the camera is a target translation curve of the camera. The electronic device obtaining the target translation information of the camera from the image information of multiple captured frames of original images may include: the electronic device obtains, from the image information of two adjacent frames of original images among the multiple frames, a target translation vector corresponding to those two adjacent frames; the target translation vectors between successive frames of original images are connected to form an original translation curve of the camera; and the electronic device obtains the target translation curve of the camera from the original translation curve of the camera.
That is, the target translation curve of the camera can represent the camera's translation. The electronic device may obtain the camera's original translation curve from the image information of the original images, and then generate the camera's target translation curve from the original translation curve.
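As an illustration of this design, the sketch below accumulates per-frame target translation vectors into an original translation curve and smooths it into a target translation curve. The moving-average smoother and all names are assumptions for illustration only; this text does not specify how the target curve is derived from the original one.

```python
import numpy as np

def translation_curves(frame_vectors, window=5):
    """Accumulate per-frame target translation vectors (pixels) into the
    camera's original translation curve, then smooth it into a target
    translation curve with a moving average (window should be odd).
    The smoother is an illustrative stand-in, not the patented method."""
    original = np.cumsum(np.asarray(frame_vectors, dtype=float), axis=0)
    pad = window // 2
    # Pad with edge values so the smoothed curve keeps the same length.
    padded = np.pad(original, ((pad, pad), (0, 0)), mode="edge")
    kernel = np.ones(window) / window
    target = np.column_stack(
        [np.convolve(padded[:, k], kernel, mode="valid") for k in range(2)]
    )
    return original, target
```

A steady pan (constant per-frame translation) passes through such a smoother unchanged in the interior, while hand-shake jitter around that path is averaged away.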
In another possible design, the electronic device obtaining the target translation vector corresponding to two adjacent frames of original images from their image information may include: the electronic device computes first translation vectors from the feature points on the two adjacent frames of original images, and obtains the target translation vector corresponding to the two adjacent frames from the first translation vectors.
In this way, the electronic device may obtain the first translation vectors from the image information of the original images, obtain the target translation vector from the first translation vectors, and thereby obtain the camera's original translation curve from the target translation vectors.
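A minimal sketch of the first translation vectors: one displacement per matched feature point between two adjacent frames. The point arrays and names are illustrative; in practice the matches might come from optical flow or descriptor matching, which this text does not specify.

```python
import numpy as np

def first_translation_vectors(pts_prev, pts_curr):
    """Displacement of each matched feature point between two adjacent
    original frames: one 'first translation vector' per match.
    pts_prev, pts_curr: (N, 2) arrays of matched (x, y) coordinates."""
    pts_prev = np.asarray(pts_prev, dtype=float)
    pts_curr = np.asarray(pts_curr, dtype=float)
    if pts_prev.shape != pts_curr.shape:
        raise ValueError("point sets must be matched one-to-one")
    return pts_curr - pts_prev
```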
In another possible design, the electronic device obtaining, from the first translation vectors, the target translation vector corresponding to two adjacent frames of original images may include: the electronic device computes a second translation vector from the motion-sensor data corresponding to the two adjacent frames of original images; the electronic device selects, from the first translation vectors, third translation vectors that lie within a δ-neighborhood of the second translation vector; and the electronic device obtains the target translation vector corresponding to the two adjacent frames from the third translation vectors.
In this way, the third translation vectors obtained by filtering the first translation vectors are close to the second translation vector, so most first translation vectors corresponding to mismatched feature-point pairs, as well as those corresponding to the local motion of a moving subject, can be filtered out. The selected third translation vectors therefore represent more accurately the overall translation of the camera caused by shake of the user's hand or of the electronic device.
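The δ-neighborhood filter described above can be sketched as follows. The second translation vector is assumed to arrive precomputed from motion-sensor data; names and the Euclidean distance test are illustrative assumptions.

```python
import numpy as np

def third_translation_vectors(first_vecs, second_vec, delta):
    """Keep only the first translation vectors lying within a
    delta-neighborhood of the motion-sensor-derived second translation
    vector, discarding mismatched pairs and vectors caused by a locally
    moving subject. Returns the retained ('third') vectors."""
    first_vecs = np.asarray(first_vecs, dtype=float)
    dist = np.linalg.norm(first_vecs - np.asarray(second_vec, dtype=float),
                          axis=1)
    return first_vecs[dist <= delta]
```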
In another possible design, the electronic device obtaining, from the third translation vectors, the target translation vector corresponding to two adjacent frames of original images may include: the electronic device selects, from the third translation vectors, fourth translation vectors whose similarity is greater than or equal to a preset value; and the electronic device obtains the target translation vector corresponding to the two adjacent frames from the feature points corresponding to the fourth translation vectors.
In this way, filtering the third translation vectors down to the fourth translation vectors removes outliers and the translation vectors corresponding to local motion of a moving subject that happens to resemble the target direction of translation, so the fourth translation vectors represent the translation between the two adjacent original frames more accurately. The electronic device may then obtain the target translation vector from these more accurate fourth translation vectors.
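This text does not define the similarity measure, so the sketch below assumes cosine similarity against the median of the third translation vectors as a stand-in; the function name and threshold default are likewise illustrative.

```python
import numpy as np

def fourth_translation_vectors(third_vecs, threshold=0.9):
    """From the third translation vectors, keep those whose similarity to
    the bulk motion is >= threshold. Cosine similarity to the per-axis
    median vector is an ASSUMED similarity measure; it rejects outliers
    and local motion whose direction differs from the global translation."""
    third_vecs = np.asarray(third_vecs, dtype=float)
    ref = np.median(third_vecs, axis=0)
    norms = np.linalg.norm(third_vecs, axis=1) * np.linalg.norm(ref)
    norms = np.where(norms == 0.0, 1.0, norms)  # avoid divide-by-zero
    cosine = (third_vecs @ ref) / norms
    return third_vecs[cosine >= threshold]
```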
In another possible design, the electronic device computing the image-stabilizing transformation matrix of the first original image from the camera's target translation information and rotation information may include: the electronic device computes a translation compensation amount for the first original image from the camera's target translation information; computes a rotation compensation amount for the first original image from the camera's rotation information; and computes the image-stabilizing transformation matrix of the first original image from the translation compensation amount and the rotation compensation amount.
Thus, the electronic device can generate the image-stabilizing transformation matrix from the translation and rotation compensation amounts, and use it to warp the original image and compensate for motion.
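A simplified planar sketch of composing the two compensation amounts into one 3×3 transform. A real implementation would involve the camera intrinsics and the full 3-D rotation from the attitude sensor; here rotation is reduced to an angle about the optical axis, and all names are illustrative.

```python
import numpy as np

def stabilization_matrix(rotation_comp_deg, translation_comp):
    """Compose a 3x3 image-stabilizing transform from a rotation
    compensation (degrees about the optical axis) and a translation
    compensation (pixels). Simplified 2-D model: rotate first, then
    shift the frame back toward the smoothed camera path."""
    a = np.deg2rad(rotation_comp_deg)
    rot = np.array([[np.cos(a), -np.sin(a), 0.0],
                    [np.sin(a),  np.cos(a), 0.0],
                    [0.0,        0.0,       1.0]])
    trans = np.eye(3)
    trans[:2, 2] = np.asarray(translation_comp, dtype=float)
    return trans @ rot
```

Applying the matrix to homogeneous pixel coordinates (e.g., with `cv2.warpPerspective` in an OpenCV-based pipeline) performs the warp transform and motion compensation mentioned above.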
In another possible design, the method may further include: if a preset condition is satisfied, the electronic device obtains the camera's rotation information from the attitude-sensor data corresponding to the multiple frames of original images, computes the image-stabilizing transformation matrix of the first original image from the rotation information alone, and transforms the first original image with that matrix to obtain the target image.
That is, once a certain condition is satisfied, the electronic device may exit the five-axis anti-shake mode and perform anti-shake processing using only the camera's rotation information, without combining the target translation information of the camera obtained from image content.
In another possible design, the method may further include: if the preset condition is satisfied, prompting the user that the device exits the target anti-shake mode.
This makes it easy for the user to know whether the device is currently in the target anti-shake mode, which may be a five-axis anti-shake mode.
In another possible design, the preset condition includes: the number of feature points on two adjacent frames of original images is less than or equal to a second preset value; or the proportion of third translation vectors among the first translation vectors for two adjacent frames is less than or equal to a third preset value; or the proportion of fourth translation vectors among the third translation vectors for two adjacent frames is less than or equal to a fourth preset value; or the variance of the translation compensation amounts across P consecutive frames of original images is greater than or equal to a fifth preset value, where P is an integer greater than 1; or the translation amplitude across Q consecutive frames of original images is greater than or equal to a sixth preset value, where Q is an integer greater than 1.
When the preset condition is satisfied, the target translation information determined from the image information may be inaccurate, or the image may not need (or cannot usefully receive) anti-shake processing combined with the target translation information; the electronic device can then determine the image-stabilizing transformation matrix from the rotation information alone.
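The fallback check above amounts to an OR over the five conditions. In this sketch the measured quantities are assumed to be computed elsewhere, and the threshold names map to the "second" through "sixth" preset values of the claim; all identifiers are illustrative.

```python
def should_fall_back(num_points, third_ratio, fourth_ratio,
                     comp_variance, shift_amplitude, th):
    """True if any preset condition holds, in which case the device
    stabilizes using rotation information only. 'th' holds the second
    through sixth preset values (names are illustrative)."""
    return (num_points <= th["min_feature_points"]            # 2nd value
            or third_ratio <= th["min_third_ratio"]           # 3rd value
            or fourth_ratio <= th["min_fourth_ratio"]         # 4th value
            or comp_variance >= th["max_comp_variance"]       # 5th value
            or shift_amplitude >= th["max_shift_amplitude"])  # 6th value
```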
In another possible design, the electronic device obtaining the camera's rotation information from the attitude-sensor data corresponding to the multiple frames of original images may include: the electronic device obtains the camera's rotation information from the attitude-sensor data corresponding to N frames of original images, where N is an integer greater than 1, N = N1 + I + N2, N1 and I are positive integers, and N2 is a non-negative integer. The electronic device computing the image-stabilizing transformation matrix of the first original image from the camera's rotation information includes: the electronic device computes the image-stabilizing transformation matrices of I frames of original images from target camera poses, derived from the camera's rotation information, corresponding to the N frames of original images, where each of the I frames is a first original image, the image-stabilizing transformation matrix of each of the I frames is used to obtain a corresponding target image, and the starting frame of the I frames corresponds to the (N1+1)-th frame among the N frames of original images.
That is, the electronic device may compute the camera pose and image-stabilizing transformation matrix for an original image from the original images before and after it, so that the camera poses corresponding to different original images change more smoothly and the image-stabilization effect improves.
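The N = N1 + I + N2 windowing can be sketched for I = 1 as follows. A plain average over the window stands in for the smoothing, which this text does not specify; names are illustrative.

```python
import numpy as np

def target_pose(poses, idx, n1, n2):
    """Target pose for original frame idx, smoothed over N1 preceding
    frames, the frame itself, and N2 following frames (N = N1 + I + N2
    with I = 1 here). With N2 = 0 no future frames are needed, so
    preview/recorded images can be produced in real time. The plain
    average is an ASSUMED smoother, not the patented method."""
    poses = np.asarray(poses, dtype=float)
    lo = max(0, idx - n1)
    hi = min(len(poses), idx + n2 + 1)
    return poses[lo:hi].mean(axis=0)
```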
In another possible design, N2 is 0 when the target image is a preview image or a recorded image.
Thus, on the preview interface and the shooting interface, the electronic device does not need original images captured after a given original image in order to compute its camera pose and image-stabilizing transformation matrix, so the preview image and recorded image corresponding to that original image can be processed and displayed in real time.
In another aspect, an embodiment of this application provides a photographing apparatus included in an electronic device. The apparatus has functions for implementing the behavior of the electronic device in any of the above aspects and possible designs, so that the electronic device performs the video recording method performed by the electronic device in any possible design of the above aspects. These functions can be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes at least one module or unit corresponding to the functions described above. For example, the apparatus may comprise an acquisition unit, a processing unit, and the like.
In yet another aspect, an embodiment of the present application provides an electronic device, including: the camera comprises a camera, and the camera is used for collecting images; a screen for displaying an interface, one or more processors; and a memory in which the code is stored. The code, when executed by an electronic device, causes the electronic device to perform the video recording method performed by the electronic device in any of the possible designs of the above aspects.
In yet another aspect, an embodiment of the present application provides an electronic device, including: one or more processors; and a memory in which the code is stored. The code, when executed by an electronic device, causes the electronic device to perform the video recording method performed by the electronic device in any of the possible designs of the above aspects.
In yet another aspect, embodiments of the present application provide a computer-readable storage medium including computer instructions that, when executed on an electronic device, cause the electronic device to perform the video recording method of any one of the possible designs of the above aspects.
In yet another aspect, embodiments of the present application provide a computer program product which, when run on a computer, causes the computer to perform the video recording method performed by the electronic device in any of the possible designs of the above aspects.
In yet another aspect, an embodiment of the present application provides a chip system that is applied to an electronic device. The system-on-chip includes one or more interface circuits and one or more processors; the interface circuit and the processor are interconnected through a circuit; the interface circuit is used for receiving signals from the memory of the electronic device and sending signals to the processor, wherein the signals comprise computer instructions stored in the memory; the computer instructions, when executed by a processor, cause the electronic device to perform the video recording method of any of the possible designs of the above aspects.
The corresponding advantages of the other aspects mentioned above may be found in the description of the advantages of the method aspects, and are not repeated here.
Drawings
FIG. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 2 is a flowchart of a video recording method according to an embodiment of the present application;
FIG. 3 is a set of interface schematic diagrams provided in an embodiment of the present application;
FIG. 4A is a schematic diagram illustrating filtering of translation vectors according to an embodiment of the present application;
FIG. 4B is a flowchart of calculating a translation compensation amount according to an embodiment of the present application;
FIG. 5 is a schematic illustration of the effect of a constraint provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of a timing relationship between an original image and a preview image according to an embodiment of the present application;
FIG. 7A is a schematic diagram of a set of preview interfaces provided in an embodiment of the present application;
FIG. 7B is a schematic diagram of another set of preview interfaces provided by embodiments of the present application;
FIG. 7C is a schematic diagram of another set of preview interfaces provided by embodiments of the present application;
FIG. 8A is a schematic diagram of a set of video images according to an embodiment of the present application;
FIG. 8B is a schematic view of another set of video images provided in an embodiment of the present application;
FIG. 8C is a schematic view of another set of video images provided in an embodiment of the present application;
FIG. 9 is a schematic diagram of a prompt interface provided in an embodiment of the present application;
FIG. 10 is a flowchart of another video recording method according to an embodiment of the present application;
FIG. 11 is a flowchart of another video recording method according to an embodiment of the present application;
FIG. 12 is a schematic illustration of another interface provided in an embodiment of the present application;
FIG. 13 is a flowchart of another video recording method according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of this application will be described below with reference to the drawings. In the description of the embodiments of this application, unless otherwise indicated, "/" means "or"; for example, A/B may represent A or B. "And/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A alone, both A and B, or B alone. In addition, in the description of the embodiments of this application, "plurality" means two or more.
The terms "first" and "second" below are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include one or more such features. In the description of the embodiments, unless otherwise specified, "plurality" means two or more.
In the embodiments of the present application, words such as "exemplary" or "such as" are used to mean serving as examples, illustrations, or descriptions. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
When video is recorded with an electronic device, shake of the user's hand or of the device itself easily moves the electronic device and the camera module (hereinafter simply "the camera"), causing image shake and blur. The motion may include translational motion and rotational motion. In particular, when the subject is close to the camera, or a telephoto camera is used for shooting, the translational motion of the electronic device and camera is more pronounced and has a greater influence on image shake.
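The claim that translation matters more for near subjects and telephoto lenses follows from the pinhole camera model: a sideways camera translation t produces an image shift of roughly f·t/Z pixels for a subject at depth Z and focal length f (in pixels). A quick check with illustrative values:

```python
def translation_image_shift(focal_px, camera_shift_m, depth_m):
    """Approximate image shift (pixels) caused by a sideways camera
    translation, pinhole model: shift ~= f * t / Z. Larger when the
    subject is close (small Z) or the lens is telephoto (large f)."""
    return focal_px * camera_shift_m / depth_m

# A 5 mm hand-shake translation, focal length 3000 px:
near = translation_image_shift(3000.0, 0.005, 0.5)  # subject at 0.5 m
far = translation_image_shift(3000.0, 0.005, 5.0)   # subject at 5 m
```

The same hand motion shifts a 0.5 m subject ten times as far on the sensor as a 5 m subject, which is why gyroscope-only (rotation-only) stabilization falls short in these scenes.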
In one technical solution, the electronic device performs image anti-shake processing based on gyroscope data. This can remove inter-frame image shake caused by rotational movement of the camera, but it is difficult to remove image shake caused by translational movement between frames, so the anti-shake effect and the user's shooting experience are poor.
Embodiments of this application provide a video recording method and device, applicable to an electronic device, that in a video recording scene combine the camera's rotation information obtained from attitude-sensor data (such as gyroscope data) with target translation information of the camera obtained from image content to perform anti-shake processing, reducing image shake caused by shake of the user's hand or of the electronic device, improving the image-stabilization effect of video images, and improving the user's shooting experience.
In the embodiments of this application, the target translation information of the camera that the electronic device obtains from image content represents the translation of the camera caused by shake of the user's hand or of the electronic device, i.e., the global translation trend between adjacent original images; it does not represent the local relative translation of a moving subject across adjacent original images.
The video recording method provided in the embodiments of this application can be used in rear-camera recording scenes as well as front-camera recording scenes, without limitation. After the shooting function is started, the electronic device can perform anti-shake processing by combining the camera's rotation information with the target translation information of the camera obtained from image content.
For example, the electronic device may be a mobile terminal such as a mobile phone, a tablet computer, a wearable device (e.g. a smart watch), a vehicle-mounted device, an augmented reality (augmented reality, AR)/Virtual Reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook or a personal digital assistant (personal digital assistant, PDA), or a professional camera, and the embodiment of the present application does not limit the specific type of the electronic device.
By way of example, fig. 1 shows a schematic diagram of an electronic device 100. The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, keys 190, a motor 191, an indicator 192, a camera 193, a display 194, and a subscriber identity module (subscriber identification module, SIM) card interface 195, etc. The sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It is to be understood that the structure illustrated in the embodiments of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, electronic device 100 may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units, such as: the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a memory, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural network processor (neural-network processing unit, NPU), etc. Wherein the different processing units may be separate devices or may be integrated in one or more processors.
The controller may be a neural hub and a command center of the electronic device 100, among others. The controller can generate operation control signals according to the instruction operation codes and the time sequence signals to finish the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, it can call them directly from the memory. This avoids repeated accesses and reduces the waiting time of the processor 110, thereby improving system efficiency.
The electronic device 100 implements display functions through a GPU, a display screen 194, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like. The display 194 includes a display panel. The display panel may employ a liquid crystal display (liquid crystal display, LCD), an organic light-emitting diode (organic light-emitting diode, OLED), an active-matrix organic light-emitting diode (active-matrix organic light-emitting diode, AMOLED), a flexible light-emitting diode (flexible light-emitting diode, FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (quantum dot light emitting diodes, QLED), or the like. In some embodiments, the electronic device 100 may include 1 or R display screens 194, R being a positive integer greater than 1. In embodiments of the present application, the display 194 may be used to display the preview interface, the shooting interface, and the like in a video recording mode.
The electronic device 100 may implement photographing functions through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
The ISP is used to process data fed back by the camera 193. For example, when photographing, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing, so that the electrical signal is converted into an image visible to naked eyes. ISP can also optimize the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in the camera 193.
The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image onto the photosensitive element. The photosensitive element may be a charge coupled device (charge coupled device, CCD) or a Complementary Metal Oxide Semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then transferred to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard RGB, YUV, or the like format. In some embodiments, electronic device 100 may include 1 or L cameras 193, L being a positive integer greater than 1.
The cameras 193 may include cameras with different focal lengths, for example, an ultra-wide-angle camera and a telephoto camera, in order of equivalent focal length from small to large. A camera with a smaller equivalent focal length has a larger field of view and can be used to shoot larger scenes such as landscapes. A camera with a larger equivalent focal length has a smaller field of view, can be used to shoot distant objects, and covers a smaller shooting area.
In addition, the camera 193 may also include a depth camera for measuring the object distance of an object to be photographed, and other cameras. For example, the depth camera may include a three-dimensional (3D) depth camera, a time-of-flight (TOF) depth camera, a binocular depth camera, or the like.
The digital signal processor is used to process digital signals, and may process other digital signals in addition to digital image signals. For example, when the electronic device 100 selects a frequency bin, the digital signal processor is used to perform a Fourier transform on the frequency bin energy, and the like.
Video codecs are used to compress or decompress digital video. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 may play or record videos in a plurality of encoding formats, for example: moving picture experts group (moving picture experts group, MPEG)1, MPEG2, MPEG3, MPEG4, and the like.
The NPU is a neural-network (NN) computing processor, and can rapidly process input information by referencing a biological neural network structure, for example, referencing a transmission mode between human brain neurons, and can also continuously perform self-learning. Applications such as intelligent awareness of the electronic device 100 may be implemented through the NPU, for example: image recognition, face recognition, speech recognition, text understanding, pattern recognition, machine self-learning, and the like.
The internal memory 121 may be used to store computer-executable program code that includes instructions. The processor 110 executes various functional applications of the electronic device 100 and data processing by executing instructions stored in the internal memory 121. The internal memory 121 may include a storage program area and a storage data area. The storage program area may store an application program (such as a sound playing function, an image playing function, etc.) required for at least one function of the operating system, etc. The storage data area may store data created during use of the electronic device 100 (e.g., audio data, phonebook, etc.), and so on. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (universal flash storage, UFS), and the like.
In the embodiment of the present application, in a video recording scene, the processor 110, by running the instructions stored in the internal memory 121, obtains the target translation information of the camera according to the translation vector obtained from the image information and the translation vector obtained by the motion sensor, and performs anti-shake processing in combination with the rotation information of the camera and the target translation information of the camera, so as to improve the image stabilization effect of the video image and improve the shooting experience of the user.
The gyro sensor 180B may be used to determine the motion posture of the electronic device 100. In some embodiments, the angular velocity of the electronic device 100 about three axes (i.e., the x, y, and z axes) may be determined by the gyro sensor 180B; that is, the rotation information of the electronic device 100 may be obtained. The gyro sensor 180B may be used for photographing anti-shake. For example, when the shutter is pressed, the gyro sensor 180B detects the shake angle of the electronic device 100, calculates, according to the angle, the distance that the lens module needs to compensate, and makes the lens counteract the shake of the electronic device 100 through reverse motion, thereby realizing anti-shake. The gyro sensor 180B may also be used in navigation and somatosensory game scenarios. Since the camera (module) is fixed on the electronic device 100, the rotation information of the electronic device 100 can be understood as the rotation information of the camera. Moreover, since the camera (module) includes the camera 193, the rotation information of the camera can also be understood as the rotation information of the camera 193.
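As a rough illustration of how 3-axis gyroscope readings can be integrated into rotation angles (a minimal sketch, not the device's actual driver code; the function name and the sampling setup are hypothetical):

```python
import numpy as np

def integrate_gyro(angular_velocity, dt):
    """Integrate 3-axis angular velocity samples (rad/s) into
    cumulative rotation angles (rad) about the x, y, and z axes."""
    omega = np.asarray(angular_velocity, dtype=float)
    # A cumulative sum approximates the time integral per axis.
    return np.cumsum(omega * dt, axis=0)

# Hypothetical data: a constant 0.1 rad/s rotation about z,
# sampled 10 times at 100 Hz, accumulates to 0.01 rad about z.
angles = integrate_gyro(np.tile([0.0, 0.0, 0.1], (10, 1)), dt=0.01)
```

Each row of `angles` would be one sample of the rotation information (X, Y, Z) used later when building the rotation curve of the camera.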
The acceleration sensor 180E may detect the magnitude of acceleration of the electronic device 100 in various directions (typically three axes). The magnitude and direction of gravity may be detected when the electronic device 100 is stationary. It may also be used to recognize the posture of the electronic device, and is applied in applications such as landscape/portrait screen switching and pedometers. In an embodiment of the present application, the acceleration sensor 180E may be used to obtain a translation vector of the electronic device 100, that is, a translation vector of the camera, for subsequent calculation of the target translation information of the camera.
A distance sensor 180F for measuring a distance. The electronic device 100 may measure the distance by infrared or laser. In some embodiments, the electronic device 100 may range using the distance sensor 180F to achieve quick focus.
The touch sensor 180K, also referred to as a "touch panel". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, which is also called a "touch screen". The touch sensor 180K is for detecting a touch operation acting thereon or thereabout. The touch sensor may communicate the detected touch operation to the application processor to determine the touch event type. Visual output related to touch operations may be provided through the display 194. In other embodiments, the touch sensor 180K may also be disposed on the surface of the electronic device 100 at a different location than the display 194.
In embodiments of the present application, in a video recording scenario, the camera 193 may be used to capture images; the display 194 may be used to display the preview interface, the shooting interface, and the like; a motion sensor such as the acceleration sensor 180E (also referred to as an accelerometer) may be used to obtain a translation vector of the camera; a posture sensor such as the gyro sensor 180B (also referred to as a gyroscope) may be used to obtain rotation information of the camera; and the processor 110, by running the instructions stored in the internal memory 121, can obtain the target translation information of the camera according to the translation vector obtained from the image information and the translation vector obtained by the motion sensor, and perform anti-shake processing in combination with the rotation information of the camera and the target translation information of the camera, so as to improve the image stabilization effect of the video image and improve the shooting experience of the user.
It is understood that the gesture sensor is used to detect the gesture of the electronic device 100, and rotation information of the electronic device 100 may be obtained. For example, the attitude sensor may be a sensor such as a gyroscope or a three-axis electronic compass, and the type of the attitude sensor is not limited in the embodiments of the present application.
In the anti-shake processing process described in the embodiments of the present application, the mobile phone may perform image anti-shake processing in combination with the target translation information of the camera and the rotation information of the camera, so as to improve the image stabilization effect of the preview image, so that the anti-shake mode adopted may be referred to as a translation anti-shake mode.
In the anti-shake processing process described in the embodiments of the present application, the mobile phone may perform image anti-shake processing by combining the two-axis target translation information and the three-axis rotation information, so as to improve the image stabilizing effect of the video image, and thus the adopted anti-shake mode may also be referred to as a five-axis anti-shake mode.
The following will describe a video recording method provided in the embodiment of the present application by taking an electronic device as a mobile phone having a structure shown in fig. 1 and an attitude sensor as a gyroscope as an example. As shown in fig. 2, the video recording method may include:
200. After the shooting function is started, the mobile phone enters a video recording mode, and acquires original images at a preset frame rate in a preview state.
When the user wants to use the mobile phone to shoot images, the shooting function of the mobile phone can be started. For example, the mobile phone may launch the camera application, or launch another application with a photographing or video recording function (such as Douyin (TikTok), or an AR application such as Huawei Cyberverse (Hetu)), to start the shooting function of the mobile phone.
In some embodiments, the video recording method provided in the embodiments of the present application may be applied to a video recording mode, and the mobile phone may enter the video recording mode after starting a shooting function, so as to perform anti-shake processing in the video recording mode in combination with rotation information of a camera obtained according to gyroscope data and target translation information of the camera obtained according to image content.
Illustratively, after detecting the user's click on the camera icon 301 shown in fig. 3 (a), the mobile phone starts the photographing function of the camera application and displays a preview interface in the photographing mode as shown in fig. 3 (b). After detecting the user clicking the control 302 shown in fig. 3 (b), the mobile phone enters a video recording mode as shown in fig. 3 (c).
As another example, when the mobile phone displays the desktop or an interface of another application (not the camera application), after detecting a voice instruction of the user indicating video recording, the mobile phone starts the shooting function and enters the video recording mode shown in (c) of fig. 3.
It should be noted that, the mobile phone may also enter the video mode in response to other touch operations, voice instructions, or shortcut gestures of the user, which are not limited in the embodiment of the present application.
In other embodiments, in the video mode, the mobile phone does not automatically combine rotation information of the camera obtained according to the gyroscope data and target translation information of the camera obtained according to the image content to perform anti-shake processing; after detecting the preset operation 1 of the user, the mobile phone starts anti-shake processing by combining the rotation information of the camera obtained according to the gyroscope data and the target translation information of the camera obtained according to the image content. The preset operation 1 is used for indicating the mobile phone to perform anti-shake processing in combination with the target translation information. In an exemplary video mode, the preview interface includes a five-axis anti-shake control, and after detecting the operation of clicking the five-axis anti-shake control by the user, the mobile phone performs anti-shake processing by combining rotation information of the camera obtained according to gyroscope data and target translation information of the camera obtained according to image content.
After the mobile phone enters a video mode, an original image is acquired according to a preset frame rate in a preview state.
In other embodiments, the video recording method provided in the embodiments of the present application is applied to a specific target shooting mode other than the video recording mode, and the mobile phone may enter the target shooting mode after starting the shooting function, so as to perform anti-shake processing in the target shooting mode in combination with rotation information of the camera obtained according to the gyroscope data and target translation information of the camera obtained according to the image content.
In the following embodiments, a five-axis anti-shake mode is turned on in a video mode of a mobile phone, so as to perform anti-shake processing in combination with rotation information of a camera obtained according to gyroscope data and target translation information of the camera obtained according to image content.
In the preview state, after the mobile phone starts the five-axis anti-shake mode, target translation information can be obtained according to the image content of the adjacent original image, so that anti-shake processing is performed by combining the target translation information and the rotation information of the camera obtained according to the gyroscope data. In some embodiments, the process of the mobile phone obtaining the target translation information according to the image content of the adjacent original image may include the following steps 201-206. The target translation information of the camera can be a target translation curve of the camera, and the mobile phone can obtain target translation vectors corresponding to two adjacent frames of original images according to the image information of the two adjacent frames of original images in the multi-frame original images. The object translation vectors between successive frames of original images are connected to form an original translation curve of the camera. The mobile phone can obtain a target translation curve of the camera according to the original translation curve of the camera.
201. In a preview state, after the mobile phone starts a five-axis anti-shake mode, a first translation vector is calculated according to feature points on two adjacent frames of original images.
In some embodiments, the mobile phone defaults to a five-axis anti-shake mode in a preview state of a video mode; in other embodiments, the handset turns on the five-axis anti-shake mode after detecting the operation of turning on the five-axis anti-shake mode by the user. For example, the preview interface includes a five-axis anti-shake control, and the mobile phone opens the five-axis anti-shake mode after detecting the operation of clicking the control by the user.
In some embodiments of the present application, after the mobile phone starts/exits the five-axis anti-shake mode, the user may be prompted by means of display information, voice broadcast or vibration, etc.
After the five-axis anti-shake mode is started, the mobile phone calculates a first translation vector according to the feature points on the adjacent two frames of original images. The first translation vectors corresponding to the two adjacent frames of original images comprise a plurality of vectors, each first translation vector corresponds to one or a plurality of characteristic point pairs on the two adjacent frames of original images, and the first translation vectors are used for representing translation conditions such as translation directions, translation distances and the like of the mutually matched characteristic point pairs between the two adjacent frames of original images.
For example, the mobile phone can detect feature points of non-edge areas of the 2 nd frame original image and the 1 st frame original image acquired in a preview state. And then, the mobile phone performs inter-frame feature point matching according to the detected feature points, and determines feature point pairs matched on the original image of the 2 nd frame and the original image of the 1 st frame. And the mobile phone calculates a first translation vector according to the matched characteristic point pairs. That is, the mobile phone obtains the first translation vector according to the image information of the two adjacent frames of original images. For another example, the mobile phone may perform feature point matching on the 3 rd frame of original image and the 2 nd frame of original image in the preview state, so as to obtain a corresponding first translation vector.
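The per-pair computation in step 201 can be sketched as follows; the matched coordinates here are synthetic stand-ins for the output of a real feature detector and matcher (for example ORB plus brute-force matching), which the text does not specify:

```python
import numpy as np

def first_translation_vectors(pts_prev, pts_next):
    """Per-pair translation vectors between matched feature points of
    two adjacent original frames.

    pts_prev, pts_next: (N, 2) arrays of matched (x, y) coordinates.
    Returns an (N, 2) array; each row is one first translation vector.
    """
    return np.asarray(pts_next, float) - np.asarray(pts_prev, float)

# Synthetic matched pairs: the frame content shifted by (5, -2) pixels.
prev_pts = np.array([[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]])
next_pts = prev_pts + np.array([5.0, -2.0])
vectors = first_translation_vectors(prev_pts, next_pts)
```

Each row of `vectors` corresponds to one matched feature point pair, matching the description above that a first translation vector represents the translation direction and distance of a pair between the two frames.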
The original images are obtained through camera shooting, and the translation of the camera can lead to the translation between the two adjacent frames of original images, so that the translation condition between the two adjacent frames of original images can be used for representing the translation condition of the camera and the mobile phone. The first translation vector may be referred to as an original translation vector of the original image, or an original translation vector of the camera.
The translation vector may also be referred to as a displacement vector or a motion vector, and is in the form of a vector. A D-dimensional (i.e., multi-dimensional) vector may represent a point in D-dimensional space, and thus the first translation vectors may correspond to a plurality of points in the D-dimensional space. Illustratively, the first translation vectors may correspond to all the points in the D-dimensional space shown in fig. 4A.
Then, the mobile phone can filter the original translation vector to obtain a target translation vector between two adjacent frames of original images, so as to obtain target translation information of the camera. The object translation information is used to accurately represent the translation of the camera, which may be, for example, the object translation curve of the camera that follows.
202. The mobile phone calculates a second translation vector according to motion sensor data corresponding to the two adjacent frames of original images.
The motion sensor can be used to monitor the translational motion of the mobile phone, so that the second translation vector is calculated according to the monitored translation. For example, the motion sensor may be an accelerometer, and the mobile phone may calculate the second translation vector according to the accelerometer data corresponding to the two adjacent frames of original images (i.e., the accelerometer data within the acquisition period of the two adjacent frames of original images). The second translation vector is used to represent the translation of the mobile phone, that is, the translation of the camera, between the two adjacent frames of original images. For example, the mobile phone calculates the corresponding second translation vector according to the accelerometer data within the acquisition period of the 1st and 2nd frames of original images acquired in the preview state.
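A minimal sketch of deriving a second translation vector from accelerometer samples by double integration over one frame interval (gravity compensation and bias handling are omitted; the function name and sampling setup are hypothetical):

```python
import numpy as np

def second_translation_vector(accel, dt, v0=(0.0, 0.0, 0.0)):
    """Estimate camera displacement over one frame interval by double
    integration of 3-axis acceleration samples (m/s^2).

    accel: (N, 3) gravity-compensated acceleration samples
    dt: sampling interval in seconds
    v0: velocity at the start of the interval
    """
    a = np.asarray(accel, float)
    v = np.asarray(v0, float) + np.cumsum(a * dt, axis=0)  # velocity
    return np.sum(v * dt, axis=0)                          # displacement

# Hypothetical data: constant 1 m/s^2 along x for 0.1 s from rest.
disp = second_translation_vector(np.tile([1.0, 0.0, 0.0], (10, 1)),
                                 dt=0.01)
```

The discrete sums only approximate the continuous double integral; real implementations would also align the sample window to the frames' capture timestamps.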
203. The mobile phone selects a third translation vector from the first translation vectors, the third translation vector being located within a δ-neighborhood of the second translation vector.
The third translation vector comprises one or more vectors. The mobile phone may select, from the first translation vectors, the third translation vector located within the δ-neighborhood of the second translation vector.
The translation between two adjacent frames of original images determined according to the feature points should be consistent with the translation of the mobile phone determined according to a motion sensor such as an accelerometer, so the mobile phone can select the first translation vectors located within the δ-neighborhood of the second translation vector, namely the third translation vectors. That is, the distance between a third translation vector and the second translation vector is less than or equal to δ.
In this way, the third translation vectors that the mobile phone obtains by filtering the first translation vectors are close to the second translation vector; most first translation vectors corresponding to mismatched feature point pairs, as well as those corresponding to local motion of a photographed moving object, are filtered out, so that the selected third translation vectors can more accurately represent the overall translation of the camera caused by, for example, shaking of the user's hand or of the mobile phone.
That is, the mobile phone combines the motion sensor data such as the accelerometer and the image information to determine the translation information of the camera, so that the accuracy and the robustness of the translation information can be improved.
Illustratively, the points in the D-dimensional space corresponding to the third translation vector may be the points remaining after the points in circle 401 are filtered out in fig. 4A.
And then, the mobile phone can filter the third translation vector to obtain a target translation vector between two adjacent frames of original images, so as to obtain target translation information of the camera.
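The δ-neighborhood selection in step 203 is a plain distance filter (a minimal sketch with hypothetical values):

```python
import numpy as np

def filter_by_neighborhood(first_vecs, second_vec, delta):
    """Keep only the first translation vectors whose distance to the
    sensor-derived second translation vector is <= delta."""
    first_vecs = np.asarray(first_vecs, float)
    dist = np.linalg.norm(first_vecs - np.asarray(second_vec, float),
                          axis=1)
    return first_vecs[dist <= delta]

# Two vectors agree with the accelerometer estimate (4, 1); one outlier
# (a mismatched pair or a moving subject) is filtered out.
third = filter_by_neighborhood([[4.2, 1.1], [3.9, 0.8], [30.0, -7.0]],
                               second_vec=[4.0, 1.0], delta=1.0)
```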
204. The mobile phone selects a fourth translation vector with the similarity larger than or equal to a preset value 1 from the third translation vectors.
The fourth translation vector comprises one or more vectors. When the camera translates in a target direction, the pixel points with the same content on the two adjacent frames of original images translate in the target direction, and most feature point pairs on the two adjacent frames of original images translate in the target direction. Therefore, the mobile phone can select, from the third translation vectors, fourth translation vectors whose mutual similarity is greater than or equal to a preset value 1; the fourth translation vectors are the translation vectors corresponding to the feature point pairs that translate substantially in the target direction, and can represent the translation of the camera more accurately. In this way, filtering the third translation vectors to obtain the fourth translation vectors can remove the translation vectors corresponding to local motion of a photographed moving object that resembles the translation in the target direction, as well as outliers.
For example, the mobile phone may use a clustering algorithm such as DBSCAN to screen out fourth translation vectors that are most similar to each other from the third translation vectors, so as to accurately represent the translation situation of the camera.
For another example, the mobile phone may use other machine learning methods such as Kmeans, RANSAC, etc. to reject outlier translation vectors from the third translation vector or select the translation vector with the highest confidence, so as to obtain a fourth translation vector, so as to accurately represent the translation situation of the camera.
Illustratively, the points within the D-dimensional space corresponding to the fourth translation vectors may be the points remaining in circle 403 after the points in circle 402 are filtered out again and the points in circle 403 that lie outside circle 404 are removed.
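The similarity screening of step 204 can be illustrated with a DBSCAN-style density criterion: keep the vectors that have enough close neighbors and drop isolated ones (a minimal stand-in for the clustering algorithms named above, not a full DBSCAN implementation; the eps and min_pts values are hypothetical):

```python
import numpy as np

def density_filter(vectors, eps=0.5, min_pts=2):
    """Keep vectors having at least min_pts other vectors within
    distance eps (a DBSCAN-style core-point criterion)."""
    v = np.asarray(vectors, float)
    # Pairwise distance matrix between all translation vectors.
    d = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=2)
    neighbor_counts = (d <= eps).sum(axis=1) - 1  # exclude self
    return v[neighbor_counts >= min_pts]

# Three tightly clustered vectors survive; the isolated one is dropped.
fourth = density_filter([[4.0, 1.0], [4.1, 1.1], [3.9, 0.9], [6.5, 3.0]])
```

In practice, a library implementation of DBSCAN, k-means, or RANSAC, as mentioned above, would replace this toy criterion.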
205. The mobile phone determines a target translation vector according to the feature points corresponding to the fourth translation vectors.
The mobile phone calculates a mean value coordinate 1 according to the coordinates of the feature points on the previous frame of original image corresponding to the fourth translation vector on the two adjacent frames of original images, calculates a mean value coordinate 2 according to the coordinates of the feature points on the next frame of original image corresponding to the fourth translation vector, and one vector formed from the mean value coordinate 1 to the mean value coordinate 2 is the target translation vector. The target translation vector is used for representing the translation condition between two adjacent frames of original images.
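Step 205 then reduces to averaging the feature coordinates that correspond to the fourth translation vectors in each frame and taking the difference of the two mean coordinates (a minimal sketch):

```python
import numpy as np

def target_translation_vector(pts_prev, pts_next):
    """The vector from mean coordinate 1 (previous frame) to mean
    coordinate 2 (next frame) is the target translation vector."""
    mean1 = np.mean(np.asarray(pts_prev, float), axis=0)
    mean2 = np.mean(np.asarray(pts_next, float), axis=0)
    return mean2 - mean1

# Hypothetical surviving feature coordinates in the two frames.
t = target_translation_vector([[0.0, 0.0], [2.0, 2.0]],
                              [[1.0, 0.5], [3.0, 2.5]])
```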
206. The mobile phone determines the target translation information of the camera according to the target translation vectors of M consecutive frames of original images.
The target translation information of the camera is used for representing the translation condition of the camera. The target translation information of the camera is used for calculating an image stabilizing transformation matrix of the original image subsequently, and the image stabilizing transformation matrix is used for carrying out motion compensation on the original image through deformation (warp) transformation, so that the effects of image anti-shake and image stabilizing are achieved.
For example, the target translation information of the camera may be a target translation curve of the camera. The object translation vectors between successive M frames of original images are connected to form an original translation curve of the camera, which may also be referred to as an original translation track of the camera or an original translation path of the camera, etc. The mobile phone can carry out smoothing treatment on the original translation curve of the camera so as to obtain a target translation curve of the camera. The target translation curve of the camera is also called a target translation track or path of the camera, and the like, so that the translation condition of the camera can be accurately represented.
For example, the mobile phone can perform optimal estimation on an original translation curve of the camera through a Kalman (Kalman) algorithm, so as to obtain a target translation curve of the camera, which can accurately represent the translation condition between continuous multi-frame original images.
The mobile phone processes the translation vector through the clustering algorithm, the Kalman algorithm and the like, so that more accurate and robust target translation information can be obtained.
After the mobile phone forms the target translation curve of the camera for the first time according to the target translation vectors between M frames of original images, for each subsequently acquired frame of original image, the mobile phone can combine it with the most recently acquired M frames of original images to continuously obtain subsequent points on the target translation curve of the camera.
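The smoothing of the original translation curve can be illustrated with a one-dimensional constant-position Kalman filter applied to the cumulative translation along one axis (a minimal sketch; the patent does not specify the motion model, so the process and measurement noise values here are hypothetical):

```python
import numpy as np

def kalman_smooth(curve, q=1e-3, r=0.25):
    """Smooth a 1-D translation curve with a constant-position Kalman
    filter. q: process noise variance, r: measurement noise variance."""
    x, p = curve[0], 1.0
    out = []
    for z in curve:
        p += q                # predict: uncertainty grows
        k = p / (p + r)       # Kalman gain
        x += k * (z - x)      # update with measurement z
        p *= (1 - k)
        out.append(x)
    return np.array(out)

# A jittery cumulative translation along x becomes a smoother curve.
raw = np.array([0.0, 1.2, 0.8, 2.1, 1.7, 3.0])
smooth = kalman_smooth(raw)
```

The smoothed curve plays the role of the target translation curve for this axis; the true filter would run on both axes of the two-axis target translation information.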
207. The mobile phone calculates a translation compensation amount of the original image according to the target translation information of the camera.
The translational compensation amount is used for performing translational motion compensation on the original image during deformation (warp) transformation, so as to avoid shaking or blurring of the original image due to translational motion of the camera as much as possible, and achieve the effects of image shaking prevention and image stabilization. The mobile phone can determine the translation compensation amount of the original image according to the target translation information of the camera such as the target translation curve of the camera.
In an embodiment of the present application, when the target translation information of the camera is the target translation curve of the camera, a flowchart of the method for obtaining the translation compensation amount described in steps 201-207 may be seen in fig. 4B. The process includes: the mobile phone detects feature points in the non-edge areas of two adjacent frames of original images, performs inter-frame feature point matching, and calculates translation vectors according to the matched feature point pairs; filters the translation vectors according to the accelerometer information, and further filters them by using the DBSCAN algorithm; obtains the target translation vector according to the filtered translation vectors, and obtains the original translation curve of the camera according to the target translation vectors of consecutive multi-frame original images; smooths the original translation curve of the camera by using the Kalman algorithm to obtain the target translation curve of the camera; and calculates the translation compensation amount of the original image according to the target translation curve of the camera.
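Given the original and target translation curves, one way to realize the translation compensation amount is as the per-frame offset that moves the camera's original translation onto the smoothed target curve (a sketch under that assumption; the text does not give the exact formula):

```python
import numpy as np

def translation_compensation(original_curve, target_curve):
    """Per-frame translation compensation: the shift that maps the
    camera's original translation onto the smoothed target translation.
    Both curves are (N, 2) arrays of per-frame (x, y) translations."""
    return (np.asarray(target_curve, float)
            - np.asarray(original_curve, float))

# Hypothetical curves: frame 1 is over-translated by 0.2 px along x.
comp = translation_compensation([[0.0, 0.0], [1.2, 0.4], [0.8, 0.9]],
                                [[0.0, 0.0], [1.0, 0.5], [1.0, 0.7]])
```

Each row of `comp` would feed into the image stabilization transformation matrix used for the warp transformation of the corresponding frame.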
208. The mobile phone obtains the rotation information of the camera according to gyroscope data corresponding to N frames of original images.
Wherein the rotation information of the camera is used to represent the rotation of the camera. For example, the rotation information of the camera may be a target rotation curve of the camera. The rotation information of the camera and the target translation information of the camera can be used for calculating an image stabilizing transformation matrix of the original image, and the image stabilizing transformation matrix is used for performing motion compensation on the original image through deformation (warp) transformation, so that the effects of image anti-shake and image stabilizing are achieved.
The following description will take the mobile phone as an example to obtain the target rotation curve of the camera according to the gyroscope data corresponding to the N frames of original images.
For example, after the first condition is met, the mobile phone obtains an original rotation curve of the camera according to gyroscope data corresponding to the N frames of original images. And the mobile phone processes the original rotation curve of the camera according to the constraint condition, so as to obtain the target rotation curve of the camera.
The target rotation curve of the camera satisfies the following first condition:
(1) The target rotation curve of the camera is continuous throughout. That is, the movement track of the camera is smooth and does not change too abruptly.
(2) The target rotation curve of the camera is first-, second-, and third-order differentiable, with curvature as small as possible (e.g., less than or equal to a preset threshold). That is, the speed, acceleration, and rate of change of acceleration (i.e., jerk) along the target rotation curve of the camera change smoothly, without excessive variation.
(3) When motion compensation is performed on the original image according to the image stabilizing transformation matrix obtained from the target rotation curve of the camera, the compensated original image must not exceed the preset clipping (crop) boundary, which would produce black borders.
The mobile phone can adopt various methods to smooth the original rotation curve of the camera so as to obtain the target rotation curve of the camera. For example, the handset may obtain a smooth target rotation curve of the camera by a quadratic programming method.
A point on the original rotation curve of the camera, i.e. an original pose of the camera, can be represented by the rotation-angle sequence of Equation 1:

[Equation 1: the original camera pose expressed as rotation angles (X, Y, Z) about three axes, integrated from gyroscope data]

wherein (X, Y, Z) respectively represent the rotation angles in three directions integrated from the gyroscope data. Through quadratic programming, the mobile phone can determine a smooth target rotation curve of the camera that satisfies the first condition, according to the objective function represented by Equation 2 and the constraint conditions represented by Equation 3.
min w1·J1 + w2·J2 + w3·J3    (Equation 2)
wherein J_i in Equation 2 represents the i-th order derivative term of the target rotation curve of the camera. The constraint conditions, as shown in Equation 3, may include:

[Equation 3: the constraint conditions on the target rotation curve of the camera]
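As an unconstrained stand-in for the Equation 2 objective (the constraints of Equation 3 are omitted, and the derivative weights, regularization, and closed-form least-squares solve are assumptions rather than the patent's quadratic-programming formulation), a per-axis rotation-angle sequence could be smoothed as follows:

```python
import numpy as np

def diff_matrix(n, order):
    """Finite-difference operator of the given order, as an (n-order, n) matrix."""
    d = np.eye(n)
    for _ in range(order):
        d = d[1:] - d[:-1]
    return d

def smooth_rotation_curve(raw, w=(1.0, 10.0, 100.0), lam=1.0):
    """Smooth a one-axis rotation-angle sequence by minimising
    lam*||x - raw||^2 + sum_i w_i * ||D_i x||^2, penalising the first-,
    second- and third-order differences as in the Equation 2 objective."""
    n = len(raw)
    a = lam * np.eye(n)
    for order, wi in enumerate(w, start=1):
        d = diff_matrix(n, order)
        a += wi * d.T @ d
    # normal equations of the regularised least-squares problem
    return np.linalg.solve(a, lam * np.asarray(raw, float))
```

A true quadratic program would additionally enforce the crop-boundary inequality constraints; this sketch only captures the smoothing trade-off.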
After the first condition is met for the first time, the mobile phone processes the original poses of the camera corresponding to the N acquired frames of original images using the algorithm provided by the embodiment of the application, to obtain the optimized target poses of the camera corresponding to I frames of original images, I being a positive integer, so that the transition between the target poses of the camera corresponding to the I frames of original images is smoother. The optimized target poses of the camera form the target rotation curve of the camera; that is, each optimized target pose of the camera is a point on the target rotation curve of the camera. Thereafter, each time the mobile phone acquires a subsequent I frames of original images, it processes the original poses of the camera corresponding to the most recently acquired N historical frames of original images to obtain the optimized target poses of the camera corresponding to those I frames of original images.
Here N = N1 + I + N2, where N1 and I are positive integers and N2 is a non-negative integer. With this method, from the original poses of the camera corresponding to N frames of original images, the optimized target poses of the camera corresponding to the I frames of original images following the first N1 frames are output. For example, with N = 45, N1 = 15, N2 = 27, and I = 3, the algorithm processes, for the first time, the original poses of the camera corresponding to the 1st-45th frames of original images, and outputs the optimized target poses of the camera corresponding to the 16th, 17th and 18th frames of original images. That is, the mobile phone obtains the optimized target poses of the camera corresponding to the I frames of original images according to those I frames and the original poses of the camera corresponding to the preceding N1 frames and the following N2 frames of original images.
The sub-constraint condition (1) in formula 6 is used to take the optimized target poses of the camera corresponding to some of the original images as input when optimizing the target poses of the camera corresponding to the adjacent subsequent frames of original images, so that the overall transition of the optimized target poses of the camera is smoother, rather than only the local transition among the target poses corresponding to each group of I frames of original images. For example, with N = 45, N1 = 15, N2 = 27, and I = 3, the algorithm processes, for the second time, the poses of the camera corresponding to the 4th-48th frames of original images, and outputs the optimized target poses of the camera corresponding to the 19th, 20th and 21st frames of original images. In order to ensure that the optimized target poses of the camera corresponding to the 18th and 19th frames of original images do not jump and that their transition is smooth, the mobile phone can, based on sub-constraint condition (1), replace the original poses of the camera corresponding to the 17th and 18th frames of original images used in the second optimization with the optimized target poses of the camera corresponding to the 17th and 18th frames obtained from the first optimization.
The sub-constraint condition (2) in formula 6 indicates that, after warp transformation based on the target rotation curve of the camera, the original image must not exceed the preset clipping boundary. For example, referring to fig. 5, block 501 represents the range of the original image, block 502 represents the clipping retention range during anti-shake processing, and the preset clipping boundary may be the boundary defined by Pw and Ph shown in fig. 5. In fig. 5, Pi represents a pixel point on the original image before warp transformation, and Pi' represents that pixel point after warp transformation according to the rotation information of the camera. The four corner points of the original image, i.e. the four vertices of block 501, must not cross the boundary defined by Pw and Ph after warp transformation, i.e. must not intrude into the clipping retention range represented by block 502, so that the image obtained by clipping has no black border. For another example, a pixel point Pc on the original image, after warp transformation to Pc', must not exceed the boundary defined by Pw and Ph, i.e. must not exceed the clipping retention range indicated by block 503, so that the image obtained by clipping has no black border.
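This boundary condition can be sketched as follows, under the simplifying assumption that testing the four warped corners against a central crop window is sufficient (warp_point, within_crop, and the margin parameter are hypothetical names, not from the patent):

```python
import numpy as np

def warp_point(h, p):
    """Apply a 3x3 homography h to a 2-D point p."""
    x, y, w = h @ np.array([p[0], p[1], 1.0])
    return np.array([x / w, y / w])

def within_crop(h, img_w, img_h, margin):
    """True if each warped corner of the original image stays on its own
    side of the central crop window, so clipping leaves no black border.
    margin is the border width kept as compensation headroom."""
    x0, y0 = margin, margin
    x1, y1 = img_w - margin, img_h - margin
    tl = warp_point(h, (0, 0))
    tr = warp_point(h, (img_w, 0))
    bl = warp_point(h, (0, img_h))
    br = warp_point(h, (img_w, img_h))
    return (tl[0] <= x0 and tl[1] <= y0 and
            tr[0] >= x1 and tr[1] <= y0 and
            bl[0] <= x0 and bl[1] >= y1 and
            br[0] >= x1 and br[1] >= y1)
```

A smoother that produced a compensation violating this check would have to be re-run with a tighter constraint, which is what the quadratic-programming formulation enforces directly.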
After the mobile phone obtains the target rotation curve of the camera from the N frames of original images for the first time, each time it subsequently acquires I frames of original images, it can combine them with the most recently acquired N frames of original images to obtain the subsequent I points on the target rotation curve of the camera.
209. The mobile phone calculates the rotation compensation amount of the original image according to the rotation information of the camera.
The rotation compensation amount is used to perform rotational motion compensation on the original image during warp transformation, so as to avoid, as much as possible, shake, blur, and the like of the original image caused by the rotational motion of the camera, thereby achieving image anti-shake and stabilization. The mobile phone can calculate the rotation compensation amount according to the rotation information of the camera; for example, from the target rotation curve of the camera.
The rolling shutter of the mobile phone camera exposes the image line by line; correspondingly, the rotation compensation amount includes a compensation amount corresponding to each exposure line. The rotation compensation amounts corresponding to different exposure lines may be different.
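One simple way to realize a per-line compensation amount, assuming the camera pose is linearly interpolated between the exposure times of the first and last rows of the frame (the linear model and function names are illustrative simplifications):

```python
import numpy as np

def per_row_rotation(angles_start, angles_end, num_rows):
    """Linearly interpolate the camera rotation angles for each exposure
    row of a rolling-shutter frame, from the pose at the first row's
    exposure time to the pose at the last row's."""
    t = np.linspace(0.0, 1.0, num_rows)[:, None]
    return (1 - t) * np.asarray(angles_start, float) + t * np.asarray(angles_end, float)

def per_row_compensation(row_rotations, target_rotation):
    """Rotation compensation per row: the amount needed to move each
    row's actual pose onto the smoothed target pose."""
    return np.asarray(target_rotation, float) - row_rotations
```

Rows exposed closer to the target pose receive smaller compensation, which is why the per-line amounts generally differ within one frame.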
For example, when the rotation information of the camera is the target rotation curve of the camera, the mobile phone can calculate the rotation compensation amounts respectively corresponding to the I frames of original images according to the target poses of the camera corresponding to the I frames of original images on the target rotation curve of the camera. For example, with N = 45, N1 = 15, N2 = 27, and I = 3, the mobile phone processes, in the preview state, the original poses of the camera corresponding to the 1st-45th frames of original images, outputs the optimized target poses of the camera corresponding to the 16th, 17th and 18th frames of original images, and calculates the rotation compensation amounts corresponding to the 16th, 17th and 18th frames of original images according to those optimized target poses. Then, the mobile phone processes the poses of the camera corresponding to the 4th-48th frames of original images (in which the original poses corresponding to the 17th and 18th frames are replaced by the optimized target poses of the camera), outputs the optimized target poses of the camera corresponding to the 19th, 20th and 21st frames of original images, and calculates the rotation compensation amounts corresponding to the 19th, 20th and 21st frames of original images according to those optimized target poses.
210. And the mobile phone calculates an image stabilizing transformation matrix of the original image according to the translation compensation quantity and the rotation compensation quantity.
The image stabilizing transformation matrix is a homography matrix corresponding to the warp transformation of the original image. The mobile phone can add the rotation compensation amount and the translation compensation amount corresponding to each row to obtain the image stabilizing transformation matrix; the translation compensation amounts corresponding to the exposure rows are equal. It is understood that the image stabilizing transformation matrix may include not only the rotation compensation amount and the translation compensation amount, but also other motion compensation amounts such as an RS compensation amount.
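One common way to realize such a combined matrix, assuming the rotation compensation enters as the pure-rotation homography K·R·K⁻¹ and the translation compensation as an image-plane shift (the composition order and function names are assumptions, not the patent's exact construction):

```python
import numpy as np

def rotation_homography(k, r):
    """Pure-rotation homography for camera intrinsics k and compensating rotation r."""
    return k @ r @ np.linalg.inv(k)

def stabilizing_matrix(k, r, t):
    """Compose rotation compensation (as K R K^-1) with an image-plane
    translation compensation t = (tx, ty) into one 3x3 warp matrix."""
    h = rotation_homography(k, r)
    shift = np.array([[1., 0., t[0]],
                      [0., 1., t[1]],
                      [0., 0., 1.]])
    return shift @ h
```

With an identity compensating rotation, the matrix reduces to a pure pixel shift, which is the translation-only case.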
The mobile phone can calculate the image stabilizing transformation matrix of each of the I frames of original images according to the rotation compensation amount and the translation compensation amount respectively corresponding to that frame. For example, after the mobile phone obtains the rotation compensation amounts corresponding to the 16th, 17th and 18th frames of original images, it can add the rotation compensation amount and the translation compensation amount corresponding to the 16th frame of original image to obtain the image stabilizing transformation matrix corresponding to the 16th frame of original image, and add the rotation compensation amount and the translation compensation amount corresponding to the 17th frame of original image to obtain the image stabilizing transformation matrix corresponding to the 17th frame of original image.
211. And the mobile phone transforms the original image according to the image stabilizing transformation matrix to obtain a preview image, and displays the preview image on a preview interface.
The mobile phone can perform warp transformation on each of the I frames of original images according to the image stabilizing transformation matrix corresponding to that frame, so as to obtain preview images, and display the preview images on the preview interface. For example, with N = 45, N1 = 15, N2 = 27, and I = 3, the mobile phone performs warp transformation on the 16th, 17th and 18th frames of original images according to the image stabilizing transformation matrix corresponding to each of them, so as to obtain the 1st, 2nd and 3rd frames of preview images, and sequentially displays them on the preview interface. The mobile phone then performs warp transformation on the 19th, 20th and 21st frames of original images according to the image stabilizing transformation matrix corresponding to each of them, so as to obtain the 4th, 5th and 6th frames of preview images, and sequentially displays them on the preview interface.
The difference between the frame number of the preview image displayed by the mobile phone and the frame number of the original image is N1. That is, the mobile phone does not display the preview images obtained by warp transformation of the 1st to N1th frames of original images on the preview interface, but starts displaying from the preview image obtained by warp transformation of the (N1+1)th frame of original image.
It can be understood that, since the acquisition frame rate of original images in the preview state is high, and switching the mobile phone from another mode to the video mode itself takes a certain time, the mobile phone's not displaying the preview images corresponding to the 1st to N1th frames of original images after switching to the video mode does not cause the user a poor visual experience such as stuttering or a black screen.
The mobile phone can obtain the target pose of the camera corresponding to the (N1+1)th frame of original image according to the initial poses of the camera corresponding to the first N1+I+N2 frames of original images, thereby obtaining the image stabilizing transformation matrix corresponding to the (N1+1)th frame of original image according to that target pose, and further obtaining the corresponding 1st frame of preview image after warp transformation of the (N1+1)th frame of original image. That is, the first frame of the I frames of original images corresponds to the (N1+1)th frame among the N frames of original images, and the preview image displayed by the mobile phone lags the original image acquired by the mobile phone by at least N2 frames. For example, when N is 45, N1 is 15, and N2 is 27, the time-sequence correspondence between the original image frames acquired by the mobile phone in the preview state and the displayed preview images may be as shown in fig. 6.
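The sliding-window bookkeeping described above (N = N1 + I + N2, advancing by I frames per step) can be sketched as pure index arithmetic; with the example values N1 = 15, I = 3, N2 = 27, the first window covers frames 1-45 and emits frames 16-18, and the second covers frames 4-48 and emits 19-21:

```python
def output_frames(window_start, n1, i):
    """1-based indices of the frames whose optimised poses are emitted
    from a window starting at window_start: the I frames right after
    the first N1 history frames."""
    first = window_start + n1
    return list(range(first, first + i))

def schedule(num_frames, n1, i, n2):
    """Sliding-window schedule: each step consumes I new frames and emits
    optimised poses for I frames, delayed by N2 frames behind capture."""
    n = n1 + i + n2
    out = []
    start = 1
    while start + n - 1 <= num_frames:
        out.append((start, start + n - 1, output_frames(start, n1, i)))
        start += i
    return out
```

This makes the N2-frame display delay visible: a frame is only emitted once N2 future frames have been captured to constrain its optimisation.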
In other embodiments of the present application, immediately after entering the preview state, while the mobile phone is acquiring the first N frames of original images, each original image among the first N1 frames is displayed directly as a preview image on the preview interface. Subsequently, after the mobile phone acquires new original images, it generates the preview images corresponding to the I frames of original images from the N frames of original images using the method described in the above embodiment, and displays the generated preview images on the preview interface.
In other embodiments of the present application, in the preview state, the frame number of the preview image displayed on the preview interface by the mobile phone corresponds to the frame number of the original image, i.e. N2 is 0 and I is 1. That is, after the mobile phone collects a frame of original image, a preview image corresponding to the frame of original image is displayed. Further, N1 may be small, for example, 5, 8, 10, or the like. In the preview state, before the mobile phone collects N1 frames of original images, the mobile phone combines the collected original images in the preview state to generate and display a preview image corresponding to the current original image.
For example, immediately after entering the preview state, the mobile phone acquires the 1st frame of original image and displays it as the 1st frame of preview image on the preview interface. After acquiring the 2nd frame of original image, the mobile phone generates the 2nd frame of preview image corresponding to it according to the 1st-2nd frames of original images, and displays the 2nd frame of preview image on the preview interface. After acquiring the 3rd frame of original image, the mobile phone generates the 3rd frame of preview image corresponding to it according to the 1st-3rd frames of original images, and displays the 3rd frame of preview image on the preview interface. After acquiring the (N1+1)th frame of original image, the mobile phone generates the (N1+1)th frame of preview image according to the 1st-(N1+1)th frames of original images, and displays it on the preview interface. Subsequently, each time the mobile phone acquires a new original image, it combines the preceding N1 frames of original images with the new original image to generate the preview image corresponding to the new original image, and displays the generated preview image on the preview interface.
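The growing window used during this warm-up phase can be sketched as follows (the function name is illustrative): each preview frame uses at most the previous N1 frames plus the current one.

```python
def preview_window(frame_idx, n1):
    """1-based indices of the original frames used to generate the preview
    for frame frame_idx: all frames so far, capped at the last n1 history
    frames plus the current frame."""
    start = max(1, frame_idx - n1)
    return list(range(start, frame_idx + 1))
```

So with N1 = 5, the 1st preview uses frame 1 alone, the 3rd uses frames 1-3, and from the 6th frame on the window slides with a fixed length of N1 + 1.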
When the preview image is obtained by image transformation based on the image stabilizing matrix calculated from the target poses on the smoothed target rotation curve of the camera, then, since the smoothed target rotation curve of the camera satisfies sub-constraint condition (1), the overall transition between the warp-transformed preview images is smooth; and since the smoothed target rotation curve of the camera satisfies sub-constraint condition (2), the warp-transformed preview images do not exceed the clipping boundary.
For example, in the preview state, the effect of the preview image of the mobile phone not subjected to the anti-shake process may be illustrated in (a) - (c) of fig. 7A. As shown in fig. 7A, in the preview process, the photographed object is translated and rotated, and the original image is dithered. The effect of the preview image of the anti-shake processing performed by the mobile phone according to the rotation information of the camera can be shown in (a) - (c) of fig. 7B. The effect of the preview image of the anti-shake process after the five-axis anti-shake mode is turned on in the mobile phone can be seen in fig. 7C (a) - (C).
212. And if the mobile phone determines that the second condition is met, exiting the five-axis anti-shake mode, and obtaining the rotation information of the camera according to the gyroscope data corresponding to the N frames of original images.
If the second condition is met, the fact that the target translation information determined according to the image information is inaccurate and the image cannot or does not need to be subjected to anti-shake processing by combining the target translation information can be indicated, so that the mobile phone can determine the image stabilizing transformation matrix only according to the rotation information to perform anti-shake processing on the image. That is, after the mobile phone enters the five-axis anti-shake mode, if the mobile phone determines that the second condition is satisfied, the mobile phone exits the five-axis anti-shake mode, and performs image anti-shake processing according to the rotation information.
In some embodiments, in the video mode, the mobile phone may enter the five-axis anti-shake mode by default, and exit the five-axis anti-shake mode after determining that the second condition is satisfied.
For example, the second condition may include any one of the following sub-conditions 1 to 5:
sub-condition 1: the number of the feature points detected by the mobile phone is smaller than or equal to a preset value 2. If the sub-condition 1 is met, it can be shown that the number of feature points on the original image detected by the mobile phone is too small, the first translation vector determined according to the feature points is inaccurate, the determined target translation information is also inaccurate, and the anti-shake processing effect by combining the target translation information is possibly poor, so that the five-axis anti-shake mode can be exited.
Sub-condition 2: and a third translation vector obtained by filtering the first translation vector by adopting a second translation vector corresponding to a motion sensor such as an accelerometer and the like accounts for the proportion of the first translation vector to be less than or equal to a preset value 3. If the sub-condition 2 is satisfied, it may be indicated that the first translation vector and the second translation vector obtained according to the feature point deviate greatly, the feature point may be inaccurate, and thus the determined target translation information may also be inaccurate, and thus the five-axis anti-shake mode may be exited.
Sub-condition 3: the mobile phone selects a fourth translation vector with the similarity larger than or equal to a preset value 1 from the third translation vectors, and the proportion of the fourth translation vector to the third translation vector is smaller than or equal to a preset value 4. If the sub-condition 3 is satisfied, it may indicate that there is a larger local motion between two adjacent frames of original images, and the anti-shake processing effect by combining the target translation information may be poor, so that the five-axis anti-shake mode may be exited.
Sub-condition 4: the variance of the amount of translational compensation between original images of consecutive P (integer greater than 1) frames is greater than or equal to a preset value of 5. If the sub-condition 4 is satisfied, it may indicate that there is a mismatch between the feature points, and the target translation information determined according to the first translation vector corresponding to the feature point is inaccurate, and the anti-shake processing effect performed in combination with the target translation information may be poor, so that the five-axis anti-shake mode may be exited.
Sub-condition 5: the amplitude of the translation between the original images of successive Q (integer greater than 1) frames is greater than or equal to a preset value of 6. If the sub-condition 5 is satisfied, it may indicate that the translation amplitude between consecutive multiframes is too large, the original image may have been blurred or a ghost appears, the matched feature point pair on the original image may be inaccurate or difficult to match, and the target translation information is difficult to determine according to the feature point or the determined target translation information is inaccurate, so that the five-axis anti-shake mode may be exited. For example, when the length of the panning path corresponding to the original images of consecutive Q frames is greater than or equal to the preset value 7 on the initial panning curve of the camera, the mobile phone determines that the panning amplitude is greater than or equal to the preset value 6, and satisfies the sub-condition 5.
For a description of obtaining the rotation information of the camera according to the gyroscope data corresponding to the N frames of original images in step 212, reference may be made to the description of the rotation information in step 208, which is not repeated here. For example, the rotation information of the camera may be a target rotation curve of the camera.
213. The mobile phone calculates the rotation compensation amount of the original image according to the rotation information of the camera.
The description of step 213 may be referred to the related description of step 209, which is not repeated here.
214. And the mobile phone calculates an image stabilizing transformation matrix of the original image according to the rotation compensation quantity.
After the rotation compensation amount is obtained by the mobile phone, the rotation compensation amount can be added into the image stabilizing transformation matrix. It is understood that the image stabilization transformation matrix may include not only rotation compensation amount and translation compensation amount, but also other motion compensation amounts such as RS compensation amount.
Since the mobile phone calculates the image stabilizing transformation matrix of the original image according to the rotation compensation amount obtained from the target rotation curve of the camera and the translation compensation amount obtained from the target translation curve of the camera, in some embodiments the points on the target translation curve of the camera should include the points on the target rotation curve of the camera, i.e., M is greater than or equal to N.
215. And the mobile phone transforms the original image according to the image stabilizing transformation matrix to obtain a preview image, and displays the preview image on a preview interface.
216. If the mobile phone determines that the second condition is not met, the five-axis anti-shake mode is started, and the steps 203-211 are executed.
That is, after the mobile phone exits the five-axis anti-shake mode, if the mobile phone determines that the second condition disappears, that is, the second condition is not satisfied, the five-axis anti-shake mode is re-entered, so that anti-shake processing is performed in combination with rotation information of the camera obtained according to gyroscope data and target translation information of the camera obtained according to image content, and an image stabilizing effect of the video image is improved.
In this way, in the preview state, the mobile phone can combine the rotation information of the camera obtained according to the gyroscope data and the target translation information of the camera obtained according to the image content to perform anti-shake processing, so as to improve the image stabilizing effect of the preview image presented to the user in the preview state; and exiting the five-axis anti-shake mode when the second condition is satisfied, thereby performing image anti-shake processing according to the rotation information.
Then, the mobile phone enters a shooting process after detecting shooting operation of a user. The method may further comprise step 217:
217. after the mobile phone detects shooting operation of a user, an original image is acquired according to a preset frame rate in the shooting process.
For example, after detecting an operation of clicking the photographing control 700 shown in (C) of fig. 7C by the user, the mobile phone determines that the photographing operation of the user is detected, thereby entering the video photographing process. For another example, after detecting the operation of starting shooting by the voice instruction of the user, the mobile phone determines that the shooting operation of the user is detected, so that the video shooting process is entered.
It can be appreciated that there may be a variety of other ways for triggering the mobile phone to enter the video capturing process, and embodiments of the present application are not limited.
It should be noted that the image data stream during the shooting process includes an in-recording preview stream and a video stream. The in-recording preview stream is used to display the recorded images to the user on the shooting interface during recording. In contrast to the in-recording preview stream, the preview stream used in the preview state to present preview images to the user may be referred to as the pre-recording preview stream. The video stream is used to generate the video images in the video file.
In some embodiments, the in-recording preview stream and the pre-recording preview stream are the same data stream. After entering the shooting process, the mobile phone continues with the state, anti-shake mode, and processing flow it had in the preview state to generate the recorded images. For example, if the mobile phone is in the five-axis anti-shake mode before entering the shooting process and has generated the 100th frame of preview image, then after entering the shooting process the mobile phone remains in the five-axis anti-shake mode and generates the 101st frame of preview image, still making use of original images acquired before entering the shooting process; this 101st frame of preview image is the 1st frame of recorded image on the shooting interface. If the mobile phone has exited the five-axis anti-shake mode when entering the shooting process, it remains out of the five-axis anti-shake mode after entering the shooting process.
In other embodiments, the in-recording preview stream and the pre-recording preview stream are not the same data stream. After entering the shooting process, the mobile phone stops the pre-recording preview stream and starts the in-recording preview stream. In the preview state, the mobile phone performs anti-shake processing on the original images in the pre-recording preview stream by combining the rotation information of the camera obtained from the gyroscope data and the target translation information of the camera obtained from the image content, so as to generate and display the preview images; similarly, during the shooting process, the mobile phone can perform anti-shake processing on the original images in the in-recording preview stream by combining the rotation information of the camera obtained from the gyroscope data and the target translation information of the camera obtained from the image content, so as to generate and display the recorded images. In addition, the mobile phone can restart the five-axis anti-shake mode according to the original images acquired during the shooting process. For example, before entering the shooting process, the mobile phone is in the five-axis anti-shake mode and generates the 100th frame of preview image; after entering the shooting process, the mobile phone restarts the five-axis anti-shake mode, generates the 1st frame of recorded image according to the original images acquired during shooting, and displays the 1st frame of recorded image on the shooting interface. For example, referring to fig. 2, after step 217, the method may further include the following steps 218-233 for the in-recording preview stream:
218. In the shooting process, the mobile phone calculates a first translation vector according to feature points on two adjacent frames of original images in a five-axis anti-shake mode.
In some embodiments, if the five-axis anti-shake mode is turned on before the mobile phone enters the shooting process, the five-axis anti-shake mode is continuously turned on after the mobile phone enters the shooting process. In other embodiments, the five-axis anti-shake mode is turned on by default during the shooting process of the mobile phone. In other embodiments, after detecting the operation of starting the five-axis anti-shake mode by the user during shooting, the mobile phone starts the five-axis anti-shake mode. For example, the shooting interface comprises a five-axis anti-shake control, and the mobile phone starts a five-axis anti-shake mode after detecting the operation of clicking the control by a user.
219. And the mobile phone calculates a second translation vector according to the motion sensor data corresponding to the adjacent two frames of original images.
220. The mobile phone selects, from the first translation vectors, third translation vectors located within a δ-neighborhood of the second translation vector.
221. The mobile phone selects a fourth translation vector with the similarity larger than or equal to a preset value 1 from the third translation vectors.
222. And the mobile phone determines a target translation vector according to the feature points corresponding to the fourth translation vector.
223. And the mobile phone determines the target translation information of the camera according to the target translation vectors of the continuous M' frame original images.
Wherein M' and M may be the same or different.
224. The mobile phone calculates the translation compensation amount of the original image according to the target translation information of the camera.
225. The mobile phone obtains the rotation information of the camera from the gyroscope data corresponding to N' frames of original images.
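Steps 218-224 above can be sketched in code. The sketch below is a minimal illustration under stated assumptions: the per-feature translation vectors are precomputed, similarity is measured as cosine similarity against the median vector, and smoothing uses a simple moving average — none of these specifics are prescribed by this embodiment.

```python
import numpy as np

def select_target_translation(first_vectors, second_vector, delta=5.0, sim_thresh=0.9):
    """Sketch of steps 220-222: filter the feature-point translation vectors.

    first_vectors: (K, 2) translations of feature points between two adjacent frames
    second_vector: (2,) translation predicted from the motion sensor data (step 219)
    """
    # Step 220: keep vectors within a neighborhood delta of the sensor prediction.
    dist = np.linalg.norm(first_vectors - second_vector, axis=1)
    third = first_vectors[dist <= delta]
    if len(third) == 0:
        return second_vector               # fall back to the sensor prediction
    # Step 221: keep vectors similar to each other (cosine similarity to the median).
    ref = np.median(third, axis=0)
    cos = (third @ ref) / (np.linalg.norm(third, axis=1) * np.linalg.norm(ref) + 1e-9)
    fourth = third[cos >= sim_thresh]
    if len(fourth) == 0:
        fourth = third
    # Step 222: target translation vector from the retained feature points.
    return fourth.mean(axis=0)

def translation_compensation(target_vectors):
    """Sketch of steps 223-224: accumulate the per-frame target translation
    vectors into a translation curve, smooth it, and take the difference
    between the smoothed and the actual curve as the compensation amount."""
    actual = np.cumsum(np.asarray(target_vectors, dtype=float), axis=0)
    kernel = np.ones(5) / 5.0              # assumed moving-average smoother
    smooth = np.stack([np.convolve(actual[:, i], kernel, mode="same")
                       for i in range(2)], axis=1)
    return smooth - actual                 # per-frame translation compensation
```

With a steady translation the interior of the smoothed curve coincides with the actual curve, so the compensation there is near zero; the compensation grows where the motion is jerky.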
Wherein N' and N may be the same or different, N1' and N1 may be the same or different, I' and I may be the same or different, and N2' and N2 may be the same or different.
226. The mobile phone calculates the rotation compensation amount of the original image from the rotation information of the camera.
227. The mobile phone calculates the image stabilizing transformation matrix of the original image from the translation compensation amount and the rotation compensation amount.
228. The mobile phone transforms the original image according to the image stabilizing transformation matrix to obtain a recorded image, and displays the recorded image on the shooting interface.
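Step 227 can be illustrated under a common geometric model: the rotation compensation acts as the homography K·R·K⁻¹ (K being the camera intrinsic matrix) and the translation compensation as a pixel shift. The composition order and the intrinsics below are assumptions for illustration, not the exact formulation of this embodiment.

```python
import numpy as np

def stabilization_matrix(K, R_comp, t_comp):
    """Sketch of step 227: compose rotation and translation compensation
    into a single image stabilizing transformation matrix.

    K: 3x3 camera intrinsic matrix
    R_comp: 3x3 rotation compensation (from the gyroscope-derived rotation info)
    t_comp: (tx, ty) translation compensation in pixels (from the image content)
    """
    H_rot = K @ R_comp @ np.linalg.inv(K)      # rotation compensation homography
    T = np.array([[1.0, 0.0, t_comp[0]],
                  [0.0, 1.0, t_comp[1]],
                  [0.0, 0.0, 1.0]])            # translation compensation
    return T @ H_rot

# Step 228 would then warp the original image with this matrix,
# e.g. cv2.warpPerspective(raw_frame, H, (width, height)).
```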
229. If the mobile phone determines that the second condition is met, it exits the five-axis anti-shake mode and obtains the rotation information of the camera from the gyroscope data corresponding to N' frames of original images.
230. The mobile phone calculates the rotation compensation amount of the original image from the rotation information of the camera.
231. The mobile phone calculates the image stabilizing transformation matrix of the original image from the rotation compensation amount.
232. The mobile phone transforms the original image according to the image stabilizing transformation matrix to obtain a recorded image, and displays the recorded image on the shooting interface.
233. If the mobile phone determines that the second condition is not met, steps 218-228 above are performed.
It should be noted that, for the descriptions of steps 218-233, reference may be made to the related descriptions of steps 201-216, which are not repeated here.
During shooting, the mobile phone can obtain the target pose of the camera corresponding to the (N1+1)-th frame of original image from the initial poses of the camera corresponding to the N1+I+N2 frames of original images, then obtain the image stabilizing transformation matrix corresponding to the (N1+1)-th frame of original image from that target pose, and further obtain the 1st recorded image by performing warp transformation on the (N1+1)-th frame of original image. That is, the starting frame of the I frames of original images corresponds to the (N1+1)-th frame among the N frames of original images, and the recorded image displayed by the mobile phone lags the original image acquired by the mobile phone by at least N2 frames.
In other embodiments, immediately after entering the shooting process, while the mobile phone is acquiring the first N frames of original images, each of the first N1 frames of original images is displayed directly on the shooting interface as a recorded image. Subsequently, after the mobile phone acquires a new original image, it generates the recorded images corresponding to the I frames of original images from the N frames of original images by the method described in the above embodiments, and displays the generated recorded images on the shooting interface.
In other embodiments, during shooting, the frame number of the recorded image displayed on the shooting interface corresponds to the frame number of the original image, i.e., N2 is 0 and I is 1. That is, after the mobile phone acquires a frame of original image, it displays the recorded image corresponding to that frame. Further, N1 may be small, for example 5, 8, or 10. Before the mobile phone has acquired N1 frames of original images during shooting, it generates and displays the recorded image corresponding to the current original image by combining the original images already acquired during shooting.
For example, immediately after entering the shooting process, after the mobile phone acquires the 1st frame of original image, it displays the 1st frame of original image on the shooting interface as the 1st recorded image. After acquiring the 2nd frame of original image, it generates the 2nd recorded image from the 1st-2nd frames of original images and displays the 2nd recorded image on the shooting interface. After acquiring the 3rd frame of original image, it generates the 3rd recorded image from the 1st-3rd frames of original images and displays the 3rd recorded image on the shooting interface. After acquiring the (N1+1)-th frame of original image, it generates the (N1+1)-th recorded image from the 1st to (N1+1)-th frames of original images and displays the (N1+1)-th recorded image on the shooting interface. Subsequently, after the mobile phone acquires a new original image, it generates the recorded image corresponding to the new original image by combining the new original image with the N1 frames of original images preceding it, and displays the generated recorded image on the shooting interface.
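The progressive windowing described above (N2 = 0, I = 1) reduces to: the recorded image for frame i is generated from frame i plus at most the N1 frames preceding it. A one-function sketch:

```python
def window_for_frame(i, n1):
    """Original-image frame numbers used to generate recorded image i
    when N2 = 0 and I = 1: the current frame plus up to N1 preceding
    frames. Frame numbers are 1-based, matching the text above."""
    return list(range(max(1, i - n1), i + 1))
```

For N1 = 5, frame 3 uses frames 1-3 (fewer than N1 frames are available yet), while frame 10 uses frames 5-10.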
When the recorded image is obtained by image transformation based on the image stabilizing transformation matrix calculated from the target pose on the smoothed target rotation curve of the camera, the smoothed target rotation curve satisfies sub-constraint (1), so the overall transition between the warp-transformed recorded images obtained from the target rotation curve is smooth. The smoothed target rotation curve also satisfies sub-constraint (2), so the warp-transformed recorded images obtained from the target rotation curve do not exceed the clipping boundary.
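Sub-constraint (2) — that the warp-transformed image not exceed the clipping boundary — can be checked by mapping the crop rectangle's corners back through the stabilizing transformation and requiring that they land inside the original image. The uniform crop margin below is an assumed parameterization:

```python
import numpy as np

def within_crop_boundary(H, width, height, margin):
    """Return True if warping with H still covers the central crop region
    (margin pixels trimmed from each side), i.e. the output does not
    exceed the clipping boundary."""
    crop = np.array([[margin, margin, 1.0],
                     [width - margin, margin, 1.0],
                     [width - margin, height - margin, 1.0],
                     [margin, height - margin, 1.0]])
    src = (np.linalg.inv(H) @ crop.T).T        # crop corners in source coordinates
    src = src[:, :2] / src[:, 2:3]             # dehomogenize
    # Every crop corner must map back inside the original image.
    return bool(np.all((src >= 0) & (src < np.array([width, height]))))
```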
In this way, during shooting the mobile phone can perform anti-shake processing by combining the rotation information of the camera obtained from the gyroscope data with the target translation information of the camera obtained from the image content, which improves the image stabilizing effect of the recorded images presented to the user during shooting and the recording effect of the video images generated from the recorded images; and when the second condition is satisfied, the mobile phone exits the five-axis anti-shake mode and performs image anti-shake processing according to the rotation information only.
During shooting, the processing of the video stream and of the in-recording preview stream is parallel and independent. For the video stream, regardless of the state and processing of the mobile phone before the shooting process, the mobile phone can decide anew whether to exit or enter the five-axis anti-shake mode based on the original images acquired during shooting; after entering the five-axis anti-shake mode, it performs anti-shake processing by combining the rotation information of the camera obtained from the gyroscope data with the target translation information of the camera obtained from the image content, so as to generate video images. For example, after entering the shooting process, the mobile phone restarts the five-axis anti-shake mode, and in that mode generates and stores the 1st video image from the original images acquired during shooting, so that a video file can be generated from the stored video images after shooting stops. For example, referring to FIG. 2, after step 217, the method may further include the following steps 218'-233':
218'. During shooting, in the five-axis anti-shake mode, the mobile phone calculates first translation vectors from the feature points on two adjacent frames of original images.
In some embodiments, if the five-axis anti-shake mode is turned on before the mobile phone enters the shooting process, it remains on after the mobile phone enters the shooting process. In other embodiments, the five-axis anti-shake mode is turned on by default during shooting. In still other embodiments, the mobile phone turns on the five-axis anti-shake mode after detecting a user operation that enables it during shooting. For example, the shooting interface includes a five-axis anti-shake control, and the mobile phone turns on the five-axis anti-shake mode after detecting that the user taps this control.
219'. The mobile phone calculates a second translation vector from the motion sensor data corresponding to the two adjacent frames of original images.
220'. The mobile phone selects, from the first translation vectors, third translation vectors that lie within a neighborhood δ of the second translation vector.
221'. The mobile phone selects, from the third translation vectors, fourth translation vectors whose similarity is greater than or equal to preset value 1.
222'. The mobile phone determines a target translation vector from the feature points corresponding to the fourth translation vectors.
223'. The mobile phone determines the target translation information of the camera from the target translation vectors of M' consecutive frames of original images.
Wherein M' and M may be the same or different.
224'. The mobile phone calculates the translation compensation amount of the original image according to the target translation information of the camera.
225'. The mobile phone obtains the rotation information of the camera from the gyroscope data corresponding to N'' frames of original images.
Wherein N'' and N may be the same or different, N1'' and N1 may be the same or different, and N2'' and N2 may be the same or different.
226'. The mobile phone calculates the rotation compensation amount of the original image from the rotation information of the camera.
227'. The mobile phone calculates the image stabilizing transformation matrix of the original image from the translation compensation amount and the rotation compensation amount.
228'. The mobile phone transforms the original image according to the image stabilizing transformation matrix to obtain a video image, and stores the video image.
229'. If the mobile phone determines that the second condition is met, it exits the five-axis anti-shake mode and obtains the rotation information of the camera from the gyroscope data corresponding to N'' frames of original images.
230'. The mobile phone calculates the rotation compensation amount of the original image from the rotation information of the camera.
231'. The mobile phone calculates the image stabilizing transformation matrix of the original image from the rotation compensation amount.
232'. The mobile phone transforms the original image according to the image stabilizing transformation matrix to obtain a video image, and stores the video image.
233'. If the mobile phone determines that the second condition is no longer satisfied, steps 218'-228' above are performed.
It should be noted that, for the descriptions of steps 218'-233', reference may be made to the related descriptions of steps 201-216, which are not repeated here.
Similarly to the preview images and recorded images, during shooting the mobile phone can obtain the target pose of the camera corresponding to the (N1+1)-th frame of original image from the initial poses of the camera corresponding to the N1+I+N2 frames of original images, then obtain the image stabilizing transformation matrix corresponding to the (N1+1)-th frame of original image from that target pose, and further obtain the corresponding 1st video image by performing warp transformation on the (N1+1)-th frame of original image. That is, the starting frame of the I frames of original images corresponds to the (N1+1)-th frame among the N frames of original images, and the video image generated by the mobile phone lags the original image acquired by the mobile phone by at least N2 frames.
Since video images do not need to be presented to the user in real time during shooting, the anti-shake processing may take longer, the processing delay may be larger, and more frames of original images may be used. To obtain better-quality video images, S'' may be greater than S, M'' may be greater than M, N'' may be greater than N, N1'' may be greater than N1, I'' may be greater than I, and/or N2'' may be greater than N2. For example, for the preview streams, N2 or N2' may be a small integer (e.g., 0, 2, or 3) to reduce latency; for the video stream, N2'' may be a larger integer (e.g., 15 or 27) to improve the anti-shake processing.
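The latency trade-off can be made concrete with hypothetical parameter sets (the specific numbers below are illustrative only; the embodiment gives 0/2/3 and 15/27 as example values of N2):

```python
# Hypothetical parameter sets: the preview streams keep N2 small for low
# display latency; the video stream can afford a larger N2 (more "future"
# frames available for smoothing) because it is not shown in real time.
PREVIEW_PARAMS = {"N1": 5,  "I": 1, "N2": 2}
VIDEO_PARAMS   = {"N1": 15, "I": 1, "N2": 15}

def output_delay_frames(params):
    """The generated image lags the newest captured original image by at
    least N2 frames, since N2 later frames are needed before smoothing."""
    return params["N2"]
```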
In other embodiments, immediately after entering the shooting process, while the mobile phone is acquiring the first N frames of original images, each of the first N1 frames of original images is stored directly as a video image. Subsequently, after the mobile phone acquires a new original image, it generates the video images corresponding to the I frames of original images from the N frames of original images by the method described in the above embodiments.
In other embodiments, during shooting, the frame number of the video image generated by the mobile phone corresponds to the frame number of the original image, i.e., N2 is 0 and I is 1. That is, after the mobile phone acquires a frame of original image, it generates the video image corresponding to that frame. Further, N1 may be small, for example 5, 8, or 10. Before the mobile phone has acquired N1 frames of original images during shooting, it generates and stores the video image corresponding to the current original image by combining the original images already acquired during shooting.
For example, immediately after entering the shooting process, after the mobile phone acquires the 1st frame of original image, it takes the 1st frame of original image as the 1st video image. After acquiring the 2nd frame of original image, it generates the 2nd video image from the 1st-2nd frames of original images. After acquiring the 3rd frame of original image, it generates the 3rd video image from the 1st-3rd frames of original images. After acquiring the (N1+1)-th frame of original image, it generates the (N1+1)-th video image from the 1st to (N1+1)-th frames of original images. Subsequently, after the mobile phone acquires a new original image, it generates the video image corresponding to the new original image by combining the new original image with the N1 frames of original images preceding it.
When the video image is obtained by image transformation based on the image stabilizing transformation matrix calculated from the target pose on the smoothed target rotation curve of the camera, the smoothed target rotation curve satisfies sub-constraint (1), so the overall transition between the warp-transformed video images obtained from the target rotation curve is smooth. The smoothed target rotation curve also satisfies sub-constraint (2), so the warp-transformed video images obtained from the target rotation curve do not exceed the clipping boundary.
In this way, during shooting the mobile phone can perform anti-shake processing by combining the rotation information of the camera obtained from the gyroscope data with the target translation information of the camera obtained from the image content, which improves the image stabilizing effect of the video images generated during shooting; and when the second condition is satisfied, the mobile phone exits the five-axis anti-shake mode and performs image anti-shake processing according to the rotation information only.
234. After shooting ends, the mobile phone generates a video file from the video images.
The mobile phone determines that shooting has ended after detecting that the user taps a stop-shooting control on the shooting interface. It can be understood that the stop-shooting operation may also be another gesture operation or a voice instruction; the embodiments of the present application do not limit the operation that triggers the mobile phone to end the shooting process. After shooting ends, the mobile phone generates the video file from the video images.
When the five-axis anti-shake mode is used for anti-shake processing during shooting, the video images in the video file generated after shooting have been anti-shake processed by combining the rotation information of the camera obtained from the gyroscope data with the target translation information of the camera obtained from the image content, so the image stabilizing effect is good and the user's shooting experience can be improved.
For example, the effect of video images without anti-shake processing can be seen in (a)-(c) of FIG. 8A. As shown in FIG. 8A, the photographed object translates and rotates during capture, and the original images shake. The effect of video images anti-shake processed according to the rotation information of the camera alone can be seen in (a)-(c) of FIG. 8B. The effect of video images anti-shake processed with the five-axis anti-shake mode turned on can be seen in (a)-(c) of FIG. 8C.
As described above, after the mobile phone enters or exits the five-axis anti-shake mode, the user may be prompted by displayed information, voice broadcast, or vibration. For example, referring to FIG. 9, during shooting, after the mobile phone enters the five-axis anti-shake mode, the user may be prompted on the shooting interface with text information: Five-axis anti-shake is on; translation and rotation anti-shake can be performed!
In the scheme described in steps 200-234, the mobile phone can perform anti-shake processing in the preview state and during shooting by combining the rotation information of the camera obtained from the gyroscope data with the target translation information of the camera obtained from the image content, thereby improving the image stabilizing effect of the preview images and recorded images presented to the user, improving the image stabilizing effect of the generated video images, and improving the user's shooting experience.
In particular, in scenes where translational motion strongly affects image shake, such as close-range shooting or recording with a telephoto camera, the video recording method provided by the embodiments of the present application combines the rotation information of the camera obtained from the gyroscope data with the target translation information of the camera obtained from the image content for anti-shake processing, so the image anti-shake and image stabilizing effects are better.
In other embodiments, in the video mode, if the mobile phone determines that the second condition is satisfied, the mobile phone does not perform the image anti-shake processing according to the target translation information and the rotation information.
The above description takes as an example the case where the mobile phone applies the method provided by the embodiments of the present application, combining the rotation information of the camera obtained from the gyroscope data with the target translation information of the camera obtained from the image content for anti-shake processing, both in the preview state of the video recording mode and during shooting. In other embodiments, the mobile phone does not apply the method provided by the embodiments of the present application to the pre-recording preview stream in the preview state of the video recording mode, but applies it to the in-recording preview stream during shooting. Referring to FIG. 10, steps 200-216 shown in FIG. 2 above may be replaced with step 200A:
200A. After the shooting function is started, the mobile phone enters the video recording mode, acquires original images at a preset frame rate in the preview state, generates preview images from the original images, and displays the preview images on the preview interface.
In this scheme, the mobile phone can perform anti-shake processing during shooting by combining the rotation information of the camera obtained from the gyroscope data with the target translation information of the camera obtained from the image content, thereby improving the image stabilizing effect of the recorded images presented to the user, improving the image stabilizing effect of the generated video images, and improving the user's shooting experience.
In other embodiments, in the video recording mode the mobile phone applies the method provided by the embodiments of the present application neither to the pre-recording preview stream nor to the in-recording preview stream, but applies it to the video stream during shooting. Referring to FIG. 11, the method may include step 200A above, the following step 217A, and steps 218'-234.
217A. After detecting the user's shooting operation, the mobile phone acquires original images at a preset frame rate during shooting, generates recorded images from the original images, and displays the recorded images on the shooting interface.
In this scheme, the mobile phone can perform anti-shake processing during shooting by combining the rotation information of the camera obtained from the gyroscope data with the target translation information of the camera obtained from the image content, thereby improving the image stabilizing effect of the generated video images and improving the user's shooting experience.
In some embodiments of the present application, a video file that has undergone five-axis anti-shake processing may be stored in the mobile phone with a distinctive identifier, so that the user can intuitively recognize which video files have been five-axis anti-shake processed. For example, referring to (a) of FIG. 12, a text label 1201 of "wzfd" is displayed on the video file generated with five-axis anti-shake processing. For another example, referring to (b) of FIG. 12, a text label 1202 of "pyfd" is displayed on the five-axis anti-shake processed video file.
In other embodiments, the mobile phone may perform anti-shake processing based only on the target translation information of the camera obtained from the image content, without combining the rotation information of the camera, so as to suppress image shake caused by camera translation due to the user's hand shake or mobile phone shake. For example, the mobile phone may calculate the translation compensation amount from the target translation information of the camera obtained from the image content, calculate the image stabilizing transformation matrix from the translation compensation amount, and perform warp transformation on the original image according to the image stabilizing transformation matrix, thereby implementing translation anti-shake.
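A translation-only warp, as this variant describes, amounts to a pixel shift by the translation compensation. The sketch below rounds to whole pixels and zero-fills vacated regions; these are illustrative assumptions, and a real implementation would use sub-pixel interpolation (e.g. cv2.warpAffine).

```python
import numpy as np

def warp_translate(img, t_comp):
    """Shift the image by the translation compensation (tx, ty): a sketch
    of translation-only anti-shake, not the embodiment's exact warp."""
    h, w = img.shape[:2]
    tx, ty = int(round(t_comp[0])), int(round(t_comp[1]))
    out = np.zeros_like(img)
    # Destination region after the shift, clipped to the image bounds.
    x0, x1 = max(0, tx), min(w, w + tx)
    y0, y1 = max(0, ty), min(h, h + ty)
    if x0 < x1 and y0 < y1:
        out[y0:y1, x0:x1] = img[y0 - ty:y1 - ty, x0 - tx:x1 - tx]
    return out
```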
In combination with the foregoing embodiments and the corresponding drawings, another embodiment of the present application provides a video recording method that may be implemented in an electronic device having the hardware structure shown in FIG. 1, where the electronic device includes a camera. As shown in FIG. 13, the method may include:
1301. The electronic device acquires original images after the video recording function is started.
After the video recording function is started, the electronic device can continuously acquire original images at a preset acquisition frame rate.
1302. The electronic device obtains the target translation information of the camera from the image information of the acquired multiple frames of original images.
The target translation information of the camera can be used to represent the translation of the camera. For example, it may be the target translation curve of the camera described above.
1303. The electronic device obtains the rotation information of the camera from the attitude sensor data corresponding to the multiple frames of original images.
The rotation information of the camera is used to represent the rotation of the camera. For example, the attitude sensor may be a gyroscope, and the rotation information of the camera may be the target rotation curve of the camera described above.
1304. The electronic device calculates the image stabilizing transformation matrix of a first original image according to the target translation information of the camera and the rotation information of the camera, where the first original image is one of the multiple frames of original images.
The image stabilizing transformation matrix is used for performing motion compensation and warp transformation on the first original image.
1305. The electronic device performs image transformation on the first original image according to the image stabilizing transformation matrix to obtain a target image.
For example, the target image may be a preview image, a recorded image, or a video image in a video recording scene.
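Steps 1301-1305 amount to a per-frame loop. In the placeholder sketch below, the callables stand in for the operations described above (feature matching, attitude-sensor integration, matrix composition, warping); they are hypothetical hooks, not a real camera API:

```python
def video_recording_pipeline(frames, sensor_data, get_translation,
                             get_rotation, build_matrix, warp):
    """Sketch of steps 1301-1305: for each acquired original image, derive
    target translation info from image content (1302) and rotation info
    from attitude sensor data (1303), build the image stabilizing
    transformation matrix (1304), and warp the frame into a target image (1305)."""
    targets = []
    for i, frame in enumerate(frames):
        t_info = get_translation(frames[: i + 1])        # step 1302
        r_info = get_rotation(sensor_data[: i + 1])      # step 1303
        matrix = build_matrix(t_info, r_info)            # step 1304
        targets.append(warp(frame, matrix))              # step 1305
    return targets
```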
In this scheme, in a video recording scene the electronic device can perform anti-shake processing by combining the rotation information of the camera with the target translation information of the camera obtained from the image content, thereby reducing image shake caused by the user's hand movement or shaking of the electronic device, improving the image stabilizing effect of the video images, and improving the user's shooting experience.
The above description takes a mobile phone as the electronic device by way of example. The method is not limited to mobile phones; other electronic devices such as smart watches or tablet computers can also perform anti-shake processing by the above method, which is not repeated here.
It will be appreciated that, in order to achieve the above functionality, the electronic device includes corresponding hardware and/or software modules that perform each function. The steps of the algorithms of the examples described in connection with the embodiments disclosed herein can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer-software-driven hardware depends on the particular application and the design constraints of the technical solution. Those skilled in the art may use different approaches to implement the described functionality for each particular application in conjunction with the embodiments, but such implementations should not be considered beyond the scope of this application.
This embodiment may divide the electronic device into functional modules according to the above method examples; for example, each function may be assigned its own functional module, or two or more functions may be integrated into one processing module. The integrated module may be implemented in hardware. It should be noted that the division of modules in this embodiment is schematic and is only a logical functional division; other divisions are possible in actual implementation.
Embodiments of the present application also provide an electronic device including one or more processors and one or more memories. The one or more memories are coupled to the one or more processors, the one or more memories being configured to store computer program code comprising computer instructions that, when executed by the one or more processors, cause the electronic device to perform the relevant method steps described above to implement the video recording method of the above embodiments.
Embodiments of the present application also provide a computer-readable storage medium having stored therein computer instructions that, when executed on an electronic device, cause the electronic device to perform the above-described related method steps to implement the video recording method in the above-described embodiments.
Embodiments of the present application also provide a computer program product, which when run on a computer, causes the computer to perform the above-mentioned related steps to implement the video recording method performed by the electronic device in the above-mentioned embodiments.
In addition, embodiments of the present application also provide an apparatus, which may specifically be a chip, a component, or a module, and may include a processor and a memory connected to each other; the memory is used for storing computer-executable instructions, and when the apparatus runs, the processor can execute the computer-executable instructions stored in the memory, so that the chip executes the video recording method performed by the electronic device in the above method embodiments.
The electronic device, computer-readable storage medium, computer program product, and chip provided in this embodiment are all used to execute the corresponding methods provided above; therefore, for their beneficial effects, reference may be made to the beneficial effects of the corresponding methods provided above, which are not repeated here.
It will be appreciated by those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional modules is illustrated, and in practical application, the above-described functional allocation may be performed by different functional modules according to needs, i.e. the internal structure of the apparatus is divided into different functional modules to perform all or part of the functions described above.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into modules or units is merely a logical functional division, and there may be other divisions in actual implementation. For example, multiple units or components may be combined or integrated into another apparatus, or some features may be omitted or not performed. In addition, the couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections via some interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.
The units described as separate parts may or may not be physically separate, and a part shown as a unit may be one physical unit or multiple physical units, may be located in one place, or may be distributed over multiple different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a readable storage medium. Based on such an understanding, the technical solutions of the embodiments of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing is merely specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any person skilled in the art could readily conceive of changes or substitutions within the technical scope disclosed in the present application, and such changes and substitutions shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (12)

1. A video recording method applied to an electronic device, the electronic device including a camera, the method comprising:
collecting an original image after a video recording function is started;
calculating first translation vectors according to feature points on two adjacent frames of original images among the collected multi-frame original images;
calculating a second translation vector according to the motion sensor data corresponding to the two adjacent frames of original images;
selecting a third translation vector from the first translation vectors, wherein the third translation vector lies within a neighborhood δ of the second translation vector;
selecting, from the third translation vectors, a fourth translation vector whose similarity is greater than or equal to a first preset value;
obtaining a target translation vector corresponding to the two adjacent frames of original images according to the feature points corresponding to the fourth translation vector; the target translation vectors between consecutive multi-frame original images are connected to form an original translation curve of the camera;
obtaining a target translation curve of the camera according to the original translation curve of the camera;
acquiring rotation information of the camera according to attitude sensor data corresponding to the multi-frame original images;
calculating an image stabilizing transformation matrix of a first original image according to the target translation curve of the camera and the rotation information of the camera, wherein the first original image is an image among the multi-frame original images;
and performing image transformation on the first original image according to the image stabilizing transformation matrix to obtain a target image.
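As an illustrative, non-limiting sketch of the vector-selection pipeline in claim 1: the code below computes per-feature translation vectors (the first translation vectors), keeps those within a neighborhood δ of the sensor-derived second translation vector (the third vectors), keeps those passing a similarity test (the fourth vectors), and averages them into a target translation vector, then low-pass smooths the resulting original curve into a target curve. The cosine-similarity measure, the median reference, the moving-average filter, and the fallback to the sensor estimate are all assumptions; the claims do not fix these choices.

```python
import numpy as np

def target_translation(prev_pts, curr_pts, sensor_t, delta=10.0, sim_thresh=0.9):
    """Select a target translation vector from matched feature points.

    prev_pts, curr_pts: matched feature points, shape (N, 2), of two adjacent frames.
    sensor_t: second translation vector, shape (2,), derived from motion-sensor data.
    """
    # First translation vectors: per-feature displacements between the frames.
    first = curr_pts - prev_pts
    # Third translation vectors: within a neighborhood delta of the sensor vector.
    third = first[np.linalg.norm(first - sensor_t, axis=1) <= delta]
    if len(third) == 0:
        return np.asarray(sensor_t, dtype=float)  # fallback (assumption)
    # Fourth translation vectors: similarity (here, cosine similarity against
    # the median of the third vectors) at least a first preset value.
    ref = np.median(third, axis=0)
    norms = np.linalg.norm(third, axis=1) * np.linalg.norm(ref) + 1e-9
    fourth = third[(third @ ref) / norms >= sim_thresh]
    if len(fourth) == 0:
        fourth = third
    # Target translation vector from the features behind the fourth vectors.
    return fourth.mean(axis=0)

def smooth_curve(curve, win=9):
    """Target translation curve: moving-average smoothing of the original
    curve, shape (frames, 2). One plausible filter; not mandated by the claims."""
    curve = np.asarray(curve, dtype=float)
    kernel = np.ones(win) / win
    return np.stack([np.convolve(curve[:, d], kernel, mode="same")
                     for d in range(curve.shape[1])], axis=1)
```

For example, with four features moving by (2, 0) and one outlier, the outlier falls outside the δ-neighborhood of the sensor vector and is discarded before averaging.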
2. The method of claim 1, wherein the capturing the original image after the video recording function is turned on comprises:
after a video recording function is started and shooting operation of a user is detected, an original image is acquired;
the method further comprises the steps of:
and after the shooting stopping operation of the user is detected, generating a video file according to a video image, wherein the video image is the target image.
3. The method of claim 1, wherein the capturing the original image after the video recording function is turned on comprises:
after a video recording function is started and shooting operation of a user is detected, an original image is acquired;
the method further comprises the steps of:
and displaying a recorded image on a shooting interface, wherein the recorded image is the target image.
4. The method of claim 1, wherein the target image is a preview image, the method further comprising:
and displaying the preview image on a preview interface.
5. The method of any of claims 1-4, wherein the calculating an image stabilizing transformation matrix of the first original image according to the target translation curve of the camera and the rotation information of the camera comprises:
calculating a translation compensation amount of the first original image according to the target translation curve of the camera;
calculating a rotation compensation amount of the first original image according to the rotation information of the camera;
and calculating an image stabilizing transformation matrix of the first original image according to the translation compensation quantity and the rotation compensation quantity.
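One way to realize claim 5's combination, sketched below as a 3×3 transform: the rotation compensation amount is applied in pixel space through the camera intrinsics K, and the translation compensation amount is added as a pixel offset. The in-plane (roll-only) rotation and the pixel-space composition are simplifying assumptions for illustration, not the claimed implementation.

```python
import numpy as np

def stabilizing_matrix(t_comp, r_comp_deg, K):
    """Build an image stabilizing transformation matrix from a translation
    compensation t_comp = (dx, dy) in pixels and an in-plane rotation
    compensation r_comp_deg in degrees, given intrinsics K (3x3)."""
    th = np.deg2rad(r_comp_deg)
    # In-plane rotation compensation (assumption: roll only, for brevity).
    R = np.array([[np.cos(th), -np.sin(th), 0.0],
                  [np.sin(th),  np.cos(th), 0.0],
                  [0.0,         0.0,        1.0]])
    # Map the rotation into pixel coordinates via the intrinsics.
    H = K @ R @ np.linalg.inv(K)
    # Add the translation compensation as a pixel offset.
    H[0, 2] += t_comp[0]
    H[1, 2] += t_comp[1]
    return H
```

The resulting matrix can then warp the first original image (e.g. with a perspective warp) to obtain the target image.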
6. The method according to any one of claims 1-4, further comprising:
if a preset condition is met, acquiring rotation information of the camera according to the attitude sensor data corresponding to the multi-frame original images;
calculating an image stabilizing transformation matrix of the first original image according to the rotation information of the camera;
and carrying out image transformation on the first original image according to the image stabilizing transformation matrix to obtain a target image.
7. The method of claim 6, wherein the method further comprises:
if the preset condition is met, prompting the user to exit the target anti-shake mode.
8. The method of claim 7, wherein the preset conditions include at least one of:
the number of feature points on the two adjacent frames of original images is smaller than or equal to a second preset value;
or the proportion of the third translation vector corresponding to the two adjacent frames of original images to the first translation vector is smaller than or equal to a third preset value;
or, the proportion of the fourth translation vector of the two adjacent frames of original images to the third translation vector is smaller than or equal to a fourth preset value;
or, the variance of the translation compensation amount between the original images of the continuous P frames is larger than or equal to a fifth preset value, and P is an integer larger than 1;
alternatively, the translation amplitude between the original images of consecutive Q frames is greater than or equal to a sixth preset value, Q being an integer greater than 1.
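The fallback test of claim 8 can be summarized as a disjunction of the five conditions above. In the sketch below, every threshold value and the exact statistics passed in (history windows of compensation amounts and translation amplitudes) are hypothetical placeholders; only the structure of the test follows the claim.

```python
import numpy as np

def should_exit_target_mode(num_feats, ratio_third, ratio_fourth,
                            comp_history, trans_history,
                            th2=30, th3=0.2, th4=0.3, th5=25.0, th6=40.0):
    """Return True when translation estimation is unreliable and the device
    should fall back to rotation-only stabilization (claim 8's conditions).
    All threshold values th2..th6 are hypothetical."""
    return bool(
        num_feats <= th2                          # too few feature points
        or ratio_third <= th3                     # few vectors near sensor estimate
        or ratio_fourth <= th4                    # few vectors pass similarity test
        or np.var(comp_history) >= th5            # compensation too erratic (P frames)
        or np.max(np.abs(trans_history)) >= th6   # translation too large (Q frames)
    )
```

When any condition holds, the method reverts to computing the image stabilizing transformation matrix from rotation information alone (claim 6) and may prompt the user that the target anti-shake mode is exited (claim 7).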
9. The method according to any one of claims 1-4 or 7-8, wherein obtaining rotation information of the camera from the attitude sensor data corresponding to the multi-frame raw image includes:
acquiring rotation information of the camera according to attitude sensor data corresponding to N frames of original images, wherein N is an integer greater than 1, N=N1+I+N2, N1 and I are positive integers, and N2 is a non-negative integer;
the calculating the image stabilizing transformation matrix of the first original image according to the rotation information of the camera comprises the following steps:
calculating image stabilizing transformation matrices of I frames of original images according to target poses of the camera corresponding to the N frames of original images in the rotation information of the camera, wherein the I frames of original images are the first original images, the image stabilizing transformation matrices of the I frames of original images are used for obtaining I frames of target images, and a starting frame of the I frames of original images corresponds to the (N1+1)-th frame of original image among the N frames of original images.
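Claim 9's N = N1 + I + N2 window can be pictured as N1 frames of history and N2 frames of lookahead around the I frames being stabilized, with the first stabilized frame at position N1+1 in the window. The sketch below derives a target pose per output frame as a simple moving average over the window; the averaging is an assumption standing in for whatever smoothing the implementation actually uses, and poses are simplified to plain numeric values.

```python
import numpy as np

def stabilize_batch(poses, n1, i, n2):
    """Given poses for an N = n1 + i + n2 window (n1 history frames, i frames
    to stabilize, n2 lookahead frames), return target poses for the i middle
    frames. The first stabilized frame is the (n1+1)-th frame of the window."""
    assert len(poses) == n1 + i + n2
    poses = np.asarray(poses, dtype=float)
    targets = []
    for k in range(n1, n1 + i):
        lo, hi = max(0, k - n1), min(len(poses), k + n2 + 1)
        targets.append(poses[lo:hi].mean(axis=0))  # moving average (assumption)
    return np.array(targets)
```

Setting N2 = 0 (claim 10) removes the lookahead, so preview and recorded images can be stabilized without waiting for future frames.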
10. The method of claim 9, wherein when the target image is a preview image or a recorded image, the N2 is 0.
11. An electronic device, comprising:
a camera, wherein the camera is used for collecting images;
a screen, wherein the screen is used for displaying an interface;
one or more processors;
a memory;
and one or more computer programs, wherein the one or more computer programs are stored in the memory, the one or more computer programs comprising instructions that, when executed by the electronic device, cause the electronic device to perform the video recording method of any of claims 1-10.
12. A computer readable storage medium comprising computer instructions which, when run on a computer, cause the computer to perform the video recording method as claimed in any one of claims 1 to 10.
CN202011057718.6A 2020-09-29 2020-09-29 Video recording method and equipment Active CN114339102B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011057718.6A CN114339102B (en) 2020-09-29 2020-09-29 Video recording method and equipment

Publications (2)

Publication Number Publication Date
CN114339102A CN114339102A (en) 2022-04-12
CN114339102B true CN114339102B (en) 2023-06-02

Family

ID=81010961

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011057718.6A Active CN114339102B (en) 2020-09-29 2020-09-29 Video recording method and equipment

Country Status (1)

Country Link
CN (1) CN114339102B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115242981A (en) * 2022-07-25 2022-10-25 维沃移动通信有限公司 Video playing method, video playing device and electronic equipment
CN115134534B (en) * 2022-09-02 2022-11-18 深圳前海鹏影数字软件运营有限公司 Video uploading method, device, equipment and storage medium based on e-commerce platform
CN117135459A (en) * 2023-04-07 2023-11-28 荣耀终端有限公司 Image anti-shake method and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106331480A (en) * 2016-08-22 2017-01-11 北京交通大学 Video image stabilizing method based on image stitching
CN110610465A (en) * 2019-08-26 2019-12-24 Oppo广东移动通信有限公司 Image correction method and device, electronic equipment and computer readable storage medium
CN111314604A (en) * 2020-02-19 2020-06-19 Oppo广东移动通信有限公司 Video anti-shake method and apparatus, electronic device, computer-readable storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7705884B2 (en) * 2004-07-21 2010-04-27 Zoran Corporation Processing of video data to compensate for unintended camera motion between acquired image frames
JP6700872B2 (en) * 2016-03-07 2020-05-27 キヤノン株式会社 Image blur correction apparatus and control method thereof, image pickup apparatus, program, storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant