CN117221722A - Video anti-shake method and electronic equipment - Google Patents

Video anti-shake method and electronic equipment

Info

Publication number: CN117221722A
Application number: CN202210628375.7A
Authority: CN
Original language: Chinese (zh)
Prior art keywords: camera, image, pose, electronic device, camera pose
Inventors: 段光菲, 郑淇, 胡斌, 刘蒙
Applicant and current assignee: Huawei Technologies Co., Ltd.
Legal status: Pending
Priority: CN202210628375.7A

Landscapes

  • Studio Devices (AREA)

Abstract

The application provides a video anti-shake method and an electronic device. In the provided technical solution, the captured video is smoothed, which improves the user's visual experience when watching a preview picture or a recorded video and meets the user's viewing needs.

Description

Video anti-shake method and electronic equipment
Technical Field
Embodiments of the present application relate to the field of terminal technologies, and in particular, to a video anti-shake method and an electronic device.
Background
When a user holds an electronic device to shoot video, shaking of the user's hands causes varying degrees of shake in the captured video, which degrades how the video looks. A video anti-shake method can stabilize the captured video picture, reduce shake of the picture in the left-right, up-down, and other directions, help the user shoot high-quality video, and improve the user's experience of shooting and watching video.
Disclosure of Invention
The application provides a video anti-shake method and an electronic device, which can make the video picture displayed on the electronic device more stable and improve the user's visual experience when watching a preview picture or a recorded video, so as to meet the user's viewing needs.
In a first aspect, the present application provides a video anti-shake method applicable to an electronic device. The electronic device receives a first operation and displays a first interface provided by a camera application, where the first interface displays a preview area. The electronic device captures images through a camera and determines the camera pose during image capture. The camera pose represents the position and attitude of the camera while the electronic device shakes. The electronic device smooths the camera pose to obtain a smooth camera pose. The smooth camera pose represents the position and attitude the camera would have if the electronic device did not shake, or shook less than the aforementioned shake. The electronic device processes the image according to the smooth camera pose and the depth information of the image to obtain a smooth image, and displays the smooth image in the preview area.
In the above method, the electronic device obtains the camera pose at the time each image is captured, smooths it to obtain the smooth camera pose, remaps the captured image according to the smooth camera pose, and displays in the preview interface a stable video composed of the smooth images. In this way, the electronic device performs anti-shake processing on a captured image by estimating the camera pose it would have had with no shake, or with much smaller shake, at capture time.
Moreover, because the shutter time is lengthened when the electronic device shoots in dim light, even a tiny shake can blur the video. Video shot in dim light is therefore more prone to shake than video shot in normal light; that is, even a small-amplitude shake of the electronic device can cause large shake in the captured video.
Therefore, when handling large-amplitude shake, video shot in dim ambient light, and similar conditions, the shake-reduction effect of this anti-shake method is more pronounced, and the accuracy and robustness of video anti-shake can be improved.
In a second aspect, the present application provides a video anti-shake method applicable to an electronic device. In the method, the electronic device receives a second operation and displays a second interface provided by a camera application, where the second interface displays a video capturing area. The electronic device captures images through a camera and determines the camera pose during image capture. The camera pose represents the position and attitude of the camera while the electronic device shakes. The electronic device smooths the camera pose to obtain a smooth camera pose. The smooth camera pose represents the position and attitude the camera would have if the electronic device did not shake, or shook less than the aforementioned shake. The electronic device processes the image according to the smooth camera pose and the depth information of the image to obtain a smooth image, and displays the smooth image in the video capturing area.
The method can thus perform video anti-shake not only in photo-preview or video-preview scenes but also while video is being recorded. In both scenarios, the electronic device processes the images it captures to obtain smooth images.
With reference to the first aspect and the second aspect, in some embodiments, the images captured by the electronic device through the camera comprise multiple frames, and the smooth images obtained by the electronic device likewise comprise multiple frames.
With reference to the first aspect and the second aspect, in some embodiments, the images captured by the electronic device through the camera include a first image captured at a first moment and a second image captured at a second moment. The electronic device determining the camera pose during image capture may specifically include: the electronic device determines a camera pose change amount from the first image and the second image, where the camera pose change amount represents the change between the camera pose at the first moment and the camera pose at the second moment. The electronic device determines an IMU pose change amount from IMU data collected by the inertial measurement unit (IMU) between the first moment and the second moment, where the IMU pose change amount represents the change between the IMU pose at the first moment and the IMU pose at the second moment, and the IMU pose represents the position and attitude of the IMU while the electronic device shakes. The electronic device calibrates the camera pose change amount using the IMU pose change amount to obtain a calibrated camera pose change amount, and obtains the camera pose at the second moment from the camera pose at the first moment and the calibrated camera pose change amount. The electronic device smoothing the camera pose to obtain a smooth camera pose may specifically include: the electronic device smooths the camera pose at the second moment to obtain a second smooth camera pose, which represents the position and attitude of the camera at the second moment if the electronic device did not shake, or shook less than the aforementioned shake. The electronic device processing the image according to the smooth camera pose and the depth information of the image to obtain a smooth image may specifically include: the electronic device processes the second image according to the second smooth camera pose and the depth information of the second image to obtain a second smooth image. The smooth image displayed by the electronic device in the preview area or the video capturing area includes the second smooth image.
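As one concrete reading of this calibration-and-composition step, the sketch below chains the camera pose at the first moment with a calibrated pose change to obtain the pose at the second moment. The function names and the blend used for calibration are illustrative assumptions; the patent does not disclose the exact fusion rule. Poses are treated as 4x4 homogeneous matrices.

```python
import numpy as np

def calibrate_pose_change(dT_visual: np.ndarray,
                          dT_imu: np.ndarray,
                          alpha: float = 0.5) -> np.ndarray:
    """Hypothetical calibration of the visual pose change with the IMU
    pose change: take the rotation from the IMU and blend the two
    translations. A placeholder rule only, not the patented method."""
    dT = np.eye(4)
    dT[:3, :3] = dT_imu[:3, :3]                       # trust the IMU rotation
    dT[:3, 3] = alpha * dT_visual[:3, 3] + (1 - alpha) * dT_imu[:3, 3]
    return dT

def pose_at_second_moment(T_first: np.ndarray,
                          dT_calibrated: np.ndarray) -> np.ndarray:
    """Compose the camera pose at the first moment with the calibrated
    pose change to obtain the camera pose at the second moment."""
    return T_first @ dT_calibrated
```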
It can be seen that the camera pose obtained by the electronic device after calibration with the IMU reflects the position and attitude of the camera in the electronic device more truthfully than a pose obtained from the camera alone or from IMU data alone. The smooth camera pose obtained from the calibrated camera pose can therefore more accurately reflect the position and attitude the camera would have if it did not shake, or shook far less than it actually did. As a result, the video anti-shake method provided by the embodiments of the application has a more pronounced effect and can improve the user's visual experience.
In some embodiments, the first image may be the first frame captured by the electronic device after receiving the first operation. Alternatively, the first image may be the first frame captured by the electronic device after receiving the second operation. The camera pose at the first moment is a preset initial camera pose. In this way, the electronic device obtains the initial camera pose by initializing with the IMU data and the feature point data, which facilitates the subsequent smoothing of camera poses and makes the calibrated camera pose and the smooth camera pose easier to obtain.
In some embodiments, the electronic device determining the camera pose change amount from the first image and the second image may specifically include: the electronic device extracts first feature points from the first image and second feature points from the second image, and determines the camera pose change amount from the first feature points and the second feature points. In this way, the electronic device can construct a matrix from multiple pairs of matched feature points and solve it to obtain the camera pose change amount.
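A minimal OpenCV sketch of this step follows. The ORB detector and the RANSAC essential-matrix solve are common choices rather than ones the patent names, and `K` is assumed to be the 3x3 camera intrinsic matrix.

```python
import cv2
import numpy as np

def estimate_pose_change(img1: np.ndarray, img2: np.ndarray,
                         K: np.ndarray):
    """Extract feature points from two frames, match them, then build
    and solve the epipolar-constraint matrix (the essential matrix) and
    decompose it into a rotation R and a translation direction t."""
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t
```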
In some embodiments, the images captured by the electronic device through the camera include a first image captured at a first moment and a second image captured at a second moment. The camera poses determined by the electronic device during image capture include the camera pose at the first moment and the camera pose at the second moment. The electronic device smoothing the camera pose to obtain a smooth camera pose may specifically include: smoothing the camera pose at the second moment to obtain a second smooth camera pose, where the variation between the second smooth camera pose and the camera pose at the first moment is smaller than a threshold. Denoting the camera pose at the second moment as $T$ and the second smooth camera pose as $\hat{T}$, the electronic device inputs $T$ into the following pose optimization model, which outputs $\hat{T}$:

$$\text{s.t.}\quad \lVert H \cdot P_i - P_i \rVert \le \Delta, \qquad i = 1, 2, 3, 4$$
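The patent does not state what $H$ and $P_i$ denote; a plausible reading is that $H$ is the image warp induced by replacing $T$ with $\hat{T}$, and $P_1, \dots, P_4$ are the four image corners, so the constraint caps how far any corner may move. Under that assumed reading, a minimal NumPy sketch of enforcing the constraint:

```python
import numpy as np

def apply_homography(H: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Apply a 3x3 homography to Nx2 pixel points."""
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return ph[:, :2] / ph[:, 2:3]

def constrain_warp(H: np.ndarray, corners: np.ndarray,
                   delta: float) -> np.ndarray:
    """Pull a candidate smoothing warp H toward the identity until every
    corner P_i moves by at most delta, i.e. ||H*P_i - P_i|| <= delta for
    i = 1..4. The pull-toward-identity strategy is an assumption; the
    patent states only the constraint itself."""
    H = H / H[2, 2]
    for _ in range(20):
        if np.all(np.linalg.norm(apply_homography(H, corners) - corners,
                                 axis=1) <= delta):
            return H
        H = 0.5 * (H + np.eye(3))     # halve the deviation from identity
        H = H / H[2, 2]
    return np.eye(3)                  # fall back to no warp at all
```

For a 1920x1080 frame, `corners` would be `np.float32([[0, 0], [1919, 0], [0, 1079], [1919, 1079]])`.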
In some embodiments, before the electronic device processes the image according to the smooth camera pose and the depth information of the image, the method may further include: the electronic device obtains the depth information of the image through a depth camera; alternatively, the electronic device determines the depth information of the feature points in the captured image and derives the depth information of the image from the depth information of those feature points. Either method yields the depth information of the image and provides data support for the subsequent mapping processing.
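For the second route (deriving per-pixel depth from feature depths), one plausible sketch spreads the sparse feature depths over the whole image; the nearest-neighbour interpolation is an assumption, as the patent does not specify the scheme.

```python
import numpy as np
from scipy.interpolate import griddata

def dense_depth_from_features(feat_xy: np.ndarray,
                              feat_depth: np.ndarray,
                              height: int, width: int) -> np.ndarray:
    """Interpolate depths known at sparse feature locations (feat_xy is
    Nx2 in pixel (x, y) order) into a dense per-pixel depth map."""
    ys, xs = np.mgrid[0:height, 0:width]
    return griddata(feat_xy, feat_depth, (xs, ys), method='nearest')
```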
In some embodiments, the electronic device processing the image according to the smooth camera pose and the depth information of the image to obtain a smooth image may specifically include: the electronic device derives a mapping matrix from the smooth camera pose and the depth information of the image, which may be expressed as follows.
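With the variables defined in the next paragraph, the mapping matrix matches the standard plane-induced homography; the formula below is a reconstruction consistent with that description rather than a verbatim reproduction of the patent's formula:

$$H = K\left(R - \frac{t\, n^{T}}{d}\right)K^{-1}$$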
where d represents the depth information of the image, n represents the normal vector of the captured image, R represents the rotation matrix in the smooth camera pose, t represents the translation matrix in the smooth camera pose, and K represents the camera intrinsics, which can be expressed as the transformation matrix between the pixel coordinate system and the camera coordinate system. It is worth noting that both the camera pose and the smooth camera pose can consist of a translation matrix and a rotation matrix, and both can be represented in matrix form.
Through the mapping matrix, the electronic device maps a plurality of first pixel points on the captured image to obtain a plurality of second pixel points, and obtains the smooth image from the plurality of second pixel points. The smooth images obtained in this way form a stable video, which meets the user's viewing needs when watching a preview picture or a recorded video.
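A minimal sketch of this per-pixel mapping step with OpenCV: `cv2.warpPerspective` pushes every first pixel point through a 3x3 mapping matrix `H` (with interpolation) to produce the smooth image.

```python
import cv2
import numpy as np

def stabilize_frame(frame: np.ndarray, H: np.ndarray) -> np.ndarray:
    """Map the pixel points of the captured frame through the mapping
    matrix H to obtain the smooth image of the same size."""
    height, width = frame.shape[:2]
    return cv2.warpPerspective(frame, H, (width, height),
                               flags=cv2.INTER_LINEAR)
```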
In some embodiments, after the electronic device receives the second operation, it may further receive a third operation; the electronic device then stops capturing images and stores the smooth images in the electronic device, where they can be viewed by entering the camera application.
In a third aspect, the present application provides an electronic device. The electronic device may receive a first operation and display a first interface provided by a camera application, where the first interface displays a preview area. The electronic device may capture images through a camera and determine the camera pose during image capture. The camera pose represents the position and attitude of the camera while the electronic device shakes. The electronic device may smooth the camera pose to obtain a smooth camera pose, which represents the position and attitude the camera would have if the electronic device did not shake, or shook less than the aforementioned shake. The electronic device may process the image according to the smooth camera pose and the depth information of the image to obtain a smooth image, and may display the smooth image in the preview area.
In the above solution, the electronic device obtains the camera pose at the time each image is captured, smooths it to obtain the smooth camera pose, remaps the captured image according to the smooth camera pose, and displays in the preview interface a stable video composed of the smooth images. In this way, the electronic device performs anti-shake processing on a captured image by estimating the camera pose it would have had with no shake, or with much smaller shake, at capture time. Moreover, because the shutter time is lengthened when the electronic device shoots in dim light, even a tiny shake can blur the video; video shot in dim light is therefore more prone to shake than video shot in normal light, that is, even a small-amplitude shake of the electronic device can cause large shake in the captured video. Therefore, when the electronic device handles large-amplitude shake, video shot in dim ambient light, and similar conditions, the shake-reduction effect is more pronounced, and the accuracy and robustness of video anti-shake can be improved.
With reference to the third aspect, in some embodiments, the images captured by the electronic device through the camera include a first image captured at a first moment and a second image captured at a second moment. The electronic device determining the camera pose during image capture may specifically include: the electronic device determines a camera pose change amount from the first image and the second image, where the camera pose change amount represents the change between the camera pose at the first moment and the camera pose at the second moment. The electronic device determines an IMU pose change amount from IMU data collected by the inertial measurement unit (IMU) between the first moment and the second moment, where the IMU pose change amount represents the change between the IMU pose at the first moment and the IMU pose at the second moment, and the IMU pose represents the position and attitude of the IMU while the electronic device shakes. The electronic device calibrates the camera pose change amount using the IMU pose change amount to obtain a calibrated camera pose change amount, and obtains the camera pose at the second moment from the camera pose at the first moment and the calibrated camera pose change amount. The electronic device smoothing the camera pose to obtain a smooth camera pose may specifically include: the electronic device smooths the camera pose at the second moment to obtain a second smooth camera pose, which represents the position and attitude of the camera at the second moment if the electronic device did not shake, or shook less than the aforementioned shake. The electronic device processing the image according to the smooth camera pose and the depth information of the image to obtain a smooth image may specifically include: the electronic device processes the second image according to the second smooth camera pose and the depth information of the second image to obtain a second smooth image. The smooth image displayed by the electronic device in the preview area or the video capturing area includes the second smooth image.
It can be seen that the camera pose obtained by the electronic device after calibration with the IMU reflects the position and attitude of the camera in the electronic device more truthfully than a pose obtained from the camera alone or from IMU data alone. The smooth camera pose obtained from the calibrated camera pose can therefore more accurately reflect the position and attitude the camera would have if it did not shake, or shook far less than it actually did. The anti-shake effect provided by the electronic device is thus more pronounced and can further improve the user's visual experience.
In some embodiments, the first image may be the first frame captured by the electronic device after receiving the first operation. Alternatively, the first image may be the first frame captured by the electronic device after receiving the second operation. The camera pose at the first moment is a preset initial camera pose. In this way, the electronic device obtains the initial camera pose by initializing with the IMU data and the feature point data, which facilitates the subsequent smoothing of camera poses and makes the calibrated camera pose and the smooth camera pose easier to obtain.
In some embodiments, the electronic device determining the camera pose change amount from the first image and the second image may specifically include: the electronic device extracts first feature points from the first image and second feature points from the second image, and determines the camera pose change amount from the first feature points and the second feature points. In this way, the electronic device can construct a matrix from multiple pairs of matched feature points and solve it to obtain the camera pose change amount.
In some embodiments, the images captured by the electronic device through the camera include a first image captured at a first moment and a second image captured at a second moment. The camera poses determined by the electronic device during image capture include the camera pose at the first moment and the camera pose at the second moment. The electronic device smoothing the camera pose to obtain a smooth camera pose may specifically include: smoothing the camera pose at the second moment to obtain a second smooth camera pose, where the variation between the second smooth camera pose and the camera pose at the first moment is smaller than a threshold. Denoting the camera pose at the second moment as $T$ and the second smooth camera pose as $\hat{T}$, the electronic device inputs $T$ into the following pose optimization model, which outputs $\hat{T}$:

$$\text{s.t.}\quad \lVert H \cdot P_i - P_i \rVert \le \Delta, \qquad i = 1, 2, 3, 4$$
In some embodiments, before the electronic device processes the image according to the smooth camera pose and the depth information of the image, the method may further include: the electronic device obtains the depth information of the image through a depth camera; alternatively, the electronic device determines the depth information of the feature points in the captured image and derives the depth information of the image from the depth information of those feature points. Either method yields the depth information of the image and provides data support for the subsequent mapping processing.
In some embodiments, the electronic device processing the image according to the smooth camera pose and the depth information of the image to obtain a smooth image may specifically include: the electronic device derives a mapping matrix from the smooth camera pose and the depth information of the image, expressed as in the first aspect, where d represents the depth information of the image, n represents the normal vector of the captured image, R represents the rotation matrix in the smooth camera pose, t represents the translation matrix in the smooth camera pose, and K represents the camera intrinsics, which can be expressed as the transformation matrix between the pixel coordinate system and the camera coordinate system.
Through the mapping matrix, the electronic device maps a plurality of first pixel points on the captured image to obtain a plurality of second pixel points, and obtains the smooth image from the plurality of second pixel points. The smooth images obtained in this way form a stable video, which meets the user's viewing needs when watching a preview picture or a recorded video.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium comprising instructions which, when run on an electronic device, cause the electronic device to perform the method according to the first aspect, the second aspect, or any implementation of the first aspect or the second aspect.
It can be understood that the electronic device provided in the third aspect and the computer storage medium provided in the fourth aspect are each configured to perform the corresponding method provided by the embodiments of the present application. Therefore, for the beneficial effects they achieve, reference may be made to the beneficial effects of the corresponding method; details are not repeated here.
Drawings
Fig. 1 is a schematic structural diagram of an electronic device 100 according to an embodiment of the present application.
Fig. 2 is a schematic block diagram of a software structure of an electronic device 100 according to an embodiment of the present application.
Fig. 3A to fig. 3D are schematic diagrams of some users capturing video according to an embodiment of the present application.
Fig. 4A and fig. 4B are schematic diagrams of a video anti-shake method according to an embodiment of the present application.
Fig. 5 is a schematic diagram of an MSCKF algorithm according to an embodiment of the present application.
Fig. 6 is a simulation diagram of camera pose and translation path before and after smoothing processing according to an embodiment of the present application.
Fig. 7 is a flowchart of a video anti-shake method according to an embodiment of the present application.
Fig. 8 is a schematic diagram of a method for obtaining a stable image video by mapping pixel points according to an embodiment of the present application.
Detailed Description
The technical solutions of the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. In the description of the embodiments of the present application, unless otherwise specified, "/" means "or"; for example, A/B may represent A or B. The text "and/or" merely describes an association relation between associated objects and indicates that three relations may exist; for example, A and/or B may indicate: A alone, both A and B, or B alone. Furthermore, in the description of the embodiments of the present application, "plural" means two or more.
The terms "first," "second," and the like, are used below for descriptive purposes only and are not to be construed as implying or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature, and in the description of embodiments of the application, unless otherwise indicated, the meaning of "a plurality" is two or more.
The term "User Interface (UI)" in the following embodiments of the present application is a media interface for interaction and information exchange between an application program or an operating system and a user, which enables conversion between an internal form of information and a form acceptable to the user. The user interface is source code written in a specific computer language such as iava, extensible markup language (extensible markup language, XML) and the like, and the interface source code is analyzed and rendered on the electronic equipment to finally be presented as content which can be identified by a user. A commonly used presentation form of the user interface is a graphical user interface (graphic user interface, GUI), which refers to a user interface related to computer operations that is displayed in a graphical manner. It may be a visual interface element of text, icons, buttons, menus, tabs, text boxes, dialog boxes, status bars, navigation bars, widgets, etc., displayed in a display of the electronic device.
When a user shoots video with an electronic device, shake of the camera, or overly fast movement of the camera relative to the subject, can produce a shaking picture in the captured video. The picture may shake at a high frequency, and the user can clearly perceive the shaking picture displayed on the electronic device, resulting in a poor user experience.
Therefore, video anti-shake is a basic requirement when an electronic device shoots video, making the captured video more stable. Video anti-shake methods fall mainly into two categories:
one is optical anti-shake (Optical Image Stabilizer, OIS), which is mainly an improvement of the hardware of camera modules on electronic devices. Specifically, a gyroscope in the electronic equipment detects tiny movement, the movement signal is transmitted to the micro-processing to calculate the displacement to be compensated, and then a compensation lens group can be arranged in the camera module to compensate according to the direction of camera shake and the displacement, so that the problem of video picture blurring is solved; the electronic equipment can also compensate the displacement of the camera through an external device cradle head. However, in general, optical anti-shake can solve the problem of minute shake generated by a camera, and the anti-shake effect for large-scale camera shake is relatively insufficient.
The other is electronic image stabilization (EIS), which mainly processes the video captured by the electronic device through software algorithms so that the processed video can be output stably. Most electronic anti-shake today is in fact a technique that compensates for shake at the cost of image quality, that is, it strikes a balance between video image quality and shake. Compared with optical anti-shake, the video image quality produced by electronic anti-shake is worse, and current electronic anti-shake analyzes only the data collected by a single gyroscope sensor as its source, lacking processing and use of the image itself. Because the shutter time is lengthened when the electronic device shoots in dim light, even a tiny shake can blur the video, so the anti-shake effect in dim light is somewhat inferior to that in normal light.
It can be seen that shake in video shot by an electronic device can be handled either by correcting the position and attitude of a component inside the electronic device or by directly analyzing and processing the captured images. However, when the electronic device is in a dim-light environment or shakes strongly, both anti-shake approaches have shortcomings to some degree: optical anti-shake is limited when the shake is large, and electronic anti-shake is limited in dim light.
The embodiment of the application provides a video anti-shake method in which a user can shoot video with an electronic device. The electronic device may use an anti-shake method that combines visual-inertial odometry (VIO), from the field of simultaneous localization and mapping (SLAM), with EIS to reduce picture shake in the video. The electronic device computes feature point data from the images captured by the camera, and obtains angular velocity and acceleration through an inertial measurement unit (IMU). Using the feature point data, acceleration, and angular velocity, the electronic device computes the camera pose, which is obtained after calibration with the IMU pose. The electronic device then smooths the camera pose to obtain a smooth camera pose. The smooth camera pose refers to the pose the camera in the electronic device would have if it shook far less than it actually did during shooting, or did not shake at all. The electronic device remaps the image actually shot by the camera through a mapping matrix, computed from the smooth camera pose and the depth information of the image, to obtain an image with reduced shake. Performing this anti-shake processing on consecutive frames shot by the camera yields multiple frames with reduced shake, which form a stable video.
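A compact, runnable sketch of this loop follows. It keeps the pipeline's structure (estimate the pose change, smooth it, remap the frame, display the result) but substitutes simple stand-ins for the patent's VIO pose estimation and pose smoothing, which the paragraph above describes only at a high level; all function and window names are illustrative.

```python
import cv2
import numpy as np

def antishake_preview(camera_index: int = 0, alpha: float = 0.9) -> None:
    """Anti-shake preview loop. The pose-change estimate is replaced by
    a frame-to-frame homography from sparse optical flow, and the
    smoothing step by an exponential moving average pulled toward the
    identity warp; both are stand-ins for the patented algorithms."""
    cap = cv2.VideoCapture(camera_index)
    ok, prev = cap.read()
    if not ok:
        return
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    H_smooth = np.eye(3)                      # accumulated smoothing warp
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # 1. Pose-change stand-in: track features, fit a homography
        #    mapping current-frame points back to previous-frame points.
        p0 = cv2.goodFeaturesToTrack(prev_gray, 400, 0.01, 10)
        if p0 is not None and len(p0) >= 4:
            p1, st, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, p0, None)
            good = st.ravel() == 1
            H, _ = cv2.findHomography(p1[good], p0[good], cv2.RANSAC)
            if H is not None:
                # 2. Smoothing stand-in: accumulate the correction and
                #    low-pass it toward the identity warp.
                H_smooth = alpha * (H_smooth @ H) + (1 - alpha) * np.eye(3)
                H_smooth /= H_smooth[2, 2]
        # 3. Mapping step: warp the shaky frame into the smoothed view.
        h, w = gray.shape
        stable = cv2.warpPerspective(frame, H_smooth, (w, h))
        cv2.imshow('smooth preview', stable)
        if cv2.waitKey(1) == 27:              # Esc stops the preview
            break
        prev_gray = gray
    cap.release()
    cv2.destroyAllWindows()
```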
The VIO algorithm above means that the electronic device uses data collected by both the camera and the IMU to reduce shake in the video, rather than processing the video with camera data or IMU data alone. The video processed by this anti-shake method may include the dynamic picture displayed on a shooting preview interface in a preview scene, the video obtained in a video-recording scene, and so on. The embodiment of the application does not limit the application scenarios of the anti-shake method.
By implementing the above method, the electronic device can display a stabilized video after electronic anti-shake processing, ensuring the user's visual perception. Because the camera pose obtained after IMU calibration reflects the position and attitude of the camera in the electronic device more truthfully than a pose obtained from the camera or the IMU alone, the smooth camera pose obtained from the calibrated camera pose more accurately reflects the position and attitude the camera would have with no shake, or with shake far smaller than the actual shake. That is, the electronic device performs anti-shake processing on a captured image by estimating the camera pose it would have had with no shake, or with small shake, at capture time. Therefore, when handling large-amplitude shake, video shot in dim ambient light, and similar conditions, the shake-reduction effect is more pronounced, and the video anti-shake method provided by the embodiment of the application can improve the accuracy and robustness of video anti-shake.
In order to more clearly describe the method provided by the embodiment of the present application, the electronic device provided by the embodiment of the present application is described below.
Referring to fig. 1, fig. 1 is a schematic structural diagram of an electronic device 100 according to an embodiment of the present application.
The electronic device 100 may run any of a variety of operating systems. The electronic device 100 may be a cell phone, a tablet, a desktop computer, a laptop, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cellular telephone, a personal digital assistant (PDA), an augmented reality (AR) device, a virtual reality (VR) device, an artificial intelligence (AI) device, or a wearable device. The electronic device 100 may also be a laptop computer having a touch-sensitive surface or touch panel, a desktop computer having a touch-sensitive surface or touch panel, or a non-portable terminal device such as a vehicle, an in-vehicle device, a smart home device, and/or a smart city device. The embodiment of the application does not limit the specific type of the electronic device.
As shown in fig. 1, the electronic device 100 may include: processor 110, external memory interface 120, internal memory 121, universal serial bus (universal serial bus, USB) interface 130, charge management module 140, power management module 141, battery 142, antenna 1, antenna 2, mobile communication module 150, wireless communication module 160, audio module 170, speaker 170A, receiver 170B, microphone 170C, headset interface 170D, sensor module 180, keys 190, motor 191, indicator 192, camera 193, display 194, and subscriber identity module (subscriber identification module, SIM) card interface 195, etc. The sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It should be understood that the illustrated structure of the embodiment of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the application, electronic device 100 may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units, such as: the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural network processor (neural-network processing unit, NPU), etc. Wherein the different processing units may be separate devices or may be integrated in one or more processors.
A memory may also be provided in the processor 110 for storing instructions and data. In some implementations, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or recycled. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Repeated accesses are avoided and the latency of the processor 110 is reduced, thereby improving the efficiency of the system.
In some embodiments, the processor 110 may perform feature point extraction on successive images, and obtain camera pose from the feature point data; the inertial measurement unit IMU (acceleration sensor and gyroscope sensor) data can be processed to obtain the IMU pose. The processor 110 may also perform modeling processing on the pose to obtain the pose after the optimization processing. The processor 110 may perform an algorithmic process on the captured video and output a stabilized video.
In some implementations, the processor 110 may include one or more interfaces. The interfaces may include an integrated circuit (inter-integrated circuit, I2C) interface, an integrated circuit built-in audio (inter-integrated circuit sound, I2S) interface, a pulse code modulation (pulse code modulation, PCM) interface, a universal asynchronous receiver transmitter (universal asynchronous receiver/transmitter, UART) interface, a mobile industry processor interface (mobile industry processor interface, MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (subscriber identity module, SIM) interface, and/or a universal serial bus (universal serial bus, USB) interface, among others.
The I2C interface is a bidirectional synchronous serial bus comprising a serial data line (SDA) and a serial clock line (SCL). In some implementations, the processor 110 may contain multiple sets of I2C buses. The processor 110 may be coupled to the touch sensor 180K, a charger, a flash, the camera 193, and the like through different I2C bus interfaces. For example, the processor 110 may be coupled to the touch sensor 180K through an I2C interface, so that the processor 110 communicates with the touch sensor 180K through the I2C bus interface to implement the touch function of the electronic device 100.
The I2S interface may be used for audio communication. In some implementations, the processor 110 may contain multiple sets of I2S buses. The processor 110 may be coupled to the audio module 170 via an I2S bus to enable communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may transmit an audio signal to the wireless communication module 160 through the I2S interface, to implement a function of answering a call through a bluetooth headset.
PCM interfaces may also be used for audio communication to sample, quantize and encode analog signals. In some implementations, the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface. In some embodiments, the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to implement a function of answering a call through the bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication.
The UART interface is a universal serial data bus for asynchronous communications. The bus may be a bi-directional communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, a UART interface is typically used to connect the processor 110 with the wireless communication module 160. For example: the processor 110 communicates with a bluetooth module in the wireless communication module 160 through a UART interface to implement a bluetooth function. In some embodiments, the audio module 170 may transmit an audio signal to the wireless communication module 160 through a UART interface, to implement a function of playing music through a bluetooth headset.
The MIPI interface may be used to connect the processor 110 to peripheral devices such as the display 194 and the camera 193. The MIPI interface includes a camera serial interface (CSI), a display serial interface (DSI), and the like. In some embodiments, the processor 110 and the camera 193 communicate via a CSI interface to implement the camera function of the electronic device 100, and the processor 110 and the display 194 communicate via a DSI interface to implement the display function of the electronic device 100.
The GPIO interface may be configured by software. The GPIO interface may be configured as a control signal or as a data signal. In some embodiments, a GPIO interface may be used to connect the processor 110 with the camera 193, the display 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like. The GPIO interface may also be configured as an I2C interface, an I2S interface, a UART interface, an MIPI interface, etc.
The USB interface 130 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, or the like. The USB interface 130 may be used to connect a charger to charge the electronic device 100, and may also be used to transfer data between the electronic device 100 and a peripheral device. And can also be used for connecting with a headset, and playing audio through the headset. The interface may also be used to connect other electronic devices, such as AR devices, etc.
It should be understood that the interfacing relationship between the modules illustrated in the embodiments of the present application is only illustrative, and is not meant to limit the structure of the electronic device 100. In other embodiments of the present application, the electronic device 100 may also employ different interfacing manners in the above embodiments, or a combination of multiple interfacing manners.
The charge management module 140 is configured to receive a charge input from a charger. The charger can be a wireless charger or a wired charger. In some wired charging embodiments, the charge management module 140 may receive a charging input of a wired charger through the USB interface 130. In some wireless charging embodiments, the charge management module 140 may receive wireless charging input through a wireless charging coil of the electronic device 100. The charging management module 140 may also supply power to the electronic device through the power management module 141 while charging the battery 142.
The power management module 141 is used for connecting the battery 142, and the charge management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140 to power the processor 110, the internal memory 121, the display 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may also be configured to monitor battery capacity, battery cycle number, battery health (leakage, impedance) and other parameters. In other embodiments, the power management module 141 may also be provided in the processor 110. In other embodiments, the power management module 141 and the charge management module 140 may be disposed in the same device.
The wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like. The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals.
The mobile communication module 150 may provide a solution for wireless communication including 2G/3G/4G/5G, etc., applied to the electronic device 100. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA), etc. The mobile communication module 150 may receive electromagnetic waves from the antenna 1, perform processes such as filtering, amplifying, and the like on the received electromagnetic waves, and transmit the processed electromagnetic waves to the modem processor for demodulation. The mobile communication module 150 can amplify the signal modulated by the modem processor, and convert the signal into electromagnetic waves through the antenna 1 to radiate. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be provided in the same device as at least some of the modules of the processor 110.
The modem processor may include a modulator and a demodulator. The modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low frequency baseband signal to the baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs sound signals through an audio device (not limited to the speaker 170A, the receiver 170B, etc.), or displays images or video through the display screen 194. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be provided in the same device as the mobile communication module 150 or other functional module, independent of the processor 110.
The wireless communication module 160 may provide solutions for wireless communication including wireless local area network (wireless local area networks, WLAN) (e.g., wireless fidelity (wireless fidelity, wi-Fi) network), bluetooth (BT), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field wireless communication technology (near field communication, NFC), infrared technology (IR), etc., as applied to the electronic device 100. The wireless communication module 160 may be one or more devices that integrate at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, demodulates and filters the electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, frequency modulate it, amplify it, and convert it to electromagnetic waves for radiation via the antenna 2.
In some embodiments, antenna 1 and mobile communication module 150 of electronic device 100 are coupled, and antenna 2 and wireless communication module 160 are coupled, such that electronic device 100 may communicate with a network and other devices through wireless communication techniques. The wireless communication techniques may include the Global System for Mobile communications (global system for mobile communications, GSM), general packet radio service (general packet radio service, GPRS), code division multiple access (code division multiple access, CDMA), wideband code division multiple access (wideband code division multiple access, WCDMA), time division code division multiple access (time-division code division multiple access, TD-SCDMA), long term evolution (long term evolution, LTE), BT, GNSS, WLAN, NFC, FM, and/or IR techniques, among others. The GNSS may include a global satellite positioning system (global positioning system, GPS), a global navigation satellite system (global navigation satellite system, GLONASS), a beidou satellite navigation system (beidou navigation satellite system, BDS), a quasi zenith satellite system (quasi-zenith satellite system, QZSS) and/or a satellite based augmentation system (satellite based augmentation systems, SBAS).
The electronic device 100 implements display functions through a GPU, a display screen 194, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like. The display 194 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini LED, a Micro LED, a Micro-OLED, quantum dot light-emitting diodes (QLED), or the like. In some implementations, the electronic device 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
In some embodiments, the display 194 may display not only the picture of the photographing preview, but also the stabilized video processed by the processor 110.
The electronic device 100 may implement photographing functions through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
The ISP is used to process data fed back by the camera 193. For example, when photographing, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electric signal, and the camera photosensitive element transmits the electric signal to the ISP for processing and is converted into an image visible to naked eyes. ISP can also optimize the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some implementations, the ISP may be provided in the camera 193.
The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image onto the photosensitive element. The photosensitive element may be a charge coupled device (charge coupled device, CCD) or a Complementary Metal Oxide Semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then transferred to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard RGB, YUV, or the like format. In some implementations, the electronic device 100 may include 1 or N cameras 193, N being a positive integer greater than 1.
In some embodiments, the camera 193 may capture the successive images to be processed and send image-related information to the processor 110; the camera 193 is also referred to simply as the camera.
The digital signal processor is used for processing digital signals, and can process other digital signals besides digital image signals. For example, when the electronic device 100 selects a frequency bin, the digital signal processor is used to fourier transform the frequency bin energy, or the like.
Video codecs are used to compress or decompress digital video. The electronic device 100 may support one or more video codecs, so that it can play or record videos in multiple encoding formats, such as moving picture experts group (MPEG)1, MPEG2, MPEG3, and MPEG4.
The NPU is a neural-network (NN) computing processor, and can rapidly process input information by referencing a biological neural network structure, for example, referencing a transmission mode between human brain neurons, and can also continuously perform self-learning. Applications such as intelligent awareness of the electronic device 100 may be implemented through the NPU, for example: image recognition, face recognition, speech recognition, text understanding, etc.
The internal memory 121 may include one or more random access memories (random access memory, RAM) and one or more non-volatile memories (NVM).
The random access memory may include a static random-access memory (SRAM), a dynamic random-access memory (DRAM), a synchronous dynamic random-access memory (SDRAM), a double data rate synchronous dynamic random-access memory (DDR SDRAM; for example, fifth-generation DDR SDRAM is commonly referred to as DDR5 SDRAM), and the like. The nonvolatile memory may include a disk storage device and a flash memory.
The flash memory may be divided according to operating principle into NOR FLASH, NAND FLASH, 3D NAND FLASH, and the like; divided according to memory cell level into single-level cells (SLC), multi-level cells (MLC), triple-level cells (TLC), quad-level cells (QLC), and the like; and divided according to storage specification into universal flash storage (UFS), embedded multimedia cards (eMMC), and the like.
The random access memory may be read directly from and written to by the processor 110, may be used to store executable programs (e.g., machine instructions) for an operating system or other on-the-fly programs, may also be used to store data for users and applications, and the like.
The nonvolatile memory may store executable programs, store data of users and applications, and the like, and may be loaded into the random access memory in advance for the processor 110 to directly read and write.
The external memory interface 120 may be used to connect external non-volatile memory to enable expansion of the memory capabilities of the electronic device 100. The external nonvolatile memory communicates with the processor 110 through the external memory interface 120 to implement a data storage function. For example, files such as music and video are stored in an external nonvolatile memory.
The electronic device 100 may implement audio functions through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like. Such as music playing, recording, etc.
The pressure sensor 180A is used to sense a pressure signal, and may convert the pressure signal into an electrical signal.
The gyro sensor 180B may be used to determine the motion attitude of the electronic device 100. In some implementations, the angular velocity of the electronic device 100 about three axes (i.e., the x, y, and z axes) may be determined by the gyro sensor 180B. The gyro sensor 180B may be used for photographing anti-shake. For example, when the shutter is pressed, the gyro sensor 180B may detect the shake angle of the electronic device 100, calculate the distance the lens module needs to compensate according to the angle, and let the lens counteract the shake of the electronic device 100 through reverse motion, thereby realizing anti-shake. The gyro sensor 180B may also be used in navigation and somatosensory gaming scenarios.
In some embodiments, the gyro sensor 180B may be configured to collect data such as angular velocity during video capturing or previewing of the electronic device 100, and transmit the data to the processor 110, so that the processor 110 can calculate the IMU pose change in combination with information such as acceleration.
The air pressure sensor 180C may be used to measure air pressure.
The acceleration sensor 180E may detect the magnitude of acceleration of the electronic device 100 in various directions (typically three axes). The magnitude and direction of gravity may be detected when the electronic device 100 is stationary. It can also be used to recognize the attitude of the electronic device, and is applied to landscape/portrait screen switching, pedometers, and other applications.
In some embodiments, the acceleration sensor 180E may be configured to collect data such as acceleration during video capturing or previewing of the electronic device 100, and transmit the data to the processor 110, so that the processor 110 can calculate the IMU pose change in combination with information such as angular velocity.
In some embodiments, the gyroscope sensor 180B and the acceleration sensor 180E may be collectively referred to as an inertial measurement unit, i.e., IMU.
A distance sensor 180F for measuring a distance. The electronic device 100 may measure the distance by infrared or laser. In some implementations, the electronic device 100 may range using the distance sensor 180F to achieve quick focus.
In some implementations, the distance sensor 180F may also be referred to as a depth sensor. The depth sensor may be used to measure the distance between the electronic device 100 and objects in the environment, and its output may take the form of a depth map or a point cloud.
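As an illustration of the two output forms, the sketch below (an assumption of this description, not part of the patent) back-projects a depth map into a point cloud using a pinhole camera model with assumed intrinsics fx, fy, cx, cy.

```python
# A minimal sketch: convert an HxW depth map into an Nx3 point cloud,
# assuming a pinhole camera model with intrinsics fx, fy, cx, cy.
import numpy as np

def depth_map_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project an HxW depth map (meters) into an Nx3 point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx          # pinhole model: u = fx * x / z + cx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop invalid (zero-depth) pixels
```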
The proximity light sensor 180G may include, for example, a Light Emitting Diode (LED) and a light detector, such as a photodiode.
The ambient light sensor 180L is used to sense ambient light level.
The fingerprint sensor 180H is used to collect a fingerprint. The electronic device 100 may utilize the collected fingerprint feature to unlock the fingerprint, access the application lock, photograph the fingerprint, answer the incoming call, etc.
The temperature sensor 180J is for detecting temperature.
The touch sensor 180K, also referred to as a "touch device". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, which is also called a "touch screen". The touch sensor 180K is for detecting a touch operation acting thereon or thereabout. The touch sensor may communicate the detected touch operation to the application processor to determine the touch event type. Visual output related to touch operations may be provided through the display 194. In other embodiments, the touch sensor 180K may also be disposed on the surface of the electronic device 100 at a different location than the display 194.
In some implementations, the touch sensor 180K may be used to receive a user operation, turn on a camera, initiate a video capture function, or bring the electronic device 100 to a capture preview interface.
The bone conduction sensor 180M may acquire a vibration signal.
The keys 190 include a power-on key, a volume key, etc. The keys 190 may be mechanical keys. Or may be a touch key. The electronic device 100 may receive key inputs, generating key signal inputs related to user settings and function controls of the electronic device 100.
The motor 191 may generate a vibration cue.
The SIM card interface 195 is used to connect a SIM card. The SIM card may be inserted into the SIM card interface 195, or removed from the SIM card interface 195, to achieve contact with and separation from the electronic device 100. The electronic device 100 may support 1 or N SIM card interfaces, N being a positive integer greater than 1. The SIM card interface 195 may support Nano SIM cards, Micro SIM cards, and the like. Multiple cards may be inserted into the same SIM card interface 195 simultaneously. The types of the multiple cards may be the same or different. The SIM card interface 195 may also be compatible with different types of SIM cards. The SIM card interface 195 may also be compatible with external memory cards. The electronic device 100 interacts with the network through the SIM card to realize functions such as calls and data communication. In some implementations, the electronic device 100 employs an eSIM, i.e., an embedded SIM card. The eSIM card can be embedded in the electronic device 100 and cannot be separated from the electronic device 100.
Fig. 2 is a schematic block diagram illustrating a software structure of the electronic device 100 according to an embodiment of the present application.
As shown in fig. 2, the video anti-shake software architecture provided by the embodiment of the application includes a data acquisition module 201, a pose calculation module 202, a pose optimization module 203, and an anti-shake output module 204. The modules may be coupled by a communication bus 205.
The data acquisition module 201 may be used both to acquire images captured by the electronic device 100, so as to calculate feature point data of the images, and to acquire the acceleration and angular velocity of the electronic device 100.
Specifically, the data acquisition module 201 may include a camera. When the camera is in a working state, the camera can collect images. The data acquisition module 201 can acquire image information through a camera and extract feature point data of an image from the image. The feature point data may refer to coordinates of a specified pixel point on the image. The data acquisition module 201 may comprise an inertial measurement unit. The data acquisition module 201 may acquire acceleration and angular velocity of the electronic device 100 through the IMU.
The pose calculation module 202 may be used to calculate the camera pose of the electronic device 100. The camera pose may represent the actual position and attitude of the camera of the electronic device 100. The camera pose may include the values of the camera in the front-back, up-down, and left-right directions, as well as pitch, yaw, and roll. That is, if the electronic device 100 shakes during shooting, the camera pose may reflect the position and attitude of the camera when the electronic device 100 shakes.
For example, the pose calculation module 202 may calculate the amount of change in the position and attitude of the camera of the electronic device 100 from time t1 to time t2 (i.e., the camera pose change amount) according to the feature point data extracted by the data acquisition module 201. The camera pose change amount can be represented by a translation matrix and a rotation matrix. The pose calculation module 202 may calculate the amount of change in the position and attitude of the IMU of the electronic device 100 from time t1 to time t2 (i.e., the IMU pose change amount) according to the acceleration and angular velocity data. The pose calculation module 202 may then calibrate the camera pose change amount using the IMU pose change amount to obtain the calibrated camera pose change amount from time t1 to time t2. Based on the camera pose at time t1 and the calibrated camera pose change amount, the pose calculation module 202 may calculate the camera pose at time t2. The calculation manner of the camera pose at time t3 is similar to that of the camera pose at time t2 and is not described herein again.
The change between the camera pose at time t1 and the camera pose at time t2 may include the change in the position and attitude of the camera caused by the shake of the electronic device 100 between time t1 and time t2.
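The chaining of poses described above can be illustrated with a short sketch. The representation below (4×4 homogeneous transforms, numpy) is an assumption for illustration; the patent does not prescribe a data layout.

```python
# A minimal sketch (an assumption, not the patent's implementation) of how the
# pose calculation module 202 might chain poses: the camera pose at time t2 is
# the camera pose at time t1 composed with the calibrated pose change between
# t1 and t2 (rotation matrix R_delta, translation vector t_delta).
import numpy as np

def make_transform(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def propagate_pose(pose_t1, R_delta, t_delta):
    """Camera pose at t2 = pose at t1 composed with the calibrated change."""
    return pose_t1 @ make_transform(R_delta, t_delta)
```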
The pose optimization module 203 may be configured to perform smoothing processing on the camera pose obtained by the pose calculation module 202, so as to obtain a smoothed camera pose (hereinafter referred to as a smoothed camera pose). The above smoothing process can reduce variations in the position and posture of the camera due to the shake generated by the electronic apparatus 100. The smooth camera pose does not represent the actual position and pose of the camera of the electronic device 100, but may represent the position and pose of the camera of the electronic device 100 without generating shake or with the generated shake being much smaller than the actual shake amount.
Illustratively, the pose optimization module 203 may input the pose of the camera at the time t2 calculated by the pose calculation module 202 into a pose optimization model, and estimate the pose of the smooth camera at the time t2 according to an algorithm of the pose optimization model. The pose optimization model may be used to optimize the input camera pose such that the output smoothed camera pose can represent the position and pose of the camera of the electronic device 100 without shaking or with shaking far less than the actual amount of shaking. The pose optimization module 203 may input the pose of the camera at the time t3 calculated by the pose calculation module 202 into a pose optimization model, and estimate the smooth pose of the camera at the time t3 according to an algorithm of the pose optimization model.
The change between the camera pose at time t1 and the smoothed camera pose at time t2 may not include the change in the position and attitude of the camera caused by the shake of the electronic device 100 between time t1 and time t2; or it may include such a change, but one far smaller than the change caused by the shake actually generated by the electronic device 100. The same applies to the change corresponding to the smoothed camera pose at time t2 and the smoothed camera pose at time t3, which is not described herein again.
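The patent does not disclose the internals of the pose optimization model; the following is only a minimal low-pass-filter sketch of one common smoothing strategy, blending each incoming camera pose toward the previous smoothed pose (normalized quaternion interpolation for rotation, linear interpolation for position).

```python
# A hedged sketch of pose smoothing, not the patent's pose optimization model:
# blend the incoming pose toward the previous smoothed pose. alpha near 1
# gives stronger smoothing (slower response to real motion).
import numpy as np

def smooth_pose(prev_smooth_q, prev_smooth_p, q, p, alpha=0.9):
    """Return a smoothed (quaternion, position) pair."""
    if np.dot(prev_smooth_q, q) < 0:      # keep quaternions in the same hemisphere
        q = -q
    q_s = alpha * prev_smooth_q + (1.0 - alpha) * q
    q_s /= np.linalg.norm(q_s)            # normalized lerp approximates slerp
    p_s = alpha * prev_smooth_p + (1.0 - alpha) * p
    return q_s, p_s
```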
The anti-shake output module 204 is configured to perform mapping processing on pixel points on an image actually captured by the electronic device 100 through the camera by using the mapping matrix, so as to obtain an image with reduced shake. The mapping matrix may be determined according to the smoothed camera pose obtained by the pose optimization module 203.
Illustratively, the anti-shake output module 204 may perform mapping processing on the pixel points of the image actually captured at time t2 by using the mapping matrix at time t2, obtaining the image with reduced shake corresponding to time t2. The mapping matrix at time t2 may be determined from the smoothed camera pose at time t2. The anti-shake output module 204 may perform mapping processing on the pixel points of the image actually captured at time t3 by using the mapping matrix at time t3, obtaining the image with reduced shake corresponding to time t3. The mapping matrix at time t3 may be determined from the smoothed camera pose at time t3. The image actually photographed at time t2 and the image actually photographed at time t3 may be two frames of images continuously photographed by the camera of the electronic device 100. The images with reduced jitter corresponding to times t2 and t3 may be the images displayed on the interface by the electronic device 100.
That is, the anti-shake output module 204 may perform mapping processing on images continuously captured by the camera, and the images output by the anti-shake output module 204 may form a stable image video. For example, in a scene of a photo preview, the image displayed by the electronic device 100 on the photo preview interface may be the image after the mapping processing using the mapping matrix. Thus, the dynamic picture on the photographing preview interface observed by the user has continuity and does not have the shaking phenomenon which influences the look and feel of the user. In a scene where video is captured, the electronic device 100 may capture a stabilized video with reduced jitter. The image in the stabilized video may be the image mapped by the mapping matrix. In this way, the video viewed by the user does not generate a judder of the picture during shooting or when the user views the video again after shooting is completed.
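As a hedged sketch of this mapping step, the code below applies a per-frame 3×3 mapping matrix to the captured frame with OpenCV's warpPerspective; the patent itself does not name a library, and the matrix is assumed to have been derived from the smoothed camera pose and depth information as described above.

```python
# A hedged sketch of the anti-shake output step: remap every pixel of the
# captured frame through the per-frame 3x3 mapping matrix to produce the
# stabilized frame. OpenCV is one standard choice, not the patent's mandate.
import cv2

def stabilize_frame(frame, mapping_matrix):
    h, w = frame.shape[:2]
    # Map every source pixel through the 3x3 matrix into the output image.
    return cv2.warpPerspective(frame, mapping_matrix, (w, h))
```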
It should be understood that the software architecture of the electronic device 100 shown in fig. 2 is merely an example, and the electronic device 100 may have more or fewer modules than shown in the figures, may combine two or more modules, or may have different modules. The various modules illustrated in the figures may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
The method provided by the application can be applied to scenarios in which the electronic device 100 has started the video shooting function, and to scenarios in which the electronic device 100 is in a photographing preview, so that the video shot by the electronic device 100 or the picture displayed on the preview interface becomes smooth, thereby improving the visual perception of the user.
A shooting scene according to the present application is described below.
As shown in fig. 3A, the electronic device 100 may include a camera 193. Wherein the camera 193 may be a front camera. Camera 193 may also include a rear camera. The electronic device 100 may display the user interface 310 shown in fig. 3A. The user interface 310 may include an application icon display area 311, a tray 312 with commonly used application icons. Wherein:
the application icon display area 311 may contain gallery icons 311A. In response to a user operation (e.g., a touch operation) acting on the gallery icon 311A, the electronic device 100 may launch a gallery application to display information such as pictures and videos stored in the electronic device 100. The pictures and videos stored in the electronic device 100 include photographs and videos taken by the electronic device 100 through the camera application. The application icon display area 311 may also include more application icons, such as mail icons, music icons, sports health icons, etc., which is not limited by the embodiments of the present application.
A tray 312 with commonly used application icons may show a camera icon 312A. In response to a user operation (e.g., a touch operation) acting on the camera icon 312A, the electronic device 100 may open a camera application to perform functions such as photographing and video recording. When the electronic device 100 starts the camera application, the camera 193 (front camera and/or rear camera) may be started to perform functions such as photographing and video recording. The tray 312 with commonly used application icons may also present icons for more applications, such as dialing icons, informational icons, contact icons, etc., as embodiments of the present application are not limited in this regard.
The user interface 310 may also contain more or less content, such as controls to display the current time and date, controls to display weather, and so forth. It will be appreciated that fig. 3A illustrates a user interface on electronic device 100 by way of example only and should not be construed as limiting embodiments of the application.
Fig. 3B illustrates a user interface of the electronic device 100 for a photo preview.
In response to a user operation acting on camera icon 312A, electronic device 100 may display user interface 320 as shown in fig. 3B. The user interface 320 may include a preview area 321, a flash control 322, a setup control 323, a camera mode option 301, a gallery shortcut control 302, a shutter control 303, a camera flip control 304. Wherein:
The preview area 321 may be used to display images captured in real time by the camera 193. The electronic device 100 may refresh the display content therein in real time to facilitate the user's preview of the image currently captured by the camera 193.
Flash control 322 may be used to turn on or off a flash.
The setting control 323 can be used to adjust parameters of the photographed picture (e.g., resolution, filter, etc.), turn on or off some of the ways to photograph (e.g., timed photograph, dynamic snapshot, voice-controlled photograph, etc.), etc.
One or more photographing mode options may be displayed in the camera mode options 301. The one or more photography mode options may include: large aperture mode option 301A, video mode option 301B, photographing mode option 301C, professional mode option 301D, and portrait mode option 301E. Not limited to that shown in fig. 3B, more or fewer shooting mode options may also be included in the camera mode options 301. The user can browse other shooting mode options by sliding left/right in the camera mode option 301.
Gallery shortcut control 302 may be used to launch a gallery application. In response to a user operation (e.g., a click operation) acting on gallery shortcut control 302, electronic device 100 may launch a gallery application. Thus, the user can conveniently view the shot photos and videos without first exiting the camera application and then starting the gallery application.
The shutter control 303 may be used to monitor user operations that trigger photographing. The electronic device 100 may detect a user operation on the shutter control 303, in response to which the electronic device 100 may save the image in the preview area 321 as a picture in a gallery application. That is, the user may click on the shutter control 303 to trigger photographing.
The camera flip control 304 may be used to monitor user operations that trigger flip of the camera. The electronic device 100 may detect a user operation (e.g., a click operation) on the camera flip control 304, in response to which the electronic device 100 may flip the camera for capturing, such as switching the rear camera to the front camera, or switching the front camera to the rear camera.
User interface 320 may also contain more or less content, which is not limited by embodiments of the present application.
In some implementations, the user operation that acts on the camera icon 312A may be referred to as a first operation.
The video anti-shake method provided by the embodiment of the application can be applied to a photographing preview scene. In a photographing preview scene, the camera 193 of the electronic device 100 may continuously capture images. The electronic device 100 may perform anti-shake processing on the images acquired by the camera 193 in real time and display the anti-shake processed images in the preview area 321 shown in fig. 3B. The process of performing anti-shake processing on the images will be specifically described in the following embodiments and is not expanded here.
Fig. 3C illustrates a user interface of the electronic device 100 for taking a video preview.
As shown in fig. 3B, in response to a user operation acting on the recording mode option 301B, the electronic device 100 may display a user interface 330 as shown in fig. 3C. The preview area 331 may be used to display images captured in real time by the camera 193. The electronic device 100 may refresh the display content therein in real time to facilitate the user's preview of the image currently captured by the camera 193. The user interface 330 includes basic controls that are substantially identical to the user interface 320 and are not described in detail herein.
In some embodiments, the user operation acting on the recording mode option 301B may also be referred to as a first operation.
User interface 330 may also contain more or less content, which is not limited by the embodiments of the present application.
The video anti-shake method provided by the embodiment of the application can be applied to a video shooting preview scene. In a video shooting preview scene, the camera 193 of the electronic device 100 may continue to capture images. The electronic device 100 may perform anti-shake processing on the images acquired by the camera 193 in real time and display the anti-shake processed images in the preview area 331 shown in fig. 3C. The process of performing anti-shake processing on the images will be specifically described in the following embodiments and is not expanded here.
In some embodiments, the video anti-shake methods shown in fig. 3B and 3C may be applied to preview interfaces in camera applications.
Fig. 3D illustrates a user interface of the electronic device 100 for video capture.
As shown in fig. 3C, in response to a user operation (e.g., a click operation) by a user on the shutter control 303, the electronic device 100 may begin recording video. The video display area 341 may be used to display video photographed by the camera 193 in real time. The time cue information 345 may be used to display the duration of the video captured by the electronic device 100, for example, as shown in fig. 3D, the recording duration of the video in the video display area 341 has reached 10s. Again in response to a user operation (e.g., a click operation) by the user on the shutter control 303, the electronic device 100 may stop recording video. The electronic device 100 may save the recorded video as video in a gallery application, and the electronic device 100 may extract the video from the gallery application for viewing in response to a user operation.
In some implementations, a user operation that acts on shutter control 303 to cause electronic device 100 to begin recording video may be referred to as a second operation. A user operation that acts on the shutter control 303 to cause the electronic device 100 to stop recording video may be referred to as a third operation.
The video anti-shake method provided by the embodiment of the application can be applied to shooting video scenes. In a scene of video capturing, the camera 193 of the electronic device 100 may capture video. The electronic device 100 may perform anti-shake processing on the video shot by the camera 193 during the video shooting period, or may perform anti-shake processing on the video stored in the gallery application after the video shot by the camera 193 is completed, which is not limited by the embodiment of the present application. The above-mentioned process of performing anti-shake processing on video will be described in detail in the following embodiments, which will not be expanded here.
The embodiment of the application can be applied to the case where a user uses the electronic device 100 to shoot video, and can also be applied to a photographing preview scene. The user may hold the electronic device 100 to shoot video, or hold a cradle head with the electronic device 100 fixed on the cradle head to shoot video. Various camera types may be mounted on the electronic device 100, such as a main camera, telephoto, ultra-wide-angle, black-and-white, macro, telephoto macro, depth-of-field, and depth cameras. The number of cameras may be one or more. The camera of the electronic device 100 may be referred to as a monocular camera or a multi-view camera. At present, monocular cameras are widely used and can be used for calibration and object recognition; however, a monocular camera alone cannot determine the real size of the photographed object, so a pose measurement algorithm may be realized in combination with a laser range finder. The depth camera may refer to a monocular camera equipped with a depth sensor, which can measure the depth information of the photographed object through a physical method such as structured light or time of flight (ToF) on the basis of photographing the object, and can be applied to three-dimensional imaging and distance measurement. It should be noted that, when the camera of the electronic device 100 used in the embodiment of the present application is a multi-view camera, the above-mentioned camera pose may refer to the pose of the camera set; that is, the video anti-shake method mentioned in the embodiment of the present application treats the plurality of cameras as a whole.
In some implementations, the camera application may be a system application in the electronic device 100 or may be a third party application downloaded in the electronic device 100.
One implementation of video anti-shake performed by the electronic device 100 according to the embodiment of the present application is specifically described below.
Fig. 4A and 4B illustrate implementations of the electronic device 100 utilizing camera pose and IMU pose for video anti-shake.
Here, the image P1, the image P2, the image P3, the image P4, and the image P5 captured by the electronic apparatus 100 by the camera are specifically described as an example. The above-described image P1, image P2, image P3, image P4, and image P5 may be images continuously captured by the camera. The images P1, P2, P3, P4, and P5 may be images captured by the camera at times T1, T2, T3, T4, and T5, respectively.
As shown in fig. 4A, fig. 4A exemplarily shows the image P1 captured by the camera at time T1, the image P2 captured at time T2, the image P3 captured at time T3, the image P4 captured at time T4, and the image P5 captured at time T5. The differences among the images P1 to P5 are as follows: when capturing the image P2, the camera pose of the camera of the electronic device 100 may exhibit a clockwise-rotation shake relative to the camera pose when capturing the image P1; when capturing the image P3, the camera pose may exhibit a clockwise-rotation shake relative to the camera pose when capturing the image P2; when capturing the image P4, the camera pose may exhibit an upward-movement shake relative to the camera pose when capturing the image P3; and when capturing the image P5, the camera pose may exhibit a counterclockwise-rotation shake relative to the camera pose when capturing the image P4. In this way, the images P1 to P5 in fig. 4A may reflect that the video captured by the camera of the electronic device 100 between time T1 and time T5 has jitter.
It should be understood that the dithering effects presented by the images P1-P5 shown in fig. 4A are only exemplary illustrations of embodiments of the present application, and should not be construed as limiting the present application. In the process of capturing an image by the electronic device 100, the image obtained by the electronic device 100 without the anti-shake processing may also exhibit other shake effects. The embodiment of the present application is not limited thereto.
Fig. 4A also illustrates the camera pose at time T1, the camera pose at time T2, the camera pose at time T3, the camera pose at time T4, and the camera pose at time T5. The camera poses at the above five times may represent the actual position and attitude of the camera of the electronic device 100. That is, the camera of the electronic device 100 captures the image P1 when in the camera pose at time T1, captures the image P2 when in the camera pose at time T2, captures the image P3 when in the camera pose at time T3, captures the image P4 when in the camera pose at time T4, and captures the image P5 when in the camera pose at time T5.
Specifically, the electronic device 100 may extract a plurality of feature point data from the image P1; likewise, the electronic device 100 may extract a plurality of feature point data from the image P2. The electronic device 100 may determine the feature point data of an image according to the coordinates of the pixel points of the feature points on the image. For example, the electronic device 100 may obtain feature points C1, C2, ..., Cn from the image P1 and feature points C1', C2', ..., Cn' from the image P2. Specifically, the feature points C1, C2, ..., Cn may refer to a plurality of pixels forming a white cloud in the image P1, and the feature points C1', C2', ..., Cn' may refer to a plurality of pixels forming the same white cloud in the image P2; that is, the feature point C1 and the feature point C1' may refer to the pixels corresponding to the same real-world object on the two images. The feature point C1 and the feature point C1' may be referred to as a pair of feature points.
In some implementations, the feature points C1, C2, ..., Cn obtained on the image P1 may be referred to as first feature points, and the feature points C1', C2', ..., Cn' obtained on the image P2 may be referred to as second feature points.
The electronic device 100 can calculate the transformation matrix H1 from the plurality of pairs of feature points between the image P1 and the image P2. Also, the electronic device 100 may calculate a transformation matrix H2, a transformation matrix H3, and a transformation matrix H4 as shown in fig. 5. The transformation matrix H1 may represent the change amount of the camera pose of the camera of the electronic device 100 from time T1 to time T2. The feature point data extracted by the electronic device 100 may be subject to calculation bias or other factors, so the pose change amount represented by the transformation matrix H1 may have an error. In one possible implementation, the electronic device 100 may calibrate the above pose change amount (i.e., the transformation matrix H1) with the IMU pose change amount.
Specifically, the electronic device 100 may calibrate the amount of change in the camera pose between the time T1 and the time T2 by using the amount of change in the IMU pose between the time T1 and the time T2, and so on, the electronic device 100 may calibrate the amount of change in the camera pose corresponding to the time period by using the amount of change in the IMU pose between the time T2 and the time T3, between the time T3 and the time T4, and between the time T4 and the time T5.
Because the IMU of the electronic device 100 continuously collects the acceleration and the angular velocity between the time T1 and the time T5, the electronic device 100 may integrate the acceleration and the angular velocity collected by the IMU between the time T1 and the time T2, so as to calculate the transformation matrix h1 of the IMU. The IMU transformation matrix h1 may represent an amount of change in IMU pose of the IMU of the electronic device 100 from time T1 to time T2. Also, the electronic device 100 may calculate a transformation matrix h2, a transformation matrix h3, and a transformation matrix h4 as shown in fig. 5.
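The integration mentioned above can be sketched as follows. The discretization (first-order rotation updates, gravity-compensated double integration of acceleration) is an assumption for illustration, not the patent's exact method.

```python
# A minimal sketch of integrating IMU samples between T1 and T2 into a pose
# change h1: gyro angular velocity is integrated into rotation with
# small-angle updates, and gravity-compensated acceleration is
# double-integrated into translation. Discretization is assumed.
import numpy as np

def skew(w):
    return np.array([[0, -w[2], w[1]], [w[2], 0, -w[0]], [-w[1], w[0], 0]])

def integrate_imu(gyro, accel, dt, gravity=np.array([0.0, 0.0, -9.81])):
    """gyro, accel: (N,3) samples at fixed interval dt -> (R, t) pose change."""
    R = np.eye(3)
    v = np.zeros(3)
    t = np.zeros(3)
    for w, a in zip(gyro, accel):
        R = R @ (np.eye(3) + skew(w) * dt)   # first-order rotation update
        a_world = R @ a + gravity            # remove gravity in the world frame
        t += v * dt + 0.5 * a_world * dt**2
        v += a_world * dt
    return R, t
```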
It is understood that the camera and IMU of the electronic device 100 are fixed within the electronic device 100 and move with it, so the relative positional relationship between the camera and the IMU may remain unchanged while the pose of the electronic device 100 changes. If the camera moves, the motion of the IMU and the motion of the camera are correlated; that is, when the pose (or pose change amount) of either the camera or the IMU is determined, the electronic device 100 can determine that of the other according to the relative positional relationship between the camera and the IMU. Because the camera pose change amounts are calculated from different data sources, the camera pose change amount determined from the IMU pose change amount may have a certain error with respect to the camera pose change amount calculated from the feature point data, and the electronic device 100 may perform calibration processing on the two. That is, the electronic device 100 may calibrate the camera pose change amount using the IMU pose change amount to obtain the calibrated camera pose change amount. The calculation of the calibrated camera pose change amount will be described in detail in the following embodiments and is not expanded here.
The electronic device 100 may calibrate the transformation matrix H1 by using the IMU transformation matrix h1 to obtain the calibrated camera pose change amount between time T1 and time T2. The electronic device 100 may then obtain the camera pose at time T2 according to the camera pose at time T1 and the calibrated pose change amount between time T1 and time T2. The obtained camera pose at time T2 can truly reflect the actual position and attitude of the camera of the electronic device 100 at time T2. In the case where the image P1 at time T1 is the first frame, the camera pose at time T1 may be obtained by the initialization processing of the electronic device 100; in the case where the image P1 at time T1 is not the first frame, the camera pose at time T1 may be calculated from the camera pose of the previous frame and the transformation matrix. Likewise, the electronic device 100 may obtain the camera pose at time T3, the camera pose at time T4, and the camera pose at time T5.
The first frame may refer to an image acquired by the electronic device 100 at an initial time of performing anti-shake processing on the acquired image. For example, the first frame may refer to a first frame image acquired by the electronic device 100 entering the photographing preview interface, or may refer to a first frame image in a video photographed after the electronic device 100 starts a video photographing function in response to a user operation. That is, the initial time at which the electronic device 100 performs the anti-shake process on the acquired image may be the time at which the electronic device 100 receives the operation to start the camera application to display the photographing preview interface, or may be the time at which the electronic device 100 receives the operation to start the photographing video function. The embodiment of the present application does not limit the initial time of the anti-shake processing of the acquired image by the electronic device 100.
Fig. 4A also illustrates the smoothed camera pose at time T1, the smoothed camera pose at time T2, the smoothed camera pose at time T3, the smoothed camera pose at time T4, and the smoothed camera pose at time T5. The smoothed camera poses at the above five times represent the position and attitude of the camera of the electronic device 100 in the case where no shake is generated, or where the generated shake is much smaller than the actual shake of the electronic device 100.
The electronic device 100 may obtain a T2 moment smoothed camera pose from the T2 moment camera pose. The T2 moment smooth camera pose may represent the position and pose of the camera when the electronic device 100 is not producing shake at the T2 moment. Likewise, the electronic device 100 may obtain a T3 moment smoothed camera pose, a T4 moment smoothed camera pose, and a T5 moment smoothed camera pose. That is, assuming that the position and the posture of the camera of the electronic device 100 can be located in the corresponding smooth camera pose at the time T2, the time T3, the time T4, and the time T5 in order in time sequence, the video shot by the camera of the electronic device 100 may be smooth.
As shown in fig. 4B, fig. 4B exemplarily shows images before and after the anti-shake processing. The images P1, P2, P3, P4 and P5 may be images actually captured by the camera of the electronic device 100, and the images P1', P2', P3', P4' and P5' may be images displayed in the interface after the electronic device 100 has undergone anti-shake processing.
Specifically, the electronic device 100 may perform mapping processing on the image P2 to obtain the image P2'. The mapping processing may refer to the electronic device 100 mapping the image P2 into the image P2' by using the mapping matrix at time T2, where the mapping matrix at time T2 may be obtained according to the smoothed camera pose at time T2 and the depth information of the image P2. The calculation of the depth information of an image will be described in detail in the following embodiments and is not expanded here. Likewise, the electronic device 100 can obtain the images P3', P4', and P5'. In the case where the image P1 at time T1 is the first frame, since the electronic device 100 does not perform anti-shake processing on the first frame image in the video, the image P1 and the image P1' in fig. 4B may be identical; in the case where the image P1 at time T1 is not the first frame, the mapping matrix at time T1 may be obtained according to the smoothed camera pose at time T1 and the depth information of the image P1. After the mapping processing is completed, as shown in fig. 4B, the electronic device 100 may continuously display the images P1', P2', P3', P4', and P5', and the video composed of the mapped images presented on the interface of the electronic device 100 has a jitter-reduction effect compared with the video composed of the images P1, P2, P3, P4, and P5 captured by the electronic device 100.
Implementing the video anti-shake method of fig. 4A and fig. 4B can reduce the shake of the video shot by the electronic device 100, so that the user can visually and obviously perceive that the video is smoother and the video shake is far smaller than the hand shake of the user.
In some embodiments, time T1 may be referred to as a first time and image P1 may be referred to as a first image. The time T2 may be referred to as a second time and the image P2 may be referred to as a second image. The camera pose at time T1 may be referred to as the camera pose at the first time and the camera pose at time T2 may be referred to as the camera pose at the second time. The pose of the smoothing camera at time T2 may be referred to as a second smoothing camera pose. The image P2' may be referred to as a second smooth image.
The following is a description of a method for calculating a camera pose transformation matrix using feature point data.
For example, suppose the electronic device 100 acquires two continuous frames of images captured by the camera, and obtains a plurality of feature point data of the two frames through a feature point extraction algorithm (for example, the ORB algorithm). For a pair of feature points with coordinates p(u, v) and p'(u', v'), written in homogeneous coordinates as p = (u, v, 1)ᵀ and p' = (u', v', 1)ᵀ, the basic matrix F may be a 3×3 matrix, and the calculation formula is the epipolar constraint:

p'ᵀ F p = 0
Since the basic matrix F has 8 unknowns (the homogeneous z coordinate is a constant 1, so F is determined only up to a scale factor), at least 8 pairs of matched feature points are needed to solve for F. Further, to improve the accuracy of the obtained camera pose transformation matrix, the number of feature point pairs obtained by the electronic device 100 should be far greater than 8.
The electronic device 100 may solve the base matrix F and matrix decompose the matrix after the solution. The electronic device 100 may obtain a camera pose transformation matrix, i.e., a rotation matrix and a translation matrix, from the decomposed matrix. That is, if the electronic device 100 can obtain a sufficient number of pairs of feature points in the two images, the pose transformation matrix of the camera capturing the two images can be calculated. In the present embodiment, the manner of calculating the camera pose transformation matrix by the electronic device 100 is not limited, and other manners of calculating the camera pose transformation matrix may also exist.
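As one possible realization of this two-view computation, the hedged sketch below uses OpenCV: the essential matrix is estimated from matched feature points (at least 8 pairs, preferably far more) with RANSAC and decomposed into a rotation matrix and a unit-length translation. The camera intrinsic matrix K is an assumed input.

```python
# A hedged sketch of the two-view pose computation, using OpenCV as one
# possible implementation (the patent does not name a library).
import cv2
import numpy as np

def camera_pose_change(pts1, pts2, K):
    """pts1, pts2: (N,2) float arrays of matched pixel coordinates, N >= 8."""
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    # recoverPose decomposes E into a rotation R and a unit-length translation t.
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t
```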
The following describes how to calibrate the pose of the camera by using the pose variation of the IMU.
In some implementations, the electronic device 100 may construct a state vector that can represent data such as the IMU pose and the camera pose through a multi-state constraint Kalman filter (Multi-State Constraint Kalman Filter, MSCKF) algorithm. The MSCKF algorithm may be used to calibrate the camera pose using the IMU pose and output the calibrated camera pose. The input of the state vector may be the camera pose change matrix and the IMU pose change matrix, and the output of the state vector may be information such as the IMU pose, the camera pose, and the feature point data. For example, the output of the state vector may be represented as X = [x_IMU, x_camera, x_f], where x_IMU represents the IMU pose of the electronic device 100 at the current time, x_camera represents the camera pose of the electronic device 100 at the current time, and x_f represents the plurality of feature point data extracted by the electronic device 100 from the image acquired by the camera at the current time. The electronic device 100 may obtain the camera pose from the above state vector.
During the video capturing period of the electronic device 100, the image collected by the camera of the electronic device 100 and the data such as the acceleration and the angular velocity collected by the IMU are continuously updated, that is, the camera pose change matrix and the IMU pose change matrix obtained by the calculation are also continuously updated. The electronic device 100 may calibrate the camera pose change using the continuously updated IMU pose change. That is, the input and output of the state vector at different times may be different, and each time point has a state vector corresponding to it.
In some embodiments, in addition to using the camera pose change matrix and the IMU pose change matrix, the electronic device 100 may further process the camera pose with more parameters to improve the accuracy of the camera pose and achieve self-calibration of the intrinsic and extrinsic parameters of the camera of the electronic device 100. The camera intrinsic parameters refer to data related to the characteristics of the camera itself, such as the focal length and pixel size of the camera, and are generally fixed after the electronic device 100 leaves the factory. The camera extrinsic parameters can be understood as the camera pose mentioned above.
The electronic device 100 may also input further parameters into the state vector via the MSCKF algorithm. In this case, the input of the state vector may be the camera pose change matrix and the IMU pose change matrix, and the output of the state vector may be information such as the IMU pose, the camera pose, the feature point data, and the additional parameters. For example, the output of the state vector may also be represented as X = [x_IMU, x_camera, x_f, x_intrinsics, x_(σ_IMU)], where x_IMU represents the IMU pose of the electronic device 100 at the current time, x_camera represents the camera pose of the electronic device 100 at the current time, x_f represents the plurality of feature point data extracted by the electronic device 100 from the image acquired by the camera at the current time, x_intrinsics represents the camera intrinsic parameters of the electronic device 100, and x_(σ_IMU) represents the drift error of the gyro sensor in the IMU of the electronic device 100, which can be directly acquired by the electronic device 100. The two additional parameters x_intrinsics and x_(σ_IMU) in the output of the state vector can be used in the processing when calculating the camera pose.
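The MSCKF internals are not given in the patent; the sketch below only shows the generic Kalman correction shape that such a state vector would pass through. The measurement model h, its Jacobian H, and all dimensions are placeholders.

```python
# A schematic sketch only: one generic (extended) Kalman correction over a
# stacked state vector such as x = [x_IMU, x_camera, x_f, x_intrinsics,
# sigma_IMU]. h(x), H, and the noise covariance R_noise are placeholders.
import numpy as np

def kalman_update(x, P, z, H, R_noise, h):
    """One EKF correction step for state x with covariance P."""
    y = z - h(x)                                  # innovation
    S = H @ P @ H.T + R_noise                     # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)                # Kalman gain
    x_new = x + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new
```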
The following describes the structure of the MSCKF algorithm used in the embodiments of the present application.
Referring to fig. 5, fig. 5 illustrates a schematic structural diagram of the electronic device 100 using the MSCKF algorithm. As shown in fig. 5, the electronic device 100 inputs the IMU pose change amount, the camera pose change amount, and the feature point data into the MSCKF algorithm, so that the relationship between the camera pose change amount and the IMU pose change amount (i.e., the external parameter between them) can be obtained; the camera pose change amount can be calibrated by using the IMU pose change amount, and the camera pose can then be calculated by combining the first-frame camera pose with the calibrated camera pose change amounts.
The structural diagram shown in fig. 5 includes the real-world counterparts f1, f2, and f3 of three feature points extracted on the images at times T1 to T5, the IMU pose change amounts between adjacent times, and the relationship between the camera pose and the IMU pose.
Wherein the five open circles shown in fig. 5 may represent the positions and attitudes of the camera on the electronic device 100 at times T1 to T5; the open squares on the lines between the open circles may represent the IMU pose changes of the electronic device 100 between two frames; the 3 solid circles in the upper part of fig. 5 may represent the real-world objects f1, f2, and f3 corresponding to the three feature points; and the solid square represents the external parameter between the camera pose change amount and the IMU pose change amount obtained by estimation from multiple sets of IMU pose change amounts and camera pose change amounts, that is, the conversion relationship (conversion matrix) between them. After the external parameter between the camera pose change amount and the IMU pose change amount is calculated in this way, the electronic device 100 may transform the IMU pose change amount to obtain the camera pose change amount calculated from the data acquired by the IMU, and the electronic device 100 may combine it with the camera pose change amount calculated from the feature points to obtain the calibrated camera pose change amount.
Specifically, the electronic device 100 may obtain the external parameter between the camera pose change amount and the IMU pose change amount by processing multiple sets of camera pose transformation matrices and IMU pose transformation matrices. For example, the electronic device 100 acquires the camera pose transformation matrix H1 between T1 and T2, the camera pose transformation matrix H2 between T2 and T3, the camera pose transformation matrix H3 between T3 and T4, and the camera pose transformation matrix H4 between T4 and T5, for a total of 4 camera pose transformation matrices. The electronic device 100 may further obtain the IMU pose transformation matrix h1 between T1 and T2, the IMU pose transformation matrix h2 between T2 and T3, the IMU pose transformation matrix h3 between T3 and T4, and the IMU pose transformation matrix h4 between T4 and T5, for a total of 4 IMU pose transformation matrices. The 4 camera pose transformation matrices and the 4 IMU pose transformation matrices correspond one-to-one in time; that is, the electronic device 100 may obtain 4 sets of camera pose transformation matrices and IMU pose transformation matrices. Because the camera and the IMU are both mounted on the electronic device 100, their respective angle and displacement changes are correlated to a certain extent, so the electronic device 100 can estimate the external parameter between the camera pose transformation amount and the IMU pose transformation amount from the 4 sets of camera pose transformation matrices and IMU pose transformation matrices. The embodiment of the application does not limit the number of sets of camera pose transformation matrices and IMU pose transformation matrices, and more sets of data may be used for the estimation. It should be noted that the step of obtaining the external parameter between the camera pose transformation matrix and the IMU pose transformation matrix may also be performed before actually processing the video captured by the electronic device 100.
In addition, before the electronic device 100 captures video, the electronic device 100 may also use a calibration tool to calibrate the conversion relationship between the camera pose transformation matrix and the IMU pose transformation matrix, so as to directly obtain the external parameter between them. The embodiment of the application does not limit the manner of obtaining the external parameter between the camera pose transformation matrix and the IMU pose transformation matrix.
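Estimating this external parameter is an instance of the classical AX = XB problem. As an illustration only (the patent does not prescribe a solver), the sketch below solves the rotation part from matched pose-change pairs such as (h1, H1) ... (h4, H4) with a quaternion least-squares formulation.

```python
# An illustrative AX = XB rotation solver (not the patent's method): find R_x
# with R_imu_i @ R_x = R_x @ R_cam_i for all pose-change pairs, via the
# quaternion identity q_a * q_x = q_x * q_b stacked into a linear system.
import numpy as np
from scipy.spatial.transform import Rotation

def _left(q):   # left quaternion-multiplication matrix, q = (w, x, y, z)
    w, x, y, z = q
    return np.array([[w, -x, -y, -z],
                     [x,  w, -z,  y],
                     [y,  z,  w, -x],
                     [z, -y,  x,  w]])

def _right(q):  # right quaternion-multiplication matrix
    w, x, y, z = q
    return np.array([[w, -x, -y, -z],
                     [x,  w,  z, -y],
                     [y, -z,  w,  x],
                     [z,  y, -x,  w]])

def extrinsic_rotation(R_imu_list, R_cam_list):
    """Least-squares rotation extrinsic from matched pose-change rotations."""
    rows = []
    for Ra, Rb in zip(R_imu_list, R_cam_list):
        qa = Rotation.from_matrix(Ra).as_quat()[[3, 0, 1, 2]]  # -> (w,x,y,z)
        qb = Rotation.from_matrix(Rb).as_quat()[[3, 0, 1, 2]]
        rows.append(_left(qa) - _right(qb))
    _, _, vt = np.linalg.svd(np.vstack(rows))
    q_x = vt[-1]                                   # null-space solution
    return Rotation.from_quat(q_x[[1, 2, 3, 0]]).as_matrix()  # back to (x,y,z,w)
```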
It should be noted that, in fig. 5, the multiple links between the real-world objects f1, f2, and f3 corresponding to the three feature points and the times T1 through T5 may indicate that, as the electronic device 100 shakes the camera during video capture, the feature point corresponding to f2 may not be extracted by the electronic device 100 from the image of the current frame, but may be extracted from the image of the next frame. It also does not mean that the camera pose change amount between two frames can be obtained from the feature point data corresponding to a single f1 in the two images acquired at time T1 and time T2; generally speaking, more than 8 pairs of feature point data are needed for the calculation, and the specific calculation process is described in detail in S704 and is not repeated herein. Meanwhile, it does not mean that only one pair of feature points is extracted from the two images acquired at time T1 and time T2. Fig. 5 is only a simplified description of the MSCKF algorithm; in practice the number of feature points is not limited to 3, and more or fewer feature points may be input.
The effect of the camera pose smoothing process will be described in detail below.
Referring to fig. 6, fig. 6 illustrates a simulated view of the camera pose and translation path before and after smoothing. As shown in fig. 6, the electronic device 100 simulates, by way of example, the rotation vector and the translation vector before and after the camera pose is smoothed. The simulation diagram of the camera pose shown in fig. 6 includes the real pose of the camera, the real translation curve of the camera, the pose of the camera after anti-shake, and the translation curve of the camera after anti-shake.
For example, fig. 6 exemplarily shows the pose and the pan of the camera of five consecutive pictures taken within the time T1 to T5 and the pose and the pan of the camera after the smoothing process.
Specifically, the 5 solid rectangles shown in fig. 6 represent the position and attitude of the camera at 5 times between T1 and T5. The outward normal direction of the upper edge of each solid rectangle may represent the direction in which the camera shoots; that is, the real orientation of the camera is along the outward normal vector of the upper edge of the solid rectangle. The lines at the midpoints of the upper edges of the solid rectangles constitute the solid curve shown in fig. 6, which represents the actual translation path of the camera between T1 and T5. This path is not subjected to anti-shake processing; therefore, as shown in fig. 6, the actual translation path of the camera may appear as a curve with a large shake amplitude.
Assuming that time T1 in fig. 6 is the time at which the camera initially starts shooting, that is, the image at time T1 is the first frame, the camera pose at time T1 does not need anti-shake processing; there are actually 5 dotted rectangles shown in fig. 6, where the dotted rectangle at time T1 coincides with the solid rectangle. The 5 dotted rectangles represent the positions and attitudes of the camera after the smoothing processing at the 5 times between T1 and T5, and the outward normal direction of the upper edge of each dotted rectangle may represent the shooting direction after the smoothing processing; that is, the orientation of the camera after anti-shake is along the outward normal vector of the upper edge of the dotted rectangle. It should be noted that the camera pose after the smoothing processing is virtual: the pose of the camera is only assumed to have changed, and does not change in the real world. The lines at the midpoints of the upper edges of the dotted rectangles constitute the dotted curve shown in fig. 6, which represents the translation path of the camera after the smoothing processing between T1 and T5; this path has been anti-shake processed, so, as shown in fig. 6, it appears as a smooth curve with a small shake amplitude.
Fig. 6 illustrates the specific effects of the electronic device 100 in the T1 time smoothed camera pose, the T2 time smoothed camera pose, the T3 time smoothed camera pose, the T4 time smoothed camera pose, and the T5 time smoothed camera pose as illustrated in fig. 4A. That is, the smooth camera pose produces a smaller magnitude of change when changed than when the camera pose is changed, including the magnitude of angular change in camera orientation and the magnitude of span change in the camera translation path. The smaller the amplitude of the variation of the smoothed camera pose, the more the video composed of the images taken when the camera is in the smoothed camera pose, the less the shake.
The following describes a video anti-shake method provided in the embodiment of the present application. This embodiment describes a method of video anti-shake taking a case where the electronic apparatus 100 captures video as an example.
Fig. 7 is a flowchart illustrating a method for video anti-shake according to an embodiment of the present application.
As shown in fig. 7, the method for preventing video jitter includes:
s701, in response to a user operation, turning on the camera.
In some implementations, the electronic device 100 can receive a user operation and launch the camera application of the electronic device 100 in response to the user operation. The user operations include, but are not limited to: a user operation (e.g., a clicking operation or a touching operation) received by the electronic device 100 on a start control in the camera interface, a user operation (e.g., a pressing operation) received by a key on the side of the electronic device 100, a voice command, a gesture command, or, when the electronic device 100 is connected to a cradle head, a pressing operation received by a key on the cradle head. The embodiment of the present application does not limit the manner in which the electronic device 100 starts the video capturing function.
After the electronic device 100 turns on the camera, the electronic device 100 may be in a photographing preview scene. The electronic device 100 may also respond to a user operation on the shutter control so that the electronic device 100 is in a video shooting scene. The captured video pictures mentioned below may also refer to the dynamic pictures collected by the camera when the electronic device 100 is in the photographing preview; that is, S701 is optional, and the electronic device 100 may also start shooting video in response to a user operation that turns on the video shooting function.
S702, extracting feature point data from the images acquired by the camera, and acquiring acceleration and angular velocity through the IMU.
In some embodiments, after the electronic device 100 turns on the camera in response to a user operation, the video during the shooting period may be acquired; in fact, the electronic device 100 acquires the continuous images shot by the camera of the electronic device 100. In some embodiments, the camera mentioned may refer to a device with a shooting function in the electronic device 100, such as the camera 193.
The electronic device 100 may use a feature point extraction algorithm to acquire the feature point data of the current image acquired by the camera. The feature points may be points where the gray value of the image changes drastically (for example, points constituting the outline of an object), or points with large curvature on the edges of the image (for example, corner points of an edge), without limitation thereto. The feature points may reflect the essential features of the image and may be expressed in the form of pixel coordinates, for example (u, v). The feature point extraction algorithm may include, but is not limited to: edge detection algorithms, corner detection algorithms, blob detection algorithms, ridge detection algorithms, etc. Currently, feature points are mostly extracted using the Oriented FAST and Rotated BRIEF (ORB) algorithm. The ORB algorithm combines the FAST feature detector with the BRIEF feature descriptor, so its running time is far better than that of other algorithms, and it has rotation invariance and scale invariance; it is therefore commonly used for real-time feature point detection.
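As an illustrative sketch only (this uses OpenCV's ORB implementation rather than anything specified by the patent; max_features is an assumed parameter and frame a hypothetical input image), feature point extraction of this kind might look as follows:

```python
import cv2

def extract_orb_features(frame, max_features=1000):
    """Extract ORB feature points from one camera frame.

    Returns the (u, v) pixel coordinates of the keypoints and their
    binary descriptors. A sketch only; parameter choices are assumptions.
    """
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(nfeatures=max_features)  # FAST detector + BRIEF descriptor
    keypoints, descriptors = orb.detectAndCompute(gray, None)
    coords = [kp.pt for kp in keypoints]  # (u, v) pixel coordinates
    return coords, descriptors
```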
In some embodiments, after the electronic device 100 turns on the camera in response to the user operation, the acceleration and angular velocity of the electronic device 100 during video capture may also be obtained continuously through the inertial measurement unit (IMU) of the electronic device 100. Generally, an IMU may include three single-axis acceleration sensors and three single-axis gyroscope sensors, so that the obtained acceleration and angular velocity are more accurate; the embodiment of the present application does not limit the number of acceleration sensors and gyroscope sensors in the IMU.
S703, initializing the feature point data and the data acquired by the IMU, where the initialization is used to determine the initial camera pose of the camera and to align the feature point data acquired by the electronic device 100 with the data acquired by the IMU.
In some embodiments, the IMU may continuously collect acceleration and angular velocity data before the electronic device 100 starts shooting, which can cause irrelevant data to accumulate in the system and reduce the efficiency of subsequent data processing. For example, the data collected and accumulated by the IMU before shooting may introduce errors into the subsequent processing of the acceleration and angular velocity data collected by the electronic device 100 during shooting.
Further, when the VIO algorithm is used, the electronic device 100 needs to operate on the premise that the gravity direction of the world coordinate system established by the system coincides with the gravity direction of the IMU; if the IMU data is not initialized, the world coordinate system may be offset, which affects the accuracy of the subsequent pose calculation of the electronic device 100. Therefore, the electronic device 100 needs to initialize the data acquired by the IMU before processing the data acquired during shooting. In addition, the initialization process may further obtain the camera pose of the camera of the electronic device 100 at the time the first frame image is acquired (i.e., the initial camera pose), so as to provide initial data for the subsequent smoothing of the camera pose. The first frame image may refer to the first image frame captured, and the initial camera pose may refer to the actual camera pose when the electronic device 100 captures the first frame image.
Because the sampling frequency of the IMU is far greater than that of the camera, the data collected by the IMU and the images collected by the camera may deviate from each other in time over a period of time. It is therefore necessary to align the feature point data extracted by the electronic device 100 with the data acquired by the IMU.
For example, suppose the electronic device 100 calculates the camera pose change amount between two frames of images. When the data acquired by the IMU is pre-integrated, the acquisition times of the data used for the pre-integration may, due to this deviation, not be limited to the interval between the two frames, and data acquired by the IMU outside the two frames may be used. The calculated IMU pose change amount then does not represent the change of the IMU pose between the two frames, so there is a time error with respect to the camera pose change amount between the two frames of images.
Therefore, it is necessary to align the feature point data extracted by the electronic device 100 during shooting with the data acquired by the inertial measurement unit, so that the camera pose and the IMU pose obtained by the electronic device 100 in subsequent processing correspond to each other in time. Specifically, the electronic device can align the data acquired by the IMU and the data acquired by the camera on the time stamps, restricting the IMU data used to the interval between two frames of images acquired by the camera, thereby reducing the error generated when the IMU pose is made to correspond to the camera pose.
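A minimal sketch of such timestamp alignment, assuming imu_t holds the IMU sample timestamps, imu_data the corresponding samples, and t0 and t1 the timestamps of two consecutive frames (all names hypothetical):

```python
import numpy as np

def imu_samples_between_frames(imu_t, imu_data, t0, t1):
    """Select the IMU samples whose timestamps fall between two
    consecutive image frames, so that pre-integration only uses data
    from the inter-frame interval (boundary interpolation omitted)."""
    imu_t = np.asarray(imu_t)
    mask = (imu_t >= t0) & (imu_t <= t1)
    return imu_t[mask], np.asarray(imu_data)[mask]
```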
S704, calculating the camera pose change amount from the feature point data, calculating the IMU pose change amount by pre-integrating the data acquired by the IMU, and calibrating the camera pose change amount with the IMU pose change amount to obtain the camera pose.
In some embodiments, if the number of feature points extracted from the image by the electronic device 100 is sufficiently large, the camera pose change amount may be calculated from the coordinates of the plurality of feature points in the image.
The camera pose change amount may refer to the change in the pose of the camera from one moment to another. The pose change amount can be represented by a transformation matrix, which includes a rotation matrix and a translation matrix. That is, the camera pose change amount may refer to the transformation matrix of the camera pose between two moments.
The methods of representing the pose may include, but are not limited to: Euler angles, the Rodrigues representation, quaternions, and the like. Since the camera moves during shooting, the pose of the camera changes correspondingly. The camera pose change amount (i.e., the camera pose transformation matrix) can be obtained through a fundamental matrix or a homography matrix. Specifically, if the camera motion includes both rotation and translation, the camera pose transformation matrix can be calculated through the fundamental matrix; if the camera motion consists only of rotation, the camera pose transformation matrix can be calculated through the homography matrix.
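As a hedged sketch of recovering the inter-frame pose change from matched feature points: with known camera intrinsics, OpenCV exposes this route through the essential matrix, the intrinsics-normalized relative of the fundamental matrix mentioned above. pts1, pts2, and K are hypothetical matched pixel coordinates and the intrinsic matrix:

```python
import cv2
import numpy as np

def camera_pose_change(pts1, pts2, K):
    """Estimate the rotation R and (unit-scale) translation t of the
    camera between two frames from matched feature points."""
    pts1 = np.asarray(pts1, dtype=np.float64)
    pts2 = np.asarray(pts2, dtype=np.float64)
    # Essential matrix = fundamental matrix with the intrinsics K factored in
    E, inliers = cv2.findEssentialMat(pts1, pts2, K,
                                      method=cv2.RANSAC,
                                      prob=0.999, threshold=1.0)
    # Decompose E into the rotation and translation between the two frames
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t  # t is up to scale; metric scale can come from IMU integration
```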
In some embodiments, the electronic device 100 may obtain the IMU pose change amount by pre-integrating the acceleration and angular velocity acquired by the IMU. The IMU pose change amount may refer to the transformation matrix of the IMU pose between two moments. Specifically, in the IMU dynamics model, the electronic device 100 may integrate the acceleration twice to obtain displacement, integrate the acceleration once to obtain velocity, and integrate the angular velocity once to obtain angle, and then model the displacement, velocity, angle, and other data obtained by integration to obtain the IMU pose change matrix. The electronic device 100 may also apply other algorithms to the acceleration and angular velocity to obtain other data that can be used to calculate the IMU pose change amount, without limitation.
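A deliberately naive sketch of this integration over one inter-frame interval; it ignores sensor bias, noise, and gravity compensation, all of which a real VIO pipeline must model, and accel, gyro, and dt are hypothetical per-sample measurements and time steps:

```python
import cv2
import numpy as np

def integrate_imu(accel, gyro, dt):
    """Euler-integrate IMU samples between two frames:
    angular velocity -> rotation, acceleration -> velocity -> displacement."""
    R = np.eye(3)           # accumulated rotation over the interval
    v = np.zeros(3)         # accumulated velocity
    p = np.zeros(3)         # accumulated displacement
    for a, w, h in zip(accel, gyro, dt):
        # axis-angle increment w*h converted to a rotation matrix
        dR, _ = cv2.Rodrigues(np.asarray(w, dtype=np.float64) * h)
        R = R @ dR
        a_world = R @ np.asarray(a, dtype=np.float64)  # rotate accel into the start frame
        p = p + v * h + 0.5 * a_world * h * h          # double integration -> displacement
        v = v + a_world * h                            # single integration -> velocity
    return R, v, p
```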
In some implementations, the electronic device 100 may calibrate the camera pose change amount using the IMU pose change amount to obtain a calibrated camera pose change amount, thereby obtaining the camera pose.
The electronic device 100 may calibrate the camera pose change amount using the extrinsic parameters between the camera and the IMU together with the IMU pose change amount, to obtain a calibrated camera pose change amount (i.e., a calibrated camera pose transformation matrix). By applying the calibrated camera pose transformation matrix to the camera pose at which the first frame image was captured, obtained in S703, the electronic device 100 can obtain the camera poses at which the subsequent frames were captured. Compared with directly processing the uncalibrated camera pose transformation matrix, the camera pose obtained in this way reflects the position and attitude of the camera when shooting the video more truly and accurately.
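As a minimal sketch of this chaining step, under the assumption that the initial pose and the calibrated per-interval pose changes are available as 4×4 homogeneous matrices (T_init and delta_Ts are hypothetical names):

```python
import numpy as np

def chain_poses(T_init, delta_Ts):
    """Accumulate calibrated per-interval pose changes onto the
    initial camera pose to get the camera pose at every frame."""
    poses = [T_init]
    for dT in delta_Ts:        # each dT: calibrated 4x4 transform between frames
        poses.append(poses[-1] @ dT)
    return poses
```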
S705, performing smoothing processing on the camera pose to obtain a smoothed camera pose, where the smoothing may be used to reduce the influence of shake of the electronic device 100 on the camera pose.
In some embodiments, the electronic device 100 may acquire an accurate, actual camera pose through S701 to S704. This camera pose has not undergone the optimized anti-shake processing, and may refer to the orientation, position, and the like of the camera of the electronic device 100 in the real world during shooting. The camera pose of the camera of the electronic device 100 may be expressed as:

$$T = \begin{bmatrix} R_{3\times 3} & t \\ 0 & 1 \end{bmatrix}$$

where $R_{3\times 3}$ is the 3×3 rotation matrix described above, i.e., the orientation (i.e., angle) of the camera of the electronic device 100 may be represented by $R_{3\times 3}$; and $t$ is the 3×1 translation matrix described above, i.e., the position (i.e., distance) of the camera of the electronic device 100 may be represented by $t$.
In this way, the electronic device 100 may perform smoothing processing on the camera pose T to obtain a smoothed camera pose, denoted here as $\hat{T}$.
For example, the electronic device 100 may establish a camera pose optimization model, and may obtain the smoothed camera pose $\hat{T}$ after inputting the camera pose T into the camera pose optimization model. The constraint of the camera pose optimization model can be expressed as:

$$\text{s.t.}\quad \|H \cdot P_i - P_i\| \le \Delta, \quad i = 1, 2, 3, 4$$

where $P_i$ may denote, for example, the four corner points of the image and $H$ the mapping induced by the pose change. The camera pose optimization model can be designed and configured according to the algorithm used; it performs operations on the input camera pose T and outputs the smoothed camera pose $\hat{T}$. Specifically, the camera pose optimization model may associate camera poses that are adjacent in time, set a threshold based on the current camera pose, and keep the degree of change of the subsequent camera pose within that threshold so as to achieve the smoothing effect. The embodiment of the present application does not limit the specific algorithm of the camera pose optimization model; there may also be cases where the current camera pose is used as the standard for setting the threshold and the threshold stabilization processing is applied to the previous camera pose. The smoothed camera pose $\hat{T}$ may likewise consist of a rotation matrix R and a translation matrix t.
Thus, the electronic device 100 can obtain the camera pose $\hat{T}$ after the anti-shake processing, which reduces the influence of the shake of the electronic device 100 on the camera pose T and provides data support for the subsequent mapping of the pixel points.
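The patent does not fix a particular smoothing algorithm, so the following is only one hedged illustration: a damped follower that moves the smoothed pose a fraction of the way toward the measured pose while clamping each rotation and translation step to a threshold. The pose representation and the alpha and delta parameters are assumptions, not the patented model:

```python
import cv2
import numpy as np

def smooth_poses(poses, alpha=0.2, delta=0.05):
    """Damped smoothing of a camera pose sequence (4x4 matrices):
    the smoothed pose follows the measured pose, but each rotation and
    translation step is clamped, keeping the change within a threshold."""
    smoothed = [poses[0].copy()]
    for T in poses[1:]:
        prev = smoothed[-1]
        # rotation: move a fraction alpha along the relative rotation, clamped
        R_rel = prev[:3, :3].T @ T[:3, :3]
        rvec, _ = cv2.Rodrigues(R_rel)
        angle = np.linalg.norm(rvec)
        step = min(alpha * angle, delta)
        dR, _ = cv2.Rodrigues(rvec / angle * step) if angle > 1e-9 else (np.eye(3), None)
        # translation: move a fraction alpha, clamped to delta
        dt = alpha * (T[:3, 3] - prev[:3, 3])
        n = np.linalg.norm(dt)
        if n > delta:
            dt = dt / n * delta
        S = np.eye(4)
        S[:3, :3] = prev[:3, :3] @ dR
        S[:3, 3] = prev[:3, 3] + dt
        smoothed.append(S)
    return smoothed
```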
S706, performing mapping processing on the acquired images using a mapping matrix to obtain images with reduced shake, so as to output a stabilized video composed of such images, where the mapping matrix may be determined according to the smoothed camera pose and the depth information of the image.
In some embodiments, in addition to the coordinate information of the extracted feature points, the electronic device 100 may also calculate the depth information of the feature points. Specifically, the electronic device 100 may calculate the depth information of a feature point from the pixel coordinates of the feature point, the camera intrinsic parameters, the camera extrinsic parameters, the world coordinates of the feature point, and the like; the embodiment of the present application does not limit the manner of obtaining the depth information of the feature points.
In one possible implementation, the electronic device 100 may calculate dense depth values of the image, and may then calculate the mapping matrix from the dense depth values and the smoothed camera pose described above. The depth information of an image may be divided into sparse depth values and dense depth values: sparse depth values may refer to the depth information of the feature points in the image, while dense depth values may include the depth information of the feature points plus the depth information of some or all of the other pixels in the image.
There are two ways for the electronic device 100 to obtain dense depth information:
(1) If the camera of the electronic device 100 is a depth (RGB-D) camera, a depth sensor may be included in the electronic device 100. The depth sensor of the electronic device 100 may obtain dense depth values of the image directly from the image captured by the camera, from which the electronic device 100 may construct a depth map.
(2) If the camera of the electronic device 100 is a monocular camera, which does not include a depth sensor, the electronic device 100 may calculate sparse depth values of the image and obtain dense depth values by applying a suitable algorithm to the sparse depth values, as sketched below. Optionally, the electronic device 100 may also acquire dense depth values in combination with a device capable of measuring depth information (e.g., a lidar). For example, the electronic device 100 may combine the depth information of the feature points of the acquired image with the laser point cloud depth values acquired by the lidar to obtain dense depth values of the image. The electronic device 100 may also construct a depth map from the dense depth values; the depth map contains the depth information of the image and can display it intuitively in image form.
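As one illustration of densifying sparse feature-point depths in the monocular case — an assumption, not necessarily the algorithm the patent has in mind — scattered-data interpolation can fill in depth for the remaining pixels; sparse_uv and sparse_d are hypothetical feature coordinates and depths:

```python
import numpy as np
from scipy.interpolate import griddata

def densify_depth(sparse_uv, sparse_d, width, height):
    """Interpolate sparse feature-point depths into a dense depth map."""
    u, v = np.meshgrid(np.arange(width), np.arange(height))
    dense = griddata(np.asarray(sparse_uv), np.asarray(sparse_d),
                     (u, v), method='linear')
    # outside the convex hull of the features, fall back to nearest depth
    nearest = griddata(np.asarray(sparse_uv), np.asarray(sparse_d),
                       (u, v), method='nearest')
    return np.where(np.isnan(dense), nearest, dense)
```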
The electronic device 100 may combine the depth information (i.e., the dense depth values) and the smoothed camera pose to map the images actually collected by the electronic device 100, so as to obtain smoothed multi-frame images. Further, the electronic device 100 may obtain the coordinates of the pixel points on the image and then map these coordinates to obtain the smoothed pixel coordinates, thereby obtaining the image with reduced shake.
The process of mapping the image is described in detail below.
For example, the electronic device 100 may obtain smoothed pixel points by performing the mapping process on the pixel points.
Referring to fig. 8, fig. 8 schematically illustrates the mapping of pixel points by the mapping matrix. Since the video shot by the electronic device 100 consists of a continuous sequence of image frames, the mapping of pixel points in a single image is taken as an example in order to illustrate the mapping more intuitively. As shown in fig. 8, the image 801 is an image actually captured by the camera of the electronic device 100, and the point m on the image 801 is a pixel point on that image (i.e., a pixel point requiring the mapping process). The plane 802 is an object in the real world photographed by the camera of the electronic device 100, simplified to a plane in fig. 8 for brevity; the point X on the plane 802 is the position in the real world corresponding to m. The image 803 is the smoothed version of the image 801, and the point m' on the image 803 is a pixel point on the smoothed image (i.e., a pixel point for which the mapping process has been completed). The point o represents the position of the camera of the electronic device 100 and is also the origin of the camera coordinate system; the point o' represents the virtual position of the camera that is assumed to capture the smoothed image.
The electronic device 100 may apply a mapping matrix to the known pixel coordinates of the point m to obtain the pixel coordinates of the point m'. The mapping matrix may be expressed as:

$$H = K' \left( R - \frac{t \, n^{T}}{d} \right) K^{-1}$$

The mapping matrix above may be referred to as the homography matrix induced by a spatial plane. Here, $d$ represents the distance from the plane 802 on which the point X lies to the origin o of the camera coordinate system; in practice, the value of $d$ is the dense depth value obtained above. $n$ represents the normal vector of the plane 802; $R$ represents the rotation matrix in the smoothed camera pose obtained in S705; $t$ represents the translation matrix in the smoothed camera pose obtained in S705; and $K$ denotes the camera intrinsic parameters, which here may be expressed as the transformation matrix between the pixel coordinate system and the camera coordinate system, without limitation.
In one possible implementation, if the electronic device 100 does not obtain dense depth values, the value of $d$ may be understood as infinity, and the mapping matrix may then be expressed as:

$$H_{\infty} = K' R K^{-1}$$
The mapping matrix above may be referred to as the infinite homography matrix. Here, $R$ again represents the rotation matrix in the smoothed camera pose obtained in S705, and $K$ again denotes the camera intrinsic parameters, which here may be expressed as the transformation matrix between the pixel coordinate system and the camera coordinate system. Because depth values are missing, the mapping effect of the infinite homography matrix is far inferior to that of the homography matrix induced by a spatial plane: large errors appear in the coordinates of the mapped pixel points, so smooth transitions between continuous images cannot be achieved, i.e., the shake-reduction effect on the video composed of those images is not obvious. In the embodiment of the present application, the electronic device 100 may therefore use the homography matrix induced by the spatial plane to map the pixel points.
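A sketch of applying the plane-induced homography to warp a captured frame toward the smoothed (virtual) camera, under the simplifying assumption that the real and virtual cameras share one intrinsic matrix K (i.e., K' = K); R, t, n, and d are as defined above:

```python
import cv2
import numpy as np

def stabilize_frame(frame, K, R, t, n, d):
    """Warp a captured frame to the smoothed (virtual) camera using the
    homography induced by a spatial plane: H = K (R - t n^T / d) K^-1."""
    H = K @ (R - np.outer(t.ravel(), n.ravel()) / d) @ np.linalg.inv(K)
    h, w = frame.shape[:2]
    return cv2.warpPerspective(frame, H, (w, h))
```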
In some embodiments, a pixel point requiring the mapping process may be referred to as a first pixel point, and a pixel point for which the mapping process has been completed may be referred to as a second pixel point.
After the electronic device 100 has obtained all the pixel points of the continuous images after the mapping process, it can obtain smoothed continuous images and thus output a stabilized video, realizing the anti-shake effect on the video. The dynamic picture or video displayed on the interface of the electronic device 100 is the stabilized video with the anti-shake effect.
In some embodiments, the data acquisition module 201 of the electronic device 100 may perform S701, the pose calculation module 202 may perform S702, S703, and S704, the pose optimization module 203 may perform S705, and the anti-shake output module 204 may perform S706.
According to the above method, the electronic device can display a stabilized video that has undergone electronic anti-shake processing, ensuring the user's visual experience. Because the camera pose obtained after IMU calibration reflects the position and attitude of the camera in the electronic device more truly than a pose obtained from the camera data or the IMU data alone, the smoothed camera pose obtained from the calibrated camera pose can more accurately reflect the position and attitude of the camera in the case where no shake is generated, or where the generated shake is far smaller than the actual shake of the camera. That is, the electronic device can perform anti-shake processing on a captured image by estimating the camera pose that would hold if no shake, or only small shake, were generated when the image was captured. Therefore, when processing large-amplitude shake, video shot in dim ambient light, and the like, the shake-reduction effect of this video anti-shake method is more obvious; the video anti-shake method provided in the embodiment of the present application can improve the accuracy and robustness of video anti-shake.
In the embodiment of the present application, the anti-shake processing, smoothing processing, and optimization processing applied to the image, the camera pose, and the pixel points can be understood as having substantially the same meaning: the image, camera pose, and pixel points after such processing are those with the anti-shake effect required by the method of the embodiment of the present application. This embodiment describes the functions represented by the anti-shake processing, the smoothing processing, and the optimization processing; their names do not limit the embodiment.
The embodiments of the present application may be arbitrarily combined to achieve different technical effects.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions in accordance with the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by a wired (e.g., coaxial cable, fiber optic, digital subscriber line), or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)), or the like.
Those of ordinary skill in the art will appreciate that implementing all or part of the above-described method embodiments may be accomplished by a computer program to instruct related hardware, the program may be stored in a computer readable storage medium, and the program may include the above-described method embodiments when executed. And the aforementioned storage medium includes: ROM or random access memory RAM, magnetic or optical disk, etc.
In summary, the foregoing description is only exemplary embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made according to the disclosure of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method of video anti-shake applied to an electronic device, comprising:
the electronic equipment receives a first operation, and displays a first interface provided by a camera application, wherein the first interface displays a preview area;
the electronic equipment acquires an image through a camera;
the electronic equipment determines the pose of a camera in the process of collecting the image; the camera pose is used for representing the position and the pose of the camera under the condition that the electronic equipment shakes;
The electronic equipment performs smoothing treatment on the camera pose to obtain a smoothed camera pose; the smooth camera pose is used for representing the position and the pose of the camera under the condition that the electronic equipment does not generate the shake, or the generated shake is smaller than the shake;
the electronic equipment processes the image according to the smooth camera pose and the depth information of the image to obtain a smooth image;
the electronic device displays the smoothed image in the preview area.
2. A method of video anti-shake applied to an electronic device, comprising:
the electronic equipment receives a second operation, and displays a second interface provided by a camera application, wherein the second interface displays a video shooting area;
the electronic equipment acquires an image through a camera;
the electronic equipment determines the pose of a camera in the process of collecting the image; the camera pose is used for representing the position and the pose of the camera under the condition that the electronic equipment shakes;
the electronic equipment performs smoothing treatment on the camera pose to obtain a smoothed camera pose; the smooth camera pose is used for representing the position and the pose of the camera under the condition that the electronic equipment does not generate the shake, or the generated shake is smaller than the shake;
The electronic equipment processes the image according to the smooth camera pose and the depth information of the image to obtain a smooth image;
the electronic device displays the smoothed image in the video capture area.
3. The method of claim 1 or 2, wherein the image acquired by the electronic device through the camera comprises a first image acquired at a first time and a second image acquired at a second time; the electronic equipment determines the pose of a camera in the process of collecting the image, and specifically comprises the following steps:
the electronic equipment determines a camera pose change amount according to the first image and the second image, wherein the camera pose change amount is used for representing the change amount of the camera between the camera pose at the first moment and the camera pose at the second moment;
the electronic equipment determines IMU pose change amount according to IMU data acquired by an Inertial Measurement Unit (IMU) between the first moment and the second moment, wherein the IMU pose change amount is used for representing change amount between IMU pose of the IMU at the first moment and IMU pose of the IMU at the second moment, and the IMU pose is used for representing position and pose of the IMU under the condition that the electronic equipment generates shaking;
The electronic equipment uses the IMU pose change amount to calibrate the camera pose change amount to obtain a calibrated camera pose change amount;
the electronic equipment obtains the camera pose at the second moment according to the camera pose at the first moment and the calibrated camera pose variation;
the electronic equipment performs smoothing treatment on the camera pose to obtain a smoothed camera pose, and specifically comprises the following steps:
the electronic equipment performs smoothing treatment on the camera pose at the second moment to obtain a second smooth camera pose; the second smooth camera pose is used for representing the position and the pose of the camera under the condition that the electronic equipment does not generate the shake or the generated shake is smaller than the shake at the second moment;
the electronic device processes the image according to the smooth camera pose and the depth information of the image to obtain a smooth image, and specifically comprises the following steps:
the electronic equipment processes the second image according to the second smooth camera pose and the depth information of the second image to obtain a second smooth image; the smooth image displayed by the electronic device in the preview area or the video capturing area includes the second smooth image.
4. The method of claim 3, wherein the first image is a first frame image acquired by the electronic device after receiving the first operation; or the first image is a first frame image acquired after the electronic equipment receives the second operation;
and the camera pose at the first moment is a preset initial camera pose.
5. The method according to claim 3 or 4, wherein the electronic device determines a camera pose change amount according to the first image and the second image, specifically comprising:
the electronic equipment extracts a first characteristic point from the first image and extracts a second characteristic point from the second image;
and the electronic equipment determines the pose change amount of the camera according to the first characteristic points and the second characteristic points.
6. The method of any of claims 1-5, wherein the image acquired by the electronic device through the camera comprises a first image acquired at a first time and a second image acquired at a second time;
the camera pose determined by the electronic equipment in the process of acquiring the image comprises the camera pose at the first moment and the camera pose at the second moment;
The electronic equipment performs smoothing treatment on the camera pose to obtain a smoothed camera pose, and specifically comprises the following steps: and carrying out smoothing treatment on the camera pose at the second moment to obtain the second smooth camera pose, wherein the variation between the second smooth camera pose and the camera pose at the first moment is smaller than a threshold value.
7. The method of any of claims 1-6, wherein the electronic device, prior to processing the image according to the smoothed camera pose and depth information of the image, further comprises:
the electronic equipment acquires depth information of the image through a depth camera;
or the electronic equipment determines the depth information of the characteristic points in the acquired image, and obtains the depth information of the image according to the depth information of the characteristic points in the image.
8. The method according to any one of claims 1-7, wherein the electronic device processes the image according to the smoothed camera pose and depth information of the image to obtain the smoothed image, specifically comprising:
the electronic device obtains a mapping matrix using the smoothed camera pose and depth information of the image,
The electronic equipment performs mapping processing on a plurality of first pixel points on the image through the mapping matrix to obtain a plurality of second pixel points, and the smooth image is obtained according to the plurality of second pixel points.
9. An electronic device comprising a memory for storing a computer program, one or more processors for invoking the computer program to cause the electronic device to perform the method of any of claims 1 to 8.
10. A computer readable storage medium comprising instructions which, when run on an electronic device, cause the electronic device to perform the method of any one of claims 1 to 8.