CN115150542B - Video anti-shake method and related equipment


Info

Publication number
CN115150542B
Authority
CN
China
Prior art keywords
pose
image
rotation
data
vector
Prior art date
Legal status
Active
Application number
CN202110343723.1A
Other languages
Chinese (zh)
Other versions
CN115150542A (en)
Inventor
郑淇
刘蒙
段光菲
徐其超
刘志鹏
贾志平
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202110343723.1A
Publication of CN115150542A
Application granted
Publication of CN115150542B

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/2622Signal amplitude transition in the zone between image portions, e.g. soft edges

Abstract

The embodiment of the application provides a video anti-shake method and related equipment. The method comprises the following steps: acquiring data, the data comprising first sensor data and second sensor data; the first sensor data and the second sensor data are used for pose estimation; the pose is used for representing the conversion relation between the world coordinate system and the camera coordinate system; the first sensor data includes event camera data; the second sensor data comprises an image sequence; the image sequence comprises a series of image frames; obtaining a first translational pose based on the event camera data and the image sequence; obtaining a second translational pose according to the image sequence; and correcting the second translational pose according to the first translational pose to obtain a corrected translational pose. The method can collect more effective data through the event camera, breaks through the limitations of conventional sensors, and can achieve a better anti-shake effect in more scenes.

Description

Video anti-shake method and related equipment
Technical Field
The application relates to the technical field of anti-shake, in particular to a video anti-shake method and related equipment.
Background
At present, intelligent terminals such as mobile phones are powerful and portable, so many people use them to shoot, and users' requirements for shooting keep rising. Intelligent terminals are therefore expected to deliver better shooting results, and most intelligent terminals adopt an anti-shake scheme.
In the prior art, the pose is generally acquired by sensors such as an inertial sensor and an image sensor, the acquired pose is processed, and finally a video is output. This anti-shake approach is likely to suffer from degraded anti-shake performance due to limitations of the sensor hardware, scene changes, and the like.
Therefore, how to improve the anti-shake effect of shooting under conditions such as scene changes and limited sensors is a problem that needs to be solved at present.
Disclosure of Invention
The application provides a video anti-shake method and related equipment, which can acquire data through an event camera to obtain a high-dynamic and high-frame-rate pose, then correct other low-frame-rate and unreliable poses according to the obtained high-dynamic and high-frame-rate pose, perform subsequent anti-shake processing on the corrected poses, and finally output a video.
In a first aspect, the present application provides a video anti-shake method, which may include: acquiring data, the data comprising first sensor data and second sensor data; the first sensor data and the second sensor data are used for pose estimation; the pose is used for representing the conversion relation between the world coordinate system and the camera coordinate system; the first sensor data includes event camera data; the second sensor data comprises a sequence of images; the image sequence comprises a series of image frames; obtaining a first translational pose based on the event camera data and the image sequence; obtaining a second translational pose according to the image sequence; and correcting the second translational pose according to the first translational pose to obtain a corrected translational pose.
In the scheme provided by the application, the first sensor (event camera and the like) is used for acquiring the first sensor data (event camera data and the like), the pose estimation is carried out based on the first sensor data to obtain the high-frame-rate and high-dynamic pose, the pose estimation is carried out based on the second sensor data to obtain the low-frame-rate and unreliable pose, the low-frame-rate and unreliable pose is corrected according to the obtained high-frame-rate and high-dynamic pose, the corrected pose is subjected to the subsequent anti-shake processing, and finally the video is output.
With reference to the first aspect, in a possible implementation manner of the first aspect, before the correcting the second translational pose according to the first translational pose, the method further includes: inputting the event camera data and the image sequence into a scene judgment model, and determining a correction weight according to the scene judgment model; the correction weight comprises a combination relation of the first translational pose and the second translational pose; the correcting the second translational pose according to the first translational pose includes: correcting the second translational pose based on the correction weight and the first translational pose.
In the scheme provided by the application, the scene judgment model is used for determining the correction weight to obtain a combination relation of the first translational pose and the second translational pose, and the second translational pose is corrected based on the combination relation to obtain the corrected translational pose, wherein the scene judgment model can be used for judging the illumination intensity in the image sequence, and the correction weight is changed according to the difference of the illumination intensity so as to achieve a better correction effect.
With reference to the first aspect, in one possible implementation manner of the first aspect, the method further includes: smoothing the corrected translational pose to obtain a smoothed translational pose; obtaining image deformation information according to the smoothed translational pose; the image deformation information comprises an image deformation vector; the image deformation vector is used for representing the offset of the image frame; performing image deformation processing on the image frame according to the image deformation information to obtain a processed image frame; and outputting the processed image frame.
In the scheme provided by the application, after the corrected pose is obtained, a series of subsequent anti-shake processing including smoothing processing and deformation processing can be further carried out on the corrected pose, so that the anti-shake effect is improved on the basis of pose correction.
With reference to the first aspect, in a possible implementation manner of the first aspect, the second sensor data further includes IMU data, and the method further includes: obtaining a first rotation pose based on the event camera data and the image sequence; the first rotation pose includes a first rotation vector; obtaining a second rotation pose according to the IMU data; the second rotation pose includes a second rotation vector; and correcting the second rotation pose according to the first rotation pose to obtain a first rotation correction pose.
In the scheme provided by the application, the general sensors (such as an IMU sensor and the like) are utilized to realize the anti-shake under the normal scene, but due to the limitation of the sensors, the obtained pose may be a low frame rate and unreliable pose, so that the anti-shake effect is not ideal, and at the moment, the low frame rate and unreliable pose is corrected by utilizing the high frame rate and high dynamic pose, so that the data acquired by the general sensors can be utilized, and the better anti-shake effect can be ensured.
With reference to the first aspect, in a possible implementation manner of the first aspect, the correcting the second rotation pose according to the first rotation pose includes: determining an outer product of the first rotation vector and the second rotation vector to obtain a first error vector; and carrying out compensation processing on the second rotation vector according to the first error vector.
In the scheme provided by the application, the error vector can be determined to compensate the low frame rate and unreliable pose, so that the more accurate pose is obtained, the finally output video has no obvious drift, and the video anti-shake effect is improved.
With reference to the first aspect, in a possible implementation manner of the first aspect, the second sensor data further includes depth data, and the method further includes: obtaining a first rotation pose based on the event camera data and the image sequence; the first rotation pose includes a first rotation vector; obtaining a third rotation pose according to the depth data and the image sequence; the third rotation pose includes a third rotation vector; and correcting the third rotation pose according to the first rotation pose to obtain a second rotation correction pose.
In the scheme provided by the application, the pose estimation can be performed by using the depth sensor to obtain the low-frame-rate unreliable pose, and the low-frame-rate unreliable pose is corrected according to the high-frame-rate and high-dynamic pose obtained by the sensor such as the event camera, so that the corrected pose is obtained, and the stable video is output after the subsequent anti-shake processing, so that the data acquired by the depth sensor is utilized, and the better anti-shake effect is ensured.
With reference to the first aspect, in a possible implementation manner of the first aspect, the second sensor data further includes depth data, and the method further includes: obtaining a third rotation pose according to the depth data and the image sequence; the third rotation pose includes a third rotation vector; correcting the third rotation pose according to the first rotation pose to obtain a second rotation correction pose; and obtaining a third rotation correction pose according to the first rotation correction pose and the second rotation correction pose.
In the scheme provided by the application, the pose estimation can be performed by utilizing various types of sensors, such as a depth sensor, an IMU sensor and the like, so that the pose estimation is performed to obtain the pose with low frame rate and unreliability, and the pose is corrected according to the pose with high frame rate and high dynamic state, so that the effective data acquired by the sensors are utilized, and meanwhile, the anti-shake effect is effectively improved.
With reference to the first aspect, in a possible implementation manner of the first aspect, correcting the third rotation pose according to the first rotation pose includes: determining an outer product of the first rotation vector and the third rotation vector to obtain a second error vector; and carrying out compensation processing on the third rotation vector according to the second error vector.
In the scheme provided by the application, the error vector can be determined to compensate the low frame rate and unreliable pose, so that the more accurate pose is obtained, the finally output video has no obvious drift or jelly effect, and the video anti-shake effect is improved.
In a second aspect, the present application provides an electronic device, comprising: one or more processors, a memory, and a communication interface; the memory is coupled with the one or more processors, the memory is for storing computer program code, the computer program code comprising computer instructions, and the one or more processors invoke the computer instructions to cause the electronic device to perform: acquiring data, the data comprising first sensor data and second sensor data; the first sensor data and the second sensor data are used for pose estimation; the pose is used for representing the conversion relation between the world coordinate system and the camera coordinate system; the first sensor data includes event camera data; the second sensor data comprises a sequence of images; the image sequence comprises a series of image frames; obtaining a first translational pose based on the event camera data and the image sequence; obtaining a second translational pose according to the image sequence; and correcting the second translational pose according to the first translational pose to obtain a corrected translational pose.
With reference to the second aspect, in a possible implementation manner of the second aspect, before the one or more processors are configured to call the computer instructions to cause the electronic device to perform correcting the second translational pose according to the first translational pose, the one or more processors are further configured to call the computer instructions to cause the electronic device to perform: inputting the event camera data and the image sequence into a scene judgment model, and determining correction weight according to the scene judgment model; the correction weight comprises a combination relation of the first translational pose and the second translational pose; the one or more processors, when configured to invoke the computer instructions to cause the electronic device to perform correcting the second translational pose according to the first translational pose, are specifically configured to invoke the computer instructions to cause the electronic device to perform: and correcting the second translational pose based on the correction weight and the first translational pose.
With reference to the second aspect, in a possible implementation manner of the second aspect, the one or more processors are further configured to invoke the computer instructions to cause the electronic device to perform: performing smoothing treatment on the corrected translational pose to obtain a smoothed translational pose; obtaining image deformation information according to the smoothed translation pose; the image deformation information comprises an image deformation vector; the image deformation vector is used for representing the offset of the image frame; performing image deformation processing on the image frame according to the image deformation information to obtain a processed image frame; and outputting the processed image frame.
With reference to the second aspect, in a possible implementation manner of the second aspect, the second sensor data further includes IMU data; the one or more processors are further configured to invoke the computer instructions to cause the electronic device to perform: obtaining a first rotation gesture based on the event camera data and the image sequence; the first rotational gesture includes a first rotational vector; obtaining a second rotation pose according to the IMU data; the second rotational gesture includes a second rotational vector; and correcting the second rotation posture according to the first rotation posture to obtain a first rotation correction posture.
With reference to the second aspect, in a possible implementation manner of the second aspect, the one or more processors are configured, when configured to invoke the computer instructions to cause the electronic device to execute the correcting the second rotation gesture according to the first rotation gesture, specifically configured to invoke the computer instructions to cause the electronic device to execute: determining an outer product of the first rotation vector and the second rotation vector to obtain a first error vector; and carrying out compensation processing on the second rotation vector according to the first error vector.
With reference to the second aspect, in a possible implementation manner of the second aspect, the second sensor data further includes depth data; the one or more processors are further configured to invoke the computer instructions to cause the electronic device to perform: obtaining a first rotation gesture based on the event camera data and the image sequence; the first rotational gesture includes a first rotational vector; obtaining a third rotation pose according to the depth data and the image sequence; the third rotational gesture includes a third rotational vector; and correcting the third rotation pose according to the first rotation pose to obtain a second rotation correction pose.
With reference to the second aspect, in a possible implementation manner of the second aspect, the second sensor data further includes depth data; the one or more processors are further configured to invoke the computer instructions to cause the electronic device to perform: obtaining a third rotation pose according to the depth data and the image sequence; the third rotational gesture includes a third rotational vector; correcting the third rotation pose according to the first rotation pose to obtain a second rotation correction pose; and obtaining a third rotation correcting pose according to the first rotation correcting pose and the second rotation correcting pose.
With reference to the second aspect, in a possible implementation manner of the second aspect, the one or more processors are configured, when configured to invoke the computer instructions to cause the electronic device to execute the correcting the third rotation gesture according to the first rotation gesture, specifically configured to invoke the computer instructions to cause the electronic device to execute: determining an outer product of the first rotation vector and the third rotation vector to obtain a second error vector; and carrying out compensation processing on the third rotation vector according to the second error vector.
In a third aspect, the present application provides a computer readable storage medium storing a computer program which, when executed by a processor, performs the above first aspect and the video anti-shake method provided in connection with any one of the implementations of the above first aspect.
In a fourth aspect, the present application provides a computer program product comprising instructions which, when executed by a computer, cause the computer to perform the above-described first aspect and the flow of a video anti-shake method provided in connection with any one of the implementations of the above-described first aspect.
In a fifth aspect, an embodiment of the present application provides a chip, where the chip is applied to an electronic device, and the chip includes one or more processors, where the processors are configured to invoke computer instructions to cause the electronic device to perform the video anti-shake method described in the first aspect and any implementation manner of the first aspect.
In a sixth aspect, the present application provides a chip system, where the chip system is applied to an electronic device, and is configured to support the electronic device to implement the functions referred to in the first aspect, for example, generate or process information referred to in the video anti-shake method in the first aspect. In one possible design, the chip system further includes a memory for storing program instructions and data necessary for the data transmission device. The chip system can be composed of a chip, and can also comprise the chip and other discrete devices.
It will be appreciated that the electronic device provided in the second aspect, the computer readable storage medium provided in the third aspect, the computer program product containing instructions provided in the fourth aspect, the chip provided in the fifth aspect, and the chip system provided in the sixth aspect are all configured to perform the video anti-shake method provided in the first aspect. Therefore, for the beneficial effects they can achieve, reference may be made to the beneficial effects of the video anti-shake method provided in the first aspect, and details are not described herein again.
Drawings
Fig. 1 is a schematic diagram of a video anti-shake system architecture according to an embodiment of the present application;
FIG. 2 is a schematic diagram of another video anti-shake system architecture according to an embodiment of the present application;
fig. 3 is a flow chart of a video anti-shake method according to an embodiment of the present application;
fig. 4 is a flowchart of another video anti-shake method according to an embodiment of the present application;
fig. 5 is a flowchart of another video anti-shake method according to an embodiment of the present application;
fig. 6 is a flowchart of another video anti-shake method according to an embodiment of the present application;
fig. 7 is a flowchart of another video anti-shake method according to an embodiment of the present application;
fig. 8 is a flowchart of another video anti-shake method according to an embodiment of the present application;
fig. 9 is a schematic diagram of a hardware structure of a terminal according to an embodiment of the present application;
fig. 10 is a schematic software structure of a terminal according to an embodiment of the present application.
Detailed Description
The following clearly and completely describes the technical solutions in the embodiments of the present application with reference to the accompanying drawings. Apparently, the described embodiments are some rather than all of the embodiments of the application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without making any inventive effort shall fall within the protection scope of the application.
First, some of the terms and related techniques involved in the present application are explained for easy understanding by those skilled in the art.
An event camera, also known as a neuromorphic camera, silicon retina or dynamic vision sensor, is an imaging sensor capable of responding to local brightness changes. Unlike a conventional camera, an event camera does not capture images with a shutter; instead, it captures "events", which can be simply understood as "changes in pixel brightness", that is, the event camera outputs changes in pixel brightness.
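For illustration only, the following Python sketch accumulates a stream of events into a simple signed event image; the (x, y, timestamp, polarity) layout and the array shapes are assumptions made for this sketch and are not part of the method described in this application.

```python
import numpy as np

def accumulate_events(events, height, width):
    """Accumulate an event stream into a signed event image.

    `events` is assumed to be an (N, 4) array of (x, y, timestamp, polarity) rows,
    where polarity is +1 for a brightness increase and -1 for a decrease.
    """
    event_image = np.zeros((height, width), dtype=np.float32)
    for x, y, _t, polarity in events:
        event_image[int(y), int(x)] += polarity  # each event only reports a brightness change
    return event_image

# Example: three synthetic events on a 4x4 sensor
events = np.array([[1, 2, 0.001, +1],
                   [1, 2, 0.002, +1],
                   [3, 0, 0.004, -1]])
print(accumulate_events(events, 4, 4))
```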
An inertial sensor (Inertial Measurement Unit, IMU) is a sensor that primarily detects and measures acceleration, tilt, shock, vibration, rotation, and multiple degree of freedom (DoF) motions, and is an important component for solving navigation, orientation, and motion vector control. The most basic inertial sensors include accelerometers and angular velocity meters (gyroscopes), which are the core components of the inertial system and are the primary factors affecting the performance of the inertial system. In particular, the influence of the drift of the gyroscope on the increase of the position error of the inertial navigation system is a cubic function of time. And the gyroscope with high precision is difficult to manufacture and has high cost. It is therefore also a currently sought-after goal to increase the accuracy of gyroscopes, while reducing their cost.
A depth sensor is a sensor capable of detecting depth information, and mainly includes camera-array sensors, TOF sensors and structured-light sensors, because camera-array technology, TOF (time of flight) technology and structured-light-based depth detection technology are currently the three technologies on which people mainly rely to detect environmental depth information.
Camera arrays typically use multiple cameras placed at different locations to capture multiple images of the same target and calculate a depth map from the geometry. For each point in space, a measurable difference occurs in position in the images captured by the different cameras, and depth can be calculated by the basic geometry.
TOF is an abbreviation for Time of Flight (Time of Flight) technology, and the principle of detecting depth information using TOF sensors is: the TOF sensor emits modulated near infrared light, reflects after encountering an object, converts the distance of a shot scene by calculating the time difference or the phase difference between light emission and reflection so as to generate depth information, and can display the three-dimensional outline of the object in a topographic map mode of representing different distances in different colors by combining with the traditional camera shooting.
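For illustration only, the two depth-sensing principles described above reduce to two simple relations, sketched below; the focal length, baseline and timing values are arbitrary illustrative choices.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Camera array / stereo: Z = f * B / d from basic pinhole geometry."""
    return focal_px * baseline_m / disparity_px

def depth_from_tof(round_trip_time_s, c=299_792_458.0):
    """TOF: the light travels to the object and back, so Z = c * t / 2."""
    return c * round_trip_time_s / 2.0

print(depth_from_disparity(focal_px=800.0, baseline_m=0.1, disparity_px=16.0))  # 5.0 m
print(depth_from_tof(round_trip_time_s=20e-9))                                  # ~3.0 m
```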
The structured light sensor projects a pattern onto a target object using a laser light source, detects the deformation of the pattern reflected by the target object, and calculates a depth map based on the geometry. A structured light sensor has to scan the whole plane to obtain the depth map, which takes time, but it is very accurate. However, structured light sensors are sensitive to ambient light and are therefore typically used only in dark or indoor environments.
The jelly effect is a phenomenon that occurs during shooting and is determined by the characteristics of the camera itself. Most cameras using CMOS sensors use a rolling shutter, which is implemented by exposing the image sensor line by line. At the beginning of exposure, the image sensor scans and exposes line by line until all pixels are exposed. Of course, all of this is completed in a very short time and generally has no influence on shooting. However, if the photographed object moves at high speed or vibrates rapidly relative to the camera, the line-by-line scanning speed of the rolling shutter is insufficient, and the photographing result may be skewed, swing unstably or be partially exposed. This phenomenon of rolling-shutter photographing is defined as the jelly effect.
Each pixel of a digitized image is described by a set of binary numbers representing the color of the image; the number of binary bits each pixel's color occupies is referred to as the image depth. The image depth determines the number of colors each pixel of a color image may have, or the number of gray levels each pixel of a gray image may have; that is, it determines the maximum number of colors that can occur in a color image, or the maximum gray level in a gray image. For a bitmap, the image depth is a constant that determines the number of colors that can be used at most in an image. If each pixel has only one color bit, each pixel is either dark or bright, i.e. a monochrome image (note that this is not necessarily a black-and-white image; it simply limits the image to two chromaticities or colors). If there are 4 color bits per pixel, the bitmap supports 2^4 = 16 colors; if there are 8 color bits per pixel, the bitmap may support 256 different colors. The image depth is the basis for describing the gray value or the color count of each pixel in depth information. In the monocular image depth estimation problem, the depth information of the scene corresponding to an image is generally described by a gray map of the same size, where the gray value of each pixel describes the depth value of the corresponding scene point; this gray map is also called a depth map. Only if the pixel depth is known can the depth value corresponding to each pixel point be calculated.
Visual Odometry (VO) estimates the motion of a camera from image sequences; it is a very critical component of SLAM technology and relies mainly on computer vision algorithms.
The world coordinate system is the absolute coordinate system of the system; before a user coordinate system is established, the coordinates of all points on the screen are determined with respect to the origin of this coordinate system.
The camera coordinate system is a three-dimensional rectangular coordinate system established by taking the focusing center of the camera as an origin and taking the optical axis as a Z axis. The origin of the camera coordinate system is the optical center of the camera, the X-axis and the Y-axis are parallel to the X-axis and the Y-axis of the image, and the z-axis is the optical axis of the camera and is perpendicular to the plane of the image. The intersection point of the optical axis and the image plane is the origin of the image coordinate system, and the image coordinate system is a two-dimensional rectangular coordinate system.
A quaternion is a simple hypercomplex number. A complex number is made up of a real number plus the imaginary unit i, where i^2 = -1. Similarly, a quaternion is made up of real numbers plus three imaginary units i, j, k, which satisfy i^2 = j^2 = k^2 = -1 and i^0 = j^0 = k^0 = 1. Every quaternion is a linear combination of 1, i, j and k, i.e. a quaternion may generally be expressed as a + bi + cj + dk, where a, b, c, d are real numbers.
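For illustration only, the quaternion product implied by these relations (the Hamilton product) can be written out as follows; this is a generic mathematical illustration and not part of the anti-shake method itself.

```python
def quat_mul(q, p):
    """Hamilton product of two quaternions given as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = q
    a2, b2, c2, d2 = p
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

# i * j = k, consistent with the defining relations above
print(quat_mul((0, 1, 0, 0), (0, 0, 1, 0)))  # (0, 0, 0, 1)
```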
At present, most people can save beautiful moments in life by taking videos and pictures, and along with the continuous development of multimedia technology, the requirements of people for shooting are higher and higher, which means that the requirements of people for intelligent terminals with shooting functions are higher and higher.
In order to better meet users' requirements, most intelligent terminals adopt an anti-shake technology. As shown in fig. 1, fig. 1 is a schematic diagram of a video anti-shake system architecture provided by an embodiment of the present application. Generally, data are first acquired by sensors in the intelligent terminal, pose estimation is performed after information fusion of the data, the intelligent terminal performs smoothing processing and image deformation processing on the obtained pose, and finally a video is output. In the anti-shake process shown in fig. 1, different sensors may be used, and the corresponding processing differs accordingly.
If an inertial sensor is adopted to perform anti-shake, IMU data are acquired through the IMU, the pose of the camera is obtained according to the IMU data, so that an initial rotation path is obtained, then the initial rotation path is smoothed, and a smoothing result is applied to image deformation to realize the function of video image stabilization. However, this method is affected by IMU accuracy, frequency drift, and update frequency, where low IMU accuracy results in unreliable IMU data, frequency drift results in an offset from the true value, and low update frequency results in untimely data update.
If the anti-shake is performed based on the image acquired by the camera, specifically, the motion of the frame is estimated by tracking the feature points between the continuous frames, an initial path is obtained, the initial path is smoothed, and finally, the smoothing result is applied to image deformation, so that the function of video image stabilization is realized. However, this method relies on extraction and tracking of feature points, which is not effective in special scenes, for example, in strong and dim light scenes, the image may be overexposed or underexposed; when the motion is fast, motion blur occurs. These special scenes may result in poor quality of the acquired image, and may not effectively extract feature points, and may not perform subsequent anti-shake operations. In addition, because of the limitation of the image frame rate, the update rate of the pose of the camera is low.
If anti-shake is performed based on a depth sensor, a depth image is first acquired through the depth sensor, a motion field is then obtained from the depth image and the image frames, and the camera motion is compensated according to the motion field, thereby realizing video image stabilization. However, this method is limited by the depth sensor, and many pixel values of the depth map are lost; when the image deformation operation is performed, the portions with missing pixel values are deformed according to the portions without missing pixel values, resulting in a jelly effect. Furthermore, because of the limitation of the depth sensor, the update rate of the camera pose is low.
It can be understood that the anti-shake method based on the three sensors cannot accurately represent the real pose information of the camera, and the anti-shake effect of the video cannot be effectively guaranteed.
Based on the above, the application provides a video anti-shake method and related equipment, which can fully exert the advantages of high frame rate and high dynamic characteristics of an event camera in anti-shake, supplement low frame rate information in the existing anti-shake technology by utilizing the high frame rate characteristics, and acquire reliable data under special scenes (such as strong light, dim light and other extreme illumination conditions) by utilizing the high dynamic characteristics, so that the anti-shake effect of shooting is improved under the conditions of scene change, limited sensor and the like.
For a better understanding of the embodiments of the present application, a description will first be given of a video anti-shake system architecture based on the embodiments of the present application. Referring to fig. 2, fig. 2 is a schematic diagram of another video anti-shake system architecture provided by an embodiment of the present application, where the video anti-shake system architecture includes a first sensor and a second sensor, and a terminal performs information fusion on data generated by the first sensor and data generated by the second sensor, and performs pose estimation according to a result of the information fusion, so as to obtain a pose of a camera.
It is understood that the first sensor includes, but is not limited to, an event camera, and the second sensor includes, but is not limited to, a camera head, an inertial sensor (IMU), and a depth sensor.
Specifically, referring to fig. 3 for explaining the system architecture shown in fig. 2, fig. 3 is a flow chart of a video anti-shake method provided by an embodiment of the present application, as shown in fig. 3, when a user starts a video recording function, an instruction to start an event camera, a camera and other anti-shake sensors (IMU, a depth sensor, etc.) is triggered, that is, a first sensor and a second sensor shown in fig. 2 are triggered to be started, pose estimation is performed according to data acquired by these sensors, and a pose of the camera can be obtained, specifically, a high frame rate and a high dynamic pose can be acquired according to the event camera and the camera, and a low frame rate and an unreliable pose can be acquired according to other sensors. And after the pose of the camera is obtained, correcting the pose with low frame rate and unreliable pose according to the obtained pose with high frame rate and high dynamic pose, smoothing the corrected pose, obtaining image deformation information according to the smoothing result, deforming according to the image deformation information and the image information obtained by the camera, and finally outputting a stable image video.
Based on the video anti-shake system architecture shown in fig. 2, an embodiment of the present application provides a video anti-shake method, referring to fig. 4, fig. 4 is a flow chart of another video anti-shake method according to an embodiment of the present application, where the method includes but is not limited to the following steps:
s401: data is acquired.
Specifically, the terminal acquires data through a first sensor and a second sensor, wherein the data comprises the first sensor data and the second sensor data; the first sensor data and the second sensor data are used for pose estimation; the pose is used for representing the conversion relation between the world coordinate system and the camera coordinate system; the first sensor data includes event camera data; the second sensor data comprises a sequence of images.
It is understood that the first sensor includes, but is not limited to, an event camera and the second sensor includes, but is not limited to, a camera; the event camera data includes an event stream representing a change in brightness of an image pixel; the image sequence comprises a series of image frames.
S402: A first translational pose is determined based on the event camera data and the image sequence.
Specifically, since the event camera data acquired by the terminal includes an event stream, that is, includes the change in pixel brightness of the image, and the image sequence includes a series of image frames, the corresponding image frame in the image sequence is reconstructed according to the change in pixel brightness to obtain a reconstructed image, and feature point extraction and matching are then performed on the reconstructed image to obtain the first translational pose. It will be appreciated that the first translational pose may represent the translation between adjacent reconstructed images; in addition, the first translational pose may be represented as a vector or a matrix.
It is understood that the reconstructed image is an image of the image sequence after the image frames are processed, that is, the reconstructed image and the image frames in the image sequence represent images acquired at the same time, but the reconstructed image has more details than the image frames in the image sequence, which can be also understood as having higher image quality.
In addition, the adjacent reconstructed images are two-frame reconstructed images obtained by processing two adjacent image frames in the image sequence.
It can be understood that the event camera and the camera continuously acquire event camera data and the image sequence. At a certain time point, the event camera data represents the change in pixel brightness of the image at that time point, and the image sequence is represented as a certain image frame, namely one frame of image. This frame can then be reconstructed according to the change in pixel brightness at that time point to obtain a reconstructed image, and feature point extraction and matching are performed on the reconstructed image to obtain the first translational pose. Since more, and more stable, feature points can be extracted from the reconstructed image, the obtained first translational pose is also more accurate.
In one embodiment of the application, a series of specific points in time may be selected to determine the first translational pose at different points in time.
In one embodiment of the application, the event camera data may be processed to obtain an event image similar to an ordinary image, i.e. the event camera data is processed so that it can be displayed in the form of an image, and then the first translational pose is obtained based on the event image and the image sequence.
It is understood that the process of reconstructing an image can be implemented by a back projection method, an iterative reconstruction algorithm, an analytic method, and the like, and the image reconstruction can be performed by a training model, for example, a high-quality image is recovered based on an event-enhanced image degradation model and an event-enhanced sparse learning network (eSL-Net).
S403: from the image sequence, a second translational pose is determined.
Specifically, feature point extraction and matching are performed on corresponding image frames in an image sequence to obtain a second translational pose, wherein the second translational pose can be represented by a vector or a matrix, and it can be understood that the second translational pose can represent the translation of adjacent image frames in the image sequence.
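For illustration only, the following Python sketch shows one way such feature point extraction and matching (as in steps S402 and S403) could estimate a simple 2-D translation between two grayscale frames using ORB features; ORB, the parameter values and the averaging of matched displacements are assumptions of this sketch, not a statement of how the application implements pose estimation.

```python
import cv2
import numpy as np

def estimate_translation(frame_prev, frame_curr, max_features=500):
    """Estimate a 2-D translation between two grayscale frames via feature matching."""
    orb = cv2.ORB_create(nfeatures=max_features)
    kp1, des1 = orb.detectAndCompute(frame_prev, None)
    kp2, des2 = orb.detectAndCompute(frame_curr, None)
    if des1 is None or des2 is None:
        return np.zeros(2)  # not enough texture (e.g. over/under-exposed scene)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    if not matches:
        return np.zeros(2)
    shifts = [np.subtract(kp2[m.trainIdx].pt, kp1[m.queryIdx].pt) for m in matches]
    return np.mean(shifts, axis=0)  # a simple translational pose between the two frames
```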
S404: and correcting the second translational posture according to the first translational posture.
Specifically, scene judgment is performed on corresponding image frames in an image sequence to obtain correction weights, and based on the correction weights and the first translational pose, the second translational pose is corrected to obtain corrected translational pose. It is understood that the scene determination includes, but is not limited to, determining the illumination intensity in the image frame; the correction weight comprises a combination relation of the first flat shift gesture and the second flat shift gesture.
In one embodiment of the application, the first translational pose is denoted as T1, the second translational pose is denoted as T2, and the corrected translational pose is denoted as T3, where T3 = F(T1, T2). Here F(T1, T2) represents the relationship between the first translational pose and the second translational pose; it can also be understood that F(T1, T2) represents the combination relation of the first translational pose and the second translational pose, i.e. F(T1, T2) may represent the correction weight.
In one embodiment of the application, scene judgment can be realized by establishing a scene judgment model to obtain the corresponding correction weight; that is, the image frames can be input into the scene judgment model to obtain the combination relation of the first translational pose and the second translational pose.
It can be appreciated that the scene judgment model may be an AI model, and an initial AI model needs to be trained before it is used for processing. In the embodiment of the present application, the initial AI model is trained with sample images acquired by the camera and event camera data (such as an event stream) acquired by the event camera. Because scene judgment can include judgment of multiple factors such as illumination intensity, the training process can also take these factors into account. Taking illumination intensity as an example, a sample image and a corresponding event stream (the sample image and the event stream generated at the same time) are input to obtain a first translational pose and a second translational pose; the illumination intensity is judged according to the sample image; the second translational pose is corrected with different correction weights to obtain corrected translational poses under the different correction weights; the most accurate of these corrected translational poses is determined, and the correction weight corresponding to it is output. The output correction weight is the optimal correction weight under that illumination intensity, i.e. the combination relation of the first translational pose and the second translational pose with the best correction effect under that illumination condition. In one embodiment of the present application, the illumination intensity may be determined from the brightness of the sample image, for example by comparing the number of pixels in the image frame whose brightness is less than or greater than a certain threshold with the number of all pixels in the image frame, thereby determining whether the illumination condition at the time of acquiring the image frame was darker or brighter.
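For illustration only, since the learned scene judgment model itself is not spelled out here, the sketch below uses a hand-crafted stand-in that judges illumination from the fraction of dark pixels and blends the two translational poses with the resulting weight; the threshold and the linear combination F(T1, T2) = w·T1 + (1 - w)·T2 are assumptions of this sketch.

```python
import numpy as np

def correction_weight(image_frame, dark_threshold=30):
    """Return a weight in [0, 1]: the darker the frame, the more the event-camera
    based pose T1 is trusted over the image-only pose T2 (an assumption)."""
    dark_ratio = np.mean(image_frame < dark_threshold)  # fraction of dark pixels
    return float(dark_ratio)

def correct_translation(t1, t2, weight):
    """One possible combination relation F(T1, T2): a weighted average."""
    return weight * np.asarray(t1) + (1.0 - weight) * np.asarray(t2)

frame = np.random.randint(0, 255, (480, 640), dtype=np.uint8)  # synthetic frame
w = correction_weight(frame)
t3 = correct_translation(t1=[1.2, -0.4], t2=[0.9, -0.1], weight=w)
```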
In one embodiment of the present application, after the corrected translational pose is obtained, a subsequent anti-shake operation (such as smoothing the corrected translational pose, image deformation, etc.) may be performed continuously, and a processed image frame may be finally obtained. It can be understood that the accuracy of the corrected translational pose can be determined by determining the quality of the processed image frame.
In one embodiment of the application, the corrected translational pose is subjected to smoothing processing to obtain a smoothed translational pose, image deformation information is then obtained according to the smoothed translational pose, image deformation processing and cropping processing are then performed on the image sequence according to the image deformation information, and finally the video (image frames) is output. It will be appreciated that the image deformation information includes an image deformation vector, and the image deformation vector is used to represent the offset of the image sequence.
It should be noted that the corrected translational pose may be represented as a vector, and this vector may represent the translation of the corrected image frame on the two-dimensional plane. Illustratively, this vector is denoted u; u may be a directed line segment in a two-dimensional Cartesian coordinate system. Taking the two unit vectors i, j whose directions coincide with the x-axis and the y-axis of the coordinate system as a set of bases, it follows from the fundamental theorem of plane vectors that there is one and only one pair of real numbers (a, b) that represents the coordinates of the vector, so that u = a·i + b·j. Since the camera continues to acquire image frames during shooting, the processing of the acquired image frames also continues, and correspondingly the vector changes continuously, that is, the coordinates (a, b) of the vector change continuously. Two curves can therefore be generated to represent the change of a and of b respectively, and the smoothing process can include smoothing these two curves to obtain a vector u1.
It can be understood that the smoothed translational pose includes the smoothed vector u1, and the image deformation information can be obtained based on the coordinates of the vector u1; image deformation processing and cropping processing are then performed on the image sequence according to the image deformation information. Specifically, after the smoothing process, the vector u1 at different moments can be obtained, i.e. the coordinates of u1 at different moments, which may or may not be the same as the coordinates of the vector u at the corresponding moment before smoothing. The image deformation information can be obtained from the vector u1; the image deformation information can include an image deformation vector, which is used to represent the offset of the image frame. The image frame is translated by the same magnitude in the direction opposite to the image deformation vector, then the image frame is cropped, the out-of-frame portions are removed, and the processed image frame is obtained and output.
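For illustration only, the following sketch shows the smoothing and image deformation described above: the (a, b) coordinate curves are low-pass filtered, the difference between the raw and smoothed curves serves as the image deformation vector, and each frame is translated in the opposite direction and cropped. The Gaussian filter, its sigma and the crop margin are assumptions of this sketch, not the specific smoothing used by this application.

```python
import cv2
import numpy as np
from scipy.ndimage import gaussian_filter1d

def stabilize(frames, raw_trajectory, sigma=5.0, crop=20):
    """frames: list of HxW(xC) images; raw_trajectory: (N, 2) array of (a, b) per frame."""
    smoothed = gaussian_filter1d(raw_trajectory, sigma=sigma, axis=0)
    deformation = raw_trajectory - smoothed          # image deformation vectors (shake component)
    out = []
    for frame, (dx, dy) in zip(frames, deformation):
        h, w = frame.shape[:2]
        # translate opposite to the deformation vector, then crop the out-of-frame border
        m = np.float32([[1, 0, -dx], [0, 1, -dy]])
        warped = cv2.warpAffine(frame, m, (w, h))
        out.append(warped[crop:h - crop, crop:w - crop])
    return out
```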
The embodiment of the application also provides a video anti-shake method, referring to fig. 5, fig. 5 is a flow chart of another video anti-shake method provided by the embodiment of the application, and the method includes but is not limited to the following steps:
S501: data is acquired.
Specifically, the terminal may acquire data through the first sensor and the second sensor, and the detailed content may refer to step S401, which is not described herein.
S502: a first translational and rotational pose is determined.
Specifically, a first translational pose and a first rotation pose are obtained based on the event camera data and the image sequence. It can be appreciated that the first translational pose may be determined by the feature point method in VO; reference may be made to step S402, which is not described herein again. In addition, the first rotation pose can also be determined by the feature point method, that is, matched feature point pairs are obtained through feature point extraction and matching, so that the first rotation pose can be determined. The first rotation pose can also be determined by solving a PnP (Perspective-n-Point) problem, which describes how to estimate the pose of a camera when n 3D space points and their projection positions are known. In one embodiment of the present application, the PnP problem can be solved by an iterative method, or by directly calling the OpenCV library function "solvePnP", so as to determine the first rotation pose.
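For illustration only, a minimal example of calling OpenCV's "solvePnP" mentioned above is sketched below; the 3D points, their 2D projections and the camera intrinsics are arbitrary illustrative values.

```python
import cv2
import numpy as np

object_points = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0],
                          [0, 0, 1], [1, 0, 1]], dtype=np.float64)   # n known 3D points
image_points = np.array([[320, 240], [420, 238], [322, 140], [424, 138],
                         [300, 260], [400, 258]], dtype=np.float64)  # their projections
camera_matrix = np.array([[800, 0, 320],
                          [0, 800, 240],
                          [0, 0, 1]], dtype=np.float64)
dist_coeffs = np.zeros(5)

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, camera_matrix, dist_coeffs,
                              flags=cv2.SOLVEPNP_ITERATIVE)
rotation_matrix, _ = cv2.Rodrigues(rvec)  # rotation pose; tvec is the translation
```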
S503: a second translational pose is determined.
Specifically, the feature point extraction and matching are performed on the corresponding image frames in the image sequence, so as to determine the second translational pose, and the detailed content can refer to step S403, which is not described herein again.
S504: and correcting the second translational posture.
Specifically, the second panning posture is corrected according to the first panning posture, and the detailed content can refer to step S404, which is not described herein.
S505: and smoothing the first rotation gesture.
In one embodiment of the application, the first rotational gesture may comprise a first rotational vector, which may comprise three elements, which may be an angular velocity about the x-axis, an angular velocity about the y-axis, and an angular velocity about the z-axis. Since the camera continuously acquires the image frames during the shooting process, the processing of the acquired image frames is also continuously performed, and accordingly, the first rotation pose is also continuously changed, that is, the first rotation vector is also continuously changed, three curves can be generated to respectively represent the change condition of the angular velocity, and the smoothing process can include smoothing the three curves. It can be understood that the smoothed translation pose includes the smoothed first rotation vector, image deformation information can be obtained according to the smoothed first rotation vector, and image deformation processing and shearing processing are performed on the image sequence according to the image deformation information, specifically, after the smoothing processing, the processed first rotation vectors at different moments can be obtained, that is, angular velocities around x, y and z axes at different moments can be obtained.
In another embodiment of the application, the first rotation pose may comprise a first rotation matrix; the first rotation matrix can be expressed as a 3×3 rotation matrix, and the Euler angles and the corresponding quaternion can be calculated from the first rotation matrix. It will be appreciated that differentiating the first rotation matrix yields the first rotation vector.
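For illustration only, the conversion from a rotation matrix to Euler angles and a quaternion can be done, for example, with SciPy as sketched below; the example rotation and the 'xyz' axis convention are assumptions of this sketch.

```python
import numpy as np
from scipy.spatial.transform import Rotation

# An arbitrary example rotation matrix (30 degrees about the z-axis)
theta = np.deg2rad(30)
rotation_matrix = np.array([[np.cos(theta), -np.sin(theta), 0],
                            [np.sin(theta),  np.cos(theta), 0],
                            [0,              0,             1]])

r = Rotation.from_matrix(rotation_matrix)
euler_xyz = r.as_euler('xyz', degrees=True)  # Euler angles about x, y, z
quaternion = r.as_quat()                     # (x, y, z, w) quaternion
```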
S506: and carrying out smoothing treatment on the corrected horizontal displacement gesture.
It can be appreciated that the specific content can refer to step S404, and will not be described herein.
S507: and performing image deformation processing.
It can be appreciated that the image deformation processing for the translated pose after the smoothing processing can refer to step S404, which is not described herein.
In addition, the smoothed first rotation vector may be subjected to projective transformation to obtain image deformation information. The image deformation information may include an image deformation vector, which is used to represent the offset of the image frame; the image frame is translated by the same magnitude in the direction opposite to the image deformation vector, and is then cropped to remove the out-of-frame portions.
S508: an image frame is output.
Specifically, the image frame after the image distortion processing is output.
The embodiment of the application also provides a video anti-shake method, please refer to fig. 6, fig. 6 is a flow chart of another video anti-shake method provided by the embodiment of the application, the method includes but is not limited to the following steps:
S601: data is acquired.
Specifically, the terminal may acquire data through the first sensor and the second sensor, the data including the first sensor data and the second sensor data; the first sensor data and the second sensor data are used for pose estimation; the pose is used for representing the conversion relation between the world coordinate system and the camera coordinate system; the first sensor data includes event camera data; the second sensor data includes an image sequence and IMU data.
It is understood that the first sensor includes, but is not limited to, an event camera, and the second sensor includes, but is not limited to, a camera head and an IMU; the event camera data includes an event stream representing a change in brightness of an image pixel; the image sequence comprises a series of image frames; the IMU data may include a rotation angle of an x-axis, a rotation angle of a y-axis, and a rotation angle of a z-axis.
S602: a second rotational position is determined.
Specifically, a second rotation pose is obtained according to IMU data, where the second rotation pose includes a second rotation vector, and similar to the first rotation vector, the second rotation vector may include three elements, which may be an angular velocity about an x-axis, an angular velocity about a y-axis, and an angular velocity about a z-axis, and it is understood that although the meaning represented by the three elements of the second rotation vector is the same as the meaning represented by the three elements of the first rotation vector, the specific values of the three elements of the second rotation vector are not necessarily the same as the specific values of the three elements of the first rotation vector.
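For illustration only, if the IMU reports integrated rotation angles at known timestamps, one simple way to obtain angular velocities (a second rotation vector per sample interval) is finite differencing, as sketched below; this is an assumption about the IMU data layout, not a statement of how the application computes the second rotation pose.

```python
import numpy as np

def angular_velocity(rotation_angles, timestamps):
    """rotation_angles: (N, 3) angles about x, y, z; returns (N-1, 3) angular velocities."""
    dt = np.diff(timestamps)[:, None]
    return np.diff(rotation_angles, axis=0) / dt

angles = np.array([[0.00, 0.01, 0.00],
                   [0.02, 0.01, 0.01],
                   [0.03, 0.02, 0.01]])   # radians
ts = np.array([0.000, 0.005, 0.010])      # 200 Hz timestamps
omega = angular_velocity(angles, ts)       # second rotation vectors over time
```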
S603: a first translational and rotational pose is determined.
It can be appreciated that the specific content can refer to step S502, and will not be described herein.
S604: a second translational pose is determined.
Specifically, the feature point extraction and matching are performed on the corresponding image frames in the image sequence, so as to determine the second translational pose, and the detailed content can refer to step S403 and step S503, which are not described herein again.
S605: and correcting the second rotation gesture.
Specifically, determining an outer product of the first rotation vector and the second rotation vector to obtain a first error vector; and carrying out compensation processing on the second rotation vector according to the first error vector to obtain a first rotation correction pose.
In one embodiment of the application, the first rotation vector is denoted as V1 and the second rotation vector is denoted as V2. The outer product of V1 and V2 is calculated to obtain the first error vector V3 = V1 × V2. The first error vector reflects the angle change and is used, together with a proportional gain KP and an integral gain KI, to compensate the second rotation vector, i.e. to correct the angular velocities about the x, y and z axes. Specifically, V1 and V2 are normalized; when the included angle between the two vectors is small, the included angle is approximately equal to its sine, so |V3| is proportional to the included angle. The proportional gain KP and the integral gain KI can therefore be applied to |V3|, that is, KP·|V3| + KI·∫|V3| is calculated, and the angular velocities around the x, y and z axes represented by the second rotation vector are corrected by this value (for example, this value is added to the angular velocities around the x, y and z axes represented by the second rotation vector to obtain a first rotation correction vector, where the first rotation correction pose comprises the first rotation correction vector).
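For illustration only, the following sketch follows one literal reading of the compensation described above: V1 and V2 are normalized, their cross (outer) product gives the error vector V3, and a proportional and an integral gain on |V3| yield a correction added to the second rotation vector. The gain values KP and KI and the exact way the scalar correction is applied are assumptions of this sketch.

```python
import numpy as np

def correct_rotation_vector(v1, v2, integral_state, dt, kp=0.5, ki=0.01):
    """Compensate the second rotation vector v2 using the first rotation vector v1."""
    v1n = v1 / np.linalg.norm(v1)
    v2n = v2 / np.linalg.norm(v2)
    v3 = np.cross(v1n, v2n)                 # first error vector; |V3| ~ sin(angle) ~ angle
    err = np.linalg.norm(v3)
    integral_state += err * dt              # running approximation of the integral of |V3|
    correction = kp * err + ki * integral_state
    corrected = v2 + correction             # add the correction to each angular velocity
    return corrected, integral_state

omega_event = np.array([0.10, -0.02, 0.05])  # V1, from the event camera branch
omega_imu = np.array([0.12, -0.01, 0.04])    # V2, from the IMU
v2_corrected, integ = correct_rotation_vector(omega_event, omega_imu, integral_state=0.0, dt=0.005)
```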
S606: and correcting the second translational posture.
Specifically, the second translational pose is corrected according to the first translational pose; for details, refer to step S404 and step S504, which are not described herein again.
S607: and carrying out smoothing treatment on the corrected rotation gesture.
It is understood that the corrected rotation pose is the first rotation correction pose, which may include a fourth rotation vector; the fourth rotation vector may include three elements, which may be an angular velocity about the x-axis, an angular velocity about the y-axis, and an angular velocity about the z-axis. Because the camera continuously acquires image frames during shooting, the terminal also continuously processes the acquired image frames; correspondingly, the first rotation correction pose changes continuously, that is, the fourth rotation vector changes continuously. Three curves can therefore be generated to represent the change of each angular velocity over time, and the smoothing may include smoothing these three curves.
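The smoothing filter itself is not prescribed here; as one possible illustration, the sketch below smooths the three angular-velocity curves with a simple moving average.

```python
import numpy as np

def smooth_angular_velocity_curves(history, window=15):
    """Smooth the three angular-velocity curves (about x, y, z) carried by the
    successive fourth rotation vectors with a simple moving average."""
    history = np.asarray(history)             # shape (num_frames, 3)
    window = min(window, len(history))        # guard for short sequences
    kernel = np.ones(window) / window
    # Filter each axis (curve) independently; 'same' keeps the original length.
    return np.stack(
        [np.convolve(history[:, axis], kernel, mode="same") for axis in range(3)],
        axis=1,
    )
```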
It can be understood that, after the smoothing, the first rotation correction pose may include a smoothed fourth rotation vector; image deformation information can be obtained according to the smoothed fourth rotation vector, and image deformation processing and cropping processing are performed on the image sequence according to the image deformation information.
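As an illustration of how the image deformation and cropping might be realized when only rotation is compensated, the sketch below builds a pure-rotation homography from a rotation vector and then warps and crops one frame. The camera intrinsic matrix, the crop ratio, and the assumption that the passed-in vector already represents the rotation to compensate (for example, the smoothed angular velocity integrated over the frame interval) are all illustrative.

```python
import cv2
import numpy as np

def warp_and_crop(frame, compensating_rvec, camera_matrix, crop_ratio=0.9):
    """Warp one image frame according to image deformation information derived
    from a rotation vector, then crop the border introduced by the warp."""
    rot, _ = cv2.Rodrigues(np.asarray(compensating_rvec, dtype=np.float64))
    k = np.asarray(camera_matrix, dtype=np.float64)
    # Pure-rotation homography mapping the shaky view toward the stabilized view.
    homography = k @ rot @ np.linalg.inv(k)
    h, w = frame.shape[:2]
    warped = cv2.warpPerspective(frame, homography, (w, h))
    # Crop a fixed central region so that the black borders created by warping are removed.
    ch, cw = int(h * crop_ratio), int(w * crop_ratio)
    y0, x0 = (h - ch) // 2, (w - cw) // 2
    return warped[y0:y0 + ch, x0:x0 + cw]
```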
S608: and carrying out smoothing treatment on the corrected horizontal displacement gesture.
It can be appreciated that the specific content can refer to step S404 and step S506, and will not be described herein.
S609: and performing image deformation processing.
It can be appreciated that the specific content may refer to step S404 and step S507, and will not be described herein.
S610: an image frame is output.
Specifically, the image frame after the image deformation processing is output.
The embodiment of the application also provides a video anti-shake method, referring to fig. 7, fig. 7 is a flow chart of another video anti-shake method provided by the embodiment of the application, and the method includes but is not limited to the following steps:
s701: data is acquired.
It is understood that the terminal may acquire data through the first sensor and the second sensor, the data including the first sensor data and the second sensor data; the first sensor data and the second sensor data are used for pose estimation; the pose is used for representing the conversion relation between the world coordinate system and the camera coordinate system; the first sensor data includes event camera data; the second sensor data includes an image sequence and depth data.
It is understood that the first sensor includes, but is not limited to, an event camera, and the second sensor includes, but is not limited to, a camera and a depth sensor; the event camera data includes an event stream representing a change in brightness of an image pixel; the image sequence comprises a series of image frames; the depth data may comprise depth information, which in one embodiment of the application comprises a depth map.
S702: a third rotational pose is determined.
Specifically, based on the depth data and the image sequence, a third rotation pose may be obtained through visual odometry (VO; refer to step S502). The third rotation pose includes a third rotation vector which, similar to the first rotation vector and the second rotation vector, may include three elements: an angular velocity about the x-axis, an angular velocity about the y-axis, and an angular velocity about the z-axis. It is understood that although the three elements of the three rotation vectors represent the same quantities, their specific values are not necessarily the same.
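For illustration, one way depth data could enter such a VO-style rotation estimate is to back-project the previous frame's matched feature points with the depth map and solve a perspective-n-point problem against their positions in the current frame. The sketch below assumes the feature matching has already been done and uses OpenCV's solvePnP; all names and the choice of solver are assumptions, not the specific technique fixed by this embodiment.

```python
import cv2
import numpy as np

def third_rotation_vector(pts_prev, pts_curr, depth_prev, camera_matrix):
    """Estimate an inter-frame rotation vector from matched points and depth data.

    pts_prev, pts_curr: (N, 2) pixel coordinates of matched feature points in the
    previous and current frames; depth_prev: depth map of the previous frame.
    """
    k = np.asarray(camera_matrix, dtype=np.float64)
    fx, fy, cx, cy = k[0, 0], k[1, 1], k[0, 2], k[1, 2]
    pts3d, pts2d = [], []
    for (u0, v0), (u1, v1) in zip(pts_prev, pts_curr):
        z = float(depth_prev[int(v0), int(u0)])
        if z <= 0:
            continue                      # skip points without valid depth
        # Back-project the previous-frame point into 3D camera coordinates.
        pts3d.append([(u0 - cx) * z / fx, (v0 - cy) * z / fy, z])
        pts2d.append([u1, v1])
    if len(pts3d) < 6:
        return np.zeros(3)
    ok, rvec, _tvec = cv2.solvePnP(
        np.asarray(pts3d, dtype=np.float64),
        np.asarray(pts2d, dtype=np.float64),
        k, None, flags=cv2.SOLVEPNP_ITERATIVE)
    # rvec is an axis-angle rotation between the two frames; divided by the frame
    # interval it plays the role of the angular-velocity elements described above.
    return rvec.ravel() if ok else np.zeros(3)
```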
S703: a first translational and rotational pose is determined.
It can be appreciated that the specific content may refer to step S502 and step S603, and will not be described herein.
S704: a second translational pose is determined.
Specifically, feature point extraction and matching are performed on the corresponding image frames in the image sequence, so as to determine the second translational pose, and details can refer to step S403, step S503 and step S604, which are not described herein again.
S705: and correcting the third rotation gesture.
Specifically, the outer product of the first rotation vector and the third rotation vector is determined to obtain a second error vector; the third rotation vector is then compensated according to the second error vector to obtain a second rotation correction pose. It is understood that the correction process may refer to step S605 and is not described herein again.
S706: and correcting the second translational posture.
Specifically, the second translational pose is corrected according to the first translational pose; for details, refer to step S404, step S504 and step S606, which are not described herein again.
S707: and carrying out smoothing treatment on the corrected rotation gesture.
It can be understood that the corrected rotation pose is the second rotation correction pose; the second rotation correction pose is smoothed, and the specific content can refer to step S607, which is not described herein again.
S708: and carrying out smoothing treatment on the corrected horizontal displacement gesture.
It can be appreciated that the specific content can refer to step S404, step S506 and step S608, and will not be described herein.
S709: and performing image deformation processing.
It can be appreciated that the specific details refer to step S404, step S507 and step S609, and will not be described herein.
S710: an image frame is output.
Specifically, the image frame after the image deformation processing is output.
The embodiment of the application also provides a video anti-shake method, please refer to fig. 8, fig. 8 is a flow chart of another video anti-shake method provided by the embodiment of the application, the method includes but is not limited to the following steps:
s801: data is acquired.
It is understood that the terminal may acquire data through the first sensor and the second sensor, the data including the first sensor data and the second sensor data; the first sensor data and the second sensor data are used for pose estimation; the pose is used for representing the conversion relation between the world coordinate system and the camera coordinate system; the first sensor data includes event camera data; the second sensor data includes an image sequence, IMU data, and depth data.
It is understood that the first sensor includes, but is not limited to, an event camera, and the second sensor includes, but is not limited to, a camera, an IMU, and a depth sensor; the event camera data includes an event stream representing a change in brightness of an image pixel; the image sequence comprises a series of image frames; the IMU data may include a rotation angle of an x-axis, a rotation angle of a y-axis, and a rotation angle of a z-axis; the depth data may comprise depth information, which in one embodiment of the application comprises a depth map.
S802: a second rotational position is determined.
It can be appreciated that the specific content can refer to step S602, and will not be described herein.
S803: a third rotational pose is determined.
It can be appreciated that the specific content can refer to step S702, and will not be described herein.
S804: a first translational and rotational pose is determined.
It can be understood that the specific content refers to step S502, step S603 and step S703, which are not described herein.
S805: a second translational pose is determined.
Specifically, the feature point extraction and matching are performed on the corresponding image frames in the image sequence, so as to determine the second translational pose, and the details of the step S403, the step S503, the step S604, and the step S704 may be referred to, which are not described herein.
S806: and correcting the second rotation position and the third rotation position.
Specifically, the second rotation pose and the third rotation pose are corrected according to the first rotation pose to obtain a first rotation correction pose and a second rotation correction pose, respectively; a third rotation correction pose is then obtained according to the first rotation correction pose and the second rotation correction pose. It can be appreciated that the specific content refers to step S605 and step S705, and will not be described herein again.
In one embodiment of the present application, the outer product of the first rotation vector and the second rotation vector may be determined to obtain a first error vector, and the outer product of the first rotation vector and the third rotation vector may be determined to obtain a second error vector. As described in step S605, the proportional gain K_P and the integral gain K_I can be applied to |V3| to compensate the second rotation vector, for example by calculating the value K_P·|V3| + K_I·∫|V3|dt and taking it as a compensation value. Similarly, denoting the second error vector as V4, the proportional gain K_P and the integral gain K_I can be applied to |V4| to compensate the third rotation vector, for example by calculating the value K_P·|V4| + K_I·∫|V4|dt and taking it as a compensation value. It will be appreciated that the resulting first rotation correction pose may include the compensation value K_P·|V3| + K_I·∫|V3|dt and the second rotation vector, and the second rotation correction pose may include the compensation value K_P·|V4| + K_I·∫|V4|dt and the third rotation vector. In addition, a third rotation correction pose is obtained according to the first rotation correction pose and the second rotation correction pose; the third rotation correction pose may include a combination of the two compensation values with the second rotation vector and/or the third rotation vector. For example, the two compensation values may be added, and the angular velocities about the x, y and z axes represented by the second rotation vector (or by the third rotation vector) may then be corrected with the sum to obtain a third rotation correction vector; it is understood that the third rotation correction pose includes the third rotation correction vector.
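A minimal sketch of this combined correction is given below. It follows the example in the text: compute |V3| and |V4| from the two error vectors, form the two proportional–integral compensation values, add them, and apply the sum to the angular velocities represented by the second rotation vector. The gains, the integral bookkeeping, and the interpretation of V4 as the second error vector are assumptions made for the example.

```python
import numpy as np

def combined_rotation_correction(v1, v2, v3_rot, kp, ki, int3, int4, dt):
    """Sketch of step S806: compute both compensation values and combine them.

    v1: first rotation vector (event camera); v2: second rotation vector (IMU);
    v3_rot: third rotation vector (VO with depth). int3 / int4 are the running
    integrals of |V3| and |V4| kept by the caller (illustrative state handling).
    """
    def unit(v):
        return v / (np.linalg.norm(v) + 1e-12)

    err3 = np.linalg.norm(np.cross(unit(v1), unit(v2)))      # |V3|
    err4 = np.linalg.norm(np.cross(unit(v1), unit(v3_rot)))  # |V4|
    int3 += err3 * dt
    int4 += err4 * dt
    comp3 = kp * err3 + ki * int3      # compensation value for the IMU-based pose
    comp4 = kp * err4 + ki * int4      # compensation value for the VO-based pose
    # One possible combination: add the two compensation values and correct the
    # angular velocities represented by the second rotation vector.
    third_correction_vector = np.asarray(v2, dtype=np.float64) + (comp3 + comp4)
    return third_correction_vector, int3, int4
```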
The correction may be one of mathematical operations such as addition, subtraction, multiplication, division, integration, differentiation, and the like, or may be a combination of a plurality of mathematical operations.
S807: and correcting the second translational posture.
Specifically, the second translational pose is corrected according to the first translational pose; for details, refer to step S404, step S504, step S606 and step S706, which are not described herein again.
S808: and carrying out smoothing treatment on the corrected rotation gesture.
It can be understood that the corrected rotation pose is the second rotation correction pose, and the second rotation correction pose is smoothed, and the specific content may refer to step S607 and step S707, which are not described herein.
S809: and carrying out smoothing treatment on the corrected horizontal displacement gesture.
It can be appreciated that the specific content can refer to step S404, step S506, step S608 and step S708, and will not be described herein.
S810: and performing image deformation processing.
It can be appreciated that the specific details refer to step S404, step S507, step S609 and step S709, and will not be described herein.
S811: an image frame is output.
Specifically, the image frame after the image deformation processing is output.
In order to better implement the above scheme of the embodiments of the present application, related devices for implementing the scheme are described below. Fig. 9 is a schematic diagram of the hardware structure of a terminal provided by an embodiment of the present application; the terminal shown in fig. 9 may be applied to the video anti-shake system architecture shown in fig. 2, or may be used to implement the video anti-shake methods shown in fig. 3, fig. 4, fig. 5, fig. 6, fig. 7 and fig. 8.
It will be appreciated that the terminal shown in fig. 9 may have more or fewer components than shown, may combine two or more components, or may have a different configuration of components. The various components shown in fig. 9 may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
The terminal may include: a processor 910, an external memory interface 920, an internal memory 921, a universal serial bus (Universal Serial Bus, USB) interface 930, a charge management module 940, a power management module 941, a battery 942, an antenna 1, an antenna 2, a mobile communication module 950, a wireless communication module 960, an audio module 970, a speaker 970A, a receiver 970B, a microphone 970C, a headset interface 970D, a sensor module 980, keys 990, a motor 991, an indicator 992, a camera 993, a display 994, a subscriber identification module (Subscriber Identification Module, SIM) card interface 995, and the like. The sensor module 980 may include a pressure sensor 980A, an air pressure sensor 980B, a magnetic sensor 980C, an acceleration sensor 980D, a distance sensor 980E, a proximity light sensor 980F, a fingerprint sensor 980G, a temperature sensor 980H, a touch sensor 980I, an ambient light sensor 980J, a bone conduction sensor 980K, a depth sensor 980L, an inertial measurement unit (Inertial Measurement Unit, IMU) 980M, an event camera 980N, and the like.
It will be appreciated that the structure illustrated in the embodiments of the present application does not constitute a specific limitation on the terminal. In other embodiments of the application, the terminal may include more or less components than illustrated, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 910 may include one or more processing units such as, for example: the processor 910 may include an application processor (Application Processor, AP), a modem processor, a graphics processor (Graphics Processing unit, GPU), an image signal processor (Image Signal Processor, ISP), a controller, a memory, a video codec, a digital signal processor (Digital Signal Processor, DSP), a baseband processor, and/or a Neural network processor (Neural-network Processing Unit, NPU), etc. Wherein the different processing units may be separate devices or may be integrated in one or more processors.
The controller can be a neural center and a command center of the terminal. The controller can generate operation control signals according to the instruction operation codes and the time sequence signals to finish the control of instruction fetching and instruction execution.
In the present embodiment provided by the present application, the terminal may execute the video anti-shake method shown in fig. 3, 4, 5, 6, 7 and 8 through the processor 910.
A memory may also be provided in the processor 910 for storing instructions and data. In some embodiments, the memory in the processor 910 is a cache memory. The memory may hold instructions or data that the processor 910 has just used or recycled. If the processor 910 needs to reuse the instruction or data, it may be called directly from the memory. Repeated accesses are avoided and the latency of the processor 910 is reduced, thereby improving the efficiency of the system.
In some embodiments, processor 910 may include one or more interfaces. The interfaces may include an integrated circuit (Inter-Integrated Circuit, I2C) interface, an integrated circuit built-in audio (Inter-Integrated Circuit Sound, I2S) interface, a pulse code modulation (Pulse Code Modulation, PCM) interface, a universal asynchronous receiver Transmitter (Universal Asynchronous Receiver/Transmitter, UART) interface, a mobile industry processor interface (Mobile Industry Processor Interface, MIPI), a General-Purpose Input/output (GPIO) interface, a subscriber identity module (Subscriber Identity Module, SIM) interface, and/or a universal serial bus (Universal Serial Bus, USB) interface, among others.
The I2C interface is a bi-directional synchronous Serial bus, comprising a Serial Data Line (SDA) and a Serial clock Line (Serial Clock Line, SCL). In some embodiments, the processor 910 may include multiple sets of I2C buses. The processor 910 may be coupled to the touch sensor 980I, charger, flash, camera 993, etc., respectively, via different I2C bus interfaces. For example: the processor 910 may be coupled to the touch sensor 980I through an I2C interface, so that the processor 910 and the touch sensor 980I communicate through an I2C bus interface to implement a touch function of the terminal.
The I2S interface may be used for audio communication. In some embodiments, the processor 910 may include multiple sets of I2S buses. The processor 910 may be coupled to the audio module 970 by an I2S bus to enable communication between the processor 910 and the audio module 970. In some embodiments, the audio module 970 may communicate audio signals to the wireless communication module 960 through an I2S interface to implement a function of answering a phone call through a bluetooth headset.
PCM interfaces may also be used for audio communication to sample, quantize and encode analog signals. In some embodiments, the audio module 970 and the wireless communication module 960 may be coupled through a PCM bus interface. In some embodiments, the audio module 970 may also communicate audio signals to the wireless communication module 960 through a PCM interface to enable answering a call through a bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication.
The UART interface is a universal serial data bus for asynchronous communications. The bus may be a bi-directional communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, a UART interface is typically used to connect the processor 910 with the wireless communication module 960. For example: the processor 910 communicates with a bluetooth module in the wireless communication module 960 through a UART interface to implement a bluetooth function. In some embodiments, the audio module 970 may communicate audio signals to the wireless communication module 960 through a UART interface to implement a function of playing music through a bluetooth headset.
The MIPI interface may be used to connect the processor 910 with peripheral devices such as a display 994, a camera 993, and the like. The MIPI interfaces include camera serial interfaces (Camera Serial Interface, CSI), display serial interfaces (Display Serial Interface, DSI), and the like. In some embodiments, the processor 910 and the camera 993 communicate through a CSI interface to implement a photographing function of the terminal. The processor 910 and the display 994 communicate via a DSI interface to implement the display functions of the terminal.
The GPIO interface may be configured by software. The GPIO interface may be configured as a control signal or as a data signal. In some embodiments, a GPIO interface may be used to connect the processor 910 with the camera 993, display 994, wireless communication module 960, audio module 970, sensor module 980, and the like. The GPIO interface may also be configured as an I2C interface, an I2S interface, a UART interface, an MIPI interface, etc.
The SIM interface may be used to communicate with the SIM card interface 995 to perform functions of transferring data to or reading data from the SIM card.
The USB interface 930 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, or the like. The USB interface 930 may be used to connect a charger to charge a terminal, or may be used to transfer data between the terminal and a peripheral device. And can also be used for connecting with a headset, and playing audio through the headset. The interface may also be used to connect other electronic devices, such as AR devices, etc.
It should be understood that the connection relationship between the modules illustrated in the embodiment of the present application is only illustrative, and does not limit the structure of the terminal. In other embodiments of the present application, the terminal may also use different interfacing manners in the foregoing embodiments, or a combination of multiple interfacing manners.
The charge management module 940 is configured to receive a charge input from a charger. The charger can be a wireless charger or a wired charger.
The power management module 941 is used to connect the battery 942, the charge management module 940 and the processor 910. The power management module 941 receives input from the battery 942 and/or the charge management module 940 and provides power to the processor 910, the internal memory 921, the external memory, the display 994, the camera 993, the wireless communication module 960, and the like.
The wireless communication function of the terminal can be implemented by the antenna 1, the antenna 2, the mobile communication module 950, the wireless communication module 960, a modem processor, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the terminal may be used to cover a single or multiple communication bands. Different antennas may also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed into a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 950 may provide a solution for wireless communication including 2G/3G/4G/5G or the like applied on the terminal. The mobile communication module 950 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA), etc. The mobile communication module 950 may receive electromagnetic waves from the antenna 1, perform processes such as filtering, amplifying, and the like on the received electromagnetic waves, and transmit the processed electromagnetic waves to the modem processor for demodulation. The mobile communication module 950 can amplify the signal modulated by the modem processor, and convert the signal into electromagnetic waves through the antenna 1 to radiate the electromagnetic waves. In some embodiments, at least some of the functional modules of the mobile communication module 950 may be provided in the processor 910. In some embodiments, at least some of the functional modules of the mobile communication module 950 may be provided in the same device as at least some of the modules of the processor 910.
The modem processor may include a modulator and a demodulator. The modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low frequency baseband signal to the baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs sound signals through an audio device (not limited to speaker 970A, receiver 970B, etc.), or displays images or video through display 994. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be provided in the same device as the mobile communications module 950 or other functional modules, independent of the processor 910.
The wireless communication module 960 may provide solutions for wireless communication including wireless local area network (Wireless Local Area Networks, WLAN) (e.g., wireless fidelity (Wireless Fidelity, wi-Fi) network), bluetooth (BT), global navigation satellite system (Global Navigation Satellite System, GNSS), frequency modulation (Frequency Modulation, FM), near field wireless communication technology (Near Field Communication, NFC), infrared technology (IR), etc. as applied on a terminal. The wireless communication module 960 may be one or more devices that integrate at least one communication processing module. The wireless communication module 960 receives electromagnetic waves via the antenna 2, modulates the electromagnetic wave signals, filters the electromagnetic wave signals, and transmits the processed signals to the processor 910. The wireless communication module 960 may also receive a signal to be transmitted from the processor 910, frequency modulate it, amplify it, and convert it to electromagnetic waves for radiation via the antenna 2.
In some embodiments, the terminal's antenna 1 and mobile communication module 950 are coupled, and the antenna 2 and wireless communication module 960 are coupled, so that the terminal can communicate with a network and other devices through wireless communication techniques. The wireless communication techniques may include the Global System for Mobile communications (Global System for Mobile Communications, GSM), general packet radio service (General Packet Radio Service, GPRS), code division multiple access (Code Division Multiple Access, CDMA), wideband code division multiple access (Wideband Code Division Multiple Access, WCDMA), time division code division multiple access (Time-Division Code Division Multiple Access, TD-SCDMA), long term evolution (Long Term Evolution, LTE), BT, GNSS, WLAN, NFC, FM, and/or IR techniques, among others. The GNSS may include a global satellite positioning system (Global Positioning System, GPS), a global navigation satellite system (Global Navigation Satellite System, GLONASS), a beidou satellite navigation system (BeiDou Navigation Satellite System, BDS), a Quasi zenith satellite system (Quasi-Zenith Satellite System, QZSS) and/or a satellite based augmentation system (Satellite Based Augmentation Systems, SBAS).
The terminal implements display functions through the GPU, the display 994, and the application processor, etc. The GPU is a microprocessor for image processing, and is connected to the display 994 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 910 may include one or more GPUs that execute program instructions to generate or change display information.
The display 994 is used to display images, videos, and the like. The display 994 includes a display panel. The display panel may employ a liquid crystal display (Liquid Crystal Display, LCD), an Organic Light-Emitting Diode (OLED), an Active-matrix Organic Light-Emitting Diode (AMOLED), a flexible Light-Emitting Diode (Flex Light-Emitting Diode), miniLED, microLED, micro-OLED, a quantum dot Light-Emitting Diode (Quantum Dot Light Emitting Diodes, QLED), or the like. In some embodiments, the terminal may include 1 or N displays 994, N being a positive integer greater than 1.
The terminal may implement shooting functions through an ISP, a camera 993, a video codec, a GPU, a display 994, an application processor, and the like.
The ISP is used to process the data fed back by the camera 993. For example, when photographing, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electric signal, and the camera photosensitive element transmits the electric signal to the ISP for processing and is converted into an image visible to naked eyes. ISP can also optimize the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, an ISP may be provided in the camera 993.
The camera 993 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image onto the photosensitive element. The photosensitive element may be a charge coupled device (Charge Coupled Device, CCD) or a Complementary Metal Oxide Semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then transferred to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard RGB, YUV, or the like format. In some embodiments, the terminal may include 1 or N cameras 993, N being a positive integer greater than 1.
In an embodiment of the present application, the image sequence may be acquired by the camera 993 for subsequent pose estimation.
The digital signal processor is used for processing digital signals, and can process other digital signals besides digital image signals. For example, when the terminal selects a frequency bin, the digital signal processor is used to perform a Fourier transform on the frequency bin energy, and the like.
Video codecs are used to compress or decompress digital video. The terminal may support one or more video codecs. In this way, the terminal may play or record video in multiple encoding formats, such as: dynamic picture experts group (Moving Picture Experts Group, MPEG) 1, MPEG2, MPEG3, MPEG4, etc.
The NPU is a Neural-Network (NN) computing processor, and can rapidly process input information by referencing a biological Neural Network structure, for example, referencing a transmission mode between human brain neurons, and can also continuously perform self-learning. Applications such as intelligent cognition of the terminal can be realized through the NPU, for example: image recognition, face recognition, speech recognition, text understanding, etc.
The external memory interface 920 may be used to connect an external memory card, such as a Micro SD card, to implement the memory capability of the extension terminal. The external memory card communicates with the processor 910 through an external memory interface 920 to implement data storage functions. For example, files such as music, video, etc. are stored in an external memory card.
The internal memory 921 may be used to store computer-executable program code including instructions. The processor 910 executes various functional applications of the terminal and data processing by executing instructions stored in the internal memory 921. The internal memory 921 may include a stored program area and a stored data area. The storage program area may store an operating system, an application required for at least one function (such as a face recognition function, a fingerprint recognition function, a mobile payment function, etc.), and the like. The storage data area may store data created during use of the terminal (e.g., face information template data, fingerprint information templates, etc.), and so on. In addition, the internal memory 921 may include a high-speed random access memory, and may further include a nonvolatile memory such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (Universal Flash Storage, UFS), and the like.
The terminal may implement audio functions by an audio module 970, speaker 970A, receiver 970B, microphone 970C, earphone interface 970D, and an application processor, etc. Such as music playing, recording, etc.
The audio module 970 is used to convert digital audio information to an analog audio signal output and also to convert an analog audio input to a digital audio signal. The audio module 970 may also be used to encode and decode audio signals. In some embodiments, the audio module 970 may be disposed in the processor 910 or some functional modules of the audio module 970 may be disposed in the processor 910.
Speaker 970A, also known as a "horn," is configured to convert audio electrical signals into sound signals. The terminal may listen to music through the speaker 970A or to hands-free conversations.
A receiver 970B, also known as an "earpiece," is used to convert an audio electrical signal into an acoustic signal. When the terminal receives a call or voice message, it can receive voice by placing the receiver 970B close to the human ear.
Microphone 970C, also known as a "microphone" or "microphone," is used to convert acoustic signals into electrical signals. When making a call or transmitting voice information, the user can sound near the microphone 970C through the mouth, inputting an acoustic signal to the microphone 970C. The terminal may be provided with at least one microphone 970C. In other embodiments, the terminal may be provided with two microphones 970C, which may also perform a noise reduction function in addition to collecting sound signals. In other embodiments, the terminal may further be provided with three, four or more microphones 970C to collect sound signals, reduce noise, identify the source of sound, implement directional recording functions, etc.
The earphone interface 970D is used to connect a wired earphone. The earphone interface 970D may be the USB interface 930, a 3.5 mm Open Mobile Terminal Platform (OMTP) standard interface, or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.
The pressure sensor 980A is configured to sense a pressure signal and convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 980A may be disposed on the display 994. The pressure sensor 980A is of a wide variety, such as a resistive pressure sensor, an inductive pressure sensor, a capacitive pressure sensor, and the like. The capacitive pressure sensor may be a capacitive pressure sensor comprising at least two parallel plates with conductive material. When a force is applied to the pressure sensor 980A, the capacitance between the electrodes changes. The terminal determines the strength of the pressure from the change in capacitance. When a touch operation is applied to the display 994, the terminal detects the intensity of the touch operation based on the pressure sensor 980A. The terminal may also calculate the location of the touch based on the detection signal of the pressure sensor 980A. In some embodiments, touch operations that act on the same touch location, but at different touch operation strengths, may correspond to different operation instructions. For example: and executing an instruction for checking the short message when the touch operation with the touch operation intensity smaller than the first pressure threshold acts on the short message application icon. And executing an instruction for newly creating the short message when the touch operation with the touch operation intensity being greater than or equal to the first pressure threshold acts on the short message application icon.
The air pressure sensor 980B is for measuring air pressure. In some embodiments, the terminal calculates altitude from barometric pressure values measured by barometric pressure sensor 980B, aiding in positioning and navigation.
The magnetic sensor 980C includes a Hall sensor. The terminal may detect the opening and closing of a flip holster using the magnetic sensor 980C. In some embodiments, when the terminal is a flip phone, the terminal may detect the opening and closing of the flip cover according to the magnetic sensor 980C. Features such as automatic unlocking upon flip opening can then be set according to the detected open or closed state of the holster or of the flip cover.
The acceleration sensor 980D may detect the magnitude of the acceleration of the terminal in various directions (typically three axes). The magnitude and direction of gravity can be detected when the terminal is stationary. It can also be used to recognize the posture of the electronic device, and is applied to landscape/portrait switching, pedometers, and other applications.
A distance sensor 980E for measuring distance. The terminal may measure the distance by infrared or laser. In some embodiments, the terminal may range using the distance sensor 980E to achieve fast focusing.
The proximity light sensor 980F may include, for example, a Light Emitting Diode (LED) and a light detector, such as a photodiode. The light emitting diode may be an infrared light emitting diode. The terminal emits infrared light outwards through the light emitting diode. The terminal uses a photodiode to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the terminal. When insufficient reflected light is detected, the terminal may determine that there is no object in the vicinity of the terminal. The terminal can detect that the user holds the terminal close to the ear to talk by using the proximity light sensor 980F so as to automatically extinguish the screen to achieve the purpose of saving electricity. The proximity light sensor 980F can also be used in holster mode, pocket mode to automatically unlock and lock the screen.
The fingerprint sensor 980G is used to collect a fingerprint. The terminal can use the collected fingerprint characteristics to implement fingerprint unlocking, application lock access, fingerprint photographing, fingerprint-based call answering, and the like.
The temperature sensor 980H is for detecting temperature. In some embodiments, the terminal performs a temperature processing strategy using the temperature detected by temperature sensor 980H. For example, when the temperature reported by the temperature sensor 980H exceeds a threshold, the terminal performs a reduction in performance of a processor located near the temperature sensor 980H in order to reduce power consumption to implement thermal protection. In other embodiments, the terminal heats the battery 942 when the temperature is below another threshold to avoid an abnormal shutdown of the terminal due to low temperatures. In other embodiments, when the temperature is below a further threshold, the terminal performs boosting of the output voltage of the battery 942 to avoid abnormal shutdown caused by low temperatures.
Touch sensor 980I, also referred to as a "touch panel". The touch sensor 980I may be disposed on the display 994, and the touch sensor 980I and the display 994 form a touch screen, which is also referred to as a "touch screen". The touch sensor 980I is for detecting a touch operation acting on or near it. The touch sensor may communicate the detected touch operation to the application processor to determine the touch event type. Visual output related to touch operations may be provided through the display 994. In other embodiments, the touch sensor 980I can also be positioned on the surface of the terminal in a different location than the display 994.
The ambient light sensor 980J is for sensing ambient light level. The terminal may adaptively adjust the brightness of the display 994 based on the perceived ambient light level. The ambient light sensor 980J may also be used to automatically adjust white balance when taking a photograph. The ambient light sensor 980J may also cooperate with the proximity light sensor 980F to detect if the terminal is in a pocket to prevent false touches.
The depth sensor 980L is mainly used to detect depth information; the depth sensor 980L may be, for example, a camera-array sensor, a time-of-flight (TOF) sensor, or a structured light sensor.
In the embodiment of the present application, depth data may be acquired by the depth sensor 980L, and pose estimation may be performed according to the depth data and the image sequence acquired by the camera 993.
The IMU 980M is a sensor primarily used to detect and measure acceleration and rotational motion; its principle is based on the law of inertia. The most basic inertial sensors include accelerometers and angular velocity meters (gyroscopes), which are the core components of an inertial system and the primary factors affecting its performance. The gyro sensor may be used to determine the motion posture of the terminal. In some embodiments, the angular velocity of the terminal about three axes (i.e., the x, y and z axes) may be determined by the gyro sensor. The gyro sensor may be used for photographing anti-shake. For example, when the shutter is pressed, the gyro sensor detects the shake angle of the terminal, calculates the distance to be compensated by the lens module according to the angle, and makes the lens counteract the shake of the terminal through reverse movement, thereby realizing anti-shake. The gyro sensor may also be used for navigation and somatosensory game scenarios.
In the embodiment of the application, IMU data (such as angular velocity) can be obtained through the IMU980M, and pose estimation is performed according to the IMU data.
The event camera 980N, also known as a neuromorphic camera, silicon retina or dynamic vision sensor, is an imaging sensor capable of responding to local brightness changes. As a new type of sensor, the event camera differs markedly from a traditional camera: it outputs changes in pixel brightness rather than whole images.
In an embodiment of the present application, event camera data (such as an event stream) may be acquired by the event camera 980N, and pose estimation may be performed according to the event camera data and the image sequence acquired by the camera 993, so as to obtain a pose with high dynamic and high frame rate.
The keys 990 include a power-on key, a volume key, etc. The keys 990 may be mechanical keys. Or may be a touch key. The terminal may receive key inputs, generating key signal inputs related to user settings of the terminal as well as function controls.
The motor 991 may generate a vibratory alert. The motor 991 may be used for incoming call vibration alerting as well as for touch vibration feedback. For example, touch operations acting on different applications (e.g., photographing, audio playing, etc.) may correspond to different vibration feedback effects. The motor 991 may also correspond to different vibration feedback effects by touch operations applied to different areas of the display screen 994. Different application scenarios (such as time reminding, receiving information, alarm clock, game, etc.) can also correspond to different vibration feedback effects. The touch vibration feedback effect may also support customization.
The indicator 992 may be an indicator light, which may be used to indicate a state of charge, a change in charge, an indication message, a missed call, a notification, or the like.
The SIM card interface 995 is used to connect a SIM card. The SIM card may be inserted into the SIM card interface 995 or withdrawn from the SIM card interface 995 to enable contact and separation with the terminal. The terminal may support 1 or N SIM card interfaces, N being a positive integer greater than 1. SIM card interface 995 may support Nano-SIM cards, micro SIM cards, and the like. The same SIM card interface 995 may be used to insert multiple cards simultaneously. The types of the plurality of cards may be the same or different. The SIM card interface 995 may also be compatible with different types of SIM cards. SIM card interface 995 may also be compatible with external memory cards. The terminal interacts with the network through the SIM card to realize the functions of communication, data communication and the like.
Fig. 10 is a schematic software structure of a terminal according to an embodiment of the present application.
The layered architecture divides the software into several layers, each with distinct roles and branches. The layers communicate with each other through a software interface. In some embodiments, the system is divided into four layers, from top to bottom, an application layer, an application framework layer, runtime (run time) and system libraries, and a kernel layer, respectively.
The application layer may include a series of application packages.
As shown in fig. 10, the application package may include applications (also referred to as applications) such as cameras, gallery, calendar, phone calls, maps, navigation, WLAN, bluetooth, music, video, short messages, etc.
The application framework layer provides an application programming interface (Application Programming Interface, API) and programming framework for application programs of the application layer. The application framework layer includes a number of predefined functions.
As shown in fig. 10, the application framework layer may include a window manager, a content provider, a view system, a phone manager, a resource manager, a notification manager, and the like.
The window manager is used for managing window programs. The window manager can acquire the size of the display screen, judge whether a status bar exists, lock the screen, intercept the screen and the like.
The content provider is used to store and retrieve data and make such data accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phonebooks, etc.
The view system includes visual controls, such as controls to display text, controls to display pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, a display interface including a text message notification icon may include a view displaying text and a view displaying a picture.
The telephony manager is used to provide the communication functions of the terminal. Such as the management of call status (including on, hung-up, etc.).
The resource manager provides various resources for the application program, such as localization strings, icons, pictures, layout files, video files, and the like.
The notification manager allows the application to display notification information in a status bar, can be used to communicate notification type messages, can automatically disappear after a short dwell, and does not require user interaction. Such as notification manager is used to inform that the download is complete, message alerts, etc. The notification manager may also be a notification presented in the form of a chart or scroll bar text in the system top status bar, such as a notification of a background running application, or a notification presented on a screen in the form of a dialog interface. For example, a text message is prompted in a status bar, a prompt tone is emitted, the electronic device vibrates, and an indicator light blinks, etc.
The Runtime (run time) includes core libraries and virtual machines. Run time is responsible for scheduling and management of the system.
The core library consists of two parts: one part is the function that the programming language (e.g., java language) needs to call, and the other part is the core library of the system.
The application layer and the application framework layer run in a virtual machine. The virtual machine executes the programming files (e.g., java files) of the application layer and the application framework layer as binary files. The virtual machine is used for executing the functions of object life cycle management, stack management, thread management, security and exception management, garbage collection and the like.
The system library may include a plurality of functional modules. For example: surface Manager (Surface Manager), media library (Media Libraries), three-dimensional graphics processing library (e.g., openGL ES), two-dimensional graphics engine (e.g., SGL), etc.
The surface manager is used to manage the display subsystem and provides a fusion of two-Dimensional (2D) and three-Dimensional (3D) layers for multiple applications.
Media libraries support a variety of commonly used audio, video format playback and recording, still image files, and the like. The media library may support a variety of audio and video encoding formats, such as MPEG4, h.264, MP3, AAC, AMR, JPG, PNG, etc.
The three-dimensional graphic processing library is used for realizing 3D graphic drawing, image rendering, synthesis, layer processing and the like.
The 2D graphics engine is a drawing engine for 2D drawing.
The kernel layer is a layer between hardware and software. The kernel layer at least comprises a display driver, a camera driver, an audio driver, a sensor driver and a virtual card driver.
The workflow of the device software and hardware is illustrated below in connection with capturing a photo scene.
When touch sensor 980I receives a touch operation, a corresponding hardware interrupt is issued to the kernel layer. The kernel layer processes the touch operation into the original input event (including information such as touch coordinates, time stamp of touch operation, etc.). The original input event is stored at the kernel layer. The application framework layer acquires an original input event from the kernel layer, and identifies a control corresponding to the input event. Taking the touch operation as a touch click operation, taking a control corresponding to the click operation as an example of a control of a camera application icon, the camera application calls an interface of an application framework layer, starts the camera application, further starts a camera driver by calling a kernel layer, and captures a still image or video through a camera 993.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
It should be understood that the terms first, second, third, fourth and the various numerical designations referred to herein are merely for descriptive convenience and are not intended to limit the scope of the application.
It should be understood that the term "and/or" is merely an association relationship describing the associated object, and means that three relationships may exist, for example, a and/or B may mean: a exists alone, A and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the front and rear associated objects are an "or" relationship.
It should also be understood that, in various embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The steps in the method of the embodiment of the application can be sequentially adjusted, combined and deleted according to actual needs.
The modules in the device of the embodiment of the application can be combined, divided and deleted according to actual needs.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the application.

Claims (10)

1. A method for video anti-shake, the method comprising:
acquiring data, the data comprising first sensor data and second sensor data; the first sensor data and the second sensor data are used for pose estimation; the pose is used for representing the conversion relation between the world coordinate system and the camera coordinate system; the first sensor data includes event camera data; the second sensor data comprises a sequence of images; the image sequence comprises a series of image frames;
based on the event camera data and the image sequence, obtaining a first translational pose, comprising: reconstructing corresponding image frames in the image sequence according to the event camera data to obtain reconstructed images, and obtaining the first translational pose according to the reconstructed images, wherein the first translational pose is used for representing the translation between adjacent reconstructed images;
obtaining a second translational pose according to the image sequence, comprising: obtaining the second translational pose according to the corresponding image frames in the image sequence, wherein the second translational pose is used for representing the translation between adjacent image frames in the image sequence;
performing scene judgment on the corresponding image frames in the image sequence to obtain a correction weight, wherein the correction weight comprises a combination relationship of the first translational pose and the second translational pose;
correcting the second translational pose according to the first translational pose to obtain a corrected translational pose, comprising: correcting the second translational pose based on the correction weight and the first translational pose to obtain the corrected translational pose;
and performing an anti-shake operation on the corrected translational pose to obtain a translational pose after the anti-shake operation, processing the image frame through the translational pose after the anti-shake operation to obtain a processed image frame, and outputting the processed image frame.
2. The method of claim 1, wherein prior to said correcting said second translational pose in accordance with said first translational pose, said method further comprises:
inputting the event camera data and the image sequence into a scene judgment model, and determining the correction weight according to the scene judgment model; the correction weight comprises a combination relationship of the first translational pose and the second translational pose.
3. The method of claim 1 or 2, wherein the method further comprises:
performing smoothing treatment on the corrected translational pose to obtain a smoothed translational pose;
obtaining image deformation information according to the smoothed translation pose; the image deformation information comprises an image deformation vector; the image deformation vector is used for representing the offset of the image frame;
performing image deformation processing on the image frame according to the image deformation information to obtain a processed image frame;
and outputting the processed image frame.
4. A method according to any of claims 1-3, wherein the second sensor data further comprises IMU data, the method further comprising:
obtaining a first rotation pose based on the event camera data and the image sequence; the first rotation pose includes a first rotation vector;
obtaining a second rotation pose according to the IMU data; the second rotation pose includes a second rotation vector;
and correcting the second rotation pose according to the first rotation pose to obtain a first corrected rotation pose.
5. The method of claim 4, wherein said correcting said second rotation pose according to said first rotation pose comprises:
determining an outer product of the first rotation vector and the second rotation vector to obtain a first error vector;
and carrying out compensation processing on the second rotation vector according to the first error vector.
6. A method according to any of claims 1-3, wherein the second sensor data further comprises depth data, the method further comprising:
obtaining a first rotation pose based on the event camera data and the image sequence; the first rotation pose includes a first rotation vector;
obtaining a third rotation pose according to the depth data and the image sequence; the third rotation pose includes a third rotation vector;
and correcting the third rotation pose according to the first rotation pose to obtain a second corrected rotation pose.
7. The method of claim 4 or 5, wherein the second sensor data further comprises depth data, the method further comprising:
obtaining a third rotation pose according to the depth data and the image sequence; the third rotation pose includes a third rotation vector;
correcting the third rotation pose according to the first rotation pose to obtain a second corrected rotation pose;
and obtaining a third corrected rotation pose according to the first corrected rotation pose and the second corrected rotation pose.
8. The method of claim 6 or 7, wherein said correcting said third rotation pose according to said first rotation pose comprises:
determining an outer product of the first rotation vector and the third rotation vector to obtain a second error vector;
and carrying out compensation processing on the third rotation vector according to the second error vector.
9. An electronic device comprising a processor, a memory and a communication interface, the memory coupled to the communication interface, the memory for storing computer program code, the computer program code comprising computer instructions, the processor for invoking the computer instructions to implement the method of any of claims 1-8.
10. A computer readable storage medium comprising instructions which, when run on an electronic device, cause the electronic device to perform the method of any of claims 1-8.
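As an illustration of the correction described in claims 1-3, the following minimal sketch shows one way the first translational pose (derived from event camera data) could be blended with the second translational pose (derived from the image sequence) using a scene-dependent correction weight, followed by temporal smoothing and per-frame offset computation. The function names, the moving-average smoother, and the linear blending formula are assumptions for illustration only and are not taken from the patent's description.

```python
# Minimal sketch of a weighted translational-pose correction (claims 1-3).
# All names and formulas here are illustrative assumptions, not the patented implementation.
import numpy as np

def correct_translation(t_first: np.ndarray, t_second: np.ndarray, w: float) -> np.ndarray:
    """Blend the first translational pose (event camera data) with the second
    translational pose (image sequence) using a correction weight w in [0, 1]."""
    return w * t_first + (1.0 - w) * t_second

def smooth_poses(poses: np.ndarray, window: int = 5) -> np.ndarray:
    """Temporal smoothing of per-frame translations (assumed moving average).
    poses has shape (num_frames, num_axes)."""
    kernel = np.ones(window) / window
    return np.stack(
        [np.convolve(poses[:, k], kernel, mode="same") for k in range(poses.shape[1])],
        axis=1,
    )

def image_deformation_vectors(raw: np.ndarray, smoothed: np.ndarray) -> np.ndarray:
    """Per-frame offsets used to warp each image frame so the output follows
    the smoothed translational path instead of the shaky one."""
    return smoothed - raw
```

Each frame could then be shifted by its deformation vector, which corresponds to the image deformation processing and output of claim 3.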
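Likewise, for the rotation correction of claims 5 and 8, the error vector obtained from the outer (cross) product of two rotation vectors can be fed back as a small compensation. The sketch below assumes 3-D rotation vectors, interprets the "outer product" as the cross product, and uses a proportional feedback gain k; the gain and function name are illustrative assumptions.

```python
# Minimal sketch of rotation-vector compensation (claims 5 and 8), assuming the
# "outer product" of two 3-D rotation vectors is their cross product and that
# compensation is simple proportional feedback with an assumed gain k.
import numpy as np

def compensate_rotation(r_first: np.ndarray, r_measured: np.ndarray, k: float = 0.5) -> np.ndarray:
    """r_first: first rotation vector (event camera data + image sequence).
    r_measured: second rotation vector (IMU) or third rotation vector (depth + images).
    Returns the compensated rotation vector."""
    error = np.cross(r_first, r_measured)  # first/second error vector from the cross product
    return r_measured - k * error          # pull the measured rotation toward the reference
```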
CN202110343723.1A 2021-03-30 2021-03-30 Video anti-shake method and related equipment Active CN115150542B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110343723.1A CN115150542B (en) 2021-03-30 2021-03-30 Video anti-shake method and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110343723.1A CN115150542B (en) 2021-03-30 2021-03-30 Video anti-shake method and related equipment

Publications (2)

Publication Number Publication Date
CN115150542A (en) 2022-10-04
CN115150542B (en) 2023-11-14

Family

ID=83403842

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110343723.1A Active CN115150542B (en) 2021-03-30 2021-03-30 Video anti-shake method and related equipment

Country Status (1)

Country Link
CN (1) CN115150542B (en)


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5234119B2 (en) * 2011-01-20 2013-07-10 カシオ計算機株式会社 Imaging apparatus, imaging processing method, and program
WO2018175621A1 (en) * 2017-03-23 2018-09-27 Ring Inc. Audio/video recording and communication devices with multiple cameras having variable capture settings

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014027551A1 (en) * 2012-08-16 2014-02-20 ソニー株式会社 Image processing device, image processing method, and program
CN105339841A (en) * 2013-12-06 2016-02-17 华为终端有限公司 Photographing method for dual-camera device and dual-camera device
JP2017055315A (en) * 2015-09-11 2017-03-16 キヤノン株式会社 Image processor, imaging apparatus, image processing method, program, and storage medium
CN111355888A (en) * 2020-03-06 2020-06-30 Oppo广东移动通信有限公司 Video shooting method and device, storage medium and terminal

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
橙子 (Chengzi). "Five-axis anti-shake" and "40 megapixels": a trial of the Olympus OM-D E-M5 Mark II. China Photography (中国摄影), 2015, No. 05, full text. *

Also Published As

Publication number Publication date
CN115150542A (en) 2022-10-04

Similar Documents

Publication Publication Date Title
US11785329B2 (en) Camera switching method for terminal, and terminal
US11669242B2 (en) Screenshot method and electronic device
WO2021136050A1 (en) Image photographing method and related apparatus
CN114650363B (en) Image display method and electronic equipment
WO2021078001A1 (en) Image enhancement method and apparatus
US20230364510A1 (en) Image prediction method, electronic device, and storage medium
CN113542580B (en) Method and device for removing light spots of glasses and electronic equipment
CN113660408B (en) Anti-shake method and device for video shooting
CN110248037B (en) Identity document scanning method and device
CN111741284A (en) Image processing apparatus and method
CN114365482A (en) Large aperture blurring method based on Dual Camera + TOF
CN112700377A (en) Image floodlight processing method and device and storage medium
CN110138999B (en) Certificate scanning method and device for mobile terminal
CN114140365B (en) Event frame-based feature point matching method and electronic equipment
WO2022156473A1 (en) Video playing method and electronic device
WO2022105702A1 (en) Method and electronic device for saving image
CN115967851A (en) Quick photographing method, electronic device and computer readable storage medium
CN114449151A (en) Image processing method and related device
EP4175285A1 (en) Method for determining recommended scene, and electronic device
CN114283195B (en) Method for generating dynamic image, electronic device and readable storage medium
WO2022033344A1 (en) Video stabilization method, and terminal device and computer-readable storage medium
CN113286076B (en) Shooting method and related equipment
CN114812381B (en) Positioning method of electronic equipment and electronic equipment
CN115032640A (en) Gesture recognition method and terminal equipment
CN115150542B (en) Video anti-shake method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant