WO2022227040A1

WO2022227040A1 - Video stability augmentation method, imaging apparatus, handheld gimbal, movable platform and storage medium

Info

Publication number: WO2022227040A1
Application number: PCT/CN2021/091620
Authority: WO
Inventors: 李路; 唐克坦; 李广; 朱传杰; 邹文; 于雄飞
Original assignee: 深圳市大疆创新科技有限公司
Priority date: 2021-04-30
Filing date: 2021-04-30
Publication date: 2022-11-03

Abstract

A video stability augmentation method, an imaging apparatus, a handheld gimbal, a movable platform and a storage medium. The method comprises: acquiring image sequences, which are collected by an imaging apparatus; identifying, on the basis of a pose sensor, an image jitter in the image sequences that is caused by the high-frequency jitter of the imaging apparatus; identifying, on the basis of the relative movement between the image sequences, the image jitter in the image sequences that is caused by the low-frequency shaking of the imaging apparatus; performing image stability augmentation on the image sequences according to the high-frequency jitter and the low-frequency shake, so as to eliminate the image jitter from the image sequences that is caused by the high-frequency jitter and the low-frequency shake of the imaging apparatus. By means of the present embodiment, the complete stability of an image sequence is realized.

Description

Video stabilization method, imaging device, handheld pan/tilt, movable platform and storage medium

Technical field

The present application relates to the technical field of image processing, and in particular, to a video stabilization method, an imaging device, a handheld pan/tilt, a movable platform, and a storage medium.

Background

At present, most imaging devices (such as motion cameras) are equipped with an Electronic Image Stabilization (EIS) algorithm. Because EIS performs stabilization processing on the videos (or images) obtained by the imaging device, users can watch a relatively smooth video picture. The electronic anti-shake algorithm mainly refers to a technique that forcibly increases the photosensitive parameters of the photosensitive element (charge-coupled device, CCD) on the imaging device to speed up the shutter, analyzes the image obtained on the CCD, and then uses the edge of the image to compensate for shake. However, a video or image that has undergone electronic anti-shake processing may still exhibit picture shake.

Summary of the invention

In view of this, one of the objectives of the present application is to provide a video stabilization method, an imaging device, a handheld pan/tilt, a movable platform and a storage medium.

In a first aspect, an embodiment of the present application provides a video stabilization method, including:

acquiring an image sequence collected by an imaging device;

identifying, based on an attitude sensor, picture shake in the image sequence caused by high-frequency jitter of the imaging device;

identifying, based on relative motion between the images in the image sequence, picture shake in the image sequence caused by low-frequency shaking of the imaging device; and

performing image stabilization on the image sequence according to the high-frequency jitter and the low-frequency shaking, so as to eliminate the picture shake in the image sequence caused by the high-frequency jitter and the low-frequency shaking of the imaging device.

In a second aspect, an embodiment of the present application provides an imaging device, including an image sensor and one or more processors;

the image sensor is used to acquire a sequence of images;

the one or more processors are individually or collectively configured to perform the method according to the first aspect.

In a third aspect, an embodiment of the present application provides a handheld pan/tilt head, including an attitude sensor and the imaging device according to the second aspect; wherein, the attitude sensor is used to collect attitude data of the imaging device.

In a fourth aspect, an embodiment of the present application provides a movable platform, including:

a body;

a power system, mounted on the body, for driving the movable platform to move;

the imaging device according to the second aspect; and

an attitude sensor, mounted on the body, for collecting attitude data of the imaging device.

In a fifth aspect, an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores executable instructions, and when the executable instructions are executed by a processor, the method according to the first aspect is implemented.

The video stabilization method provided by the embodiments of the present application takes into account the picture shake in the image sequence caused by low-frequency shaking of the imaging device. In addition to performing image stabilization using the identified high-frequency jitter, the method further identifies the low-frequency shaking based on the relative motion between the images in the image sequence and uses the identified low-frequency shaking to perform image stabilization. The picture shake caused by both the high-frequency jitter and the low-frequency shaking of the imaging device is thereby eliminated from the image sequence, and complete stabilization of the image sequence is achieved.

Description of drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the following briefly introduces the drawings that are used in the description of the embodiments. Obviously, the drawings in the following description are only some embodiments of the present application. For those of ordinary skill in the art, other drawings can also be obtained from these drawings without creative labor.

FIG. 1 and FIG. 6 are different schematic diagrams of the pan/tilt provided by the embodiment of the present application;

FIG. 2 and FIG. 7 are schematic diagrams of a movable platform provided by an embodiment of the present application;

FIG. 3 and FIG. 4 are different schematic flowcharts of the video stabilization method provided by the embodiment of the present application;

FIG. 5 is a schematic structural diagram of an imaging device provided by an embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, but not all of the embodiments.

The inventors found that, in the related art, a video or image that has undergone electronic anti-shake processing still exhibits picture shake, for two reasons. First, EIS eliminates picture shake based on the data collected by the attitude sensor, but the accuracy of the attitude sensor is limited and it cannot collect attitude data for every kind of shaking of the imaging device. For example, the attitude sensor can collect attitude data about high-frequency jitter of the imaging device, but cannot collect attitude data about low-frequency shaking of the imaging device. Second, in order to eliminate noise, after a frame of image is exposed, the EIS algorithm applies low-pass filtering to the attitude data of the imaging device collected by the attitude sensor over a period of time before the exposure time. The attitude data related to the low-frequency shaking of the imaging device is also filtered out at this stage. In both cases, the video or image after electronic anti-shake processing still exhibits picture shake.
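One possible reading of the second reason can be illustrated numerically. The sketch below is an assumption for illustration, not the EIS algorithm of the related art: it treats the low-pass filtered attitude as the path the stabilizer keeps and removes only the deviation from that path. The trace, the window length and all names are hypothetical.

```python
import math

def moving_average(signal, window):
    """Centered moving average; the window is truncated at both ends."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed

# Hypothetical attitude trace (arbitrary units): slow drift plus fast jitter.
n = 200
drift = [0.01 * i for i in range(n)]                              # low-frequency shaking
jitter = [0.5 * math.sin(2 * math.pi * i / 3) for i in range(n)]  # high-frequency jitter
raw = [d + j for d, j in zip(drift, jitter)]

smoothed = moving_average(raw, 9)                    # low-pass filtered attitude
correction = [r - s for r, s in zip(raw, smoothed)]  # what this stabilizer removes
stabilized = [r - c for r, c in zip(raw, correction)]

# Away from the ends, the jitter is removed but the drift is left untouched.
interior = range(4, n - 4)
residual_jitter = max(abs(stabilized[i] - drift[i]) for i in interior)
remaining_drift = stabilized[n - 5] - stabilized[4]
```

Under these assumptions the correction signal carries essentially no low-frequency content, so the slow drift survives stabilization and has to be recovered from another source, such as the images themselves.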

The high-frequency jitter can be regarded as the imaging device moving back and forth around a certain position, with a high frequency of occurrence per unit time. The low-frequency shaking can be regarded as the imaging device shifting in a certain direction, with a low frequency of occurrence per unit time; it usually occurs unintentionally (or unconsciously).

In view of the problems in the related art, an embodiment of the present application provides a video stabilization method. After an image sequence collected by an imaging device is acquired, picture shake in the image sequence caused by high-frequency jitter of the imaging device is identified based on an attitude sensor, and picture shake in the image sequence caused by low-frequency shaking of the imaging device is identified based on the relative motion between the images in the image sequence. Image stabilization is then performed on the image sequence according to the high-frequency jitter and the low-frequency shaking, so as to eliminate the picture shake caused by both. In this embodiment, the picture shake caused by low-frequency shaking of the imaging device is taken into account: in addition to performing image stabilization using the identified high-frequency jitter, the low-frequency shaking is identified based on the relative motion between the images and is also used for image stabilization, so that complete stabilization of the image sequence is achieved.

The video stabilization method may be applied to an imaging device, and the imaging device may be a physical imaging device. The imaging device may be configured to detect electromagnetic radiation (e.g., visible light, infrared light, and/or ultraviolet light) and generate image data based on the detected electromagnetic radiation. Imaging devices may include charge-coupled device (CCD) sensors or complementary metal oxide semiconductor (CMOS) sensors that generate electrical signals in response to wavelengths of light. The resulting electrical signals can be processed to generate image data. The image data generated by the imaging device may include one or more images, which may be still images (e.g., photographs), moving images (e.g., videos), or a suitable combination thereof. Image data may be multi-color (e.g., RGB, CMYK, HSV) or monochrome (e.g., grayscale, black and white, sepia). The imaging device may include a lens configured to direct light onto the image sensor.

The imaging device may be a camera. The camera may be a motion camera or a video camera that captures moving image data (e.g., video). The camera may be a still camera that captures still images (e.g., photographs). The camera can capture both moving image data and still images. The camera can switch between capturing moving image data and still images. While certain embodiments provided herein are described in the context of a camera, it is to be understood that the present application may apply to any suitable imaging device, and that any description of a camera herein may also apply to other types of imaging devices. A camera may be used to generate multiple 2D images of a 3D scene (e.g., an environment, one or more objects, etc.). These images generated by the camera can represent the projection of the 3D scene onto the 2D image plane. Therefore, each point in the 2D image corresponds to a coordinate in 3D space in the scene. A camera may include optical elements (e.g., lenses, mirrors, filters, etc.). Cameras can capture color images, grayscale images, infrared images, and more. The camera may be a thermal imaging device when it is configured to capture infrared images.

The imaging device may capture an image or series of images with a particular image resolution. In some embodiments, the image resolution may be defined by the number of pixels in the image. In some embodiments, the image resolution may be greater than or equal to about 352x420 pixels, or 720x480 pixels, or the like. In some embodiments, the camera may be a 4K camera or a camera with a higher resolution.

The imaging device may capture a series of images at a particular capture rate. In some embodiments, the series of images can be captured at a standard video frame rate, such as about 24p, 30p, 48p, 72p, 120p, 50i, or 60i. In some embodiments, the series of images can be captured at a rate of less than or equal to about one image every 0.0001 seconds, 0.002 seconds, 0.01 seconds, 0.1 seconds, 5 seconds, or 10 seconds. In some embodiments, the capture rate may vary based on user input and/or external conditions (e.g., rain, snow, wind, subtle surface textures of the environment).

The imaging device may have a number of adjustable parameters. Imaging devices may capture different images with different parameters when subjected to the same external conditions (e.g., location, lighting). Adjustable parameters may include exposure (e.g., exposure time, shutter speed, aperture, film speed), gain, brightening factor (gamma), region of interest, binning/subsampling, pixel clock, offset, trigger, ISO, and the like. Exposure-related parameters can control the amount of light reaching an image sensor in an imaging device. For example, shutter speed controls the amount of time light hits the image sensor, while aperture controls the amount of light that reaches the image sensor in a given time. A gain-related parameter can control the amplification of the signal from the optical sensor. ISO controls the level of sensitivity of the camera to the available light.

In some embodiments, the imaging device may be a handheld device, or the imaging device may be mounted on a movable platform. Exemplarily, in a handheld scenario, the imaging device may be a handheld camera, or the imaging device may be installed on a handheld pan/tilt; for example, the imaging device may be detachably connected to the handheld pan/tilt, or the imaging device may be integrally formed with the handheld pan/tilt, which is not limited in this embodiment.

Wherein, the movable platform may be a self-propelled vehicle. The vehicle may traverse the environment by means of one or more propulsion units. The vehicle may be an air vehicle, a land vehicle, a water vehicle or a space vehicle. The vehicle may be an unmanned vehicle. The vehicle may be able to traverse the environment without a human occupant on it. Alternatively, the vehicle may carry a human occupant. Exemplarily, the movable platform includes, but is not limited to, an unmanned aerial vehicle (UAV), an unmanned vehicle, an unmanned vessel, or a mobile robot, and the like.

In an exemplary application scenario, the imaging device 10 may be a handheld device; for example, the imaging device 10 may be a handheld camera, or (as shown in FIG. 1) the imaging device 10 may be mounted on a handheld pan/tilt 20. The handheld camera or the handheld pan/tilt 20 may be equipped with an attitude sensor, which can collect attitude data corresponding to the high-frequency jitter of the imaging device 10 over a short period of time; the attitude data may be used to eliminate the picture shake in the image sequence caused by the high-frequency jitter of the imaging device 10. In addition, when the user holds the handheld device to capture images or videos, the part holding the device (such as the hand) may shake unintentionally at a low frequency, causing the imaging device 10 to shift and the image sequence it captures to exhibit picture shake. Therefore, after capturing the image sequence, the imaging device 10 can identify, according to the relative motion between the images in the image sequence, the picture shake caused by its low-frequency shaking, and then eliminate that picture shake from the image sequence.

In another exemplary application scenario, as shown in FIG. 2 (FIG. 2 takes a drone as an example), the imaging device 10 may be installed on a movable platform 30; for example, the imaging device 10 is fixedly installed on the movable platform 30, or the imaging device 10 is mounted on the movable platform 30 through a pan/tilt. The movable platform 30 may be equipped with an attitude sensor, which can collect attitude data corresponding to the high-frequency jitter of the imaging device 10 over a short period of time; the attitude data may be used to eliminate the picture shake in the image sequence caused by the high-frequency jitter of the imaging device 10. In addition, while the movable platform 30 is moving, the imaging device 10 may shake unintentionally at a low frequency because of the operation of the power system of the movable platform or because of natural factors (such as wind or rain), or the imaging device 10 may be shifted by unintentional low-frequency shaking of the pan/tilt, so that the image sequence captured by the imaging device 10 exhibits picture shake. Therefore, after capturing the image sequence, the imaging device 10 can identify, according to the relative motion between the images in the image sequence, the picture shake caused by its low-frequency shaking, and then eliminate that picture shake from the image sequence.

Next, the video stabilization method provided by the embodiment of the present application will be described: please refer to FIG. 3 , which provides a schematic flowchart of a video stabilization method. The method can be applied to an imaging device, and the method includes:

In step S101, an image sequence acquired by an imaging device is acquired.

In step S102, the image shake in the image sequence caused by the high-frequency shake of the imaging device is identified based on the attitude sensor.

In step S103, picture shake in the image sequence caused by low-frequency shaking of the imaging device is identified based on the relative motion between the images in the image sequence.

In step S104, image stabilization is performed on the image sequence according to the high-frequency jitter and the low-frequency shaking, so as to eliminate the picture shake in the image sequence caused by the high-frequency jitter and the low-frequency shaking of the imaging device.

For step S101, the image sequence includes multiple images, and the image sequence may be a sequence composed of images that are being collected by the imaging device, or may be multiple images that have been collected by the imaging device.

There is a preset overlap ratio between adjacent images in the image sequence. The images in the image sequence may be images currently collected by the imaging device, or images collected by the imaging device at some point in the past. After acquiring the image sequence, the imaging device may identify, based on the attitude data of the imaging device collected by the attitude sensor, the picture shake in the image sequence caused by the high-frequency jitter of the imaging device, and may identify, based on the relative motion between the images in the image sequence, the picture shake caused by the low-frequency shaking of the imaging device. It can be understood that this embodiment does not impose any restriction on the order of step S102 and step S103. For example, when processing resources are sufficient, the imaging device may execute steps S102 and S103 in parallel; when processing resources are insufficient, step S102 may be performed before step S103, or step S103 before step S102.

The high-frequency jitter may include jitter caused by high-frequency rotation and/or jitter caused by high-frequency translation; that is, the imaging device may rotate or translate at high frequency per unit time, and this high-frequency jitter can be captured by the attitude sensor. The attitude sensor may be any combination of an accelerometer, a gyroscope, a gravity detection sensor, an inertial measurement unit (IMU) and/or a compass (for example, an electronic compass); the attitude sensor is used to collect attitude data of the imaging device.

The low-frequency shaking includes shaking caused by low-frequency translation; that is, the imaging device may undergo low-frequency translation in a certain direction. Such low-frequency shaking either cannot be captured by the attitude sensor or may be filtered out in the data processing stage, so this embodiment captures it through the relative motion between the images in the image sequence.

In some embodiments, the images in the image sequence have a first offset corresponding to the low-frequency shaking and a second offset corresponding to the high-frequency jitter, and the imaging device can perform stabilization processing on the images in the image sequence according to the first offset and the second offset, so as to eliminate the picture shake caused by the high-frequency jitter and the low-frequency shaking of the imaging device and achieve full stabilization of the images.
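The text does not specify how the two offsets are combined. As a minimal sketch, assume both are integer pixel translations that simply add, and that stabilization is done by moving a crop window inside a larger frame so that the window follows the scene content. The function name, the list-of-rows image representation and the additive model are assumptions made for illustration.

```python
def stabilized_crop(frame, crop_w, crop_h, first_offset, second_offset):
    """Return a crop of `frame` whose window is moved by the combined offset.

    frame: list of rows (each a list of pixel values).
    first_offset / second_offset: (dx, dy) in pixels, assumed additive.
    """
    height, width = len(frame), len(frame[0])
    dx = first_offset[0] + second_offset[0]
    dy = first_offset[1] + second_offset[1]

    # The nominal window is centered; shift it with the scene content.
    x0 = (width - crop_w) // 2 + dx
    y0 = (height - crop_h) // 2 + dy

    # Clamp so the window never leaves the frame (the margin is finite).
    x0 = max(0, min(width - crop_w, x0))
    y0 = max(0, min(height - crop_h, y0))
    return [row[x0:x0 + crop_w] for row in frame[y0:y0 + crop_h]]


# Usage: a 6x6 frame whose pixel value encodes its own coordinates (10*y + x).
frame = [[10 * y + x for x in range(6)] for y in range(6)]
window = stabilized_crop(frame, 2, 2, first_offset=(1, 0), second_offset=(0, 1))
```

The clamp reflects a practical limit of crop-based stabilization: offsets larger than the spare margin around the window can only be partially compensated.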

In some embodiments, it is considered that the higher the zoom factor of the imaging device, the larger the pixel shift in the image sequence caused by the low-frequency shaking of the imaging device, and the lower the zoom factor, the smaller the pixel shift. Based on the principle of human vision, if the shift is only a small number of pixels, the human eye may not be very sensitive to it. Therefore, when the processing resources of the imaging device are limited, the picture shake in the image sequence caused by the low-frequency shaking of the imaging device may be ignored in low-magnification scenes. Only in a high-magnification scene, that is, when the current zoom factor of the imaging device is higher than a preset factor, does the larger pixel shift affect the user's visual perception; in that case the imaging device identifies the picture shake in the image sequence caused by the low-frequency shaking based on the relative motion between the images in the image sequence and then eliminates it, that is, corrects the pixel offset in the image, thereby improving the user's visual perception.

In some embodiments, it is considered that in some special scenarios the picture shift in the image sequence attributed to low-frequency shaking of the imaging device arises while the imaging device is being operated deliberately, for example while the imaging device is zooming, or while the orientation of the imaging device is being changed (or while the imaging device is being controlled to move). In these scenarios, the picture shift is not produced unintentionally but results from the imaging device deliberately changing its shooting parameters (zoom, orientation, etc.). The imaging device therefore may not identify picture shake in the image sequence caused by low-frequency shaking, so as to ensure the correct presentation of the captured images.

In addition, if the imaging device is in a target following mode, that is, when the image sequence collected by the imaging device is used for target tracking, the tracked target is usually a moving target, and the imaging device usually needs to change its orientation, zoom, and so on. In this case, the pixel shift between images that would otherwise be attributed to low-frequency shaking and/or high-frequency jitter of the imaging device is caused by following the target. The image sequence may therefore not be stabilized, thereby ensuring the correct presentation of the acquired images.
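The conditions described in the preceding embodiments can be summarized as a small decision function. This is only a sketch: the field names, the default threshold of 2.0 and the choice to keep high-frequency stabilization active during deliberate zooming or re-orientation are assumptions, since the text only states when low-frequency identification may be skipped and when no stabilization is performed.

```python
from dataclasses import dataclass


@dataclass
class CaptureState:
    """Hypothetical snapshot of the imaging device's operating state."""
    zoom_factor: float
    is_zooming: bool = False        # zoom is being changed deliberately
    is_reorienting: bool = False    # orientation/position is being changed deliberately
    target_following: bool = False  # image sequence is used for target tracking


def stabilization_plan(state, zoom_threshold=2.0):
    """Return (use_high_frequency, use_low_frequency) for the current state."""
    if state.target_following:
        # Pixel shifts come from following the target: do not stabilize.
        return (False, False)
    if state.is_zooming or state.is_reorienting:
        # The slow picture shift is intentional: skip low-frequency identification.
        # Keeping high-frequency stabilization here is an assumption.
        return (True, False)
    # Low-frequency correction only pays off above the preset zoom factor.
    return (True, state.zoom_factor > zoom_threshold)
```

For example, `stabilization_plan(CaptureState(zoom_factor=5.0))` enables both corrections, while the same zoom factor with `is_zooming=True` disables only the low-frequency one.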

In some embodiments, after the picture shake in the image sequence caused by the low-frequency shaking of the imaging device is identified based on the relative motion between the images in the image sequence, the imaging device may be controlled to correct its own attitude according to the low-frequency shaking. For example, as mentioned above, the pixels in the image are offset by the low-frequency shaking; the imaging device can obtain the first offset corresponding to the low-frequency shaking and then be controlled to move according to the first offset. In this embodiment, the attitude of the imaging device is corrected using the first offset caused by the low-frequency shaking, so that the imaging device is more stable, which helps ensure stable acquisition of the image sequence.

In some scenarios, the imaging device is mounted on a pan/tilt (gimbal); for example, the imaging device is mounted on a handheld gimbal, or the imaging device is mounted on a movable platform through the gimbal. In this case, the attitude of the imaging device can be corrected by adjusting the attitude of the gimbal; that is, the gimbal can be controlled to correct its attitude according to the low-frequency shaking. For example, the first offset corresponding to the low-frequency shaking can be obtained, and the movement of the gimbal can then be controlled according to the first offset. More specifically, the first offset can be converted into a data format that the gimbal can read (such as a quaternion) and sent to the gimbal, so that the gimbal moves its own axes according to the converted first offset to carry out the attitude correction. The gimbal and the imaging device are thus more stable, which helps ensure stable acquisition of the image sequence.
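The conversion itself is not described in the text. One plausible sketch, under a pinhole camera assumption, turns the pixel offset into small yaw and pitch angles using the focal length in pixels and composes them into a unit quaternion. The axis assignment (yaw about y, pitch about x), the sign convention and all names are hypothetical; a real gimbal defines its own conventions, and the command it needs may be the inverse rotation.

```python
import math


def quat_multiply(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def offset_to_quaternion(dx, dy, focal_px):
    """Map a pixel offset (dx, dy) to a rotation quaternion (w, x, y, z).

    Assumes a pinhole model: a shift of dx pixels corresponds to a yaw of
    atan(dx / focal_px), and dy to a pitch of atan(dy / focal_px).
    """
    yaw = math.atan2(dx, focal_px)
    pitch = math.atan2(dy, focal_px)
    q_yaw = (math.cos(yaw / 2), 0.0, math.sin(yaw / 2), 0.0)        # about y
    q_pitch = (math.cos(pitch / 2), math.sin(pitch / 2), 0.0, 0.0)  # about x
    return quat_multiply(q_yaw, q_pitch)
```

A zero offset maps to the identity quaternion `(1, 0, 0, 0)`, and the result is always unit length because it is a product of two unit quaternions.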

In some embodiments, the pixels of the images in the image sequence have a first offset caused by the low-frequency shaking and a second offset caused by the high-frequency jitter. Please refer to FIG. 4, which is another schematic flowchart of the video stabilization method provided by an embodiment of the present application. The method can be executed by an imaging device, and the method includes:

In step S201, an image sequence acquired by an imaging device is acquired.

In step S202, for an image in the image sequence, motion estimation is performed on the image according to a preset key frame, and a first offset of the image relative to the key frame caused by the low-frequency shaking of the imaging device is determined.

In step S203, for an image in the image sequence, electronic anti-shake processing is performed on the image based on the attitude data of the imaging device collected by the attitude sensor, and a second offset of the image caused by the high-frequency jitter of the imaging device is obtained.

In step S204, stabilization processing is performed on the images in the image sequence according to the first offset and the second offset.

For step S202, after acquiring the image sequence, the imaging device may perform motion estimation on the image according to a preset key frame and determine a first offset of the image relative to the key frame. In other words, the motion offset of the image (that is, the first offset) is calculated using the global motion vector (GMV) in motion estimation (ME), so that stabilization processing is performed on the image according to the first offset. This improves the stability of the image and eliminates the picture shake caused by the low-frequency shaking.

In the case where there is no key frame, the image can be used as the key frame, and the first offset of the image relative to the key frame is set to 0; in the case where there is a key frame, the imaging device determines the first offset of the image relative to the key frame through a motion estimation process.

In some embodiments, in the process of acquiring the first offset, the imaging device may first acquire at least one key point in the key frame, the key point being located at a first position in the key frame, then determine a second position of the at least one key point in the image, and finally determine the first offset of the image relative to the key frame according to the difference between the first position and the second position.
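The key-frame handling and the position difference can be sketched as follows. The text only says the first offset is determined from the difference between the first and second positions; aggregating the per-key-point differences with a median (to tolerate a few bad matches) is an assumption, as are the function names.

```python
from statistics import median


def first_offset(first_positions, second_positions):
    """Estimate one (dx, dy) offset from matched key point positions.

    first_positions: (x, y) of each key point in the key frame.
    second_positions: (x, y) of the same key points in the current image.
    """
    if not first_positions:
        return (0.0, 0.0)
    pairs = list(zip(first_positions, second_positions))
    dxs = [x2 - x1 for (x1, _), (x2, _) in pairs]
    dys = [y2 - y1 for (_, y1), (_, y2) in pairs]
    # Median is an assumed aggregation; it ignores isolated mismatches.
    return (median(dxs), median(dys))


def offset_against_key_frame(key_points, image_points):
    """Return (key points to keep, first offset) for the current image.

    key_points is None when no key frame exists yet; the image then becomes
    the key frame and its first offset is 0.
    """
    if key_points is None:
        return image_points, (0.0, 0.0)
    return key_points, first_offset(key_points, image_points)


# Usage: two consistent matches and one outlier.
key = [(10, 10), (20, 10), (30, 40)]
img = [(12, 9), (22, 9), (90, 90)]
kept, offset = offset_against_key_frame(key, img)
```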

The first offset may include an offset along a specified direction; for example, the specified direction is at least one of the horizontal direction or the vertical direction in the image coordinate system. Exemplarily, the first offset may be represented by coordinate mapping tables: the imaging device may obtain one coordinate mapping table describing the coordinate transformation of the pixels in the image in the horizontal direction and another coordinate mapping table describing the coordinate transformation of the pixels in the image in the vertical direction.
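A pair of coordinate mapping tables can be pictured as two arrays of the same size as the image, one holding the source x coordinate and one the source y coordinate for every output pixel. The sketch below builds such tables for a pure translation and applies them with nearest-neighbour sampling; the table layout, the fill value and the function names are assumptions for illustration.

```python
def build_mapping_tables(width, height, offset):
    """Build per-pixel source coordinates for a translation by `offset`.

    map_x[y][x] / map_y[y][x] give where output pixel (x, y) is read from.
    """
    dx, dy = offset
    map_x = [[x + dx for x in range(width)] for _ in range(height)]
    map_y = [[y + dy for _ in range(width)] for y in range(height)]
    return map_x, map_y


def remap(image, map_x, map_y, fill=0):
    """Resample `image` through the two tables (nearest neighbour)."""
    height, width = len(image), len(image[0])
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            sx, sy = round(map_x[y][x]), round(map_y[y][x])
            if 0 <= sx < width and 0 <= sy < height:
                out[y][x] = image[sy][sx]
    return out


# Usage: undo a shift of one pixel to the right.
image = [[1, 2, 3],
         [4, 5, 6]]
map_x, map_y = build_mapping_tables(3, 2, (1, 0))
corrected = remap(image, map_x, map_y)
```

Keeping the horizontal and vertical tables separate means either direction can be corrected on its own, which matches the "at least one of" wording above.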

Wherein, at least one key point in the key frame can be obtained in the following ways:

Exemplarily, in the process of acquiring the key points, the key frame can be divided into several non-overlapping image blocks; then, for each image block, the key point is determined from the center point of the image block and/or at least one feature point with the largest gradient in the image block. For example, both the center point of the image block and the at least one feature point with the largest gradient in the image block can be used as key points; or one of the center point and the at least one feature point can be randomly selected as the key point; or the feature point with the largest gradient in the image block can be used as the key point, to improve the accuracy of the key point. After at least one key point of the key frame is determined, the imaging device records the position information of the first position of the key point in the key frame; for example, the position information includes the horizontal and vertical coordinates of the key point in the key frame.

Exemplarily, feature point extraction may be performed on the key frame, for example using the FAST, SIFT, SURF, SUSAN or Harris detection algorithm, and the extracted feature points are used as the key points. The imaging device records the position information of the first position of the key point based on the position of the feature point in the key frame.

Exemplarily, the key point may be obtained by uniform sampling in the key frame, and the imaging device records position information of the first position of the uniformly sampled key point in the key frame.
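Two of the options above, the block-wise largest-gradient point and uniform sampling, can be sketched without any external library. The gradient measure (squared central difference with clamped borders), the tie-breaking rule and the half-step grid origin are assumptions; the detector-based option is omitted because it would depend on a specific vision library.

```python
def squared_gradient(image, x, y):
    """Squared central-difference gradient at (x, y); borders are clamped."""
    height, width = len(image), len(image[0])
    gx = image[y][min(x + 1, width - 1)] - image[y][max(x - 1, 0)]
    gy = image[min(y + 1, height - 1)][x] - image[max(y - 1, 0)][x]
    return gx * gx + gy * gy


def block_key_points(image, block):
    """One key point per non-overlapping block: the largest-gradient pixel.

    Ties go to the first pixel in row-major order. Partial blocks at the
    right and bottom edges are skipped.
    """
    height, width = len(image), len(image[0])
    points = []
    for by in range(0, height - block + 1, block):
        for bx in range(0, width - block + 1, block):
            candidates = [(x, y)
                          for y in range(by, by + block)
                          for x in range(bx, bx + block)]
            points.append(max(candidates,
                              key=lambda p: squared_gradient(image, p[0], p[1])))
    return points


def uniform_key_points(width, height, step):
    """Key points on a regular grid, starting half a step from the border."""
    start = step // 2
    return [(x, y)
            for y in range(start, height, step)
            for x in range(start, width, step)]
```

In both cases the returned `(x, y)` pairs are the first positions that would be recorded for the key frame.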

In some embodiments, after acquiring at least one key point in the key frame, the imaging device may acquire first feature information of each of the at least one key point in the key frame. It can be understood that the first feature information is information that can characterize the key point. This embodiment of the present application does not impose any restriction on the specific content of the first feature information, which can be set according to the actual application scenario.

Exemplarily, the first feature information may be gradient information or color information of the key point, or the like.

Exemplarily, the first feature information may include a feature vector obtained by the imaging device projecting the key point, for example by performing a horizontal projection or a vertical projection on the key point. Specifically, the first feature information may be a feature vector obtained by projecting a preset area including the key point along a specified direction. In an example, taking the specified direction as the horizontal direction and/or the vertical direction in the image coordinate system, a one-dimensional feature vector obtained by projecting the preset area centered on the key point in the horizontal direction and/or a one-dimensional feature vector obtained by projecting it in the vertical direction may be obtained. The shape and size of the preset area can be set according to the actual application scenario; for example, the preset area is a rectangular area.
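The projection can be sketched for a rectangular preset area as follows. The text does not define which sum is called horizontal and which vertical; here, as an assumption, the horizontal projection sums each row across x (one value per row) and the vertical projection sums each column across y (one value per column). The half-width and half-height parameters are hypothetical.

```python
def projection_features(image, key_point, half_w, half_h):
    """Project a rectangle centered on key_point into two 1-D vectors.

    Returns (horizontal, vertical):
      horizontal[i] = sum of row i of the rectangle (assumed convention)
      vertical[j]   = sum of column j of the rectangle
    The rectangle is clipped to the image bounds.
    """
    x0, y0 = key_point
    height, width = len(image), len(image[0])
    xs = range(max(0, x0 - half_w), min(width, x0 + half_w + 1))
    ys = range(max(0, y0 - half_h), min(height, y0 + half_h + 1))
    horizontal = [sum(image[y][x] for x in xs) for y in ys]
    vertical = [sum(image[y][x] for y in ys) for x in xs]
    return horizontal, vertical


# Usage on a 3x3 image with the key point at its center.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
horizontal, vertical = projection_features(image, (1, 1), 1, 1)
```

Two short vectors per key point are much cheaper to compare than a full two-dimensional patch, which is presumably the appeal of a projection-based descriptor.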

In addition, if the image sequence is collected under strongly varying illumination, the pixel values of the same key point may differ between image frames, which may cause matching to fail. Therefore, in order to avoid or reduce the influence of illumination changes, after the first feature information of the key points is obtained, it may be mapped to a preset range. The preset range is used to make the brightness variation range of each key point the same, which helps remove the influence of illumination changes and suits more complex scenes.

Exemplarily, take the case where the first feature information includes the feature vector obtained by the imaging device projecting the key point: in order to avoid or reduce the influence of illumination changes, the feature vector obtained by projecting the key point may be mapped to the preset range. In one example, the mean value of the feature vectors obtained by projecting all key points may be obtained, and the feature vector mapped to the preset range is the difference between the feature vector obtained by projecting each key point and that mean value. In another example, the feature vectors obtained by projecting the key points may be normalized; for example, the standard deviation of the feature vectors obtained by projecting all key points may be obtained, and the feature vector mapped to the preset range is the ratio of the feature vector obtained by projecting each key point to that standard deviation.
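The two mappings above can be illustrated together. The sketch below assumes that the mean and standard deviation are scalars computed over every entry of every key point's feature vector, and it applies both steps (mean subtraction, then division by the standard deviation) in sequence; the text presents them as alternative examples, so combining them is an assumption. The name `normalize_features` is hypothetical.

```python
import statistics


def normalize_features(vectors):
    """Map feature vectors to a common range.

    Subtracts the global mean and divides by the global (population)
    standard deviation, so a uniform gain or bias in brightness cancels.
    """
    values = [v for vec in vectors for v in vec]
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    scale = std if std > 0 else 1.0  # avoid division by zero on flat input
    return [[(v - mean) / scale for v in vec] for vec in vectors]
```

With this mapping, feature vectors taken from a frame and from a uniformly brighter copy of that frame come out the same, which is the property the matching step relies on.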

In some embodiments, in the process of acquiring the first offset, the second position in the image of at least one key point of the key frame needs to be determined. Specifically, the imaging device may first estimate the second position in the image of the at least one key point using a preset image conversion relationship. Considering that the difference between the image and the previous frame is usually small, the second position of the key point in the image is likely to be close to its position in the previous frame; therefore, the preset image conversion relationship may be determined according to the positional relationship between the first position of the key point in the key frame and the position of the key point in the previous frame. In addition, when the image itself is used as the key frame, the preset image conversion relationship is an identity matrix.
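A minimal sketch of this estimation step, assuming the image conversion relationship is a 3x3 matrix acting on homogeneous pixel coordinates (the text only says it is a matrix that reduces to the identity for a key frame). The names `apply_transform` and `IDENTITY` are hypothetical.

```python
IDENTITY = [[1, 0, 0],
            [0, 1, 0],
            [0, 0, 1]]


def apply_transform(matrix, point):
    """Map a first position (x, y) to an estimated second position."""
    x, y = point
    xh = matrix[0][0] * x + matrix[0][1] * y + matrix[0][2]
    yh = matrix[1][0] * x + matrix[1][1] * y + matrix[1][2]
    w = matrix[2][0] * x + matrix[2][1] * y + matrix[2][2]
    return (xh / w, yh / w)
```

For a key frame, `apply_transform(IDENTITY, p)` returns `p` unchanged, so the estimated second position equals the first position.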

After acquiring the estimated second position of the at least one key point in the image, the imaging device may acquire, based on the estimated second position, second feature information of the at least one key point under different offsets. For each key point, the second feature information under the different offsets is matched against the first feature information, and the estimated second position is then adjusted by the offset corresponding to the second feature information that matches the first feature information, so as to obtain an accurate second position of the key point in the image.

Exemplarily, the different offsets refer to different distances by which the key point is shifted along a specified direction. For example, the key point may be shifted in turn by a specified number of unit lengths along the horizontal direction and/or the vertical direction, such as 1 unit length, 2 unit lengths, 3 unit lengths, and so on. The different offsets may be different distances by which the key point is shifted along the specified direction within a preset offset range; the preset offset range may be set according to the actual application scenario, and this embodiment does not impose any restriction on it.

In the image, after the key points are shifted by different distances along the specified direction, the imaging device may calculate the second feature information of the shifted key points, that is, the feature information of the key points under the different offsets. The second feature information is obtained in the same manner as the first feature information. For example, the second feature information includes a feature vector obtained by projecting the shifted key point; further, the second feature information includes a feature vector obtained by projecting a preset area containing the shifted key point along a specified direction, where the specified direction includes at least the horizontal direction and/or the vertical direction in image coordinates.

In addition, considering the influence of illumination changes, the second feature information may be mapped to a preset range, where the preset range is used to make the brightness variation range of each key point the same. For example, the feature vector obtained by projecting the shifted key point may be mapped to the preset range. In one example, for a given offset, the mean value of the feature vectors obtained by projecting all the shifted key points may be obtained, and the difference between the feature vector obtained by projecting each shifted key point and that mean value is taken. In another example, the feature vectors obtained by projecting the shifted key points may be normalized; for example, for a given offset, the standard deviation of the feature vectors obtained by projecting all the shifted key points may be obtained, and the ratio of the feature vector obtained by projecting each shifted key point to that standard deviation is taken.

After acquiring the second feature information of the at least one key point under the different offsets, for each key point the imaging device matches the second feature information under each offset against the first feature information by comparing the difference between them. The estimated second position is then adjusted by the offset whose second feature information differs least from the first feature information, so as to obtain a relatively accurate second position of the key point in the image. The imaging device can then determine the first offset of the image relative to the key frame from the positional relationship between the first position of the key point and the adjusted second position, using the principle of coordinate transformation.
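The matching step can be sketched as a search for the offset with the smallest difference. The difference measure is not specified in the text; the sum of absolute differences used here is an assumption, as are the names `sad`, `best_offset` and `refine_position`.

```python
def sad(a, b):
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(p - q) for p, q in zip(a, b))


def best_offset(first_feature, features_by_offset):
    """Pick the (dx, dy) whose second feature is closest to the first.

    `features_by_offset` maps each candidate offset to the second
    feature vector computed at the estimated position plus that offset.
    """
    return min(features_by_offset,
               key=lambda off: sad(first_feature, features_by_offset[off]))


def refine_position(estimated_position, offset):
    """Adjust the estimated second position by the chosen offset."""
    return (estimated_position[0] + offset[0],
            estimated_position[1] + offset[1])
```

For example, if the feature computed one pixel to the right of the estimate reproduces the first feature exactly, `best_offset` returns `(1, 0)` and the second position moves one pixel to the right.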

In some embodiments, after acquiring the relatively accurate second position of the key point in the image, the imaging device may determine the image conversion relationship for the next frame of image according to the positional relationship between the first position of the at least one key point and the adjusted second position.

In order to further improve the accuracy of the image conversion relationship determined for the next frame of image, the imaging device may filter valid key points from the at least one key point, and then determine the image conversion relationship for the next frame of image according to the positional relationship between the first position of each valid key point and its adjusted second position.

Exemplarily, when determining the valid key points, the preset image conversion relationship may be used to determine the third position, in the key frame, of each key point located at its adjusted second position. The displacement of the third position of each key point relative to its first position is then determined. The displacement of one of the key points (for example, a randomly selected key point) is taken as the target displacement, the error value of the displacement of each other key point relative to the target displacement is obtained, and the key points whose error value is less than a preset value are selected from the at least one key point as valid key points. This process of obtaining valid key points may be repeated until the number of iterations reaches a preset number, or until it reaches a required number of iterations niters, defined as follows,

Here, conf denotes the confidence level (for example, 0.99), pix_size is the total number of key points, and num is the number of valid key points.
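The expression for niters is not reproduced in this text. As a hedged illustration only, the sketch below uses the standard RANSAC-style iteration count, niters = log(1 - conf) / log(1 - (num / pix_size)^s), with a sample size s = 1 because a single key point's displacement is chosen as the target in each iteration. Whether the application uses exactly this form is an assumption; the function name `required_iterations` and the parameter `sample_size` are hypothetical.

```python
import math


def required_iterations(conf, num, pix_size, sample_size=1):
    """Iterations needed to draw an all-valid sample with probability conf.

    conf     -- desired confidence, strictly between 0 and 1
    num      -- current number of valid key points
    pix_size -- total number of key points
    """
    if not 0.0 < conf < 1.0:
        raise ValueError("conf must be strictly between 0 and 1")
    if num <= 0 or pix_size <= 0:
        raise ValueError("num and pix_size must be positive")
    inlier_ratio = num / pix_size
    if inlier_ratio >= 1.0:
        return 1  # every key point is valid; one draw suffices
    p_good = inlier_ratio ** sample_size
    return math.ceil(math.log(1.0 - conf) / math.log(1.0 - p_good))
```

Under this assumed form, a higher share of valid key points lowers the required iteration count, so the loop can stop early once most key points agree.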

Of course, the valid key points may also be acquired in other manners, which are not limited in this embodiment.
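The selection of valid key points by target displacement can be sketched as follows. The Euclidean distance as the error value and the choice to keep the largest valid set found across iterations are assumptions not stated in the text; `select_valid_key_points` and its parameters are hypothetical names.

```python
import math
import random


def select_valid_key_points(displacements, max_error, iterations, rng=None):
    """Return indices of key points whose displacement agrees with a target.

    displacements -- list of (dx, dy), third position minus first position
    max_error     -- preset value; errors below it count as valid
    iterations    -- how many random targets to try
    """
    if rng is None:
        rng = random.Random()
    best = []
    for _ in range(iterations):
        tx, ty = rng.choice(displacements)  # randomly chosen target
        valid = [i for i, (dx, dy) in enumerate(displacements)
                 if math.hypot(dx - tx, dy - ty) < max_error]
        if len(valid) > len(best):
            best = valid
    return best
```

Key points whose displacement disagrees with the majority, for instance those lying on a moving object, tend to be excluded because they rarely fall within `max_error` of the target.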

In some embodiments, when the number of valid key points is less than a preset number and/or the first offset of the image relative to the key frame is greater than a preset offset threshold, the image differs substantially from the key frame. In this case the key frame may be replaced: the image is determined as the new key frame, and the first offset of the image relative to the key frame is set to 0.

In some embodiments, after the key frame is replaced, the first offset is abruptly set to 0, so an originally stable picture changes noticeably; the user perceives a sudden jump from one picture to another, which degrades the viewing experience. Therefore, in order to eliminate or reduce the influence of this change, after acquiring the first offset, the imaging device may use the first offset of at least one stabilized image acquired before the key frame to smooth the first offset of the image and obtain a processed first offset of the image. This ensures a smooth transition of the video picture after the key frame is switched and makes the transition more natural.

Exemplarily, in order to reduce the workload, smoothing need not be applied to all images but only to a small number of images following the key frame replacement. For example, the 30 frames after the key frame are smoothed, while images from the 31st frame after the key frame onward are not. In other words, if the image is acquired within a preset time period after the key frame, the first offset of the at least one stabilized image acquired before the key frame may be used to smooth the first offset of the image and obtain the processed first offset of the image.

In order to further ensure a natural transition of the video picture, the difference between the acquisition time of the stabilized image and that of the key frame should be as small as possible; that is, the acquisition time interval between the stabilized image and the key frame is less than a specified duration, which should be as small as possible and may be set according to the actual application scenario. For example, the stabilized image includes the frame immediately preceding the key frame.

Exemplarily, the smoothing processing may include linear filtering; that is, linear filtering may be performed on the first offset of the image using the first offset of the at least one stabilized image acquired before the key frame, to obtain a processed first offset. The processed first offset may then be used to eliminate the picture shake of the image caused by low-frequency shaking of the imaging device.

As an example, the linear filtering may be performed as a weighted average; that is, the processed first offset of the image is the weighted average of the first offset of the stabilized image and the first offset of the image. In order to gradually reduce the influence of the first offset of the stabilized image, as the acquisition time interval between the image and the key frame increases, the weight coefficient corresponding to the first offset of the stabilized image may gradually decrease and the weight coefficient corresponding to the first offset of the image may gradually increase. In other words, the weight coefficient for the first offset of the stabilized image is negatively correlated with the acquisition time interval, and the weight coefficient for the first offset of the image is positively correlated with it.
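The weighted average can be sketched with weights that change linearly over the smoothing window. The linear ramp is an assumption (the text only requires the weights to be monotonic in the time interval), the default window of 30 frames follows the example given earlier, and `smooth_first_offset` is a hypothetical name.

```python
def smooth_first_offset(stabilized_offset, image_offset,
                        frames_since_key, window=30):
    """Blend the pre-key-frame offset with the current first offset.

    The weight of `stabilized_offset` falls linearly from 1 to 0 and the
    weight of `image_offset` rises from 0 to 1 as `frames_since_key`
    goes from 0 to `window`; afterwards no smoothing is applied.
    """
    if frames_since_key >= window:
        return tuple(image_offset)
    w_image = frames_since_key / window
    w_stabilized = 1.0 - w_image
    return tuple(w_stabilized * s + w_image * c
                 for s, c in zip(stabilized_offset, image_offset))
```

At the key frame itself the result equals the offset of the last stabilized image, so the picture does not jump; the correction then fades out over the window.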

In some embodiments, in order to further reduce the amount of calculation, before identifying the picture shake in the image sequence caused by the low-frequency shaking of the imaging device (or before obtaining the first offset), the key frame and the image may be down-sampled and the down-sampling rate recorded. The offset of the down-sampled image relative to the down-sampled key frame is then calculated, and the recorded down-sampling rate is used to restore the first offset from that offset. This reduces the amount of calculation, improves calculation efficiency, and can also meet real-time requirements in some scenarios.
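A minimal sketch of the down-sampling round trip, assuming an integer down-sampling rate, plain decimation without pre-filtering, and an offset that scales linearly with the rate. The names `downsample` and `restore_offset` are hypothetical.

```python
def downsample(image, rate):
    """Keep every `rate`-th pixel in each direction (simple decimation)."""
    return [row[::rate] for row in image[::rate]]


def restore_offset(offset_downsampled, rate):
    """Scale an offset measured on down-sampled frames back to full size."""
    return tuple(component * rate for component in offset_downsampled)
```

For instance, an offset of (1.5, -2) pixels measured at a rate of 2 corresponds to (3.0, -4) pixels in the original image.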

In some embodiments, the low-frequency shaking or the first offset generated by the low-frequency shaking may also be used to align the image with other images.

In step S102, for the images in the image sequence, the imaging device may perform electronic anti-shake processing on the image based on the attitude data of the imaging device collected by the attitude sensor, and obtain the second offset of the image caused by the high-frequency jitter of the imaging device, so that stabilization processing is performed on the image based on the second offset to improve the stability of the image.

Performing electronic anti-shake processing on the image can be regarded as aligning the image with its adjacent images and then cropping off the parts of the image and the adjacent images that do not line up. For example, after the image is exposed, the imaging device acquires the attitude data collected by the attitude sensor during a period before the exposure time. The attitude data is expressed, for example, in the form of quaternions; the imaging device may interpolate the quaternion attitude data and convert it into a rotation matrix, and then use the positional relationship between the attitude sensor and the imaging device to obtain the second offset according to the rotation matrix.
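The interpolation and conversion steps can be sketched as follows. The text does not name the interpolation method; spherical linear interpolation (slerp) is used here as an assumption, as are the (w, x, y, z) component order and unit-length inputs. The function names are hypothetical.

```python
import math


def slerp(q0, q1, t):
    """Spherically interpolate between unit quaternions q0 and q1."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:  # take the shorter arc
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:  # nearly parallel: fall back to linear blend
        q = tuple(a + t * (b - a) for a, b in zip(q0, q1))
    else:
        theta = math.acos(dot)
        s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
        s1 = math.sin(t * theta) / math.sin(theta)
        q = tuple(s0 * a + s1 * b for a, b in zip(q0, q1))
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)


def quat_to_matrix(q):
    """Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]
```

Interpolating lets the attitude be evaluated at the exposure time of the image, or of each image row, even though the sensor samples arrive at their own rate. Deriving the per-pixel second offset from the rotation matrix additionally depends on the sensor-to-camera geometry, which this sketch does not model.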

The second offset includes an offset for each pixel in the image. For example, the second offset may be represented by a coordinate mapping table in which each element represents the offset of the corresponding pixel in the image. In some embodiments, the higher the zoom factor of the imaging device, the more pixels by which the picture in the image sequence shakes due to the high-frequency jitter of the imaging device. Therefore, to keep the image stable, the imaging device may adjust the second offset according to its current zoom factor, so that the adjusted second offset is adapted to the number of pixels of the image under the current zoom factor. Using the adjusted second offset for stabilization processing helps improve the stability of images captured at high magnification.

In some embodiments, the second offset includes an offset along a specified direction; for example, the specified direction may include at least a horizontal direction and/or a vertical direction in image coordinates. Exemplarily, the second offset may be represented by coordinate mapping tables: the imaging device may obtain one coordinate mapping table describing the coordinate transformation of the pixels of the image in the horizontal direction and another describing the coordinate transformation of the pixels of the image in the vertical direction.

In step S204, after acquiring the first offset and the second offset, the imaging device may perform stabilization processing on the images in the image sequence according to the first offset and the second offset. Exemplarily, a total offset of the image may be obtained from the first offset and the second offset, and stabilization processing is performed on the image according to the total offset. Exemplarily, the first offset may instead be used to perform a first stabilization process on the image, and the second offset then used to perform a second stabilization process on the result of the first. Exemplarily, the first offset and the second offset may be passed to a distortion correction (GDC) module in the imaging device, which performs the stabilization processing on the image. In this embodiment, the stabilized image is free of the picture shake caused by the high-frequency jitter and the low-frequency shaking of the imaging device, so its stability is improved, which benefits the user's viewing experience.
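One way the total offset might be formed is sketched below, assuming the first offset is a single global (dx, dy) for the frame, the second offset is a per-pixel table of (dx, dy) pairs, and the two combine by addition. The text does not state how the offsets are combined, so the addition and the name `total_offset` are assumptions.

```python
def total_offset(first_offset, second_offset_table):
    """Add the global first offset to every per-pixel second offset.

    second_offset_table -- list of rows, each a list of (dx, dy) pairs
    """
    fx, fy = first_offset
    return [[(sx + fx, sy + fy) for (sx, sy) in row]
            for row in second_offset_table]
```

The resulting table can then drive a single resampling pass, which avoids interpolating the image twice as the two-stage alternative would.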

Correspondingly, referring to FIG. 5, an embodiment of the present application further provides an imaging device 10, including an image sensor 11 and one or more processors 12;

The image sensor 11 is used to collect image sequences;

The one or more processors 12 are individually or collectively configured to:

identify, based on an attitude sensor, picture shake in the image sequence caused by high-frequency jitter of the imaging device 10;

identify, based on relative motion between the image sequences, picture shake in the image sequence caused by low-frequency shaking of the imaging device 10;

perform image stabilization on the image sequence according to the high-frequency jitter and the low-frequency shaking, so as to eliminate the picture shake in the image sequence caused by the high-frequency jitter and the low-frequency shaking of the imaging device 10.

The processor 12 executes executable instructions stored in a memory, and the executable instructions include instructions for executing the above-described video stabilization method. The processor 12 may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor 12 may be any conventional processor or the like.

In one embodiment, the high-frequency jitter includes jitter caused by high-frequency rotation and/or jitter caused by high-frequency translation; and the low-frequency shaking includes shaking caused by low-frequency translation.

In one embodiment, the imaging device 10 is a handheld device, and the low-frequency shaking is caused by shaking of the body part with which the user holds the handheld device; or, the imaging device 10 is mounted on a movable platform, and the low-frequency shaking is caused by shaking of the movable platform during movement.

In one embodiment, the higher the zoom factor of the imaging device 10 is, the more pixels in the image sequence are shifted due to the low-frequency shaking of the imaging device 10 .

In one embodiment, the processor 12 is further configured to: in the case that the current zoom factor of the imaging device 10 is higher than a preset factor, identify, based on the relative motion between the image sequences, the picture shake in the image sequence caused by the low-frequency shaking of the imaging device 10.

In an embodiment, the processor 12 is further configured to: during zooming of the imaging device 10 or while the orientation of the imaging device 10 is being changed, not identify the picture shake in the image sequence caused by the low-frequency shaking of the imaging device 10.

In one embodiment, the processor 12 is further configured to: in the case that the image sequence is used for target tracking, not perform stabilization processing on the image sequence.

In an embodiment, the processor 12 is further configured to: control the imaging device 10 to correct its own posture according to the low-frequency shaking.

In an embodiment, the processor 12 is further configured to: acquire a first offset corresponding to the low-frequency shaking, and control the imaging device 10 to move according to the first offset.

In one embodiment, the imaging device 10 is installed on a pan/tilt head. The processor 12 is further configured to: obtain a first offset according to the low-frequency shaking, and control the movement of the gimbal according to the first offset.

In one embodiment, each image in the image sequence has a first offset caused by the low-frequency shaking and a second offset caused by the high-frequency jitter.

The processor 12 is further configured to: perform stabilization processing on the images in the image sequence according to the first offset and the second offset.

In one embodiment, the first offset includes an offset in a specified direction; and/or the second offset includes an offset in a specified direction.

In an embodiment, the processor 12 is further configured to: for the images in the image sequence, perform electronic anti-shake processing on the image based on the attitude data of the imaging device 10 collected by the attitude sensor, and obtain the second offset of the image caused by the high-frequency jitter of the imaging device 10.

In one embodiment, the second offset includes an offset for each pixel in the image.

The processor 12 is further configured to: adjust the second offset according to the current zoom factor of the imaging device 10; the adjusted second offset is adapted to the number of pixels of the image under the current zoom factor.

In one embodiment, the processor 12 is further configured to: for the images in the image sequence, perform motion estimation on the image according to a preset key frame, and determine the first offset of the image relative to the key frame caused by the low-frequency shaking of the imaging device 10.

In one embodiment, in the absence of a key frame, the image is used as a key frame, and the first offset is set to 0.

In one embodiment, the processor 12 is further configured to: acquire at least one key point in the key frame, where the key point is located at a first position in the key frame; determine a second position of the at least one key point in the image; and determine the first offset of the image relative to the key frame according to the difference between the first position and the second position.

In an embodiment, the processor 12 is further configured to: divide the key frame into several image blocks; and for each image block, determine the key point according to the center point of the image block and/or at least one feature point with the largest gradient in the image block.

In an embodiment, the processor 12 is further configured to: acquire first feature information of each of the at least one key point in the key frame; estimate, using a preset image conversion relationship, the second position in the image of the at least one key point of the key frame; acquire, in the image, second feature information of the at least one key point under different offsets; and for each key point, match the second feature information under the different offsets against the first feature information, and adjust the second position by the offset corresponding to the second feature information that matches the first feature information.

In one embodiment, the different offset amounts are different distances offset along a specified direction within a preset offset range.

The matched second feature information has the smallest difference from the first feature information.

In an embodiment, the first feature information includes: a feature vector obtained by projecting the key point; and/or the second feature information includes: a feature vector obtained by projecting the shifted key point.

In an embodiment, the first feature information includes: a feature vector obtained by projecting a preset area containing the key point along a specified direction; and/or the second feature information includes: a feature vector obtained by projecting a preset area containing the shifted key point along the specified direction.

In one embodiment, the specified direction includes at least a horizontal direction and/or a vertical direction in image coordinates.

In an embodiment, the feature vector is a feature vector mapped to a preset range, and the preset range is used to make the luminance variation range of each of the key points the same.

In an embodiment, the feature vector mapped to the preset range includes: the difference between the feature vector obtained by the projection and the mean value of the feature vector, or the result of normalization of the feature vector obtained by the projection.

In one embodiment, the preset image conversion relationship is determined according to a positional relationship between a first position of the key point in the key frame and a position of the key point in a previous frame of image.

In an embodiment, when the image is used as a key frame, the preset image conversion relationship is an identity matrix.

In one embodiment, the processor 12 is further configured to: determine an image conversion relationship for the next frame of image according to the positional relationship between the first position of the at least one key point and the adjusted second position.

In one embodiment, the processor 12 is further configured to: filter valid key points from the at least one key point; and determine the image conversion relationship for the next frame of image according to the positional relationship between the first position of the valid key points and the adjusted second position.

In one embodiment, the processor 12 is further configured to: use the preset image conversion relationship to determine the third position, in the key frame, of each key point located at its adjusted second position; for each key point, determine the displacement of the third position of the key point relative to the first position; take the displacement of one of the key points as the target displacement, and obtain the error value of the displacement of each other key point relative to the target displacement; and select, from the at least one key point, the valid key points whose error value is less than a preset value.

In one embodiment, the processor 12 is further configured to: when the number of valid key points is less than a preset number, determine the image as a key frame and set the first offset to 0.

In an embodiment, the processor 12 is further configured to: when the first offset is greater than a preset offset threshold, determine the image as a key frame and set the first offset to 0.

In one embodiment, after determining the first offset of the image relative to the key frame, the processor 12 is further configured to: smooth the first offset of the image using the first offset of at least one stabilized image acquired before the key frame, to obtain the processed first offset of the image.

In an embodiment, when the image is acquired within a preset time period after acquiring the key frame, smoothing is performed on the first offset of the image.

In one embodiment, the acquisition time interval between the stabilized image and the key frame is less than a specified time period.

In one embodiment, the stabilized image includes a previous frame image of the key frame.

In one embodiment, the smoothing process includes a linear filtering process.

In one embodiment, the processed first offset of the image is a weighted average of the first offset of the stabilized image and the first offset of the image.

In an embodiment, as the acquisition time interval between the image and the key frame gradually increases, the weight coefficient corresponding to the first offset of the stabilized image gradually decreases, and the weight coefficient corresponding to the first offset of the image gradually increases.

In an embodiment, the processor 12 is further configured to: obtain a total offset of the image according to the first offset and the second offset, and perform stabilization processing on the image according to the total offset.

In one embodiment, the processor 12 is further configured to: use the first offset to perform a first stabilization process on the image, and then use the second offset to perform a second stabilization process on the image obtained from the first stabilization process.

In one embodiment, the image and the key frame are obtained after down-sampling.

For the apparatus embodiments, since they basically correspond to the method embodiments, reference may be made to the relevant descriptions of the method embodiments. The various embodiments described herein can be implemented in a computer-readable medium using, for example, computer software, hardware, or any combination thereof. For hardware implementation, the embodiments described herein can be implemented using at least one of application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, and electronic units designed to perform the functions described herein. For software implementation, embodiments such as procedures or functions may be implemented with separate software modules, each of which performs at least one function or operation. The software code may be implemented by a software application (or program) written in any suitable programming language, which may be stored in memory and executed by a controller.

Correspondingly, referring to FIG. 6, an embodiment of the present application further provides a handheld pan/tilt head 20, including an attitude sensor 21 and the above-mentioned imaging device 10, wherein the attitude sensor 21 is used to collect attitude data of the imaging device.

In one embodiment, the hand-held pan/tilt head includes a pan/tilt head shaft, and the pan/tilt head shaft is used to change the posture of the imaging device.

Correspondingly, referring to FIG. 7 , an embodiment of the present application further provides a movable platform 30, including:

body 31;

a power system 32, mounted on the body 31, for driving the movable platform 30 to move;

the above-mentioned imaging device 10, installed on the body 31;

and an attitude sensor 33, installed on the body 31 and used to collect attitude data of the imaging device 10.

In one embodiment, the movable platform includes, but is not limited to, an unmanned aerial vehicle, an unmanned vehicle, an unmanned vessel, or a mobile robot.

In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium, such as a memory including instructions, executable by a processor of an apparatus to perform the above-described method. For example, the non-transitory computer-readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, and the like.

A non-transitory computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of a terminal, enable the terminal to execute the above method.

It should be noted that, in this document, relational terms such as first and second are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply that any such actual relationship or sequence exists between these entities or operations. The terms "comprising", "including" or any other variation thereof are intended to encompass non-exclusive inclusion, such that a process, method, article or device comprising a list of elements includes not only those elements, but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element qualified by the phrase "comprising a..." does not preclude the presence of additional identical elements in the process, method, article or device that includes the element.

The methods and devices provided by the embodiments of the present application have been introduced in detail above, and specific examples are used to illustrate the principles and implementations of the present application. At the same time, for those of ordinary skill in the art, there will be changes in the specific implementation and application scope according to the idea of the application. In summary, the content of this specification should not be construed as a limitation to the application.

Claims

A video stabilization method, comprising:

acquiring an image sequence captured by an imaging device;

identifying, based on an attitude sensor, picture shake in the image sequence caused by high-frequency jitter of the imaging device;

identifying, based on relative motion between images in the image sequence, picture shake in the image sequence caused by low-frequency shaking of the imaging device; and

performing image stabilization on the image sequence according to the high-frequency jitter and the low-frequency jitter, so as to eliminate picture jitter in the image sequence caused by the high-frequency jitter and the low-frequency jitter of the imaging device.
The method according to claim 1, wherein the high-frequency jitter includes jitter generated by high-frequency rotation and/or jitter generated by high-frequency translation; and the low-frequency jitter includes jitter generated by low-frequency translation.
The method according to claim 1 or 2, wherein the imaging device is a handheld device, and the low-frequency shaking is generated by the shaking of a part where a user holds the handheld device;

Alternatively, the imaging device is mounted on a movable platform, and the low-frequency shaking is generated by the shaking of the movable platform during movement.
The method according to claim 1, wherein the higher the zoom factor of the imaging device is, the more pixels in the image sequence are shifted due to low-frequency shaking of the imaging device.
The method according to claim 4, wherein the identifying the picture shake in the image sequence caused by the low-frequency shaking of the imaging device based on the relative motion between the image sequences comprises:

In the case that the current zoom factor of the imaging device is higher than a preset factor, the image shake in the image sequence caused by the low-frequency shaking of the imaging device is identified based on the relative motion between the image sequences.
The method according to claim 1, wherein the method further comprises:

During the process of zooming by the imaging device, or during the process of changing the orientation of the imaging device, the picture shaking in the image sequence caused by the low-frequency shaking of the imaging device is not recognized.
The method according to claim 1, wherein the method further comprises:

In the case of using the image sequence for target tracking, no stabilization processing is performed on the image sequence.
The method according to claim 1, wherein the method further comprises: controlling the imaging device to correct its own posture according to the low-frequency shaking.
The method according to claim 8, wherein the controlling the imaging device to correct its own posture according to the low-frequency shaking comprises:

A first offset corresponding to the low-frequency shaking is acquired, and the imaging device is controlled to move according to the first offset.
The method according to claim 8, wherein the imaging device is installed on a pan/tilt;

The controlling the imaging device to correct its posture according to the low-frequency shaking includes:

A first offset is obtained according to the low-frequency shaking, and the movement of the gimbal is controlled according to the first offset.
The method according to claim 1, wherein the images in the image sequence correspond to a first offset due to the low-frequency jitter and have a second offset due to the high-frequency jitter;

The performing image stabilization on the image sequence according to the high-frequency jitter and the low-frequency jitter includes:

Stabilization is performed on the images in the sequence of images based on the first offset and the second offset.
The method according to claim 11, wherein the first offset includes an offset along a specified direction; and/or the second offset includes an offset along a specified direction.
The method according to claim 11, wherein the identifying the picture shake in the image sequence caused by the high-frequency shaking of the imaging device based on the attitude sensor comprises:

For the images in the image sequence, electronic anti-shake processing is performed on the images based on the attitude data of the imaging device collected by the attitude sensor, so as to obtain a second offset of the images caused by the high-frequency jitter of the imaging device.
The method according to claim 11 or 13, wherein the second offset comprises an offset for each pixel in the image;

The method also includes:

The second offset is adjusted according to the current zoom factor of the imaging device; the adjusted second offset is adapted to the number of pixels of the image at the current zoom factor.
The method according to claim 11, wherein the identifying the picture shake in the image sequence caused by the low-frequency shaking of the imaging device based on the relative motion between the image sequences comprises:

For the images in the image sequence, motion estimation is performed on the images according to preset key frames, and a first offset of the images relative to the key frames caused by low-frequency shaking of the imaging device is determined.
The method according to claim 15, wherein in the absence of a key frame, the image is used as a key frame, and the first offset is set to 0.
The method according to claim 15, wherein the performing motion estimation on the images according to preset key frames, and determining a first offset of the images relative to the key frames caused by low-frequency shaking of the imaging device, comprises:

acquiring at least one key point in the key frame, the key point is located at a first position in the key frame;

determining a second position of the at least one keypoint in the image;

A first offset of the image relative to the key frame is determined based on the difference between the first position and the second position.
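As an illustration only, the step of deriving a first offset from the difference between first and second positions could be sketched as below. The claim does not say how per-key-point differences are combined; taking their mean is an assumption made for this sketch, and the function name `estimate_first_offset` is hypothetical.

```python
def estimate_first_offset(first_positions, second_positions):
    """Return the (dx, dy) offset of the image relative to the key frame.

    first_positions:  (x, y) of each key point in the key frame.
    second_positions: (x, y) of the same key points in the current image.
    Assumption: the per-key-point displacements are averaged.
    """
    if not first_positions:
        # No key points: report no offset.
        return (0.0, 0.0)
    n = len(first_positions)
    pairs = list(zip(first_positions, second_positions))
    dx = sum(sx - fx for (fx, _), (sx, _) in pairs) / n
    dy = sum(sy - fy for (_, fy), (_, sy) in pairs) / n
    return (dx, dy)
```

A more robust combination, such as a median, would fit the same claim language equally well.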
The method according to claim 17, wherein the acquiring at least one key point in the key frame comprises:

dividing the key frame into several image blocks;

For each of the image blocks, the key point is determined according to the center point of the image block and/or at least one feature point with the largest gradient in the image block.
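The block-wise key point selection recited above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the gradient measure (sum of absolute central differences), the fallback to the block center for flat blocks, and the name `select_key_points` are all assumptions.

```python
def select_key_points(frame, block_h, block_w):
    """Return one (row, col) key point per image block.

    frame is a 2D list of luminance values. Within each block the pixel
    with the largest gradient is chosen; a block with no gradient at all
    falls back to its center point.
    """
    rows, cols = len(frame), len(frame[0])

    def gradient(r, c):
        # Central differences, clamped at the image border.
        gx = frame[r][min(c + 1, cols - 1)] - frame[r][max(c - 1, 0)]
        gy = frame[min(r + 1, rows - 1)][c] - frame[max(r - 1, 0)][c]
        return abs(gx) + abs(gy)

    key_points = []
    for top in range(0, rows, block_h):
        for left in range(0, cols, block_w):
            bottom = min(top + block_h, rows)
            right = min(left + block_w, cols)
            center = ((top + bottom - 1) // 2, (left + right - 1) // 2)
            best, best_grad = center, 0
            for r in range(top, bottom):
                for c in range(left, right):
                    g = gradient(r, c)
                    if g > best_grad:
                        best, best_grad = (r, c), g
            key_points.append(best)
    return key_points
```

Spreading one key point over each block keeps the points distributed across the frame, which is presumably why the claim divides the key frame first.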
The method of claim 17, wherein the determining the second position of the at least one key point in the image comprises:

acquiring first feature information of the at least one key point in the key frame respectively;

estimating a second position in the image of at least one key point in the key frame using a preset image conversion relationship;

In the image, acquiring second feature information of the at least one key point under different offsets;

for each of the key points, matching the second feature information under the different offsets with the first feature information respectively, and adjusting the second position according to the offset corresponding to the second feature information that matches the first feature information.
The method according to claim 19, wherein the different offsets are different distances offset along a specified direction within a preset offset range;

The matched second feature information has the smallest difference from the first feature information.
The method according to claim 19, wherein the first feature information comprises: a feature vector obtained by projecting the key point;

And/or, the second feature information includes: a feature vector obtained by projecting the key point.
The method according to claim 19 or 21, wherein the first feature information comprises: a feature vector obtained by projecting a preset area including the key point along a specified direction;

And/or, the second feature information includes: a feature vector obtained by projecting a preset area including the key point along a specified direction.
The method according to any one of claims 12, 20 or 22, wherein the specified direction includes at least a horizontal direction and/or a vertical direction in image coordinates.
The method according to claim 21 or 22, wherein the feature vector is a feature vector mapped to a preset range, and the preset range is used to make the luminance variation range of each of the key points the same.
The method according to claim 24, wherein the feature vector mapped to the preset range comprises: the difference between the feature vector obtained by the projection and the mean value of the feature vector, or the result of normalizing the feature vector obtained by the projection.
The method according to claim 19, wherein the preset image conversion relationship is determined based on the positional relationship between the first position of the key point in the key frame and the position of the key point in the previous frame of image.
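The projection-based matching of claims 19 to 25 can be illustrated with a small sketch for the horizontal direction. It assumes a square patch around the key point, a projection formed by column sums, mean subtraction as the mapping to a preset range, and a sum of absolute differences as the matching cost; none of these specifics are stated in the claims, and the function names are hypothetical. The caller is assumed to keep every patch inside the image.

```python
def projection_feature(image, row, col, half):
    """Project a (2*half+1)-square patch onto the horizontal axis.

    Each entry is a column sum of the patch. The mean is subtracted so
    that patches differing only in brightness give the same vector.
    """
    rows = range(row - half, row + half + 1)
    cols = range(col - half, col + half + 1)
    sums = [sum(image[r][c] for r in rows) for c in cols]
    mean = sum(sums) / len(sums)
    return [s - mean for s in sums]


def refine_horizontal(key_frame, image, first_pos, predicted_pos, half, search):
    """Adjust the predicted second position by the best horizontal offset.

    Candidate offsets lie in [-search, search]; the candidate whose
    feature differs least from the key-frame feature is selected.
    """
    kr, kc = first_pos
    pr, pc = predicted_pos
    reference = projection_feature(key_frame, kr, kc, half)
    best_dx, best_cost = 0, float("inf")
    for dx in range(-search, search + 1):
        candidate = projection_feature(image, pr, pc + dx, half)
        cost = sum(abs(a - b) for a, b in zip(reference, candidate))
        if cost < best_cost:
            best_dx, best_cost = dx, cost
    return (pr, pc + best_dx)
```

The vertical direction would be handled the same way with row sums in place of column sums.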
The method according to claim 19, wherein when the image is used as a key frame, the preset image conversion relationship is a unit matrix.
The method of claim 19, further comprising:

An image conversion relationship for the next frame of image is determined according to the positional relationship between the first position of the at least one key point and the adjusted second position.
The method according to claim 28, wherein the determining the image conversion relationship for the next frame of image comprises:

screening valid keypoints from the at least one keypoint;

According to the positional relationship between the first position of the effective key point and the adjusted second position, an image conversion relationship for the next frame of image is determined.
The method according to claim 29, wherein the screening of valid key points from the at least one key point comprises:

Using the preset image conversion relationship to determine the third position of the key point located at the adjusted second position in the key frame;

For each key point, determining the displacement amount of the third position of the key point relative to the first position;

Determine the displacement of one of the key points as the target displacement, and obtain the error value of the displacement of other key points relative to the target displacement;

Valid key points whose error value is less than a preset value are selected from the at least one key point.
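The screening of valid key points can be sketched as below. The claim leaves open which key point supplies the target displacement; this sketch tries each key point as the target and keeps the largest consistent set, which is an assumption in the spirit of consensus-based outlier rejection. The error measure (sum of absolute component differences) and the name `screen_valid_key_points` are likewise assumptions.

```python
def screen_valid_key_points(first_positions, third_positions, max_error):
    """Return indices of key points whose displacement agrees with a target.

    first_positions: (x, y) of each key point in the key frame.
    third_positions: (x, y) of each adjusted second position mapped back
                     into the key frame.
    max_error:       preset error bound (exclusive).
    """
    displacements = [
        (tx - fx, ty - fy)
        for (fx, fy), (tx, ty) in zip(first_positions, third_positions)
    ]
    best = []
    for target_dx, target_dy in displacements:
        # Key points whose displacement is close to this target.
        inliers = [
            i
            for i, (dx, dy) in enumerate(displacements)
            if abs(dx - target_dx) + abs(dy - target_dy) < max_error
        ]
        if len(inliers) > len(best):
            best = inliers
    return best
```

If the returned list is shorter than a preset number, the current image would be promoted to a key frame, as a later dependent claim recites.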
The method of claim 29, further comprising:

When the number of valid key points is less than a preset number, the image is determined as a key frame, and the first offset is set to 0.
The method of claim 15, further comprising:

When the first offset is greater than a preset offset threshold, the image is determined as a key frame, and the first offset is set to 0.
The method of claim 15, wherein after the determining the first offset of the image relative to the key frame, further comprising:

Smoothing the first offset of the image by using the first offset of the at least one stabilized image collected before the key frame, to obtain the processed first offset of the image.
The method according to claim 33, wherein when the image is acquired within a preset time period after acquiring the key frame, smoothing is performed on the first offset of the image.
The method according to claim 33, wherein the acquisition time interval between the stabilized image and the key frame is less than a specified duration.
The method of claim 33, wherein the stabilized image comprises an image of a previous frame of the key frame.
The method of claim 33, wherein the smoothing process comprises a linear filtering process.
The method according to claim 33 or 37, wherein the processed first offset of the image is a weighted average of the first offset of the stabilized image and the first offset of the image.
The method according to claim 38, wherein as the acquisition time interval between the image and the key frame gradually increases, the weight coefficient corresponding to the first offset of the stabilized image gradually decreases, and the weight coefficient corresponding to the first offset of the image gradually increases.
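The weighted-average smoothing after a key frame switch can be sketched as follows. The claims only require that the weight on the stabilized image's offset decreases and the weight on the current image's offset increases with time; the linear ramp, the `window` parameter, and the name `smooth_first_offset` are assumptions for illustration.

```python
def smooth_first_offset(previous_offset, current_offset, elapsed, window):
    """Blend the pre-key-frame offset with the current first offset.

    previous_offset: first offset of a stabilized image collected before
                     the key frame.
    current_offset:  first offset of the current image.
    elapsed:         time (or frame count) since the key frame.
    window:          duration over which smoothing is applied (> 0).
    """
    if elapsed >= window:
        # Outside the preset period: no smoothing.
        return current_offset
    w_current = elapsed / window      # grows with time
    w_previous = 1.0 - w_current      # shrinks with time
    return w_previous * previous_offset + w_current * current_offset
```

For example, with a previous offset of 8 pixels, a current offset of 0 and a window of 4 frames, the blended offset falls from 8 to 6, 4, 2 and then 0 instead of jumping straight to 0 when the key frame changes.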
The method according to claim 11, wherein the performing stabilization processing on the images in the image sequence according to the first offset and the second offset comprises:

A total offset of the image is acquired according to the first offset and the second offset, and stabilization processing is performed on the image according to the total offset.
The method according to claim 11, wherein the performing stabilization processing on the images in the image sequence according to the first offset and the second offset comprises:

A first stabilization process is performed on the image by using the first offset, and a second stabilization process is performed on the image after the first stabilization process by using the second offset.
The method according to claim 15, wherein the image and the key frame are obtained after down-sampling.
An imaging device, comprising an image sensor and one or more processors;

the image sensor is used to acquire a sequence of images;

the one or more processors are individually or collectively configured to:

identify, based on an attitude sensor, picture shake in the image sequence caused by high-frequency jitter of the imaging device;

identify, based on relative motion between images in the image sequence, picture shake in the image sequence caused by low-frequency shaking of the imaging device; and

perform image stabilization on the image sequence according to the high-frequency jitter and the low-frequency jitter, so as to eliminate picture jitter in the image sequence caused by the high-frequency jitter and the low-frequency jitter of the imaging device.
The device according to claim 43, wherein the high-frequency jitter includes jitter caused by high-frequency rotation and/or jitter caused by high-frequency translation; and the low-frequency jitter includes jitter caused by low-frequency translation.
The device according to claim 43 or 44, wherein the imaging device is a handheld device, and the low-frequency shaking is generated by the shaking of the part where the user holds the handheld device;

Alternatively, the imaging device is mounted on a movable platform, and the low-frequency shaking is generated by the shaking of the movable platform during movement.
The device according to claim 43, wherein the higher the zoom factor of the imaging device is, the more pixels are shifted in the image of the image sequence due to the low-frequency shaking of the imaging device.
The device according to claim 46, wherein the processor is further configured to: in the case that the current zoom factor of the imaging device is higher than a preset factor, identify, based on the relative motion between the image sequences, the picture shaking in the image sequence caused by the low-frequency shaking of the imaging device.
The device according to claim 43, wherein the processor is further configured to: during the process of zooming by the imaging device, or during the process of changing the orientation of the imaging device, not identify the picture shaking in the image sequence caused by the low-frequency shaking of the imaging device.
The apparatus according to claim 43, wherein the processor is further configured to not perform stabilization processing on the image sequence when the image sequence is used for target tracking.
The device according to claim 43, wherein the processor is further configured to: control the imaging device to correct its posture according to the low-frequency shaking.
The device according to claim 50, wherein the processor is further configured to: acquire a first offset corresponding to the low-frequency shaking, and control the movement of the imaging device according to the first offset.
The device according to claim 50, wherein the imaging device is mounted on a pan/tilt;

The processor is further configured to: obtain a first offset according to the low-frequency shaking, and control the movement of the gimbal according to the first offset.
The apparatus according to claim 43, wherein the images in the image sequence correspond to a first offset due to the low frequency jitter and have a second offset due to the high frequency jitter;

The processor is further configured to: perform stabilization processing on the images in the image sequence according to the first offset and the second offset.
The apparatus of claim 53, wherein the first offset includes an offset in a specified direction; and/or the second offset includes an offset in a specified direction.
The device according to claim 53, wherein the processor is further configured to: for the images in the image sequence, perform electronic anti-shake processing on the images based on the attitude data of the imaging device collected by the attitude sensor, so as to obtain a second offset of the images caused by the high-frequency jitter of the imaging device.
The apparatus of claim 53 or 55, wherein the second offset comprises an offset for each pixel in the image;

The processor is further configured to: adjust the second offset according to the current zoom factor of the imaging device; the adjusted second offset is adapted to the number of pixels of the image at the current zoom factor.
The apparatus according to claim 53, wherein the processor is further configured to: for the images in the image sequence, perform motion estimation on the images according to preset key frames, and determine a first offset of the images relative to the key frames caused by low-frequency shaking of the imaging device.
The device according to claim 57, wherein in the case of no key frame, the image is used as a key frame, and the first offset is set to 0.
The apparatus of claim 57, wherein the processor is further configured to:

acquiring at least one key point in the key frame, the key point is located at a first position in the key frame;

determining a second position of the at least one keypoint in the image;

A first offset of the image relative to the key frame is determined based on the difference between the first position and the second position.
The apparatus of claim 59, wherein the processor is further configured to:

dividing the key frame into several image blocks;

For each of the image blocks, the key point is determined according to the center point of the image block and/or at least one feature point with the largest gradient in the image block.
The apparatus of claim 59, wherein the processor is further configured to:

acquiring first feature information of the at least one key point in the key frame respectively;

estimating a second position in the image of at least one key point in the key frame using a preset image conversion relationship;

In the image, acquiring second feature information of the at least one key point under different offsets;

for each of the key points, matching the second feature information under the different offsets with the first feature information respectively, and adjusting the second position according to the offset corresponding to the second feature information that matches the first feature information.
The device according to claim 61, wherein the different offsets are different distances offset along a specified direction within a preset offset range;

The matched second feature information has the smallest difference from the first feature information.
The device according to claim 61, wherein the first feature information comprises: a feature vector obtained by projecting the key point;

And/or, the second feature information includes: a feature vector obtained by projecting the key point.
The device according to claim 61 or 63, wherein the first feature information comprises: a feature vector obtained by projecting a preset area including the key point along a specified direction;

And/or, the second feature information includes: a feature vector obtained by projecting a preset area including the key point along a specified direction.
The apparatus according to any one of claims 54, 62 or 64, wherein the specified direction includes at least a horizontal direction and/or a vertical direction in image coordinates.
The apparatus according to claim 63 or 64, wherein the feature vector is a feature vector mapped to a preset range, and the preset range is used to make the luminance variation range of each of the key points the same.
The device according to claim 66, wherein the feature vector mapped to the preset range comprises: the difference between the feature vector obtained by the projection and the mean value of the feature vector, or the result of normalizing the feature vector obtained by the projection.
The device according to claim 61, wherein the preset image conversion relationship is determined based on the positional relationship between the first position of the key point in the key frame and the position of the key point in the previous frame of image.
The device according to claim 61, wherein, when the image is used as a key frame, the preset image conversion relationship is a unit matrix.
The device according to claim 61, wherein the processor is further configured to: determine an image conversion relationship for the next frame of image according to the positional relationship between the first position of the at least one key point and the adjusted second position.
The apparatus of claim 70, wherein the processor is further configured to:

screening valid keypoints from the at least one keypoint;

According to the positional relationship between the first position of the effective key point and the adjusted second position, an image conversion relationship for the next frame of image is determined.
The apparatus of claim 71, wherein the processor is further configured to:

Using the preset image conversion relationship to determine the third position of the key point located at the adjusted second position in the key frame;

For each key point, determining the displacement amount of the third position of the key point relative to the first position;

Determine the displacement of one of the key points as the target displacement, and obtain the error value of the displacement of other key points relative to the target displacement;

Valid key points whose error value is less than a preset value are selected from the at least one key point.
The apparatus according to claim 71, wherein the processor is further configured to: when the number of valid key points is less than a preset number, determine the image as a key frame, and set the first offset to 0.
The apparatus according to claim 57, wherein the processor is further configured to: in the case that the first offset is greater than a preset offset threshold, determine the image as a key frame, and set the first offset to 0.
The apparatus according to claim 57, wherein, after the determining the first offset of the image relative to the key frame, the processor is further configured to: smooth the first offset of the image by using the first offset of at least one stabilized image collected before the key frame, to obtain the processed first offset of the image.
The apparatus according to claim 75, wherein, in the case that the image is acquired within a preset time period after acquiring the key frame, smoothing is performed on the first offset of the image.
The device according to claim 75, wherein the acquisition time interval between the stabilized image and the key frame is less than a specified duration.
The apparatus of claim 75, wherein the stabilized image comprises a previous frame of the key frame.
The apparatus of claim 75, wherein the smoothing process comprises a linear filtering process.
The apparatus according to claim 75 or 79, wherein the processed first offset of the image is a weighted average of the first offset of the stabilized image and the first offset of the image.
The apparatus according to claim 80, wherein as the acquisition time interval between the image and the key frame gradually increases, the weight coefficient corresponding to the first offset of the stabilized image gradually decreases, and the weight coefficient corresponding to the first offset of the image gradually increases.
The apparatus according to claim 53, wherein the processor is further configured to: obtain a total offset of the image according to the first offset and the second offset, and perform stabilization processing on the image according to the total offset.
The apparatus according to claim 53, wherein the processor is further configured to: use the first offset to perform a first stabilization process on the image, and then use the second offset to perform a second stabilization process on the image after the first stabilization process.
The apparatus of claim 57, wherein the image and the key frame are obtained after down-sampling.
A handheld pan/tilt head, characterized by comprising an attitude sensor and the imaging device according to any one of claims 43 to 84; wherein the attitude sensor is used to collect attitude data of the imaging device.
A movable platform, characterized in that, comprising:

body;

a power system, mounted on the body, for driving the movable platform to move;

The imaging device of any one of claims 43 to 84;

and an attitude sensor, which is installed on the body and used to collect attitude data of the imaging device.
A computer-readable storage medium, characterized in that, the computer-readable storage medium stores executable instructions, and when the executable instructions are executed by a processor, the method according to any one of claims 1 to 42 is implemented.