CN113132612B - Image stabilization processing method, terminal shooting method, medium and system - Google Patents


Info

Publication number
CN113132612B
Authority
CN
China
Prior art keywords
image
pixel
processing
matrix
transformation
Prior art date
Legal status
Active
Application number
CN201911413482.2A
Other languages
Chinese (zh)
Other versions
CN113132612A (en)
Inventor
王宇
朱聪超
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN201911413482.2A
Publication of CN113132612A
Application granted
Publication of CN113132612B
Status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/61 Control of cameras or camera modules based on recognised objects
    • H04N23/611 Control of cameras or camera modules based on recognised objects where the recognised objects include parts of the human body
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/80 Camera processing pipelines; Components thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Studio Devices (AREA)
  • Image Processing (AREA)

Abstract

The application discloses an image stabilization processing method. The method comprises: obtaining motion information of a terminal shooting device and an image to be processed, the image to be processed containing at least one target area that contains a target; determining, from the target area, a background area outside the target area in the image to be processed; and calculating a transformation matrix corresponding to each pixel in the image to be processed, where the transformation matrix is determined from the motion information and the camera intrinsic parameters, establishes a mapping between pixel positions in the image to be processed and in the stabilized image, and represents the rotation transformation and/or translation transformation of the pixel. First processing is performed on each pixel of the target area and second processing on each pixel of the background area; the first processing includes weakening the rotation transformation and/or translation transformation of each pixel of the target area, and the second processing differs from the first processing. The stabilized image is then output based on the transformation matrix corresponding to each pixel after the first and second processing.

Description

Image stabilization processing method, terminal shooting method, medium and system
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image stabilization processing method.
Background
In image or video capture, distortion often occurs because the foreground and background in the field of view move differently relative to the camera. A rotating self-portrait taken with the front camera is a typical scene in which the portrait becomes distorted. The cause is that if the video stabilization algorithm corrects the entire image based only on the MEMS gyroscope data of the mobile phone, the correction does not distinguish between the portrait and the background while the phone is moving. During a rotating self-portrait, the relative position between the portrait and the camera changes little: the portrait rotates with the camera and stays at approximately the same place in the frame, so the portrait part should not be corrected while the background part should. Since the video stabilization algorithm uses no image content information, the portrait is treated as part of the background and is deformed by the warping operation.
Disclosure of Invention
The application provides a method to address the problem that face distortion easily occurs when a user takes a rotating self-portrait with the front camera: portrait segmentation is combined with the phone's gyroscope information to process the portrait and non-portrait areas of the frame separately, so that the background is still stabilized while severe distortion of the face is avoided.
The embodiment of the application provides an image stabilization processing method, a terminal photographing method, a medium and a system.
In a first aspect, an embodiment of the present application provides an image stabilization processing method, including:
acquiring motion information and an image to be processed of terminal shooting equipment; the image to be processed comprises at least one target area containing a target; determining a background area outside the target area in the image to be processed according to the target area; calculating a transformation matrix corresponding to each pixel in the image to be processed; the transformation matrix is determined based on the motion information and the camera internal parameters; the transformation matrix establishes a mapping relation between the positions of pixels of the image to be processed and the image after image stabilization processing, and represents the rotation transformation and/or translation transformation of the pixels; performing first processing on each pixel of the target area, and performing second processing on each pixel of the background area; the first processing includes weakening processing performed on a rotational transformation and/or a translational transformation of each pixel of the target region; the second process is different from the first process; and outputting the image after image stabilization processing based on the transformation matrix corresponding to each pixel of the image after the first processing and the second processing.
In this embodiment, motion data of the terminal shooting device is acquired by means such as a gyroscope, an acceleration sensor, an optical image stabilizer (OIS), image feature matching, or an optical flow method, while video frames are captured at intervals by a CMOS or CCD sensor. Here a video frame is the image data, and the acquired image to be processed contains at least one target area, which consists of the target and a target contour area; for example, the target is the portrait, the target contour area is a portrait buffer area, and the target area is a portrait protection area. A background area is determined from the target area, and a homography matrix corresponding to each pixel of the background area and the target area is determined from the motion information and the camera intrinsic parameters; first processing and second processing are performed on each pixel of the target area, and second processing on each pixel of the background area. Specifically, the first processing weakens the rotation transformation and/or translation transformation of each pixel of the target area. The aforementioned transformation matrix is determined by the rotation transformation and/or translation transformation together with the camera intrinsic parameters.
In a possible implementation of the first aspect, the method for image stabilization further includes:
a motion vector corresponding to each pixel of the image is determined based on the motion information, and a transformation matrix corresponding to each pixel is calculated from the motion vector, the transformation matrix comprising a rotation matrix and/or a translation matrix. That is, the per-pixel transformation matrix is determined by the motion vector derived from the motion information. It is understood that the rotation matrix of each pixel is calculated from that pixel's rotation vector; the rotation vector is obtained by integrating the angular velocity of the corresponding grid over time, and the angular velocity data comes from gyroscope measurements or can be computed by methods based on image feature matching, such as the optical flow method.
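As a sketch of the general technique just described (an illustration, not the patent's exact implementation; the function names and the first-order integration scheme are assumptions), gyroscope angular velocity samples can be integrated over time into a rotation vector, which Rodrigues' formula then converts into a rotation matrix:

```python
import numpy as np

def integrate_gyro(omegas, timestamps):
    """First-order integration of angular velocity samples (rad/s)
    into a rotation vector over the sampled interval."""
    rv = np.zeros(3)
    for i in range(1, len(timestamps)):
        dt = timestamps[i] - timestamps[i - 1]
        rv += np.asarray(omegas[i], dtype=float) * dt
    return rv

def rodrigues(rv):
    """Convert a rotation vector into a 3x3 rotation matrix
    (Rodrigues' formula)."""
    theta = np.linalg.norm(rv)
    if theta < 1e-12:
        return np.eye(3)
    k = rv / theta
    # Skew-symmetric cross-product matrix of the unit axis k
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)
```

For example, a constant rotation of pi/2 rad/s about the z axis integrated over one second yields a 90-degree rotation matrix about z.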
In a possible implementation of the first aspect, the method for image stabilization further includes:
the first process includes: and determining a rotation matrix and/or a translation matrix corresponding to each pixel of the target area based on the motion information, and performing degradation processing on the rotation matrix and/or the translation matrix corresponding to each pixel of the target area. I.e. since the transformation matrix is determined by the rotation and/or translation transformation and the camera internal parameters, the degeneration operation on the rotation and/or translation matrix will eventually be reflected in the weakening of the transformation matrix. And the rotation matrix and/or the translation matrix is determined by the acquired motion information.
In a possible implementation of the first aspect, the method for image stabilization further includes:
the degradation process includes: reducing the rotation vector corresponding to each pixel in the target area, and calculating a degraded rotation matrix, or performing interpolation calculation between the rotation matrix corresponding to each pixel in the target area and the unit matrix to obtain the degraded rotation matrix; and/or reducing translation vectors corresponding to each pixel in the target area, and calculating a degraded translation matrix, or performing interpolation calculation between the translation matrix corresponding to each pixel in the target area and a zero matrix to obtain the degraded translation matrix. That is, the target of the degradation process is a rotation matrix and/or a translation matrix corresponding to each pixel of the target region, and the degradation process may be performed by reducing the rotation vector and/or the translation vector of each pixel of the target region and then representing the degradation of the rotation matrix and/or the translation matrix, or may be directly obtained by performing interpolation calculation on the rotation matrix and/or the translation matrix of each pixel of the target region.
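A minimal sketch of the degradation step, assuming a weighting factor `alpha` (the parameter name and the linear scaling are illustrative, not from the patent): shrinking the rotation vector is equivalent to interpolating the rotation matrix toward the identity, and scaling the translation vector interpolates the translation matrix toward the zero matrix:

```python
import numpy as np

def degrade_rotation_vector(rv, alpha):
    """Shrink the per-pixel rotation vector by alpha in [0, 1].
    alpha = 1 keeps the full stabilizing rotation; alpha = 0 removes it
    entirely, so the rotation matrix degenerates to the identity."""
    return alpha * np.asarray(rv, dtype=float)

def degrade_translation_vector(t, alpha):
    """Interpolate the per-pixel translation toward the zero vector."""
    return alpha * np.asarray(t, dtype=float)
```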
In a possible implementation of the first aspect, the method for image stabilization further includes:
and performing second processing on each pixel in the background area, wherein the second processing comprises the following steps: and performing image coordinate transformation and pixel value interpolation on each pixel in the background area to finish image stabilization and rolling shutter correction. The step is to adjust the background area and the target area separately so that the background area can perform normal image stabilization processing while suppressing distortion of the target area.
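The coordinate transformation and pixel-value interpolation for the background can be sketched as inverse warping with bilinear resampling (a generic single-channel implementation under assumed conventions, not the patent's code; a real pipeline would vectorize the loops):

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Sample a single-channel image at fractional coordinates (x, y)."""
    x0, y0 = int(x), int(y)
    x1 = min(x0 + 1, img.shape[1] - 1)
    y1 = min(y0 + 1, img.shape[0] - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * img[y0, x0] + fx * img[y0, x1]
    bot = (1 - fx) * img[y1, x0] + fx * img[y1, x1]
    return (1 - fy) * top + fy * bot

def warp_background(img, H):
    """Inverse-map each output pixel through homography H and resample,
    combining the coordinate transformation and pixel-value interpolation
    in one pass; out-of-bounds pixels are left black."""
    h, w = img.shape
    Hinv = np.linalg.inv(H)
    out = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            sx, sy, sw = Hinv @ np.array([x, y, 1.0])
            sx, sy = sx / sw, sy / sw
            if 0 <= sx <= w - 1 and 0 <= sy <= h - 1:
                out[y, x] = bilinear_sample(img, sx, sy)
    return out
```

With H equal to the identity matrix the image is returned unchanged, which is a convenient sanity check.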
In a possible implementation of the first aspect, the method for image stabilization further includes:
first processing and second processing are performed on each pixel of the target area. In the embodiment of the present application, image stabilization processing covers two cases: (1) the rotation matrix and/or translation matrix corresponding to each pixel of the target area is directly degraded, and pixel coordinate transformation and pixel interpolation are performed on each pixel of the background area to complete image stabilization and rolling-shutter correction; (2) pixel coordinate transformation and pixel interpolation are performed on every pixel of the whole image to complete image stabilization and rolling-shutter correction, and then the rotation matrix and/or translation matrix corresponding to each pixel of the target area is degraded.
In a possible implementation of the first aspect, the method for image stabilization further includes:
and determining a rotation matrix and/or a translation matrix corresponding to each pixel of the target area based on the motion information, performing degradation processing on the rotation matrix and/or the translation matrix corresponding to each pixel of the target area, and performing image coordinate transformation and pixel value interpolation on each pixel of the image subjected to the degradation processing to finish image stabilization and rolling shutter correction.
In a possible implementation of the first aspect, the method for image stabilization further includes:
in the target region, a target contour region is determined based on the contour of the target, and a region other than the target contour region in the target region is set as a buffer region. In the image, a transition area is arranged between the target contour area and the background area so as to reduce image defects caused by the difference of degradation degrees of a rotation matrix and/or a translation matrix corresponding to each pixel of different areas, and the size of a buffer area can be further set according to the determined target contour coordinates.
In a possible implementation of the first aspect, the method for image stabilization further includes:
the rotation matrix and/or translation matrix corresponding to each image pixel in the buffer area is calculated by interpolation, combining the rotation matrix and/or translation matrix corresponding to each pixel of the target area with that of each pixel of the background area. That is, since the buffer area lies between the target area and the background area, the rotation matrix and/or translation matrix corresponding to each pixel of the buffer area can be obtained by interpolation, and the transformation matrix for each buffer pixel then follows from the interpolated rotation and/or translation matrix.
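The interpolation between the target-area and background-area rotations can be illustrated with spherical linear interpolation (SLERP) on unit quaternions; the quaternion representation and function names below are assumptions, since the patent only states that the buffer matrices are interpolated between the two regions:

```python
import numpy as np

def quat_from_axis_angle(axis, angle):
    """Unit quaternion [w, x, y, z] for a rotation of `angle` about `axis`."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    return np.concatenate([[np.cos(angle / 2)], np.sin(angle / 2) * axis])

def slerp(qa, qb, t):
    """Spherical linear interpolation between unit quaternions qa and qb."""
    dot = float(np.dot(qa, qb))
    if dot < 0.0:          # take the shorter arc
        qb, dot = -qb, -dot
    if dot > 0.9995:       # nearly parallel: fall back to normalized lerp
        q = (1 - t) * qa + t * qb
        return q / np.linalg.norm(q)
    theta = np.arccos(dot)
    return (np.sin((1 - t) * theta) * qa + np.sin(t * theta) * qb) / np.sin(theta)
```

Interpolating halfway between the identity and a 90-degree rotation about z gives a 45-degree rotation about z, as expected.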
In a possible implementation of the first aspect, the method for image stabilization further includes:
the image to be processed is divided into an N × M grid; a transformation matrix is calculated for the position coordinates of each grid point, and for every pixel inside the cell enclosed by four adjacent grid points, the transformation matrix is obtained by interpolating the transformation matrices of those grid points according to the pixel's coordinate position relative to them. Processing pixels in units of grid cells, with per-pixel transforms interpolated from the neighboring grid points, simplifies the computation.
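A sketch of the per-pixel interpolation inside one grid cell, assuming the four corner transforms are blended entrywise with bilinear weights (a common approximation that is accurate when neighboring grid transforms differ only slightly; the function name and corner ordering are illustrative):

```python
import numpy as np

def pixel_transform(H00, H10, H01, H11, u, v):
    """Blend the transformation matrices at the four corners of a grid cell.
    (u, v) in [0, 1] is the pixel's normalized position inside the cell:
    (0, 0) at the H00 corner, (1, 1) at the H11 corner."""
    return ((1 - u) * (1 - v) * H00 + u * (1 - v) * H10
            + (1 - u) * v * H01 + u * v * H11)
```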
In a possible implementation of the first aspect, the method for image stabilization further includes:
and determining a motion vector corresponding to each grid of the image based on the motion information, and calculating a transformation matrix corresponding to each pixel of the image according to the motion vector, wherein the transformation matrix comprises a rotation matrix and/or a translation matrix. As described above, when the image pixels are mapped in units of grids, the angular velocity of each grid center may be obtained by interpolation, and the motion vector and the transformation matrix corresponding to each grid may be obtained.
In a possible implementation of the first aspect, the method for image stabilization further includes:
the first process includes: and determining a rotation matrix and/or a translation matrix corresponding to each grid of the image based on the motion information, and performing degradation processing on the rotation matrix and/or the translation matrix corresponding to each grid of the target area.
In a possible implementation of the first aspect, the method for image stabilization further includes:
and performing second processing on each pixel in the background area, wherein the second processing comprises the following steps: and performing image coordinate transformation and pixel value interpolation based on each grid of the background area to finish image stabilization and rolling shutter correction.
In a possible implementation of the first aspect, the method for image stabilization further includes:
and performing first processing and second processing on each grid of the target area.
In a possible implementation of the first aspect, the method for image stabilization further includes:
and determining a rotation matrix and/or a translation matrix corresponding to each grid of the image based on the motion information, performing degradation processing on the rotation matrix and/or the translation matrix corresponding to each grid of the target area, and performing image coordinate transformation and pixel value interpolation based on each pixel of the image subjected to the degradation processing to finish image stabilization and rolling shutter correction.
In a possible implementation of the first aspect, the method for image stabilization further includes:
in the target region, a target contour region is determined based on the contour of the target, and a region other than the target contour region in the target region is set as a buffer region.
In a possible implementation of the first aspect, the method for image stabilization further includes:
the rotation matrix and/or translation matrix corresponding to each image pixel in the buffer area is calculated by interpolation, combining the rotation matrix and/or translation matrix corresponding to each grid of the target area with that of each grid of the background area.
In a possible implementation of the first aspect, the method for image stabilization further includes:
when the proportion of the target area in the image to be processed is larger than a preset threshold, the proportion of the buffer area in the image to be processed is set larger than a preset proportion threshold;
and when the proportion of the target area in the image to be processed is smaller than the preset threshold, the proportion of the buffer area in the image to be processed is set smaller than the preset proportion threshold. The preset threshold and/or the preset proportion threshold are set according to a preset function; the preset function includes a Gaussian blur and/or a piecewise function.
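As an illustration only (the threshold and ratio values below are hypothetical, not from the patent), such a rule can be expressed as a simple piecewise function: a target that fills more of the frame gets a wider buffer band so the difference in degradation is spread over more pixels.

```python
def buffer_ratio(target_ratio, thresh=0.3, small=0.05, large=0.12):
    """Piecewise rule: return the buffer area's proportion of the frame
    given the target area's proportion (all values hypothetical)."""
    return large if target_ratio > thresh else small
```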
In a possible implementation of the first aspect, the method for image stabilization further includes:
identifying a target in the image to be processed through a convolutional neural network;
and determining a target area according to the target.
In a possible implementation of the first aspect, the method for image stabilization further includes:
the motion vector is smoothed;
the smoothing may be performed by a neural network, a Gaussian filter, an IIR filter, a PID regulator, L1 optimization, L2 optimization, or other methods.
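Of the listed smoothing options, a Gaussian filter is the simplest to sketch; the kernel width, radius, and reflection padding below are assumptions, applied to one component of the rotation-vector sequence at a time:

```python
import numpy as np

def gaussian_smooth(seq, sigma=2.0, radius=4):
    """Smooth a 1-D sequence (e.g. one rotation-vector component per frame)
    with a normalized Gaussian kernel; edges are handled by reflection so
    the output has the same length as the input."""
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    padded = np.pad(np.asarray(seq, dtype=float), radius, mode="reflect")
    return np.convolve(padded, kernel, mode="valid")
```

Because the kernel is normalized, a constant sequence passes through unchanged, which matches the intent of Fig. 8 where only the jitter is removed.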
In a second aspect, the present application provides a terminal photographing method, including:
acquiring an image to be processed by a terminal;
and processing the image to be processed by the image stabilization processing method disclosed in the various possible implementations of the above aspects to obtain the stabilized image.
In a third aspect, an embodiment of the present application provides an image stabilization processing apparatus, including:
the acquisition module is used for acquiring the motion information and the image to be processed of the terminal shooting equipment; the image to be processed comprises at least one target area containing a target; determining a background area outside the target area in the image to be processed according to the target area;
the calculation module is used for calculating a transformation matrix corresponding to each pixel in the image to be processed; the transformation matrix is determined based on the motion information and the camera internal parameters; the transformation matrix establishes a mapping relation between the positions of pixels of the image to be processed and the image after image stabilization processing, and represents the rotation transformation and/or translation transformation of the pixels;
the processing module is used for carrying out first processing on each pixel of the target area and carrying out second processing on each pixel of the background area; the first processing includes weakening processing performed on a rotational transformation and/or a translational transformation of each pixel of the target region; the second process is different from the first process;
and the output module is used for outputting the image after image stabilization processing based on the transformation matrix corresponding to each pixel of the image after the first processing and the second processing.
In a fourth aspect, this application provides a terminal, including:
the image acquisition unit is used for acquiring an image to be processed by a terminal;
and the image processing unit is used for processing the image to be processed by adopting the image stabilizing processing method disclosed in various possible implementations of the aspects to obtain the target image.
In a fifth aspect, an embodiment of the present application provides a machine-readable medium having stored thereon instructions that, when executed on a machine, cause the machine to perform the image stabilization processing method disclosed in the various possible implementations of the above aspects.
In a sixth aspect, the present application provides a system, comprising:
a memory for storing instructions for execution by one or more processors of the system, and a processor, which is one of the processors of the system, for performing the image stabilization processing method disclosed in the various possible implementations of the above aspects.
Drawings
Fig. 1 illustrates a system framework diagram of a handset, according to some embodiments of the present application.
Fig. 2 illustrates a usage scenario of a cell phone, according to some embodiments of the present application.
Fig. 3 illustrates a structure of a computer vision module in the handset system of fig. 1, according to some embodiments of the present application.
Fig. 4a illustrates an image captured by a camera in the handset system of fig. 1, in accordance with some embodiments of the present application.
Fig. 4b illustrates an outline of a portrait in an image captured by a camera in the handset system of fig. 1, in accordance with some embodiments of the present application.
FIG. 5 illustrates an image after partitioning, where region A represents a target region, region B represents a background region, region N represents a buffer region, and region D is a portrait protected region, according to some embodiments of the present application.
Fig. 6 illustrates a flow diagram of AI portrait segmentation, according to some embodiments of the present application.
Fig. 7a is a schematic view of alignment of gyroscope motion data with image frames according to some embodiments of the present application.
FIG. 7b is a flow chart of calculating a rotation matrix for each pixel according to some embodiments of the present application.
Fig. 8 shows a graph of smoothing of a sequence of rotation vectors, where curve x represents a sequence of rotation vectors that have not been smoothed and curve y represents a sequence of rotation vectors that have been smoothed, according to some embodiments of the present application.
FIG. 9 illustrates a flow diagram of a method of image stabilization processing, according to some embodiments of the present application.
FIG. 10 illustrates a flow diagram of a method of image stabilization processing, according to some embodiments of the present application.
Fig. 11 illustrates a flow chart of a method of front-facing rotation self-timer for a cell phone, according to some embodiments of the present application.
FIG. 12 illustrates an apparatus for image processing, according to some embodiments of the present application.
Fig. 13 illustrates a terminal according to some embodiments of the present application.
Fig. 14 illustrates a block diagram of a system, according to some embodiments of the present application.
FIG. 15 illustrates a block diagram of a system on a chip (SoC), according to some embodiments of the present application.
DETAILED DESCRIPTION OF EMBODIMENT(S) OF INVENTION
The illustrative embodiments of the present application include, but are not limited to, an image stabilization processing method, a terminal shooting method, a medium and a system.
It will be appreciated that as used herein, the term module may refer to or include an Application Specific Integrated Circuit (ASIC), an electronic circuit, a processor (shared, dedicated, or group) and/or memory that executes one or more software or firmware programs, a combinational logic circuit, and/or other suitable hardware components that provide the described functionality, or may be part of such hardware components.
It is to be appreciated that in various embodiments of the present application, the processor may be a microprocessor, a digital signal processor, a microcontroller, or the like, and/or any combination thereof. According to another aspect, the processor may be a single-core processor, a multi-core processor, the like, and/or any combination thereof.
The image stabilization processing method of the present application is applied to video shooting equipment: for example, handheld devices such as smartphones, tablet computers and handheld DV camcorders, as well as dashboard cameras, and even the video recording equipment of drones and sweeping robots. All of these devices may move or vibrate over a large range within a short time during use.
Embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
It can be understood that in some embodiments of the present application, an image stabilization processing method is disclosed, in which different processing modes are respectively used for a portrait protected area and a background part in an image.
In the image stabilization processing method according to the present application, it is necessary to acquire motion information of the terminal device in various ways such as a gyroscope, an acceleration sensor, an optical image stabilization device (OIS), image feature matching, and an optical flow method.
According to an embodiment of the present application, a homography matrix corresponding to each pixel of the image is determined from the motion information. Specifically, either (a) pixel coordinate transformation and pixel interpolation are performed on every pixel of the whole image to complete image stabilization and rolling-shutter correction, after which the homography matrix corresponding to each pixel in the portrait protection area D is degraded; or (b) the homography matrix corresponding to each pixel in the portrait protection area D is degraded first, and then pixel coordinate transformation and pixel interpolation are performed on each pixel of the non-portrait protection area B (also called the background area) to complete image stabilization and rolling-shutter correction. A homography matrix between the stabilized image and the image to be processed is then determined from these operations, and the final image is output according to that homography matrix.
Based on this motion information, according to another embodiment of the present application, the image may first be divided into an N × M grid, that is, pixels are processed in units of grid cells, which reduces part of the computation. Specifically, the portrait is segmented by a neural network model, and the portrait region is mapped onto the grid to obtain the set A of grid cells occupied by the portrait; a portrait protection area D is set, comprising the buffer area N and the portrait area A; the region of the image outside the portrait protection area D is the non-portrait protection area, i.e. the background area B.
Calculating a rotation matrix R and/or a translation matrix T corresponding to each region of the image;
for the non-portrait protection area B, the calculated rotation matrix R0 and/or translation matrix T0 are kept unchanged, and pixel coordinate transformation and pixel interpolation are performed on each pixel of area B to complete image stabilization and rolling-shutter correction;
for the portrait area A, the rotation matrix R of the portrait area is degraded to the identity matrix I and/or its translation matrix T is degraded to the zero matrix; or pixel coordinate transformation and pixel interpolation are first performed on every pixel of the whole image to complete image stabilization and rolling-shutter correction of the whole image, after which the rotation matrix R and/or translation matrix T of the portrait area is degraded to the identity matrix I and/or the zero matrix; for the buffer area N, the rotation matrix is calculated by spherical linear interpolation, R = SLERP(R_A, R_B), and/or the translation matrix by T = SLERP(T_A, T_B); finally, according to the adjusted rotation and translation matrices of each part, the homography matrix H corresponding to each image part is calculated in combination with the camera intrinsic matrix K, H = f(K, R, T), and the final image is output according to the homography matrix H.
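The composition H = f(K, R, T) is not spelled out in this text; for the special case of a pure camera rotation, the standard mapping is H = K R K^-1, sketched below (the handling of the translation component is omitted here and would follow the patent's full formula; the function name is an assumption):

```python
import numpy as np

def homography_from_rotation(K, R):
    """Pixel mapping induced by a pure camera rotation R with intrinsics K:
    x_stabilized ~ K @ R @ inv(K) @ x_input (homogeneous coordinates)."""
    return K @ R @ np.linalg.inv(K)
```

With this model, the principal point is a fixed point of any rotation about the optical axis, and R equal to the identity yields the identity homography.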
An exemplary application scenario of the image stabilization processing method of the present application, namely image stabilization during the front-camera rotation self-timer process of a mobile phone, is described below, and the image stabilization processing method of the present application is described in detail in this scenario.
It is understood that this example is only intended to explain specific embodiments and principles of the present application; the image stabilization processing method of the present application is not limited to the scenario in which the front camera of a mobile phone performs self-timer shooting, and is also applicable to images or videos shot by other cameras of the mobile phone. Moreover, the image stabilization processing method is applicable not only to a mobile phone terminal but also to other shooting equipment provided with a camera.
The mobile phone 10 is taken as an example to describe the mobile phone front-facing rotation self-timer technology of the present application. Fig. 1 illustrates a system framework diagram of a handset 10, according to some embodiments of the present application. It is understood that the system framework is also applicable to other terminals, not limited to mobile phones.
It is understood that the exemplary structure of the embodiment of the present application, involving a CMOS sensor, ISP, CPU/GPU/NPU, TP, and the like, does not constitute a specific limitation on the mobile phone 10. In other embodiments of the present application, the handset 10 may include more or fewer components than shown, or some components may be combined, some components may be separated, or a different arrangement of components may be used. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Some embodiments according to the present application disclose a handset 10 system.
Specifically, as shown in FIG. 1, the handset 10 includes a software system 110 and a hardware system 120. The hardware system 120 includes a camera module 121, a sensor module 122, and a display screen 123, and the software system 110 includes an operating system 111 and a computer vision module 113 at an application layer 112. The operating system 111 is a computer program integrated in the terminal that manages hardware and software resources of the terminal device. The application layer 112 is a module program having a specific function that runs on top of the operating system 111. The camera module 121 (including the front camera) is used to capture video or image information, such as capturing video when the user rotates to take a self-portrait.
The display screen 102 is used for displaying human-computer interaction interfaces, images, videos, and the like. The display screen 102 includes a display panel. The display panel may adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like.
The sensor module 122 may include a proximity light sensor, a pressure sensor, a gyroscope sensor, an air pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, and the like.
The camera module 121 is used to capture still images or video. An object generates an optical image through the lens, which is projected onto the photosensitive element. The photosensitive element converts the optical signal into an electrical signal and transmits it to the ISP (Image Signal Processor), which converts it into a digital image signal. The mobile phone 10 can implement the shooting function through the ISP, the camera module 121, the video codec, the GPU (Graphics Processing Unit), the display screen 102, the application processor, and the like.
Fig. 2 shows a scenario in which a user is taking a self-portrait while using the front camera of the handset 10 to rotate, according to some embodiments of the present application. Specifically, in fig. 2, when the user performs self-timer rotation using the mobile phone 10, the camera module 121 of the mobile phone 10 reads the camera ID, and recognizes that self-timer rotation using the front camera 1211 is being performed at this time. For this case, the mobile phone 10 performs image stabilization on the acquired video by using an image stabilization method.
The specific implementation process of the image stabilization processing method according to the application is described as follows:
when the user performs rotation self-timer shooting using the front camera 1211, the sensor module 122 acquires gyroscope data while the camera module 121 acquires multiple frames of image data. When the self-timer device is in operation, the gyroscope outputs a set of data comprising three-axis angular velocities and a time stamp every fixed interval t; for example, a set of data collected by the gyroscope may be represented as [g_t, Ω_x, Ω_y, Ω_z], where g_t represents the time stamp at which this set of data was acquired, and Ω_x, Ω_y, Ω_z represent the angular velocity components of the rotation about the three coordinate axes at that moment. Since the sensor module 122 captures video frames at a fixed frame rate, in the present embodiment one video frame is exactly one piece of image data, also referred to as one image, and playing multiple consecutive images forms a video. To improve sampling accuracy, the output frequency of the gyroscope is usually set higher than the frequency at which video frames are acquired, so multiple sets of gyroscope data fall between two adjacent video frames. Therefore, alignment is performed before the time stamp data and the image data are used.
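The alignment of the denser gyroscope samples with the frame timestamps can be sketched by per-axis linear interpolation; the function name and array layout below are assumptions made for illustration, not the patent's code:

```python
import numpy as np

def gyro_at(frame_ts, gyro_ts, gyro_xyz):
    """Interpolate (N, 3) gyroscope angular velocities, sampled at gyro_ts,
    onto the (coarser) frame timestamps frame_ts."""
    return np.stack(
        [np.interp(frame_ts, gyro_ts, gyro_xyz[:, k]) for k in range(3)], axis=-1
    )
```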
The user image captured by the front camera 1211 is subjected to human image segmentation by the computer vision module 113. And segmenting the image to be processed into a human image area and a background area.
As shown in fig. 3, the computer vision module 113 includes a human figure contour detection unit 1131, a human figure segmentation unit 1132, a rotation matrix calculation unit 1133, a homography matrix calculation unit 1134, and an image deformation unit 1135.
The process by which the computer vision module 113 identifies the portrait contour in the user image through its respective units is as follows:
the portrait contour detection unit 1131 detects the portrait contour in the user image captured by the front camera 1211. For example, fig. 4(a) shows an image of the user captured by the front camera 1211, and fig. 4(b) shows the portrait contour of the user in the image. The portrait contour detection unit 1131 sends the image in which the portrait contour has been detected to the image segmentation unit 1132.
Fig. 5 illustrates an image after zoning, where zone a represents a target zone, zone B represents a background zone, zone N represents a buffer zone, and zone D represents a portrait protected zone, according to some embodiments of the present application.
After receiving the image, as shown in fig. 6, the image segmentation unit 1132 generates portrait contour coordinates through a convolutional neural network model, then locates the portrait area, determines the background area B of the image based on the portrait area A, further sets a portrait protection area D using a piecewise function or a Gaussian filter, and determines the portrait buffer area N (see fig. 5) according to the portrait protection area and the portrait area. For example, if the proportion of the image occupied by the portrait area is greater than a threshold or the motion state is severe, the portrait buffer area is appropriately enlarged; if the proportion is less than or equal to the threshold or the motion state is mild, the portrait buffer area is appropriately reduced.
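One way to picture the relationship between the areas (A inside D, buffer N = D minus A, background B outside D) is to dilate a binary portrait mask; the 4-connected dilation below is a crude stand-in for the piecewise function or Gaussian filter named above, and all names are illustrative assumptions:

```python
import numpy as np

def dilate(mask, r):
    """Grow a boolean mask by r pixels using a 4-connected neighborhood."""
    out = mask.copy()
    for _ in range(r):
        p = np.pad(out, 1)
        out = p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1] | p[1:-1, :-2] | p[1:-1, 2:]
    return out

def partition(portrait_mask, radius=2):
    """Split an image into portrait area A, buffer N, and background B."""
    A = portrait_mask.astype(bool)
    D = dilate(A, radius)   # portrait protection area: portrait plus a margin
    N = D & ~A              # buffer ring between portrait and background
    B = ~D                  # everything outside the protection area
    return A, N, B
```

Enlarging `radius` widens the buffer ring, mirroring the text's rule of growing the buffer when the portrait is large or the motion is severe.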
further, fig. 7a shows that the time stamp data and the image data are aligned during the operation of the rotation matrix calculation unit 1133.
The working process of the rotation matrix calculation unit 1133 is shown in fig. 7b, and specifically includes:
(a) The angular velocity of each pixel is calculated by interpolation and integrated over time to obtain a rotation vector;
(b) the calculated rotation vector sequence is input into a neural network model for smoothing, and a further smoothed rotation vector curve is obtained through a recursive filter and a PID (proportional-integral-derivative) regulator; that is, the rotation vectors Rot corresponding to each grid are obtained through filtering (see fig. 8, where curve x is before smoothing and curve y is after smoothing);
(c) finally, the rotation vector Rot is converted into a rotation matrix R according to the Rodrigues rotation formula.
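The Rodrigues conversion in step (c) can be written directly; this is the standard formula, sketched with NumPy rather than the patent's actual code:

```python
import numpy as np

def rodrigues(rot_vec):
    """Convert a rotation vector (axis * angle) into a 3x3 rotation matrix."""
    theta = np.linalg.norm(rot_vec)
    if theta < 1e-12:
        return np.eye(3)                      # no rotation
    k = rot_vec / theta                       # unit rotation axis
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])        # skew-symmetric cross-product matrix
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)
```

For example, the rotation vector [0, 0, π/2] yields the matrix that rotates the x-axis onto the y-axis.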
The homography matrix calculation unit 1134 combines the calculated rotation matrix with the portrait area, the portrait buffer area, the portrait protection area, and the background area divided by the portrait dividing unit 1132, and performs the following operations:
for the background area B (i.e., the non-portrait protected area), the calculated rotation matrix R_0 and/or translation matrix T_0 is kept unchanged and the normal rolling shutter correction is carried out; the homography matrix H_B corresponding to each pixel of the background area B is then calculated by the formula H_B = f(K, R_0), or by the formula H_B = f(K, R_0, T_0);
for the portrait area A, the rotation matrix R of the portrait area is degraded into the identity matrix I and/or the translation matrix T of the portrait area is degraded into the zero matrix; alternatively, rolling shutter correction is first performed on the rotation matrix R and/or translation matrix T of the portrait area, and then the corrected rotation matrix is degraded into the identity matrix I and/or the corrected translation matrix T is degraded into the zero matrix;
for the buffer area N, based on the rotation matrices corresponding to the nearest pixel in the portrait area and the nearest pixel in the background area (see fig. 9), the rotation matrix R_N of the buffer area is calculated according to the formula R_N = SLERP(R_A, R_B) and/or the translation matrix T_N according to the formula T_N = SLERP(T_A, T_B); the homography matrix corresponding to each pixel of the buffer area is then calculated according to the formula H_N = f(K, R_N) and/or H_N = f(K, R_N, T_N);
where K is the known camera intrinsic matrix, obtained by calibrating the camera with a calibration method:

K = | f_x   0    c_x |
    |  0   f_y   c_y |
    |  0    0     1  |

where c_x, c_y represent the coordinates of the camera's optical axis (the principal point), and f_x, f_y represent the focal length of the camera along the X-axis and Y-axis.
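For a pinhole camera, the pure-rotation mapping f(K, R) is commonly modeled as H = K·R·K^{-1}. The sketch below assumes that model; the intrinsic values are made-up example numbers, not calibration results from the patent:

```python
import numpy as np

def homography_pure_rotation(K, R):
    """Homography induced on pixels by a pure camera rotation R: H = K R K^{-1}."""
    return K @ R @ np.linalg.inv(K)

# Example intrinsics (illustrative only): f_x = f_y = 800, optical axis at (320, 240)
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
```

With R equal to the identity (the degraded portrait-area case), H is also the identity, so portrait pixels are left untouched, which is exactly the intended protection effect.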
Further, according to the above method, the pixels may also be correspondingly processed according to a grid as a unit, and details are not described herein to avoid repetition.
The image deformation unit 1135 outputs the output coordinate value corresponding to each pixel of the image-stabilized image according to the homography matrices corresponding to each pixel of the portrait area A, the background area B, and the portrait buffer area N calculated by the homography matrix calculation unit 1134, and the stabilized image is then output through the ISP (Image Signal Processor).
Meanwhile, when the portrait contour detection unit 1131 detects that the acquired image to be processed includes multiple portraits, the user of the mobile phone can select a region through the display screen 102; that is, the user can choose to apply the above-mentioned processing to at least one of the portraits. The sensor module 190 transmits the region specified by the user to the image segmentation unit 1132, which performs region segmentation according to the user's selection, specifically dividing the image into a processing region, a protection region, a processing buffer region, and a background region. The rotation matrix calculation unit 1133 and the homography matrix calculation unit 1134 then calculate the homography matrix corresponding to each pixel of each region, and the image deformation unit 1135 outputs, according to these homography matrices, the output coordinate value corresponding to each pixel of the image-stabilized image, after which the stabilized image is output through the ISP (Image Signal Processor).
In addition, although the embodiments according to the present application have been described taking a portrait as an example of the target, the image stabilization processing method of the present application is not limited to processing portraits; the target may be any object for which image stabilization during shooting is desired, such as an animal or a moving object.
Based on the above description of the image processing method of the mobile phone 10, a specific flow of the image processing method of the present application is specifically described below. Various relevant details in the above description are still applicable in the present flow, and are not repeated herein to avoid repetition. Specifically, as shown in fig. 9 to 10, when only the rotation transformation is considered, that is, when the translation matrix T is equivalent to the identity matrix I, the image stabilization processing method of the present application includes:
an image to be processed is acquired (901). For example, images captured by a camera. For example, the camera may capture images of a living body (such as an animal, a person, etc.) or images of a particular landscape (such as a tree, a flower, a building, etc.).
A rotation matrix is calculated for each pixel of the image (902). Combining the angular velocity data output by the gyroscope, the angular velocity of each pixel is obtained through interpolation; the angular velocity of each pixel is integrated over the gyroscope time to obtain the rotation vector corresponding to each pixel, and the rotation vector corresponding to each pixel is converted into a rotation matrix through the Rodrigues rotation formula.
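The integration of angular velocity over a time window can be sketched with the trapezoidal rule; the function name and the choice of integration window are illustrative assumptions:

```python
import numpy as np

def rotation_vector(gyro_ts, gyro_xyz, t0, t1):
    """Integrate (N, 3) angular velocities over [t0, t1] to obtain a rotation vector."""
    m = (gyro_ts >= t0) & (gyro_ts <= t1)
    ts, w = gyro_ts[m], gyro_xyz[m]
    dt = np.diff(ts)[:, None]
    return np.sum(0.5 * (w[1:] + w[:-1]) * dt, axis=0)   # trapezoidal rule per axis
```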
Partitioning is performed by artificial intelligence (903). For example, the image is partitioned through a neural network, which can identify a building area in the image; the size of a building protection area can then be set according to the size of the building area and the motion state of the camera, in combination with a Gaussian filter function. The building protection area comprises the building area and a buffer area whose gray value lies between those of the building area and the background area. The range of the background area is then obtained from the range of the building protection area; further, the range of the buffer area can be obtained from the range of the building protection area.
The rotation matrix of the building area is directly degraded, and pixel coordinate transformation and pixel interpolation are performed based on the image pixels of the building area to complete image stabilization and rolling shutter correction (9041); rolling shutter correction is performed based on each pixel of the background area (9042); and the rotation matrix of the buffer area is obtained by interpolation (9043). A corresponding operation is thus performed on the rotation matrix of each pixel of the different partitions.
A homography matrix is computed for each pixel of each region based on the processed rotation matrices (906). Combining the calculated rotation matrix with the known camera intrinsic matrix, the homography matrix corresponding to each pixel is calculated through the formula H = f(K, R), yielding the correspondence of pixel coordinate values between the image after image stabilization processing and the acquired image to be processed.
Corresponding image output coordinate values are calculated based on the homography matrix for each pixel of the respective region (907).
And outputting the image subjected to image stabilization processing according to the output coordinate value of the image (908).
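Steps (907) and (908) amount to mapping each pixel coordinate through its homography followed by a perspective divide; a minimal sketch (vectorized over pixels for brevity):

```python
import numpy as np

def apply_homography(H, points):
    """Map (N, 2) pixel coordinates (x, y) through a 3x3 homography H."""
    pts_h = np.hstack([points, np.ones((len(points), 1))])  # homogeneous coordinates
    out = pts_h @ H.T
    return out[:, :2] / out[:, 2:3]                         # perspective divide
```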
In other embodiments of the present application, the present application further provides a terminal shooting method, and similarly, only the rotation transformation is considered in this embodiment, that is, the translation matrix T is equivalent to the identity matrix I. As shown in fig. 11:
when a user rotates to take a self-timer video, the terminal camera acquires the video shot by the user, and the terminal calculates a rotation matrix for each pixel of each frame of the video (1101). The terminal reads the ID of the camera (1102) and identifies whether it is a front camera (1103); when it is the front camera, the portrait is at a fixed position relative to the terminal display screen. If it is not the front camera, the homography matrix H corresponding to each pixel of the image is calculated directly (1107), the output coordinate value corresponding to each pixel is calculated according to the homography matrix H (1108), and the stabilized image is output by the ISP (Image Signal Processor) (1109). If it is the front camera, the terminal detects whether a portrait exists through AI portrait segmentation (1104) (1105). If no portrait exists, the image is not corrected by partition; that is, the homography matrix H corresponding to each pixel of the image is calculated directly (1107), the output coordinate value corresponding to each pixel is calculated according to the homography matrix H (1108), and the ISP outputs the stabilized image (1109). If a portrait is detected (1105), the image is partitioned into a portrait area, a background area, and a buffer area, and the rotation matrix corresponding to each pixel of each partition is modified (1106): the rotation matrix of the background area is kept unchanged, the rotation matrix of the portrait area is degraded into the identity matrix, and the rotation matrix of the buffer area is obtained by interpolation from the rotation matrices of the portrait area and the background area. The homography matrix corresponding to each pixel of each region is then calculated according to the rotation matrix of each pixel of that region (1107), the output coordinate value corresponding to each pixel is calculated according to the homography matrix H of each pixel of each region (1108), and finally the stabilized image is output by the ISP (1109).
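The branching in fig. 11 can be condensed into a small routine; the function and parameter names are assumptions made for illustration, not the terminal's actual API:

```python
def choose_correction(camera_id, front_camera_ids, has_portrait):
    """Decide between whole-image and per-region stabilization (sketch of fig. 11)."""
    if camera_id not in front_camera_ids:   # 1103: not the front camera
        return "whole"                      # go straight to 1107
    if not has_portrait:                    # 1104/1105: AI segmentation found no portrait
        return "whole"
    return "partitioned"                    # 1106: modify per-partition rotation matrices first
```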
Alternatively, the image segmentation unit 1132 may also process the pixels in units of a grid.
Specifically, as shown in fig. 5, the image to be processed is divided into N × M grids, the homography matrix corresponding to the coordinates of each grid point is calculated, and then, for each pixel in a square grid cell enclosed by four adjacent grid points, interpolation is performed according to the coordinate relationship between the pixel and the adjacent grid points and the transformation matrices corresponding to those grid points, obtaining the transformation matrix corresponding to each pixel. Then, as shown in fig. 6, portrait contour coordinates are generated through a convolutional neural network model, the portrait area is located, and the grid is mapped to the portrait area; that is, when the proportion of portrait pixels in a grid cell relative to the total number of pixels in that cell is greater than or equal to a preset proportion δ, the cell belongs to the portrait area, where δ takes a value in [0.5, 1], for example δ = 0.5. The background area B of the image is determined based on the portrait area A, the portrait protection area D is further set according to the portrait area using a model such as a piecewise function or a Gaussian filter, and the portrait buffer area N is determined according to the portrait protection area and the portrait area.
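The δ-threshold mapping from pixels to grid cells can be sketched as follows; the names are illustrative assumptions:

```python
import numpy as np

def portrait_grid(mask, n, m, delta=0.5):
    """Mark an n x m grid cell as portrait when >= delta of its pixels are portrait pixels."""
    h, w = mask.shape
    ys = np.linspace(0, h, n + 1).astype(int)   # row boundaries of the grid cells
    xs = np.linspace(0, w, m + 1).astype(int)   # column boundaries of the grid cells
    grid = np.zeros((n, m), dtype=bool)
    for i in range(n):
        for j in range(m):
            cell = mask[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            grid[i, j] = cell.size > 0 and cell.mean() >= delta
    return grid
```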
Fig. 12 is a schematic structural diagram of an image stabilization processing apparatus corresponding to the image stabilization processing method, and it can be understood that specific technical details in the image stabilization processing method are also applicable to the apparatus, and are not described herein again to avoid repetition.
As shown in fig. 12, the image stabilization processing apparatus includes:
an obtaining module 1201, configured to obtain motion information of a terminal shooting device and an image to be processed; the image to be processed comprises at least one target area containing a target; determining a background area outside the target area in the image to be processed according to the target area;
a calculating module 1202, configured to calculate a transformation matrix corresponding to each pixel in the image to be processed; the transformation matrix is determined based on the motion information and the camera internal parameters; the transformation matrix establishes a mapping relation between the positions of pixels of the image to be processed and the image after image stabilization processing, and represents the rotation transformation and/or translation transformation of the pixels;
a processing module 1203, configured to perform first processing on each pixel of the target area, and perform second processing on each pixel of the background area; the first processing includes weakening processing performed on a rotational transformation and/or a translational transformation of each pixel of the target region; the second treatment is different from the first treatment;
an output module 1204, configured to output the image after image stabilization processing based on the transformation matrix corresponding to each pixel of the image after the first processing and the second processing.
Fig. 13 shows a schematic structural diagram of a terminal corresponding to the terminal shooting method; it can be understood that the specific technical details of the terminal shooting method are also applicable to this terminal, and are not repeated here to avoid repetition.
Specifically, as shown in fig. 13, the terminal includes:
the image acquisition unit 1301 is used for acquiring an image to be processed by a terminal;
an image processing unit 1302, configured to process the image to be processed by using any one of the methods in claims 1 to 13 to obtain a target image.
Referring now to FIG. 14, shown is a block diagram of a system 1400 in accordance with one embodiment of the present application. Fig. 14 schematically illustrates an example system 1400 in accordance with various embodiments. In one embodiment, system 1400 may include one or more processors 1404, system control logic 1408 coupled to at least one of processors 1404, system memory 1414 coupled to system control logic 1408, non-volatile memory (NVM)1415 coupled to system control logic 1408, and a network interface 1420 coupled to system control logic 1408.
In some embodiments, processor 1404 may include one or more single-core or multi-core processors. In some embodiments, processor 1404 may include any combination of general-purpose processors and dedicated processors (e.g., graphics processors, application processors, baseband processors, etc.). In embodiments where system 1400 employs an eNB (enhanced Node B) 101 or a RAN (Radio Access Network) controller 102, processor 1404 may be configured to perform the various embodiments described herein.
In some embodiments, system control logic 1408 may include any suitable interface controllers to provide any suitable interface to at least one of processors 1404 and/or to any suitable device or component in communication with system control logic 1408.
In some embodiments, system control logic 1408 may include one or more memory controllers to provide an interface to system memory 1414. System memory 1414 may be used for loading and storing data and/or instructions. The memory 1414 of the system 1400 may in some embodiments include any suitable volatile memory, such as suitable Dynamic Random Access Memory (DRAM).
NVM/memory 1415 may include one or more tangible, non-transitory computer-readable media for storing data and/or instructions. In some embodiments, the NVM/memory 1415 may include any suitable non-volatile memory such as flash memory and/or any suitable non-volatile storage device such as at least one of a HDD (Hard Disk Drive), CD (Compact Disc) Drive, DVD (Digital Versatile Disc) Drive.
The NVM/memory 1415 may comprise a portion of the storage resources on the device on which the system 1400 is installed, or it may be accessible by, but not necessarily a part of, the device. For example, the NVM/storage 1415 may be accessed over a network via the network interface 1420.
In particular, system memory 1414 and NVM/storage 1415 may each include: a temporary copy and a permanent copy of instructions 1424.
Network interface 1420 may include a transceiver to provide a radio interface for system 1400 to communicate with any other suitable device (e.g., front end module, antenna, etc.) over one or more networks. In some embodiments, network interface 1420 may be integrated with other components of system 1400. For example, network interface 1420 may be integrated with at least one of processor 1404, system memory 1414, NVM/storage 1415, and a firmware device (not shown) having instructions such that, when the instructions are executed by at least one of processors 1404, system 1400 implements the image processing methods described in the embodiments above in connection with fig. 9-10.
Network interface 1420 may further include any suitable hardware and/or firmware to provide a multiple-input multiple-output radio interface. For example, network interface 1420 may be a network adapter, a wireless network adapter, a telephone modem, and/or a wireless modem.
In one embodiment, at least one of processors 1404 may be packaged together with logic for one or more controllers of system control logic 1408 to form a System In Package (SiP). In one embodiment, at least one of processors 1404 may be integrated on the same die with logic for one or more controllers of system control logic 1408 to form a system on a chip (SoC).
The system 1400 may further include: input/output (I/O) devices 1432. The I/O device 1432 may include a user interface to enable a user to interact with the system 1400; the design of the peripheral component interface enables peripheral components to also interact with the system 1400. In some embodiments, the system 1400 further includes sensors for determining at least one of environmental conditions and location information associated with the system 1400.
In some embodiments, the user interface may include, but is not limited to, a display (e.g., a liquid crystal display, a touch screen display, etc.), a speaker, a microphone, one or more cameras (e.g., still image cameras and/or video cameras), a flashlight (e.g., a light emitting diode flash), and a keyboard.
In some embodiments, the peripheral component interfaces may include, but are not limited to, a non-volatile memory port, an audio jack, and a power interface.
In some embodiments, the sensors may include, but are not limited to, a gyroscope sensor, an accelerometer, a proximity sensor, an ambient light sensor, and a positioning unit. The positioning unit may also be part of the network interface 1420 or interact with the network interface 1420 to communicate with components of a positioning network, such as Global Positioning System (GPS) satellites.
Fig. 15 shows a block diagram of a SoC (System on Chip) 1500 according to an embodiment of the present application. In fig. 15, like parts have the same reference numerals. In addition, the dashed boxes are optional features of more advanced SoCs. In fig. 15, SoC 1500 includes: an interconnect unit 1550 coupled to the application processor 1515; a system agent unit 1570; a bus controller unit 1580; an integrated memory controller unit 1540; a set of one or more coprocessors 1520, which may include integrated graphics logic, an image processor, an audio processor, and a video processor; a static random access memory (SRAM) unit 1530; and a direct memory access (DMA) unit 1560. In one embodiment, the coprocessor 1520 comprises a special-purpose processor, such as, for example, a network or communication processor, a compression engine, a GPGPU, a high-throughput MIC processor, an embedded processor, or the like.
Embodiments of the mechanisms disclosed herein may be implemented in hardware, software, firmware, or a combination of these implementations. Embodiments of the application may be implemented as computer programs or program code executing on programmable systems comprising at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
Program code may be applied to input instructions to perform the functions described herein and generate output information. The output information may be applied to one or more output devices in a known manner. For purposes of this application, a processing system includes any system having a processor such as, for example, a Digital Signal Processor (DSP), a microcontroller, an Application Specific Integrated Circuit (ASIC), or a microprocessor.
The program code may be implemented in a high level procedural or object oriented programming language to communicate with a processing system. The program code can also be implemented in assembly or machine language, if desired. Indeed, the mechanisms described in this application are not limited in scope to any particular programming language. In any case, the language may be a compiled or interpreted language.
In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors. For example, the instructions may be distributed via a network or via other computer-readable media. Thus, a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including, but not limited to, floppy diskettes, optical disks, compact disc read-only memories (CD-ROMs), magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), erasable programmable read-only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), magnetic or optical cards, flash memory, or tangible machine-readable storage used in transmitting information over the Internet via electrical, optical, acoustical, or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.). Thus, a machine-readable medium includes any type of machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).
In the drawings, some features of the structures or methods may be shown in a particular arrangement and/or order. However, it is to be understood that such specific arrangement and/or ordering may not be required. Rather, in some embodiments, the features may be arranged in a manner and/or order different from that shown in the illustrative figures. In addition, the inclusion of a structural or methodical feature in a particular figure is not meant to imply that such feature is required in all embodiments, and in some embodiments, may not be included or may be combined with other features.
It should be noted that, in the apparatus embodiments of the present application, each unit/module is a logical unit/module. Physically, one logical unit/module may be one physical unit/module, may be a part of one physical unit/module, or may be implemented by a combination of multiple physical units/modules; the physical implementation of the logical unit/module itself is not essential, and it is the combination of the functions implemented by these logical units/modules that solves the technical problem addressed by the present application. Furthermore, in order to highlight the innovative part of the present application, the above apparatus embodiments do not introduce units/modules that are not closely related to solving the technical problem addressed by the present application; this does not mean that no other units/modules exist in the above apparatus embodiments.
It is noted that, in the examples and descriptions of this patent, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
While the present application has been shown and described with reference to certain preferred embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present application.

Claims (15)

1. An image stabilization processing method for terminal shooting equipment, characterized by comprising the following steps:
acquiring motion information of the terminal shooting equipment and an image to be processed, wherein the image to be processed comprises at least one target area containing a target; and determining, according to the target area, a background area outside the target area in the image to be processed;
calculating a transformation matrix corresponding to each pixel in the image to be processed, wherein the transformation matrix is determined based on the motion information and camera intrinsic parameters; the transformation matrix establishes a mapping between the positions of pixels in the image to be processed and their positions in the image after image stabilization processing, and represents the rotation transformation and/or translation transformation of each pixel;
performing first processing on each pixel of the target area, and performing second processing on each pixel of the background area, wherein the first processing comprises weakening processing performed on rotation transformation and/or translation transformation of each pixel of the target area, and the second processing is different from the first processing;
outputting the image after image stabilization processing based on the transformation matrix corresponding to each pixel of the image after the first processing and the second processing.
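By way of illustration only, the per-pixel flow of claim 1 might be sketched as follows. This is a hedged reconstruction, not the claimed implementation: the function name `stabilize`, the `weaken` strength parameter, and the simple forward-mapping loop are all assumptions made for readability.

```python
import numpy as np

def stabilize(image, pixel_transforms, target_mask, weaken=0.2):
    """Sketch of the claim-1 flow: weaken the warp inside the target
    region (first processing), apply it fully in the background
    (second processing), then remap every pixel.

    pixel_transforms : (H, W, 3, 3) array, one transform per pixel.
    target_mask      : (H, W) boolean array marking the target area.
    weaken           : illustrative strength; 0 keeps the pixel still.
    """
    h, w = image.shape[:2]
    out = np.zeros_like(image)
    eye = np.eye(3)
    for y in range(h):
        for x in range(w):
            T = pixel_transforms[y, x]
            if target_mask[y, x]:
                # First processing: blend toward the identity so the
                # target's rotation/translation is only partly applied.
                T = weaken * T + (1 - weaken) * eye
            p = T @ np.array([x, y, 1.0])
            u, v = int(round(p[0] / p[2])), int(round(p[1] / p[2]))
            if 0 <= u < w and 0 <= v < h:
                out[v, u] = image[y, x]
    return out
```

A real pipeline would use inverse mapping with pixel-value interpolation rather than forward rounding; the sketch only shows where the first and second processing diverge.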
2. The method according to claim 1, characterized in that a motion vector corresponding to each pixel of the image is determined based on the motion information, and a transformation matrix corresponding to each pixel of the image is calculated from the motion vector, wherein the transformation matrix comprises a rotation matrix and/or a translation matrix.
3. The method of claim 2, comprising:
the first processing comprises: determining a rotation matrix and/or a translation matrix corresponding to each pixel of the target area based on the motion information, and performing degradation processing on the rotation matrix and/or the translation matrix corresponding to each pixel of the target area.
4. The method of claim 3, wherein the degradation processing comprises: scaling down the rotation vector corresponding to each pixel of the target area and calculating a degraded rotation matrix, or interpolating between the rotation matrix corresponding to each pixel of the target area and the identity matrix to obtain the degraded rotation matrix; and/or scaling down the translation vector corresponding to each pixel of the target area and calculating a degraded translation matrix, or interpolating between the translation matrix corresponding to each pixel of the target area and the zero matrix to obtain the degraded translation matrix.
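A minimal sketch of the two degradation alternatives named in claim 4, assuming a Rodrigues-style rotation-vector representation; `alpha` is an illustrative strength parameter not present in the claim (alpha = 1 leaves the transform unchanged, alpha = 0 removes it entirely):

```python
import numpy as np

def rodrigues(rvec):
    """Rotation vector -> rotation matrix (Rodrigues formula)."""
    rvec = np.asarray(rvec, dtype=float)
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def degrade_rotation(rvec, alpha):
    """Claim 4, first alternative: shrink the rotation vector by
    alpha in [0, 1], then recompute the rotation matrix."""
    return rodrigues(alpha * np.asarray(rvec, dtype=float))

def degrade_translation(tvec, alpha):
    """Claim 4: interpolate the translation toward the zero vector."""
    return alpha * np.asarray(tvec, dtype=float)
```

The claim's other alternative, interpolating the rotation matrix directly toward the identity matrix, trades exactness for speed; scaling the rotation vector stays on the rotation manifold.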
5. The method of claim 1, comprising:
performing the second processing on each pixel of the background area comprises: performing image coordinate transformation and pixel value interpolation on each pixel of the background area to complete image stabilization and rolling shutter correction.
6. The method of claim 1, comprising:
determining a rotation matrix and/or a translation matrix corresponding to each pixel of the target area based on the motion information, performing degradation processing on the rotation matrix and/or the translation matrix corresponding to each pixel of the target area, and performing image coordinate transformation and pixel value interpolation on each pixel of the image subjected to the degradation processing to complete image stabilization and rolling shutter correction.
7. The method of claim 1, comprising:
determining, in the target area, a target contour area based on the contour of the target, and setting the part of the target area outside the target contour area as a buffer area.
8. The method of claim 7, comprising:
calculating a transformation matrix corresponding to each image pixel in the buffer area by interpolating between the transformation matrices corresponding to the pixels of the target area and the transformation matrices corresponding to the pixels of the background area, wherein the transformation matrix comprises a rotation matrix and/or a translation matrix.
9. The method according to claim 1, wherein the image to be processed is divided into an N×M grid, a transformation matrix corresponding to each grid point coordinate is calculated, and, for each pixel in the square region enclosed by four adjacent grid points, interpolation is performed according to the coordinate position relationship between the pixel and the adjacent grid points and the transformation matrices corresponding to the adjacent grid points, so as to obtain the transformation matrix corresponding to each pixel.
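Claim 9's grid scheme could be sketched as follows, assuming the interpolation among the four corner transforms is a bilinear blend of the 3x3 matrix entries; the claim does not fix the interpolation formula, and the (N+1) x (M+1) grid-point shape convention is likewise an assumption:

```python
import numpy as np

def per_pixel_transforms(grid_T, h, w):
    """Expand per-grid-point transforms to per-pixel transforms.

    grid_T : (N+1, M+1, 3, 3) array, one transform per grid point.
    Returns an (h, w, 3, 3) array where each pixel's transform is
    bilinearly blended from the four corners of its grid cell.
    """
    n, m = grid_T.shape[0] - 1, grid_T.shape[1] - 1
    ys = np.linspace(0, n, h)  # pixel row -> continuous grid coordinate
    xs = np.linspace(0, m, w)
    out = np.empty((h, w, 3, 3))
    for i, gy in enumerate(ys):
        y0 = min(int(gy), n - 1)
        fy = gy - y0
        for j, gx in enumerate(xs):
            x0 = min(int(gx), m - 1)
            fx = gx - x0
            out[i, j] = ((1 - fy) * (1 - fx) * grid_T[y0, x0]
                         + (1 - fy) * fx * grid_T[y0, x0 + 1]
                         + fy * (1 - fx) * grid_T[y0 + 1, x0]
                         + fy * fx * grid_T[y0 + 1, x0 + 1])
    return out
```

Computing transforms only at grid points and interpolating in between is what makes a per-pixel warp affordable on a mobile device.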
10. The method of claim 9, comprising:
when the proportion of the target area to the image to be processed is larger than a preset threshold value, the proportion of the set buffer area to the image to be processed is larger than a preset proportion threshold value;
and when the proportion of the target area to the image to be processed is smaller than a preset threshold value, the proportion of the set buffer area to the image to be processed is smaller than a preset proportion threshold value.
11. The method of claim 1, comprising:
identifying a target in the image to be processed by a convolutional neural network;
and determining the target area according to the target.
12. The method of claim 2, comprising:
performing smoothing processing on the motion vector;
wherein the smoothing processing is performed by at least one of: a neural network, Gaussian filtering, an IIR filter, a PID regulator, L1 optimization, and L2 optimization.
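Of the smoothing options listed in claim 12, Gaussian filtering of a motion trajectory is the simplest to illustrate; the `sigma` value and the edge padding below are assumptions not specified by the claim:

```python
import numpy as np

def gaussian_smooth(path, sigma=2.0):
    """Smooth a 1-D camera motion trajectory with a Gaussian kernel."""
    radius = int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t**2 / (2 * sigma**2))
    kernel /= kernel.sum()  # normalize so constant paths are preserved
    # Edge-pad so the smoothed path keeps its original length.
    padded = np.pad(np.asarray(path, dtype=float), radius, mode='edge')
    return np.convolve(padded, kernel, mode='valid')
```

The stabilizing warp is then derived from the difference between the raw and smoothed trajectories; IIR or PID-style smoothing would replace the kernel with a recursive filter but leave this overall shape unchanged.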
13. A terminal shooting method is characterized by comprising the following steps:
the terminal collects an image to be processed;
processing the image to be processed by the method of any one of claims 1 to 12 to obtain an image after image stabilization processing.
14. A machine-readable medium having stored thereon instructions which, when executed on a machine, cause the machine to perform the method of any one of claims 1 to 13.
15. A system, comprising:
a memory for storing instructions for execution by one or more processors of the system; and
A processor, being one of the processors of the system, for performing the method of any one of claims 1 to 13.
CN201911413482.2A 2019-12-31 2019-12-31 Image stabilization processing method, terminal shooting method, medium and system Active CN113132612B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911413482.2A CN113132612B (en) 2019-12-31 2019-12-31 Image stabilization processing method, terminal shooting method, medium and system

Publications (2)

Publication Number Publication Date
CN113132612A CN113132612A (en) 2021-07-16
CN113132612B true CN113132612B (en) 2022-08-09

Family

ID=76770351

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911413482.2A Active CN113132612B (en) 2019-12-31 2019-12-31 Image stabilization processing method, terminal shooting method, medium and system

Country Status (1)

Country Link
CN (1) CN113132612B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102021126658B8 (en) 2021-10-14 2023-01-05 Carl Zeiss Meditec Ag Medical visualization system and method for video stabilization in such a system
CN114972008A (en) * 2021-11-04 2022-08-30 华为技术有限公司 Coordinate restoration method and device and related equipment
CN114024991B (en) * 2022-01-07 2022-04-26 深圳比特微电子科技有限公司 Data acquisition equipment, data acquisition system and electronic image stabilization equipment
JP7509182B2 (en) 2022-10-25 2024-07-02 セイコーエプソン株式会社 Parameter determination method, information processing device, and program
CN116797497B (en) * 2023-08-24 2023-11-14 摩尔线程智能科技(北京)有限责任公司 Image processing method, device, equipment and storage medium
CN117440248B (en) * 2023-12-21 2024-05-03 西安松果电子科技有限公司 Method and system for realizing target servo intelligent control based on axial image stabilization technology

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102685371A (en) * 2012-05-22 2012-09-19 大连理工大学 Digital video image stabilization method based on multi-resolution block matching and PI (Portion Integration) control

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101668114A (en) * 2008-09-05 2010-03-10 华硕电脑股份有限公司 Method and device for stabilizing image and image transferring and receiving method using same
US8823813B2 (en) * 2011-06-06 2014-09-02 Apple Inc. Correcting rolling shutter using image stabilization
CN106954024B (en) * 2017-03-28 2020-11-06 成都通甲优博科技有限责任公司 Unmanned aerial vehicle and electronic image stabilizing method and system thereof
CN106993134B (en) * 2017-03-31 2020-01-07 努比亚技术有限公司 Image generation device and method and terminal
CN110610465B (en) * 2019-08-26 2022-05-17 Oppo广东移动通信有限公司 Image correction method and device, electronic equipment and computer readable storage medium
CN110475067B (en) * 2019-08-26 2022-01-18 Oppo广东移动通信有限公司 Image processing method and device, electronic equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN113132612A (en) 2021-07-16

Similar Documents

Publication Publication Date Title
CN113132612B (en) Image stabilization processing method, terminal shooting method, medium and system
CN109194876B (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN109788189B (en) Five-dimensional video stabilization device and method for fusing camera and gyroscope
US8928730B2 (en) Method and system for correcting a distorted input image
CN113436113B (en) Anti-shake image processing method, device, electronic equipment and storage medium
WO2017020150A1 (en) Image processing method, device and camera
CN110784651B (en) Anti-shake method and electronic equipment
EP3296952B1 (en) Method and device for blurring a virtual object in a video
CN113454982A (en) Electronic device for stabilizing image and operation method thereof
KR20150140812A (en) Motion blur-free capture of low light high dynamic range images
CN107948505B (en) Panoramic shooting method and mobile terminal
CN111800589B (en) Image processing method, device and system and robot
JP2017017689A (en) Imaging system and program of entire-celestial-sphere moving image
CN113556464B (en) Shooting method and device and electronic equipment
CN115701125B (en) Image anti-shake method and electronic equipment
CN111614867B (en) Video denoising method and device, mobile terminal and storage medium
CN111862150B (en) Image tracking method, device, AR equipment and computer equipment
CN113542600B (en) Image generation method, device, chip, terminal and storage medium
WO2023005355A1 (en) Image anti-shake method and electronic device
CN114915739A (en) Image signal processor, electronic device, and image stabilization method
CN113610865B (en) Image processing method, device, electronic equipment and computer readable storage medium
US11128814B2 (en) Image processing apparatus, image capturing apparatus, video reproducing system, method and program
US20230216999A1 (en) Systems and methods for image reprojection
WO2023124202A1 (en) Image processing method and electronic device
US20220174217A1 (en) Image processing method and device, electronic device, and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant