CN109285122B - Method and equipment for processing image - Google Patents

Method and equipment for processing image

Info

Publication number
CN109285122B
Authority
CN
China
Prior art keywords
current frame
determining
frame
region
target
Prior art date
Legal status
Active
Application number
CN201710597224.9A
Other languages
Chinese (zh)
Other versions
CN109285122A (en)
Inventor
蒋静远
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201710597224.9A priority Critical patent/CN109285122B/en
Publication of CN109285122A publication Critical patent/CN109285122A/en
Application granted granted Critical
Publication of CN109285122B publication Critical patent/CN109285122B/en

Classifications

    • G06T5/80
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence

Abstract

The embodiments of the present application relate to a method and a device for processing an image, which are used to solve the problem in the prior art that jitter elimination is disturbed by objects at different depths of field in a video, reducing the jitter elimination effect. A target feature point is selected from at least one pair of adjacent image feature points according to the depth-of-field difference values of the at least one pair of adjacent image feature points; position deviation information of the current frame is determined according to the position of the selected target feature point in the current frame and its position in a reference frame; and the current frame is corrected according to the position deviation information of the current frame.

Description

Method and equipment for processing image
Technical Field
The present application relates to the field of video processing technologies, and in particular, to a method and an apparatus for image processing.
Background
With the popularization of handheld devices equipped with cameras, more and more users shoot videos with handheld devices. For example, in a trading scenario, Digital Interactive Visual Augmentation (DIVA) is a technology by which sellers shoot videos of their goods with handheld devices and upload them to the Internet for buyers to watch.
At present, when a user shoots video with a handheld device, the device is usually not professional equipment (for example a mobile phone or a home camera) and the user is usually not a professional camera operator, so the captured video often contains shaky frames. The shake makes the photographed goods jitter irregularly up and down within the display area during video playback, which greatly degrades the playback quality. Moreover, limited by existing software and hardware, the original video goes through a series of technical steps such as frame extraction and encoding, which further amplifies the jitter visible at the playback end.
One conventional method for removing jitter extracts feature points from the image, estimates the motion trajectory of the lens relative to the photographed object from those feature points, and removes the jitter component from the trajectory, so that a stable video image can be output.
However, in this approach the feature points are extracted over the whole video frame. Because the extracted feature points are often distributed over multiple objects at different depths of field, and lens shake affects the imaging of objects at different depths of field differently, the estimation of the lens motion trajectory is disturbed, which degrades the quality of the final video.
In addition, the jitter-eliminated image may have undergone correction such as translation and rotation, so regions that do not exist in the original video can appear in the jitter-eliminated video. Such a region is an invalid black screen area during display, which greatly degrades the display effect of the jitter-eliminated image.
In summary, current jitter elimination methods are disturbed by objects at different depths of field in the video, which reduces the jitter elimination effect.
Disclosure of Invention
The present application provides a method and a device for processing an image, to solve the problem in the prior art that jitter elimination is disturbed by objects at different depths of field in a video, which reduces the jitter elimination effect.
The embodiment of the application provides a method for processing an image, which comprises the following steps:
determining at least one pair of adjacent image feature points of a current frame in a video and a depth difference value thereof;
selecting a target feature point from at least one pair of adjacent image feature points according to the depth difference value of the at least one pair of adjacent image feature points;
determining the position deviation information of the current frame according to the position of the selected target feature point in the current frame and the position of the selected target feature point in a reference frame;
and correcting the current frame according to the position deviation information of the current frame.
A target feature point is selected from at least one pair of adjacent image feature points according to the depth-of-field difference values of the at least one pair of adjacent image feature points; position deviation information of the current frame is determined according to the position of the selected target feature point in the current frame and its position in a reference frame; and the current frame is corrected according to the position deviation information of the current frame. In the embodiments of the present application, target image feature points can be selected according to the depth-of-field difference values of at least one pair of adjacent image feature points, the position deviation information of the current frame can be determined from the positions of the selected target image feature points in the current frame and in the reference frame, and the current frame can be corrected according to that position deviation information; this reduces the interference caused by objects at different depths of field in the video, thereby improving the jitter elimination effect and the quality of the final video.
An embodiment of the present application provides an apparatus for performing image processing, including:
the depth-of-field determining module is used for determining at least one pair of adjacent image feature points of the current frame in the video and the depth-of-field difference value of the image feature points;
the feature point selection module is used for selecting a target feature point from at least one pair of adjacent image feature points according to the depth difference value of the determined at least one pair of adjacent image feature points;
the information determining module is used for determining the position deviation information of the current frame according to the position of the selected target feature point in the current frame and the position of the selected target feature point in a reference frame;
and the correction module is used for correcting the current frame according to the first affine transformation matrix of the current frame and the stabilized video motion matrix.
The present application also provides an image processing method, which is used to solve the problem in the prior art that the jitter-eliminated image contains an invalid black screen area, which greatly affects its display effect. The method comprises:
Determining an invalid content overlapping area according to the invalid content area in the current frame after correction processing;
determining a target sub-region from a region except the invalid overlapping region, wherein the target sub-region is a sub-region with the largest area in all sub-regions with the aspect ratio which is the same as that of the current frame in the region except the invalid overlapping region;
and cutting the current frame after the correction treatment according to the target subarea.
The present application also provides an apparatus for performing image processing, the apparatus comprising:
the overlapping area determining module is used for determining an invalid content overlapping area according to the invalid content area in the current frame after the correction processing;
a sub-region determining module, configured to determine a target sub-region from a region other than the invalid overlapping region, where the target sub-region is a sub-region with a largest area in all sub-regions with an aspect ratio that is the same as that of the current frame in the region other than the invalid overlapping region;
and the cutting module is used for cutting the current frame after the correction processing according to the target sub-region.
A target sub-region is determined from the region other than the invalid overlapping region; the target sub-region is the sub-region with the largest area among all sub-regions, within the region other than the invalid overlapping region, whose aspect ratio is the same as the aspect ratio of the current frame; and the corrected current frame is cropped according to the target sub-region. Because the target sub-region determined in the embodiments of the present application is the largest such sub-region outside the invalid overlapping region, the invalid black screen area in the jitter-eliminated image can be cropped away, avoiding its influence on the display effect of the jitter-eliminated image.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings required to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the description below are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings may be obtained according to these drawings without inventive labor.
FIG. 1 is a schematic flow chart illustrating a method for image processing according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart of a method for grouping target feature points according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating a method for determining invalid overlap regions according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram illustrating the trimming performed according to the embodiment of the present application;
FIG. 5 is a flowchart illustrating a method for determining a target sub-region according to an embodiment of the present application;
FIG. 6 is a diagram illustrating a complete process of processing an image according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an apparatus for processing an image according to an embodiment of the present disclosure;
FIG. 8 is a flowchart illustrating a method for cropping an image according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of an apparatus for cropping an image according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application clearer, the present application will be described in further detail with reference to the accompanying drawings, and it should be understood that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
As shown in fig. 1, a method for performing image processing according to an embodiment of the present application includes:
step 100, determining at least one pair of adjacent image feature points and depth difference values thereof of a current frame in a video;
101, selecting a target feature point from at least one pair of adjacent image feature points according to the depth difference value of the determined at least one pair of adjacent image feature points;
step 102, determining position deviation information of the current frame according to the position of the selected target feature point in the current frame and the position of the selected target feature point in a reference frame;
and 103, correcting the current frame according to the position deviation information of the current frame.
A target feature point is selected from at least one pair of adjacent image feature points according to the depth-of-field difference values of the at least one pair of adjacent image feature points; position deviation information of the current frame is determined according to the position of the selected target feature point in the current frame and its position in a reference frame; and the current frame is corrected according to the position deviation information of the current frame. In the embodiments of the present application, target image feature points can be selected according to the depth-of-field difference values of at least one pair of adjacent image feature points, the position deviation information of the current frame can be determined from the positions of the selected target image feature points in the current frame and in the reference frame, and the current frame can be corrected according to that position deviation information; this reduces the interference caused by objects at different depths of field in the video, thereby improving the jitter elimination effect and the quality of the final video.
The current frame in the embodiment of the present application is a video frame undergoing a correction process.
In implementation, all video frames in the video may be subjected to correction processing; alternatively, the first frame or the last frame may be excluded and the remaining video frames corrected.
Optionally, when determining at least one pair of adjacent image feature points of a current frame in a video:
determining at least two image characteristic points of a current frame according to corner point information of the current frame in a video;
at least one pair of adjacent image feature points is determined from the at least two image feature points.
The image feature points are points which have clear characteristics in the image and can effectively reflect the essential features of the image and identify target objects in the image. There are many ways to determine image feature points in a video frame, for example, Harris corner information of the video frame may be extracted, and the image feature points may be determined according to the Harris corner information.
Optionally, in order to prevent the extracted corner information from being dense at a rich texture position, the minimum interval between corners may be set to 1/N of the image width, where N may be set as required, for example, 50.
After determining the image feature points of the current frame, the image feature points may be paired, that is, adjacent image feature points are used as a pair, and if one image feature point is adjacent to a plurality of image feature points, one image feature point may be paired with the plurality of image feature points.
For example, if the image feature point a is adjacent to the image feature point b, and the image feature point b is adjacent to the image feature point c, the image feature point b and the image feature point a form a pair, and the image feature point b and the image feature point c form a pair.
When determining which image feature points need to form a pair, triangulation (using a Delaunay triangulation algorithm, that is, two feature points connected by each edge after triangulation are both used as adjacent image feature points) can be performed on the image feature points of the current frame, and the adjacent relationship between the image feature points is established.
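For illustration only, the feature-point detection and Delaunay-based adjacency described above can be sketched as follows (a minimal Python sketch assuming OpenCV and SciPy are available; cv2.goodFeaturesToTrack, scipy.spatial.Delaunay, and the parameter values are assumptions of this example, not requirements of the embodiment):

```python
# Minimal sketch, not an authoritative implementation of the embodiment.
import cv2
import numpy as np
from scipy.spatial import Delaunay

def adjacent_feature_pairs(gray_frame, n=50):
    """Detect corner-based feature points and pair the Delaunay-adjacent ones."""
    h, w = gray_frame.shape
    # Corner detection with a minimum spacing of 1/N of the image width.
    pts = cv2.goodFeaturesToTrack(gray_frame, maxCorners=500, qualityLevel=0.01,
                                  minDistance=w / n, useHarrisDetector=True)
    pts = pts.reshape(-1, 2)
    pairs = set()
    # Every edge of the triangulation links two adjacent image feature points.
    for t in Delaunay(pts).simplices:
        for a, b in ((t[0], t[1]), (t[1], t[2]), (t[0], t[2])):
            pairs.add((min(a, b), max(a, b)))
    return pts, sorted(pairs)
```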
Optionally, when determining the depth-of-field difference value of at least one pair of adjacent image feature points of the current frame in the video, the depth-of-field difference value of the at least one pair of adjacent image feature points is determined according to the coordinates of the image feature points in the x-axis and y-axis directions.
In practice, there are many ways to determine the depth-of-field difference value of each pair of adjacent image feature points; one of them is as follows:
Suppose there are two adjacent points p1 and p2 in a frame, corresponding to two tracked points p3 and p4 in the adjacent frame. Compute whether the difference value between p1 and p3 is similar to the difference value between p2 and p4, for example whether their difference falls within a set range; if they are similar, the two points p1 and p2 are considered to be at the same depth of field.
The difference value between the two positions of a point can be defined by Equation 1:

(a, b) = (|x_c - x_(c+1)|, |y_c - y_(c+1)|)    (Equation 1)

Equation 1: for an image feature point point_c and the corresponding image feature point point_(c+1) tracked in the next frame, the absolute values of the coordinate differences of the two points in the x-axis direction and the y-axis direction are computed to obtain the difference value of point_c and point_(c+1); a and b are the magnitudes of change on the two axes.
After the difference values of the adjacent points are obtained through Equation 1, the depth-of-field difference value of the two adjacent feature points can be obtained through Equation 2:

(Δa, Δb) = (|a - a'|, |b - b'|)    (Equation 2)

Equation 2: for adjacent image feature points point_c and point'_c, the difference value of each point between the two frames is computed with Equation 1, giving (a, b) for point_c and (a', b') for point'_c; the differences between the two difference values in a and in b are then taken as the depth-of-field difference values of the two feature points.
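As an illustration, Equations 1 and 2 can be written directly in code (a minimal Python sketch; it assumes the position of each feature point in the adjacent frame has already been obtained by tracking):

```python
# Sketch of Equations 1 and 2; points are (x, y) coordinates.
def diff_value(pt, pt_next):
    """Equation 1: per-axis absolute displacement (a, b) of one feature point."""
    a = abs(pt[0] - pt_next[0])
    b = abs(pt[1] - pt_next[1])
    return a, b

def depth_difference(pt1, pt1_next, pt2, pt2_next):
    """Equation 2: compare the displacements of two adjacent feature points."""
    a1, b1 = diff_value(pt1, pt1_next)
    a2, b2 = diff_value(pt2, pt2_next)
    return abs(a1 - a2), abs(b1 - b2)
```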
Optionally, when the target feature point is selected from the at least one pair of adjacent image feature points according to the determined depth difference value of the at least one pair of adjacent image feature points:
selecting at least one pair of adjacent target image feature points with matched depth of field difference values from the at least one pair of adjacent image feature points;
grouping the selected target image feature points according to the depth of field information of the target image feature points to obtain a feature point group;
and selecting a characteristic point group meeting the screening condition from the obtained characteristic point groups, and selecting the target image characteristic point from the selected characteristic point group.
When at least one pair of adjacent target image feature points whose depth-of-field difference values match is selected from the at least one pair of adjacent image feature points, the matching criterion may be set as needed. For example, a pair of image feature points whose depth-of-field difference is smaller than a set threshold in both dimensions (i.e., the X axis and the Y axis) may be regarded as a matched pair (i.e., a pair of image feature points at a similar depth of field). The threshold may be determined according to the specific application scenario and may be set to 1, for example.
After at least one pair of matched target image feature points is obtained, the target feature points can be grouped according to the depth information of the target image feature points.
For example, the target feature points may be grouped by using a breadth first search algorithm. See fig. 2 in particular.
As shown in fig. 2, the method for grouping target feature points in the embodiment of the present application includes:
and 200, forming a target image feature point set by the matched at least one pair of target image feature points.
Step 201, selecting an ungrouped target image feature point from the target image feature point set.
And 202, forming a new characteristic point group by the selected target image characteristic points which are not grouped.
And 203, accessing the target image feature points which are adjacent to the selected target image feature points which are not grouped and have depth-of-field difference values smaller than the set value into a new feature point group.
Step 204, judging whether the target image feature point set has non-grouped target image feature points, if yes, returning to step 201; otherwise, the process is skipped.
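For illustration, steps 200 to 204 can be sketched as a breadth-first search over the adjacency graph (a minimal Python sketch; `adjacency`, `depth_diff`, and the `threshold` value are assumptions of this example rather than terms defined by the embodiment):

```python
# Illustrative breadth-first grouping of matched target image feature points.
from collections import deque

def group_feature_points(points, adjacency, depth_diff, threshold=1.0):
    """adjacency[p] lists the neighbours of point p; depth_diff(p, q) returns the
    depth-of-field difference of the pair (both assumed inputs of this sketch)."""
    ungrouped = set(points)
    groups = []
    while ungrouped:
        seed = ungrouped.pop()                    # step 201: pick an ungrouped point
        group, queue = [seed], deque([seed])      # step 202: start a new group
        while queue:                              # step 203: absorb similar neighbours
            p = queue.popleft()
            for q in adjacency[p]:
                if q in ungrouped and depth_diff(p, q) < threshold:
                    ungrouped.remove(q)
                    group.append(q)
                    queue.append(q)
        groups.append(group)                      # step 204: repeat until nothing is left
    return groups
```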
After the grouping is completed, a feature point group meeting the screening condition is selected from all feature point groups.
Optionally, the screening conditions herein include, but are not limited to, some or all of the following conditions:
and (3) screening the condition 1, wherein the number of the characteristic points in the characteristic point group is greater than a preset characteristic point threshold value.
The feature point threshold value here may be set according to an application scenario, a requirement, and the like.
Here, if there are a plurality of feature point groups whose number is greater than the preset feature point threshold, one of the feature point groups may be randomly selected.
Screening condition 2: if the current frame is a set frame of the video, select the feature point group with the largest number of target image feature points.
Here, if there are a plurality of feature point groups with the largest number, one of the feature point groups can be randomly selected.
In implementation, the screening condition 2 can be used when the current frame is a setting frame of the video, so that a better effect is achieved.
The setting frame here may be the first frame or the last frame.
If the set frame is the first frame, the adjacent frame of the current frame is located before the current frame;
if the set frame is the last frame, the adjacent frame of the current frame is located after the current frame.
Screening condition 3: select the feature point group with the largest relative number of specified target image feature points, where the specified target image feature points are the target image feature points located inside the convex polygon with the smallest area among the convex polygons containing the image feature points of the adjacent frame.
The relative number here is the number of target image feature points within the convex polygon divided by the number of target image feature points in the set of feature points.
In implementation, the filtering condition 3 may be used when the current frame is not the setting frame of the video, so as to achieve a better effect.
The setting frame here may be the first frame or the last frame.
If the set frame is the first frame, the adjacent frame of the current frame is located before the current frame;
and if the set frame is the last frame, the adjacent frame of the current frame is located after the current frame.
For the filtering condition 3, it is necessary to determine the position of each target feature point of the current frame in the adjacent frame. The adjacent frame may be located before the current frame or located after the current frame.
If the adjacent frame is located in front of the current frame, the first frame may not be corrected;
if the adjacent frame is located behind the current frame, the last frame may not be corrected.
In determining the position of each target feature point of the current frame in the adjacent frame, the position of each target feature point of the current frame in the adjacent frame may be determined using the Lucas-Kanade sparse optical flow algorithm.
After the position of each target feature point of the current frame in the adjacent frame is determined, a convex polygon composed of image feature points of the adjacent frame can be determined. Here, each image feature point of the adjacent frame is a point at which the position of each feature point of the current frame in the adjacent frame is located.
After obtaining a plurality of convex polygons, the convex polygon with the smallest area can be selected. Then, that convex polygon is mapped to the corresponding position of the current frame, the number of image feature points of each feature point group that fall inside the convex polygon is determined, and the feature point group with the largest relative number (for example, 3/10 > 25/100) is selected.
Here, if there are a plurality of feature point groups with the largest number, one of the feature point groups can be randomly selected.
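For screening condition 3, the relative number of a feature point group can be computed as sketched below (a minimal Python sketch; cv2.pointPolygonTest is an assumed choice of point-in-polygon test, and `positions` / `polygon` are assumed inputs: the point coordinates in the current frame and the smallest-area convex polygon mapped back to the current frame):

```python
# Illustrative scoring of feature point groups by relative number.
import cv2
import numpy as np

def best_group_by_relative_number(groups, polygon, positions):
    """Pick the group with the largest fraction of its points inside `polygon`."""
    poly = np.asarray(polygon, dtype=np.float32).reshape(-1, 1, 2)

    def relative_number(group):
        inside = sum(
            cv2.pointPolygonTest(poly, (float(positions[p][0]), float(positions[p][1])), False) >= 0
            for p in group)
        return inside / len(group)

    return max(groups, key=relative_number)
```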
Optionally, when determining the position deviation information of the current frame according to the position of the selected target feature point in the current frame and the position of the selected target feature point in the reference frame:
determining relative position deviation information of the current frame according to the position of the selected target image feature point in the current frame and the position of the selected target image feature point in an adjacent frame of the current frame;
and determining the position deviation information of the current frame according to the relative position deviation information of the current frame and the relative position deviation information of at least one other frame between the current frame and the reference frame.
In implementation, here, the positional deviation information of the current frame may be represented by a first affine transformation matrix, and the relative positional deviation information of the current frame may be represented by a second affine transformation matrix.
Specifically, the specific method for determining the second affine transformation matrix of the current frame includes:
It is known that the affine transformation matrix M mapping a point (x, y) to a point (x', y') has the following form:

M = | e  -b  c |
    | b   e  d |

where e = cos(a), b = sin(a), c = x, d = y; a is the rotation angle, and x and y are the translation amplitudes on the x-axis and the y-axis, so that

(x', y') = (e·x - b·y + c, b·x + e·y + d).

If the point tracked from this feature point in the adjacent frame is (x'', y''), the transformation error is defined as:

(|x'' - x'| + |y'' - y'|)^2
the second affine transformation matrix M is estimated such that the mean value of the transformation errors is minimized for all image feature points. The optimization problem can use a least square method, a random gradient descent method and other optimization methods.
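As an illustration, the per-frame (second) affine transformation can be estimated from the selected target feature points as sketched below (a minimal Python sketch; cv2.estimateAffinePartial2D is an assumed stand-in for the least-squares optimization described above, not the specific optimizer of the embodiment):

```python
# Minimal sketch of estimating the second affine transformation matrix.
import cv2
import numpy as np

def second_affine_motion(pts_current, pts_adjacent):
    """Return (dx, dy, da): translation amplitudes and rotation angle between frames."""
    src = np.asarray(pts_current, dtype=np.float32)
    dst = np.asarray(pts_adjacent, dtype=np.float32)
    m, _ = cv2.estimateAffinePartial2D(src, dst, method=cv2.LMEDS)
    if m is None:
        return 0.0, 0.0, 0.0            # no reliable estimate; assume no motion
    dx, dy = m[0, 2], m[1, 2]
    da = np.arctan2(m[1, 0], m[0, 0])   # rotation angle of the estimated transform
    return dx, dy, da
```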
Wherein, when determining the position deviation information of the current frame according to the relative position deviation information of the current frame and the relative position deviation information of at least one other frame between the current frame and the reference frame:
and determining a first affine transformation matrix of the current frame relative to a reference frame in the video according to the second affine transformation matrix of the current frame and second affine transformation matrices of other current frames between the current frame and the reference frame.
In implementation, the second affine transformation matrix of the current frame may be determined by minimum-quadratic-error optimization:

M_2 = argmin over (dx, dy, da) of Σ (|x''_i - x'_i| + |y''_i - y'_i|)^2

where

M_2 = | cos(da)  -sin(da)  dx |
      | sin(da)   cos(da)  dy |

is the second affine transformation matrix, dx and dy are the translation amplitudes of the image on the horizontal and vertical axes, and da is the rotation angle of the image around its center.
After the second affine transformation matrix of the current frame is obtained, the second affine transformation matrices of the current frame and all frames between the current frame and the set frame are added to obtain a first affine transformation matrix of the current frame relative to a reference frame in the video.
The setting frame here may be the first frame or the last frame.
If the set frame is the first frame, the adjacent frame of the current frame is located before the current frame, which is equivalent to eliminating jitter while traversing the whole video in forward order.
If the set frame is the last frame, the adjacent frame of the current frame is located after the current frame, which is equivalent to eliminating jitter while traversing the whole video in reverse order.
The first affine transformation matrix can specifically be written as:

M_1 = | cos(a)  -sin(a)  x |
      | sin(a)   cos(a)  y |

where a is the rotation angle, x is the translation coordinate on the horizontal axis, and y is the translation coordinate on the vertical axis.
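For illustration, accumulating the per-frame motions into the first affine transformation matrix relative to the set frame can be sketched as follows (a minimal Python sketch under the parameterization above; the helper names are hypothetical):

```python
# Sketch: accumulate per-frame (dx, dy, da) into the trajectory (x, y, a) per frame.
import numpy as np

def accumulate_trajectory(per_frame_motion):
    """per_frame_motion: sequence of (dx, dy, da); returns cumulative (x, y, a) per frame."""
    return np.cumsum(np.asarray(per_frame_motion, dtype=np.float64), axis=0)

def first_affine_matrix(x, y, a):
    """Build the 2x3 matrix [[cos a, -sin a, x], [sin a, cos a, y]] for one frame."""
    return np.array([[np.cos(a), -np.sin(a), x],
                     [np.sin(a),  np.cos(a), y]])
```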
Optionally, when the current frame is corrected according to the position deviation information of the current frame:
determining the position of a video window corresponding to the current frame;
determining the position deviation information of the stabilized current frame according to the position deviation information of the current frame and the position deviation information of at least one other frame in the position of the video window;
and correcting the current frame according to the position deviation information after the current frame is determined to be stable.
The stabilized position deviation information can be represented by a stabilized video motion matrix.
Specifically, after the first affine transformation matrix of each current frame is obtained, low-pass filtering is performed on the first affine transformation matrix of the current frame, so as to obtain a video motion matrix after the current frame is stabilized.
Optionally, for any current frame, determining a position of a video window corresponding to the current frame;
and determining the stabilized video motion matrix of the current frame according to the first affine transformation matrix of the current frame and the first affine transformation matrices in other current frames in the video window position.
Optionally, the video window in the embodiment of the present application is a symmetric video window, that is, the current frame is located in the center of the corresponding video window.
Optionally, other current frames in the video window are located at two sides of the center and symmetrically distributed.
For example, the center of the video window position corresponding to the current frame a is the current frame a, and other current frames are symmetrically distributed on both sides.
In an implementation, the length of the video window corresponding to each current frame is related to the position of the current frame in the video.
In order to make the length of the video window as large as possible, if the at least one other frame comprises a plurality of frames, the number of other frames on each side of the center is the minimum of the number of frames between the current frame and the first frame of the video and the number of frames between the current frame and the last frame of the video.
That is, the current frame is located at the center of its video window, and the number of other frames on each side of the symmetric distribution is the minimum of the number of frames from the current frame to the first frame and the number of frames from the current frame to the last frame.
For example, the current video has 5 frames in total, the length of the video window corresponding to the 1 st frame is 1, and only the 1 st frame is included; the length of a video window corresponding to the 2 nd frame is 3, and the video window comprises the 1 st frame, the 2 nd frame and the 3 rd frame; the length of a video window corresponding to the 3 rd frame is 5, and the video window comprises the 1 st frame, the 2 nd frame, the 3 rd frame, the 4 th frame and the 5 th frame; the length of a video window corresponding to the 4 th frame is 3, and the video window comprises the 3 rd frame, the 4 th frame and the 5 th frame; the length of the video window corresponding to the 5 th frame is 1, and only the 5 th frame is included.
Therefore, the length of the video window corresponding to each current frame can be ensured to be maximum.
After the position of the video window is determined, averaging the first affine transformation matrix of the current frame and the first affine transformation matrices in other current frames in the position of the video window to obtain a video motion matrix after the current frame is stabilized.
For example, 5 frames are counted in the current video, the length of the video window corresponding to the 2 nd frame is 3, and the video window includes the 1 st frame, the 2 nd frame and the 3 rd frame, and then the video motion matrix after the 2 nd frame is stabilized is obtained by averaging the first affine transformation matrices of the 1 st frame, the 2 nd frame and the 3 rd frame.
See in particular the following formula:
(x, y, a)_stable = (1 / window_size) · Σ_(k ∈ window) (x_k, y_k, a_k)

where (x, y, a)_stable represents the stabilized video motion matrix and window_size represents the number of frames in the video window.
In practice, to ensure the quality of the average value found, the first M frames at the beginning and the last M frames at the end of the video may be discarded. M may be set as desired, such as 5.
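For illustration, the symmetric-window averaging described above can be sketched as follows (a minimal Python sketch; it keeps every frame and omits the optional discarding of the first and last M frames):

```python
# Sketch of the symmetric-window low-pass filter over the accumulated trajectory.
import numpy as np

def smooth_trajectory(trajectory):
    """trajectory: array of shape (n_frames, 3) holding the cumulative (x, y, a) per frame."""
    trajectory = np.asarray(trajectory, dtype=np.float64)
    n = len(trajectory)
    smoothed = np.empty_like(trajectory)
    for i in range(n):
        # Half-window: the smaller of the distances to the first and last frame,
        # so the window stays symmetric around frame i.
        half = min(i, n - 1 - i)
        smoothed[i] = trajectory[i - half:i + half + 1].mean(axis=0)
    return smoothed
```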
In implementation, a matrix difference between the first affine transformation matrix of the current frame and the stabilized video motion matrix may be determined, and the current frame may be subjected to a correction process according to the matrix difference.
See in particular the following formula:
(x, y, a)_adjust = (x, y, a) - (x, y, a)_stable

where (x, y, a)_adjust is the matrix difference between the first affine transformation matrix of the current frame and the stabilized video motion matrix.
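For illustration, applying the adjustment (x, y, a)_adjust to the current frame can be sketched as follows (a minimal Python sketch; cv2.warpAffine is an assumed choice of warping routine):

```python
# Sketch: correct the current frame by the adjustment (x, y, a)_adjust.
import cv2
import numpy as np

def correct_frame(frame, adjust):
    dx, dy, da = adjust            # difference between the trajectory and its smoothed version
    h, w = frame.shape[:2]
    m = np.array([[np.cos(da), -np.sin(da), dx],
                  [np.sin(da),  np.cos(da), dy]], dtype=np.float32)
    # Pixels pulled from outside the original frame become the invalid (black) region.
    return cv2.warpAffine(frame, m, (w, h))
```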
In order to improve the quality of an output video, after the current frame is corrected, the embodiment of the application further provides a scheme for clipping the corrected current frame.
Specifically, determining an invalid content overlapping area according to an invalid content area in the current frame after correction processing;
determining a target subregion from a region except the invalid overlapping region, wherein the target subregion is a subregion with the largest area in all subregions with the same aspect ratio as the aspect ratio of the current frame in the region except the invalid overlapping region;
and cutting the current frame after the correction according to the target subarea.
The invalid content overlapping area is an overlapping area of the invalid content areas of all the current frames, that is, the invalid content overlapping area includes the invalid content areas of all the current frames, and is an intersection of the invalid content areas of all the current frames.
The invalid content overlapping region may be represented by the following four matrices:

left = (y_0, y_1, y_2, ..., y_(m-1))
right = (y_0, y_1, y_2, ..., y_(m-1))
top = (x_0, x_1, x_2, ..., x_(n-1))
bottom = (x_0, x_1, x_2, ..., x_(n-1))

Taking the upper-left corner of the current frame as the origin of the coordinate system, in the coordinate plane formed by the pixels, the left matrix records, for each row, the right-most column coordinate of the invalid pixels on the left side; the right matrix records, for each row, the left-most column coordinate of the invalid pixels on the right side; the top matrix records, for each column, the lowest row coordinate of the invalid pixels at the top; and the bottom matrix records, for each column, the highest row coordinate of the invalid pixels at the bottom. The specific flow is shown in Fig. 3.
As shown in fig. 3, a method for determining an invalid overlap region according to an embodiment of the present application includes:
step 300, extracting a video frame from the video according to the sequence.
Traversing all rows in the current video frame, and executing the following steps:
step 301, judging whether the ith row left (i) column of the current frame is an invalid pixel, if so, executing step 302; otherwise step 303 is performed.
Where left (i) denotes the column number of the first active pixel to the left of the ith row.
Step 302, increase left (i), and execute step 303.
The step size may be increased one step at a time, for example, the step size may be set to 1.
Step 303, determining whether the ith right (i) column of the current frame is an invalid pixel, if yes, executing step 304; otherwise, step 305 is performed.
Wherein right (i) indicates the column number of the first effective pixel at the right of the ith row.
Step 304, reduce right (i), and execute step 305.
The step size can be reduced one step at a time, for example the step size can be set to 1.
Traversing all columns in the current video frame, and executing the following steps:
Step 305, judging whether the pixel in row bottom(j) of the jth column of the current frame is an invalid pixel; if so, executing step 306; otherwise, executing step 307.
Here bottom(j) indicates the row number of the lowest valid pixel in the jth column.
Step 306, reduce bottom (j), and execute step 307.
Each time, the step size can be decreased by one step size, for example, the step size can be set to 1.
Step 307, judging whether the pixel in row top(j) of the jth column of the current frame is an invalid pixel; if so, executing step 308; otherwise, executing step 309.
Here top(j) represents the row number of the uppermost valid pixel in the jth column.
Step 308, increase top (j), and execute step 309.
The step size may be increased one step at a time, for example, the step size may be set to 1.
Step 309, judging whether the current frame is the last frame, if yes, jumping out of the process, otherwise, returning to step 300.
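For illustration, the boundary update of Fig. 3 can be sketched as follows (a minimal Python sketch; it assumes invalid pixels are exactly zero, that left/top are initialized to 0 and right/bottom to the last column/row index before the first frame, and it advances each boundary as far as needed within one frame, which is a simplification of the per-frame single-step check described above):

```python
# Illustrative update of the invalid-content overlap boundaries for one corrected frame.
import numpy as np

def update_overlap_boundaries(frame, left, right, top, bottom):
    invalid = (frame == 0) if frame.ndim == 2 else np.all(frame == 0, axis=2)
    h, w = invalid.shape
    for i in range(h):                       # rows: push the left/right boundaries inwards
        while left[i] < w and invalid[i, left[i]]:
            left[i] += 1
        while right[i] >= 0 and invalid[i, right[i]]:
            right[i] -= 1
    for j in range(w):                       # columns: push the top/bottom boundaries inwards
        while top[j] < h and invalid[top[j], j]:
            top[j] += 1
        while bottom[j] >= 0 and invalid[bottom[j], j]:
            bottom[j] -= 1
    return left, right, top, bottom
```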
The aspect ratio of the target sub-region determined by the embodiment of the present application is the same as the aspect ratio of the current frame, and is the largest sub-region in the region except the invalid overlapping region.
Taking fig. 4 as an example, the area a in fig. 4 is an invalid overlap area; the B region is a region other than the invalid overlap region, and may be referred to as an effective region; the C region is the target sub-region.
The specific process of determining the target sub-region can be seen in fig. 5.
As shown in fig. 5, a method for determining a target sub-region in an embodiment of the present application includes:
step 500, traversing each column in the current frame of the invalid overlapping area in sequence.
Each traversal performs steps 501-508. The steps 501 to 504 and the steps 505 to 508 have no necessary timing relationship, and may be executed simultaneously or separately.
Step 501, enumerating a vertical edge parallel to the columns in the current frame, wherein the current column is the jth column, the endpoint row numbers of the vertical edge are top[j] and i, and top[j] < i <= bottom[j].
Step 502, for any vertical edge enumerated in step 501, determining that the length of the vertical edge is l = i - top[j] + 1 and the width of the sub-region is w = l × k, where k is the aspect ratio of the current frame.
Step 503, assuming that the vertical edge in step 501 is the left edge, if j + w - 1 is not greater than right[top[j]] and j + w - 1 is not greater than right[i], the sub-region is a candidate target sub-region, with upper-left corner coordinates (top[j], j) and lower-right corner coordinates (i, j + w - 1).
Step 504, assuming that the vertical edge in step 501 is the right edge, if j - w + 1 is not less than left[top[j]] and j - w + 1 is not less than left[i], the sub-region is a candidate target sub-region, with upper-left corner coordinates (top[j], j - w + 1) and lower-right corner coordinates (i, j).
Step 505, enumerating a vertical edge parallel to the columns in the current frame, wherein the current column is the jth column, the endpoint row numbers of the vertical edge are i and bottom[j], and top[j] < i < bottom[j].
Step 506, for any vertical edge enumerated in step 505, determining that the length of the vertical edge is l = bottom[j] - i + 1 and the width of the sub-region is w = l × k, where k is the aspect ratio of the current frame.
Step 507, assuming that the vertical edge in step 505 is the left edge, if j + w - 1 is not greater than right[bottom[j]] and j + w - 1 is not greater than right[i], the sub-region is a candidate target sub-region, with upper-left corner coordinates (i, j) and lower-right corner coordinates (bottom[j], j + w - 1).
Step 508, assuming that the vertical edge in step 505 is the right edge, if j - w + 1 is not less than left[bottom[j]] and j - w + 1 is not less than left[i], the sub-region is a candidate target sub-region, with upper-left corner coordinates (i, j - w + 1) and lower-right corner coordinates (bottom[j], j).
And 509, selecting the region with the largest area from all the candidate target sub-regions as a target sub-region.
Optionally, if there are a plurality of sub-regions with the largest area in all the sub-regions with the same aspect ratio as the current frame in the region except the invalid overlapping region, the sub-region closest to the center of the current frame is used as the target sub-region.
Here, the center of the sub-region may be determined first, and the sub-region with the shortest distance between the center of the sub-region and the center of the current frame may be used as the sub-region closest to the center of the current frame.
If there are a plurality of sub-regions having the shortest distance between the center of the sub-region and the center of the current frame, one of the sub-regions may be randomly selected.
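For illustration only, one of the enumeration cases above (steps 501 to 503: a left edge anchored at the top boundary) can be sketched as follows; the full method also enumerates the right-edge and bottom-anchored cases (steps 504 to 508), so this rough sketch should not be read as the complete search:

```python
# Rough, unoptimised sketch of steps 501-503 only (left edge anchored at the top boundary).
def best_candidate_left_edge(left, right, top, bottom, k):
    """left/right/top/bottom are the boundary arrays of the overlap region; k is the
    aspect ratio (width / height) of the current frame."""
    best, best_area = None, 0
    for j in range(len(top)):                       # candidate left edge in column j
        for i in range(top[j] + 1, bottom[j] + 1):  # candidate bottom row i (step 501)
            l = i - top[j] + 1                      # vertical edge length (step 502)
            w = int(l * k)                          # sub-region width (step 502)
            right_col = j + w - 1
            if right_col <= right[top[j]] and right_col <= right[i]:   # step 503
                if l * w > best_area:
                    best_area = l * w
                    best = ((top[j], j), (i, right_col))  # (upper-left, lower-right)
    return best
```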
Optionally, after performing clipping processing on the current frame after performing the correction processing according to the target sub-region, the method further includes:
and amplifying the current frame subjected to cutting according to the size of the current frame, and outputting the current frame.
Because the current frame is cut, the current frame after being cut can be amplified according to the original size of the current frame, and the size of the output current frame can be ensured to be unchanged.
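For illustration, cropping to the target sub-region and enlarging back to the original size can be sketched as follows (a minimal Python sketch; cv2.resize with bilinear interpolation is an assumed choice):

```python
# Sketch: crop the corrected frame to the target sub-region and restore the original size.
import cv2

def crop_and_restore(frame, top_left, bottom_right):
    (r0, c0), (r1, c1) = top_left, bottom_right   # row/column corners of the target sub-region
    h, w = frame.shape[:2]
    cropped = frame[r0:r1 + 1, c0:c1 + 1]
    return cv2.resize(cropped, (w, h), interpolation=cv2.INTER_LINEAR)
```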
From the above, it can be seen that the whole image-processing procedure in the embodiments of the present application comprises two parts: 1. correction processing; 2. cropping processing.
As shown in fig. 6, in the schematic diagram of the whole process of processing an image according to the embodiment of the present application, the rectification processing includes: determining characteristic points, determining adjacent characteristic points, grouping the characteristic points, selecting the characteristic points, determining a first affine transformation matrix, low-pass filtering and correcting a video;
the cutting treatment comprises the following steps: determining invalid content overlapping regions, determining target sub-regions, and performing cropping.
Based on the same inventive concept, the embodiment of the present application further provides an apparatus for processing an image, and as the principle of the apparatus for solving the problem is similar to the method for processing an image in the embodiment of the present application, the implementation of the apparatus may refer to the implementation of the method, and repeated details are not repeated.
As shown in fig. 7, an apparatus for processing an image according to an embodiment of the present application includes:
a depth-of-field determining module 700, configured to determine at least one pair of adjacent image feature points of a current frame in a video and a depth-of-field difference value thereof;
a feature point selection module 701, configured to select a target feature point from at least one pair of adjacent image feature points according to a depth difference value of the at least one pair of adjacent image feature points;
an information determining module 702, configured to determine, according to the position of the selected target feature point in the current frame and the position of the selected target feature point in a reference frame, position deviation information of the current frame;
and a correcting module 703, configured to perform correction processing on the current frame according to the first affine transformation matrix of the current frame and the stabilized video motion matrix.
Optionally, the depth of field determining module 700 is specifically configured to:
determining at least two image characteristic points of a current frame according to corner point information of the current frame in a video;
at least one pair of adjacent image feature points is determined from the at least two image feature points.
Optionally, the depth of field determining module 700 is specifically configured to:
determining the depth-of-field difference value of at least one pair of adjacent image feature points according to the coordinates of the image feature points in the x-axis and y-axis directions.
Optionally, the feature point selecting module 701 is specifically configured to:
selecting at least one pair of adjacent target image feature points with matched depth of field difference values from the at least one pair of adjacent image feature points;
grouping the selected target image feature points according to the depth-of-field information of the target image feature points to obtain a feature point group;
and selecting a characteristic point group meeting the screening condition from the obtained characteristic point groups, and selecting the target image characteristic point from the selected characteristic point group.
Optionally, the screening conditions include some or all of the following conditions:
the number of target image feature points in the feature point group is greater than a preset feature point threshold;
selecting a characteristic point group with the maximum number of characteristic points of the target image;
and selecting a characteristic point group with the maximum number of the appointed target image characteristic points, wherein the appointed target image characteristic points are positioned in the convex polygon with the minimum area in the convex polygons containing the image characteristic points of the adjacent frames.
Optionally, the information determining module 702 is specifically configured to:
determining relative position deviation information of the current frame according to the position of the selected target image characteristic point in the current frame and the position of the selected target image characteristic point in an adjacent frame of the current frame;
and determining the position deviation information of the current frame according to the relative position deviation information of the current frame and the relative position deviation information of at least one other frame between the current frame and the reference frame.
Optionally, the correcting module 703 is specifically configured to:
determining the position of a video window corresponding to the current frame;
determining the position deviation information of the stabilized current frame according to the position deviation information of the current frame and the position deviation information of at least one other frame in the position of the video window;
and correcting the current frame according to the position deviation information after the current frame is determined to be stable.
Optionally, the current frame is located in the center of the corresponding video window.
Optionally, if the at least one other frame comprises a plurality of frames, the number of other frames on each side of the center is the minimum of the number of frames between the current frame and the first frame of the video and the number of frames between the current frame and the last frame of the video.
Optionally, the apparatus further comprises:
an overlap region determining module 704, configured to determine an invalid content overlap region according to the invalid content region in the current frame after the correction processing;
a sub-region determining module 705, configured to determine a target sub-region from a region other than the invalid overlapping region, where the target sub-region is a sub-region with a largest area among all sub-regions with aspect ratios that are the same as the aspect ratio of the current frame in the region other than the invalid overlapping region;
and a clipping module 706, configured to perform clipping processing on the current frame after the correction processing according to the target sub-region.
Optionally, the sub-region determining module 705 is specifically configured to:
and if the area of the sub-area with the largest area in all the sub-areas with the same aspect ratio as the current frame in the area except the invalid overlapping area is multiple, taking the sub-area closest to the center of the current frame as a target sub-area.
Optionally, the clipping module 706 is further configured to:
and amplifying the current frame subjected to cutting according to the size of the current frame, and outputting the current frame.
As shown in fig. 8, a method for cropping an image according to an embodiment of the present application includes:
step 800, determining an invalid content overlapping area according to the invalid content area in the current frame after correction processing;
step 801, determining a target sub-region from a region except the invalid overlapping region, wherein the target sub-region is a sub-region with the largest area in all sub-regions with the aspect ratio same as that of the current frame in the region except the invalid overlapping region;
and step 802, cutting the current frame after the correction according to the target subarea.
A target sub-region is determined from the region other than the invalid overlapping region; the target sub-region is the sub-region with the largest area among all sub-regions, within the region other than the invalid overlapping region, whose aspect ratio is the same as the aspect ratio of the current frame; and the corrected current frame is cropped according to the target sub-region. Because the target sub-region determined in the embodiments of the present application is the largest such sub-region outside the invalid overlapping region, the invalid black screen area in the jitter-eliminated image can be cropped away, avoiding its influence on the display effect of the jitter-eliminated image.
Optionally, the determining a target sub-region from a region outside the invalid overlap region includes:
and if the area of the sub-area with the largest area in all the sub-areas with the same aspect ratio as the current frame in the area except the invalid overlapping area is multiple, taking the sub-area closest to the center of the current frame as a target sub-area.
The invalid content overlapping region is an overlapping region of the invalid content regions of all the current frames, that is, the invalid content overlapping region includes the invalid content regions of all the current frames, and is an intersection of the invalid content regions of all the current frames.
The invalid content overlapping region may be represented by the following four matrices:

left = (y_0, y_1, y_2, ..., y_(m-1))
right = (y_0, y_1, y_2, ..., y_(m-1))
top = (x_0, x_1, x_2, ..., x_(n-1))
bottom = (x_0, x_1, x_2, ..., x_(n-1))

Taking the upper-left corner of the current frame as the origin of the coordinate system, in the coordinate plane formed by the pixels, the left matrix records, for each row, the right-most column coordinate of the invalid pixels on the left side; the right matrix records, for each row, the left-most column coordinate of the invalid pixels on the right side; the top matrix records, for each column, the lowest row coordinate of the invalid pixels at the top; and the bottom matrix records, for each column, the highest row coordinate of the invalid pixels at the bottom. The specific flow is shown in Fig. 3.
The aspect ratio of the target sub-region determined by the embodiment of the present application is the same as the aspect ratio of the current frame, and is the largest sub-region in the region except the invalid overlapping region.
Taking fig. 4 as an example, the area a in fig. 4 is an invalid overlap area; the B region is a region other than the invalid overlap region, and may be referred to as an effective region; the C region is the target sub-region.
The specific process of determining the target sub-region can be seen in fig. 5.
Optionally, if there are a plurality of sub-regions with the largest area in all the sub-regions with the same aspect ratio as the current frame in the region except the invalid overlapping region, the sub-region closest to the center of the current frame is used as the target sub-region.
Here, the center of the sub-region may be determined first, and the sub-region with the shortest distance between the center of the sub-region and the center of the current frame may be used as the sub-region closest to the center of the current frame.
If there are a plurality of sub-regions having the shortest distance between the center of the sub-region and the center of the current frame, one of the sub-regions may be randomly selected.
Optionally, after performing clipping processing on the current frame after performing the correction processing according to the target sub-region, the method further includes:
and amplifying the current frame subjected to cutting according to the size of the current frame, and outputting the current frame.
Because the current frame is cut, the current frame after being cut can be amplified according to the original size of the current frame, and the size of the output current frame can be ensured to be unchanged.
Based on the same inventive concept, the embodiment of the present application further provides an apparatus for cropping an image, and as the principle of solving the problem of the apparatus is similar to the method for cropping an image in the embodiment of the present application, the implementation of the apparatus may refer to the implementation of the method, and repeated parts are not described again.
As shown in fig. 9, an apparatus for cropping an image according to an embodiment of the present application includes:
an overlap region determining module 900, configured to determine an invalid content overlap region according to the invalid content region in the current frame after the correction processing;
a sub-region determining module 901, configured to determine a target sub-region from a region other than the invalid overlapping region, where the target sub-region is a sub-region with a largest area in all sub-regions with the same aspect ratio as the current frame in the region other than the invalid overlapping region;
and a cropping module 902, configured to crop the corrected current frame according to the target sub-region.
Optionally, the sub-region determining module 901 is specifically configured to:
and if there are multiple sub-regions with the largest area among all the sub-regions that have the same aspect ratio as the current frame in the region other than the invalid overlapping region, taking the sub-region closest to the center of the current frame as the target sub-region.
Optionally, the cropping module 902 is further configured to:
and enlarging the cropped current frame according to the size of the current frame, and outputting the enlarged current frame.
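Purely as an illustration of how the three modules could be composed, the sketch below reuses the hypothetical helpers from the earlier sketches (invalid_overlap, pick_target_subregion, crop_and_restore); it is not the structure of the apparatus shown in FIG. 9:

```python
class ImageCropper:
    """Toy composition of the overlap-region, sub-region and cropping steps."""

    def process(self, frame, invalid_masks):
        overlap = invalid_overlap(invalid_masks)         # overlap region determining module
        region = pick_target_subregion(overlap)          # sub-region determining module
        if region is None:                               # no clean window with the right aspect ratio
            return frame
        top, left, h, w = region
        return crop_and_restore(frame, top, left, h, w)  # cropping (and enlarging) module
```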
The present application is described above with reference to block diagrams and/or flowchart illustrations of methods, apparatus (systems) and/or computer program products according to embodiments of the application. It will be understood that one block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, and/or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer and/or other programmable data processing apparatus, create means for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks.
Accordingly, the subject application may also be embodied in hardware and/or in software (including firmware, resident software, micro-code, etc.). Furthermore, the application may take the form of a computer program product on a computer-usable or computer-readable storage medium having computer-usable or computer-readable program code embodied in the medium for use by or in connection with an instruction execution system. In the context of this application, a computer-usable or computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (18)

1. A method of image processing, the method comprising:
determining at least one pair of adjacent image feature points of a current frame in a video and a depth difference value of the image feature points;
selecting at least one pair of adjacent target image feature points with matched depth of field difference values from the at least one pair of adjacent image feature points;
grouping the selected target image feature points according to the depth of field information of the target image feature points to obtain a feature point group;
selecting a feature point group which meets the screening condition from the obtained feature point group, and selecting a target image feature point from the selected feature point group;
determining the position deviation information of the current frame according to the position of the selected target feature point in the current frame and the position of the selected target feature point in a reference frame;
and correcting the current frame according to the position deviation information of the current frame.
2. The method of claim 1, wherein determining at least one pair of adjacent image feature points for a current frame in the video comprises:
determining at least two image characteristic points of a current frame according to corner point information of the current frame in a video;
at least one pair of adjacent image feature points is determined from the at least two image feature points.
3. The method of claim 1, wherein determining depth difference values for at least one pair of adjacent image feature points of a current frame in the video comprises:
and determining the depth difference value of at least one pair of adjacent image feature points according to the coordinates of the image feature points in the direction.
4. The method of claim 1, wherein the screening conditions comprise some or all of the following conditions:
the number of target image feature points in the feature point group is greater than a preset feature point threshold;
selecting the feature point group with the largest number of target image feature points;
and selecting the feature point group with the largest number of designated target image feature points, wherein the designated target image feature points are located in the convex polygon with the smallest area among the convex polygons containing the image feature points of the adjacent frames.
5. The method of claim 1, wherein determining the position deviation information of the current frame according to the position of the selected target feature point in the current frame and the position in the reference frame comprises:
determining relative position deviation information of the current frame according to the position of the selected target feature point in the current frame and the position of the selected target feature point in an adjacent frame of the current frame;
and determining the position deviation information of the current frame according to the relative position deviation information of the current frame and the relative position deviation information of at least one other frame between the current frame and the reference frame.
6. The method according to any one of claims 1 to 5, wherein the performing the correction process on the current frame according to the position deviation information of the current frame includes:
determining the position of a video window corresponding to the current frame;
determining the position deviation information of the stabilized current frame according to the position deviation information of the current frame and the position deviation information of at least one other frame in the position of the video window;
and correcting the current frame according to the position deviation information after the current frame is determined to be stable.
7. The method of claim 6, wherein the current frame is centered in the corresponding video window.
8. The method of claim 6, wherein, if there are a plurality of the at least one other frame, the number of other frames located on the center side is the minimum of the number of frames from the current frame to the first frame in the video and the number of frames from the current frame to the last frame in the video.
9. The method of claim 1, wherein after performing the correction process on the current frame according to the position deviation information of the current frame, the method further comprises:
determining an invalid content overlapping area according to the invalid content area in the current frame after correction processing;
determining a target subregion from the regions except the invalid overlapping region, wherein the target subregion is a subregion with the largest area in all subregions with the same aspect ratio as the current frame in the regions except the invalid overlapping region;
and cropping the corrected current frame according to the target sub-region.
10. The method of claim 9, wherein said determining a target sub-region from the region other than the invalid overlapping region comprises:
and if there are multiple sub-regions with the largest area among all the sub-regions that have the same aspect ratio as the current frame in the region other than the invalid overlapping region, taking the sub-region closest to the center of the current frame as the target sub-region.
11. The method according to claim 9 or 10, wherein after cropping the corrected current frame according to the target sub-region, the method further comprises:
and enlarging the cropped current frame according to the size of the current frame, and outputting the enlarged current frame.
12. An apparatus for performing image processing, the apparatus comprising:
the depth-of-field determining module is used for determining at least one pair of adjacent image feature points of the current frame in the video and the depth-of-field difference value of the image feature points;
the feature point selection module is used for selecting at least one pair of adjacent target image feature points with matched depth of field difference values from the at least one pair of adjacent image feature points; grouping the selected target image feature points according to the depth of field information of the target image feature points to obtain a feature point group; selecting a feature point group which meets the screening condition from the obtained feature point group, and selecting a target image feature point from the selected feature point group;
the information determining module is used for determining the position deviation information of the current frame according to the position of the selected target feature point in the current frame and the position of the selected target feature point in a reference frame;
and the correction module is used for correcting the current frame according to the first affine transformation matrix of the current frame and the stabilized video motion matrix.
13. The device of claim 12, wherein the depth of field determination module is specifically configured to:
determining at least two image characteristic points of a current frame according to corner point information of the current frame in a video;
at least one pair of adjacent image feature points is determined from the at least two image feature points.
14. The device of claim 12, wherein the depth of field determination module is specifically configured to:
and determining the depth difference value of at least one pair of adjacent image characteristic points according to the coordinates of the image characteristic points in the direction.
15. The apparatus of claim 12, wherein the screening conditions include some or all of the following conditions:
the number of the target image feature points in the feature point group is larger than a preset feature point threshold;
selecting the feature point group with the largest number of target image feature points;
and selecting the feature point group with the largest number of designated target image feature points, wherein the designated target image feature points are located in the convex polygon with the smallest area among the convex polygons containing the image feature points of the adjacent frames.
16. The device of claim 12, wherein the information determination module is specifically configured to:
determining relative position deviation information of the current frame according to the position of the selected target feature point in the current frame and the position of the selected target feature point in an adjacent frame of the current frame;
and determining the position deviation information of the current frame according to the relative position deviation information of the current frame and the relative position deviation information of at least one other frame between the current frame and the reference frame.
17. The apparatus of any one of claims 12 to 16, wherein the correction module is specifically configured to:
determining the position of a video window corresponding to the current frame;
determining the position deviation information of the stabilized current frame according to the position deviation information of the current frame and the position deviation information of at least one other frame in the position of the video window;
and correcting the current frame according to the position deviation information after the current frame is determined to be stable.
18. The apparatus of claim 12, wherein the apparatus further comprises:
the overlapping area determining module is used for determining an invalid content overlapping area according to the invalid content area in the current frame after the correction processing;
a sub-region determining module, configured to determine a target sub-region in a region other than the invalid overlapping region, where the target sub-region is a sub-region with a largest area in all sub-regions, of which aspect ratios are the same as those of the current frame, in the region other than the invalid overlapping region;
and the cropping module is used for cropping the corrected current frame according to the target sub-region.
CN201710597224.9A 2017-07-20 2017-07-20 Method and equipment for processing image Active CN109285122B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710597224.9A CN109285122B (en) 2017-07-20 2017-07-20 Method and equipment for processing image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710597224.9A CN109285122B (en) 2017-07-20 2017-07-20 Method and equipment for processing image

Publications (2)

Publication Number Publication Date
CN109285122A CN109285122A (en) 2019-01-29
CN109285122B true CN109285122B (en) 2022-09-27

Family

ID=65185118

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710597224.9A Active CN109285122B (en) 2017-07-20 2017-07-20 Method and equipment for processing image

Country Status (1)

Country Link
CN (1) CN109285122B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109963082B (en) * 2019-03-26 2021-01-08 Oppo广东移动通信有限公司 Image shooting method and device, electronic equipment and computer readable storage medium
CN110349177B (en) * 2019-07-03 2021-08-03 广州多益网络股份有限公司 Method and system for tracking key points of human face of continuous frame video stream
CN112929562B (en) * 2021-01-20 2023-04-07 北京百度网讯科技有限公司 Video jitter processing method, device, equipment and storage medium
CN114313728A (en) * 2021-12-20 2022-04-12 北京东土科技股份有限公司 Anti-collision system of tunnel stacker


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4955616B2 (en) * 2008-06-27 2012-06-20 富士フイルム株式会社 Image processing apparatus, image processing method, and image processing program
US10033926B2 (en) * 2015-11-06 2018-07-24 Google Llc Depth camera based image stabilization

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102208110A (en) * 2010-03-31 2011-10-05 索尼公司 Image processing apparatus, image processing method, and image processing program
CN103180893A (en) * 2011-08-23 2013-06-26 索尼公司 Method and system for use in providing three dimensional user interface
CN103442161A (en) * 2013-08-20 2013-12-11 合肥工业大学 Video image stabilization method based on three-dimensional space-time image estimation technology
CN105744171A (en) * 2016-03-30 2016-07-06 联想(北京)有限公司 Image processing method and electronic equipment
CN106851102A (en) * 2017-02-24 2017-06-13 北京理工大学 A kind of video image stabilization method based on binding geodesic curve path optimization

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Robust Video Stabilization Based on Particle Filtering with Weighted Feature Points; Chunhe Song et al.; IEEE Transactions on Consumer Electronics; 2012-05-31; Vol. 58, No. 2; 570-577 *
Video Stabilization with a Depth Camera; Shuaicheng Liu et al.; 2012 IEEE Conference on Computer Vision and Pattern Recognition; 2012-07-26; 1-7 *
Research on the Depth-of-Field Problem in Electronic Image Stabilization (电子稳像中景深问题的研究); Huang Chen et al.; Computer Engineering (《计算机工程》); 2011-11-30; Vol. 37, No. 21; 246-248 *
Research on Video Stabilization Technology under Motion (运动状态下的视频稳像技术研究); Huang Wenjuan; China Master's Theses Full-text Database, Information Science and Technology Series (《中国优秀博硕士学位论文全文数据库(硕士)信息科技辑》); 2017-03-15 (No. 03); p. I138-5689; sections 1.3.1-3.1.2, 4.3-4.4 and 5.1 of the text; FIG. 5.1 *

Also Published As

Publication number Publication date
CN109285122A (en) 2019-01-29

Similar Documents

Publication Publication Date Title
CN108898567B (en) Image noise reduction method, device and system
CN109285122B (en) Method and equipment for processing image
US9615039B2 (en) Systems and methods for reducing noise in video streams
CN112019768B (en) Video generation method and device and electronic equipment
CN106447602A (en) Image mosaic method and device
WO2017115348A1 (en) Adaptive stitching of frames in the process of creating a panoramic frame
KR101706216B1 (en) Apparatus and method for reconstructing dense three dimension image
KR20170102521A (en) Image processing method and apparatus
CN101853524A (en) Method for generating corn ear panoramic image by using image sequence
WO2021057294A1 (en) Method and apparatus for detecting subject, electronic device, and computer readable storage medium
US11922658B2 (en) Pose tracking method, pose tracking device and electronic device
US20200051228A1 (en) Face Deblurring Method and Device
CN110084765B (en) Image processing method, image processing device and terminal equipment
CN112995678B (en) Video motion compensation method and device and computer equipment
CN107341804B (en) Method and device for determining plane in point cloud data, and method and equipment for image superposition
US20160093028A1 (en) Image processing method, image processing apparatus and electronic device
CN111292413A (en) Image model processing method and device, storage medium and electronic device
JP6830712B1 (en) Random sampling Consistency-based effective area extraction method for fisheye images
CN110599586A (en) Semi-dense scene reconstruction method and device, electronic equipment and storage medium
US9451165B2 (en) Image processing apparatus
CN111383252A (en) Multi-camera target tracking method, system, device and storage medium
CN110852334A (en) System and method for adaptive pixel filtering
CN110874814B (en) Image processing method, image processing device and terminal equipment
CN110334606B (en) Picture-in-picture positioning method and device
CN104754316A (en) 3D imaging method and device and imaging system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant