CN114638808A - Multi-scene video jitter detection method based on video monitoring - Google Patents

Multi-scene video jitter detection method based on video monitoring

Info

Publication number
CN114638808A
CN114638808A (application CN202210284410.8A)
Authority
CN
China
Prior art keywords
value
formula
image
video
corner
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210284410.8A
Other languages
Chinese (zh)
Inventor
马丕明
刘学孔
左修洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN202210284410.8A priority Critical patent/CN114638808A/en
Publication of CN114638808A publication Critical patent/CN114638808A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N17/00Diagnosis, testing or measuring for television systems or their details
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20112Image segmentation details
    • G06T2207/20164Salient point detection; Corner detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30232Surveillance

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a multi-scene video jitter detection method based on video monitoring, which comprises the following steps: acquiring two adjacent frames of a conference monitoring picture; searching for and matching feature points between the two acquired frames; if the number of matched feature points is smaller than a set threshold, judging that the meeting place has been switched and skipping picture jitter detection for this pair of frames; if the number of matched feature points is larger than the set threshold, judging that the meeting place has not been switched, calculating the row projection and the column projection of the two frames, performing correlation operations on the row projections and on the column projections respectively, taking the extreme value in the correlation value curve to obtain the displacement generated between the two frames, and taking the absolute value of the displacement vector as its magnitude; when the displacement is larger than a set threshold, judging that the video shakes. The invention can accurately and efficiently judge whether a video-conference picture shakes. The method adapts to multi-scene monitoring situations, avoids jitter-detection misjudgment caused by scene switching, and has a small calculation amount, a good detection effect and a wide range of use scenes.

Description

Multi-scene video jitter detection method based on video monitoring
Technical Field
The invention relates to a multi-scene video jitter detection method based on video monitoring, and belongs to the technical field of computer vision.
Background
Monitoring systems have become an indispensable part of daily life, and their importance in banks, shopping malls, companies, schools, residential areas, public transportation and other fields is self-evident; they play an increasingly important role as cities are built up and companies develop. In particular, in recent years the outbreak of the COVID-19 epidemic has required epidemic-prevention measures that avoid large gatherings of people in a single area, so multi-meeting-place monitoring systems have been applied more and more widely. A multi-meeting-place monitoring system transmits the monitoring pictures of several sub-meeting-places into a main monitoring picture in turn, so as to meet the polling requirement of each meeting place. To guarantee the quality of the monitoring video, a series of problem detections needs to be carried out on the polled video, and video picture jitter detection is a very critical link.
Chinese patent document CN106385580B discloses a video jitter detection method based on image gray distribution characteristics, in which the video frame jitter detection proceeds as follows: step 1: intercept two adjacent frames of images in the video; step 2: convert the two intercepted frames into gray-scale space; step 3: count the gray value of each row and each column of the previous frame and calculate the row gray mean, row gray variance, column gray mean and column gray variance; step 4: count the gray value of each row and each column of the next frame and calculate the row gray mean, row gray variance, column gray mean and column gray variance; step 5: perform a row hypothesis test on the row gray means and row gray variances of the two frames obtained in steps 3 and 4 to obtain a row test factor; step 6: perform a column hypothesis test on the column gray means and column gray variances obtained in steps 3 and 4 to obtain a column test factor; step 7: compare the row test factor and the column test factor calculated in steps 5 and 6 with a given threshold, and if either exceeds the threshold, judge the previous frame to be a jittered frame; step 8: count the proportion of jittered frames in the whole video, and if the proportion exceeds a set jitter threshold, judge that the video jitters. The method mainly utilizes the gray features of the image and judges whether the video picture shakes by testing the row and column gray means and variances of the two frames, so it has a small calculation amount and good real-time performance. However, the method is highly limited and is not suitable for the current multi-meeting-place monitoring scenario: scene switching causes the gray features of the image to change drastically, which directly leads to misjudgment by the algorithm.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention provides a multi-scene video jitter detection method based on video monitoring, so as to solve the problem that the jitter detection is mostly applied to a single scene at present.
The technical scheme of the invention is as follows:
a video jitter detection method for multiple meeting places based on video monitoring comprises the following steps:
step 1: detecting characteristic points of the two acquired frames of images by a Shi-Tomasi corner detection method;
step 2: judging whether meeting place switching occurs or not through corner matching; the method comprises the following steps: if the number of the matched angular points is less than a set threshold value, judging that the meeting place is switched, and not carrying out picture jitter detection at the moment; if the number of the matched angular points is larger than a set threshold value, judging that the meeting place is not switched, and entering the step 3;
and step 3: judging whether the picture shakes or not according to the gray characteristic of the image; the method comprises the following steps: respectively calculating the row projection and the column projection of two frames of images, respectively carrying out correlation operation on the row projection and the column projection, taking an extreme value in a correlation value curve as displacement generated between the two frames, and taking a vector absolute value as the magnitude of the displacement; and when the displacement distance is larger than the set threshold value, judging that the video shakes.
According to the invention, the step 1 is preferably realized by the following steps:
step 1.1: calculating the pixel value variation E (u, v) inside a window when the window moves towards x and y directions simultaneously in a Cartesian rectangular coordinate system with the upper left corner of the captured video frame image as an origin, the right side as an x axis and the downward side as a y axis;
step 1.2: for each window, respectively calculating a corresponding angular point response function R;
step 1.3, setting a threshold value threshold, carrying out threshold value judgment on the calculated angular point response function, if R is larger than the threshold value, indicating that the window corresponds to an angular point characteristic, and the pixel corresponds to an angular point, otherwise, neglecting the pixel point, and detecting the next pixel point.
Further preferably, the step 1.1 specifically comprises the following steps:
enabling the center of a window to be located at any position (x, y) of a gray level image of any frame image in a video acquired by video monitoring, wherein the gray level value of a pixel at the position is I (x, y), if the window moves towards the x direction and the y direction by small displacement u and v respectively, the window moves to a new position (x + u, y + v), the gray level value of the pixel at the position is I (x + u, y + v), and I (x + u, y + v) -I (x, y) refers to the change value of the gray level value caused by window movement;
defining ω(x, y) as a window function at position (x, y) that represents the weight of each pixel within the window; the weight of all pixels within the window may be set to 1, or ω(x, y) may be set to a Gaussian distribution centered on the window center; if the pixel at the center point of the window is the corner point, the weight coefficient of the center point is set to 1, indicating that it contributes most to the gray-scale change; the farther a point is from the window center, i.e. from the corner point, the smaller its gray-scale change, so its weight coefficient is set between 0 and 1 and approaches 0 with increasing distance from the corner point, indicating a smaller contribution to the gray-scale change;
the formula for calculating the variation E (u, v) of the pixel value inside the window is shown in formula (I):
E(u, v) ≈ ∑(x,y) ω(x, y)(uIx + vIy)² = [u v]·M·[u v]ᵀ (I)
in formula (I), Ix and Iy are the partial derivatives of I, i.e. the gradient maps of the image in the x and y directions:
Ix = ∂I/∂x, Iy = ∂I/∂y
the matrix M is:
M = ∑(x,y) ω(x, y)[Ix² IxIy; IxIy Iy²] → R⁻¹[λ1 0; 0 λ2]R (matrix rows separated by semicolons)
the right side of the arrow is the result of diagonalizing this real symmetric matrix, where R is a rotation factor that does not affect the variation components in the two orthogonal directions;
after diagonalization, the variation components in the two orthogonal directions are extracted, namely λ1 and λ2, which are the two eigenvalues of the matrix M.
Further preferably, the step 1.2 specifically comprises the following steps:
directly using the smaller eigenvalue as the corner response function R, as shown in formula (II):
R=min(λ1,λ2) (II)
in formula (II), λ1 and λ2 are the eigenvalues of the matrix M.
Further preferably, in step 1.3, threshold is 15.
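As an illustration only, the following Python sketch (assuming the OpenCV and NumPy libraries are available) detects Shi-Tomasi corner points with cv2.goodFeaturesToTrack; the corner count, quality level and minimum distance are illustrative assumptions rather than values fixed by the method, and the relative quality level plays the role of the fixed threshold of step 1.3:

import cv2
import numpy as np

def detect_corners(frame_bgr, max_corners=200, quality=0.01, min_dist=10):
    # step 1: Shi-Tomasi corners; each pixel is scored with R = min(lambda1, lambda2)
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    corners = cv2.goodFeaturesToTrack(gray, maxCorners=max_corners,
                                      qualityLevel=quality,
                                      minDistance=min_dist, blockSize=3)
    if corners is None:
        return np.empty((0, 2), dtype=np.float32)
    return corners.reshape(-1, 2)  # (N, 2) array of (x, y) corner coordinates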
According to the invention, the step 2 is preferably realized by the following steps:
intercepting two adjacent frames of the video, the former frame being defined as the reference frame and the latter frame as the current frame; taking a corner point of the reference frame image and finding the two corner points in the current frame image with the smallest Euclidean distances to this reference feature point; if the Euclidean distance of the nearest of these two corner points divided by the Euclidean distance of the second-nearest corner point is less than a threshold T, the nearest corner point is successfully matched with the reference corner point, otherwise the match fails;
repeating the operation until all the corner points detected in step 1 have been subjected to corner point matching, taking T as 0.4-0.6;
if the matching degree of all the corner points of the two frames reaches more than 90%, the meeting place switching does not occur, otherwise, the meeting place switching is judged to occur.
Further preferably, the Euclidean distance ρ1 between a reference frame corner point and its nearest corner point in the current frame is calculated by formula (III):
ρ1 = √[(x1 - x2)² + (y1 - y2)²] (III)
in formula (III), x1, y1 are the row and column coordinates of the reference frame corner point, and x2, y2 are the row and column coordinates of its nearest corner point in the current frame;
the Euclidean distance ρ2 between the reference frame corner point and the second-nearest corner point in the current frame is shown in formula (IV):
ρ2 = √[(x1 - x3)² + (y1 - y3)²] (IV)
in formula (IV), x3, y3 are the row and column coordinates of the current frame corner point that is second-nearest to the reference frame corner point;
the formula for the ratio is shown in formula (V):
ratio = ρ1/ρ2 (V)
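A minimal Python sketch of the ratio test of formulas (III)-(V) is given below, assuming the corner points of the two frames are compared directly by their coordinates; the value T = 0.5 is one point in the 0.4-0.6 range stated above and is an illustrative choice:

import numpy as np

def match_corners(ref_pts, cur_pts, T=0.5):
    # count reference corners whose nearest/second-nearest distance ratio is below T
    matched = 0
    for p in ref_pts:
        d = np.linalg.norm(cur_pts - p, axis=1)  # Euclidean distances, formulas (III) and (IV)
        if d.size < 2:
            continue
        rho1, rho2 = np.sort(d)[:2]              # nearest and second-nearest corner distances
        if rho2 > 0 and rho1 / rho2 < T:         # ratio test, formula (V)
            matched += 1
    return matched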
according to the invention, the step 3 is preferably realized by the following steps:
step 3.1: calculating the total pixel value Rowk(i) of each row of the image, as shown in formula (VI):
Rowk(i) = ∑(j=1 to m) Ik(i, j) (VI)
in formula (VI), k represents the k-th frame image, i represents the i-th row, m represents the number of columns of the image, and Ik(i, j) is the gray value of the pixel in row i and column j;
step 3.2: calculating the row average gray value Rowk of the whole image, as shown in formula (VII):
Rowk = [∑Rowk(i)]/n (VII)
in formula (VII), n represents the total number of rows of the image;
step 3.3: subtracting the row average gray value Rowk of the image from the total pixel value Rowk(i) of each row to obtain the corrected row projection value Rowprojectk(i) of the k-th frame image, as shown in formula (VIII):
Rowprojectk(i) = Rowk(i) - Rowk (VIII)
step 3.4: calculating the total pixel value Colk(j) of each column of the image, as shown in formula (IX):
Colk(j) = ∑(i=1 to n) Ik(i, j) (IX)
in formula (IX), k denotes the k-th frame image, j denotes the j-th column, and n denotes the total number of rows of the image;
step 3.5: calculating the column average gray value Colk of the whole image, as shown in formula (X):
Colk = [∑Colk(j)]/m (X)
in formula (X), m represents the total number of columns of the image;
step 3.6: subtracting the column average gray value Colk of the image from the total pixel value Colk(j) of each column to obtain the corrected column projection value Colprojectk(j) of the k-th frame image, as shown in formula (XI):
Colprojectk(j) = Colk(j) - Colk (XI)
the row and column gray projection curves of the image are drawn from the corrected projection values calculated in step 3.3 and step 3.6;
step 3.7: after the row and column gray projection curves of the current frame and the reference frame have been calculated, cross-correlation operations are performed on the column projections and on the row projections of the two frames respectively; the extreme value in the correlation value curve determines the displacement of the image frame, and the absolute value of the displacement vector gives its magnitude; the cross-correlation operations are shown in formula (XII) and formula (XIII):
Rx(w) = ∑j [Colpre(j + w - 1) - Colref(p + j)]², 1 ≤ w ≤ 2p + 1 (XII)
Ry(v) = ∑i [Rowpre(i + v - 1) - Rowref(q + i)]², 1 ≤ v ≤ 2q + 1 (XIII)
in formulas (XII) and (XIII), Rx(w) and Ry(v) denote the correlation operations performed on the processed column and row projections respectively, Colpre(j) is the gray projection value of the j-th column of the current frame image, Colref(j) is the gray projection value of the j-th column of the reference frame, and p and q are the one-side search lengths of the current frame relative to the reference frame;
the values of w and v at which Rx(w) and Ry(v) reach their extreme values are denoted wmin and vmin; the displacement vectors of the current frame relative to the reference frame in the horizontal and vertical directions are then given by formula (XIV) and formula (XV):
dx = m + 1 - wmin (XIV)
dy = n + 1 - vmin (XV)
in formulas (XIV) and (XV), dx represents the displacement of the current frame relative to the reference frame in the horizontal direction, and dy represents the displacement of the current frame relative to the reference frame in the vertical direction;
when dx or dy is greater than the set threshold T1, it is judged that the video picture shakes; otherwise, it is judged that the video picture does not shake.
Further preferably, T1 is selected according to the applicable scene and takes a value of 5-30.
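The following Python sketch illustrates step 3 with NumPy. Because the exact indexing of formulas (XII) and (XIII) is only given in the patent figures, the sum-of-squared-difference search below is one common gray-projection form and an assumption rather than a transcription; the search half-width p, the threshold T1 and the use of a signed shift in place of the wmin/vmin offsets of formulas (XIV)-(XV) are illustrative choices:

import numpy as np

def projection(gray):
    # mean-corrected row and column gray projections, formulas (VI)-(XI)
    row = gray.sum(axis=1).astype(np.float64)   # Rowk(i)
    col = gray.sum(axis=0).astype(np.float64)   # Colk(j)
    return row - row.mean(), col - col.mean()   # Rowprojectk, Colprojectk

def shift_1d(cur, ref, p=20):
    # displacement of cur relative to ref along one axis, searched over [-p, p]
    n, best_w, best_cost = len(cur), 0, np.inf
    for w in range(-p, p + 1):
        a = cur[w:] if w >= 0 else cur[:n + w]
        b = ref[:n - w] if w >= 0 else ref[-w:]
        cost = np.mean((a - b) ** 2)             # correlation value curve
        if cost < best_cost:
            best_cost, best_w = cost, w          # extreme value gives the displacement
    return best_w

def is_shaking(ref_gray, cur_gray, T1=10, p=20):
    row_r, col_r = projection(ref_gray)
    row_c, col_c = projection(cur_gray)
    dy = shift_1d(row_c, row_r, p)               # vertical displacement dy
    dx = shift_1d(col_c, col_r, p)               # horizontal displacement dx
    return abs(dx) > T1 or abs(dy) > T1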
The beneficial effects of the invention are as follows:
the invention provides a multi-scene video jitter detection method based on video monitoring, which can effectively identify whether a video picture is converted or not through corner matching and detection, thereby being suitable for multi-scene monitoring situations, avoiding the situation of jitter detection misjudgment caused by scene conversion, and judging whether the video picture is jittered or not through the gray level characteristics of images. The method improves the traditional single scene video jitter detection mode, so that the method has wider application range, stronger anti-interference capability and better detection effect.
Drawings
FIG. 1 is a schematic flow chart of a video jitter detection method for multiple meeting places based on video surveillance according to the present invention;
FIG. 2 is a schematic diagram of corner point matching when no video scene is switched;
FIG. 3 is a schematic diagram of corner point matching when a video scene is switched;
FIG. 4(a) is a diagram illustrating a video scene I;
FIG. 4(b) is a schematic column projection of the image of FIG. 4 (a);
FIG. 4(c) is a schematic line projection of the image of FIG. 4 (a);
FIG. 5(a) is a schematic view of a video scene II;
FIG. 5(b) is a schematic column projection of the image of FIG. 5 (a);
fig. 5(c) is a schematic line projection of the image of fig. 5 (a).
Detailed Description
The invention is further described below, but not limited thereto, with reference to the drawings and examples of the specification.
Example 1
A video monitoring-based multi-meeting-place video jitter detection method detects feature points through a Shi-Tomasi corner detection method, judges whether meeting-place switching occurs or not through feature point matching, and finally judges whether a picture jitters or not through the gray scale features of the image, as shown in figure 1, the method comprises the following steps:
step 1: detecting characteristic points (angular points) of the two acquired frames of images by a Shi-Tomasi angular point detection method;
step 2: judging whether meeting place switching occurs or not through corner matching; the method comprises the following steps: if the number of the matched angular points is less than a set threshold value, judging that the meeting place is switched, and not carrying out picture jitter detection at the moment; if the number of the matched angular points is larger than a set threshold value, judging that the meeting place is not switched, and entering the step 3;
step 3: judging whether the picture shakes or not according to the gray characteristic of the image; the method comprises the following steps: respectively calculating the row projection and the column projection of the two frames of images, respectively carrying out correlation operations on the row projections and the column projections, taking the extreme value in the correlation value curve as the displacement generated between the two frames, and taking the absolute value of the displacement vector as the magnitude of the displacement; and when the displacement distance is greater than the set threshold value, judging that the video shakes.
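A minimal end-to-end Python sketch of this flow, built from the hypothetical helper functions detect_corners, match_corners and is_shaking sketched elsewhere in this text, could look as follows; the 90% matching degree follows the text, while the remaining parameters are illustrative assumptions:

import cv2

def check_frame_pair(ref_bgr, cur_bgr, match_degree=0.9, T1=10):
    ref_pts = detect_corners(ref_bgr)            # step 1: Shi-Tomasi corner detection
    cur_pts = detect_corners(cur_bgr)
    if len(ref_pts) == 0:
        return "no corners detected"
    matched = match_corners(ref_pts, cur_pts)    # step 2: corner matching
    if matched / len(ref_pts) < match_degree:    # meeting place switched
        return "scene switch: jitter detection skipped"
    ref_gray = cv2.cvtColor(ref_bgr, cv2.COLOR_BGR2GRAY)
    cur_gray = cv2.cvtColor(cur_bgr, cv2.COLOR_BGR2GRAY)
    # step 3: gray projection correlation and displacement threshold
    return "jitter" if is_shaking(ref_gray, cur_gray, T1=T1) else "stable"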
Example 2
The method for detecting the video jitter of multiple meeting places based on video monitoring in the embodiment 1 is characterized in that:
when one window moves in the image smooth area, the image gray scale is not changed; the window moves in the direction of the edge, and the image gray scale is not changed; the window moves at the corner points, causing a significant change in the image grey scale. The Shi-Tomas corner detection utilizes the intuitive physical phenomenon, and judges whether the corner is a corner or not according to the change degree of the window in each direction. The concrete implementation steps of the step 1 comprise:
step 1.1: calculating the pixel value variation E (u, v) inside a window when the window moves towards x and y directions simultaneously in a Cartesian rectangular coordinate system with the upper left corner of the captured video frame image as an origin, the right side as an x axis and the downward side as a y axis; the method specifically comprises the following steps:
enabling the center of a window to be located at any position (x, y) of a gray level image of any frame image in a video acquired by video monitoring, wherein the gray level value of a pixel at the position is I (x, y), if the window moves towards the x direction and the y direction by small displacement u and v respectively, the window moves to a new position (x + u, y + v), the gray level value of the pixel at the position is I (x + u, y + v), and I (x + u, y + v) -I (x, y) refers to the change value of the gray level value caused by window movement;
defining ω(x, y) as a window function at position (x, y) that represents the weight of each pixel within the window; the weight of all pixels within the window may be set to 1, or ω(x, y) may be set to a Gaussian distribution (a binary normal distribution) centered on the window center; if the pixel at the center point of the window is a corner point, the gray value at the window center changes strongly before and after the window moves, so the weight coefficient of the center point is set to 1, indicating that it contributes most to the gray-scale change; the farther a point is from the window center, i.e. from the corner point, the smaller its gray-scale change, so its weight coefficient is set between 0 and 1 and approaches 0 with increasing distance from the corner point, indicating a smaller contribution to the gray-scale change;
the amount of change in the gray value of the pixel caused by the window moving in each direction (u, v) is as follows:
E(u, v) = ∑(x,y) ω(x, y)[I(x + u, y + v) - I(x, y)]²
For a corner point, E(u, v) will be very large. Thus, this function can be maximized to obtain the corner points in the image. However, computing E(u, v) directly in this form is slow, so a first-order Taylor expansion is used to obtain an approximate form of the formula.
The first-order Taylor expansion in two dimensions is:
T(x, y) ≈ f(u, v) + (x - u)fx(u, v) + (y - v)fy(u, v) + …
Applying this expansion to I(x + u, y + v) gives:
I(x + u, y + v) ≈ I(x, y) + uIx + vIy
where Ix and Iy are the partial derivatives of I, which in the image are the gradient maps in the x and y directions.
Ix = ∂I/∂x, Iy = ∂I/∂y
The derivation continues as follows:
E(u, v) ≈ ∑(x,y) ω(x, y)(uIx + vIy)²
Factoring u and v out of the sum gives the final form; the formula for calculating the variation E(u, v) of the pixel values inside the window is shown in formula (I):
E(u, v) ≈ ∑(x,y) ω(x, y)(uIx + vIy)² = [u v]·M·[u v]ᵀ (I)
in formula (I), Ix and Iy are the partial derivatives of I, i.e. the gradient maps of the image in the x and y directions,
Ix = ∂I/∂x, Iy = ∂I/∂y
the matrix M is:
M = ∑(x,y) ω(x, y)[Ix² IxIy; IxIy Iy²] → R⁻¹[λ1 0; 0 λ2]R (matrix rows separated by semicolons)
the right side of the arrow is the result of diagonalizing this real symmetric matrix, where R is a rotation factor that does not affect the variation components in the two orthogonal directions;
after diagonalization, the variation components in the two orthogonal directions are extracted, namely λ1 and λ2, which are the two eigenvalues of the matrix M;
step 1.2: for each window, respectively calculating a corresponding angular point response function R;
having obtained the final form of E (u, v) from the derivation of step 1.1, it is then necessary to use the eigenvalues to find those windows that will cause large changes in the grey value. The stability of the corner is related to the smaller eigenvalue of the matrix M, and then the smaller eigenvalue is directly used as the corner response function R, as shown in equation (II):
R=min(λ1,λ2) (II)
in formula (II), λ1 and λ2 are the eigenvalues of the matrix M;
step 1.3: the result of Shi-Tomasi corner detection is a gray image of corner response values R; a threshold is set (threshold = 15 in this method) and the calculated corner response function is compared against it; if R > threshold, the window corresponds to a corner feature and the pixel is a corner point; otherwise, the pixel is ignored and the next pixel is examined.
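The corner response of formulas (I) and (II) can also be computed directly. The Python sketch below is an illustration only: it assumes Sobel gradients, a uniform window weight ω(x, y) = 1 and an arbitrary threshold, builds the entries of M over the window and takes the smaller eigenvalue as R:

import cv2
import numpy as np

def min_eigenvalue_response(gray, win=3):
    g = gray.astype(np.float32)
    Ix = cv2.Sobel(g, cv2.CV_32F, 1, 0, ksize=3)            # gradient map in x
    Iy = cv2.Sobel(g, cv2.CV_32F, 0, 1, ksize=3)            # gradient map in y
    # entries of M summed over the window with uniform weights
    Sxx = cv2.boxFilter(Ix * Ix, cv2.CV_32F, (win, win), normalize=False)
    Syy = cv2.boxFilter(Iy * Iy, cv2.CV_32F, (win, win), normalize=False)
    Sxy = cv2.boxFilter(Ix * Iy, cv2.CV_32F, (win, win), normalize=False)
    # eigenvalues of the 2x2 symmetric matrix [[Sxx, Sxy], [Sxy, Syy]]
    half_trace = (Sxx + Syy) / 2.0
    delta = np.sqrt(((Sxx - Syy) / 2.0) ** 2 + Sxy ** 2)
    return half_trace - delta                                # R = min(lambda1, lambda2)

# corner pixels could then be taken as np.argwhere(min_eigenvalue_response(gray) > threshold)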
The concrete implementation steps of the step 2 comprise:
intercepting two adjacent frames of the video, the former frame being defined as the reference frame and the latter frame as the current frame; taking a corner point of the reference frame image and finding the two corner points in the current frame image with the smallest Euclidean distances to this reference feature point; if the Euclidean distance of the nearest of these two corner points divided by the Euclidean distance of the second-nearest corner point is less than a threshold T, the nearest corner point is successfully matched with the reference corner point, otherwise the match fails;
repeating the operation until all the corner points detected in step 1 have been subjected to corner point matching, taking T as 0.4-0.6;
if the matching degree of all the corner points of the two frames reaches more than 90%, the meeting place switching does not occur, otherwise, the meeting place switching is judged to occur.
The Euclidean distance ρ1 between a reference frame corner point and its nearest corner point in the current frame is calculated by formula (III):
ρ1 = √[(x1 - x2)² + (y1 - y2)²] (III)
in formula (III), x1, y1 are the row and column coordinates of the reference frame corner point, and x2, y2 are the row and column coordinates of its nearest corner point in the current frame;
the Euclidean distance ρ2 between the reference frame corner point and the second-nearest corner point in the current frame is shown in formula (IV):
ρ2 = √[(x1 - x3)² + (y1 - y3)²] (IV)
in formula (IV), x3, y3 are the row and column coordinates of the current frame corner point that is second-nearest to the reference frame corner point;
the formula for the ratio is shown in formula (V):
ratio = ρ1/ρ2 (V)
the most obvious characteristic when the video shakes is that the whole displacement can be generated between frames, and after the displacement is detected, whether the video shakes is judged through further logic, therefore, basically, the shaking of the video is carried out around how to detect the displacement, the invention calculates the row and column displacement quantity based on the gray characteristic of the image of the video picture, and further judges whether the video picture shakes, and the specific implementation step of the step 3 comprises the following steps:
step 3.1: calculating the total pixel value Rowk(i) of each row of the image, as shown in formula (VI):
Rowk(i) = ∑(j=1 to m) Ik(i, j) (VI)
in formula (VI), k represents the k-th frame image, i represents the i-th row, m represents the number of columns of the image, and Ik(i, j) is the gray value of the pixel in row i and column j;
step 3.2: calculating the row average gray value Rowk of the whole image, as shown in formula (VII):
Rowk = [∑Rowk(i)]/n (VII)
in formula (VII), n represents the total number of rows of the image;
step 3.3: subtracting the row average gray value Rowk of the image from the total pixel value Rowk(i) of each row to obtain the corrected row projection value Rowprojectk(i) of the k-th frame image, as shown in formula (VIII):
Rowprojectk(i) = Rowk(i) - Rowk (VIII)
step 3.4: calculating the total pixel value Colk(j) of each column of the image, as shown in formula (IX):
Colk(j) = ∑(i=1 to n) Ik(i, j) (IX)
in formula (IX), k denotes the k-th frame image, j denotes the j-th column, and n denotes the total number of rows of the image;
step 3.5: calculating the column average gray value Colk of the whole image, as shown in formula (X):
Colk = [∑Colk(j)]/m (X)
in formula (X), m represents the total number of columns of the image;
step 3.6: subtracting the column average gray value Colk of the image from the total pixel value Colk(j) of each column to obtain the corrected column projection value Colprojectk(j) of the k-th frame image, as shown in formula (XI):
Colprojectk(j) = Colk(j) - Colk (XI)
the row and column gray projection curves of the image are drawn from the corrected projection values calculated in step 3.3 and step 3.6;
step 3.7: after the row and column gray projection curves of the current frame and the reference frame have been calculated, cross-correlation operations are performed on the column projections and on the row projections of the two frames respectively; the extreme value in the correlation value curve determines the displacement of the image frame, and the absolute value of the displacement vector gives its magnitude; the cross-correlation operations are shown in formula (XII) and formula (XIII):
Rx(w) = ∑j [Colpre(j + w - 1) - Colref(p + j)]², 1 ≤ w ≤ 2p + 1 (XII)
Ry(v) = ∑i [Rowpre(i + v - 1) - Rowref(q + i)]², 1 ≤ v ≤ 2q + 1 (XIII)
in formulas (XII) and (XIII), Rx(w) and Ry(v) denote the correlation operations performed on the processed column and row projections respectively, Colpre(j) is the gray projection value of the j-th column of the current frame image, Colref(j) is the gray projection value of the j-th column of the reference frame, and p and q are the one-side search lengths of the current frame relative to the reference frame;
the values of w and v at which Rx(w) and Ry(v) reach their extreme values are denoted wmin and vmin; the displacement vectors of the current frame relative to the reference frame in the horizontal and vertical directions are then given by formula (XIV) and formula (XV):
dx = m + 1 - wmin (XIV)
dy = n + 1 - vmin (XV)
in formulas (XIV) and (XV), dx represents the displacement of the current frame relative to the reference frame in the horizontal direction, and dy represents the displacement of the current frame relative to the reference frame in the vertical direction;
when dx or dy is greater than the set threshold T1, it is judged that the video picture shakes; otherwise, it is judged that it does not shake. T1 is selected according to the applicable scene and takes a value of 5-30.
FIG. 2 is a schematic diagram of corner point matching when no video scene is switched; FIG. 3 is a schematic diagram of corner matching when a video scene is switched; as can be seen from fig. 2 and 3, when the monitored scene is not changed, the corners of the two frames of images captured from the video are substantially completely matched, and when the monitored scene is changed, the corners of the two frames of images captured from the video are only partially matched, and whether the video scene is switched or not is determined according to the matching degree of the corners of the two frames of images.
FIG. 4(a) is a diagram illustrating a video scene I; FIG. 4(b) is a schematic column projection of the image of FIG. 4(a); FIG. 4(c) is a schematic line projection of the image of FIG. 4(a); FIG. 5(a) is a schematic view of a video scene II; FIG. 5(b) is a schematic column projection of the image of FIG. 5(a); FIG. 5(c) is a schematic line projection of the image of FIG. 5(a). After the row and column projections of the images are obtained, the formulas in step 3 are used to perform correlation operations on the row projections and on the column projections respectively; the extreme value in the correlation value curve gives the displacement generated between the two frames, and the absolute value of the displacement vector gives its magnitude. When the displacement is larger than the set threshold, it is judged that the video shakes.
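As a small usage note, row and column projection curves of the kind shown in FIG. 4(b)-(c) and FIG. 5(b)-(c) can be reproduced with the projection helper sketched earlier; the file name below is hypothetical and the matplotlib library is assumed to be available:

import cv2
import matplotlib.pyplot as plt

frame = cv2.imread("meeting_scene.png", cv2.IMREAD_GRAYSCALE)  # hypothetical test frame
row_proj, col_proj = projection(frame)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 3))
ax1.plot(col_proj)
ax1.set_title("column gray projection")   # compare FIG. 4(b) / FIG. 5(b)
ax2.plot(row_proj)
ax2.set_title("row gray projection")      # compare FIG. 4(c) / FIG. 5(c)
plt.show()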

Claims (9)

1. A video jitter detection method for multiple meeting places based on video monitoring is characterized by comprising the following steps:
step 1: detecting characteristic points of the two acquired frames of images by a Shi-Tomasi corner detection method;
step 2: judging whether meeting place switching occurs or not through corner matching; the method comprises the following steps: if the number of the matched angular points is less than a set threshold value, judging that the meeting place is switched, and not carrying out picture jitter detection at the moment; if the number of the matched angular points is larger than a set threshold value, judging that the meeting place is not switched, and entering the step 3;
and step 3: judging whether the picture shakes or not according to the gray characteristics of the image; the method comprises the following steps: respectively calculating the row projection and the column projection of two frames of images, respectively carrying out correlation operation on the row projection and the column projection, taking an extreme value in a correlation value curve as displacement generated between the two frames, and taking a vector absolute value as the magnitude of the displacement; and when the displacement distance is larger than the set threshold value, judging that the video shakes.
2. The method for detecting the video jitter of multiple meeting places based on video surveillance as claimed in claim 1, wherein the specific implementation steps of step 1 include:
step 1.1: calculating the pixel value variation E (u, v) inside a window when the window moves towards x and y directions simultaneously in a Cartesian rectangular coordinate system with the upper left corner of the captured video frame image as an origin, the right side as an x axis and the downward side as a y axis;
step 1.2: for each window, respectively calculating a corresponding angular point response function R;
step 1.3, setting a threshold value threshold, carrying out threshold value judgment on the calculated angular point response function, if R is larger than the threshold value, indicating that the window corresponds to an angular point characteristic, and the pixel corresponds to an angular point, otherwise, neglecting the pixel point, and detecting the next pixel point.
3. The video shake detection method for multiple meeting places based on video surveillance as claimed in claim 2, wherein the step 1.1 includes the following steps;
enabling the center of a window to be located at any position (x, y) of a gray level image of any frame image in a video acquired by video monitoring, wherein the gray level value of a pixel at the position is I (x, y), if the window moves towards the x direction and the y direction by small displacement u and v respectively, the window moves to a new position (x + u, y + v), the gray level value of the pixel at the position is I (x + u, y + v), and I (x + u, y + v) -I (x, y) refers to the change value of the gray level value caused by window movement;
defining ω(x, y) as a window function at position (x, y) that represents the weight of each pixel within the window; the weight of all pixels within the window may be set to 1, or ω(x, y) may be set to a Gaussian distribution centered on the window center; if the pixel at the center point of the window is the corner point, the weight coefficient of the center point is set to 1, indicating that it contributes most to the gray-scale change; the farther a point is from the window center, i.e. from the corner point, the smaller its gray-scale change, so its weight coefficient is set between 0 and 1 and approaches 0 with increasing distance from the corner point, indicating a smaller contribution to the gray-scale change;
the formula for calculating the variation E (u, v) of the pixel value inside the window is shown in formula (I):
E(u, v) ≈ ∑(x,y) ω(x, y)(uIx + vIy)² = [u v]·M·[u v]ᵀ (I)
in formula (I), Ix and Iy are the partial derivatives of I, i.e. the gradient maps of the image in the x and y directions:
Ix = ∂I/∂x, Iy = ∂I/∂y
the matrix M is:
M = ∑(x,y) ω(x, y)[Ix² IxIy; IxIy Iy²] → R⁻¹[λ1 0; 0 λ2]R (matrix rows separated by semicolons)
the right side of the arrow is the result of diagonalizing this real symmetric matrix, where R is a rotation factor that does not affect the variation components in the two orthogonal directions;
after diagonalization, the variation components in the two orthogonal directions are extracted, namely λ1 and λ2, which are the two eigenvalues of the matrix M.
4. The video shake detection method for multiple meeting places based on video monitoring as claimed in claim 2, wherein the step 1.2 comprises the following steps:
directly using the smaller eigenvalue as the corner response function R, as shown in formula (II):
R=min(λ1,λ2) (II)
in formula (II), λ1 and λ2 are the eigenvalues of the matrix M.
5. The method for detecting multi-meeting-place video jitter based on video surveillance as claimed in claim 2, wherein in step 1.3, threshold is 15.
6. The method for detecting the video jitter of multiple meeting places based on video surveillance as claimed in claim 1, wherein the step 2 is implemented by the following steps:
intercepting two adjacent frames of the video, the former frame being defined as the reference frame and the latter frame as the current frame; taking a corner point of the reference frame image and finding the two corner points in the current frame image with the smallest Euclidean distances to this reference feature point; if the Euclidean distance of the nearest of these two corner points divided by the Euclidean distance of the second-nearest corner point is less than a threshold T, the nearest corner point is successfully matched with the reference corner point, otherwise the match fails;
repeating the operation until all the detected corner points in the step 1 are subjected to corner point matching, and taking T as 0.4-0.6;
if the matching degree of all the corner points of the two frames reaches more than 90%, the meeting place switching does not occur, otherwise, the meeting place switching is judged to occur.
7. The method as claimed in claim 6, wherein the Euclidean distance ρ1 between a reference frame corner point and its nearest corner point in the current frame is calculated by formula (III):
ρ1 = √[(x1 - x2)² + (y1 - y2)²] (III)
in formula (III), x1, y1 are the row and column coordinates of the reference frame corner point, and x2, y2 are the row and column coordinates of its nearest corner point in the current frame;
the Euclidean distance ρ2 between the reference frame corner point and the second-nearest corner point in the current frame is shown in formula (IV):
ρ2 = √[(x1 - x3)² + (y1 - y3)²] (IV)
in formula (IV), x3, y3 are the row and column coordinates of the current frame corner point that is second-nearest to the reference frame corner point;
the formula for the ratio is shown in formula (V):
ratio = ρ1/ρ2 (V)
8. the video shake detection method for multiple meeting places based on video surveillance as claimed in any one of claims 1-7, wherein the step 3 is implemented by the following steps:
step 3.1: calculating the total pixel value Rowk(i) of each row of the image, as shown in formula (VI):
Rowk(i) = ∑(j=1 to m) Ik(i, j) (VI)
in formula (VI), k represents the k-th frame image, i represents the i-th row, m represents the number of columns of the image, and Ik(i, j) is the gray value of the pixel in row i and column j;
step 3.2: calculating the row average gray value Rowk of the whole image, as shown in formula (VII):
Rowk = [∑Rowk(i)]/n (VII)
in formula (VII), n represents the total number of rows of the image;
step 3.3: subtracting the row average gray value Rowk of the image from the total pixel value Rowk(i) of each row to obtain the corrected row projection value Rowprojectk(i) of the k-th frame image, as shown in formula (VIII):
Rowprojectk(i) = Rowk(i) - Rowk (VIII)
step 3.4: calculating the total pixel value Colk(j) of each column of the image, as shown in formula (IX):
Colk(j) = ∑(i=1 to n) Ik(i, j) (IX)
in formula (IX), k denotes the k-th frame image, j denotes the j-th column, and n denotes the total number of rows of the image;
step 3.5: calculating the column average gray value Colk of the whole image, as shown in formula (X):
Colk = [∑Colk(j)]/m (X)
in formula (X), m represents the total number of columns of the image;
step 3.6: subtracting the column average gray value Colk of the image from the total pixel value Colk(j) of each column to obtain the corrected column projection value Colprojectk(j) of the k-th frame image, as shown in formula (XI):
Colprojectk(j) = Colk(j) - Colk (XI)
the row and column gray projection curves of the image are drawn from the corrected projection values calculated in step 3.3 and step 3.6;
step 3.7: after the row and column gray projection curves of the current frame and the reference frame have been calculated, cross-correlation operations are performed on the column projections and on the row projections of the two frames respectively; the extreme value in the correlation value curve determines the displacement of the image frame, and the absolute value of the displacement vector gives its magnitude; the cross-correlation operations are shown in formula (XII) and formula (XIII):
Rx(w) = ∑j [Colpre(j + w - 1) - Colref(p + j)]², 1 ≤ w ≤ 2p + 1 (XII)
Ry(v) = ∑i [Rowpre(i + v - 1) - Rowref(q + i)]², 1 ≤ v ≤ 2q + 1 (XIII)
in formulas (XII) and (XIII), Rx(w) and Ry(v) denote the correlation operations performed on the processed column and row projections respectively, Colpre(j) is the gray projection value of the j-th column of the current frame image, Colref(j) is the gray projection value of the j-th column of the reference frame, and p and q are the one-side search lengths of the current frame relative to the reference frame;
the values of w and v at which Rx(w) and Ry(v) reach their extreme values are denoted wmin and vmin; the displacement vectors of the current frame relative to the reference frame in the horizontal and vertical directions are then given by formula (XIV) and formula (XV):
dx = m + 1 - wmin (XIV)
dy = n + 1 - vmin (XV)
in formulas (XIV) and (XV), dx represents the displacement of the current frame relative to the reference frame in the horizontal direction, and dy represents the displacement of the current frame relative to the reference frame in the vertical direction;
when dx or dy is greater than the set threshold T1, it is judged that the video picture shakes; otherwise, it is judged that the video picture does not shake.
9. The video monitoring-based multi-meeting-place video shake detection method according to claim 8, wherein T1 is selected according to the applicable scene and takes a value of 5-30.
CN202210284410.8A 2022-03-22 2022-03-22 Multi-scene video jitter detection method based on video monitoring Pending CN114638808A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210284410.8A CN114638808A (en) 2022-03-22 2022-03-22 Multi-scene video jitter detection method based on video monitoring

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210284410.8A CN114638808A (en) 2022-03-22 2022-03-22 Multi-scene video jitter detection method based on video monitoring

Publications (1)

Publication Number Publication Date
CN114638808A true CN114638808A (en) 2022-06-17

Family

ID=81949541

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210284410.8A Pending CN114638808A (en) 2022-03-22 2022-03-22 Multi-scene video jitter detection method based on video monitoring

Country Status (1)

Country Link
CN (1) CN114638808A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114674415A (en) * 2022-05-25 2022-06-28 合肥安迅精密技术有限公司 Method and system for testing jitter of suction nozzle rod of XY motion platform
WO2024055762A1 (en) * 2022-09-14 2024-03-21 支付宝(杭州)信息技术有限公司 Video jitter detection method and apparatus, and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination