CN103679215A - Video monitoring method based on group behavior analysis driven by visual big data - Google Patents

Video monitoring method based on group behavior analysis driven by visual big data

Info

Publication number
CN103679215A
CN103679215A (application CN201310746795.6A)
Authority
CN
China
Prior art keywords
behavior
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310746795.6A
Other languages
Chinese (zh)
Other versions
CN103679215B (en)
Inventor
黄凯奇
康运锋
曹黎俊
张旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN201310746795.6A
Publication of CN103679215A
Application granted
Publication of CN103679215B
Expired - Fee Related

Landscapes

  • Image Analysis (AREA)

Abstract

A computer-implemented video monitoring method comprises the steps of: receiving video data captured by a camera; establishing a group behavior model from the received video data; estimating the parameters of the group behavior model to obtain the various crowd behaviors present in a scene; using the obtained model to derive behavior feature sets for the different crowds; and converting the obtained feature sets and using the converted feature sets to obtain a statistical people count for each crowd behavior. The camera angle setting has general applicability, so the method can be used for counting people at open entrances and exits; its computational cost is small and meets real-time video processing requirements.

Description

Video monitoring method based on group behavior analysis driven by visual big data
Technical Field
The invention relates to a video monitoring method, and in particular to a video monitoring method based on a visual-big-data-driven group behavior analysis technique.
Background
Most conventional monitoring systems require dedicated personnel to judge the monitored video manually. This consumes considerable manpower, and a person who watches the same feed for a long time may overlook abnormalities, with negative consequences. An intelligent video monitoring system can identify different objects and, when an abnormal condition appears in the monitored picture, raise an alarm and provide useful information in the fastest and best way, effectively assisting monitoring personnel in obtaining accurate information and handling emergencies while minimizing false alarms and missed reports.
In the related art, video monitoring methods fall into two categories according to how crowd behavior is detected. The first category comprises multi-person behavior recognition methods based on motion tracking, which are challenged by the number of people in the crowd: when the crowd is large, occlusion is severe and individuals cannot be tracked, so these methods apply only to simple scenes with few people. The second category comprises crowd behavior recognition methods based on feature learning or behavior model construction, used mainly to detect abnormal crowd behaviors such as gathering, scattering, running, and fighting. These methods better suit scenes with many people: a model is built from extracted features and its parameters are obtained by machine learning, which improves the detection rate. However, one model cannot describe all behaviors, so a different model is required for each particular behavior, and the scarcity of training samples still makes it challenging to obtain optimal model parameters.
Disclosure of Invention
The invention aims to provide a video monitoring method which can detect and identify the behaviors of people and count the number of people with different behaviors.
In order to achieve the above object, a video monitoring method may include the steps of:
1) receiving video data captured by a camera;
2) establishing a group behavior model according to the received video data;
3) estimating parameters of the group behavior model to obtain various group behaviors in a scene;
4) obtaining behavior feature sets of different crowds by using the obtained group behavior model;
5) converting the resulting behavior feature sets and using the converted feature sets to obtain a statistical people count for each crowd behavior.
The technical scheme of the invention has the following advantages: 1) the mathematical model is simple, has few parameters, and is easy to train; 2) the method works in crowded environments and can compute the cumulative number of people exhibiting a specific behavior; 3) the camera angle setting has general applicability, so the method can count people at open entrances and exits; 4) the computation cost is small and meets real-time video processing requirements.
Drawings
FIG. 1 shows a flow diagram of a video surveillance method according to an embodiment of the invention;
FIG. 2 illustrates a word-document model structure according to an embodiment of the present invention;
FIG. 3 illustrates an example live scenario according to an embodiment of the present invention;
FIG. 4 illustrates a set of different population behavior features in a live scene in accordance with an embodiment of the invention;
FIG. 5 shows a schematic view of a geometric correction according to an embodiment of the invention;
FIG. 6 illustrates an example of on-site people counting in a park according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
It should be noted that in the drawings or description, the same drawing reference numerals are used for similar or identical parts. Implementations not depicted or described in the drawings are of a form known to those of ordinary skill in the art. Additionally, while exemplifications of parameters including particular values may be provided herein, it is to be understood that the parameters need not be exactly equal to the respective values, but may be approximated to the respective values within acceptable error margins or design constraints. In addition, directional terms such as "upper", "lower", "front", "rear", "left", "right", and the like, referred to in the following embodiments, are directions only referring to the drawings. Accordingly, the directional terminology is used for purposes of illustration and is in no way limiting.
According to the technical scheme, first, given the complexity of the crowd in a scene, a group behavior model is used to mine the various behaviors in the scene; then, for each of the K detected crowd behaviors, a behavior feature set is acquired; next, each behavior feature set is converted into, for example, a 5-dimensional feature vector to reduce the feature dimension, and a 5×G-dimensional feature vector is obtained by associating a time parameter; finally, an artificial neural network is trained with the obtained 5×G-dimensional feature vectors to count the cumulative number of people for each crowd behavior. The overall flow chart of the embodiment of the invention is shown in FIG. 1. A detailed description of embodiments of the invention follows.
Step 1: video data acquired by the camera is received and may be preprocessed, for example de-noised.
Step 2: a group behavior model is established based on the received video data.
Due to the complexity of crowd behavior, several different crowd behaviors often coexist in one scene, and it is difficult to describe them all with a single model. Therefore, the feature set of each behavior is obtained through a group behavior model, and crowd analysis is performed on these behavior feature sets. The group behavior model may be a word-document model: low-level features serve as words and video segments serve as documents, so that the crowd behaviors in the video (the hidden topics) are mined and the feature set of each crowd behavior (a set of low-level features) is obtained.
The low-level model features adopted by the embodiment of the invention are local motion information. For example, motion pixels may be obtained by the frame-difference method, the velocity vector of each motion pixel may then be computed with an optical flow method (Horn B K P, Schunck B G. Determining optical flow. Artificial Intelligence, 1981, 17(1-3): 185-203), and the features of each motion pixel, i.e. its position and motion velocity, obtained. Here each moving pixel is taken as a word $w_i$. A video segment may comprise M frames of images, i.e. M documents, each of which is represented by a set of words, i.e. the document $W = \{w_i, i = 1, \ldots, N\}$, where $w_i = \{x_i, y_i, u_i, v_i\}$, N is the number of motion pixels in the video frame, x is the horizontal position of a pixel, y its vertical position, u its velocity in the horizontal direction, and v its velocity in the vertical direction. Of course, those skilled in the art may employ other techniques known in the field of motion estimation to represent the document W.
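As an illustration only, this word-extraction step might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: OpenCV's Farneback dense optical flow stands in for the cited Horn-Schunck method, and the two thresholds are illustrative values.

```python
import cv2
import numpy as np

def extract_words(prev_gray, gray, diff_thresh=15, flow_thresh=0.5):
    """Extract motion-pixel 'words' w_i = (x, y, u, v) from two consecutive
    grayscale frames: frame differencing selects candidate motion pixels,
    dense optical flow supplies their velocity components (u, v).
    Farneback flow is a stand-in for the Horn-Schunck method cited above;
    diff_thresh and flow_thresh are illustrative assumptions."""
    # Motion mask from the frame-difference method
    motion_mask = cv2.absdiff(gray, prev_gray) > diff_thresh

    # Dense optical flow field: flow[y, x] = (u, v)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)

    ys, xs = np.nonzero(motion_mask)
    u, v = flow[ys, xs, 0], flow[ys, xs, 1]

    # Keep only pixels that actually move; each row is one word (x, y, u, v)
    moving = np.hypot(u, v) > flow_thresh
    return np.stack([xs[moving], ys[moving], u[moving], v[moving]], axis=1)
```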
FIG. 2 illustrates the word-document model structure used by embodiments of the present invention, where $\alpha$ characterizes the relative strengths of the hidden topics in the document set, $\beta$ gives the probability distributions of all hidden topics, and the random variable $\pi_j$ characterizes document layer j: the magnitudes of the components of $\pi_j$ give the proportion of each hidden topic in the target document. In the word layer, $z_{ji}$ is the hidden topic assigned by target document j to word i, and $x_{ji}$ is the word-vector representation of the target document. Assume there are K behavior topics; each topic is a multinomial distribution over words, and $\alpha$ parameterizes a Dirichlet distribution over the corpus. For each document j, $\pi_j$ is drawn from the Dirichlet distribution $\mathrm{Dir}(\pi_j \mid \alpha)$. For each word i in document j, the topic $z_{ji}$ is drawn from the multinomial distribution given by $p(z_{ji} = k) = \pi_{jk}$, and the word $x_{ji}$ is drawn from the multinomial distribution with parameter $\beta_{z_{ji}}$. Here $\pi_j$ and $z_{ji}$ are hidden variables, while $\alpha$ and $\beta$ are the parameters to be optimized. Given $\alpha$ and $\beta$, the joint probability distribution of the random variable $\pi_j$, the topics $z_j = \{z_{ji}\}$, and the words $x_j = \{x_{ji}\}$ is shown in equation (1):
$$p(x_j, z_j, \pi_j \mid \alpha, \beta) = p(\pi_j \mid \alpha) \prod_{i=1}^{N} p(z_{ji} \mid \pi_j)\, p(x_{ji} \mid z_{ji}, \beta) = \frac{\Gamma\!\left(\sum_{k=1}^{K} \alpha_k\right)}{\prod_{k=1}^{K} \Gamma(\alpha_k)}\, \pi_{j1}^{\alpha_1 - 1} \cdots \pi_{jK}^{\alpha_K - 1} \prod_{i=1}^{N} \pi_{j z_{ji}}\, \beta_{z_{ji} x_{ji}} \qquad (1)$$
Therefore, the core problem in constructing the word-document model is inferring the distributions of the hidden variables, i.e. obtaining the hidden-topic composition $(\pi, z)$ of the target document. However, because the posterior distribution $p(z_j, \pi_j \mid x_j, \alpha, \beta)$ cannot be computed directly, it is approximated using the variational distribution of equation (2):
$$q(z_j, \pi_j \mid \gamma_j, \phi_j) = q(\pi_j \mid \gamma_j) \prod_{i=1}^{N} q(z_{ji} \mid \phi_{ji}) \qquad (2)$$
where $\gamma_j$ is the parameter of the Dirichlet distribution $q(\pi_j \mid \gamma_j)$ and $\phi_{ji}$ are the parameters of the multinomial distributions $q(z_{ji} \mid \phi_{ji})$. The optimal $(\gamma_j, \phi_j)$ can be computed by maximizing a lower bound on $\log p(x_j \mid \alpha, \beta)$.
Step 3: the parameters of the group behavior model are estimated to obtain the various crowd behaviors in the scene.
The optimal parameters $(\alpha^*, \beta^*)$ are obtained by maximizing $\log p(x_j \mid \alpha, \beta)$ over the document set, as shown in equation (3):
$$(\alpha^*, \beta^*) = \arg\max_{(\alpha, \beta)} \sum_{j=1}^{M} \log p(x_j \mid \alpha, \beta) \qquad (3)$$
Also due to p (x)j| α, β) is not straightforward to compute, the parameters (α, β) can be estimated by a variational maximum likelihood estimation EM method: in E-step, for each document j, find the optimal variation parameterThe above equation (2) is approximated using the variation distribution of the optimum variation parameter obtained by E-step, and the optimum parameter (α) is obtained by two-step loop calculation*,β*)。
As an example, FIG. 3 shows one frame of the received video data; mining this scene with the group behavior model of the present invention yields, for example, four hidden topics (crowd behaviors): up, down, left, and right.
Step 4: behavior feature sets of the different crowds are obtained using the obtained group behavior model.
Each frame of the video contains different crowd behaviors; with the group behavior model parameters obtained in step 3, the feature set of each crowd behavior can be obtained through the word-document model, as shown in equation (4):
$$\begin{cases} f_{k^*} = \{x_{k^* i} \mid i = 1, \ldots, F\} \\ k^* = \arg\max_{k \in \{1, \ldots, K\}} p(x_i, z_{k,i} \mid \alpha, \beta) \end{cases} \qquad (4)$$
where $f_{k^*}$ is the feature set of the k-th behavior, F is the number of features in the feature set of the k-th behavior, and $x_{ki}$ is the feature of the word that is the i-th pixel of the k-th behavior.
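Continuing the sketch above, equation (4) amounts to assigning every word to the topic with the largest posterior weight and grouping the words by that assignment; the helper below is hypothetical and builds on the `lda` model and `counts` matrix from the previous sketch.

```python
import numpy as np

def behavior_feature_sets(lda, counts, words_per_doc, ids_per_doc):
    """For each frame (document), assign every word to its most probable
    behavior topic k* and collect per-behavior feature sets f_k, as in
    equation (4)."""
    # beta: topic-word probabilities, normalized per topic
    beta = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
    pi = lda.transform(counts)                  # per-document topic weights
    sets = []
    for j, (words, ids) in enumerate(zip(words_per_doc, ids_per_doc)):
        post = pi[j][:, None] * beta[:, ids]    # p(z=k | word), up to a constant
        k_star = post.argmax(axis=0)            # arg max over the K topics
        sets.append({k: words[k_star == k] for k in range(beta.shape[0])})
    return sets
```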
FIG. 4 shows the crowd behaviors in a scene, where the different behaviors are represented by optical-flow feature points (only some of the feature points are drawn in the image). Three crowd behaviors are present: the feature points in rectangular area 1 indicate upward movement, those in rectangular area 2 leftward movement, and those in rectangular area 3 downward movement.
Step 5: the obtained behavior feature sets are converted, and a statistical people count is obtained for each behavior using the converted feature sets.
The group behavior model yields the different crowd behaviors and the feature set of each behavior. Although a behavior feature set can itself describe the number of people exhibiting the behavior, its feature dimension is high, parameter training takes long, and the cumulative count cannot be obtained directly. Therefore, according to the method of the present invention, the behavior feature set of each frame is converted into a 5-dimensional feature vector, reducing the feature dimension. A time parameter is also attached: for each behavior feature set obtained with equation (4), a feature vector $NF = \{AS_G, SV_G, DV_G, DD_G, NP_G\}$ of dimension 5×G is formed, where the time parameter G is the number of frames over which the cumulative number of people of a specific behavior is counted. Specifically, the 5×G-dimensional feature vector may be obtained as follows (a consolidated code sketch of the five statistics appears after item (5) below):
(1) Average speed vector $AS_G$: $AS_G = \{AS_g, g = 1, \ldots, G\}$, where $AS_g$ is the average speed of the g-th frame image, obtained as shown in equation (5):
$$AS_g = \frac{1}{F} \sum_{i=1}^{F} \sqrt{v_{gi}^2 + u_{gi}^2} \qquad (5)$$
where $u_{gi}$ and $v_{gi}$ denote the x- and y-direction velocity components of the i-th feature in the g-th frame image.
(2) Speed variance vector $SV_G$: $SV_G = \{SV_g, g = 1, \ldots, G\}$, where $SV_g$ is the speed variance of the g-th frame image, which measures the complexity of the optical-flow speeds in each frame; $SV_g$ is obtained as shown in equation (6):
$$SV_g = \frac{1}{F} \sum_{i=1}^{F} \left( \sqrt{v_{gi}^2 + u_{gi}^2} - AS_g \right)^2 \qquad (6)$$
(3) Direction variance vector $DV_G$: $DV_G = \{DV_g, g = 1, \ldots, G\}$, where $DV_g$ is the direction variance of the g-th frame image, which measures the complexity of the optical-flow directions; $DV_g$ is obtained as shown in equation (7):
$$DV_g = \frac{1}{8} \sum_{i=1}^{8} \left( ND_{gi} - \overline{ND}_g \right)^2 \qquad (7)$$
The range 0°-360° is divided into 8 intervals, and the direction of each optical-flow feature in a behavior feature set is voted into its angular interval, yielding a direction histogram for the behavior. $ND_{gi}$ is the count of the i-th interval of the direction histogram, and $\overline{ND}_g$ is the mean of $\{ND_{gi}, i = 1, \ldots, 8\}$.
(4) Direction divergence vector $DD_G$: $DD_G = \{DD_g, g = 1, \ldots, G\}$, where $DD_g$ is the direction divergence of the g-th frame image, obtained as shown in equation (8):
$$\begin{cases} DD_g = \sum_{i=1}^{8} ND_{gi} \times \left| RD_g(i) \right| \\ RD_g(i) = \operatorname{mod}(i - MD_g, 8) - 8 \times \left[ \operatorname{mod}(i - MD_g, 8) \geq 4 \right] \end{cases} \qquad (8)$$
where $MD_g$ is the index of the dominant direction bin, $MD_g = \arg\max_i ND_{gi}$, $i = 1, \ldots, 8$.
(5) Total-pixel-count vector $NP_G$. Since the depth of field of a monitored scene is generally large, the projection of the scene onto the image plane suffers a pronounced perspective effect (the same object looks large near the camera and small far from it), so the contributions of different pixels on the image plane must be weighted. The ground is assumed planar and people perpendicular to the ground. As shown in FIG. 5, let the vanishing point $P_v$ have coordinates $(x_v, y_v)$ and the reference line be $y_r = H/2$; then the contribution factor of any pixel I(x, y) on the image plane is obtained as shown in equation (9):
$$S_C(x, y) = \left( \frac{y_r - y_v}{y - y_v} \right)^2 \qquad (9)$$
The weighted pixel count of the behavior in the g-th frame is then $NP_g = \frac{1}{F} \sum_{i=1}^{F} S_C(x_{gi}, y_{gi})$, and the total-pixel-count vector of the behavior is $NP_G = \{NP_g, g = 1, \ldots, G\}$.
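As a consolidated illustration of equations (5)-(9), the five per-frame statistics can be computed together as below. The sketch assumes the word arrays (x, y, u, v) produced earlier; y_v is the vanishing-point ordinate and y_r the reference line of FIG. 5, and the 1/F normalization of NP follows claim 10.

```python
import numpy as np

def frame_features(words, y_v, y_r):
    """Five per-frame statistics of one behavior feature set:
    AS (eq. 5), SV (eq. 6), DV (eq. 7), DD (eq. 8), NP (eq. 9 + claim 10).
    `words` holds rows (x, y, u, v)."""
    x, y, u, v = words.T
    speed = np.hypot(u, v)
    AS = speed.mean()                                # average speed, eq. (5)
    SV = ((speed - AS) ** 2).mean()                  # speed variance, eq. (6)

    # 8-bin direction histogram over 0..360 degrees
    ang = np.degrees(np.arctan2(v, u)) % 360.0
    ND, _ = np.histogram(ang, bins=8, range=(0.0, 360.0))
    DV = ((ND - ND.mean()) ** 2).mean()              # direction variance, eq. (7)

    MD = ND.argmax()                                 # index of the dominant bin
    i = np.arange(8)
    RD = np.mod(i - MD, 8) - 8 * (np.mod(i - MD, 8) >= 4)
    DD = (ND * np.abs(RD)).sum()                     # direction divergence, eq. (8)

    # Perspective-corrected pixel contributions, eq. (9), averaged per claim 10
    # (assumes all pixels lie below the vanishing point, y != y_v)
    SC = ((y_r - y_v) / (y - y_v)) ** 2
    NP = SC.mean()
    return np.array([AS, SV, DV, DD, NP])

def feature_vector(per_frame_sets, k, y_v, y_r):
    """Stack the statistics of behavior k over G frames into the 5*G vector NF."""
    return np.concatenate([frame_features(s[k], y_v, y_r) for s in per_frame_sets])
```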
After the 5×G-dimensional feature vectors are obtained, the numbers of people for the two behaviors of interest, entering and exiting, are manually annotated to train an artificial neural network model, and the trained network is used to count people entering and exiting; the counts may be obtained with well-known neural network methods. In the experiment, the total number of people in the park is obtained from the difference between the counts of people entering and exiting at the gate. FIG. 6(a) shows the people count of the entering/exiting behavior groups in one frame of the live video. The numbers of people entering and exiting from the start of counting up to the current moment are shown in red in the upper-right corner of the image: In: 157, Out: 39. Only some optical-flow feature points are drawn: feature points in elliptical area 1 represent exiting, feature points in elliptical area 2 represent entering, the arrows give the motion direction of the feature points, and the black frame is the people-counting area. FIG. 6(b) shows the change in the park's population (sampled every 2 minutes); the average accuracy of the park people counts is 92.35%.
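The patent specifies the counter only as an artificial neural network; as one plausible realization (an assumption, not the disclosed architecture), a small multilayer-perceptron regressor can map each 5×G vector to the manually annotated cumulative count:

```python
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def train_counter(NF_train, y_train):
    """Train a small MLP to regress the cumulative person count for one
    behavior class (e.g. 'in' or 'out') from 5*G-dimensional NF vectors.
    The hidden-layer size and solver settings are illustrative assumptions."""
    model = make_pipeline(StandardScaler(),
                          MLPRegressor(hidden_layer_sizes=(32,),
                                       max_iter=2000, random_state=0))
    model.fit(NF_train, y_train)
    return model

# Usage: train one counter per behavior, then take the difference of the
# 'in' and 'out' predictions for the park experiment described above.
# in_model, out_model = train_counter(NF_in, y_in), train_counter(NF_out, y_out)
```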
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only examples of the present invention and are not intended to limit the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A computer-implemented video surveillance method, comprising the steps of:
a) receiving video data captured by a camera;
b) establishing a group behavior model according to the received video data;
c) estimating parameters of the group behavior model to obtain various group behaviors in a scene;
d) obtaining behavior feature sets of different crowds by using the obtained group behavior model;
e) and converting the obtained behavior feature set, and obtaining a statistical population value for each group behavior by using the converted behavior feature set.
2. The method of claim 1, wherein step b) comprises: building a word-document model in which each moving pixel is treated as a word $w_i$ and the M frames of images of a video correspond to M documents, each document represented by the word set $W = \{w_i, i = 1, \ldots, N\}$, where $w_i = \{x_i, y_i, u_i, v_i\}$, N is the number of pixels in the video frame, x is the horizontal position of a pixel, y its vertical position, u its velocity in the horizontal direction, and v its velocity in the vertical direction.
3. The method of claim 1, wherein step c) comprises: estimating the parameters of the group behavior model using a maximum-likelihood expectation-maximization (EM) method.
4. The method of claim 2, wherein step c) comprises: detecting the behaviors present in the scene using the group behavior model and obtaining the feature set of each behavior according to the following formula:
$$\begin{cases} f_{k^*} = \{x_{k^* i} \mid i = 1, \ldots, F\} \\ k^* = \arg\max_{k \in \{1, \ldots, K\}} p(x_i, z_{k,i} \mid \alpha, \beta) \end{cases}$$
where $\alpha$ characterizes the relative strengths of the hidden topics in the document set, $\beta$ gives the probability distributions of all hidden topics, there are K behaviors in total, $f_{k^*}$ is the feature set of the k-th behavior, and F is the number of features in the feature set of the k-th behavior.
5. The method of claim 4, wherein step d) comprises: converting the obtained behavior feature set into a feature vector $NF = \{AS_G, SV_G, DV_G, DD_G, NP_G\}$ of dimension 5×G and training an artificial neural network to count people, where $AS_G$ is the average speed vector, $SV_G$ the speed variance vector, $DV_G$ the direction variance vector, $DD_G$ the direction divergence vector, and $NP_G$ the total-pixel-count vector.
6. The method of claim 5, wherein the average speed vector $AS_G$ is calculated as:
$AS_G = \{AS_g, g = 1, \ldots, G\}$
where $AS_g$ is the average speed of the g-th frame image, $AS_g = \frac{1}{F} \sum_{i=1}^{F} \sqrt{v_{gi}^2 + u_{gi}^2}$, and $u_{gi}$ and $v_{gi}$ denote the x- and y-direction velocity components of the i-th feature in the g-th frame image.
7. The method of claim 5, wherein the speed variance vector $SV_G$ is calculated as:
$SV_G = \{SV_g, g = 1, \ldots, G\}$
where $SV_g$ is the speed variance of the g-th frame image, $SV_g = \frac{1}{F} \sum_{i=1}^{F} \left( \sqrt{v_{gi}^2 + u_{gi}^2} - AS_g \right)^2$, and $u_{gi}$ and $v_{gi}$ denote the x- and y-direction velocity components of the i-th feature in the g-th frame image.
8. The method of claim 5, wherein the direction variance vector $DV_G$ is calculated as:
$DV_G = \{DV_g, g = 1, \ldots, G\}$
where $DV_g$ is the direction variance of the g-th frame image, $DV_g = \frac{1}{8} \sum_{i=1}^{8} \left( ND_{gi} - \overline{ND}_g \right)^2$, $ND_{gi}$ is the count of the i-th interval of the direction histogram, $\overline{ND}_g$ is the mean of $\{ND_{gi}, i = 1, \ldots, 8\}$, and $u_{gi}$ and $v_{gi}$ denote the x- and y-direction velocity components of the i-th feature in the g-th frame image.
9. The method of claim 5, wherein the direction divergence vector $DD_G$ is calculated as:
$DD_G = \{DD_g, g = 1, \ldots, G\}$
where $DD_g$ is the direction divergence of the g-th frame image,
$$\begin{cases} DD_g = \sum_{i=1}^{8} ND_{gi} \times \left| RD_g(i) \right| \\ RD_g(i) = \operatorname{mod}(i - MD_g, 8) - 8 \times \left[ \operatorname{mod}(i - MD_g, 8) \geq 4 \right] \end{cases}$$
$MD_g$ is the index of the dominant direction bin, $MD_g = \arg\max_i ND_{gi}$, $i = 1, \ldots, 8$, and $ND_{gi}$ is the count of the i-th interval of the direction histogram.
10. The method of claim 5, wherein the total-pixel-count vector $NP_G$ is calculated as:
$NP_G = \{NP_g, g = 1, \ldots, G\}$
where $NP_g$ is the perspective-weighted pixel count of the g-th frame image, $NP_g = \frac{1}{F} \sum_{i=1}^{F} S_C(x_{gi}, y_{gi})$.
CN201310746795.6A 2013-12-30 2013-12-30 Video monitoring method based on group behavior analysis driven by visual big data Expired - Fee Related CN103679215B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310746795.6A CN103679215B (en) 2013-12-30 2013-12-30 Video monitoring method based on group behavior analysis driven by visual big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310746795.6A CN103679215B (en) 2013-12-30 2013-12-30 Video monitoring method based on group behavior analysis driven by visual big data

Publications (2)

Publication Number Publication Date
CN103679215A true CN103679215A (en) 2014-03-26
CN103679215B CN103679215B (en) 2017-03-01

Family

ID=50316703

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310746795.6A Expired - Fee Related CN103679215B (en) 2013-12-30 2013-12-30 Video monitoring method based on group behavior analysis driven by visual big data

Country Status (1)

Country Link
CN (1) CN103679215B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104320617A (en) * 2014-10-20 2015-01-28 中国科学院自动化研究所 All-weather video monitoring method based on deep learning
CN105096344A (en) * 2015-08-18 2015-11-25 上海交通大学 A group behavior identification method and system based on CD motion features
CN105100683A (en) * 2014-05-04 2015-11-25 深圳市贝尔信智能系统有限公司 Video-based passenger flow statistics method, device and system
CN108573497A (en) * 2017-03-10 2018-09-25 北京日立北工大信息系统有限公司 Passenger flow statistic device and method
US10127597B2 (en) 2015-11-13 2018-11-13 International Business Machines Corporation System and method for identifying true customer on website and providing enhanced website experience
CN109063549A (en) * 2018-06-19 2018-12-21 中国科学院自动化研究所 High-resolution based on deep neural network is taken photo by plane video moving object detection method
CN110874878A (en) * 2018-08-09 2020-03-10 深圳云天励飞技术有限公司 Pedestrian analysis method, device, terminal and storage medium
CN112084925A (en) * 2020-09-03 2020-12-15 厦门利德集团有限公司 Intelligent electric power safety monitoring method and system
CN113012386A (en) * 2020-12-25 2021-06-22 贵州北斗空间信息技术有限公司 Security alarm multi-level linkage rapid pushing method
CN115856980A (en) * 2022-11-21 2023-03-28 中铁科学技术开发有限公司 Marshalling station operator monitoring method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101751553A (en) * 2008-12-03 2010-06-23 中国科学院自动化研究所 Method for analyzing and predicting large-scale crowd density
US20110243450A1 (en) * 2010-04-01 2011-10-06 Microsoft Corporation Material recognition from an image
CN102385705A (en) * 2010-09-02 2012-03-21 大猩猩科技股份有限公司 Abnormal behavior detection system and method by utilizing automatic multi-feature clustering method
CN102708573A (en) * 2012-02-28 2012-10-03 西安电子科技大学 Group movement mode detection method under complex scenes
US8406498B2 (en) * 1999-01-25 2013-03-26 Amnis Corporation Blood and cell analysis using an imaging flow cytometer
CN103258193A (en) * 2013-05-21 2013-08-21 西南科技大学 Group abnormal behavior identification method based on KOD energy feature

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8406498B2 (en) * 1999-01-25 2013-03-26 Amnis Corporation Blood and cell analysis using an imaging flow cytometer
CN101751553A (en) * 2008-12-03 2010-06-23 中国科学院自动化研究所 Method for analyzing and predicting large-scale crowd density
US20110243450A1 (en) * 2010-04-01 2011-10-06 Microsoft Corporation Material recognition from an image
CN102385705A (en) * 2010-09-02 2012-03-21 大猩猩科技股份有限公司 Abnormal behavior detection system and method by utilizing automatic multi-feature clustering method
CN102708573A (en) * 2012-02-28 2012-10-03 西安电子科技大学 Group movement mode detection method under complex scenes
CN103258193A (en) * 2013-05-21 2013-08-21 西南科技大学 Group abnormal behavior identification method based on KOD energy feature

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
茅耀斌 (Mao Yaobin): "Research on Group Motion Analysis in Video Surveillance", China Masters' Theses Full-text Database, Information Science and Technology Series *
邹友辉 (Zou Youhui): "Video Anomaly Event Detection Based on Statistical Graphical Models", China Masters' Theses Full-text Database, Information Science and Technology Series *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105100683A (en) * 2014-05-04 2015-11-25 深圳市贝尔信智能系统有限公司 Video-based passenger flow statistics method, device and system
CN104320617A (en) * 2014-10-20 2015-01-28 中国科学院自动化研究所 All-weather video monitoring method based on deep learning
CN104320617B (en) * 2014-10-20 2017-09-01 中国科学院自动化研究所 A kind of round-the-clock video frequency monitoring method based on deep learning
CN105096344A (en) * 2015-08-18 2015-11-25 上海交通大学 A group behavior identification method and system based on CD motion features
CN105096344B (en) * 2015-08-18 2018-05-04 上海交通大学 Group behavior recognition methods and system based on CD motion features
US10127597B2 (en) 2015-11-13 2018-11-13 International Business Machines Corporation System and method for identifying true customer on website and providing enhanced website experience
CN108573497A (en) * 2017-03-10 2018-09-25 北京日立北工大信息系统有限公司 Passenger flow statistic device and method
CN108573497B (en) * 2017-03-10 2020-08-21 北京日立北工大信息系统有限公司 Passenger flow statistical device and method
CN109063549A (en) * 2018-06-19 2018-12-21 中国科学院自动化研究所 High-resolution based on deep neural network is taken photo by plane video moving object detection method
CN109063549B (en) * 2018-06-19 2020-10-16 中国科学院自动化研究所 High-resolution aerial video moving target detection method based on deep neural network
CN110874878A (en) * 2018-08-09 2020-03-10 深圳云天励飞技术有限公司 Pedestrian analysis method, device, terminal and storage medium
CN112084925A (en) * 2020-09-03 2020-12-15 厦门利德集团有限公司 Intelligent electric power safety monitoring method and system
CN113012386A (en) * 2020-12-25 2021-06-22 贵州北斗空间信息技术有限公司 Security alarm multi-level linkage rapid pushing method
CN115856980A (en) * 2022-11-21 2023-03-28 中铁科学技术开发有限公司 Marshalling station operator monitoring method and system

Also Published As

Publication number Publication date
CN103679215B (en) 2017-03-01

Similar Documents

Publication Publication Date Title
CN103679215B (en) Video monitoring method based on group behavior analysis driven by visual big data
CN109819208B (en) Intensive population security monitoring management method based on artificial intelligence dynamic monitoring
CN106407946B (en) Cross-line counting method, deep neural network training method, device and electronic equipment
US8582816B2 (en) Method and apparatus for video analytics based object counting
CN104123544B (en) Anomaly detection method and system based on video analysis
US10963674B2 (en) Unsupervised learning of object recognition methods and systems
CN101464944B (en) Crowd density analysis method based on statistical characteristics
CN101577812B (en) Method and system for post monitoring
Mukherjee et al. A novel framework for automatic passenger counting
CN104320617B (en) A kind of round-the-clock video frequency monitoring method based on deep learning
CN105303191A (en) Method and apparatus for counting pedestrians in foresight monitoring scene
CN103810473B (en) A kind of target identification method of human object based on HMM
Cao et al. Abnormal crowd motion analysis
CN102156880A (en) Method for detecting abnormal crowd behavior based on improved social force model
CN109583373B (en) Pedestrian re-identification implementation method
CN110633643A (en) Abnormal behavior detection method and system for smart community
CN109117774B (en) Multi-view video anomaly detection method based on sparse coding
CN104820995A (en) Large public place-oriented people stream density monitoring and early warning method
CN107483894A (en) Judge to realize the high ferro station video monitoring system of passenger transportation management based on scene
CN113362374A (en) High-altitude parabolic detection method and system based on target tracking network
CN110020618A (en) Crowd abnormal behaviour monitoring method usable at multiple shooting angles
CN110084201A (en) A kind of human motion recognition method of convolutional neural networks based on specific objective tracking under monitoring scene
Lijun et al. Video-based crowd density estimation and prediction system for wide-area surveillance
CN102169538B (en) Background modeling method based on pixel confidence
CN115294519A (en) Abnormal event detection and early warning method based on lightweight network

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170301