CN112347879A - Theme mining and behavior analysis method for video moving target - Google Patents

Theme mining and behavior analysis method for video moving target

Info

Publication number
CN112347879A
Authority
CN
China
Prior art keywords
matrix
video frame
video
frame sequence
negative
Prior art date
Legal status
Granted
Application number
CN202011165718.8A
Other languages
Chinese (zh)
Other versions
CN112347879B (en)
Inventor
滕辉
龙飞
Current Assignee
Chinaso Information Technology Co ltd
Original Assignee
Chinaso Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Chinaso Information Technology Co., Ltd.
Priority to CN202011165718.8A
Publication of CN112347879A
Application granted
Publication of CN112347879B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/40 - Scenes; Scene-specific elements in video content
    • G06V 20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V 20/42 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/25 - Fusion techniques
    • G06F 18/253 - Fusion techniques of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/40 - Extraction of image or video features
    • G06V 10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/462 - Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V 10/464 - Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/40 - Extraction of image or video features
    • G06V 10/50 - Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis

Abstract

The invention relates to the field of image processing and discloses a topic mining and behavior analysis method for video moving targets, comprising the following steps: S1) obtaining a video frame sequence and extracting a feature matrix of the video frame sequence; S2) performing topic mining with the feature matrix to obtain a topic matrix; S3) performing behavior analysis on the video frame sequence with the topic matrix to obtain the behavior category of the video moving target. The method extracts the image regions with significant change in the video frames and builds the video representation from them, so the motion attributes of the targets in the video can be captured accurately. In addition, the invention adopts a manifold-regularized (graph-based) non-negative matrix factorization algorithm, optimizes the objective function with a weight matrix and a constraint condition, mines a topic summary, and expresses the temporal correlation between video frames more accurately. The invention also provides a behavior multi-classification model based on a two-stream convolutional network, so that behavior labels for the mined topics are obtained and the classification accuracy is improved.

Description

Theme mining and behavior analysis method for video moving target
Technical Field
The invention relates to the field of image processing, in particular to a theme mining and behavior analysis method for a video moving target.
Background
In recent years, with the rapid development of the internet, massive video data has gradually become an important medium through which people perceive the outside world, for example live-streaming platforms and surveillance streams. Because the scenes and content targets in a video are complex and the video duration is long, a video cannot provide an overview the way a text can, and it also contains a large amount of redundant, uninformative footage; people therefore cannot efficiently analyze a video source in a short time without spending considerable manpower and time. In general, the motion of the main objects in a video reflects the information of the video itself, so how to accurately mine the objects and motion attributes in a video and analyze their behavior is a problem that urgently needs to be solved. In the prior art, video summarization mainly relies on methods such as key frames, which capture color-gamut-space information in an image through a classification algorithm; however, such algorithms only obtain single images and cannot obtain moving objects and the corresponding behavior attributes. In addition, although video clips can be obtained by window sliding on the basis of key frames, it is difficult to guarantee that a clip contains the relevant information.
For example, Chinese patent publication CN108848422A discloses a video summary generation method based on target detection. In the training stage, it obtains and labels a picture set containing two or more target objects, builds a deep learning network, and trains the network with the training data set to obtain a trained deep learning network. In the use stage, a video is obtained and split into frames; the video frames are fed into the trained network, which outputs, for each video frame containing a target object, the feature vector of the target object, the corresponding position vector, and the original video frame. Finally, all feature vectors are clustered to obtain the video summary. Although that publication discloses a video summary generation method based on target detection, it only recognizes targets in the whole video and then clusters them to obtain a summary; it cannot accurately describe key summary information such as the motion behavior of the targets in a video.
Disclosure of Invention
The invention provides a topic mining and behavior analysis method for video moving targets to solve the above problems in the prior art.
A topic mining and behavior analysis method for a video moving target comprises the following steps:
s1) obtaining a video frame sequence, and extracting a characteristic matrix Y of the video frame sequence;
s2) performing theme mining by using the feature matrix Y to obtain a theme matrix w;
s3) performing behavior analysis on the video frame sequence by using the theme matrix w to obtain the behavior category of the video moving object.
Further, in step S1), acquiring a video frame sequence, and extracting a feature matrix Y of the video frame sequence, includes the following steps:
s11) obtaining a video frame sequence I (x, y, t) containing the motion of a video moving object, segmenting the video frame sequence I (x, y, t) into N video frame segments, wherein x and y respectively represent an x coordinate and a y coordinate in a space dimension, and t represents time;
S12) performing Gaussian convolution on the video frame sequence I(x, y, t) to obtain the Gaussian convolution result of the video frame sequence I(x, y, t), L(x, y, t; σ², τ²) = g(x, y, t; σ², τ²) * f(I(x, y, t)), where σ² and τ² are respectively the variances of the spatial dimension and the temporal dimension of the video frame sequence I(x, y, t), g(·) is the spatio-temporal Gaussian kernel, and f(·) is a mapping function that maps the video frame sequence I(x, y, t) to the corresponding pixels in the image sequence;
S13) calculating a three-dimensional spatio-temporal second moment matrix from the Gaussian convolution result of the video frame sequence I(x, y, t);
S14) obtaining the eigenvalues of the three-dimensional spatio-temporal second moment matrix, constructing a discriminant function related to the eigenvalues, obtaining all positive local maximum points of the discriminant function in time and space, taking the positive local maximum points as detected interest points so as to obtain all interest points of the video frame sequence, and taking the positions in the video frame sequence corresponding to the positive local maximum points as the positions of the detected interest points;
S15) extracting a feature joint descriptor for each interest point of the video frame sequence to obtain the feature joint descriptor set {z} = {z_1, z_2, …, z_v, …, z_N} of the video frame sequence, where z_v denotes the set of feature joint descriptors of the v-th video frame segment, its i-th element is the feature joint descriptor of the i-th interest point of the v-th video frame segment, M is the total number of interest points of the v-th video frame segment, i ∈ {1, 2, …, M}, and v ∈ {1, 2, …, N};
S16) clustering the feature joint descriptor set {z} of the video frame sequence with the K-means method to obtain the clustering result B = [b_1, b_2, …, b_K] of K cluster centers, where b_k denotes the feature vector of the k-th cluster center, K ∈ R+;
S17) computing the coding vector C_v of the v-th video frame segment from the clustering result B of the K cluster centers, where C_v is accumulated from intermediate coding vectors c_i and each intermediate coding vector c_i records the assignment of the feature joint descriptor of the i-th interest point to its nearest cluster center;
S18) normalizing the coding vector C_v of the v-th video frame segment to obtain the normalized coding vector of the v-th video frame segment;
S19) repeating steps S17) to S18) in turn to obtain the normalized coding vectors of all video frame segments, and constructing the feature matrix Y from the normalized coding vectors of all video frame segments, one column per segment, Y ∈ R^(K×N).
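The bag-of-words encoding of steps S16) to S19) can be sketched in Python roughly as follows. The one-hot nearest-center form assumed for the intermediate coding vectors c_i, the L2 normalization, and the example codebook size K = 200 are illustrative assumptions, since the patent gives the corresponding formulas only as images.

    # Sketch of steps S16)-S19): bag-of-words encoding of the HOG/HOF feature joint
    # descriptors into the K x N feature matrix Y. The one-hot nearest-center form
    # assumed for the intermediate coding vectors c_i and the L2 normalization are
    # assumptions made for illustration.
    import numpy as np
    from sklearn.cluster import KMeans

    def build_feature_matrix(segment_descriptors, K=200, seed=0):
        """segment_descriptors: list of N arrays, each (M_v, D), one per video frame segment."""
        all_desc = np.vstack(segment_descriptors)            # pool descriptors of all segments
        kmeans = KMeans(n_clusters=K, random_state=seed).fit(all_desc)
        B = kmeans.cluster_centers_                          # clustering result B = [b_1, ..., b_K]

        N = len(segment_descriptors)
        Y = np.zeros((K, N))
        for v, Z_v in enumerate(segment_descriptors):
            labels = kmeans.predict(Z_v)                     # nearest cluster center per interest point
            C_v = np.bincount(labels, minlength=K).astype(float)   # sum of one-hot vectors c_i
            Y[:, v] = C_v / (np.linalg.norm(C_v) + 1e-12)    # normalized coding vector of segment v
        return Y, B

Each column of the returned Y is the normalized coding vector of one video frame segment, so Y ∈ R^(K×N) as required in step S19).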
Further, in step S13), calculating the three-dimensional spatio-temporal second moment matrix from the Gaussian convolution result of the video frame sequence I(x, y, t) comprises the following steps:
S131) computing the partial derivatives of the Gaussian convolution result L(x, y, t; σ², τ²) to obtain the partial derivative L_x of the Gaussian convolution result with respect to the spatial x coordinate, the partial derivative L_y with respect to the spatial y coordinate, and the partial derivative L_t with respect to the time dimension t;
S132) computing the three-dimensional spatio-temporal second moment matrix μ from the partial derivatives L_x, L_y and L_t, the second moment matrix being the Gaussian-weighted matrix of products of the first-order partial derivatives,
μ = g(·; σ², τ²) * [ L_x²    L_xL_y   L_xL_t
                     L_xL_y  L_y²     L_yL_t
                     L_xL_t  L_yL_t   L_t²  ].
Further, in step S14), obtaining the eigenvalues of the three-dimensional spatio-temporal second moment matrix, constructing the discriminant function related to the eigenvalues, and obtaining all positive local maximum points of the discriminant function in time and space comprises the following steps:
S141) obtaining the three eigenvalues λ_1, λ_2 and λ_3 of the three-dimensional spatio-temporal second moment matrix μ;
S142) constructing the discriminant function R related to the three eigenvalues λ_1, λ_2 and λ_3 of the three-dimensional spatio-temporal second moment matrix μ, R = λ_1λ_2λ_3 - k(λ_1 + λ_2 + λ_3)^3, where k denotes an empirical coefficient;
S143) obtaining all positive local maximum points of the discriminant function R in time and space.
In step S142), k represents an empirical coefficient, and k is greater than or equal to 0.01 and less than or equal to 0.07.
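A minimal Python sketch of the interest point detection of steps S12) to S14) is given below, in the spirit of Laptev-style space-time interest points. The specific Gaussian scales sigma and tau, the integration-scale smoothing of the second moment matrix, and the 3x3x3 neighborhood used for the local maxima are assumptions; the patent itself states only that the empirical coefficient k lies between 0.01 and 0.07.

    # Sketch of steps S12)-S14): space-time interest point detection. The Gaussian
    # scales, the integration-scale smoothing of the second moment matrix and the
    # 3x3x3 neighborhood for the local maxima are illustrative assumptions.
    import numpy as np
    from scipy.ndimage import gaussian_filter, maximum_filter

    def detect_interest_points(volume, sigma=2.0, tau=1.5, k=0.04, s_int=2.0):
        """volume: (T, H, W) grayscale video frame sequence I(x, y, t)."""
        # Gaussian convolution result L(x, y, t; sigma^2, tau^2)
        L = gaussian_filter(volume.astype(float), sigma=(tau, sigma, sigma))
        Lt, Ly, Lx = np.gradient(L)                  # partial derivatives L_t, L_y, L_x

        def smooth(a):                               # integration-scale smoothing (assumed)
            return gaussian_filter(a, sigma=(s_int * tau, s_int * sigma, s_int * sigma))

        # Entries of the 3-D spatio-temporal second moment matrix mu
        xx, yy, tt = smooth(Lx * Lx), smooth(Ly * Ly), smooth(Lt * Lt)
        xy, xt, yt = smooth(Lx * Ly), smooth(Lx * Lt), smooth(Ly * Lt)

        # Discriminant function R = lambda1*lambda2*lambda3 - k*(lambda1+lambda2+lambda3)^3
        #                        = det(mu) - k * trace(mu)^3
        det = xx * (yy * tt - yt * yt) - xy * (xy * tt - yt * xt) + xt * (xy * yt - yy * xt)
        trace = xx + yy + tt
        R = det - k * trace ** 3

        # Interest points: positive local maxima of R in time and space
        local_max = (R == maximum_filter(R, size=(3, 3, 3))) & (R > 0)
        return np.argwhere(local_max)                # (t, y, x) positions of interest points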
Further, in step S15), the method for extracting feature joint descriptors for all the points of interest of the video frame sequence includes the following steps:
S151) acquiring the cuboid region near the j-th interest point of the video frame sequence, denoting the cuboid region near the j-th interest point as (Δx, Δy, Δt)_j, and computing for the cuboid region (Δx, Δy, Δt)_j a normalized histogram of oriented gradients (HOG) descriptor and an optical flow histogram (HOF) descriptor, where j ∈ {1, 2, …, d} and d is the total number of interest points of the video frame sequence;
S152) concatenating the histogram of oriented gradients descriptor and the optical flow histogram descriptor to obtain the HOG/HOF joint descriptor of the j-th interest point, and taking the HOG/HOF joint descriptor of the j-th interest point as the feature joint descriptor of the j-th interest point of the video frame sequence;
s153) repeating steps S151) to S152) in turn, obtaining feature joint descriptors of all interest points of the video frame sequence.
The method extracts the image regions with significant change in the video frames (namely the cuboid regions near the interest points) and uses the histogram of oriented gradients descriptor and the optical flow histogram descriptor to construct the video feature expression, so the motion attributes of the targets in the video can be captured accurately.
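A hedged sketch of the HOG/HOF joint descriptor of steps S151) and S152) follows. The cuboid size, the number of orientation bins, and the use of Farneback optical flow are illustrative assumptions; the patent only specifies that a normalized histogram of oriented gradients descriptor and an optical flow histogram descriptor are computed over the cuboid region and concatenated.

    # Sketch of steps S151)-S152): HOG/HOF joint descriptor of one interest point.
    # Cuboid size, bin counts and the Farneback optical flow are assumptions.
    import numpy as np
    import cv2

    def hog_hof_descriptor(volume, t, y, x, half=(2, 16, 16), bins=8):
        """volume: (T, H, W) grayscale frames; (t, y, x): interest point position."""
        dt, dy, dx = half
        cube = volume[max(t-dt, 0):t+dt+1, max(y-dy, 0):y+dy+1, max(x-dx, 0):x+dx+1].astype(np.float32)

        def oriented_hist(fx, fy):
            mag, ang = np.sqrt(fx**2 + fy**2), np.arctan2(fy, fx)
            h, _ = np.histogram(ang, bins=bins, range=(-np.pi, np.pi), weights=mag)
            return h

        hog = np.zeros(bins)
        hof = np.zeros(bins)
        for i in range(cube.shape[0]):
            gy, gx = np.gradient(cube[i])            # spatial image gradients -> HOG
            hog += oriented_hist(gx, gy)
            if i + 1 < cube.shape[0]:                # Farneback optical flow -> HOF
                flow = cv2.calcOpticalFlowFarneback(cube[i].astype(np.uint8),
                                                    cube[i+1].astype(np.uint8),
                                                    None, 0.5, 3, 15, 3, 5, 1.2, 0)
                hof += oriented_hist(flow[..., 0], flow[..., 1])

        hog /= (np.linalg.norm(hog) + 1e-12)         # normalized HOG descriptor
        hof /= (np.linalg.norm(hof) + 1e-12)         # normalized HOF descriptor
        return np.concatenate([hog, hof])            # HOG/HOF joint descriptor (concatenation)

The resulting joint descriptors of all interest points are then pooled and clustered by the K-means step S16) above.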
Further, in step S2), performing topic mining by using the feature matrix Y to obtain a topic matrix w, including the following steps:
S21) establishing an N-dimensional edge weight matrix P_W that connects video frame segments whose indices are close to each other (within a window controlled by a positive integer p, p < N), where m and n are respectively the row and column indices of the weight matrix P_W, m ∈ {1, 2, …, N}, n ∈ {1, 2, …, N};
S22) constructing, from the N-dimensional edge weight matrix P_W, an N-dimensional diagonal matrix P_D whose element value in the m-th row of the main diagonal is the sum of all element values in the m-th row of the N-dimensional edge weight matrix P_W;
s23) decomposing the coding matrix Y into a first non-negative matrix W and a second non-negative matrix H by using a non-negative matrix decomposition method, wherein Y is approximately equal to WH, updating the first non-negative matrix W and the second non-negative matrix H by using an iteration rule to obtain a first non-negative matrix W after updating is finished, and taking the first non-negative matrix W after updating as a subject matrix W.
Further, in step S23), decomposing the encoding matrix Y into a first non-negative matrix W and a second non-negative matrix H by using a non-negative matrix decomposition method, where Y ≈ WH, updating the first non-negative matrix W and the second non-negative matrix H by using an iteration rule to obtain a first non-negative matrix W after the updating is completed, and taking the first non-negative matrix W after the updating as a subject matrix W, including the following steps:
S231) randomly initializing a K×r random matrix and taking it as the first non-negative matrix W, and randomly initializing an r×N random matrix and taking it as the second non-negative matrix H, where every element of the K×r and r×N random matrices is a random number between 0 and 1 and r is the preset number of topics;
S232) updating the first non-negative matrix W and the second non-negative matrix H respectively by using the iteration rule to obtain the updated first non-negative matrix W and the updated second non-negative matrix H, where the iteration rule involves the edge weight matrix P_W, the diagonal matrix P_D and a constraint coefficient β, β ∈ [0, 1], and the subscripts e and q denote matrix row and column sequence numbers respectively;
S233) computing the optimization function with the updated first non-negative matrix W and the updated second non-negative matrix H, the optimization function combining the reconstruction error between Y and WH with a constraint on adjacent column vectors of H, where H_(e+1) denotes the (e+1)-th column vector of the updated second non-negative matrix H and H_e denotes the e-th column vector of the updated second non-negative matrix H;
s234) repeating the steps S232) to S233) in sequence until the optimization function converges to a minimum value, ending the iteration, obtaining the updated first non-negative matrix W, and taking the updated first non-negative matrix W as the theme matrix W.
The topic mining algorithm adopts a manifold-regularized (graph-based) non-negative matrix factorization, optimizes the objective function with the weight matrix and the constraint condition, and mines the topic summary, so the temporal correlation between video frames is expressed more accurately.
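The sketch below illustrates step S2) with a standard manifold (graph) regularized non-negative matrix factorization. The sliding-window form of the edge weight matrix P_W and the multiplicative update rules follow the common graph-regularized NMF formulation and are assumptions, because the patent's own iteration rule and optimization function appear only as equation images.

    # Sketch of step S2): topic mining by graph-regularized NMF (assumed form).
    import numpy as np

    def topic_mining(Y, r, p=2, beta=0.5, iters=500, eps=1e-9, seed=0):
        """Y: (K, N) feature matrix; r: preset number of topics; beta: constraint coefficient."""
        K, N = Y.shape
        # Assumed N-dimensional edge weight matrix P_W: segments whose indices differ
        # by at most p are connected; P_D holds the row sums on its main diagonal.
        idx = np.arange(N)
        P_W = (np.abs(idx[:, None] - idx[None, :]) <= p).astype(float) - np.eye(N)
        P_D = np.diag(P_W.sum(axis=1))

        rng = np.random.default_rng(seed)
        W = rng.random((K, r))                       # first non-negative matrix
        H = rng.random((r, N))                       # second non-negative matrix

        for _ in range(iters):
            W *= (Y @ H.T) / (W @ H @ H.T + eps)                                   # update W
            H *= (W.T @ Y + beta * H @ P_W) / (W.T @ W @ H + beta * H @ P_D + eps)  # update H
            loss = np.linalg.norm(Y - W @ H) ** 2 \
                 + beta * np.trace(H @ (P_D - P_W) @ H.T)    # reconstruction + graph penalty
        return W, H, loss

The columns of the returned W play the role of the topic matrix w, and the graph penalty term encourages temporally adjacent video frame segments to receive similar topic weights.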
Further, in step S3), performing behavior analysis on the video frame sequence by using the topic matrix w to obtain a behavior category of the video moving object, including the following steps:
S31) obtaining, from the topic matrix w, the index e* of the corresponding video frame segment in the video frame sequence I, and denoting the video frame segment corresponding to the index e* as I(e*), where y_q is the q-th column vector in the feature matrix Y, q ∈ [1, N], and w_e is the e-th column vector of the topic matrix w;
s32) recording the number of the moving target types as T, and acquiring a trained target recognition network model M1And a trained scene recognition network model M2
S33) setting the number of behavior types of each moving target to M, acquiring T trained multi-classification deep learning classification network models, and denoting the T trained multi-classification deep learning classification network models as a network model set {L};
S34) using the target recognition network model M_1 and the scene recognition network model M_2 to recognize the video frame segment I(e*), obtaining respectively a target recognition result vector and a scene recognition result vector;
S35) obtaining, from the network model set {L}, the multi-classification deep learning classification network model L_index corresponding to the video frame segment I(e*);
S36) performing behavior recognition on the video frame segment I(e*) with its corresponding multi-classification deep learning classification network model L_index, to obtain the behavior category of the video moving target.
In step S31), the index e* of the corresponding video frame segment in the video frame sequence I is obtained as the value of e that maximizes the score computed from the topic matrix w and the feature matrix Y, so that the e*-th video frame segment I(e*) corresponding to the video frame segment index e* is obtained. In step S35), the maximum element value of the target recognition result vector is found; the position index of that maximum element corresponds to the index-th multi-classification deep learning classification network model among the T trained multi-classification deep learning classification network models, and that model is taken as the multi-classification deep learning classification network model L_index corresponding to the video frame segment I(e*).
Further, in step S33), the multi-classification deep learning classification network includes five convolutional layers and three pooling layers.
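One possible layout of such a multi-classification deep learning classification network, written in PyTorch, is sketched below. The channel widths, kernel sizes and the classification head are assumptions; the patent only fixes the counts of five convolutional layers and three pooling layers.

    # Sketch of one behavior classification network from the set {L}: five
    # convolutional layers and three pooling layers; widths are assumptions.
    import torch
    import torch.nn as nn

    class BehaviorClassifier(nn.Module):
        def __init__(self, num_behaviors):           # M behavior classes for one target type
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),                      # pooling layer 1
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),                      # pooling layer 2
                nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
                nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),                      # pooling layer 3
                nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),  # fifth convolutional layer
            )
            self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                      nn.Linear(256, num_behaviors))

        def forward(self, x):                         # x: (batch, 3, H, W) frames from I(e*)
            return self.head(self.features(x))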
Further, the trained target recognition network model M_1 and the trained scene recognition network model M_2 each adopt the ResNet50 network model.
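The routing logic of steps S31) to S36) might look as follows. The scoring rule used to pick the topic segment, the averaging of per-frame predictions, and the use of the target recognition result vector alone to select the index-th behavior network are assumptions (the patent does not state how the scene recognition result vector is fused); model_target, model_scene and behavior_models stand for the trained ResNet50-based models M_1 and M_2 and the network model set {L}.

    # Sketch of step S3): selecting the topic video frame segment I(e*) and routing
    # it to the index-th behavior classification network. Scoring and fusion details
    # are assumptions, as noted above.
    import numpy as np
    import torch

    def analyze_behavior(Y, w, segments, model_target, model_scene, behavior_models):
        """Y: (K, N) feature matrix; w: (K, r) topic matrix; segments: list of N (frames, 3, H, W) tensors."""
        scores = w.T @ Y                             # (r, N): score of each topic for each segment
        e_star = int(np.argmax(scores.max(axis=0)))  # index e* of the topic video frame segment
        clip = segments[e_star]                      # I(e*)

        with torch.no_grad():
            v_target = model_target(clip).softmax(dim=1).mean(dim=0)  # target recognition result vector
            v_scene = model_scene(clip).softmax(dim=1).mean(dim=0)    # scene recognition result vector
            index = int(v_target.argmax())           # choose the index-th behavior network from {L}
            behavior_logits = behavior_models[index](clip).mean(dim=0)
        return e_star, index, int(behavior_logits.argmax())   # behavior category of the moving target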
The invention has the beneficial effects that: the invention designs a topic mining and behavior analysis method for video moving targets, which first divides the video frame sequence evenly into several video frame segments, extracts the spatio-temporal interest points in the video frame sequence, accurately captures the moving-target frames contained in the video, and constructs the feature expression. In addition, the invention obtains the topic matrix of the video with a manifold-regularized non-negative matrix factorization algorithm, and the weight matrix and constraint coefficient make the topic mining result more accurate. Finally, a two-stream convolutional neural network is adopted: target recognition is performed on the topic frames, and the corresponding behavior classification network is then selected to obtain the behavior label of the target, so the motion attributes of the targets in the video are captured accurately, the temporal correlation between video frames is expressed more accurately, and the classification accuracy is improved.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a topic mining and behavior analysis method for a video moving object according to this embodiment.
Fig. 2 is a schematic flow chart of obtaining the theme matrix w according to the first embodiment.
Fig. 3 is a schematic flow chart of obtaining a behavior category of a video moving object by using a topic matrix w according to the first embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.
In a first embodiment, a topic mining and behavior analysis method for a video moving object, as shown in fig. 1, includes the following steps:
s1) obtaining the video frame sequence, and extracting the characteristic matrix Y of the video frame sequence, comprising the following steps:
s11) obtaining a video frame sequence I (x, y, t) containing the motion of a video moving object, segmenting the video frame sequence I (x, y, t) into N video frame segments, wherein x and y respectively represent an x coordinate and a y coordinate in a space dimension, and t represents time;
S12) performing Gaussian convolution on the video frame sequence I(x, y, t) to obtain the Gaussian convolution result of the video frame sequence I(x, y, t), L(x, y, t; σ², τ²) = g(x, y, t; σ², τ²) * f(I(x, y, t)), where σ² and τ² are respectively the variances of the spatial dimension and the temporal dimension of the video frame sequence I(x, y, t), g(·) is the spatio-temporal Gaussian kernel, and f(·) is a mapping function that maps the video frame sequence I(x, y, t) to the corresponding pixels in the image sequence;
S13) calculating the three-dimensional spatio-temporal second moment matrix from the Gaussian convolution result of the video frame sequence I(x, y, t), comprising the following steps:
S131) computing the partial derivatives of the Gaussian convolution result L(x, y, t; σ², τ²) to obtain the partial derivative L_x of the Gaussian convolution result with respect to the spatial x coordinate, the partial derivative L_y with respect to the spatial y coordinate, and the partial derivative L_t with respect to the time dimension t;
S132) computing the three-dimensional spatio-temporal second moment matrix μ from the partial derivatives L_x, L_y and L_t, the second moment matrix being the Gaussian-weighted matrix of products of the first-order partial derivatives,
μ = g(·; σ², τ²) * [ L_x²    L_xL_y   L_xL_t
                     L_xL_y  L_y²     L_yL_t
                     L_xL_t  L_yL_t   L_t²  ].
S14) obtaining the eigenvalues of the three-dimensional spatio-temporal second moment matrix, constructing a discriminant function related to the eigenvalues, obtaining all positive local maximum points of the discriminant function in time and space, taking the positive local maximum points as detected interest points so as to obtain all interest points of the video frame sequence, and taking the positions in the video frame sequence corresponding to the positive local maximum points as the positions of the detected interest points.
In step S14), obtaining the eigenvalues of the three-dimensional spatio-temporal second moment matrix, constructing the discriminant function related to the eigenvalues, and obtaining all positive local maximum points of the discriminant function in time and space comprises the following steps:
S141) obtaining the three eigenvalues λ_1, λ_2 and λ_3 of the three-dimensional spatio-temporal second moment matrix μ;
S142) constructing the discriminant function R related to the three eigenvalues λ_1, λ_2 and λ_3 of the three-dimensional spatio-temporal second moment matrix μ, R = λ_1λ_2λ_3 - k(λ_1 + λ_2 + λ_3)^3, where k denotes an empirical coefficient, 0.01 ≤ k ≤ 0.07;
S143) obtaining all positive local maximum points of the discriminant function R in time and space.
S15) extracting a feature joint descriptor for each interest point of the video frame sequence to obtain the feature joint descriptor set {z} = {z_1, z_2, …, z_v, …, z_N} of the video frame sequence, where z_v denotes the set of feature joint descriptors of the v-th video frame segment, its i-th element is the feature joint descriptor of the i-th interest point of the v-th video frame segment, M is the total number of interest points of the v-th video frame segment, i ∈ {1, 2, …, M}, and v ∈ {1, 2, …, N}.
In step S15), extracting the feature joint descriptors for all interest points of the video frame sequence comprises the following steps:
S151) acquiring the cuboid region near the j-th interest point of the video frame sequence, denoting the cuboid region near the j-th interest point as (Δx, Δy, Δt)_j, and computing for the cuboid region (Δx, Δy, Δt)_j a normalized histogram of oriented gradients (HOG) descriptor and an optical flow histogram (HOF) descriptor, where j ∈ {1, 2, …, d} and d is the total number of interest points of the video frame sequence;
S152) concatenating the histogram of oriented gradients descriptor and the optical flow histogram descriptor to obtain the HOG/HOF joint descriptor of the j-th interest point, and taking the HOG/HOF joint descriptor of the j-th interest point as the feature joint descriptor of the j-th interest point of the video frame sequence;
s153) repeating steps S151) to S152) in turn, obtaining feature joint descriptors of all interest points of the video frame sequence.
The method extracts the image regions with significant change in the video frames (namely the cuboid regions near the interest points) and uses the histogram of oriented gradients descriptor and the optical flow histogram descriptor to construct the video feature expression, so the motion attributes of the targets in the video can be captured accurately.
S16) clustering the feature joint descriptor set {z} of the video frame sequence with the K-means method to obtain the clustering result B = [b_1, b_2, …, b_K] of K cluster centers, where b_k denotes the feature vector of the k-th cluster center, K ∈ R+.
S17) computing the coding vector C_v of the v-th video frame segment from the clustering result B of the K cluster centers, where C_v is accumulated from intermediate coding vectors c_i and each intermediate coding vector c_i records the assignment of the feature joint descriptor of the i-th interest point to its nearest cluster center.
S18) normalizing the coding vector C_v of the v-th video frame segment to obtain the normalized coding vector of the v-th video frame segment.
S19) repeating steps S17) to S18) in turn to obtain the normalized coding vectors of all video frame segments, and constructing the feature matrix Y from the normalized coding vectors of all video frame segments, one column per segment, Y ∈ R^(K×N).
S2) performing topic mining using the feature matrix Y to obtain a topic matrix w, as shown in fig. 2, including the following steps:
S21) establishing an N-dimensional edge weight matrix P_W that connects video frame segments whose indices are close to each other (within a window controlled by a positive integer p, p < N), where m and n are respectively the row and column indices of the weight matrix P_W, m ∈ {1, 2, …, N}, n ∈ {1, 2, …, N};
S22) constructing, from the N-dimensional edge weight matrix P_W, an N-dimensional diagonal matrix P_D whose element value in the m-th row of the main diagonal is the sum of all element values in the m-th row of the N-dimensional edge weight matrix P_W;
s23) decomposing the coding matrix Y into a first non-negative matrix W and a second non-negative matrix H by using a non-negative matrix decomposition method, wherein Y is approximately equal to WH, updating the first non-negative matrix W and the second non-negative matrix H by using an iteration rule to obtain a first non-negative matrix W after the updating is finished, and taking the first non-negative matrix W after the updating is finished as a subject matrix W, and the method comprises the following steps:
S231) randomly initializing a K×r random matrix and taking it as the first non-negative matrix W, and randomly initializing an r×N random matrix and taking it as the second non-negative matrix H, where every element of the K×r and r×N random matrices is a random number between 0 and 1 and r is the preset number of topics;
S232) updating the first non-negative matrix W and the second non-negative matrix H respectively by using the iteration rule to obtain the updated first non-negative matrix W and the updated second non-negative matrix H, where the iteration rule involves the edge weight matrix P_W, the diagonal matrix P_D and a constraint coefficient β, β ∈ [0, 1], and the subscripts e and q denote matrix row and column sequence numbers respectively;
S233) computing the optimization function with the updated first non-negative matrix W and the updated second non-negative matrix H, the optimization function combining the reconstruction error between Y and WH with a constraint on adjacent column vectors of H, where H_(e+1) denotes the (e+1)-th column vector of the updated second non-negative matrix H and H_e denotes the e-th column vector of the updated second non-negative matrix H;
s234) repeating the steps S232) to S233) in sequence until the optimization function converges to a minimum value, ending the iteration, obtaining the updated first non-negative matrix W, and taking the updated first non-negative matrix W as the theme matrix W.
The topic mining algorithm adopts a manifold-regularized (graph-based) non-negative matrix factorization, optimizes the objective function with the weight matrix and the constraint condition, and mines the topic summary, so the temporal correlation between video frames is expressed more accurately.
S3) performing behavior analysis on the video frame sequence by using the topic matrix w to obtain a behavior category of the video moving object, as shown in fig. 3, including the following steps:
S31) obtaining, from the topic matrix w, the index e* of the corresponding video frame segment in the video frame sequence I, and denoting the video frame segment corresponding to the index e* as I(e*), where y_q is the q-th column vector in the feature matrix Y, q ∈ [1, N], and w_e is the e-th column vector of the topic matrix w.
In step S31), the index e* of the corresponding video frame segment in the video frame sequence I is obtained as the value of e that maximizes the score computed from the topic matrix w and the feature matrix Y, so that the e*-th video frame segment I(e*) corresponding to the video frame segment index e* is obtained. In step S35), the maximum element value of the target recognition result vector is found; the position index of that maximum element corresponds to the index-th multi-classification deep learning classification network model among the T trained multi-classification deep learning classification network models, and that model is taken as the multi-classification deep learning classification network model L_index corresponding to the video frame segment I(e*).
S32) denoting the number of moving target types as T, and acquiring a trained target recognition network model M_1 and a trained scene recognition network model M_2; the trained target recognition network model M_1 and the trained scene recognition network model M_2 each adopt the ResNet50 network model.
S33) setting the number of behavior types of each moving target to M, acquiring T trained multi-classification deep learning classification network models, and denoting the T trained multi-classification deep learning classification network models as a network model set {L}, where each multi-classification deep learning classification network comprises five convolutional layers and three pooling layers.
S34) using the target recognition network model M_1 and the scene recognition network model M_2 to recognize the video frame segment I(e*), obtaining respectively a target recognition result vector and a scene recognition result vector;
S35) obtaining, from the network model set {L}, the multi-classification deep learning classification network model L_index corresponding to the video frame segment I(e*);
S36) performing behavior recognition on the video frame segment I(e*) with its corresponding multi-classification deep learning classification network model L_index, to obtain the behavior category of the video moving target.
The invention designs a topic mining and behavior analysis method for video moving targets, which first divides the video frame sequence evenly into several video frame segments, extracts the spatio-temporal interest points in the video frame sequence, accurately captures the moving-target frames contained in the video, and constructs the feature expression. In addition, the invention obtains the topic matrix of the video with a manifold-regularized non-negative matrix factorization algorithm, and the weight matrix and constraint coefficient make the topic mining result more accurate. Finally, the invention adopts a two-stream convolutional neural network to perform target recognition on the topic frames and then selects the corresponding behavior classification network to obtain the behavior label of the target.
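Chaining the sketches above, the end-to-end flow of the first embodiment could be exercised as follows. All function names (detect_interest_points, hog_hof_descriptor, build_feature_matrix, topic_mining, analyze_behavior) refer to the illustrative sketches given earlier in this description, not to code disclosed by the patent, and the segment count N, topic count r and codebook size K are arbitrary example values.

    # Illustrative end-to-end flow of the first embodiment; the helper functions
    # are the sketches given earlier in this description (hypothetical names).
    import numpy as np

    def run_pipeline(volume, clips, model_target, model_scene, behavior_models,
                     N=20, r=5, K=200):
        """volume: (T, H, W) grayscale frames; clips: list of N (frames, 3, H, W) tensors
        covering the same N video frame segments."""
        T = volume.shape[0]
        bounds = np.linspace(0, T, N + 1, dtype=int)         # split I(x, y, t) into N segments

        segment_descriptors = []
        for v in range(N):
            seg = volume[bounds[v]:bounds[v + 1]]
            points = detect_interest_points(seg)              # steps S12)-S14)
            descs = [hog_hof_descriptor(seg, t, y, x) for t, y, x in points]   # step S15)
            segment_descriptors.append(np.vstack(descs) if descs else np.zeros((1, 16)))

        Y, _codebook = build_feature_matrix(segment_descriptors, K=K)   # steps S16)-S19)
        W, H, _loss = topic_mining(Y, r=r)                               # step S2)
        return analyze_behavior(Y, W, clips, model_target,               # step S3)
                                model_scene, behavior_models)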
By adopting the technical scheme disclosed by the invention, the following beneficial effects are obtained:
the method extracts the image areas with obvious changes in the video frame, further constructs the video expression, and can accurately capture the motion attribute of the target in the video.
The topic mining algorithm adopts a manifold-regularized (graph-based) non-negative matrix factorization, optimizes the objective function with the weight matrix and the constraint condition, mines the topic summary, and expresses the temporal correlation between video frames more accurately.
The invention provides a behavior multi-classification model based on a two-stream convolutional network, which obtains behavior labels for the mined topics and improves the classification accuracy.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that it will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the principle of the present invention, and such modifications and improvements should also be considered within the scope of the present invention.

Claims (10)

1. A topic mining and behavior analysis method for a video moving target is characterized by comprising the following steps:
s1) obtaining a video frame sequence, and extracting a characteristic matrix Y of the video frame sequence;
s2) performing theme mining by using the feature matrix Y to obtain a theme matrix w;
s3) performing behavior analysis on the video frame sequence by using the theme matrix w to obtain the behavior category of the video moving object.
2. The topic mining and behavior analysis method for video moving objects according to claim 1, wherein the step S1) of obtaining a video frame sequence and extracting a feature matrix Y of the video frame sequence comprises the following steps:
s11) obtaining a video frame sequence I (x, y, t) containing the motion of a video moving object, segmenting the video frame sequence I (x, y, t) into N video frame segments, wherein x and y respectively represent an x coordinate and a y coordinate in a space dimension, and t represents time;
S12) performing Gaussian convolution on the video frame sequence I(x, y, t) to obtain the Gaussian convolution result of the video frame sequence I(x, y, t), L(x, y, t; σ², τ²) = g(x, y, t; σ², τ²) * f(I(x, y, t)), where σ² and τ² are respectively the variances of the spatial dimension and the temporal dimension of the video frame sequence I(x, y, t), g(·) is the spatio-temporal Gaussian kernel, and f(·) is a mapping function that maps the video frame sequence I(x, y, t) to the corresponding pixels in the image sequence;
S13) calculating a three-dimensional spatio-temporal second moment matrix from the Gaussian convolution result of the video frame sequence I(x, y, t);
S14) obtaining the eigenvalues of the three-dimensional spatio-temporal second moment matrix, constructing a discriminant function related to the eigenvalues, obtaining all positive local maximum points of the discriminant function in time and space, and taking the positive local maximum points as detected interest points so as to obtain all interest points of the video frame sequence, where the positions in the video frame sequence corresponding to the positive local maximum points are the positions of the detected interest points;
S15) extracting a feature joint descriptor for each interest point of the video frame sequence to obtain the feature joint descriptor set {z} = {z_1, z_2, …, z_v, …, z_N} of the video frame sequence, where z_v denotes the set of feature joint descriptors of the v-th video frame segment, its i-th element is the feature joint descriptor of the i-th interest point of the v-th video frame segment, M is the total number of interest points of the v-th video frame segment, i ∈ {1, 2, …, M}, and v ∈ {1, 2, …, N};
S16) clustering the feature joint descriptor set {z} of the video frame sequence with the K-means method to obtain the clustering result B = [b_1, b_2, …, b_K] of K cluster centers, where b_k denotes the feature vector of the k-th cluster center, K ∈ R+;
S17) computing the coding vector C_v of the v-th video frame segment from the clustering result B of the K cluster centers, where C_v is accumulated from intermediate coding vectors c_i and each intermediate coding vector c_i records the assignment of the feature joint descriptor of the i-th interest point to its nearest cluster center;
S18) normalizing the coding vector C_v of the v-th video frame segment to obtain the normalized coding vector of the v-th video frame segment;
S19) repeating steps S17) to S18) in turn to obtain the normalized coding vectors of all video frame segments, and constructing the feature matrix Y from the normalized coding vectors of all video frame segments, one column per segment, Y ∈ R^(K×N).
3. The topic mining and behavior analysis method for video moving objects according to claim 2, wherein in step S13), calculating the three-dimensional spatio-temporal second moment matrix from the Gaussian convolution result of the video frame sequence I(x, y, t) comprises the following steps:
S131) computing the partial derivatives of the Gaussian convolution result L(x, y, t; σ², τ²) to obtain the partial derivative L_x of the Gaussian convolution result with respect to the spatial x coordinate, the partial derivative L_y with respect to the spatial y coordinate, and the partial derivative L_t with respect to the time dimension t;
S132) computing the three-dimensional spatio-temporal second moment matrix μ from the partial derivatives L_x, L_y and L_t, the second moment matrix being the Gaussian-weighted matrix of products of the first-order partial derivatives,
μ = g(·; σ², τ²) * [ L_x²    L_xL_y   L_xL_t
                     L_xL_y  L_y²     L_yL_t
                     L_xL_t  L_yL_t   L_t²  ].
4. The method for topic mining and behavior analysis of video moving objects according to claim 3, wherein in step S14), the eigenvalues of the three-dimensional spatio-temporal second moment matrix are obtained, the discriminant function related to the eigenvalues is constructed, and all positive local maximum points of the discriminant function in time and space are obtained, comprising the following steps:
S141) obtaining the three eigenvalues λ_1, λ_2 and λ_3 of the three-dimensional spatio-temporal second moment matrix μ;
S142) constructing the discriminant function R related to the three eigenvalues λ_1, λ_2 and λ_3 of the three-dimensional spatio-temporal second moment matrix μ, the discriminant function being R = λ_1λ_2λ_3 - k(λ_1 + λ_2 + λ_3)^3, where k denotes an empirical coefficient;
S143) obtaining all positive local maximum points of the discriminant function R in time and space.
5. The topic mining and behavior analysis method for video moving objects according to claim 2 or 4, wherein in step S15), extracting feature joint descriptors for all interest points of the video frame sequence respectively comprises the following steps:
S151) acquiring the cuboid region near the j-th interest point of the video frame sequence, denoting the cuboid region near the j-th interest point as (Δx, Δy, Δt)_j, and computing for the cuboid region (Δx, Δy, Δt)_j a normalized histogram of oriented gradients (HOG) descriptor and an optical flow histogram (HOF) descriptor, where j ∈ {1, 2, …, d} and d is the total number of interest points of the video frame sequence;
S152) concatenating the histogram of oriented gradients descriptor and the optical flow histogram descriptor to obtain the HOG/HOF joint descriptor of the j-th interest point, and taking the HOG/HOF joint descriptor of the j-th interest point as the feature joint descriptor of the j-th interest point of the video frame sequence;
s153) repeating steps S151) to S152) in turn, obtaining feature joint descriptors of all interest points of the video frame sequence.
6. The method for topic mining and behavior analysis of a video moving object according to claim 1, wherein in step S2), topic mining is performed by using the feature matrix Y to obtain a topic matrix w, comprising the following steps:
S21) establishing an N-dimensional edge weight matrix P_W that connects video frame segments whose indices are close to each other (within a window controlled by a positive integer p, p < N), where m and n are respectively the row and column indices of the weight matrix P_W, m ∈ {1, 2, …, N}, n ∈ {1, 2, …, N};
S22) constructing, from the N-dimensional edge weight matrix P_W, an N-dimensional diagonal matrix P_D, wherein the element value in the m-th row of the main diagonal of said diagonal matrix P_D is the sum of all element values in the m-th row of the N-dimensional edge weight matrix P_W;
s23) decomposing the coding matrix Y into a first non-negative matrix W and a second non-negative matrix H by using a non-negative matrix decomposition method, wherein Y is approximately equal to WH, updating the first non-negative matrix W and the second non-negative matrix H by using an iteration rule to obtain a first non-negative matrix W after the updating is finished, and taking the first non-negative matrix W after the updating is finished as a subject matrix W.
7. The method for topic mining and behavior analysis of video motion objects according to claim 6, wherein in step S23), the method decomposes the encoding matrix Y into a first non-negative matrix W and a second non-negative matrix H by non-negative matrix decomposition, Y ≈ WH, updates the first non-negative matrix W and the second non-negative matrix H by using an iteration rule to obtain a first non-negative matrix W after completing the update, and uses the first non-negative matrix W after completing the update as the topic matrix W, comprising the following steps:
S231) randomly initializing a K×r random matrix and taking it as the first non-negative matrix W, and randomly initializing an r×N random matrix and taking it as the second non-negative matrix H, where every element of the K×r and r×N random matrices is a random number between 0 and 1 and r is the preset number of topics;
S232) updating the first non-negative matrix W and the second non-negative matrix H respectively by using the iteration rule to obtain the updated first non-negative matrix W and the updated second non-negative matrix H, where the iteration rule involves the edge weight matrix P_W, the diagonal matrix P_D and a constraint coefficient β, β ∈ [0, 1], and the subscripts e and q denote matrix row and column sequence numbers respectively;
S233) computing the optimization function with the updated first non-negative matrix W and the updated second non-negative matrix H, the optimization function combining the reconstruction error between Y and WH with a constraint on adjacent column vectors of H, where H_(e+1) denotes the (e+1)-th column vector of the updated second non-negative matrix H and H_e denotes the e-th column vector of the updated second non-negative matrix H;
s234) repeating the steps S232) to S233) in sequence until the optimization function converges to a minimum value, ending the iteration, obtaining a first non-negative matrix W after the updating is finished, and taking the first non-negative matrix W after the updating is finished as a subject matrix W.
8. The topic mining and behavior analysis method for the video moving object according to claim 1 or 7, wherein in step S3), the behavior analysis is performed on the video frame sequence by using the topic matrix w to obtain the behavior category of the video moving object, comprising the following steps:
S31) obtaining, from the topic matrix w, the index e* of the corresponding video frame segment in the video frame sequence I, and denoting the video frame segment corresponding to the index e* as I(e*), where y_q is the q-th column vector in the feature matrix Y, q ∈ [1, N], and w_e is the e-th column vector of the topic matrix w;
S32) denoting the number of moving target types as T, and acquiring a trained target recognition network model M_1 and a trained scene recognition network model M_2;
S33) setting the number of behavior types of each moving target to M, acquiring T trained multi-classification deep learning classification network models, and denoting the T trained multi-classification deep learning classification network models as a network model set {L};
S34) using the target recognition network model M_1 and the scene recognition network model M_2 to recognize the video frame segment I(e*), obtaining respectively a target recognition result vector and a scene recognition result vector;
S35) obtaining, from the network model set {L}, the multi-classification deep learning classification network model L_index corresponding to the video frame segment I(e*);
S36) performing behavior recognition on the video frame segment I(e*) with its corresponding multi-classification deep learning classification network model L_index, to obtain the behavior category of the video moving target.
9. The topic mining and behavior analysis method for video motion targets of claim 8, wherein in step S33), the multi-classification deep learning classification network comprises five convolutional layers and three pooling layers.
10. The topic mining and behavior analysis method for a video moving target according to claim 8, wherein the trained target recognition network model M_1 and the trained scene recognition network model M_2 each adopt the ResNet50 network model.
CN202011165718.8A 2020-10-27 2020-10-27 Theme mining and behavior analysis method for video moving target Active CN112347879B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011165718.8A CN112347879B (en) 2020-10-27 2020-10-27 Theme mining and behavior analysis method for video moving target

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011165718.8A CN112347879B (en) 2020-10-27 2020-10-27 Theme mining and behavior analysis method for video moving target

Publications (2)

Publication Number Publication Date
CN112347879A true CN112347879A (en) 2021-02-09
CN112347879B CN112347879B (en) 2021-06-29

Family

ID=74360291

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011165718.8A Active CN112347879B (en) 2020-10-27 2020-10-27 Theme mining and behavior analysis method for video moving target

Country Status (1)

Country Link
CN (1) CN112347879B (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070230774A1 (en) * 2006-03-31 2007-10-04 Sony Corporation Identifying optimal colors for calibration and color filter array design
WO2008073366A2 (en) * 2006-12-08 2008-06-19 Sobayli, Llc Target object recognition in images and video
CN102254328A (en) * 2011-05-17 2011-11-23 西安电子科技大学 Video motion characteristic extracting method based on local sparse constraint non-negative matrix factorization
CN103824062A (en) * 2014-03-06 2014-05-28 西安电子科技大学 Motion identification method for human body by parts based on non-negative matrix factorization
CN104063721A (en) * 2014-07-04 2014-09-24 中国科学院自动化研究所 Human behavior recognition method based on automatic semantic feature study and screening
CN104700086A (en) * 2015-03-20 2015-06-10 清华大学 Excavating method of topic actions of man-machine interaction for video analysis
US20170255832A1 (en) * 2016-03-02 2017-09-07 Mitsubishi Electric Research Laboratories, Inc. Method and System for Detecting Actions in Videos
US20180225516A1 (en) * 2017-02-06 2018-08-09 Brown University Method and system for automated behavior classification of test subjects
CN107301382A (en) * 2017-06-06 2017-10-27 西安电子科技大学 The Activity recognition method of lower depth Non-negative Matrix Factorization is constrained based on Time Dependent
CN108038420A (en) * 2017-11-21 2018-05-15 华中科技大学 A kind of Human bodys' response method based on deep video
CN111310605A (en) * 2020-01-21 2020-06-19 北京迈格威科技有限公司 Image processing method and device, electronic equipment and storage medium
CN111274995A (en) * 2020-02-13 2020-06-12 腾讯科技(深圳)有限公司 Video classification method, device, equipment and computer readable storage medium
CN111612054A (en) * 2020-05-14 2020-09-01 国网河北省电力有限公司电力科学研究院 User electricity stealing behavior identification method based on non-negative matrix factorization and density clustering

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
HUI TENG et al.: "Representative Video Action Discovery Using Interactive Non-negative Matrix Factorization", Springer *
IVAN LAPTEV: "On Space-Time Interest Points", International Journal of Computer Vision *
TING HUANG et al.: "Research on motion recognition algorithm based on bag-of-words model", Microsystem Technologies *
BU HAILI (卜海丽): "Research on behavior recognition based on a non-negative matrix factorization method with new characteristics", China Master's Theses Full-text Database, Basic Sciences *
LIN BO (林博): "Research on human behavior recognition based on spatio-temporal interest points", China Doctoral Dissertations Full-text Database, Information Science and Technology *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113688871A (en) * 2021-07-26 2021-11-23 南京信息工程大学 Transformer-based video multi-label action identification method
CN113688871B (en) * 2021-07-26 2022-07-01 南京信息工程大学 Transformer-based video multi-label action identification method

Also Published As

Publication number Publication date
CN112347879B (en) 2021-06-29

Similar Documents

Publication Publication Date Title
CN111754596B (en) Editing model generation method, device, equipment and medium for editing face image
He et al. Learning and incorporating top-down cues in image segmentation
Deng et al. On-line pattern analysis by evolving self-organizing maps
US20190318158A1 (en) Multi-pose face feature point detection method based on cascade regression
CN111191583B (en) Space target recognition system and method based on convolutional neural network
CN111310731A (en) Video recommendation method, device and equipment based on artificial intelligence and storage medium
CN110717411A (en) Pedestrian re-identification method based on deep layer feature fusion
CN103988232A (en) IMAGE MATCHING by USING MOTION MANIFOLDS
WO2022062419A1 (en) Target re-identification method and system based on non-supervised pyramid similarity learning
CN113761259A (en) Image processing method and device and computer equipment
CN113610144A (en) Vehicle classification method based on multi-branch local attention network
WO2022218396A1 (en) Image processing method and apparatus, and computer readable storage medium
CN109492610B (en) Pedestrian re-identification method and device and readable storage medium
CN113343974A (en) Multi-modal fusion classification optimization method considering inter-modal semantic distance measurement
CN113515669A (en) Data processing method based on artificial intelligence and related equipment
CN113065409A (en) Unsupervised pedestrian re-identification method based on camera distribution difference alignment constraint
Naseer et al. Pixels to precision: features fusion and random forests over labelled-based segmentation
CN115293217A (en) Unsupervised pseudo tag optimization pedestrian re-identification method based on radio frequency signals
CN110222772B (en) Medical image annotation recommendation method based on block-level active learning
CN112347879B (en) Theme mining and behavior analysis method for video moving target
CN112528058B (en) Fine-grained image classification method based on image attribute active learning
CN113657414A (en) Object identification method
CN107220597B (en) Key frame selection method based on local features and bag-of-words model human body action recognition process
CN117152459A (en) Image detection method, device, computer readable medium and electronic equipment
CN111709473A (en) Object feature clustering method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant