CN112347879B - Theme mining and behavior analysis method for video moving target - Google Patents
Theme mining and behavior analysis method for video moving target
- Publication number: CN112347879B (application CN202011165718.8A)
- Authority: CN (China)
- Prior art keywords: matrix, video frame, video, frame sequence, negative
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G06V20/42—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
- G06V10/464—Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/50—Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
Abstract
The invention relates to the field of image processing and discloses a theme mining and behavior analysis method for a video moving target, comprising the following steps: S1) obtain a video frame sequence and extract a feature matrix of the video frame sequence; S2) perform theme mining with the feature matrix to obtain a theme matrix; S3) perform behavior analysis on the video frame sequence with the theme matrix to obtain the behavior category of the video moving target. The method extracts the image regions that change markedly within the video frames and from them builds the video representation, so the motion attributes of targets in the video can be captured accurately. In addition, the invention adopts a graph-regularized (manifold-based) non-negative matrix factorization algorithm that optimizes the solving function with a weight matrix and a constraint condition, mining the topic summary and expressing the temporal correlation between video frames more accurately. The invention also provides a behavior multi-classification model based on a two-stream convolutional network, which yields behavior labels for the mined themes and improves classification accuracy.
Description
Technical Field
The invention relates to the field of image processing, and in particular to a theme mining and behavior analysis method for a video moving target.
Background
In recent years, with the rapid development of the internet, video has gradually become an important medium through which people perceive the outside world, for example on live-streaming platforms and in surveillance streams. Because the scenes and content targets in a video are complex and the video duration is long, a video cannot provide an overview the way a text can; it also contains a large amount of redundant, uninformative material, so people cannot analyze a video source efficiently in a short time without spending considerable manpower and time. In general, the motion of the main targets in a video reflects the information of the video itself, so how to accurately mine the targets and motion attributes in a video and analyze their behavior is a problem that urgently needs to be solved. In the prior art, video summarization mainly adopts key-frame methods that capture the color-gamut information of an image with a classification algorithm, but such algorithms obtain only a single image and cannot obtain the moving target and its corresponding behavior attributes. In addition, although a video clip can be obtained by sliding a window around a key frame, it is difficult to guarantee that the clip contains the relevant information.
For example, Chinese patent publication CN108848422A discloses a video summary generation method based on target detection. In the training stage it obtains and labels a picture set containing more than two target objects, establishes a deep learning network, and trains the network with the training data set to obtain a trained deep learning network. In the using stage, a video is obtained and split into frames; the video frames are input into the trained network, which outputs, for each video frame, the feature vector of the target object it contains, the position vector corresponding to the target object, and the original image of the video frame containing the target object. Finally, all the feature vectors are clustered to obtain the video summary. Although this invention discloses a video summary generation method based on target detection, it only identifies the targets in the whole video and then clusters them to obtain a summary, and cannot accurately describe key summary information such as the motion behavior of the targets in a video.
Disclosure of Invention
The invention provides a theme mining and behavior analysis method for a video moving target, so as to solve the problems in the prior art.
A topic mining and behavior analysis method for a video moving target comprises the following steps:
s1) obtaining a video frame sequence, and extracting a characteristic matrix Y of the video frame sequence;
s2) performing theme mining by using the feature matrix Y to obtain a theme matrix w;
s3) performing behavior analysis on the video frame sequence by using the theme matrix w to obtain the behavior category of the video moving object.
Further, in step S1), acquiring a video frame sequence, and extracting a feature matrix Y of the video frame sequence, includes the following steps:
s11) obtaining a video frame sequence I (x, y, t) containing the motion of a video moving object, segmenting the video frame sequence I (x, y, t) into N video frame segments, wherein x and y respectively represent an x coordinate and a y coordinate in a space dimension, and t represents time;
S12) performing Gaussian convolution on the video frame sequence I(x, y, t) to obtain the Gaussian convolution result L(x, y, t; σ², τ²) = g(x, y, t; σ², τ²) * f(I(x, y, t)) of the video frame sequence I(x, y, t), where σ² and τ² are respectively the variances of the spatial dimension and the temporal dimension of the video frame sequence I(x, y, t), g(·) is the spatio-temporal Gaussian kernel, and f(·) is a mapping function that maps the video frame sequence I(x, y, t) to corresponding pixels in the image sequence;
S13) calculating a three-dimensional space-time second moment matrix from the Gaussian convolution result of the video frame sequence I(x, y, t);
S14) obtaining the eigenvalues of the three-dimensional space-time second moment matrix, constructing a discriminant function related to the eigenvalues, obtaining all positive local maximum points of the discriminant function in time and space, taking the positive local maximum points as the detected interest points to obtain all interest points of the video frame sequence, and taking the positions in the video frame sequence corresponding to the positive local maximum points as the positions of the detected interest points;
S15) extracting feature joint descriptors at all interest points of the video frame sequence to obtain the feature joint description set {z} = {z₁, z₂, …, z_v, …, z_N} of the video frame sequence, where z_v = {z_v¹, z_v², …, z_v^M} is the feature joint descriptor set of the v-th video frame segment, z_v^i is the feature joint descriptor of the i-th interest point of the v-th video frame segment, M is the total number of interest points of the v-th video frame segment, i ∈ {1, 2, …, M}, and v ∈ {1, 2, …, N};
S16) clustering the feature joint description set {z} of the video frame sequence by the K-means method to obtain the clustering result B = [b₁, b₂, …, b_K] of K clustering centers, where b_k is the feature vector of the k-th clustering center, k ∈ {1, 2, …, K};
S17) calculating the coding vector C_v = [c₁, c₂, …, c_K] of the v-th video frame segment from the clustering result B of the K clustering centers, where the intermediate coding value c_i is the number of feature joint descriptors of the v-th video frame segment whose nearest clustering center is b_i;
S18) normalizing the coding vector C_v of the v-th video frame segment to obtain the normalized coding vector y_v = C_v/‖C_v‖;
S19) repeating steps S17) to S18) in turn to obtain the normalized coding vectors of all video frame segments, and constructing the feature matrix Y = [y₁, y₂, …, y_N] from the normalized coding vectors of all the video frame segments, Y ∈ R^(K×N).
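The bag-of-words encoding of steps S16)–S19) can be sketched as follows. This is a minimal illustration assuming the K-means clustering centers have already been computed; the helper name `encode_segments` and the use of L1 normalization are assumptions of the sketch, not names or choices taken from the patent.

```python
import numpy as np

def encode_segments(descriptors_per_segment, centers):
    """Bag-of-words encoding of video frame segments (a sketch).

    descriptors_per_segment: list of (M_v, D) arrays, the feature joint
        descriptors of each of the N video frame segments
    centers: (K, D) array, the K-means clustering result B = [b_1..b_K]
    Returns the feature matrix Y of shape (K, N), one normalized coding
    vector per column.
    """
    K = centers.shape[0]
    cols = []
    for Z in descriptors_per_segment:
        # assign every descriptor to its nearest clustering center
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        # coding vector C_v: counts of descriptors per clustering center
        C = np.bincount(assign, minlength=K).astype(float)
        # normalized coding vector (L1 normalization assumed here)
        cols.append(C / max(C.sum(), 1.0))
    return np.stack(cols, axis=1)  # Y in R^{K x N}
```

Each column of the returned matrix sums to one, so columns of Y are directly comparable across segments of different lengths.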
Further, in step S13), calculating a three-dimensional space-time second moment matrix according to the result of the gaussian convolution of the sequence of video frames I (x, y, t), including the following steps:
S131) calculating partial derivatives of the Gaussian convolution result L(x, y, t; σ², τ²) to obtain the partial derivative L_x of the Gaussian convolution result L(x, y, t; σ², τ²) with respect to the x-coordinate of the spatial dimension, the partial derivative L_y with respect to the y-coordinate of the spatial dimension, and the partial derivative L_t with respect to the time dimension t;
S132) calculating the three-dimensional space-time second moment matrix μ from the partial derivatives L_x, L_y and L_t of the Gaussian convolution result L(x, y, t; σ², τ²), the second moment matrix being the Gaussian-weighted average of the products of the partial derivatives,
μ = g(·; sσ², sτ²) * [ [L_x², L_xL_y, L_xL_t], [L_xL_y, L_y², L_yL_t], [L_xL_t, L_yL_t, L_t²] ],
where s is the integration-scale factor.
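Steps S12)–S13) amount to Gaussian smoothing of the video volume followed by smoothed products of partial derivatives. A minimal sketch, assuming SciPy is available; the integration-scale factor `s` and the use of `np.gradient` for the partial derivatives are choices of this sketch rather than details fixed by the patent text:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def spatio_temporal_moment_matrix(I, sigma=1.5, tau=1.5, s=2.0):
    """Build the 3x3 space-time second moment matrix at every voxel of a
    video volume I(x, y, t) (a sketch of steps S12-S13 / S131-S132)."""
    # Gaussian convolution: spatial std sigma, temporal std tau
    L = gaussian_filter(I.astype(float), sigma=(sigma, sigma, tau))
    # partial derivatives L_x, L_y, L_t (axes 0, 1, 2 of the volume)
    Lx, Ly, Lt = np.gradient(L)
    grads = [Lx, Ly, Lt]
    # average each pairwise product with an integration-scale Gaussian
    g = lambda a: gaussian_filter(a, sigma=(s * sigma, s * sigma, s * tau))
    mu = np.empty(I.shape + (3, 3))
    for a in range(3):
        for b in range(3):
            mu[..., a, b] = g(grads[a] * grads[b])
    return mu
```

The matrix is symmetric by construction, so its three eigenvalues (used by the discriminant function of step S14) are real.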
Further, in step S14), obtaining the eigenvalues of the three-dimensional space-time second moment matrix, constructing a discriminant function related to the eigenvalues, and obtaining all positive local maximum points of the discriminant function in time and space, comprises the following steps:
S141) obtaining the three eigenvalues λ₁, λ₂ and λ₃ of the three-dimensional space-time second moment matrix μ;
S142) constructing the discriminant function R = λ₁λ₂λ₃ − k(λ₁ + λ₂ + λ₃)³ related to the three eigenvalues λ₁, λ₂ and λ₃ of the three-dimensional space-time second moment matrix μ, where k represents an empirical coefficient;
S143) obtaining all positive local maximum points of the discriminant function R in time and space.
In step S142), k represents an empirical coefficient with 0.01 ≤ k ≤ 0.07.
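The discriminant function of step S142) is a direct computation from the three eigenvalues. A small sketch; the helper name `discriminant` and the sample value k = 0.04 are assumptions, with k chosen inside the claimed range 0.01 ≤ k ≤ 0.07:

```python
import numpy as np

def discriminant(mu_point, k=0.04):
    """R = l1*l2*l3 - k*(l1 + l2 + l3)^3 for one 3x3 second moment
    matrix (step S142). The matrix is symmetric, so eigvalsh applies."""
    lam = np.linalg.eigvalsh(mu_point)  # real eigenvalues l1, l2, l3
    return lam.prod() - k * lam.sum() ** 3
```

Interest points are then the spatio-temporal positions where R attains a positive local maximum (step S143).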
Further, in step S15), the method for extracting feature joint descriptors for all the points of interest of the video frame sequence includes the following steps:
S151) acquiring the cuboid region near the j-th interest point of the video frame sequence, recording the cuboid region near the j-th interest point as (Δx, Δy, Δt)_j, and computing for the cuboid region (Δx, Δy, Δt)_j near the j-th interest point the normalized histogram of oriented gradients descriptor HOG_j and the optical flow histogram descriptor HOF_j, where j ∈ {1, 2, …, d} and d is the total number of interest points of the video frame sequence;
S152) splicing the histogram of oriented gradients descriptor HOG_j and the optical flow histogram descriptor HOF_j to obtain the HOG/HOF joint descriptor z_j = [HOG_j, HOF_j] of the j-th interest point, and taking the HOG/HOF joint descriptor of the j-th interest point as the feature joint descriptor of the j-th interest point of the video frame sequence;
s153) repeating steps S151) to S152) in turn, obtaining feature joint descriptors of all interest points of the video frame sequence.
The method extracts the image regions with obvious changes in the video frames (namely the cuboid regions near the interest points) and further constructs the video feature expression with the histogram of oriented gradients descriptor and the optical flow histogram descriptor, so the motion attributes of targets in the video can be captured accurately.
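The HOG/HOF construction described above can be sketched as follows, simplified to a single 2-D patch with precomputed optical flow (the patent operates on a space-time cuboid); the helper names and the bin count are assumptions of the sketch:

```python
import numpy as np

def orientation_histogram(vx, vy, bins=8):
    """Magnitude-weighted histogram of vector orientations: the common
    building block of HOG (image gradients) and HOF (flow vectors)."""
    ang = np.arctan2(vy, vx) % (2 * np.pi)
    mag = np.hypot(vx, vy)
    hist, _ = np.histogram(ang, bins=bins, range=(0, 2 * np.pi), weights=mag)
    n = np.linalg.norm(hist)
    return hist / n if n > 0 else hist

def hog_hof_descriptor(patch, flow_x, flow_y, bins=8):
    """Sketch of steps S151-S152: normalized HOG from the spatial
    gradients of the patch, HOF from its (assumed precomputed) optical
    flow, spliced into one joint descriptor z_j."""
    gy, gx = np.gradient(patch.astype(float))
    hog = orientation_histogram(gx, gy, bins)
    hof = orientation_histogram(flow_x, flow_y, bins)
    return np.concatenate([hog, hof])  # 2*bins-dimensional descriptor
```

These joint descriptors are what the K-means step S16) clusters into the codebook B.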
Further, in step S2), performing topic mining by using the feature matrix Y to obtain a topic matrix w, including the following steps:
S21) establishing an N-dimensional edge weight matrix P_W whose element in row m and column n is 1 when 0 < |m − n| ≤ p and 0 otherwise, where p is a positive integer with p < N, and m and n are respectively the row index and column index of the weight matrix P_W, m ∈ {1, 2, …, N}, n ∈ {1, 2, …, N};
s22) according to the N-dimensional edge weight matrix PWConstructing an N-dimensional diagonal matrix PDDiagonal matrix PDThe element value of the mth row on the main diagonal is an N-dimensional edge weight matrix PWThe sum of all element values on line m;
s23) decomposing the coding matrix Y into a first non-negative matrix W and a second non-negative matrix H by using a non-negative matrix decomposition method, wherein Y is approximately equal to WH, updating the first non-negative matrix W and the second non-negative matrix H by using an iteration rule to obtain a first non-negative matrix W after updating is finished, and taking the first non-negative matrix W after updating as a subject matrix W.
Further, in step S23), decomposing the encoding matrix Y into a first non-negative matrix W and a second non-negative matrix H by using a non-negative matrix decomposition method, where Y ≈ WH, updating the first non-negative matrix W and the second non-negative matrix H by using an iteration rule to obtain a first non-negative matrix W after the updating is completed, and taking the first non-negative matrix W after the updating as a subject matrix W, including the following steps:
S231) randomly initializing a K×r random matrix and taking it as the first non-negative matrix W, and randomly initializing an r×N random matrix and taking it as the second non-negative matrix H, where each element value in the K×r random matrix and the r×N random matrix is a random number between 0 and 1, and r is the preset number of topics;
S232) respectively updating the first non-negative matrix W and the second non-negative matrix H by the iteration rule
W_eq ← W_eq (Y Hᵀ)_eq / (W H Hᵀ)_eq,  H_eq ← H_eq (Wᵀ Y + β H P_W)_eq / (Wᵀ W H + β H P_D)_eq,
to obtain the updated first non-negative matrix W and the updated second non-negative matrix H, where β is a constraint coefficient, β ∈ [0, 1], and the subscripts e and q respectively denote the matrix row index and the matrix column index;
S233) calculating the optimization function O = ‖Y − WH‖² + β·Σₑ‖H_{e+1} − H_e‖² (sum over e = 1, …, N−1) with the updated first non-negative matrix W and the updated second non-negative matrix H, where H_{e+1} represents the (e+1)-th column vector of the updated second non-negative matrix H and H_e represents the e-th column vector of the updated second non-negative matrix H;
s234) repeating the steps S232) to S233) in sequence until the optimization function converges to a minimum value, ending the iteration, obtaining the updated first non-negative matrix W, and taking the updated first non-negative matrix W as the theme matrix W.
The topic mining algorithm adopts a graph-regularized (manifold-based) non-negative matrix factorization, optimizes the solving function with a weight matrix and a constraint condition, and mines the topic summary while expressing the temporal correlation between video frames more accurately.
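Under the assumption that the update rule (garbled in the source text) takes the standard graph-regularized NMF multiplicative form, steps S21)–S234) can be sketched as:

```python
import numpy as np

def topic_nmf(Y, r, p=2, beta=0.5, iters=200, seed=0):
    """Sketch of the topic mining of steps S21-S234: factor Y ~ W @ H
    with multiplicative updates regularized by a temporal-adjacency
    graph. P_W connects segments at most p apart; P_D is its degree
    matrix. The exact update rule is an assumption of this sketch."""
    K, N = Y.shape
    rng = np.random.default_rng(seed)
    W = rng.random((K, r))          # first non-negative matrix, K x r
    H = rng.random((r, N))          # second non-negative matrix, r x N
    idx = np.arange(N)
    dist = np.abs(idx[:, None] - idx[None, :])
    PW = ((dist <= p) & (dist > 0)).astype(float)   # edge weight matrix
    PD = np.diag(PW.sum(axis=1))                    # diagonal (degree) matrix
    eps = 1e-9                       # guard against division by zero
    for _ in range(iters):
        W *= (Y @ H.T) / (W @ H @ H.T + eps)
        H *= (W.T @ Y + beta * H @ PW) / (W.T @ W @ H + beta * H @ PD + eps)
    return W, H                      # W is taken as the theme matrix w
```

Multiplicative updates keep W and H non-negative throughout, since every factor in the update is non-negative.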
Further, in step S3), performing behavior analysis on the video frame sequence by using the topic matrix w to obtain a behavior category of the video moving object, including the following steps:
S31) obtaining the corresponding video frame segment index e* in the video frame sequence I from the theme matrix w, and recording the video frame segment corresponding to the index e* as I(e*), where y_q is the q-th column vector of the feature matrix Y, q ∈ [1, N], and w_e is the e-th column vector of the theme matrix w;
S32) recording the number of moving target types as T, and acquiring a trained target recognition network model M1 and a trained scene recognition network model M2;
S33) setting the behavior type of each moving target as M, acquiring T trained multi-classification deep learning classification network models, and recording the T trained multi-classification deep learning classification network models as a network model set { L };
S34) identifying the video frame segment I(e*) with the target recognition network model M1 and the scene recognition network model M2 to respectively obtain the target recognition result vector and the scene recognition result vector;
S35) obtaining, from the network model set {L}, the multi-classification deep learning classification network model L_index corresponding to the video frame segment I(e*);
S36) performing behavior recognition on the video frame segment I(e*) with the multi-classification deep learning classification network model L_index corresponding to the video frame segment I(e*) to obtain the behavior category of the video moving target.
In step S31), the index e that maximizes the inner product y_qᵀ w_e between the column vectors of the feature matrix Y and the column vectors of the theme matrix w is taken, giving the corresponding video frame segment index e* in the video frame sequence I and thereby the video frame segment I(e*) corresponding to the index e*. In step S35), the maximum element value of the target recognition result vector is acquired; the position index of the maximum element value in the vector corresponds to the index-th of the T trained multi-classification deep learning classification network models, and the index-th multi-classification deep learning classification network model is taken as the multi-classification deep learning classification network model L_index corresponding to the video frame segment I(e*).
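The model-selection step described here reduces to an argmax over the recognition result vector. A minimal sketch in which `models` is a stand-in list of T trained behavior classifiers (an assumption of the sketch, since the patent's networks are full CNNs):

```python
import numpy as np

def pick_behavior_model(target_scores, models):
    """Sketch of step S35: the position of the maximum element of the
    target recognition result vector (length T) selects which of the T
    trained multi-classification behavior networks to apply."""
    index = int(np.argmax(target_scores))
    return index, models[index]
```

The selected model is then applied to the video frame segment I(e*) in step S36 to obtain the behavior category.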
Further, in step S33), the multi-classification deep learning classification network includes five convolutional layers and three pooling layers.
Further, the trained target recognition network model M1 and the trained scene recognition network model M2 each adopt the ResNet50 network model.
The invention has the following beneficial effects. The invention designs a theme mining and behavior analysis method for a video moving target. First, the video frame sequence is equally divided into a number of video frame segments, the space-time interest points in the video frame sequence are extracted, the moving-target frames contained in the video are captured accurately, and the feature expression is constructed. In addition, the invention obtains the theme matrix of the video with a graph-regularized non-negative matrix factorization algorithm, and the added weight matrix and constraint coefficient make the theme mining result more accurate. Finally, a two-stream convolutional neural network performs target recognition on the theme frames, and the corresponding behavior classification network is then selected to obtain the behavior label of the target; the motion attributes of targets in the video are captured accurately, the temporal correlation between video frames is expressed more accurately, and the classification accuracy is improved.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a topic mining and behavior analysis method for a video moving object according to this embodiment.
Fig. 2 is a schematic flow chart of obtaining the theme matrix w according to the first embodiment.
Fig. 3 is a schematic flow chart of obtaining a behavior category of a video moving object by using a topic matrix w according to the first embodiment.
Detailed Description
In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings. It should be understood that the detailed description and specific examples are intended for purposes of illustration only and are not intended to limit the scope of the invention.
In a first embodiment, a topic mining and behavior analysis method for a video moving object, as shown in fig. 1, includes the following steps:
s1) obtaining the video frame sequence, and extracting the characteristic matrix Y of the video frame sequence, comprising the following steps:
s11) obtaining a video frame sequence I (x, y, t) containing the motion of a video moving object, segmenting the video frame sequence I (x, y, t) into N video frame segments, wherein x and y respectively represent an x coordinate and a y coordinate in a space dimension, and t represents time;
S12) performing Gaussian convolution on the video frame sequence I(x, y, t) to obtain the Gaussian convolution result L(x, y, t; σ², τ²) = g(x, y, t; σ², τ²) * f(I(x, y, t)) of the video frame sequence I(x, y, t), where σ² and τ² are respectively the variances of the spatial dimension and the temporal dimension of the video frame sequence I(x, y, t), g(·) is the spatio-temporal Gaussian kernel, and f(·) is a mapping function that maps the video frame sequence I(x, y, t) to corresponding pixels in the image sequence;
S13) calculating a three-dimensional space-time second moment matrix from the Gaussian convolution result of the video frame sequence I(x, y, t), comprising the following steps:
S131) calculating partial derivatives of the Gaussian convolution result L(x, y, t; σ², τ²) to obtain the partial derivative L_x of the Gaussian convolution result L(x, y, t; σ², τ²) with respect to the x-coordinate of the spatial dimension, the partial derivative L_y with respect to the y-coordinate of the spatial dimension, and the partial derivative L_t with respect to the time dimension t;
S132) calculating the three-dimensional space-time second moment matrix μ from the partial derivatives L_x, L_y and L_t of the Gaussian convolution result L(x, y, t; σ², τ²), the second moment matrix being the Gaussian-weighted average of the products of the partial derivatives,
μ = g(·; sσ², sτ²) * [ [L_x², L_xL_y, L_xL_t], [L_xL_y, L_y², L_yL_t], [L_xL_t, L_yL_t, L_t²] ],
where s is the integration-scale factor.
S14) obtaining the eigenvalues of the three-dimensional space-time second moment matrix, constructing a discriminant function related to the eigenvalues, obtaining all positive local maximum points of the discriminant function in time and space, taking the positive local maximum points as the detected interest points to obtain all interest points of the video frame sequence, and taking the positions in the video frame sequence corresponding to the positive local maximum points as the positions of the detected interest points.
Step S14), obtaining the eigenvalues of the three-dimensional space-time second moment matrix, constructing a discriminant function related to the eigenvalues, and obtaining all positive local maximum points of the discriminant function in time and space, comprises the following steps:
S141) obtaining the three eigenvalues λ₁, λ₂ and λ₃ of the three-dimensional space-time second moment matrix μ;
S142) constructing the discriminant function R = λ₁λ₂λ₃ − k(λ₁ + λ₂ + λ₃)³ related to the three eigenvalues λ₁, λ₂ and λ₃ of the three-dimensional space-time second moment matrix μ, where k represents an empirical coefficient and 0.01 ≤ k ≤ 0.07;
S143) obtaining all positive local maximum points of the discriminant function R in time and space.
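The search for positive local maxima in step S143) can be sketched as a neighborhood-maximum test over the discriminant volume R(x, y, t); the use of SciPy's `maximum_filter` and the 3×3×3 neighborhood are assumptions of this sketch:

```python
import numpy as np
from scipy.ndimage import maximum_filter

def positive_local_maxima(R, size=3):
    """Find all positive local maximum points of the discriminant
    function R(x, y, t) in time and space (a sketch of step S143).
    A point is kept when it is strictly positive and equals the
    maximum of its size**3 spatio-temporal neighborhood."""
    peaks = (R == maximum_filter(R, size=size)) & (R > 0)
    return np.argwhere(peaks)  # (x, y, t) indices of detected interest points
```

The returned positions are the detected interest points of step S14, at which the feature joint descriptors are then extracted.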
S15) extracting feature joint descriptors at all interest points of the video frame sequence to obtain the feature joint description set {z} = {z₁, z₂, …, z_v, …, z_N} of the video frame sequence, where z_v = {z_v¹, z_v², …, z_v^M} is the feature joint descriptor set of the v-th video frame segment, z_v^i is the feature joint descriptor of the i-th interest point of the v-th video frame segment, M is the total number of interest points of the v-th video frame segment, i ∈ {1, 2, …, M}, and v ∈ {1, 2, …, N}.
Step S15), respectively extracting feature joint descriptors for all interest points of the video frame sequence, including the following steps:
S151) acquiring the cuboid region near the j-th interest point of the video frame sequence, recording the cuboid region near the j-th interest point as (Δx, Δy, Δt)_j, and computing for the cuboid region (Δx, Δy, Δt)_j near the j-th interest point the normalized histogram of oriented gradients descriptor HOG_j and the optical flow histogram descriptor HOF_j, where j ∈ {1, 2, …, d} and d is the total number of interest points of the video frame sequence;
S152) splicing the histogram of oriented gradients descriptor HOG_j and the optical flow histogram descriptor HOF_j to obtain the HOG/HOF joint descriptor z_j = [HOG_j, HOF_j] of the j-th interest point, and taking the HOG/HOF joint descriptor of the j-th interest point as the feature joint descriptor of the j-th interest point of the video frame sequence;
s153) repeating steps S151) to S152) in turn, obtaining feature joint descriptors of all interest points of the video frame sequence.
The method extracts the image regions with obvious changes in the video frames (namely the cuboid regions near the interest points) and further constructs the video feature expression with the histogram of oriented gradients descriptor and the optical flow histogram descriptor, so the motion attributes of targets in the video can be captured accurately.
S16) clustering the feature joint description set {z} of the video frame sequence by the K-means method to obtain the clustering result B = [b₁, b₂, …, b_K] of K clustering centers, where b_k is the feature vector of the k-th clustering center, k ∈ {1, 2, …, K};
S17) calculating the coding vector C_v = [c₁, c₂, …, c_K] of the v-th video frame segment from the clustering result B of the K clustering centers, where the intermediate coding value c_i is the number of feature joint descriptors of the v-th video frame segment whose nearest clustering center is b_i;
S18) normalizing the coding vector C_v of the v-th video frame segment to obtain the normalized coding vector y_v = C_v/‖C_v‖;
S19) repeating steps S17) to S18) in turn to obtain the normalized coding vectors of all video frame segments, and constructing the feature matrix Y = [y₁, y₂, …, y_N] from the normalized coding vectors of all the video frame segments, Y ∈ R^(K×N).
S2) performing topic mining using the feature matrix Y to obtain a topic matrix w, as shown in fig. 2, including the following steps:
S21) establishing an N-dimensional edge weight matrix P_W whose element in row m and column n is 1 when 0 < |m − n| ≤ p and 0 otherwise, where p is a positive integer with p < N, and m and n are respectively the row index and column index of the weight matrix P_W, m ∈ {1, 2, …, N}, n ∈ {1, 2, …, N};
s22) according to the N-dimensional edge weight matrix PWConstructing an N-dimensional diagonal matrix PDDiagonal matrix PDThe element value of the mth row on the main diagonal is an N-dimensional edge weight matrix PWThe sum of all element values on line m;
s23) decomposing the coding matrix Y into a first non-negative matrix W and a second non-negative matrix H by using a non-negative matrix decomposition method, wherein Y is approximately equal to WH, updating the first non-negative matrix W and the second non-negative matrix H by using an iteration rule to obtain a first non-negative matrix W after the updating is finished, and taking the first non-negative matrix W after the updating is finished as a subject matrix W, and the method comprises the following steps:
S231) randomly initializing a K×r random matrix and taking it as the first non-negative matrix W, and randomly initializing an r×N random matrix and taking it as the second non-negative matrix H, where each element value in the K×r random matrix and the r×N random matrix is a random number between 0 and 1, and r is the preset number of topics;
S232) respectively updating the first non-negative matrix W and the second non-negative matrix H by the iteration rule
W_eq ← W_eq (Y Hᵀ)_eq / (W H Hᵀ)_eq,  H_eq ← H_eq (Wᵀ Y + β H P_W)_eq / (Wᵀ W H + β H P_D)_eq,
to obtain the updated first non-negative matrix W and the updated second non-negative matrix H, where β is a constraint coefficient, β ∈ [0, 1], and the subscripts e and q respectively denote the matrix row index and the matrix column index;
S233) calculating the optimization function O = ‖Y − WH‖² + β·Σₑ‖H_{e+1} − H_e‖² (sum over e = 1, …, N−1) with the updated first non-negative matrix W and the updated second non-negative matrix H, where H_{e+1} represents the (e+1)-th column vector of the updated second non-negative matrix H and H_e represents the e-th column vector of the updated second non-negative matrix H;
s234) repeating the steps S232) to S233) in sequence until the optimization function converges to a minimum value, ending the iteration, obtaining the updated first non-negative matrix W, and taking the updated first non-negative matrix W as the theme matrix W.
The topic mining algorithm adopts a graph-regularized (manifold-based) non-negative matrix factorization, optimizes the solving function with a weight matrix and a constraint condition, and mines the topic summary while expressing the temporal correlation between video frames more accurately.
S3) performing behavior analysis on the video frame sequence by using the topic matrix w to obtain a behavior category of the video moving object, as shown in fig. 3, including the following steps:
S31) obtaining, from the topic matrix w, the index e* of the corresponding video frame segment in the video frame sequence I, and denoting the video frame segment corresponding to the index e* as I(e*), where y_q is the q-th column vector of the feature matrix Y, q ∈ [1, N], and w_e is the e-th column vector of the topic matrix w.
In step S31), the value of e that maximizes the match between the coding vectors y_q and the topic vector w_e is taken as the segment index e*, thereby obtaining the video frame segment I(e*) corresponding to e*. In step S35), the maximum element value of the recognition result vector is found; its position index selects the index-th model among the T trained multi-class deep learning classification network models, and that model is taken as the multi-class deep learning classification network model L_index corresponding to the video frame segment I(e*).
S32) denoting the number of moving target types as T, and acquiring a trained target recognition network model M1 and a trained scene recognition network model M2; the trained target recognition network model M1 and the trained scene recognition network model M2 each adopt a ResNet50 network model.
S33) setting the number of behavior types of each moving target to M, and obtaining T trained multi-class deep learning classification network models, denoted as the network model set {L}, wherein each multi-class deep learning classification network comprises five convolutional layers and three pooling layers.
S34) using the target recognition network model M1 and the scene recognition network model M2 to recognize the video frame segment I(e*), obtaining a target recognition result vector and a scene recognition result vector, respectively.
S35) obtaining, from the network model set {L}, the multi-class deep learning classification network model L_index corresponding to the video frame segment I(e*).
S36) using the multi-class deep learning classification network model L_index corresponding to the video frame segment I(e*) to perform behavior recognition on I(e*), obtaining the behavior category of the video moving target.
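Steps S34)–S36) select one of T per-class behavior classifiers from the two recognition result vectors. How the two vectors are fused is given only as an unreproduced formula in the source; an element-wise product is assumed below purely for illustration, and the model names are hypothetical:

```python
import numpy as np

def select_behavior_model(v_target, v_scene, models):
    """Sketch of S35): pick the class-specific classifier for a topic segment.

    The fusion of the target and scene recognition vectors is assumed to be
    an element-wise product; the patent's exact formula is not reproduced.
    """
    fused = np.asarray(v_target) * np.asarray(v_scene)   # assumed fusion
    index = int(np.argmax(fused))      # position of the maximum element value
    return index, models[index]        # the index-th model L_index

# Usage with T = 3 hypothetical per-class behavior classifiers
models = ["L0", "L1", "L2"]
idx, model = select_behavior_model([0.1, 0.7, 0.2], [0.2, 0.6, 0.2], models)
```

The selected model is then applied to the segment I(e*) to produce the behavior category, as in step S36).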
The invention provides a topic mining and behavior analysis method for video moving targets. First, the video frame sequence is divided equally into several video frame segments, spatio-temporal interest points are extracted from the sequence to accurately capture the moving-target frames contained in the video, and a feature expression is constructed. The invention then obtains the topic matrix of the video with a manifold-based non-negative matrix factorization algorithm, in which the added weight matrix and constraint coefficient make the topic mining result more accurate. Finally, the invention adopts a two-stream convolutional neural network to recognize the target in the topic frames and then selects the corresponding behavior classification network to obtain the behavior label of the target.
Adopting the technical scheme disclosed by the invention yields the following beneficial effects:
The method extracts the image regions with salient changes in the video frames and builds the video expression from them, so it can accurately capture the motion attributes of targets in the video.
The topic mining algorithm adopts a manifold-based non-negative matrix factorization algorithm, optimizing the solution function with a weight matrix and a constraint condition, so that the mined topic summary expresses the temporal correlation information between video frames more accurately.
The invention provides a behavior multi-classification model based on a two-stream convolutional network, which obtains the behavior label for each mined topic and improves classification accuracy.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the principle of the present invention, and such modifications and improvements should also be considered within the scope of the present invention.
Claims (9)
1. A topic mining and behavior analysis method for a video moving target is characterized by comprising the following steps:
s1) obtaining a video frame sequence, and extracting a characteristic matrix Y of the video frame sequence;
S2) performing topic mining using the feature matrix Y to obtain a topic matrix w;
S3) performing behavior analysis on the video frame sequence using the topic matrix w to obtain the behavior category of the video moving target;
in step S3), performing behavior analysis on the video frame sequence by using the topic matrix w to obtain a behavior category of a video moving object, including the following steps:
S31) obtaining, from the topic matrix w, the index e* of the corresponding video frame segment in the video frame sequence I, and denoting the video frame segment corresponding to e* as I(e*), where y_q is the q-th column vector of the feature matrix Y, q ∈ [1, N], and w_e is the e-th column vector of the topic matrix w;
S32) denoting the number of moving target types as T, and acquiring a trained target recognition network model and a trained scene recognition network model;
S33) setting the number of behavior types of each moving target to M, and obtaining T trained multi-class deep learning classification network models, denoted as a network model set {L};
S34) using the target recognition network model and the scene recognition network model to recognize the video frame segment I(e*), obtaining a target recognition result vector and a scene recognition result vector, respectively;
S35) obtaining, from the network model set {L}, the multi-class deep learning classification network model corresponding to the video frame segment I(e*).
2. The topic mining and behavior analysis method for video moving objects according to claim 1, wherein the step S1) of obtaining a video frame sequence and extracting a feature matrix Y of the video frame sequence comprises the following steps:
s11) obtaining a video frame sequence I (x, y, t) containing the motion of a video moving object, segmenting the video frame sequence I (x, y, t) into N video frame segments, wherein x and y respectively represent an x coordinate and a y coordinate in a space dimension, and t represents time;
S12) performing Gaussian convolution on the video frame sequence I(x, y, t) to obtain the Gaussian convolution result L(x, y, t; σ², τ²) of the video frame sequence I(x, y, t), where σ² and τ² are the variances of the spatial dimension and the time dimension of the video frame sequence I(x, y, t), respectively, and f(·) is a mapping function that maps the video frame sequence I(x, y, t) to the corresponding pixel points in the image sequence;
s13) calculating a three-dimensional space-time secondary moment matrix according to the Gaussian convolution result of the video frame sequence I (x, y, t);
S14) obtaining the eigenvalues of the three-dimensional spatio-temporal second-moment matrix, constructing a discriminant function of those eigenvalues, and finding all positive local-maximum points of the discriminant function in time and space; these positive local-maximum points are taken as the detected interest points, and their positions in the video frame sequence are the positions of the detected interest points, thereby obtaining all interest points of the video frame sequence;
S15) extracting a feature joint descriptor for each interest point of the video frame sequence to obtain the feature joint description set {z} = {z_1, z_2, …, z_v, …, z_N} of the video frame sequence, where z_v denotes the set of feature joint descriptors of the v-th video frame segment, including a feature joint descriptor for the i-th interest point of the v-th video frame segment, M is the total number of interest points of the v-th video frame segment, i ∈ {1, 2, …, M}, and v ∈ {1, 2, …, N};
S16) clustering the feature joint description set {z} of the video frame sequence by the K-means method to obtain the clustering result B = [b_1, b_2, …, b_K] of K clustering centers, where b_k is the feature vector of the k-th clustering center and K ∈ R+;
S17) calculating the coding vector C_v of the v-th video frame segment according to the clustering result B of the K clustering centers, where c_i denotes an intermediate coding vector;
S18) normalizing the coding vector C_v of the v-th video frame segment to obtain the normalized coding vector.
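Steps S16)–S18) amount to bag-of-words coding of each segment's descriptors against the K cluster centers. The patent's intermediate-vector formula is not reproduced in the source, so the sketch below assumes hard assignment to the nearest center and L2 normalization, purely for illustration:

```python
import numpy as np

def encode_segment(descriptors, centers):
    """Sketch of S17)/S18): bag-of-words coding of one video-frame segment.

    Hard assignment to the nearest cluster centre and L2 normalisation are
    assumptions; the source gives the intermediate vectors only as formulas.
    """
    # distance of every descriptor to every cluster centre (n_desc x K)
    d = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
    nearest = d.argmin(axis=1)                    # index of the closest centre
    K = centers.shape[0]
    C = np.bincount(nearest, minlength=K).astype(float)   # occurrence counts
    n = np.linalg.norm(C)
    return C / n if n > 0 else C                  # S18): normalised coding vector
```

Stacking the N normalized coding vectors column-wise then yields the K×N feature matrix Y used in step S2).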
3. The topic mining and behavior analysis method for video moving objects according to claim 2, wherein in step S13), the method for calculating the three-dimensional spatio-temporal secondary moment matrix according to the gaussian convolution result of the video frame sequence I (x, y, t) comprises the following steps:
S131) taking partial derivatives of the Gaussian convolution result L(x, y, t; σ², τ²) to obtain its partial derivative L_x with respect to the spatial x-coordinate, its partial derivative L_y with respect to the spatial y-coordinate, and its partial derivative L_t with respect to the time dimension t;
S132) computing the three-dimensional spatio-temporal second-moment matrix μ from the partial derivatives L_x, L_y and L_t of the Gaussian convolution result L(x, y, t; σ², τ²), the second-moment matrix μ being formed from the pairwise products of these partial derivatives.
4. The method for topic mining and behavior analysis of video moving objects according to claim 3, wherein in step S14), obtaining the eigenvalues of the three-dimensional spatio-temporal second-moment matrix, constructing the discriminant function of the eigenvalues, and finding all positive local-maximum points of the discriminant function in time and space comprises the following steps:
S141) obtaining the three eigenvalues λ1, λ2 and λ3 of the three-dimensional spatio-temporal second-moment matrix μ;
S142) constructing a discriminant function R of the three eigenvalues λ1, λ2 and λ3 of the three-dimensional spatio-temporal second-moment matrix μ, where R = λ1λ2λ3 − k(λ1 + λ2 + λ3)³ and k is an empirical coefficient;
S143) finding all positive local-maximum points of the discriminant function R in time and space.
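The response of steps S141)–S142) can be sketched directly from the eigenvalues of μ; the value k = 0.005 used below is an assumed placeholder for the empirical coefficient:

```python
import numpy as np

def stip_response(mu, k=0.005):
    """Sketch of S141)/S142): Harris-style response of the 3-D second-moment
    matrix mu, R = l1*l2*l3 - k*(l1 + l2 + l3)**3.

    k = 0.005 is an assumed value for the empirical coefficient."""
    lam = np.linalg.eigvalsh(mu)          # eigenvalues l1, l2, l3
    return lam.prod() - k * lam.sum() ** 3
```

Interest points are then taken at the positive local maxima of this response over (x, y, t), as in step S143).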
5. The topic mining and behavior analysis method for video moving objects according to claim 2 or 4, wherein in step S15), extracting a feature joint descriptor for each interest point of the video frame sequence comprises the following steps:
S151) acquiring the cuboid region near the j-th interest point of the video frame sequence, denoting it as (Δx, Δy, Δt)_j, and computing the normalized histogram-of-oriented-gradients (HOG) descriptor and optical-flow-histogram (HOF) descriptor for the cuboid region (Δx, Δy, Δt)_j, where d is the total number of interest points of the video frame sequence;
S152) splicing the histogram-of-oriented-gradients descriptor and the optical-flow-histogram descriptor to obtain the HOG/HOF joint descriptor of the j-th interest point, which is taken as the feature joint descriptor of the j-th interest point of the video frame sequence;
s153) repeating steps S151) to S152) in turn, obtaining feature joint descriptors of all interest points of the video frame sequence.
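The splice of step S152) is a simple concatenation of the two histograms. The sketch below normalizes each histogram first (the source says "normalized" descriptors without giving the norm, so L1 normalization and the histogram sizes are assumptions):

```python
import numpy as np

def joint_descriptor(hog, hof):
    """Sketch of S152): concatenate the HOG and HOF histograms of one
    interest point's cuboid into a HOG/HOF joint descriptor.

    L1 normalisation of each histogram before splicing is an assumption."""
    hog = np.asarray(hog, dtype=float)
    hof = np.asarray(hof, dtype=float)
    hog = hog / max(hog.sum(), 1e-12)   # normalise each histogram separately
    hof = hof / max(hof.sum(), 1e-12)
    return np.concatenate([hog, hof])   # the spliced joint descriptor
```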
6. The method for topic mining and behavior analysis of a video moving object according to claim 1, wherein in step S2), performing topic mining using the feature matrix Y to obtain the topic matrix w comprises the following steps:
S21) establishing an N-dimensional edge weight matrix P_W, where p is a positive integer with p < N, and m and n are the row index and column index of the weight matrix P_W, respectively, with m ∈ {1, 2, …, N} and n ∈ {1, 2, …, N};
S22) constructing an N-dimensional diagonal matrix P_D from the N-dimensional edge weight matrix P_W, where the element in the m-th row of the main diagonal of P_D is the sum of all element values in the m-th row of P_W;
S23) decomposing the coding matrix Y into a first non-negative matrix W and a second non-negative matrix H by non-negative matrix factorization, with Y ≈ WH, and updating the first non-negative matrix W and the second non-negative matrix H by the iteration rule; the first non-negative matrix W obtained when the update finishes is taken as the topic matrix w.
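The formula defining P_W in step S21) is not reproduced in the source; given that p < N bounds a neighborhood size, the sketch below assumes a band matrix linking segments at most p apart in time, which matches the stated role of encoding temporal correlation between segments. The diagonal matrix P_D of step S22) follows directly as the row sums:

```python
import numpy as np

def temporal_graph(N, p):
    """Sketch of S21)/S22): temporal neighbourhood graph over N segments.

    PW[m, n] = 1 when 0 < |m - n| <= p is an assumption; the source's own
    formula for PW is given only as an image.
    """
    m = np.arange(N)
    diff = np.abs(m[:, None] - m[None, :])
    PW = ((diff <= p) & (diff > 0)).astype(float)   # assumed band structure
    PD = np.diag(PW.sum(axis=1))    # S22): row sums on the main diagonal
    return PW, PD
```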
7. The method for topic mining and behavior analysis of video moving objects according to claim 6, wherein in step S23), decomposing the coding matrix Y into the first non-negative matrix W and the second non-negative matrix H by non-negative matrix factorization with Y ≈ WH, updating W and H by the iteration rule, and taking the first non-negative matrix W obtained when the update finishes as the topic matrix w comprises the following steps:
S231) randomly initializing a K×r random matrix as the first non-negative matrix W and an r×N random matrix as the second non-negative matrix H, wherein every element of the K×r and r×N random matrices is a random number between 0 and 1, and r is a preset number of topics;
S232) updating the first non-negative matrix W and the second non-negative matrix H respectively by the iteration rule to obtain an updated first non-negative matrix W and an updated second non-negative matrix H, wherein β is a constraint coefficient with β ∈ [0, 1], and the subscripts e and q denote the matrix row index and column index, respectively;
S233) calculating an optimization function from the updated first non-negative matrix W and the updated second non-negative matrix H, wherein H_{e+1} denotes the (e+1)-th column vector of the updated second non-negative matrix H and H_e denotes its e-th column vector;
S234) repeating steps S232) to S233) in turn until the optimization function converges to a minimum, ending the iteration; the first non-negative matrix W obtained at that point is taken as the topic matrix w.
8. The topic mining and behavior analysis method for video motion targets of claim 1, wherein in step S33), the multi-classification deep learning classification network comprises five convolutional layers and three pooling layers.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011165718.8A CN112347879B (en) | 2020-10-27 | 2020-10-27 | Theme mining and behavior analysis method for video moving target |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112347879A CN112347879A (en) | 2021-02-09 |
CN112347879B true CN112347879B (en) | 2021-06-29 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |