CN112329656A - Feature extraction method for human action key frame in video stream - Google Patents

Feature extraction method for human action key frame in video stream

Info

Publication number
CN112329656A
CN112329656A (application CN202011246020.9A)
Authority
CN
China
Prior art keywords
frame
motion
calculation
video stream
mhi
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011246020.9A
Other languages
Chinese (zh)
Other versions
CN112329656B (en)
Inventor
宋玲
夏智敏
陈燕
叶进
石森煌
王立颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi University
Original Assignee
Guangxi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi University filed Critical Guangxi University
Priority to CN202011246020.9A priority Critical patent/CN112329656B/en
Publication of CN112329656A publication Critical patent/CN112329656A/en
Application granted granted Critical
Publication of CN112329656B publication Critical patent/CN112329656B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/28 - Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/50 - Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis

Abstract

The invention discloses a method for extracting features of human action key frames in a video stream. The method processes video stream data with an improved motion history image (MHI) based on combining a Gaussian kernel function with equal-interval frame-distance sampling, so that not every frame of the stream needs to be analyzed; the improvement effectively smooths the gray-value changes in the motion history image MHI and makes it more robust. Image features are extracted through the histogram of oriented gradients (HOG), and a nearest-neighbor (NN) classifier then detects whether the action state label has changed and extracts the action key frames according to that change. The invention can smoothly extract action key frames from a video stream while meeting the action classification accuracy.

Description

Feature extraction method for human action key frame in video stream
Technical Field
The invention relates to the technical field of target identification, in particular to a method for extracting characteristics of human action key frames in video streams.
Background
Extracting key frames that capture changes in human body motion from video streams has been a research focus in recent years. Video stream data are characterized by a large amount of information, a complex data structure and strict time-sequence properties; how to analyze the change of the human action state from a video file or from real-time camera data and obtain the corresponding action key frame data is the most critical problem in human key point detection.
Disclosure of Invention
The invention provides a feature extraction method of human action key frames in video streams, which can quickly and accurately extract the action key frames in the video streams.
In order to solve the problems, the invention is realized by the following technical scheme:
a method for extracting features of human action key frames in video stream includes the following steps:
step 1: acquiring a calculation frame from video stream data by using an equal interval sampling method;
step 2: generating a motion history image and performing motion segmentation on the calculation frame by using an improved motion history image algorithm based on a Gaussian kernel function, and separating the human motion foreground from the background to obtain the motion history image;
on the basis of the traditional motion history image algorithm, the improved motion history image algorithm based on the Gaussian kernel function increases or decreases the gray value of the calculation frame at the current moment by comparing the gray values of the corresponding pixel points of the calculation frame at the current moment and the comparison frame in the time sequence, namely:
if the difference between the gray values of the corresponding pixel points of the calculation frame at the current moment and the comparison frame in the time sequence is greater than or equal to the set gray threshold, the gray value of the pixel point of the calculation frame at the current moment is increased by a Gaussian-kernel weighted amount:
H(x,y,t) = H(x,y,t-Δt) + 255·exp(-Δt²/(2ω²))/(√(2π)·ω)
wherein ω represents the set frame influence factor, t represents the current time, and Δt represents the time difference between the calculation frame at the current time and the comparison frame in the time sequence;
if the difference between the gray values of the corresponding pixel points of the calculation frame at the current moment and the comparison frame in the time sequence is smaller than the set gray threshold, the gray value of the pixel point of the calculation frame at the current moment is reduced by the rated attenuation coefficient σ;
step 3: describing the motion information of the contour edges of the motion history image by using the histogram of oriented gradients feature, and extracting the image features of the calculation frame;
step 4: carrying out motion recognition on the image features by using the NN classifier, and outputting the calculation frame at the current moment as an action key frame when the motion state labels of the calculation frame at the current moment and the comparison frame in the time sequence differ.
Before step 1, the method further comprises: when the video stream data are collected, preprocessing the video stream data with a median filter to eliminate noise.
The comparison frames in the above time sequence are obtained from the video stream data by the equal-interval sampling method.
The sampling interval of the comparison frames in the above time sequence is equal to or greater than the sampling interval of the calculation frames.
In step2, the set grayscale threshold is 127.
In step2, the attenuation coefficient σ is 30.
Compared with the prior art, the invention provides an algorithm (GMHKE) for extracting video stream key frames based on the Gaussian-kernel-improved MHI and HOG features. The algorithm processes video stream data with an MHI improved by combining a Gaussian kernel function with equal-interval frame-distance sampling, so that not every frame of the stream needs to be analyzed; it effectively smooths the gray-value changes in the motion history image MHI and makes the MHI more robust, extracts image features through HOG, and uses an NN classifier to detect whether the action state label has changed and to extract action key frames according to that change. The invention can smoothly extract action key frames from a video stream while meeting the action classification accuracy.
Drawings
Fig. 1 is a flowchart of a method for extracting features of a human motion key frame in a video stream.
Fig. 2 shows the selection of attenuation coefficients in GMHI.
FIG. 3 is a diagram of action samples, (a) Walk behavior, (b) Run behavior, and (c) Collapse behavior.
Fig. 4 shows the experimental results of test set 1, (a) original video, (b) MHI, and (c) HOG features.
Fig. 5 shows the results of the test set 2 experiment, (a) raw video, (b) MHI, and (c) HOG features.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to specific examples.
Referring to fig. 1, a method for extracting features of human motion key frames (GMHKE) in a video stream includes the following steps:
step1, preprocessing the video stream data to eliminate noise by using a median filter, and then acquiring a calculation frame from the video stream data by using an equal-interval sampling method.
Step 2, generating a motion history image and performing motion segmentation on the calculation frame by using an improved motion history image algorithm based on a Gaussian kernel function, separating the human motion foreground from the background to obtain the motion history image.
On the basis of the traditional motion history image algorithm, the improved motion history image algorithm based on the Gaussian kernel function increases or decreases the gray value of the calculation frame at the current moment by comparing the gray values of the corresponding pixel points of the calculation frame at the current moment and the comparison frame in the time sequence, namely:
When the difference between the gray values of the corresponding pixel points of the calculation frame at the current moment and the comparison frame in the time sequence is greater than or equal to the set gray threshold, the gray value of the pixel point of the calculation frame at the current moment is increased by a Gaussian-kernel weighted amount:
H(x,y,t) = H(x,y,t-Δt) + 255·exp(-Δt²/(2ω²))/(√(2π)·ω)
where ω represents the set frame influence factor, t represents the current time, and Δt represents the time difference between the calculation frame and the comparison frame in the time sequence.
When the difference between the gray values of the corresponding pixel points of the calculation frame at the current moment and the comparison frame in the time sequence is smaller than the set gray threshold, the gray value of the pixel point of the calculation frame at the current moment is reduced by the rated attenuation coefficient σ.
The comparison frames in the time sequence are obtained from the video stream data by the equal-interval sampling method; that is, the comparison frame in the time sequence changes over time, and the time interval between every two adjacent comparison frames is the same. The sampling interval of the comparison frames is equal to or greater than the sampling interval of the calculation frames. When the two sampling intervals are equal, the comparison frame of the calculation frame at the current moment is the calculation frame at the previous moment. When the sampling interval of the comparison frames is larger, the comparison frame of the calculation frame at the current moment is a calculation frame several moments earlier.
Step 3, describing the motion information of the contour edges of the motion history image by using HOG (histogram of oriented gradients) features, and extracting the image features of the calculation frame.
Step 4, performing motion recognition on the image features by using the NN classifier, and outputting the calculation frame at the current moment as an action key frame when the motion state labels of the calculation frame at the current moment and the comparison frame in the time sequence differ.
The following is a description of the related art to which the present invention relates:
(1) data pre-processing
During the acquisition of video data, changes in illumination intensity and angle and the presence of environmental noise introduce interference into the processing and recognition of the images, so the images need to be preprocessed to eliminate noise and improve image quality. A good preprocessing result improves the accuracy and speed of the subsequent operations.
Median filtering is a nonlinear method: the image is filtered with a sliding window whose side length is an odd number of pixels, e.g. (2n+1) × (2n+1), the gray values of the pixels inside the window are sorted, and the gray value at the window center is replaced by their median. The method smooths impulse noise while protecting the integrity of image edges and performs well in repairing salt-and-pepper noise.
A median filter with a 15 × 15 window is used to remove salt-and-pepper noise. Let the coordinates of a pixel of the original image be (x, y), its gray value f(x, y) and its gray value after median filtering g(x, y); the relation is given by formula (1):
g(x, y) = median{ f(x-k, y-j), (k, j) ∈ W }   (1)
where W is the filter window.
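As an illustration only, the preprocessing of formula (1) can be sketched in Python with OpenCV; the 15 × 15 window follows the description, while the file name and the use of cv2.medianBlur are assumptions of this sketch, not part of the original disclosure.

import cv2

# Read one frame of the video stream as a gray image (the path is a placeholder).
frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)

# Median filtering with a 15 x 15 sliding window: each pixel is replaced by the
# median gray value inside the window, which smooths salt-and-pepper noise while
# preserving the image edges, as described by formula (1).
denoised = cv2.medianBlur(frame, 15)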
(2) Extracting calculation frames by the equal-interval frame-distance method
Because the pixel overlap between two adjacent frames of video stream data is high, including every frame before a given moment in the MHI calculation makes the pixel-point changes dense, greatly increases the amount of gray-value computation and affects the subsequent feature extraction. Since the MHI describes the relative intensity change of pixel points in the data stream, the time base of that change needs to be fixed; extracting video frames with methods such as clustering is usually adaptive, so the sampling time interval generally varies with the computation result. The calculation frames are therefore extracted with an equal frame distance, i.e. within the rated frame range one calculation frame is sampled every 5 frames (a 1/5 sampling rate).
An MHI sampled at an equal-interval frame distance contains clearer motion information, because it avoids the extraction of redundant data, reduces the influence of pixel overlap between adjacent frames and sparsifies the human action, which makes the motion features more obvious.
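A minimal sketch of the equal-interval frame-distance sampling described above, assuming OpenCV, a sampling step of 5 frames as in the 1/5 sampling rate mentioned in the description, and a placeholder video path:

import cv2

SAMPLING_STEP = 5  # one calculation frame every 5 frames (equal frame distance)

cap = cv2.VideoCapture("input_video.mp4")
calc_frames = []
index = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if index % SAMPLING_STEP == 0:
        # keep only the sampled frames, converted to gray for the MHI calculation
        calc_frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
    index += 1
cap.release()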
(3) Improved motion history image (GMHI) based on the Gaussian kernel function
A change of the human motion state can only be judged after accumulating a certain amount of time-series data frames. Human actions form a space-time shape in the space-time volume of an action video, and the motion sequence of a person can be described by a single MHI image. The motion history image (MHI) observes the time-series data of the video and represents the motion of each object in the video stream by brightness attenuation, encoding how long ago the motion occurred, so the MHI can be used to judge changes of the motion state. In the original MHI algorithm the value of a pixel, once changed, decays directly from the full-brightness value; this is sensitive to environmental noise and other external interference, and flicker, winged insects and similar conditions in the camera lens or the video stream data greatly disturb the correct formation of the MHI.
The Gaussian function replaces the value of a pixel with the weighted mean of its neighborhood, and the weight of each neighboring pixel point decreases monotonically with its distance from the center point, so that the value of the pixel point can be smoothed effectively, as shown in formula (2):
g(x) = exp(-x²/(2ω²)) / (√(2π)·ω)   (2)
wherein 1/(√(2π)·ω) represents the height of the curve of the Gaussian function and 2ω² represents the coordinate range of the Gaussian function.
The traditional Gaussian function is applied in the spatial domain of pixels, whereas the invention applies it in the time-sequence domain: in time-ordered video stream data, pixel values closer to the current frame have a higher influence weight on that frame, and vice versa. The GMHI algorithm is obtained by using the Gaussian function as the kernel coefficient to improve the update function of the MHI; only pixel points whose values change frequently receive the gradual-increase operation, which avoids the influence of small-range mutation and flicker on the formation of the MHI and gives higher robustness and accuracy.
GMHI provides motion foreground information by recording and encoding the changes at pixel points, and the update formula of the gray value H(x, y, t) of the pixel (x, y) of the t-th calculation frame is as follows:
H(x,y,t) = Ψ(x,y,t)                  if D(x,y,t) ≥ gray threshold
H(x,y,t) = max(0, H(x,y,t-Δt) - σ)   if D(x,y,t) < gray threshold   (3)
wherein Ψ(x, y, t) represents the update function of the pixel (x, y) of the t-th calculation frame, and σ represents a given attenuation coefficient.
Different parameter values affect the expression of the motion history image: the closer to the current moment, the brighter the pixel brightness change, which is convenient for feature extraction, and increasing the brightness of pixel values in an accumulative way gives good noise suppression. The update function Ψ(x, y, t) of the pixel (x, y) of the t-th calculation frame is:
Ψ(x,y,t) = H(x,y,t-Δt) + 255·exp(-Δt²/(2ω²))/(√(2π)·ω)   (4)
where D (x, y, t) is the difference between the gray-level values of the corresponding pixels (x, y) of the tth frame calculation frame and the comparison frame in time series (i.e., the tth- Δ t frame calculation frame):
D(x,y,t)=|H(x,y,t)-H(x,y,t-Δt)| (5)
wherein, H (x, y, t) represents the gray scale value of the pixel (x, y) of the t-th frame calculation frame, and H (x, y, t- Δ t) represents the gray scale value of the pixel (x, y) of the comparison frame in time sequence of the t-th frame calculation frame.
A Gaussian kernel function is added to smooth the calculation: when the change amplitude D(x, y, t) of the gray value of a pixel point exceeds the gray threshold 127, the Gaussian kernel, weighted over the time-sequence neighborhood of the pixel, is used to increase the gray value gradually, the influence weight of the gray value being determined by the time-sequence distance Δt. Compared with the linear change in the original MHI function, this preserves the contour of the human motion track better and reduces the influence of noise and scene interference factors.
After the parameters of the Gaussian kernel function are determined through experiments, the calculation of the whole MHI only requires selecting the attenuation coefficient σ, whose choice determines how fast the gray value of a pixel decays. As shown in fig. 2, when the attenuation coefficient is too large, the MHI image can only record motions with a large amplitude, and when it is too small, the direction of the motion becomes difficult to judge. Attenuation coefficients of 10, 20, 30, 40 and 50 were compared, the length of the recorded motion trail varying accordingly; the experiments show that the MHI image reaches a relatively optimal state when the attenuation coefficient is chosen as 30.
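A minimal NumPy sketch of one GMHI update step in the spirit of formulas (2)-(5); the value of the frame influence factor ω and the scaling of the Gaussian increment to the 0-255 gray range are assumptions of this sketch rather than values fixed by the description.

import numpy as np

GRAY_THRESHOLD = 127  # set gray threshold from the description
SIGMA = 30            # rated attenuation coefficient chosen in the experiments
OMEGA = 2.0           # frame influence factor (assumed value for illustration)

def gmhi_update(mhi_prev, frame_t, frame_prev, delta_t):
    """One GMHI update: mhi_prev is the previous motion history image,
    frame_t and frame_prev are the calculation frame at the current moment and
    its comparison frame in the time sequence (float arrays in [0, 255])."""
    # Formula (5): gray-value difference between the two frames.
    d = np.abs(frame_t - frame_prev)
    # Gaussian kernel in the time-sequence domain (formula (2)).
    gauss = np.exp(-(delta_t ** 2) / (2.0 * OMEGA ** 2)) / (np.sqrt(2.0 * np.pi) * OMEGA)
    # Formulas (3)-(4): Gaussian-weighted increase where the change reaches the
    # threshold, decay by the rated attenuation coefficient elsewhere.
    increased = np.clip(mhi_prev + 255.0 * gauss, 0.0, 255.0)
    decayed = np.clip(mhi_prev - SIGMA, 0.0, 255.0)
    return np.where(d >= GRAY_THRESHOLD, increased, decayed)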
The basic steps of the GMHI algorithm are described as follows:
inputting: video stream data
And (3) outputting: exercise history map
step1, obtaining a calculation frame from video stream data by using an equal-interval sampling method;
step2. Update the calculation frame using formula (3): when the gray-value change between a pixel point of the calculation frame and the comparison frame in the time sequence reaches the gray threshold, accumulate the gray value of the pixel point; when the change is smaller than the gray threshold, decrease the gray value by the rated attenuation coefficient; the MHI image is obtained accordingly.
step3. repeat step1 to step2 until the video stream data stops being input;
step4. the algorithm ends.
The calculated MHI represents the foreground sequence in a compact manner: the sequence of contours belonging to an action is compressed into a single gray-scale image in which the most recent motion is represented by the lighter gray-value pixels, preserving the main motion information. The MHI is centered on the centroid of the detected foreground and scaled to a fixed size so as to obtain a scale- and location-invariant representation. An illumination- and contrast-invariant representation is obtained by dividing each pixel of MHIτ by the sum of all its pixels so that the result sums to one, as shown in formula (6):
MHI'τ(x, y) = MHIτ(x, y) / ( Σ_{i=1}^{M} Σ_{j=1}^{N} MHIτ(i, j) )   (6)
where M is the total number of rows of pixels in the MHI image and N is the total number of columns.
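A possible NumPy reading of the normalization in formula (6), assuming mhi holds the M × N motion history image as a float array:

import numpy as np

def normalize_mhi(mhi):
    # Divide every pixel by the sum over all M x N pixels so that the result
    # sums to one, giving the illumination- and contrast-invariant representation.
    total = mhi.sum()
    return mhi / total if total > 0 else mhi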
(4) Description of HOG features
The HOG feature descriptor is widely used for object detection and human action recognition in computer vision and describes the motion information of the contour edges in an MHI well. The gradients of the MHI are computed with the common one-dimensional operator [-1, 0, 1] and its transpose. According to the motion foreground obtained by segmentation, each MHI is scaled to a gray image of 48 × 104 pixels; for HOG feature extraction the cell size is therefore chosen as 4 × 4 pixels and the block size as 8 × 8 pixels. The gradient orientations of a cell are divided over 360° into 8 direction bins of 45° each, and L2 normalization is performed according to the gradient information and directions. Each block contains 4 cells with 8 direction bins each, i.e. a 32-dimensional feature vector; with 8 blocks in the horizontal direction and 6 blocks in the vertical direction, and the block as the scanning step, each image yields a 1536-dimensional feature vector.
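With the parameters listed above (48 × 104 gray image, 4 × 4-pixel cells, 8 × 8-pixel blocks, 8 orientation bins, L2 normalization), a feature vector of this kind could be computed with scikit-image as sketched below. Note that skimage's hog uses unsigned gradients and a one-cell block stride, so the 360° binning and the 1536-dimensional length reported in the description are only approximated; this is an illustrative sketch, not the inventors' implementation.

import cv2
import numpy as np
from skimage.feature import hog

def extract_hog(mhi):
    # Scale the segmented motion foreground to a 48 x 104 gray image.
    resized = cv2.resize(mhi.astype(np.uint8), (48, 104))
    # 8 orientation bins, 4 x 4-pixel cells, 2 x 2 cells (8 x 8 pixels) per block,
    # L2 block normalization.
    return hog(resized,
               orientations=8,
               pixels_per_cell=(4, 4),
               cells_per_block=(2, 2),
               block_norm='L2')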
(5) Change of motion state
A simple and efficient nearest-neighbor (NN) classifier is used to obtain the closest classification label; because of its simplicity it can be applied to a large number of classification problems and effectively classifies large action classes. During training and testing on the data set, the Euclidean distance between the HOG feature values obtained from the MHI images is calculated as in formula (7), the action label of the closest training sample is assigned to the test sample, and finally the matching success rate between action labels and samples is calculated. When the action label changes, the current frame is saved as a picture file and exported to the target folder.
d(p, q) = √( Σ_{i=1}^{n} (p_i - q_i)² )   (7)
Where p and q are HOG vectors for test and training samples and n is the length of the vector.
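A minimal nearest-neighbor step matching formula (7), assuming train_features and train_labels hold the HOG vectors and action labels of the training samples:

import numpy as np

def classify_nn(test_feature, train_features, train_labels):
    # Euclidean distance (formula (7)) between the test HOG vector and every
    # training HOG vector; the action label of the closest sample is returned.
    distances = np.linalg.norm(train_features - test_feature, axis=1)
    return train_labels[int(np.argmin(distances))]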
(6) Action key frame extraction based on the Gaussian-kernel-improved MHI and HOG features
The steps of the GMHKE algorithm are described as follows (an illustrative code sketch follows the step list):
inputting: video stream data
And (3) outputting: action key frame in picture form
step1, obtaining a calculation frame from video stream data by using an equal-interval sampling method;
step2, generating a historical motion map and performing motion segmentation on the calculation frame by using GMHI of a Gaussian kernel function, and separating a human motion foreground from a background;
step3, using HOG feature extraction to obtain image features in the calculation frame;
step4, using an NN classifier to perform motion recognition on the features, and if the motion state label is changed, exporting the current frame as a picture to be output as a motion key frame;
step5. Repeat steps 1 to 4 until no new video stream data is input.
step6. the algorithm ends.
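Putting the pieces together, a driver loop in the spirit of step1-step6 might look as follows; gmhi_update, extract_hog and classify_nn are the illustrative helpers sketched earlier, train_features and train_labels are assumed to hold the HOG vectors and labels of the training MHIs, and the video path and output file names are placeholders.

import cv2
import numpy as np

SAMPLING_STEP = 5
cap = cv2.VideoCapture("input_video.mp4")

mhi = None
prev_frame = None
prev_label = None
key_frame_id = 0
index = 0
while True:
    ok, frame = cap.read()
    if not ok:                        # step5/step6: stop when the stream ends
        break
    if index % SAMPLING_STEP == 0:    # step1: equal-interval calculation frame
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
        if prev_frame is not None:
            # step2: update the Gaussian-kernel motion history image
            if mhi is None:
                mhi = np.zeros_like(gray)
            mhi = gmhi_update(mhi, gray, prev_frame, SAMPLING_STEP)
            # step3: HOG features of the motion history image
            features = extract_hog(mhi)
            # step4: nearest-neighbor action label; export a key frame whenever
            # the motion state label changes
            label = classify_nn(features, train_features, train_labels)
            if prev_label is not None and label != prev_label:
                cv2.imwrite("keyframe_%d.png" % key_frame_id, frame)
                key_frame_id += 1
            prev_label = label
        prev_frame = gray
    index += 1
cap.release()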
Suppose a motion history image is calculated with an interval of M video-stream frames, N is the number of samples taken by the Gaussian kernel function combined with equal-interval frame-distance sampling, C is the dimension of the feature vector extracted by the HOG features, and n is the total number of actions in the data set. The complexity of the proposed GMHKE algorithm is O(2MNC²/n), while the complexity of action key frame extraction using the original MHI is O(2MNC²); the complexity analysis shows that the GMHKE algorithm is n times less complex than key frame extraction with the original MHI.
2. Experiment and analysis of results
(1) Human motion recognition database
The experiments use the MuHAVi database, a multi-view data set widely applied in the field of human motion recognition. The MuHAVi database uses 8 cameras at angles of 0° and 45° to record seven testers performing 14 human motion behaviors in different scenes, capturing 136 video files in total. The behaviors comprise Collapse left, Collapse right, Guard to Kick, Guard to Punch, Kick right, Punch right, Run left to right, Run right to left, Stand up left, Stand up right, Turn back left, Turn back right, Walk left to right and Walk right to left. These motion behaviors can be further grouped into eight classes, Collapse, Guard, Kick, Punch, Run, Stand up, Turn Back and Walk; for example, Run left to right and Run right to left are grouped into the Run action class. Samples of the motion behaviors in the video database are shown in fig. 3.
The experiments are based on the MuHAVi video data set and verify whether the GMHKE algorithm can effectively extract human action key frames. To show the performance of the algorithm, cross-validation is adopted: the video data of one tester are selected as the validation set and the video data of the remaining six testers are used as the training set to test the classifier.
(2) Experimental platform configuration
The server configuration used in this experiment was as follows:
Hardware platform: CPU, Intel(R) Core(TM) i5-8250U @ 1.60 GHz (1.80 GHz); GPU, NVIDIA GeForce GTX 1050 with Max-Q Design, 4 GB; solid-state disk, 256 GB; cloud platform, Titan XP, E5-1620 8-core, 32 GB, 2 TB hard disk.
Software platform: system environment, Windows 10 Home Chinese edition; CUDA 9.0; Python 3.5; TensorFlow 1.12.0; Keras 2.0.8; Jupyter Notebook.
data set: a MuHAVi video data set; youku network video data set.
(3) Simulation experiment results and analysis
1) Classification accuracy and efficiency of GMHKE algorithm
The GMHKE algorithm and the original MHI method are trained and validated by cross-validation on the 136 video sequences of the MuHAVi video data set. The accuracy and time consumption of GMHKE and of the original MHI are shown in Table 1. The GMHKE algorithm reaches an accuracy of 93.1%, against 91.9% for the original MHI, an overall improvement of 1.2 percentage points. In terms of time, training and testing the eight action classes of the 136 video sequences with the GMHKE algorithm uses 80873 frames in total and produces 2696 MHI images in 3197 s, i.e. 1.18 s per MHI image, whereas the original MHI has to process every frame, consumes 15175 s, and most of the MHI images it obtains are redundant data.
TABLE 1 accuracy and time consumption of GMHKE and original MHI
2) Recognition rate of the GMHKE algorithm for different behaviors
Table 2 shows the recognition rates of the 8 behavior classes in the 136 video sequences of the MuHAVi video data set obtained by the GMHKE algorithm and by the original MHI with the same HOG feature extraction. The recognition rate P(N) and the misrecognition rate P(N|M) of a behavior class are given by formulas (8) and (9):
P(N) = R(N) / Q(N)   (8)
P(N|M) = R(N|M) / Q(N)   (9)
where R(N) represents the number of class-N behaviors correctly recognized by the GMHKE algorithm, Q(N) is the total number of class-N behaviors labeled in the validation set, and R(N|M) is the number of class-N behaviors incorrectly recognized as class M.
As can be seen from the data in Table 2, the GMHKE algorithm improves the recognition rate most obviously for Collapse, Walk, Run and Kick, and less for Stand up, Guard, Punch and Turn Back. The reason is that extracting calculation frames at an equal frame distance makes the motion features more obvious, whereas in Stand up and Guard the human body is in a similar, low-motion, partly static state, so the two motion states are easily confused and the improvement is smaller; the Punch recognition rate is already good and leaves little room for improvement; and the Turn Back action is mostly small-scale self-motion, so the equal-interval frame-distance method may lose part of the motion information, making its recognition rate slightly lower than with the original MHI.
TABLE 2 Behavior recognition rates of the GMHKE algorithm and the original MHI
Tables 3 and 4 are the behavior recognition confusion matrices of the original MHI and of GMHKE. The diagonal of the matrix gives the probability that the m-th behavior is correctly recognized, and the entry in the n-th row of the m-th column gives the probability that the m-th behavior is recognized as the n-th behavior; when m is not equal to n this is a misrecognition. For example, the value 0.922 in the first row of the first column means that the Collapse behavior is correctly recognized as Collapse with an accuracy of 92.2%, and the value 0.070 in the third row of the first column means that the Collapse behavior is incorrectly recognized as Kick in 7% of the cases.
Table 3 confusion matrix of original MHI algorithm
TABLE 4 confusion matrix for GMHKE algorithm
3) Accuracy of the GMHKE algorithm in extracting action key frames
Each time, five video segments are randomly extracted from the MuHAVi validation set and the accuracy of action key frame extraction is averaged over three repetitions; five groups of experiments are completed in total. Table 5 shows the extraction rate obtained by comparing the number of action key frames extracted by the GMHKE algorithm with the number of action-change frames labeled in the data set. The extraction rule for an action key frame is: if the motion classification label of the current frame is detected to have changed with respect to the last action key frame, the current frame is taken as the latest action key frame and exported as a picture; otherwise the next round of detection continues.
TABLE 5 Key frame extraction accuracy of the GMHKE algorithm
4) Performance of the GMHKE algorithm on Internet test sets
Because the training set and the verification set belong to the same database, data sets from the Internet (Youku) are selected for testing to avoid over-fitting. Test set 1 is derived from a demonstration video of fitness movements containing 12 key movements, and test set 2 is derived from a demonstration video of the Keep fitness software containing 20 key movements. Tables 6-7 and figs. 4-5 show the results on the test sets. With the proposed method, the key frames at which the motion state changes can be extracted with high accuracy from the input video sequences of the test sets.
TABLE 6 accuracy of GMHKE algorithm in test set 1
TABLE 7 accuracy of GMHKE algorithm in test set 2
The invention provides the GMHKE algorithm, which processes video stream data with a GMHI improved by combining a Gaussian kernel function with equal-interval frame-distance sampling, extracts image features through HOG, uses an NN classifier to detect whether the action state label has changed, and extracts motion key frames according to that change. Experimental simulation verifies the practicality of the GMHKE algorithm, and a relatively ideal effect is obtained in action classification. Calculating with the Gaussian kernel function on calculation frames extracted at an equal frame distance makes the gray-value intensity of the motion history image change more smoothly and more robustly, the HOG features extracted from the gradient information of the MHI describe the change of the human motion state clearly, and, combined with the fast NN classifier for motion classification, this shows that the proposed method can smoothly extract action key frames from a video stream while meeting the action classification accuracy.
It should be noted that, although the above-mentioned embodiments of the present invention are illustrative, the present invention is not limited thereto, and thus the present invention is not limited to the above-mentioned embodiments. Other embodiments, which can be made by those skilled in the art in light of the teachings of the present invention, are considered to be within the scope of the present invention without departing from its principles.

Claims (6)

1. A method for extracting features of human action key frames in video stream is characterized by comprising the following steps:
step 1: acquiring a calculation frame from video stream data by using an equal interval sampling method;
step 2: generating a motion history image and performing motion segmentation on the calculation frame by using an improved motion history image algorithm based on a Gaussian kernel function, and separating the human motion foreground from the background to obtain the motion history image;
on the basis of the traditional motion history image algorithm, the improved motion history image algorithm based on the Gaussian kernel function increases or decreases the gray value of the calculation frame at the current moment by comparing the gray values of the corresponding pixel points of the calculation frame at the current moment and the comparison frame in the time sequence, namely:
if the difference between the gray values of the corresponding pixel points of the calculation frame at the current moment and the comparison frame in the time sequence is greater than or equal to the set gray threshold, the gray value of the pixel point of the calculation frame at the current moment is increased by a Gaussian-kernel weighted amount:
H(x,y,t) = H(x,y,t-Δt) + 255·exp(-Δt²/(2ω²))/(√(2π)·ω)
wherein ω represents a set frame influence factor, t represents the current time, and Δt represents the time difference between the calculation frame at the current time and the comparison frame in the time sequence;
if the difference between the gray values of the corresponding pixel points of the calculation frame at the current moment and the comparison frame in the time sequence is smaller than the set gray threshold, the gray value of the pixel point of the calculation frame at the current moment is reduced by the rated attenuation coefficient σ;
step 3: describing the motion information of the contour edges of the motion history image by using the histogram of oriented gradients feature, and extracting the image features of the calculation frame;
step 4: carrying out motion recognition on the image features by using the NN classifier, and outputting the calculation frame at the current moment as an action key frame when the motion state labels of the calculation frame at the current moment and the comparison frame in the time sequence differ.
2. The method as claimed in claim 1, wherein the method further comprises, before step 1: when the video stream data is collected, the median filter is used for carrying out preprocessing for eliminating noise on the video stream data.
3. The method as claimed in claim 1, wherein the comparison frames in time series are obtained from the video stream data by using an equal-interval sampling method.
4. The method as claimed in claim 3, wherein the sampling interval of the comparison frames in the time sequence is equal to or greater than the sampling interval of the calculation frames.
5. The method according to claim 1, wherein the threshold value of the gray scale level set in step2 is 127.
6. The method according to claim 1, wherein in step2, the attenuation coefficient σ is 30.
CN202011246020.9A 2020-11-10 2020-11-10 Feature extraction method for human action key frame in video stream Active CN112329656B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011246020.9A CN112329656B (en) 2020-11-10 2020-11-10 Feature extraction method for human action key frame in video stream

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011246020.9A CN112329656B (en) 2020-11-10 2020-11-10 Feature extraction method for human action key frame in video stream

Publications (2)

Publication Number Publication Date
CN112329656A (en) 2021-02-05
CN112329656B CN112329656B (en) 2022-05-10

Family

ID=74317337

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011246020.9A Active CN112329656B (en) 2020-11-10 2020-11-10 Feature extraction method for human action key frame in video stream

Country Status (1)

Country Link
CN (1) CN112329656B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113362324A (en) * 2021-07-21 2021-09-07 上海脊合医疗科技有限公司 Bone health detection method and system based on video image
CN113762114A (en) * 2021-08-27 2021-12-07 四川智胜慧旅科技有限公司 Personnel searching method and system based on outdoor video identification
CN113918769A (en) * 2021-10-11 2022-01-11 平安国际智慧城市科技股份有限公司 Method, device and equipment for marking key actions in video and storage medium
CN116805433A (en) * 2023-06-27 2023-09-26 北京奥康达体育科技有限公司 Human motion trail data analysis system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120314064A1 (en) * 2011-06-13 2012-12-13 Sony Corporation Abnormal behavior detecting apparatus and method thereof, and video monitoring system
CN106485245A (en) * 2015-08-24 2017-03-08 南京理工大学 A kind of round-the-clock object real-time tracking method based on visible ray and infrared image
CN110516609A (en) * 2019-08-28 2019-11-29 南京邮电大学 A kind of fire video detection and method for early warning based on image multiple features fusion
CN110555368A (en) * 2019-06-28 2019-12-10 西安理工大学 Fall-down behavior identification method based on three-dimensional convolutional neural network
CN110705412A (en) * 2019-09-24 2020-01-17 北京工商大学 Video target detection method based on motion history image
CN110781723A (en) * 2019-09-05 2020-02-11 杭州视鑫科技有限公司 Group abnormal behavior identification method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120314064A1 (en) * 2011-06-13 2012-12-13 Sony Corporation Abnormal behavior detecting apparatus and method thereof, and video monitoring system
CN106485245A (en) * 2015-08-24 2017-03-08 南京理工大学 A kind of round-the-clock object real-time tracking method based on visible ray and infrared image
CN110555368A (en) * 2019-06-28 2019-12-10 西安理工大学 Fall-down behavior identification method based on three-dimensional convolutional neural network
CN110516609A (en) * 2019-08-28 2019-11-29 南京邮电大学 A kind of fire video detection and method for early warning based on image multiple features fusion
CN110781723A (en) * 2019-09-05 2020-02-11 杭州视鑫科技有限公司 Group abnormal behavior identification method
CN110705412A (en) * 2019-09-24 2020-01-17 北京工商大学 Video target detection method based on motion history image

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
FIZA MURTAZA等: "Multi-view Human Action Recognition using 2D Motion Templates based on MHIs and their HOG Description", 《IET COMPUTER VISION》 *
JUNFENG SUN等: "Human Actions Recognition Using Improved MHI and 2-D Gabor Filter Based on Energy Blocks", 《PROCEEDINGS OF THE 2018 2ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE: TECHNOLOGIES AND APPLICATIONS (ICAITA 2018)》 *
YU LE: "Gesture Recognition Based on Motion History Image and Support Vector Machine", 《ELECTRONIC TECHNOLOGY & SOFTWARE ENGINEERING》 *
JI CHONGXIAO: "Research on Classroom Behavior Recognition Methods Based on Digital Image Processing", 《CHINA MASTERS' THESES FULL-TEXT DATABASE (SOCIAL SCIENCES II)》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113362324A (en) * 2021-07-21 2021-09-07 上海脊合医疗科技有限公司 Bone health detection method and system based on video image
CN113362324B (en) * 2021-07-21 2023-02-24 上海脊合医疗科技有限公司 Bone health detection method and system based on video image
CN113762114A (en) * 2021-08-27 2021-12-07 四川智胜慧旅科技有限公司 Personnel searching method and system based on outdoor video identification
CN113918769A (en) * 2021-10-11 2022-01-11 平安国际智慧城市科技股份有限公司 Method, device and equipment for marking key actions in video and storage medium
CN116805433A (en) * 2023-06-27 2023-09-26 北京奥康达体育科技有限公司 Human motion trail data analysis system
CN116805433B (en) * 2023-06-27 2024-02-13 北京奥康达体育科技有限公司 Human motion trail data analysis system

Also Published As

Publication number Publication date
CN112329656B (en) 2022-05-10

Similar Documents

Publication Publication Date Title
CN112329656B (en) Feature extraction method for human action key frame in video stream
Goldman et al. Precise detection in densely packed scenes
CN110929593B (en) Real-time significance pedestrian detection method based on detail discrimination
CN106778712B (en) Multi-target detection and tracking method
JP5604256B2 (en) Human motion detection device and program thereof
WO2009109127A1 (en) Real-time body segmentation system
CN111260738A (en) Multi-scale target tracking method based on relevant filtering and self-adaptive feature fusion
CN110298297A (en) Flame identification method and device
CN106295532B (en) A kind of human motion recognition method in video image
Zhang et al. License plate localization in unconstrained scenes using a two-stage CNN-RNN
Ahmed et al. Human detection using HOG-SVM, mixture of Gaussian and background contours subtraction
Wang et al. Fully convolutional network based skeletonization for handwritten chinese characters
CN110827265A (en) Image anomaly detection method based on deep learning
Rong et al. Scene text recognition in multiple frames based on text tracking
CN108961262B (en) Bar code positioning method in complex scene
CN111401308A (en) Fish behavior video identification method based on optical flow effect
Perreault et al. Centerpoly: Real-time instance segmentation using bounding polygons
CN106446832B (en) Video-based pedestrian real-time detection method
Gui et al. A fast caption detection method for low quality video images
Piérard et al. A probabilistic pixel-based approach to detect humans in video streams
CN108573217B (en) Compression tracking method combined with local structured information
Mizher et al. Action key frames extraction using l1-norm and accumulative optical flow for compact video shot summarisation
Li et al. Research on hybrid information recognition algorithm and quality of golf swing
Jaiswal et al. Survey paper on various techniques of recognition and tracking
CN113470073A (en) Animal center tracking method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant