CN113095295B - Fall detection method based on improved key frame extraction - Google Patents

Fall detection method based on improved key frame extraction

Info

Publication number
CN113095295B
CN113095295B
Authority
CN
China
Prior art keywords
key frame
falling
frame
key
behaviors
Prior art date
Legal status
Active
Application number
CN202110502441.1A
Other languages
Chinese (zh)
Other versions
CN113095295A (en)
Inventor
胡佳佳
李伟彤
Current Assignee
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN202110502441.1A priority Critical patent/CN113095295B/en
Publication of CN113095295A publication Critical patent/CN113095295A/en
Application granted granted Critical
Publication of CN113095295B publication Critical patent/CN113095295B/en


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G06V40/23 - Recognition of whole body movements, e.g. for sport training
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/23 - Clustering techniques
    • G06F18/232 - Non-hierarchical techniques
    • G06F18/2321 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/56 - Extraction of image or video features relating to colour
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Human Computer Interaction (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a fall detection method based on improved key frame extraction, which comprises the following steps. S1: acquire an unprocessed original video stream. S2: perform preliminary key frame extraction on the original video stream using the inter-frame difference method. S3: perform secondary optimization on the key frames generated in step S2 using a clustering algorithm to obtain the optimal key frames. S4: extract features from the optimal key frames and construct feature vectors. S5: use the extracted feature vectors as the input of a support vector machine (SVM) for an initial judgment, the SVM distinguishing non-falling behaviors, falling behaviors and falling-like behaviors. S6: perform secondary classification on the feature vectors whose initial result is falling-like behavior using a convolutional neural network, output the detection result, and complete the final fall detection. Compared with the traditional clustering method, the algorithm provided by the invention has lower redundancy, a higher recall ratio and higher accuracy, saving considerable time for subsequent fall detection and improving its accuracy.

Description

Fall detection method based on improved key frame extraction
Technical Field
The invention relates to the technical field of video monitoring, in particular to a fall detection method based on improved key frame extraction.
Background
As people age, their bodily functions gradually decline, and falls seriously threaten the life safety of the elderly; statistics show that falls have become a leading cause of accidental injury and death among the elderly, and the risk of death can be reduced by 80% if an elderly person who has fallen is rescued in time. Video-based fall detection is currently the mainstream approach; therefore, to judge the state of the elderly more quickly, useless frames in the video sequence can be removed, so that only key frames, which reflect the video content without losing the motion sequence, need to be examined.
How to retrieve valid, critical information from a large volume of video data within a prescribed time is currently a key problem that needs to be solved urgently. A key frame is one frame, or several frames, that reflects the main content of a shot, so it can concisely summarize the main visual content of a video; compared with the number of image frames contained in the original video, using key frames greatly reduces the data volume of the video index and thus provides good data preprocessing for later applications.
The four dominant methods for extracting key frames at present are: (1) shot-boundary-based methods, which typically extract frames at fixed positions of the shot as key frames and therefore cannot fully reflect the video content; (2) methods based on visual content analysis, which take the degree of change of the video content as the criterion for selecting key frames and treat frames with severe changes as key frames, but can produce a large number of redundant video frames while still expressing the video content incompletely; (3) methods based on motion analysis, which compute the amount of motion in the shot and select key frames where the motion reaches a local minimum; (4) clustering-based methods, whose drawbacks are that image data are high-dimensional, the amount of computation is large, the computation process is complex, memory overflow may occur, a large amount of redundancy can be generated, and efficiency is low.
The Chinese patent with publication number CN107220604A discloses a video-based fall detection method comprising the following steps: S1, processing the video image and identifying and locating the human body region in the image; S2, extracting joint points for the human body region based on a cascade regression network to obtain a set of human body joint points, where several regression networks with the same structure are cascaded behind a first-stage network to fine-tune the coordinate positions of the joint points; S3, taking the motion vector of each joint point as the feature of human motion and dynamically analyzing whether the human body falls by analyzing the changes of the joint points. That patent does not extract key frames from the video, so its overall decision process is not efficient enough.
Disclosure of Invention
The invention provides a fall detection method based on improved key frame extraction, which uses an improved key frame extraction technique to extract a small number of key frames that still fully reflect the video content, so that falling behavior can be discovered more quickly and elderly people who have fallen can be treated in time.
In order to solve the technical problems, the technical scheme of the invention is as follows:
a fall detection method based on improved key frame extraction comprises the following steps:
s1: acquiring an unprocessed original video stream;
s2: performing key frame extraction on the original video stream preliminarily by using an inter-frame difference method;
s3: performing secondary optimization on the key frames generated in the step S2 by using a clustering algorithm to obtain optimal key frames;
s4: extracting features from the optimal key frames, and constructing feature vectors;
s5: the extracted feature vector is used as the input of a Support Vector Machine (SVM) for initial judgment, and the support vector machine is used for distinguishing non-falling behaviors, falling behaviors and falling-like behaviors;
s6: and performing secondary classification on the feature vector with the distinguishing result being the fall-like behavior by using a convolutional neural network, outputting a detection result, and finishing final detection of the fall-like behavior.
Preferably, in step S2, the key frame extraction is performed on the original video stream preliminarily by using an inter-frame difference method, which specifically includes the following steps:
s2.1: reading an original video stream, and calculating the frame difference between a current frame and a previous frame;
s2.2: obtaining the average inter-frame difference from the result of step S2.1; specifically, the corresponding pixel values of the two frames are differenced, the differences are summed, and the sum is divided by the total number of pixels to obtain the average pixel variation;
s2.3: all frames of the original video stream are ordered according to the value of the average inter-frame difference, and the first n frames are selected as key frames.
Preferably, in step S3, the key frames generated in step S2 are secondarily optimized using a K-means clustering algorithm.
Preferably, the obtaining the optimal key frame in step S3 specifically includes the following steps:
s3.1: calculating a color feature vector of each key frame;
s3.2: taking a first frame image in an image data set formed by all key frames as an initial cluster center;
s3.3: respectively measuring the similarity between each remaining key frame and all current cluster centers; if the similarity is smaller than a threshold value, creating a new cluster for that key frame; if the similarity is greater than the threshold, adding the key frame to the existing cluster;
s3.4: repeating the step S3.3 until all the key frames are taken out;
s3.5: and after the clustering is completed, selecting the key frame nearest to the cluster center as the optimal key frame of the cluster video frame.
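A sketch of this single-pass clustering (steps S3.2 to S3.5) follows, assuming each candidate key frame has already been reduced to a normalized 72-bin HSV feature vector as described in the next paragraphs; the histogram-intersection similarity and the Euclidean nearest-to-center selection are assumptions, since the text does not fix these formulas, and the 0.7 threshold is taken from Embodiment 1.

```python
import numpy as np

def histogram_similarity(a, b):
    # Assumed similarity measure: normalized histogram intersection in [0, 1].
    return float(np.minimum(a, b).sum() / max(a.sum(), 1e-12))

def secondary_optimization(features, frames, threshold=0.7):
    """Single-pass clustering of candidate key frames (steps S3.2-S3.5)."""
    centers = [np.asarray(features[0], dtype=np.float64)]  # S3.2: first frame starts the first cluster
    members = [[0]]
    for i in range(1, len(features)):
        f = np.asarray(features[i], dtype=np.float64)
        sims = [histogram_similarity(f, c) for c in centers]
        best = int(np.argmax(sims))
        if sims[best] < threshold:        # S3.3: too dissimilar, so open a new cluster
            centers.append(f)
            members.append([i])
        else:                             # S3.3: similar enough, so join and re-average the cluster center
            members[best].append(i)
            centers[best] = np.mean([features[j] for j in members[best]], axis=0)
    # S3.5: from each cluster, keep the frame closest to its center as the optimal key frame.
    optimal = []
    for c, idx in zip(centers, members):
        dists = [np.linalg.norm(np.asarray(features[j]) - c) for j in idx]
        optimal.append(frames[idx[int(np.argmin(dists))]])
    return optimal
```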
Preferably, in step S3.1, a color feature vector of each key frame is calculated, specifically:
s3.1.1: the color space of each key frame image is converted from RGB to HSV;
s3.1.2: H, S and V are non-uniformly quantized in the ratio 8:3:3 to form a 72-dimensional color feature vector, where H ∈ [0, 360], S ∈ [0, 1], V ∈ [0, 1].
The HSV color feature vector Fi of each frame is expressed as:
Fi = 9H + 3S + V, i = 1, 2, ..., n.
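A sketch of this quantization is shown below, assuming OpenCV's 8-bit HSV convention (H in [0, 180), S and V in [0, 256)) and uniform level boundaries; the exact non-uniform 8:3:3 boundaries are not spelled out in the text, so the binning here is only illustrative.

```python
import cv2
import numpy as np

def hsv_feature_vector(frame_bgr):
    """72-bin HSV color feature: F = 9*H + 3*S + V with H in {0..7}, S and V in {0..2}."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    h, s, v = cv2.split(hsv)
    # Quantize H into 8 levels, S and V into 3 levels each (uniform split shown here,
    # as an assumption in place of the unspecified non-uniform boundaries).
    hq = np.minimum(h.astype(np.int32) * 8 // 180, 7)
    sq = np.minimum(s.astype(np.int32) * 3 // 256, 2)
    vq = np.minimum(v.astype(np.int32) * 3 // 256, 2)
    f = 9 * hq + 3 * sq + vq                      # per-pixel bin index in [0, 71]
    hist = np.bincount(f.ravel(), minlength=72).astype(np.float64)
    return hist / hist.sum()                      # normalized 72-dimensional color feature vector
```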
preferably, in step S3.3, when the similarity is greater than the threshold and the key frame is added to the existing cluster, the cluster center is recalculated by averaging over the members of that cluster.
Preferably, in step S3.3, a similarity measure is computed between each remaining key frame and all current cluster centers, the similarity measure being based on the inter-frame distance d(Fi, Fj) between the i-th and j-th frames, computed from the elements Fi(k), Fj(k) of their color feature vectors; the number of inter-frame distances satisfying d(Fi, Fj) > m + 2σ² is taken as the number K of key frames to be extracted, where m and σ² are respectively the mean and variance of the feature vectors of all n frames.
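The adaptive choice of K can be sketched as below, interpreting the inter-frame distance as the Euclidean distance between consecutive 72-dimensional feature vectors and m, σ² as the mean and variance of those distances (a scalar threshold requires scalar statistics); both interpretations are assumptions where the text is ambiguous.

```python
import numpy as np

def adaptive_keyframe_count(features):
    """K = number of inter-frame distances exceeding m + 2 * sigma^2."""
    feats = np.asarray(features, dtype=np.float64)
    # Assumed distance d(Fi, Fj): Euclidean distance between consecutive feature vectors.
    dists = np.linalg.norm(np.diff(feats, axis=0), axis=1)
    m, var = dists.mean(), dists.var()
    return int(np.sum(dists > m + 2 * var))
```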
Preferably, in step S4, features are extracted from the optimal key frame, and feature vectors are constructed, which specifically includes the following steps:
s4.1: extracting the aspect ratio Fr, the centroid Fcen, the width change rate Fcha and the longest intercept angle Fa of the human body contour from the optimal key frames, wherein the aspect ratio Fr is the aspect ratio of the circumscribed rectangle of the human body in the key frame, the centroid Fcen is the central position of the human body in the key frame, the width change rate Fcha describes how the width of the target person changes across key frames, and Fa is the longest intercept angle of the human body contour in the key frame;
s4.2: combining the above features, a feature vector F = [Fr, Fcen, Fcha, Fa] is constructed.
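A sketch of step S4 on a single silhouette is given below, assuming the human region has already been segmented into a binary mask (for example by background subtraction) and approximating the longest intercept angle by the orientation of an ellipse fitted to the contour; that approximation, and the use of both centroid coordinates in the vector, are assumptions rather than the exact definitions of the method.

```python
import cv2
import numpy as np

def body_features(mask, prev_width=None):
    """Build the feature vector of step S4 from a binary human-silhouette mask."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    c = max(contours, key=cv2.contourArea)              # largest contour is taken as the person
    x, y, w, h = cv2.boundingRect(c)
    fr = w / h                                          # Fr: aspect ratio of the circumscribed rectangle
    m = cv2.moments(c)
    cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]   # Fcen: centroid of the silhouette
    fcha = w / prev_width if prev_width else 1.0        # Fcha: width change rate vs. the previous key frame
    # Fa approximated by the orientation of an ellipse fitted to the contour
    # (an assumption; the method defines it via the longest intercept line of the contour).
    (_, _), (_, _), fa = cv2.fitEllipse(c)
    feature = np.array([fr, cx, cy, fcha, fa], dtype=np.float32)
    return feature, w                                   # w is carried forward to compute the next Fcha
```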
Preferably, the step S5 specifically includes the following steps:
s5.1: dividing a data set collected in advance into a training set and a testing set according to the ratio of 6:4, and training and testing the SVM, wherein the data set comprises characteristic vectors and label information of the current state;
s5.2: SVM model training part: the svm_train module of the Libsvm library is used; the SVM type is set to C_SVC and the kernel type to RBF, the gamma parameter of the RBF kernel is set to 2 and the loss (penalty) parameter of C_SVC is set to 1; the parameter prob is the training sample set and stores the total number of training samples, the fall labels of the samples and all feature vectors used for training;
s5.3: the SVM prediction part uses the svm_predict module of the Libsvm library, whose parameters include model and x, where model is the path of the trained model file and x is the sample to be detected; based on the training information in the model file, the prediction part takes samples from the test set in turn, classifies them and returns the classification result of each sample;
s5.4: the SVM carries out a primary classification of the data set into three results of non-falling behaviors, falling behaviors and falling-like behaviors, wherein the falling-like behaviors comprise: lying down, sitting down and standing up quickly.
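A sketch of this first-stage classifier follows, using scikit-learn's SVC (which wraps Libsvm) in place of direct svm_train/svm_predict calls, with the C_SVC/RBF configuration of step S5.2 (gamma = 2, penalty parameter C = 1); the 0/1/2 label encoding for non-falling, falling and falling-like behaviors is illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# X: feature vectors F = [Fr, Fcen, Fcha, Fa]; y: 0 = non-falling, 1 = falling, 2 = falling-like.
def train_first_stage(X, y):
    # 6:4 split of the pre-collected data set (step S5.1).
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.6, stratify=y, random_state=0)
    # C_SVC with an RBF kernel, gamma = 2 and penalty parameter C = 1 (step S5.2).
    clf = SVC(kernel="rbf", gamma=2.0, C=1.0)
    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_te)                  # step S5.3: classify each test sample in turn
    print("first-stage accuracy:", float(np.mean(pred == y_te)))
    return clf
```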
Preferably, the step S6 specifically includes the following steps:
s6.1: dividing the samples judged by the SVM in step S5 to be falling-like behavior into a training set and a test set at a ratio of 6:4;
s6.2: taking the training sample set in the step S6.1 as input of a convolutional neural network, and performing deep learning to obtain deep features capable of distinguishing normal behaviors and falling behaviors;
s6.3: the model classifier is a three-dimensional convolutional neural network with 16 weight layers: 13 convolutional layers and 3 fully connected layers, plus 5 pooling layers and a softmax classification layer, with a ReLU following every convolutional and fully connected layer; the picture resolution is 224 × 224, the initial learning rate of the model is set to 0.005, the learning-rate decay rate is 0.8, the weight decay is 0.0006, and the maximum number of iterations is 20K; all convolutional layers use 3D convolution kernels of size 3 × 3 × 3 with stride 1 × 1 × 1, and the numbers of convolution kernels are 64, 128, 256, 256, 512, 512, 512 in turn; the pooling layers use 3D max pooling with kernels of size 2 × 2 × 2 and strides of the same size;
s6.4: iterative training is continuously carried out to obtain a CNN model, a test set sample is input into the trained CNN model, classification is carried out by using softmax, a classification result is output, and final falling detection is completed.
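A compact PyTorch sketch in the spirit of step S6.3 is given below: a VGG-style 3D network with 13 convolutional layers, 5 max-pooling layers and 3 fully connected layers, a ReLU after every convolutional and fully connected layer, and 224 × 224 inputs. The per-block channel widths, the 32-frame clip length, the SGD optimizer and the adaptive pooling before the classifier are assumptions made to keep the sketch self-contained; softmax is applied by the loss function at training time.

```python
import torch
import torch.nn as nn

class Fall3DCNN(nn.Module):
    """VGG-style 3D CNN: 13 conv layers + 3 FC layers for the second-stage fall classification."""
    # Assumed (channels, number of conv layers) per block; the text lists 64, 128, 256, ..., 512.
    cfg = [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]

    def __init__(self, num_classes=2):
        super().__init__()
        layers, in_ch = [], 3
        for out_ch, n_convs in self.cfg:
            for _ in range(n_convs):
                # 3 x 3 x 3 kernels, stride 1 x 1 x 1, each followed by a ReLU.
                layers += [nn.Conv3d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
                           nn.ReLU(inplace=True)]
                in_ch = out_ch
            # 3D max pooling with a 2 x 2 x 2 kernel and stride.
            layers.append(nn.MaxPool3d(kernel_size=2, stride=2))
        self.features = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool3d((1, 7, 7))     # assumption: fixes the FC input size
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),               # class scores; softmax is applied by the loss
        )

    def forward(self, x):
        # x: (batch, 3, 32, 224, 224); clips of at least 32 frames are assumed so that
        # each of the five poolings can halve the temporal axis.
        return self.classifier(self.pool(self.features(x)))

model = Fall3DCNN()
# Hyperparameters from step S6.3: initial learning rate 0.005, weight decay 0.0006,
# learning-rate decay factor 0.8 applied via a scheduler; the 20K-iteration cap and
# the softmax classification belong to the training loop.
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, weight_decay=0.0006)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.8)
criterion = nn.CrossEntropyLoss()
```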
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
after the video data are collected, preliminary key frame extraction is first performed on the original video by the inter-frame difference method, whose calculation steps are simple, and a large number of similar frames are deleted; a clustering algorithm is then used for secondary optimization, which overcomes the drawbacks of the traditional method such as heavy computation, a complex computation process and large redundancy. Compared with the traditional clustering method, the algorithm provided by the invention has lower redundancy, a higher recall ratio and higher accuracy, saving considerable time for subsequent fall detection and improving its accuracy.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention.
Fig. 2 is a schematic diagram of an optimal key frame result in an embodiment.
Fig. 3 is a schematic diagram of human body characteristics in an embodiment.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the present patent;
for the purpose of better illustrating the embodiments, certain elements of the drawings may be omitted, enlarged or reduced and do not represent the actual product dimensions;
it will be appreciated by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical scheme of the invention is further described below with reference to the accompanying drawings and examples.
Example 1
The embodiment provides a fall detection method based on improved key frame extraction, as shown in fig. 1, comprising the following steps:
s1: acquiring an unprocessed original video stream;
s2: performing key frame extraction on the original video stream preliminarily by using an inter-frame difference method;
s3: performing secondary optimization on the key frames generated in the step S2 by using a clustering algorithm to obtain optimal key frames;
s4: extracting features from the optimal key frames, and constructing feature vectors;
s5: the extracted feature vector is used as the input of a Support Vector Machine (SVM) for initial judgment, and the support vector machine is used for distinguishing non-falling behaviors, falling behaviors and falling-like behaviors;
s6: and performing secondary classification on the feature vector with the distinguishing result being the fall-like behavior by using a convolutional neural network, outputting a detection result, and finishing final detection of the fall-like behavior.
In step S2, the key frame extraction is carried out on the original video stream preliminarily by utilizing an inter-frame difference method, and the method specifically comprises the following steps:
s2.1: reading an original video stream, and calculating the frame difference between a current frame and a previous frame;
s2.2: obtaining average interframe difference according to the result of the step S2.1;
s2.3: all frames of the original video stream are ordered according to the value of the average inter-frame difference, and the first n frames are selected as key frames.
And in the step S3, performing secondary optimization on the key frames generated in the step S2 by using a K-means clustering algorithm.
The obtaining of the optimal key frame in the step S3 specifically includes the following steps:
s3.1: calculating a color feature vector of each key frame;
s3.2: taking a first frame image in an image data set formed by all key frames as an initial cluster center;
s3.3: respectively measuring the similarity between each remaining key frame and all current cluster centers; if the similarity is smaller than a threshold value, creating a new cluster for that key frame; if the similarity is greater than the threshold, adding the key frame to the existing cluster; in this embodiment, the threshold is taken to be 0.7.
S3.4: repeating the step S3.3 until all the key frames are taken out;
s3.5: after the clustering is completed, the key frame nearest to the cluster center is selected as the optimal key frame of the cluster video frame, as shown in fig. 2.
In step S3.1, a color feature vector of each key frame is calculated, specifically:
s3.1.1: the color space of each key frame image is converted from RGB to HSV;
s3.1.2: H, S and V are non-uniformly quantized in the ratio 8:3:3 to form a 72-dimensional color feature vector, where H ∈ [0, 360], S ∈ [0, 1], V ∈ [0, 1].
The HSV color feature vector Fi of each frame is expressed as:
Fi = 9H + 3S + V, i = 1, 2, ..., n.
in step S3.3, when the similarity is greater than the threshold and the key frame is added to the existing cluster, the cluster center is recalculated by averaging over the members of that cluster.
In step S3.3, a similarity measure is computed between each remaining key frame and all current cluster centers, the similarity measure being based on the inter-frame distance d(Fi, Fj) between the i-th and j-th frames, computed from the elements Fi(k), Fj(k) of their color feature vectors; the number of inter-frame distances satisfying d(Fi, Fj) > m + 2σ² is taken as the number K of key frames to be extracted, where m and σ² are respectively the mean and variance of the feature vectors of all n frames.
In step S4, features are extracted from the optimal key frame, and feature vectors are constructed, which specifically includes the following steps:
s4.1: as shown in fig. 3, the aspect ratio Fr, the centroid Fcen, the width change rate Fcha, and the longest intercept angle Fa of the human body contour are extracted from the optimal key frame, wherein:
the aspect ratio Fr is the aspect ratio of the circumscribed rectangle of the human body in the key frame, and is smaller than 1 when the human body is upright;
the centroid Fcen is the central position of the human body in the key frame; movement of the human body inevitably causes the centroid to shift;
the width change rate Fcha describes how the width of the target person changes across key frames; during normal motion the width change rate does not exceed 1, whereas during an abnormal fall the width of the target person changes abruptly and the width change rate exceeds 1, so it can serve as a basis for distinguishing falls from other normal human motions;
the longest intercept angle Fa of the human body contour is the longest intercept-line angle feature in the key frame; in the standing posture the longest intercept angle is smaller than 90 degrees, so it can serve as a basis for distinguishing falling behavior from normal behavior;
s4.2: combining the above features, a feature vector F = [Fr, Fcen, Fcha, Fa] is constructed.
The step S5 specifically includes the following steps:
s5.1: dividing a data set collected in advance into a training set and a testing set according to the ratio of 6:4, and training and testing the SVM, wherein the data set comprises characteristic vectors and label information of the current state;
s5.2: SVM model training part: the svm_train module of the Libsvm library is used; the SVM type is set to C_SVC and the kernel type to RBF, the gamma parameter of the RBF kernel is set to 2 and the loss (penalty) parameter of C_SVC is set to 1; the parameter prob is the training sample set and stores the total number of training samples, the fall labels of the samples and all feature vectors used for training;
s5.3: the SVM prediction part uses the svm_predict module of the Libsvm library, whose parameters include model and x, where model is the path of the trained model file and x is the sample to be detected; based on the training information in the model file, the prediction part takes samples from the test set in turn, classifies them and returns the classification result of each sample;
s5.4: the SVM carries out a primary classification of the data set into three results of non-falling behaviors, falling behaviors and falling-like behaviors, wherein the falling-like behaviors comprise: lying down, sitting down and standing up quickly.
The step S6 specifically includes the following steps:
s6.1: dividing the samples judged by the SVM in step S5 to be falling-like behavior into a training set and a test set at a ratio of 6:4;
s6.2: taking the training sample set in the step S6.1 as input of a convolutional neural network, and performing deep learning to obtain deep features capable of distinguishing normal behaviors and falling behaviors;
s6.3: the model classifier is a three-dimensional convolutional neural network with 16 weight layers: 13 convolutional layers and 3 fully connected layers, plus 5 pooling layers and a softmax classification layer, with a ReLU following every convolutional and fully connected layer; the picture resolution is 224 × 224, the initial learning rate of the model is set to 0.005, the learning-rate decay rate is 0.8, the weight decay is 0.0006, and the maximum number of iterations is 20K; all convolutional layers use 3D convolution kernels of size 3 × 3 × 3 with stride 1 × 1 × 1, and the numbers of convolution kernels are 64, 128, 256, 256, 512, 512, 512 in turn; the pooling layers use 3D max pooling with kernels of size 2 × 2 × 2 and strides of the same size;
s6.4: iterative training is continuously carried out to obtain a CNN model, a test set sample is input into the trained CNN model, classification is carried out by using softmax, a classification result is output, and final falling detection is completed.
The same or similar reference numerals correspond to the same or similar components;
the terms describing the positional relationship in the drawings are merely illustrative, and are not to be construed as limiting the present patent;
it is to be understood that the above examples of the present invention are provided by way of illustration only and not by way of limitation of the embodiments of the present invention. Other variations or modifications of the above teachings will be apparent to those of ordinary skill in the art. It is not necessary here nor is it exhaustive of all embodiments. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the invention are desired to be protected by the following claims.

Claims (8)

1. The fall detection method based on the improved key frame extraction is characterized by comprising the following steps of:
s1: acquiring an unprocessed original video stream;
s2: performing key frame extraction on the original video stream preliminarily by using an inter-frame difference method;
s3: performing secondary optimization on the key frames generated in the step S2 by using a clustering algorithm to obtain optimal key frames;
s4: extracting features from the optimal key frames, and constructing feature vectors;
s5: the extracted feature vector is used as the input of a Support Vector Machine (SVM) for initial judgment, and the support vector machine is used for distinguishing non-falling behaviors, falling behaviors and falling-like behaviors;
s6: performing secondary classification on the feature vector of which the distinguishing result is the fall-like behavior by using a convolutional neural network, outputting a detection result, and finishing final detection of the fall-like behavior;
the obtaining of the optimal key frame in the step S3 specifically includes the following steps:
s3.1: calculating a color feature vector of each key frame;
s3.2: taking a first frame image in an image data set formed by all key frames as an initial cluster center;
s3.3: respectively measuring the similarity between each remaining key frame and all current cluster centers; if the similarity is smaller than a threshold value, creating a new cluster for that key frame; if the similarity is greater than the threshold, adding the key frame to the existing cluster;
s3.4: repeating the step S3.3 until all the key frames are taken out;
s3.5: after the clustering is completed, selecting a key frame nearest to the cluster center as an optimal key frame of the cluster video frame;
in step S3.1, a color feature vector of each key frame is calculated, specifically:
s3.1.1: converting the color space of the key frame image from RGB to HSV;
s3.1.2: non-uniformly quantizing H, S and V in the ratio 8:3:3 to form a 72-dimensional color feature vector, wherein H ∈ [0, 360], S ∈ [0, 1], V ∈ [0, 1];
the HSV color feature vector Fi of each frame is expressed as:
Fi = 9H + 3S + V, i = 1, 2, ..., n.
2. the fall detection method based on improved key frame extraction as claimed in claim 1, wherein in step S2, the key frame extraction is performed on the original video stream preliminarily by using an inter-frame difference method, specifically comprising the steps of:
s2.1: reading an original video stream, and calculating the frame difference between a current frame and a previous frame;
s2.2: obtaining average interframe difference according to the result of the step S2.1;
s2.3: all frames of the original video stream are ordered according to the value of the average inter-frame difference, and the first n frames are selected as key frames.
3. A fall detection method based on improved key frame extraction as claimed in claim 2, wherein in step S3 the key frames generated in step S2 are secondarily optimised using a K-means clustering algorithm.
4. A fall detection method based on improved key frame extraction as claimed in claim 3, wherein in step S3.3, when the similarity is greater than the threshold and the key frame is added to the existing cluster, the cluster center is recalculated by averaging over the members of that cluster.
5. The fall detection method based on improved key frame extraction as claimed in claim 4, wherein in step S3.3, a similarity measure is computed between each remaining key frame and all current cluster centers, the similarity measure being based on the inter-frame distance d(Fi, Fj) between the i-th and j-th frames, computed from the elements Fi(k), Fj(k) of their color feature vectors; the number of inter-frame distances satisfying d(Fi, Fj) > m + 2σ² is taken as the number K of key frames to be extracted, wherein m and σ² are respectively the mean and variance of the feature vectors of all n frames.
6. The fall detection method based on improved key frame extraction as claimed in claim 5, wherein in step S4, features are extracted from the optimal key frame, and feature vectors are constructed, specifically comprising the steps of:
s4.1: extracting the aspect ratio Fr, the centroid Fcen, the width change rate Fcha and the longest intercept angle Fa of the human body contour from the optimal key frames, wherein the aspect ratio Fr is the aspect ratio of the circumscribed rectangle of the human body in the key frame, the centroid Fcen is the central position of the human body in the key frame, the width change rate Fcha describes how the width of the target person changes across key frames, and Fa is the longest intercept angle of the human body contour in the key frame;
s4.2: combining the above features, constructing a feature vector F = [Fr, Fcen, Fcha, Fa].
7. The fall detection method based on improved key frame extraction as claimed in claim 6, wherein said step S5 specifically comprises the steps of:
s5.1: dividing a data set collected in advance into a training set and a testing set according to the ratio of 6:4, and training and testing the SVM, wherein the data set comprises characteristic vectors and label information of the current state;
s5.2: SVM model training part: the svm_train module of the Libsvm library is used; the SVM type is set to C_SVC and the kernel type to RBF, the gamma parameter of the RBF kernel is set to 2 and the loss (penalty) parameter of C_SVC is set to 1; the parameter prob is the training sample set and stores the total number of training samples, the fall labels of the samples and all feature vectors used for training;
s5.3: the SVM prediction part uses the svm_predict module of the Libsvm library, whose parameters include model and x, where model is the path of the trained model file and x is the sample to be detected; based on the training information in the model file, the prediction part takes samples from the test set in turn, classifies them and returns the classification result of each sample;
s5.4: the SVM carries out a primary classification of the data set into three results of non-falling behaviors, falling behaviors and falling-like behaviors, wherein the falling-like behaviors comprise: lying down, sitting down and standing up quickly.
8. The fall detection method based on improved key frame extraction as claimed in claim 7, wherein said step S6 specifically comprises the steps of:
s6.1: dividing the samples judged by the SVM in step S5 to be falling-like behavior into a training set and a test set at a ratio of 6:4;
s6.2: taking the training sample set in the step S6.1 as input of a convolutional neural network, and performing deep learning to obtain deep features for distinguishing normal behaviors and falling behaviors;
s6.3: the model classifier is a three-dimensional convolutional neural network with 16 weight layers: 13 convolutional layers and 3 fully connected layers, plus 5 pooling layers and a softmax classification layer, with a ReLU following every convolutional and fully connected layer; the picture resolution is 224 × 224, the initial learning rate of the model is set to 0.005, the learning-rate decay rate is 0.8, the weight decay is 0.0006, and the maximum number of iterations is 20K; all convolutional layers use 3D convolution kernels of size 3 × 3 × 3 with stride 1 × 1 × 1, and the numbers of convolution kernels are 64, 128, 256, 256, 512, 512, 512 in turn; the pooling layers use 3D max pooling with kernels of size 2 × 2 × 2 and strides of the same size;
s6.4: iterative training is continuously carried out to obtain a CNN model, a test set sample is input into the trained CNN model, classification is carried out by using softmax, a classification result is output, and final falling detection is completed.
CN202110502441.1A 2021-05-08 2021-05-08 Fall detection method based on improved key frame extraction Active CN113095295B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110502441.1A CN113095295B (en) 2021-05-08 2021-05-08 Fall detection method based on improved key frame extraction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110502441.1A CN113095295B (en) 2021-05-08 2021-05-08 Fall detection method based on improved key frame extraction

Publications (2)

Publication Number Publication Date
CN113095295A CN113095295A (en) 2021-07-09
CN113095295B true CN113095295B (en) 2023-08-18

Family

ID=76664785

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110502441.1A Active CN113095295B (en) 2021-05-08 2021-05-08 Fall detection method based on improved key frame extraction

Country Status (1)

Country Link
CN (1) CN113095295B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113627342B (en) * 2021-08-11 2024-04-12 人民中科(济南)智能技术有限公司 Method, system, equipment and storage medium for video depth feature extraction optimization


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017000465A1 (en) * 2015-07-01 2017-01-05 中国矿业大学 Method for real-time selection of key frames when mining wireless distributed video coding
CN110555368A (en) * 2019-06-28 2019-12-10 西安理工大学 Fall-down behavior identification method based on three-dimensional convolutional neural network
CN110427825A (en) * 2019-07-01 2019-11-08 上海宝钢工业技术服务有限公司 The video flame recognition methods merged based on key frame with quick support vector machines
CN110532850A (en) * 2019-07-02 2019-12-03 杭州电子科技大学 A kind of fall detection method based on video artis and hybrid classifer
CN110826491A (en) * 2019-11-07 2020-02-21 北京工业大学 Video key frame detection method based on cascading manual features and depth features

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yong Chen et al.; "Vision-Based Fall Event Detection in Complex Background Using Attention Guided Bi-Directional LSTM"; IEEE Access; pp. 161337-161348 *

Also Published As

Publication number Publication date
CN113095295A (en) 2021-07-09

Similar Documents

Publication Publication Date Title
US11809485B2 (en) Method for retrieving footprint images
CN110619369B (en) Fine-grained image classification method based on feature pyramid and global average pooling
CN108898137B (en) Natural image character recognition method and system based on deep neural network
CN110334765B (en) Remote sensing image classification method based on attention mechanism multi-scale deep learning
CN111680614B (en) Abnormal behavior detection method based on video monitoring
CN111178208A (en) Pedestrian detection method, device and medium based on deep learning
CN115661943B (en) Fall detection method based on lightweight attitude assessment network
CN104504362A (en) Face detection method based on convolutional neural network
CN112070044B (en) Video object classification method and device
CN105184298A (en) Image classification method through fast and locality-constrained low-rank coding process
CN109903339B (en) Video group figure positioning detection method based on multi-dimensional fusion features
CN110321805B (en) Dynamic expression recognition method based on time sequence relation reasoning
CN112861917B (en) Weak supervision target detection method based on image attribute learning
CN113920400A (en) Metal surface defect detection method based on improved YOLOv3
Islam et al. InceptB: a CNN based classification approach for recognizing traditional bengali games
CN111126240A (en) Three-channel feature fusion face recognition method
CN110008899B (en) Method for extracting and classifying candidate targets of visible light remote sensing image
CN110381392A (en) A kind of video abstraction extraction method and its system, device, storage medium
CN113095295B (en) Fall detection method based on improved key frame extraction
CN111310787B (en) Brain function network multi-core fuzzy clustering method based on stacked encoder
CN114220143A (en) Face recognition method for wearing mask
CN115049952A (en) Juvenile fish limb identification method based on multi-scale cascade perception deep learning network
CN116385430A (en) Machine vision flaw detection method, device, medium and equipment
CN111340213A (en) Neural network training method, electronic device, and storage medium
CN113850182A (en) Action identification method based on DAMR-3 DNet

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant