CN115713806A - Falling behavior identification method based on video classification and electronic equipment - Google Patents

Info

Publication number
CN115713806A
Authority
CN
China
Prior art keywords
human body
key points
skeleton
model
human
Prior art date
Legal status
Pending
Application number
CN202211421984.1A
Other languages
Chinese (zh)
Inventor
戴林
王汝杰
陈东亮
瞿关明
孙娜娇
陈磊
王思俊
Current Assignee
Tiandy Technologies Co Ltd
Original Assignee
Tiandy Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Tiandy Technologies Co Ltd filed Critical Tiandy Technologies Co Ltd
Priority to CN202211421984.1A
Publication of CN115713806A
Priority to PCT/CN2023/096822 (WO2024103682A1)
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a falling behavior identification method based on video classification and electronic equipment. It belongs to the technical field of video monitoring and addresses the low identification accuracy of existing fall detection techniques. The method comprises the following steps: detecting human skeleton key points in a video frame image to be identified; tracking a human body target based on the key points, and acquiring the motion track of the target and the motion change process of the key points; identifying a falling behavior in a human skeleton timing diagram by using an ST-GCN model to obtain a first identification result; identifying a falling behavior in the video sequence by using an S3DFAST dual-stream model to obtain a second identification result; and comprehensively judging the two results to obtain a final fall identification result.

Description

Falling behavior identification method based on video classification and electronic equipment
Technical Field
The invention relates to the technical field of video monitoring, in particular to a falling behavior identification method based on video classification and electronic equipment.
Background
With the development of artificial intelligence technology, target recognition has been applied in many fields. Beyond identifying an object's category, recognition technology can also identify its behavior, and fall detection is an important behavior recognition technology.
Image-based fall detection techniques have found widespread use in many areas, such as daily life, traffic, and security. However, current image-based fall detection uses only the spatial dimension information of images and ignores the time dimension: the feature dimension is single and the information incomplete, so identification accuracy is low.
Disclosure of Invention
The invention aims to provide a falling behavior identification method based on video classification and electronic equipment, solving the problem of low identification accuracy in fall detection technology.
In a first aspect, the invention provides a fall behavior identification method based on video classification, which includes:
detecting human skeleton key points in a video frame image to be identified;
tracking a human body target based on the human body skeleton key points, and acquiring a motion track of the human body target and a motion change process of the human body skeleton key points;
identifying a falling behavior in a human skeleton timing diagram by using an ST-GCN model to obtain a first identification result;
identifying a falling behavior in the video sequence by using an S3DFAST double-flow model to obtain a second identification result;
and comprehensively judging the first identification result and the second identification result to obtain a final falling identification result.
Further, the step of detecting the key points of the human skeleton in the video frame image to be identified comprises the following steps:
and detecting the video frame image to be recognized by utilizing an OpenPose model to obtain the key points and positions of the human skeleton.
Further, the step of tracking the human body target based on the human body skeleton key points to obtain the motion trail of the human body target and the motion change process of the human body skeleton key points comprises the following steps:
and tracking the human body target by using Deepsort based on the key points of the human body skeleton, and acquiring the motion trail of the human body target and the motion change process of the key points of the human body skeleton.
Further, the training process of the ST-GCN model comprises the following steps:
constructing a deep learning training set and a testing set from a fallen human skeleton sample and a normal activity human skeleton sample;
and training the ResNet-50-based network model by using an ST-GCN framework based on the training set and the test set to obtain the ST-GCN model.
Furthermore, the loss function of the network model is a standard cross entropy loss function, and the parameter learning uses a standard random gradient descent algorithm.
Further, the training process of the S3DFAST dual-stream model includes:
constructing a deep learning training set and a testing set from the falling short video and the normal activity short video;
and training a double-flow-based network model by using an S3DFAST framework based on the training set and the test set to obtain an S3DFAST double-flow model.
Further, the loss function of the network model is a cross entropy loss function, and the parameter learning uses an adaptive learning rate gradient descent algorithm.
In a second aspect, the present invention further provides an electronic device, which includes a memory and a processor, where the memory stores a computer program operable on the processor, and the processor implements the steps of the above method when executing the computer program.
In a third aspect, the invention also provides a computer readable storage medium having stored thereon computer executable instructions which, when invoked and executed by a processor, cause the processor to execute the method.
The method for recognizing the falling behavior based on the video classification comprises the steps of firstly detecting key points of a human skeleton, then tracking a human target, and obtaining a motion track of the human target and a motion change process of the key points of the human skeleton. Then, a first identification result with space dimension information is obtained by utilizing an ST-GCN model to identify the falling behavior in a human skeleton timing diagram, a second identification result with time dimension information is obtained by utilizing an S3DFAST double-flow model to identify the falling behavior in a video sequence, and finally comprehensive judgment is carried out to obtain a final falling identification result.
Accordingly, the electronic device and the computer-readable storage medium provided by the embodiments of the invention also have the above technical effects.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart of a fall behavior recognition method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a training process of an ST-GCN model according to an embodiment of the present invention;
fig. 3 is a flowchart of a training process of the S3DFAST dual-stream model according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "comprising" and "having," and any variations thereof, as referred to in embodiments of the present invention, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements but may alternatively include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The current fall detection technology based on images has the following defects:
(1) Only the space dimension information of the image is used, the time dimension information is not used, the characteristic dimension is single, and the information is incomplete;
(2) The features learned from spatial dimension information alone are not comprehensive, and key features that characterize the behavior must be found manually; for example, identifying a fall requires combining features such as the height and speed of the human body, which increases the manual effort of feature selection;
(3) The incompleteness of the information causes the error rate of the behavior recognition algorithm to be high and the recognition rate to be low.
Based on this, an embodiment of the present invention provides a fall behavior identification method based on video classification, as shown in fig. 1, the method includes the following steps:
s1: and detecting the key points of the human skeleton in the video frame image to be identified.
S2: tracking a human body target based on the human body skeleton key points, and acquiring a motion track of the human body target and a motion change process of the human body skeleton key points;
s3: recognizing a falling behavior in a human skeleton timing diagram by using an ST-GCN model to obtain a first recognition result;
s4: recognizing a falling behavior in the video sequence by using an S3DFAST double-flow model to obtain a second recognition result;
s5: and comprehensively judging the first recognition result and the second recognition result to obtain a final fall recognition result.
According to the embodiment of the invention, comprehensive judgment is carried out according to the first identification result with the space dimension information and the second identification result with the time dimension information to obtain the final fall identification result, and the space dimension information and the time dimension information are integrated, so that the information is more comprehensive, and the problem of lower identification accuracy in the prior art is solved. In addition, key features which can represent the behaviors do not need to be found manually, and manual consumption of manually selecting the features is saved, so that the embodiment of the invention has the advantages of high identification precision, good real-time performance and multi-scene practicability.
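As a minimal sketch, the five steps S1 to S5 above can be chained into one pipeline. The four callables below are hypothetical stand-ins for the key point detector, the tracker, and the two recognition models; they are not real APIs from OpenPose, DeepSORT, or this patent.

```python
def recognize_fall(frames, detect_keypoints, track_targets,
                   st_gcn_predict, s3dfast_predict):
    """Sketch of steps S1-S5; each callable stands in for a model."""
    # S1: detect human skeleton key points in each video frame
    keypoints = [detect_keypoints(f) for f in frames]
    # S2: track human targets to obtain trajectories and key-point motion
    tracks = track_targets(keypoints)
    # S3: first result, from the skeleton sequence (spatial information)
    first = st_gcn_predict(tracks)
    # S4: second result, from the raw video sequence (temporal information)
    second = s3dfast_predict(frames)
    # S5: comprehensive judgment; the description suggests an intersection,
    # i.e. report a fall only when both models agree
    return first and second
```

With boolean model outputs the final decision is simply the conjunction of the two results; richer outputs, such as per-target identifiers or scores, would be fused analogously.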
In a possible embodiment, the step S1 includes:
for a section of monitoring image video, detecting a video frame image to be recognized by utilizing an OpenPose model to obtain a human skeleton key point and a position of the human skeleton key point.
The OpenPose human posture recognition model is an open-source library built on the Caffe framework using convolutional neural networks and supervised learning. It can estimate the poses of human body movements, facial expressions, and finger motions; it works for both single and multiple persons, is highly robust, and was the world's first real-time multi-person two-dimensional pose estimation application based on deep learning.
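For illustration only (this is not the real OpenPose interface), a detected person can be represented as 18 key points, each an (x, y, confidence) triple, matching the 18 joint points used by the ST-GCN input later in the description:

```python
# COCO-style OpenPose output has 18 body key points per person.
NUM_KEYPOINTS = 18

def parse_person(flat):
    """Turn a flat [x0, y0, c0, x1, y1, c1, ...] list into a list of
    (x, y, confidence) triples, one per key point."""
    assert len(flat) == NUM_KEYPOINTS * 3, "expected 18 * 3 values"
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]
```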
In a possible implementation, the step S2 includes:
and tracking the human body target by using Deepsort based on the key points of the human body skeleton, and acquiring the motion trail of the human body target and the motion change process of the key points of the human body skeleton. The Deepsort algorithm is an improved algorithm on the basis of the sort algorithm, cascade Matching (Matching Cascade) and confirmation (confirmed) of a new track are added, and tracking of a human body target is more accurate.
In one possible embodiment, as shown in fig. 2, the training process of the ST-GCN model includes:
and constructing a deep learning training set and a testing set and a human skeleton space-time diagram by using the fallen human skeleton sample and the normal activity human skeleton sample.
The ResNet-50 based network model is trained using the ST-GCN (space-time graph convolutional neural network) framework based on a training set and a test set.
Configuring the deep learning training parameters: the PyTorch model file is about 0.5 MB. ST-GCN is the combination of a TCN (Temporal Convolutional Network) and a GCN (Graph Convolutional Network). The data dimension of the model input is (N, C, T, V, M), for example (256, 3, 32, 18, 2).
Wherein: n represents the number of videos of one batch (batch size = 256); c represents a joint feature (3); t represents the number of key frames (32); v represents the number of joints (18 joint points); m represents the number of people in a frame (2).
The loss function uses the standard cross-entropy loss (CrossEntropyLoss), and parameter learning uses standard Stochastic Gradient Descent (SGD).
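The standard cross-entropy loss named here can be written out directly; this stand-alone version applies a numerically stabilized softmax and takes the negative log probability of the true class:

```python
import math

def cross_entropy(logits, label):
    """-log(softmax(logits)[label]), as computed by CrossEntropyLoss."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return -math.log(exps[label] / sum(exps))
```

SGD then updates each model parameter against the gradient of this loss averaged over the batch.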
This finally yields an ST-GCN human skeleton falling behavior identification model.
As shown in fig. 3, in one possible implementation, the training process of the S3DFAST dual-stream model includes:
and constructing a deep learning training set and a testing set by the falling short video and the normal activity short video, and constructing a human skeleton space-time diagram.
Based on the training set and the test set, a dual stream based network model is trained using the S3DFAST framework.
Configuring the deep learning training parameters: the PyTorch model file is about 2.9 MB. The model structure has two streaming pathways: a Fast pathway, mainly used to capture motion information, and a Slow pathway, mainly used to capture detail information. A single input sample has size 3 × 112; the input video segment is 8 frames for the Fast pathway and 16 frames for the Slow pathway. The loss function uses cross entropy, and parameter learning uses an adaptive-learning-rate gradient descent algorithm (Adam).
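The text gives the pathway lengths (8 frames for the Fast pathway, 16 for the Slow) but not the sampling scheme; a common choice, assumed here, is uniform-stride sampling over the video segment:

```python
def sample_frames(clip, n):
    """Pick n frames spread uniformly over the clip."""
    step = len(clip) / n
    return [clip[int(i * step)] for i in range(n)]

# Per the description, the same segment feeds both pathways:
# 8 frames to the Fast pathway and 16 frames to the Slow pathway.
```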
This finally yields an S3DFAST falling behavior identification model.
In a possible implementation, the step S5 includes:
and comprehensively judging the first identification result with the space dimension information and the second identification result with the time dimension information, for example, taking the intersection of the two identification results to obtain a final fall identification result.
An embodiment of the present invention further provides an electronic device, which includes a memory and a processor, where the memory stores a computer program that can run on the processor, and the processor implements the steps of the method when executing the computer program.
In accordance with the above method, embodiments of the present invention also provide a computer readable storage medium storing machine executable instructions, which when invoked and executed by a processor, cause the processor to perform the steps of the above method.
The apparatus provided by the embodiment of the present invention may be specific hardware on the device, or software or firmware installed on the device, etc. The device provided by the embodiment of the present invention has the same implementation principle and technical effect as the method embodiments, and for the sake of brief description, reference may be made to the corresponding contents in the method embodiments without reference to the device embodiments. It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the foregoing systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The functions may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on such understanding, the technical solution of the present invention or a part thereof which substantially contributes to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above-mentioned embodiments are only specific embodiments of the present invention, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art can, within the technical scope of the present disclosure, modify or easily conceive changes to the technical solutions described in the foregoing embodiments, or make equivalent substitutions for some technical features; such modifications, changes, or substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention and are intended to be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (9)

1. A fall behavior identification method based on video classification is characterized by comprising the following steps:
detecting human skeleton key points in a video frame image to be identified;
tracking a human body target based on the human body skeleton key points, and acquiring a motion track of the human body target and a motion change process of the human body skeleton key points;
recognizing a falling behavior in a human skeleton timing diagram by using an ST-GCN model to obtain a first recognition result;
recognizing a falling behavior in the video sequence by using an S3DFAST double-flow model to obtain a second recognition result;
and comprehensively judging the first recognition result and the second recognition result to obtain a final fall recognition result.
2. The method according to claim 1, wherein the step of detecting the key points of the human skeleton in the video frame image to be identified comprises:
and detecting the video frame image to be recognized by utilizing an OpenPose model to obtain the key points and positions of the human skeleton.
3. The method according to claim 1, wherein the step of tracking the human body target based on the human body skeleton key points to obtain the motion trail of the human body target and the motion change process of the human body skeleton key points comprises:
and tracking the human body target by using Deepsort based on the key points of the human body skeleton, and acquiring the motion trail of the human body target and the motion change process of the key points of the human body skeleton.
4. The method according to claim 1, wherein the training process of the ST-GCN model comprises:
constructing a deep learning training set and a testing set from a fallen human skeleton sample and a normal activity human skeleton sample;
and training the ResNet-50-based network model by using an ST-GCN framework based on the training set and the test set to obtain the ST-GCN model.
5. The method of claim 4, wherein the loss function of the network model is a standard cross entropy loss function, and wherein the parameter learning uses a standard stochastic gradient descent algorithm.
6. The method of claim 1, wherein the training process of the S3DFAST dual-stream model comprises:
constructing a deep learning training set and a testing set from the falling short video and the normal activity short video;
and training a double-flow-based network model by using an S3DFAST framework based on the training set and the test set to obtain an S3DFAST double-flow model.
7. The method of claim 6, wherein the loss function of the network model is a cross-entropy loss function, and wherein the parameter learning uses an adaptive learning rate gradient descent algorithm.
8. An electronic device comprising a memory and a processor, wherein the memory stores a computer program operable on the processor, and wherein the processor implements the steps of the method of any of claims 1 to 7 when executing the computer program.
9. A computer readable storage medium, having stored thereon computer executable instructions, which when invoked and executed by a processor, cause the processor to execute the method of any of claims 1 to 7.
CN202211421984.1A 2022-11-14 2022-11-14 Falling behavior identification method based on video classification and electronic equipment Pending CN115713806A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211421984.1A CN115713806A (en) 2022-11-14 2022-11-14 Falling behavior identification method based on video classification and electronic equipment
PCT/CN2023/096822 WO2024103682A1 (en) 2022-11-14 2023-05-29 Fall behavior identification method based on video classification and electronic device


Publications (1)

Publication Number Publication Date
CN115713806A true CN115713806A (en) 2023-02-24

Family

ID=85233085

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211421984.1A Pending CN115713806A (en) 2022-11-14 2022-11-14 Falling behavior identification method based on video classification and electronic equipment

Country Status (2)

Country Link
CN (1) CN115713806A (en)
WO (1) WO2024103682A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024103682A1 (en) * 2022-11-14 2024-05-23 天地伟业技术有限公司 Fall behavior identification method based on video classification and electronic device

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN113642361B (en) * 2020-05-11 2024-01-23 杭州萤石软件有限公司 Fall behavior detection method and equipment
CN113743293B (en) * 2021-09-02 2023-11-24 泰康保险集团股份有限公司 Fall behavior detection method and device, electronic equipment and storage medium
CN114463844A (en) * 2022-01-12 2022-05-10 三峡大学 Fall detection method based on self-attention double-flow network
CN114511931A (en) * 2022-02-22 2022-05-17 平安科技(深圳)有限公司 Action recognition method, device and equipment based on video image and storage medium
CN115713806A (en) * 2022-11-14 2023-02-24 天地伟业技术有限公司 Falling behavior identification method based on video classification and electronic equipment


Also Published As

Publication number Publication date
WO2024103682A1 (en) 2024-05-23


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination