CN115713806A - Falling behavior identification method based on video classification and electronic equipment - Google Patents

Info

Publication number
CN115713806A
Authority
CN
China
Prior art keywords
human body
key points
skeleton
model
human
Prior art date
Legal status
Pending
Application number
CN202211421984.1A
Other languages
Chinese (zh)
Inventor
戴林
王汝杰
陈东亮
瞿关明
孙娜娇
陈磊
王思俊
Current Assignee
Tiandy Technologies Co Ltd
Original Assignee
Tiandy Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Tiandy Technologies Co Ltd filed Critical Tiandy Technologies Co Ltd
Priority to CN202211421984.1A
Publication of CN115713806A
Priority to PCT/CN2023/096822 (WO2024103682A1)
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a falling behavior identification method based on video classification and electronic equipment. It belongs to the technical field of video monitoring and addresses the low identification accuracy of existing fall detection techniques. The method comprises the following steps: detecting human skeleton key points in a video frame image to be identified; tracking a human body target based on the key points, and acquiring the motion track of the target and the motion change process of the key points; identifying a falling behavior in a human skeleton timing diagram by using an ST-GCN model to obtain a first identification result; identifying a falling behavior in the video sequence by using an S3DFAST dual-stream model to obtain a second identification result; and comprehensively judging the two results to obtain a final fall identification result.

Description

Falling behavior identification method based on video classification and electronic equipment
Technical Field
The invention relates to the technical field of video monitoring, in particular to a falling behavior identification method based on video classification and electronic equipment.
Background
With the development of artificial intelligence technology, target recognition has been applied in many fields. Beyond identifying an object's category, recognition technology can also identify its behavior, and fall detection is an important behavior recognition technology.
Image-based fall detection techniques have found widespread use in many areas, such as daily life, traffic, and security. However, current image-based fall detection uses only the spatial dimension information of images and ignores the time dimension: the feature dimension is single and the information incomplete, so identification accuracy is low.
Disclosure of Invention
The invention aims to provide a falling behavior identification method based on video classification and electronic equipment, solving the problem of low identification accuracy in fall detection technology.
In a first aspect, the invention provides a fall behavior identification method based on video classification, which includes:
detecting human skeleton key points in a video frame image to be identified;
tracking a human body target based on the human body skeleton key points, and acquiring a motion track of the human body target and a motion change process of the human body skeleton key points;
identifying a falling behavior in a human skeleton timing diagram by using an ST-GCN model to obtain a first identification result;
identifying a falling behavior in the video sequence by using an S3DFAST double-flow model to obtain a second identification result;
and comprehensively judging the first identification result and the second identification result to obtain a final falling identification result.
Further, the step of detecting the key points of the human skeleton in the video frame image to be identified comprises the following steps:
and detecting the video frame image to be recognized by utilizing an OpenPose model to obtain the key points and positions of the human skeleton.
Further, the step of tracking the human body target based on the human body skeleton key points to obtain the motion trail of the human body target and the motion change process of the human body skeleton key points comprises the following steps:
and tracking the human body target by using Deepsort based on the key points of the human body skeleton, and acquiring the motion trail of the human body target and the motion change process of the key points of the human body skeleton.
Further, the training process of the ST-GCN model comprises the following steps:
constructing a deep learning training set and a testing set from a fallen human skeleton sample and a normal activity human skeleton sample;
and training the ResNet-50-based network model by using an ST-GCN framework based on the training set and the test set to obtain the ST-GCN model.
Furthermore, the loss function of the network model is a standard cross entropy loss function, and the parameter learning uses a standard random gradient descent algorithm.
Further, the training process of the S3DFAST dual-stream model includes:
constructing a deep learning training set and a testing set from the falling short video and the normal activity short video;
and training a double-flow-based network model by using an S3DFAST framework based on the training set and the test set to obtain an S3DFAST double-flow model.
Further, the loss function of the network model is a cross entropy loss function, and the parameter learning uses an adaptive learning rate gradient descent algorithm.
In a second aspect, the present invention further provides an electronic device, which includes a memory and a processor, where the memory stores a computer program operable on the processor, and the processor implements the steps of the above method when executing the computer program.
In a third aspect, the invention also provides a computer readable storage medium having stored thereon computer executable instructions which, when invoked and executed by a processor, cause the processor to execute the method.
The method for recognizing the falling behavior based on the video classification comprises the steps of firstly detecting key points of a human skeleton, then tracking a human target, and obtaining a motion track of the human target and a motion change process of the key points of the human skeleton. Then, a first identification result with space dimension information is obtained by utilizing an ST-GCN model to identify the falling behavior in a human skeleton timing diagram, a second identification result with time dimension information is obtained by utilizing an S3DFAST double-flow model to identify the falling behavior in a video sequence, and finally comprehensive judgment is carried out to obtain a final falling identification result.
Accordingly, the electronic device and the computer-readable storage medium provided by the embodiments of the invention also have the above technical effects.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart of a fall behavior recognition method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a training process of an ST-GCN model according to an embodiment of the present invention;
fig. 3 is a flowchart of a training process of the S3DFAST dual-stream model according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "comprising" and "having," and any variations thereof, as referred to in embodiments of the present invention, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements but may alternatively include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The current fall detection technology based on images has the following defects:
(1) Only the space dimension information of the image is used, the time dimension information is not used, the characteristic dimension is single, and the information is incomplete;
(2) The features learned from spatial dimension information alone are not comprehensive, and key features that characterize the behavior must be found manually; for example, identifying a fall requires combining features such as the height and speed of the human body, which increases the manual effort of feature selection;
(3) The incompleteness of the information causes the error rate of the behavior recognition algorithm to be high and the recognition rate to be low.
Based on this, an embodiment of the present invention provides a fall behavior identification method based on video classification, as shown in fig. 1, the method includes the following steps:
s1: and detecting the key points of the human skeleton in the video frame image to be identified.
S2: tracking a human body target based on the human body skeleton key points, and acquiring a motion track of the human body target and a motion change process of the human body skeleton key points;
s3: recognizing a falling behavior in a human skeleton timing diagram by using an ST-GCN model to obtain a first recognition result;
s4: recognizing a falling behavior in the video sequence by using an S3DFAST double-flow model to obtain a second recognition result;
s5: and comprehensively judging the first recognition result and the second recognition result to obtain a final fall recognition result.
According to the embodiment of the invention, comprehensive judgment is carried out according to the first identification result with the space dimension information and the second identification result with the time dimension information to obtain the final fall identification result, and the space dimension information and the time dimension information are integrated, so that the information is more comprehensive, and the problem of lower identification accuracy in the prior art is solved. In addition, key features which can represent the behaviors do not need to be found manually, and manual consumption of manually selecting the features is saved, so that the embodiment of the invention has the advantages of high identification precision, good real-time performance and multi-scene practicability.
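As a minimal sketch, the five steps S1 to S5 above can be chained into one pipeline. The four callables below are hypothetical stand-ins for the key point detector, the tracker, and the two recognition models; they are not real APIs from OpenPose, DeepSORT, or this patent.

```python
def recognize_fall(frames, detect_keypoints, track_targets,
                   st_gcn_predict, s3dfast_predict):
    """Sketch of steps S1-S5; each callable stands in for a model."""
    # S1: detect human skeleton key points in each video frame
    keypoints = [detect_keypoints(f) for f in frames]
    # S2: track human targets to obtain trajectories and key-point motion
    tracks = track_targets(keypoints)
    # S3: first result, from the skeleton sequence (spatial information)
    first = st_gcn_predict(tracks)
    # S4: second result, from the raw video sequence (temporal information)
    second = s3dfast_predict(frames)
    # S5: comprehensive judgment; the description suggests an intersection,
    # i.e. report a fall only when both models agree
    return first and second
```

With boolean model outputs the final decision is simply the conjunction of the two results; richer outputs, such as per-target identifiers or scores, would be fused analogously.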
In a possible embodiment, the step S1 includes:
for a section of monitoring image video, detecting a video frame image to be recognized by utilizing an OpenPose model to obtain a human skeleton key point and a position of the human skeleton key point.
The OpenPose human posture recognition model is an open-source library built on the Caffe framework using convolutional neural networks and supervised learning. It can estimate the poses of human body movements, facial expressions, and finger motions; it works for both single and multiple persons, is highly robust, and was the world's first real-time multi-person two-dimensional pose estimation application based on deep learning.
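For illustration only (this is not the real OpenPose interface), a detected person can be represented as 18 key points, each an (x, y, confidence) triple, matching the 18 joint points used by the ST-GCN input later in the description:

```python
# COCO-style OpenPose output has 18 body key points per person.
NUM_KEYPOINTS = 18

def parse_person(flat):
    """Turn a flat [x0, y0, c0, x1, y1, c1, ...] list into a list of
    (x, y, confidence) triples, one per key point."""
    assert len(flat) == NUM_KEYPOINTS * 3, "expected 18 * 3 values"
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]
```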
In a possible implementation, the step S2 includes:
and tracking the human body target by using Deepsort based on the key points of the human body skeleton, and acquiring the motion trail of the human body target and the motion change process of the key points of the human body skeleton. The Deepsort algorithm is an improved algorithm on the basis of the sort algorithm, cascade Matching (Matching Cascade) and confirmation (confirmed) of a new track are added, and tracking of a human body target is more accurate.
In one possible embodiment, as shown in fig. 2, the training process of the ST-GCN model includes:
and constructing a deep learning training set and a testing set and a human skeleton space-time diagram by using the fallen human skeleton sample and the normal activity human skeleton sample.
The ResNet-50 based network model is trained using the ST-GCN (space-time graph convolutional neural network) framework based on a training set and a test set.
Configuring the deep learning training parameters: the PyTorch model file is about 0.5 MB. ST-GCN is the combination of a TCN (Temporal Convolutional Network) and a GCN (Graph Convolutional Network). The data dimension of the model input is (N, C, T, V, M), for example (256, 3, 32, 18, 2).
Wherein: n represents the number of videos of one batch (batch size = 256); c represents a joint feature (3); t represents the number of key frames (32); v represents the number of joints (18 joint points); m represents the number of people in a frame (2).
The loss function uses the standard cross-entropy loss (CrossEntropyLoss), and parameter learning uses standard Stochastic Gradient Descent (SGD).
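The standard cross-entropy loss named here can be written out directly; this stand-alone version applies a numerically stabilized softmax and takes the negative log probability of the true class:

```python
import math

def cross_entropy(logits, label):
    """-log(softmax(logits)[label]), as computed by CrossEntropyLoss."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return -math.log(exps[label] / sum(exps))
```

SGD then updates each model parameter against the gradient of this loss averaged over the batch.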
This finally yields an ST-GCN human skeleton falling behavior identification model.
As shown in fig. 3, in one possible implementation, the training process of the S3DFAST dual-stream model includes:
and constructing a deep learning training set and a testing set by the falling short video and the normal activity short video, and constructing a human skeleton space-time diagram.
Based on the training set and the test set, a dual stream based network model is trained using the S3DFAST framework.
Configuring the deep learning training parameters: the PyTorch model file is about 2.9 MB. The model structure has two streaming pathways: a Fast pathway, mainly used to capture motion information, and a Slow pathway, mainly used to capture detail information. A single input sample has size 3 × 112; the input video segment is 8 frames for the Fast pathway and 16 frames for the Slow pathway. The loss function uses cross entropy, and parameter learning uses an adaptive-learning-rate gradient descent algorithm (Adam).
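The text gives the pathway lengths (8 frames for the Fast pathway, 16 for the Slow) but not the sampling scheme; a common choice, assumed here, is uniform-stride sampling over the video segment:

```python
def sample_frames(clip, n):
    """Pick n frames spread uniformly over the clip."""
    step = len(clip) / n
    return [clip[int(i * step)] for i in range(n)]

# Per the description, the same segment feeds both pathways:
# 8 frames to the Fast pathway and 16 frames to the Slow pathway.
```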
This finally yields an S3DFAST falling behavior identification model.
In a possible implementation, the step S5 includes:
and comprehensively judging the first identification result with the space dimension information and the second identification result with the time dimension information, for example, taking the intersection of the two identification results to obtain a final fall identification result.
An embodiment of the present invention further provides an electronic device, which includes a memory and a processor, where the memory stores a computer program that can run on the processor, and the processor implements the steps of the method when executing the computer program.
In accordance with the above method, embodiments of the present invention also provide a computer readable storage medium storing machine executable instructions, which when invoked and executed by a processor, cause the processor to perform the steps of the above method.
The apparatus provided by the embodiment of the present invention may be specific hardware on the device, or software or firmware installed on the device, etc. The device provided by the embodiment of the present invention has the same implementation principle and technical effect as the method embodiments, and for the sake of brief description, reference may be made to the corresponding contents in the method embodiments without reference to the device embodiments. It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the foregoing systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The functions may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on such understanding, the technical solution of the present invention or a part thereof which substantially contributes to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above-mentioned embodiments are only specific embodiments of the present invention, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art can, within the technical scope of the present disclosure, modify or easily conceive changes to the technical solutions described in the foregoing embodiments, or make equivalent substitutions for some technical features; such modifications, changes, or substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention and are intended to be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (9)

1. A fall behavior identification method based on video classification is characterized by comprising the following steps:
detecting human skeleton key points in a video frame image to be identified;
tracking a human body target based on the human body skeleton key points, and acquiring a motion track of the human body target and a motion change process of the human body skeleton key points;
recognizing a falling behavior in a human skeleton timing diagram by using an ST-GCN model to obtain a first recognition result;
recognizing a falling behavior in the video sequence by using an S3DFAST double-flow model to obtain a second recognition result;
and comprehensively judging the first recognition result and the second recognition result to obtain a final fall recognition result.
2. The method according to claim 1, wherein the step of detecting the key points of the human skeleton in the video frame image to be identified comprises:
and detecting the video frame image to be recognized by utilizing an OpenPose model to obtain the key points and positions of the human skeleton.
3. The method according to claim 1, wherein the step of tracking the human body target based on the human body skeleton key points to obtain the motion trail of the human body target and the motion change process of the human body skeleton key points comprises:
and tracking the human body target by using Deepsort based on the key points of the human body skeleton, and acquiring the motion trail of the human body target and the motion change process of the key points of the human body skeleton.
4. The method according to claim 1, wherein the training process of the ST-GCN model comprises:
constructing a deep learning training set and a testing set from a fallen human skeleton sample and a normal activity human skeleton sample;
and training the ResNet-50-based network model by using an ST-GCN framework based on the training set and the test set to obtain the ST-GCN model.
5. The method of claim 4, wherein the loss function of the network model is a standard cross entropy loss function, and wherein the parameter learning uses a standard stochastic gradient descent algorithm.
6. The method of claim 1, wherein the training process of the S3DFAST dual-stream model comprises:
constructing a deep learning training set and a testing set from the falling short video and the normal activity short video;
and training a double-flow-based network model by using an S3DFAST framework based on the training set and the test set to obtain an S3DFAST double-flow model.
7. The method of claim 6, wherein the loss function of the network model is a cross-entropy loss function, and wherein the parameter learning uses an adaptive learning rate gradient descent algorithm.
8. An electronic device comprising a memory and a processor, wherein the memory stores a computer program operable on the processor, and wherein the processor implements the steps of the method of any of claims 1 to 7 when executing the computer program.
9. A computer readable storage medium, having stored thereon computer executable instructions, which when invoked and executed by a processor, cause the processor to execute the method of any of claims 1 to 7.
CN202211421984.1A 2022-11-14 2022-11-14 Falling behavior identification method based on video classification and electronic equipment Pending CN115713806A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211421984.1A CN115713806A (en) 2022-11-14 2022-11-14 Falling behavior identification method based on video classification and electronic equipment
PCT/CN2023/096822 WO2024103682A1 (en) 2022-11-14 2023-05-29 Fall behavior identification method based on video classification and electronic device


Publications (1)

Publication Number Publication Date
CN115713806A true CN115713806A (en) 2023-02-24

Family

ID=85233085

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211421984.1A Pending CN115713806A (en) 2022-11-14 2022-11-14 Falling behavior identification method based on video classification and electronic equipment

Country Status (2)

Country Link
CN (1) CN115713806A (en)
WO (1) WO2024103682A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024103682A1 (en) * 2022-11-14 2024-05-23 天地伟业技术有限公司 Fall behavior identification method based on video classification and electronic device

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN113642361B (en) * 2020-05-11 2024-01-23 杭州萤石软件有限公司 Fall behavior detection method and equipment
CN113743293B (en) * 2021-09-02 2023-11-24 泰康保险集团股份有限公司 Fall behavior detection method and device, electronic equipment and storage medium
CN114463844A (en) * 2022-01-12 2022-05-10 三峡大学 Fall detection method based on self-attention double-flow network
CN114511931A (en) * 2022-02-22 2022-05-17 平安科技(深圳)有限公司 Action recognition method, device and equipment based on video image and storage medium
CN115713806A (en) * 2022-11-14 2023-02-24 天地伟业技术有限公司 Falling behavior identification method based on video classification and electronic equipment


Also Published As

Publication number Publication date
WO2024103682A1 (en) 2024-05-23


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination