CN110705390A - Body posture recognition method and device based on LSTM and storage medium - Google Patents

Body posture recognition method and device based on LSTM and storage medium

Info

Publication number
CN110705390A
CN110705390A (application CN201910875154.8A)
Authority
CN
China
Prior art keywords
lstm
motion
characteristic information
body posture
action
Prior art date
Legal status
Pending
Application number
CN201910875154.8A
Other languages
Chinese (zh)
Inventor
董洪涛 (Dong Hongtao)
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910875154.8A priority Critical patent/CN110705390A/en
Priority to PCT/CN2019/117890 priority patent/WO2021051579A1/en
Publication of CN110705390A publication Critical patent/CN110705390A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/23 Recognition of whole body movements, e.g. for sport training
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods

Abstract

The invention relates to the technical field of biometric recognition and provides an LSTM-based body posture recognition method comprising the following steps: acquiring a motion video of a subject to be recognized; extracting motion feature information, including at least skeleton key point information, from the acquired motion video through OpenPose; and recognizing the motion standardization degree corresponding to the motion feature information according to the motion feature information and a pre-trained body posture recognition model. The body posture recognition model is a target neural network model generated from preset standard motions and trained from standard motion feature information arranged in time order. The method does not require cutting the video motion into isolated features for recognition; instead, recognition is learned with a neural network model, making body posture recognition fast and accurate and improving the user experience.

Description

Body posture recognition method and device based on LSTM and storage medium
Technical Field
The invention relates to the technical field of biometric recognition, and in particular to an LSTM-based body posture recognition method and device and a computer-readable storage medium.
Background
Many activities, particularly competitive sports such as swimming, table tennis, diving, and gymnastics, impose specific requirements on every small movement during training and competition in order to achieve better results. In physical training, most adjustment and correction of movements is done by a dedicated coach during professional instruction, and athletes can hardly spot their own movement errors while exercising. In existing fitness competitions, such as aerobics, broadcast gymnastics, and dance competitions, referees score individual performances; although the referees are professionals, scoring objectivity is affected by misjudgment, missed judgment, subjective bias, and the like.
Therefore, it is desirable for a computer to be able, to some extent, to analyze and judge an exerciser's movements and body posture in place of coaches and referees.
Most existing posture scoring systems use a sensor plus an app: a sensor captures the motion pattern of the human body, and the app presents the related data. Such systems cannot be used for precise movement assessment.
Motion recognition technology involves computer vision, pattern recognition, and related fields. The purpose of human motion recognition is to accurately recognize collected human motion features, classify them in time against a complete library of human motion postures, and match and output the posture with the highest similarity.
Current motion recognition technology falls mainly into two types: one based on RGB images and one based on depth images. Both have drawbacks. RGB images contain too much extraneous information, which hinders extraction of motion posture features; in depth images, limbs easily occlude one another, which hurts recognition accuracy.
Therefore, there is a need for a motion posture recognition method that increases detection speed without losing detection accuracy.
Disclosure of Invention
The invention provides an LSTM-based body posture recognition method, an electronic device, and a computer-readable storage medium. The method obtains a trained body posture recognition model by taking a video set of standard motions arranged in time order as the training dictionary, a neural network comprising a Masking layer, LSTM layers, and a Softmax layer as the model, the human skeleton feature points in the standard-motion videos as the targets, and Connectionist Temporal Classification (CTC) as the training criterion.
In order to achieve the above object, the present invention provides a body posture recognition method based on LSTM, which comprises:
acquiring a motion video of a subject to be recognized;
extracting motion feature information, including at least skeleton key point information, from the acquired motion video of the subject to be recognized through OpenPose;
recognizing the motion standardization degree corresponding to the motion feature information according to the motion feature information and a pre-trained body posture recognition model, where the body posture recognition model is a target neural network model generated from preset standard motions and trained from standard motion feature information arranged in time order.
In one embodiment, the neural network of the body posture recognition model comprises a Masking layer, a Softmax layer, and an LSTM layer between the Masking layer and the Softmax layer.
In one embodiment, the objective function of the body posture recognition model is CTC. The body posture recognition model outputs a Loss value through CTC, and the smaller the output Loss value, the more standard the movement of the subject to be recognized.
In one embodiment, the step of extracting, through OpenPose, the motion feature information in the acquired motion video of the subject to be recognized includes:
acquiring the motion video of the subject to be recognized in units of a preset time period or a beat, and extracting image frames from the motion video using OpenPose;
determining a plurality of skeleton key feature points of the subject to be recognized from the extracted image frames;
and integrating the plurality of skeleton key feature points into the skeleton key point information of the subject to be recognized.
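As a rough illustration of the sampling step above, the sketch below only computes which frame indices belong to each beat, given an assumed frame rate and beat length. `beat_frame_indices` is a hypothetical helper, not part of OpenPose or the patent; the actual frames would be decoded from the video before OpenPose is run on each one.

```python
def beat_frame_indices(fps, beat_seconds, total_seconds):
    """Return, per beat, the frame indices falling inside that beat.

    Hypothetical helper: the patent samples the motion video in units of
    a preset time period or beat before extracting image frames.
    """
    total_frames = int(fps * total_seconds)
    frames_per_beat = int(fps * beat_seconds)
    beats = []
    for start in range(0, total_frames, frames_per_beat):
        beats.append(list(range(start, min(start + frames_per_beat, total_frames))))
    return beats

# One eight-beat of broadcast gymnastics: about 8 s at 25 fps,
# i.e. 200 frames split into 8 beats of 25 frames each.
beats = beat_frame_indices(fps=25, beat_seconds=1, total_seconds=8)
```

The assumed values (25 fps, one-second beats) match the broadcast-gymnastics example used later in the description.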
In one embodiment, the skeleton key point information is (x, y, v), where x and y are the horizontal and vertical coordinates and v is the state of the skeleton key point; the state of a key point may be visible, invisible, or not in the image.
In one embodiment, after extracting, through OpenPose, the motion feature information in the acquired motion video of the subject to be recognized, the method further includes: storing the skeleton key point information in JSON format.
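A minimal sketch of the JSON storage described above. The field names `image`, `bbox`, and `keypoints` are illustrative assumptions, not taken from the patent; each key point is stored as (x, y, v) per the embodiment above.

```python
import json

# Hypothetical record layout: one entry per picture, holding the image
# file name, the human-body bounding box, and the (x, y, v) key points,
# where v is the visibility state (1 visible, 2 invisible, 3 not in image).
record = {
    "image": "frame_0001.png",
    "bbox": [12, 30, 480, 620],          # upper-left and lower-right corners
    "keypoints": [[210, 95, 1], [215, 140, 1], [0, 0, 3]],
}

serialized = json.dumps(record)          # one item of a JSON data-set file
restored = json.loads(serialized)
```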
In one embodiment, before recognizing the motion standardization degree corresponding to the motion feature information according to the motion feature information and the pre-trained body posture recognition model, the method further includes: iteratively training the body posture recognition model using CTC in the Softmax layer until the Loss value output by CTC falls to or below a set threshold.
In addition, to achieve the above object, the present invention also provides an electronic device including a memory and a processor. An LSTM-based body posture recognition program is stored in the memory, and when it is executed by the processor the following steps are implemented: S110, acquiring a motion video of a subject to be recognized; S120, extracting motion feature information, including at least skeleton key point information, from the acquired motion video through OpenPose; S130, recognizing the motion standardization degree corresponding to the motion feature information according to the motion feature information and a pre-trained body posture recognition model, where the body posture recognition model is a target neural network model generated from preset standard motions and trained from standard motion feature information arranged in time order.
In one embodiment, the step of extracting, through OpenPose, the motion feature information in the acquired motion video of the subject to be recognized includes:
S310, acquiring the motion video of the subject to be recognized in units of a preset time period or a beat, and extracting image frames from the motion video using OpenPose; S320, determining a plurality of skeleton key feature points of the subject to be recognized from the extracted image frames; S330, integrating the plurality of skeleton key feature points into the skeleton key point information of the subject to be recognized.
Further, to achieve the above object, the present invention also provides a computer-readable storage medium storing an LSTM-based body posture recognition program which, when executed by a processor, implements the steps of the LSTM-based body posture recognition method described above.
The method takes a video set of standard motions as the training dictionary, a neural network comprising a Masking layer, LSTM layers, and a Softmax layer as the model, the human skeleton feature points in the standard-motion videos as the targets, and Connectionist Temporal Classification (CTC) as the training criterion, thereby obtaining a trained body posture recognition model. The LSTM-based body posture recognition method shortens the model training period and improves training precision, and it does not require cutting video motions into isolated features for recognition; recognition is instead learned with a neural network model. The method is fast and accurate and improves the user experience.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of the LSTM-based body gesture recognition method of the present invention;
FIG. 2 is a schematic diagram of key feature points of the skeleton of the present invention;
FIG. 3 is a flowchart illustrating a method for obtaining skeleton key point information according to a preferred embodiment of the present invention;
FIG. 4 is a schematic diagram of an application environment of the LSTM-based body gesture recognition method according to an embodiment of the present invention;
the implementation, functional features and advantages of the objects of the present invention will be further described with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides a body posture recognition method based on LSTM. Referring to fig. 1, a flow chart of a preferred embodiment of the LSTM-based body gesture recognition method of the present invention is shown. The method may be performed by an apparatus, which may be implemented by software and/or hardware.
It should be noted that a Recurrent Neural Network (RNN) based on LSTM (Long Short-Term Memory) is used to build the basic framework for learning effective features and modeling the dynamics of the time domain, thereby implementing end-to-end behavior recognition and detection. The long short-term memory network is a recurrent neural network suited to processing and predicting important events with relatively long intervals and delays in a time series.
Specifically, the method is based on detecting human skeleton feature points in dynamic video. OpenPose, an open-source algorithm proposed by Carnegie Mellon University, is used to extract human posture feature points from the competition video; the extracted feature points are compared with a model trained in a neural network to obtain a difference range. The smaller the difference, the higher the score, and vice versa, thereby achieving accurate body posture recognition.
In this embodiment, the body posture recognition method based on LSTM includes: step S110-step S130.
And S110, acquiring the motion video of the subject to be identified.
In a specific embodiment, the motion video of the subject to be scored is a video of a contestant's movements in a gymnastics competition. The acquired motion video is first preprocessed and denoised.
S120, extracting motion feature information, including at least skeleton key point information, from the acquired motion video of the subject to be recognized through OpenPose.
Human body movement can be described by the movement of the key nodes of a few main skeleton joints (skeleton key points for short). Tracking a combination of several skeleton key points can therefore characterize behaviors such as dancing, walking, and running, and those behaviors can be recognized from the movement of the key points.
Of course, joint displacement information of the human body, body-surface key point displacement information, and the like may also be used as the motion feature information of the subject to be recognized.
Specifically, in one embodiment the invention optimizes recognition performance by introducing features common to the skeleton key points of a motion into the LSTM network as constraints on network parameter learning. A person's behavioral motion is often closely related to a set of skeleton key points and to the interaction of the nodes in that set.
In a specific embodiment, for the eighth set of broadcast gymnastics, key points such as the "nose", "knee", "ankle", and "hand" constitute a node set with discriminative power.
FIG. 2 is a schematic diagram of the skeleton key feature points of the present invention. Referring to fig. 2, there are multiple skeleton key feature points, specifically 18, as shown.
OpenPose is a real-time multi-person key point detection library written on the basis of OpenCV and Caffe. OpenPose performs very well; according to the characteristics of the human skeleton and of dance movements, 18 skeleton key feature points are selected for detection: the nose, neck, right shoulder, right elbow, right wrist, left shoulder, left elbow, left wrist, right hip, right knee, right ankle, left hip, left knee, left ankle, right eye, left eye, right ear, and left ear.
The subject to be recognized is a broadcast gymnastics performer, and the performer's motion video takes one eight-beat (counts 1 through 8) as a time unit. The number of frames is determined by the actual length of the video; for broadcast gymnastics, one eight-beat lasts about 8 seconds, which at a frame rate of 25 is about 200 frames. If the video is 8 seconds long, there are 200 frames with 18 × 2 = 36 values per frame, so the input contains 200 × 36 values; if a video does not supply enough values, it is padded with zeros. That is, the sequence is first converted to a fixed length, which in the present application is 200 × 36 values, and any selected video shorter than this is padded with 0s. The image frames are arranged in time order, with one eight-beat as the fixed-length unit.
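The fixed-length padding described above can be sketched as follows. `pad_sequence` is a hypothetical helper, assuming 200 frames of 18 × 2 = 36 values each, with all-zero frames appended for shorter clips as the paragraph describes.

```python
def pad_sequence(frames, target_len=200, width=36):
    """Zero-pad a per-frame feature list to a fixed-length sequence.

    Each frame holds 18 key points x 2 coordinates = 36 values; clips
    shorter than target_len frames are padded with all-zero frames.
    """
    padded = [list(f) for f in frames[:target_len]]
    while len(padded) < target_len:
        padded.append([0.0] * width)
    return padded

short_clip = [[1.0] * 36 for _ in range(150)]   # only 150 real frames
fixed = pad_sequence(short_clip)                # padded to 200 x 36
```

The zero frames are exactly what the Masking layer described later filters out before the LSTM.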
An example: the broadcast gymnastics routine (also known as the "popular broadcast gymnastics") consists of 8 sections of free-hand exercises totaling 4 minutes 45 seconds. Each section is divided into four eight-beats. Taking a lower-limb movement as an example, in the first eight-beat, on count 1 the left foot steps forward, the right foot extends back resting on the sole, the arms bend at the elbows and cross in front of the chest with the hands in fists, fist centers inward; the 18 skeleton key feature points are captured throughout this eight-beat.
Fig. 3 is a schematic diagram of a method for acquiring skeleton key point information according to the present invention. Referring to fig. 3, the method includes: S310, acquiring the motion video of the subject to be recognized in units of a preset time period or a beat, and extracting image frames from the motion video using OpenPose; S320, determining a plurality of skeleton key feature points of the subject to be recognized from the extracted image frames; S330, integrating the plurality of skeleton key feature points into the skeleton key point information of the subject to be recognized.
The preset time period or beat is the unit defining the time sequence: either a fixed time period or a beat may be used as the unit.
It should be noted that the skeleton key point information (x, y, v) contains three pieces of information: x and y are the horizontal and vertical coordinates in the image, and v represents the state of the skeleton key point, i.e., visible, invisible, or not in the image (or cannot be inferred). "Not in the image" means the key point does not lie within the figure shown in the image frame.
The skeleton key point information is stored in JSON format. Each JSON file corresponds to one data set, and each item in a JSON file stores the human-body bounding box position and the skeleton key point positions for one picture in the data set.
It should be further noted that the information stored in the data set includes the file name of the stored image, the position of the human-body bounding box, and the positions of the skeleton key points. The bounding box comprises four parameters: the first two are the coordinates of its upper-left corner and the last two the coordinates of its lower-right corner. In the skeleton key point positions, v represents the state of the key point: v = 1 is visible, v = 2 is invisible, and v = 3 is not in the image (or cannot be inferred). Only key points in the visible state v = 1 need to be returned; key points in other states are replaced with (0, 0, 0).
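A small sketch of the visibility rule above: key points whose state v is not 1 (visible) are replaced with (0, 0, 0). The helper name `filter_visible` is illustrative, not from the patent.

```python
def filter_visible(keypoints):
    """Keep only key points whose state v == 1 (visible); replace
    invisible (v == 2) and not-in-image (v == 3) points with (0, 0, 0)."""
    return [kp if kp[2] == 1 else (0, 0, 0) for kp in keypoints]

frame = [(210, 95, 1),   # visible
         (0, 0, 3),      # not in the image
         (180, 300, 2)]  # invisible (e.g. occluded limb)
cleaned = filter_visible(frame)
```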
S130, recognizing the motion standardization degree corresponding to the motion feature information according to the motion feature information and a pre-trained body posture recognition model, where the body posture recognition model is a target neural network model generated from preset standard motions and trained from standard motion feature information arranged in time order.
In a specific embodiment, the training sample data set consists of a video set of standard motions. Again taking broadcast gymnastics as an example, it comprises 8 sections of free-hand exercises: stretching, chest expansion, kicking, side movement, body turning, whole-body movement, jumping, and finishing; these involve flexion, extension, rotation, balance, jumping, and other movements. Each section can be decomposed into 4-5 motions; the training sample data set comprises 2640 videos covering 40 motions, and each video consists of 2-10 motions. The start and end times of each motion are determined, and the motion feature information within that period is arranged in time order.
Specifically, the trained classifier can recognize each frame of motion from the motion video, with the motion categories limited to those contained in the training data; again taking broadcast gymnastics as the example, the categories are limited to the 40 motion classes in the training data.
An example: a 10-second video at a frame rate of 25 has 250 frames in total. Skeleton key point detection is performed on each frame with OpenPose, outputting 18 values per frame; the values are concatenated, i.e., 250 × 18 values are fed into the network as input, yielding a classification model that determines how many beats are present. The model's input is this sequence of per-frame values, and its outputs are the label "motion X, beat Y" and a Loss value; an example label is "head movement, beat 1".
In a specific embodiment, the objective function of the body posture recognition model is CTC; the model outputs a Loss value through CTC, and the smaller the Loss value, the more standard the movement of the subject to be recognized.
The invention uses an LSTM neural network structure combined with CTC to train the posture classifier, finally obtaining an LSTM-CTC posture classification model. A video set of standard motions is taken as the training dictionary; one convolutional neural network (CNN) layer plus five LSTM layers form the model; the human skeleton feature points in the standard-motion videos are the targets; and Connectionist Temporal Classification (CTC) is the training criterion, yielding a trained CTC posture model. During network training, the neural network model and the loss function are trained by a stochastic gradient descent algorithm. The neural network comprises an LSTM model plus a CTC model, where LSTM + CTC outputs the corresponding class label and Loss value. The image is first read in and converted into a matrix; the numbers of rows and columns of the matrix are taken, and the shape is changed with a reshape function. The transformed features are fed to the LSTM.
The LSTM algorithm was proposed to address the vanishing-gradient and exploding-gradient deficiencies of RNNs, and it provides both long- and short-term memory ability. Gradients are generally updated using BPTT (Backpropagation Through Time). In an LSTM network, the neurons of an ordinary RNN are replaced with memory blocks.
The network structure of the neural network includes a Masking layer before the LSTM layer and a Softmax layer after it.
The Masking layer filters out 0s; the character to filter is specified by mask_value in the Masking layer. As described above, the padded 0s in the sequence are all filtered out. An Embedding layer also has a filtering function, but unlike the Masking layer it can only filter 0 and cannot be given another character, and it maps sequences into a space of fixed dimension; a Masking layer is therefore more suitable here than an Embedding layer. The Softmax layer is used for classification.
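To illustrate what the Masking layer does conceptually, the pure-Python sketch below derives a per-timestep mask the way a Keras `Masking(mask_value=0.)` layer would: a timestep is skipped only when every one of its features equals the mask value. This is an illustration of the behaviour, not the actual layer implementation.

```python
def compute_mask(sequence, mask_value=0.0):
    """Return one boolean per timestep: True if the downstream LSTM
    should process it, False if every feature equals mask_value
    (i.e. the frame is padding).  Mirrors the behaviour of a Keras
    Masking(mask_value=0.) layer placed before the LSTM."""
    return [any(v != mask_value for v in frame) for frame in sequence]

seq = [[0.3, 0.7],   # real frame
       [0.0, 0.0],   # padded frame: masked out
       [0.1, 0.0]]   # real frame (one zero feature is not enough to mask)
mask = compute_mask(seq)
```

Note that a frame with some zero coordinates is still processed; only all-zero padding frames are skipped, which is why padding with entire zero frames (as above) is safe.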
In a specific embodiment, the LSTM layer concatenates the frame data with a step size of 200.
The CTC Loss, i.e., the loss of one forward propagation, is the negative logarithm of the probability of the current sequence label, obtained with the following formula:
L(S) = -ln ∏_{(x,z)∈S} p(z|x) = -∑_{(x,z)∈S} ln p(z|x)
where p(z|x) denotes the probability of outputting sequence z given input x, and S is the training set. That is, the loss function is the negative logarithm of the product, over the training samples, of the probability of outputting the correct label; minimizing this loss maximizes the probability of outputting the correct labels.
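The formula above can be evaluated directly once the per-sample label probabilities p(z|x) are known. In the sketch below those probabilities are made-up inputs; a real CTC implementation computes each p(z|x) with the forward-backward algorithm over all alignments, which is omitted here.

```python
import math

def sequence_nll(probabilities):
    """Loss over a training set S: the negative log of the product of
    p(z|x) over all (x, z) pairs, equivalently -sum of ln p(z|x).

    `probabilities` are hypothetical per-sample label probabilities;
    obtaining each p(z|x) (the CTC forward-backward pass) is not shown.
    """
    return -sum(math.log(p) for p in probabilities)

# Three samples whose correct labels get probabilities 0.9, 0.8, 0.95.
loss = sequence_nll([0.9, 0.8, 0.95])
```

As the probabilities of the correct labels approach 1, the loss approaches 0, matching the patent's criterion that a smaller Loss value means a more standard motion.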
That is, before the pre-trained body posture recognition model recognizes the motion standardization degree corresponding to the motion feature information, the model is iteratively trained using CTC in the Softmax layer until the Loss value output by CTC falls to or below a set threshold. During training, the smaller the Loss value, the higher the accuracy of the model; the usual accuracy requirement is 98%.
According to the LSTM-based body posture recognition method, differences in speed, direction, angle, and the like between a contestant's movements and the standard movements can be analyzed without the trainee wearing a sensor, improving the precision of competition scoring.
The invention further provides an LSTM-based body posture recognition device comprising a skeleton key point information acquisition unit, a body posture recognition model training unit, and a body posture recognition model detection unit. The acquisition unit captures motion videos of the participants with a camera and performs skeleton key point detection on the video of the subject to be recognized through OpenPose to obtain the participants' skeleton key point information. The training unit obtains the body posture recognition model through the training step. The detection unit classifies the motion skeleton key point information with the body posture recognition model to obtain a class label and compute the Loss function, and judges the similarity between the subject's motion and the standard motion according to the Loss.
The invention provides a body posture recognition method based on LSTM, which is applied to an electronic device 4. Fig. 4 is a schematic diagram of an application environment of the LSTM-based body posture recognition method according to a preferred embodiment of the present invention.
In the present embodiment, the electronic device 4 may be a terminal device having an arithmetic function, such as a server, a smart phone, a tablet computer, a portable computer, or a desktop computer.
The electronic device 4 includes: a processor 42, a memory 41, an imaging device 43, a network interface 44, and a communication bus 45.
The memory 41 includes at least one type of readable storage medium, which may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card, or a card-type memory. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 4, such as its hard disk. In other embodiments, it may be an external storage device of the electronic device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card.
In the present embodiment, the readable storage medium of the memory 41 is generally used for storing the LSTM-based body posture recognition program 40 and the like installed in the electronic apparatus 4. The memory 41 may also be used to temporarily store data that has been output or is to be output.
The processor 42, which in some embodiments may be a Central Processing Unit (CPU), microprocessor, or other data processing chip, executes program code or processes data stored in the memory 41, for example executing the LSTM-based body posture recognition program 40.
The imaging device 43 may be part of the electronic device 4 or independent of it. In some embodiments, the electronic device 4 is a terminal device with a camera, such as a smartphone, tablet computer, or portable computer, and the imaging device 43 is that camera. In other embodiments, the electronic device 4 may be a server, with the imaging device 43 independent of it and connected over a network; for example, the imaging device 43 is installed at a specific location, such as an office or a monitored area, continuously captures real-time images of subjects entering that location, and transmits the captured images to the processor 42 through the network.
The network interface 44 may optionally include a standard wired interface and/or a wireless interface (e.g., a Wi-Fi interface), and is typically used to establish a communication link between the electronic device 4 and other electronic devices.
The communication bus 45 enables communication between these components.
Fig. 4 only shows the electronic device 4 with components 41-45, but it is to be understood that not all of the shown components are required to be implemented, and that more or fewer components may alternatively be implemented.
In one embodiment of the present invention, the electronic device 4 may further include a user interface. The user interface may include an input unit such as a keyboard, a voice input device such as a microphone or another device with voice recognition capability, and a voice output device such as a loudspeaker or earphones. Optionally, the user interface may also include a standard wired interface or a wireless interface.
The electronic device 4 may further comprise a display, which may also be referred to as a display screen or display unit. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an Organic Light-Emitting Diode (OLED) touch device, or the like. The display is used to present information processed in the electronic device 4 and to display a visualized user interface.
In addition, the electronic device 4 further includes a touch sensor. The area of the touch sensor available for the user's touch operations is called the touch area. The touch sensor may be a resistive touch sensor, a capacitive touch sensor, or the like, and may be of either the contact type or the proximity type. It may be a single sensor or a plurality of sensors arranged, for example, in an array.
The area of the display of the electronic device 4 may be the same as or different from the area of the touch sensor. Optionally, the display is layered with the touch sensor to form a touch display screen, on which the device detects touch operations triggered by the user.
Optionally, the electronic device 4 may further include a Radio Frequency (RF) circuit, a sensor, an audio circuit, and the like, which are not described in detail herein.
In the apparatus embodiment shown in fig. 4, the memory 41, as a type of computer storage medium, may store an operating system and the LSTM-based body posture recognition program 40; the processor 42, when executing the LSTM-based body posture recognition program 40 stored in the memory 41, implements the following steps:
S110, acquiring a motion video of a subject to be identified; S120, extracting motion feature information from the acquired motion video through OpenPose, the motion feature information including at least skeleton key point information; S130, identifying the degree of motion standardization corresponding to the motion feature information according to the motion feature information and a pre-trained body posture recognition model, where the body posture recognition model is a target neural network model generated from preset standard motions and trained on standard motion feature information arranged in chronological order.
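Step S120's data layout can be sketched as follows. This is a minimal illustration only: `detect_keypoints` is a hypothetical stand-in for a real OpenPose call, and the random values merely show the shape of the (x, y, v) skeleton key point information, not real detections:

```python
import numpy as np

NUM_KEYPOINTS = 18  # e.g., the OpenPose COCO body model outputs 18 keypoints

def detect_keypoints(frame):
    # Hypothetical stand-in for an OpenPose detection call. Per keypoint it
    # returns (x, y, v): image coordinates plus a state flag v, whose values
    # here encode visible, invisible, or not within the image.
    xy = np.random.rand(NUM_KEYPOINTS, 2)                        # fake coordinates
    v = np.random.randint(0, 3, size=(NUM_KEYPOINTS, 1)).astype(float)
    return np.hstack([xy, v])

def video_to_features(frames):
    # Stack per-frame keypoints in chronological order into a sequence of
    # shape (T, NUM_KEYPOINTS * 3), the form an LSTM model can consume.
    return np.stack([detect_keypoints(f).ravel() for f in frames])

frames = [None] * 30                 # stand-in for 30 decoded video frames
features = video_to_features(frames)
print(features.shape)                # (30, 54)
```

In a real system, `frames` would come from decoding the acquired motion video, and `features` would be fed to the body posture recognition model of step S130.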
In the electronic device provided in the above embodiment, the video set of standard motions serves as the training dictionary, an LSTM neural network comprising a Masking layer and a Softmax layer serves as the model, the human skeleton feature points in the standard-motion videos serve as the targets, and Connectionist Temporal Classification (CTC) serves as the training criterion, yielding a trained body posture recognition model for accurate posture recognition.
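The Masking → LSTM → Softmax data flow described above can be illustrated with a plain-NumPy forward pass. The weights here are random and the CTC training criterion is omitted; this is only a sketch of how padded timesteps are masked out and how the final state is mapped to class probabilities, not the patented implementation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def lstm_step(x, h, c, W, U, b):
    # One standard LSTM cell step: gates i, f, o and candidate g, each size H.
    z = x @ W + h @ U + b
    H = h.shape[-1]
    i, f, o, g = z[:, :H], z[:, H:2*H], z[:, 2*H:3*H], z[:, 3*H:]
    i, f, o = 1/(1+np.exp(-i)), 1/(1+np.exp(-f)), 1/(1+np.exp(-o))
    c = f * c + i * np.tanh(g)
    h = o * np.tanh(c)
    return h, c

def forward(seq, mask, W, U, b, Wout, bout):
    # Masking layer: where mask == 0 (padded timesteps), the state is carried
    # over unchanged, so variable-length keypoint sequences can be batched.
    B, T, D = seq.shape
    H = U.shape[0]
    h, c = np.zeros((B, H)), np.zeros((B, H))
    for t in range(T):
        h_new, c_new = lstm_step(seq[:, t], h, c, W, U, b)
        m = mask[:, t:t+1]
        h = m * h_new + (1 - m) * h
        c = m * c_new + (1 - m) * c
    # Softmax layer: map the final LSTM state to action-class probabilities.
    return softmax(h @ Wout + bout)

rng = np.random.default_rng(0)
B, T, D, H, C = 2, 5, 54, 16, 4   # batch, timesteps, features, hidden, classes
probs = forward(rng.normal(size=(B, T, D)),
                np.array([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]], dtype=float),
                rng.normal(size=(D, 4*H)), rng.normal(size=(H, 4*H)),
                np.zeros(4*H), rng.normal(size=(H, C)), np.zeros(C))
print(probs.shape)  # (2, 4)
```

In practice one would build this with a deep learning framework and train it with a CTC loss rather than hand-rolling the cell.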
In other embodiments, the LSTM-based body posture recognition program 40 may also be divided into one or more modules, which are stored in the memory 41 and executed by the processor 42 to implement the present invention. A module, as referred to herein, is a series of computer program instruction segments capable of performing a specified function.
The LSTM-based body posture recognition program 40 may be divided into a skeleton key point information acquisition subprogram, a body posture recognition model training subprogram, and a body posture recognition model detection subprogram. The skeleton key point information acquisition subprogram acquires the motion video of the subject to be identified using the imaging device, performs skeleton key point detection on the motion video through OpenPose, and obtains the skeleton key point information of the subject to be identified. The body posture recognition model training subprogram obtains the body posture recognition model through the training step. The body posture recognition model detection subprogram classifies the motion skeleton key point information using the body posture recognition model to obtain a classification label and compute the Loss function Loss, and judges the similarity between the subject's motion and the standard motion according to that Loss.
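The detection subprogram's final judgment, turning the Loss into a similarity verdict, might be organized as below. The exponential mapping from Loss to a [0, 1] standardization score and the 0.5 threshold are illustrative assumptions; the patent only specifies that a smaller Loss means a more standard motion:

```python
import numpy as np

def normalization_score(loss, scale=1.0):
    # Map a non-negative CTC Loss to a score in (0, 1]: smaller Loss means
    # the subject's motion conforms more closely to the standard action.
    # The exponential decay is an assumed mapping, not given by the source.
    return float(np.exp(-loss / scale))

def judge(loss, threshold=0.5):
    # Compare the derived score with a preset threshold to classify the
    # subject's motion as standard or non-standard.
    return "standard" if normalization_score(loss) >= threshold else "non-standard"

print(judge(0.1))  # low Loss  -> "standard"
print(judge(5.0))  # high Loss -> "non-standard"
```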
Furthermore, an embodiment of the present invention also provides a computer-readable storage medium storing an LSTM-based body posture recognition program which, when executed by a processor, implements the following operations:
S110, acquiring a motion video of a subject to be identified; S120, extracting motion feature information from the acquired motion video through OpenPose, the motion feature information including at least skeleton key point information; S130, identifying the degree of motion standardization corresponding to the motion feature information according to the motion feature information and a pre-trained body posture recognition model, where the body posture recognition model is a target neural network model generated from preset standard motions and trained on standard motion feature information arranged in chronological order.
The specific implementation of the computer-readable storage medium of the present invention is substantially the same as that of the LSTM-based body posture recognition method and the electronic device, and is not repeated here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of other like elements in the process, apparatus, article, or method that comprises it.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments. Through the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware alone, though the former is in many cases the preferable implementation. Based on this understanding, the technical solution of the present invention, or the part of it that contributes over the prior art, can be embodied in the form of a software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, or optical disk) as described above, including instructions that enable a terminal device (e.g., a mobile phone, computer, server, or network device) to execute the methods of the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the present specification and drawings, or used directly or indirectly in other related fields, are included in the scope of the present invention.

Claims (10)

1. An LSTM-based body posture recognition method applied to an electronic device is characterized by comprising the following steps:
acquiring a motion video of a subject to be identified;
extracting motion characteristic information from the acquired motion video of the subject to be identified through OpenPose; the motion characteristic information comprises at least skeleton key point information;
identifying the degree of motion standardization corresponding to the motion characteristic information according to the motion characteristic information and a pre-trained body posture recognition model; the body posture recognition model is a target neural network model generated from a preset standard action; and the target neural network model is generated by training on standard action characteristic information arranged in chronological order.
2. The LSTM-based body posture recognition method of claim 1, wherein the neural network of the body posture recognition model comprises a Masking layer, a Softmax layer, and an LSTM layer between the Masking layer and the Softmax layer.
3. The LSTM-based body posture recognition method of claim 2, wherein the objective function of the body posture recognition model is CTC; the body posture recognition model outputs a Loss value through the CTC, and the smaller the output Loss value, the higher the motion standardization of the subject to be identified corresponding to that Loss value.
4. The LSTM-based body posture recognition method according to claim 1, wherein the step of extracting the motion characteristic information from the acquired motion video of the subject to be identified through OpenPose comprises:
acquiring the motion video of the subject to be identified in time units of a preset period or a beat, and extracting image frames from the motion video using OpenPose;
determining a plurality of skeleton key feature points of the subject to be identified from the extracted image frames;
and integrating the plurality of skeleton key feature points into the skeleton key point information of the subject to be identified.
5. The LSTM-based body posture recognition method of claim 4, wherein the skeleton key point information is (x, y, v), where x and y are the abscissa and ordinate of the skeleton key point, and v is its state information;
the state of a key point may be visible, invisible, or not within the image.
6. The LSTM-based body posture recognition method according to claim 5, further comprising, after extracting the motion characteristic information from the acquired motion video of the subject to be identified through OpenPose: storing the skeleton key point information in JSON format.
7. The LSTM-based body posture recognition method of claim 3, further comprising, before identifying the degree of motion standardization corresponding to the motion characteristic information according to the motion characteristic information and the pre-trained body posture recognition model:
performing iterative training on the body posture recognition model using the CTC at the Softmax layer until the Loss value output by the CTC is less than or equal to a set threshold value.
8. An electronic device comprising a memory, a processor, and an imaging device, wherein an LSTM-based body pose recognition program is stored in the memory, and wherein the LSTM-based body pose recognition program, when executed by the processor, implements the steps of:
acquiring a motion video of a subject to be identified;
extracting motion characteristic information from the acquired motion video of the subject to be identified through OpenPose; the motion characteristic information comprises at least skeleton key point information;
identifying the degree of motion standardization corresponding to the motion characteristic information according to the motion characteristic information and a pre-trained body posture recognition model; the body posture recognition model is a target neural network model generated from a preset standard action; and the target neural network model is generated by training on standard action characteristic information arranged in chronological order.
9. The electronic device of claim 8,
the step of extracting the motion characteristic information from the acquired motion video of the subject to be identified through OpenPose comprises:
acquiring the motion video of the subject to be identified in time units of a preset period or a beat, and extracting image frames from the motion video using OpenPose;
determining a plurality of skeleton key feature points of the subject to be identified from the extracted image frames;
and integrating the plurality of skeleton key feature points into the skeleton key point information of the subject to be identified.
10. A computer-readable storage medium storing an LSTM-based body posture recognition program which, when executed by a processor, performs the steps of the LSTM-based body posture recognition method according to any one of claims 1 to 7.
CN201910875154.8A 2019-09-17 2019-09-17 Body posture recognition method and device based on LSTM and storage medium Pending CN110705390A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910875154.8A CN110705390A (en) 2019-09-17 2019-09-17 Body posture recognition method and device based on LSTM and storage medium
PCT/CN2019/117890 WO2021051579A1 (en) 2019-09-17 2019-11-13 Body pose recognition method, system, and apparatus, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910875154.8A CN110705390A (en) 2019-09-17 2019-09-17 Body posture recognition method and device based on LSTM and storage medium

Publications (1)

Publication Number Publication Date
CN110705390A true CN110705390A (en) 2020-01-17

Family

ID=69196078

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910875154.8A Pending CN110705390A (en) 2019-09-17 2019-09-17 Body posture recognition method and device based on LSTM and storage medium

Country Status (2)

Country Link
CN (1) CN110705390A (en)
WO (1) WO2021051579A1 (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111346358A (en) * 2020-03-11 2020-06-30 嘉兴技师学院 Swimming training evaluation system and method based on convolutional neural network
CN111507192A (en) * 2020-03-19 2020-08-07 北京捷通华声科技股份有限公司 Appearance instrument monitoring method and device
CN111590560A (en) * 2020-04-24 2020-08-28 郭子睿 Method for remotely operating manipulator through camera
CN111680562A (en) * 2020-05-09 2020-09-18 北京中广上洋科技股份有限公司 Human body posture identification method and device based on skeleton key points, storage medium and terminal
CN111695457A (en) * 2020-05-28 2020-09-22 浙江工商大学 Human body posture estimation method based on weak supervision mechanism
CN111915460A (en) * 2020-05-07 2020-11-10 同济大学 AI vision-based intelligent scoring system for experimental examination
CN112232194A (en) * 2020-10-15 2021-01-15 广州云从凯风科技有限公司 Single-target human body key point detection method, system, equipment and medium
CN112529934A (en) * 2020-12-02 2021-03-19 北京航空航天大学杭州创新研究院 Multi-target tracking method and device, electronic equipment and storage medium
CN112990878A (en) * 2021-03-30 2021-06-18 北京大智汇领教育科技有限公司 Real-time correcting system and analyzing method for classroom teaching behaviors of teacher
CN113065504A (en) * 2021-04-15 2021-07-02 希亚思(上海)信息技术有限公司 Behavior identification method and device
CN113229810A (en) * 2021-06-22 2021-08-10 西安超越申泰信息科技有限公司 Human behavior recognition method and system and computer readable storage medium
CN114246582A (en) * 2021-12-20 2022-03-29 杭州慧光健康科技有限公司 System and method for detecting bedridden people based on long-term and short-term memory neural network
CN114821639A (en) * 2022-04-11 2022-07-29 西安电子科技大学广州研究院 Method and device for estimating and understanding human body posture in special scene
CN114882443A (en) * 2022-05-31 2022-08-09 江苏濠汉信息技术有限公司 Edge computing system applied to cable accessory construction
US11443558B2 (en) * 2019-11-12 2022-09-13 Omron Corporation Hand-eye, body part motion recognition and chronologically aligned display of recognized body parts
US20220358310A1 (en) * 2021-05-06 2022-11-10 Kuo-Yi Lin Professional dance evaluation method for implementing human pose estimation based on deep transfer learning

Families Citing this family (11)

Publication number Priority date Publication date Assignee Title
CN113052097A (en) * 2021-03-31 2021-06-29 开放智能机器(上海)有限公司 Human body sitting posture real-time monitoring system and monitoring method
CN115188062B (en) * 2021-04-06 2024-02-27 广州视源电子科技股份有限公司 User running posture analysis method and device, running machine and storage medium
CN113191319B (en) * 2021-05-21 2022-07-19 河南理工大学 Human body posture intelligent recognition method and computer equipment
CN113239848B (en) * 2021-05-27 2024-02-02 数智引力(厦门)运动科技有限公司 Motion perception method, system, terminal equipment and storage medium
CN113723233B (en) * 2021-08-17 2024-03-26 之江实验室 Student learning participation assessment method based on hierarchical time sequence multi-example learning
CN113743319B (en) * 2021-09-07 2023-12-26 三星电子(中国)研发中心 Self-supervision type intelligent fitness scheme generation method and device
CN113989707A (en) * 2021-10-27 2022-01-28 福州大学 Public place queuing abnormal behavior detection method based on OpenPose and OpenCV
CN114596530B (en) * 2022-03-23 2022-11-18 中国航空油料有限责任公司浙江分公司 Airplane refueling intelligent management method and device based on non-contact optical AI
CN114870385A (en) * 2022-05-11 2022-08-09 安徽理工大学 Established long jump testing method based on optimized OpenPose model
CN115097946B (en) * 2022-08-15 2023-04-18 汉华智能科技(佛山)有限公司 Remote worship method, system and storage medium based on Internet of things
CN117218728A (en) * 2023-11-09 2023-12-12 深圳市微克科技有限公司 Body posture recognition method, system and medium of intelligent wearable device

Citations (7)

Publication number Priority date Publication date Assignee Title
US10304208B1 (en) * 2018-02-12 2019-05-28 Avodah Labs, Inc. Automated gesture identification using neural networks
WO2019114696A1 (en) * 2017-12-13 2019-06-20 腾讯科技(深圳)有限公司 Augmented reality processing method, object recognition method, and related apparatus
CN110070029A (en) * 2019-04-17 2019-07-30 北京易达图灵科技有限公司 A kind of gait recognition method and device
CN110119703A (en) * 2019-05-07 2019-08-13 福州大学 The human motion recognition method of attention mechanism and space-time diagram convolutional neural networks is merged under a kind of security protection scene
WO2019157344A1 (en) * 2018-02-12 2019-08-15 Avodah Labs, Inc. Real-time gesture recognition method and apparatus
CN110135249A (en) * 2019-04-04 2019-08-16 华南理工大学 Human bodys' response method based on time attention mechanism and LSTM
CN110222551A (en) * 2018-03-02 2019-09-10 杭州海康威视数字技术股份有限公司 Method, apparatus, electronic equipment and the storage medium of identification maneuver classification

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
TWI506461B (en) * 2013-07-16 2015-11-01 Univ Nat Taiwan Science Tech Method and system for human action recognition
CN108875708A (en) * 2018-07-18 2018-11-23 广东工业大学 Behavior analysis method, device, equipment, system and storage medium based on video
CN109886123B (en) * 2019-01-23 2023-08-29 平安科技(深圳)有限公司 Method and terminal for identifying human body actions


Cited By (22)

Publication number Priority date Publication date Assignee Title
US11443558B2 (en) * 2019-11-12 2022-09-13 Omron Corporation Hand-eye, body part motion recognition and chronologically aligned display of recognized body parts
CN111346358A (en) * 2020-03-11 2020-06-30 嘉兴技师学院 Swimming training evaluation system and method based on convolutional neural network
CN111346358B (en) * 2020-03-11 2024-04-09 嘉兴技师学院 Swimming training evaluation system and method based on convolutional neural network
CN111507192A (en) * 2020-03-19 2020-08-07 北京捷通华声科技股份有限公司 Appearance instrument monitoring method and device
CN111590560A (en) * 2020-04-24 2020-08-28 郭子睿 Method for remotely operating manipulator through camera
CN111915460A (en) * 2020-05-07 2020-11-10 同济大学 AI vision-based intelligent scoring system for experimental examination
CN111915460B (en) * 2020-05-07 2022-05-13 同济大学 AI vision-based intelligent scoring system for experimental examination
CN111680562A (en) * 2020-05-09 2020-09-18 北京中广上洋科技股份有限公司 Human body posture identification method and device based on skeleton key points, storage medium and terminal
CN111695457B (en) * 2020-05-28 2023-05-09 浙江工商大学 Human body posture estimation method based on weak supervision mechanism
CN111695457A (en) * 2020-05-28 2020-09-22 浙江工商大学 Human body posture estimation method based on weak supervision mechanism
CN112232194A (en) * 2020-10-15 2021-01-15 广州云从凯风科技有限公司 Single-target human body key point detection method, system, equipment and medium
CN112529934A (en) * 2020-12-02 2021-03-19 北京航空航天大学杭州创新研究院 Multi-target tracking method and device, electronic equipment and storage medium
CN112529934B (en) * 2020-12-02 2023-12-19 北京航空航天大学杭州创新研究院 Multi-target tracking method, device, electronic equipment and storage medium
CN112990878A (en) * 2021-03-30 2021-06-18 北京大智汇领教育科技有限公司 Real-time correcting system and analyzing method for classroom teaching behaviors of teacher
CN113065504A (en) * 2021-04-15 2021-07-02 希亚思(上海)信息技术有限公司 Behavior identification method and device
US11823496B2 (en) * 2021-05-06 2023-11-21 Kuo-Yi Lin Professional dance evaluation method for implementing human pose estimation based on deep transfer learning
US20220358310A1 (en) * 2021-05-06 2022-11-10 Kuo-Yi Lin Professional dance evaluation method for implementing human pose estimation based on deep transfer learning
CN113229810A (en) * 2021-06-22 2021-08-10 西安超越申泰信息科技有限公司 Human behavior recognition method and system and computer readable storage medium
CN114246582A (en) * 2021-12-20 2022-03-29 杭州慧光健康科技有限公司 System and method for detecting bedridden people based on long-term and short-term memory neural network
CN114821639B (en) * 2022-04-11 2023-04-18 西安电子科技大学广州研究院 Method and device for estimating and understanding human body posture in special scene
CN114821639A (en) * 2022-04-11 2022-07-29 西安电子科技大学广州研究院 Method and device for estimating and understanding human body posture in special scene
CN114882443A (en) * 2022-05-31 2022-08-09 江苏濠汉信息技术有限公司 Edge computing system applied to cable accessory construction

Also Published As

Publication number Publication date
WO2021051579A1 (en) 2021-03-25

Similar Documents

Publication Publication Date Title
CN110705390A (en) Body posture recognition method and device based on LSTM and storage medium
CN108256433B (en) Motion attitude assessment method and system
Patrona et al. Motion analysis: Action detection, recognition and evaluation based on motion capture data
Kitsikidis et al. Dance analysis using multiple kinect sensors
CN111414839B (en) Emotion recognition method and device based on gesture
Thoutam et al. Yoga pose estimation and feedback generation using deep learning
CN107909060A (en) Gymnasium body-building action identification method and device based on deep learning
US9183431B2 (en) Apparatus and method for providing activity recognition based application service
CN109960962B (en) Image recognition method and device, electronic equipment and readable storage medium
CN110633004B (en) Interaction method, device and system based on human body posture estimation
CN107273857B (en) Motion action recognition method and device and electronic equipment
KR100907704B1 (en) Golfer's posture correction system using artificial caddy and golfer's posture correction method using it
Kamnardsiria et al. Knowledge-based system framework for training long jump athletes using action recognition
Malawski et al. Recognition of action dynamics in fencing using multimodal cues
WO2023108842A1 (en) Motion evaluation method and system based on fitness teaching training
Muhamada et al. Review on recent computer vision methods for human action recognition
JP2021135995A (en) Avatar facial expression generating system and avatar facial expression generating method
CN111833439A (en) Artificial intelligence-based ammunition throwing analysis and mobile simulation training method
Kishore et al. Spatial Joint features for 3D human skeletal action recognition system using spatial graph kernels
Shahjalal et al. An approach to automate the scorecard in cricket with computer vision and machine learning
Jiang et al. Deep learning algorithm based wearable device for basketball stance recognition in basketball
CN115188062B (en) User running posture analysis method and device, running machine and storage medium
CN114550071A (en) Method, device and medium for automatically identifying and capturing track and field video action key frames
CN116266415A (en) Action evaluation method, system and device based on body building teaching training and medium
CN113869127A (en) Human behavior detection method, monitoring device, electronic device, and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination