CN114092854A - Intelligent rehabilitation auxiliary training system for spinal degenerative disease based on deep learning - Google Patents
- Publication number
- CN114092854A (application CN202111295019.XA)
- Authority
- CN
- China
- Prior art keywords
- sequence
- skeleton
- frame
- classification
- limb
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/08—Learning methods
Abstract
The invention discloses a deep-learning-based intelligent rehabilitation auxiliary training system for spinal degenerative disease. The system comprises a deep-learning-based real-time classification module for traditional Chinese medicine guidance videos and a video sequence division and evaluation module based on human skeleton representation. The former acquires two-dimensional human skeleton data as training data and performs deep learning training to obtain a generalized deep learning model, which yields real-time per-frame classification results. The frame classification results drive the segmentation and real-time error correction of skeleton sequences of the same category, and the segmented sequence fragments are compared and scored against expert-group video skeleton sequence fragments of the corresponding category. The system enables patients to perform traditional Chinese medicine guidance exercise training at any time without guidance or intervention from medical personnel; it is suitable for homes and primary medical and health institutions, can relieve the pressure on medical personnel, and improves the flexibility and accuracy of patients' rehabilitation training.
Description
Technical Field
The invention belongs to the technical field of computer vision video understanding, and particularly relates to a video sequence identification and evaluation system based on deep learning.
Background
Traditional Chinese medicine guidance (Daoyin) is a health-preservation and treatment method with distinctly Chinese characteristics. It promotes the recovery of limb motor function by guiding the patient to regulate body posture (chiefly through limb movement) on the basis of regulating mind and breath. With the continuous development of computer hardware and artificial intelligence technology, the research and development of intelligent auxiliary rehabilitation training systems has become a hot spot in related fields at home and abroad. In computer vision, video behavior recognition is a very challenging task: methods based on RGB frame input often cannot meet the performance requirements of real-time processing, whereas methods based on skeleton sequences have lower time complexity but rely on the extraction of skeleton information during inference. The invention adopts a recurrent neural network with skeleton-sequence input as the basic network framework for video behavior recognition, acquires the skeleton sequence in real time by combining OpenPose[1] two-dimensional pose estimation with sparse optical flow tracking, and finally performs skeleton sequence segmentation, patient exercise scoring, and action-correction reminders according to the classification results. By combining computer vision with traditional Chinese medicine guidance rehabilitation training, the invention develops an automatic evaluation and auxiliary training system for rehabilitation actions targeting spinal degenerative changes, realizing automated, accurate, and intelligent rehabilitation training.
Disclosure of Invention
In order to reduce the burden on traditional Chinese medicine rehabilitation medical personnel and improve the exercise effect of patients with spinal degenerative disease during rehabilitation training through traditional Chinese medicine guidance, the invention provides a deep-learning-based intelligent rehabilitation auxiliary training system for spinal degenerative disease, which automatically evaluates and corrects the patient's traditional Chinese medicine guidance exercise by means of artificial intelligence technology.
The invention provides an intelligent rehabilitation auxiliary training system for spinal degenerative disease based on deep learning, which consists of a traditional Chinese medicine guidance video real-time classification module based on deep learning and a video sequence division and evaluation module based on human body skeleton representation.
The deep-learning-based traditional Chinese medicine guidance video real-time classification module acquires two-dimensional human skeleton data through OpenPose[1] as training data for a deep learning model, and performs supervised training to obtain a generalized deep learning model. The input video is then preprocessed in real time by combining OpenPose frame separation detection with sparse optical flow tracking, and the result is fed as input into the pre-trained classification model to obtain real-time frame classification results.
The video sequence division and evaluation module based on human skeleton representation segments skeleton sequences of the same category and performs real-time error correction according to the frame classification results of the real-time classification module, and compares and scores the segmented sequence fragments against the expert-group video skeleton sequence fragments of the corresponding category.
The intelligent rehabilitation auxiliary training system for the spinal degenerative disease based on deep learning provided by the invention comprises the following components:
the working content of the real-time classification module of the traditional Chinese medicine guidance video based on deep learning is as follows;
(1) acquiring training data of deep learning;
(2) training a deep learning model;
(3) preprocessing videos in real time by combining OpenPose frame separation detection with sparse optical flow tracking, and classifying them;
For the video sequence division and evaluation module based on human skeleton representation, the working contents are as follows:
(4) carrying out segmentation and real-time error correction on the skeleton sequence based on the classification result;
(5) carrying out comparison scoring on the sequence segments after the segmentation is finished;
in the content (1), the specific operation flow of obtaining deep learning training data is as follows:
(11) video data processing: all video data are clipped according to the design of the rehabilitation training routine and its actions, yielding short video clips no longer than 1000 frames;
(12) all short video clips are flipped left-right as data augmentation;
(13) two-dimensional poses are extracted from the video samples processed in step (11) using a pre-trained OpenPose BODY_25 model, obtaining skeleton data sequences represented by the two-dimensional spatial coordinates of 25 keypoints: Nose, Neck, RShoulder, RElbow, RWrist, LShoulder, LElbow, LWrist, MidHip, RHip, RKnee, RAnkle, LHip, LKnee, LAnkle, REye, LEye, REar, LEar, LBigToe, LSmallToe, LHeel, RBigToe, RSmallToe, RHeel, plus Background;
(14) all skeleton data sequences are preprocessed. First, a translation operation is performed: the MidHip coordinate of the first valid frame is subtracted from the coordinates of all frames of the skeleton sequence. Then, a normalization operation is performed: the average shoulder distance d_s of the skeleton sequence is computed, and the coordinates of all frames are scaled by a factor of 1/d_s. Finally, a zero-padding operation is performed: the skeleton sequence is padded with zeros at the end to a fixed length of 1000 frames.
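As a minimal illustrative sketch (not the patent's actual implementation), the translate-normalize-pad preprocessing of step (14) might look as follows; the keypoint indices assume the BODY_25 layout, with MidHip at index 8 and the shoulders at indices 2 and 5:

```python
import numpy as np

def preprocess_skeleton(seq, target_len=1000):
    """Translate, normalize, and zero-pad a skeleton sequence.

    seq: array of shape (T, 25, 2) -- 2D coordinates of 25 BODY_25 keypoints.
    Index 8 is MidHip; indices 2 and 5 are RShoulder/LShoulder.
    """
    seq = np.asarray(seq, dtype=float)
    # Translation: subtract the MidHip coordinate of the first valid frame.
    origin = seq[0, 8].copy()
    seq = seq - origin
    # Normalization: scale all coordinates by 1/d_s, the inverse of the
    # mean shoulder distance over the sequence.
    d_s = np.linalg.norm(seq[:, 2] - seq[:, 5], axis=1).mean()
    seq = seq / d_s
    # Zero-padding / truncation to a fixed length of target_len frames.
    out = np.zeros((target_len, 25, 2))
    t = min(len(seq), target_len)
    out[:t] = seq[:t]
    return out
```

The translation is applied before computing d_s, which is harmless because translation does not change inter-point distances.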
The specific process of training the deep learning model in the content (2) is as follows:
(21) the deep learning model adopts a classification network, specifically a hierarchical limb attention LSTM network, comprising a view conversion module, a limb attention module, a limb-level classification module, and a body-level classification module. The hierarchical limb attention LSTM takes skeleton data as input. First, the view conversion module applies a two-dimensional translation and rotation to the input sequence coordinates; then each frame of the skeleton sequence is divided into 8 limb parts, which pass separately through the LSTM and Dropout layers of the limb-level classification module. Next, the feature vectors of the 8 limb parts are concatenated, and the limb attention module computes a spatio-temporal attention weight for each limb. Finally, the 8 limb feature vectors are weighted by these attention weights, and the classification score of the skeleton sequence at each frame is obtained through the LSTM, Dropout, and softmax layers of the body-level classification module.
(22) Setting a model hyper-parameter;
The main hyper-parameters of the model are: the training environment, batch size, learning rate, dropout rate, LSTM orthogonal-initialization multiplication factor, and maximum number of iterations;
(23) training starts, using the model's validation loss as the reference: when the validation loss stops decreasing for 30 consecutive iterations, the network is considered converged and training ends;
(24) adjusting the hyper-parameters for multiple times to obtain a model with the best generalization performance;
The operation flow of the view conversion module in the hierarchical limb attention LSTM network of step (21) is as follows:
To reduce the influence of shooting-angle changes on the model's classification performance, the view conversion module performs adaptive two-dimensional translation and rotation on the input skeleton sequence, adjusting the coordinates of each frame. The specific calculation is defined as:
S'_{t,j} = [x'_{t,j}, y'_{t,j}]^T = R_t (S_{t,j} - d_t),   (1)

where S_{t,j} = [x_{t,j}, y_{t,j}]^T denotes the two-dimensional coordinates of the j-th keypoint in the t-th frame of the input skeleton sequence, d_t = [d_{t,x}, d_{t,y}]^T denotes the translation vector for the t-th frame, and R_t denotes the two-dimensional rotation matrix for the t-th frame:

R_t = [[cos α_t, -sin α_t], [sin α_t, cos α_t]],   (2)

where d_{t,x} and d_{t,y} denote the translations of all coordinates of the t-th frame of the skeleton sequence along the horizontal and vertical axes respectively, and α_t denotes the counterclockwise rotation (in radians) applied to all coordinates of the t-th frame.
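A small sketch of the per-frame transformation of Eqs. (1)-(2), assuming NumPy and treating the translation d_t and rotation α_t as given quantities (in the network they are predicted adaptively):

```python
import numpy as np

def view_transform(frame, d, alpha):
    """Apply the per-frame 2D transform of Eq. (1): S' = R_t (S - d_t).

    frame: (J, 2) array of keypoint coordinates for one frame.
    d:     (2,) translation vector d_t.
    alpha: counterclockwise rotation in radians.
    """
    # Rotation matrix of Eq. (2).
    R = np.array([[np.cos(alpha), -np.sin(alpha)],
                  [np.sin(alpha),  np.cos(alpha)]])
    # Translate first, then rotate every keypoint.
    return (frame - d) @ R.T
```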
In step (21), the skeleton information output by the view conversion module is divided according to the human body layout into 8 limbs with some overlap: head, left arm, right arm, torso, left leg, right leg, left foot, and right foot. The information of each limb is processed by an independent LSTM layer and Dropout layer and then concatenated back into whole-body skeleton information.
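The limb split can be sketched as an index grouping over the 25 BODY_25 keypoints. The exact per-limb index sets below are an illustrative assumption (the patent's Fig. 2 defines the real partition); only the overlap property (shared joints such as the Neck appearing in several limbs) is taken from the text:

```python
# Hypothetical BODY_25 index grouping; the real split is defined in Fig. 2.
LIMBS = {
    "head":       [0, 1, 15, 16, 17, 18],   # Nose, Neck, eyes, ears
    "left_arm":   [1, 5, 6, 7],
    "right_arm":  [1, 2, 3, 4],
    "torso":      [1, 8, 9, 12],
    "left_leg":   [8, 12, 13, 14],
    "right_leg":  [8, 9, 10, 11],
    "left_foot":  [14, 19, 20, 21],
    "right_foot": [11, 22, 23, 24],
}

def split_limbs(frame):
    """Slice one 25-keypoint frame into 8 overlapping limb groups."""
    return {name: [frame[i] for i in idx] for name, idx in LIMBS.items()}
```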
In the process (21), the operation process of the limb attention module is as follows:
H_t = LSTM(concat(H_{t,1}, ..., H_{t,L})),   (3)
a_t = W_1 tanh(W_2 H_t + b_2) + b_1,   (4)

where H_{t,l} denotes the l-th limb feature of the t-th frame of the skeleton sequence (l ≤ L, L = 8), and H_t denotes the feature obtained by passing the concatenation of the 8 limb features through an LSTM layer and a Dropout layer; W_1 and W_2 are learnable parameter matrices, and b_1 and b_2 are bias vectors. The limb weight vector α_t = [α_{t,1}, ..., α_{t,L}] is then computed from a_t by softmax activation. Finally, the weighted feature vector of each limb is obtained, and all weighted limb feature vectors are concatenated as the input of the subsequent module:

H'_{t,l} = α_{t,l} · H_{t,l},   (5)
H'_t = concat(H'_{t,1}, ..., H'_{t,L}),   (6)
After passing through two further LSTM + Dropout layers and a fully connected layer, the concatenated weighted limb feature vectors yield classification scores via softmax activation.
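The attention computation of Eqs. (3)-(6) can be sketched as follows. This is a simplification under stated assumptions: the LSTM over the concatenated features in Eq. (3) is replaced by the raw concatenation, and the parameter shapes of W_1, W_2, b_1, b_2 are illustrative:

```python
import numpy as np

def limb_attention(limb_feats, W1, W2, b1, b2):
    """Score L limb feature vectors (Eq. 4), softmax them into weights
    alpha_t, and return the weighted concatenation (Eqs. 5-6).

    Simplification: H_t is taken as the plain concatenation of limb
    features; the real model passes it through LSTM + Dropout first.
    """
    H = np.concatenate(limb_feats)           # stand-in for the LSTM output H_t
    a = W1 @ np.tanh(W2 @ H + b2) + b1       # one score per limb (Eq. 4)
    e = np.exp(a - a.max())
    alpha = e / e.sum()                      # softmax over limbs
    weighted = [alpha[l] * limb_feats[l] for l in range(len(limb_feats))]
    return alpha, np.concatenate(weighted)   # Eqs. (5)-(6)
```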
In step (3), the trained model is used to perform behavior classification on input videos. The specific operation flow is as follows:
(31) Skeleton sequence acquisition and processing from the input video source: skeleton sequence information is obtained in real time by combining OpenPose frame separation detection with sparse optical flow tracking. OpenPose pose estimation is performed once every 5 frames to obtain the coordinates of 25 human keypoints, which are, in index order 0-24: Nose, Neck, RShoulder, RElbow, RWrist, LShoulder, LElbow, LWrist, MidHip, RHip, RKnee, RAnkle, LHip, LKnee, LAnkle, REye, LEye, REar, LEar, LBigToe, LSmallToe, LHeel, RBigToe, RSmallToe, RHeel, plus Background. In the subsequent 4 frames, the two-dimensional coordinates of the 25 keypoints of each frame are tracked with the Lucas-Kanade method, as follows:
S_{t+1} = calcOpticalFlowPyrLK(I_t, I_{t+1}, S_t),   (7)

where I_t and S_t denote the grayscale image and the skeleton information of the t-th frame respectively, and calcOpticalFlowPyrLK is the OpenCV open-source implementation of the Lucas-Kanade method.
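The frame separation detection plus tracking scheme of step (31) amounts to the following scheduling loop. `estimate_pose` and `track` are injected placeholders, assumed to wrap an OpenPose call and OpenCV's `cv2.calcOpticalFlowPyrLK` respectively:

```python
def skeleton_stream(frames, estimate_pose, track, interval=5):
    """Yield one skeleton per frame: run full pose estimation every
    `interval`-th frame and optical-flow tracking in between.

    estimate_pose(frame)            -> 25 keypoints (e.g. OpenPose)
    track(prev_frame, frame, pts)   -> updated keypoints (Lucas-Kanade)
    """
    prev_frame, pts = None, None
    for t, frame in enumerate(frames):
        if t % interval == 0:
            pts = estimate_pose(frame)           # full 25-keypoint detection
        else:
            pts = track(prev_frame, frame, pts)  # Lucas-Kanade update, Eq. (7)
        prev_frame = frame
        yield pts
```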
(32) Skeleton-sequence frame classification differs slightly between inference and training. During inference, skeleton information is preprocessed and classified once every 10 frames. The preprocessing comprises translation, normalization, and zero-padding, but the translation is no longer based on the hip midpoint (MidHip) coordinate of the first frame; instead, the system maintains one hip-midpoint coordinate as the origin during operation and updates it every 10 seconds. The normalization and zero-padding are the same as in step (14).
(33) During inference, classification is performed once every 10 frames rather than once over the whole video, so during system operation all LSTM layers in the model must retain their own state after each classification, i.e. the LSTMs run in stateful mode: the hierarchical limb attention LSTM network keeps the states of all LSTM layers after each inference and uses them as the initial states for the next inference.
In the content (4), the skeleton sequence is segmented and corrected in real time based on the classification result, and the process comprises the following steps:
(41) and (3) skeleton sequence segmentation based on classification results:
During system operation, the exercise performed by the patient generally contains several different actions. So that the subsequent scoring step can accurately compare against the expert-group skeleton sequence of the corresponding category, real-time sequence segmentation is performed after frame classification. Specifically, at the first frame of the sequence, or at the frame following the end of the previous segment, the category of the current segment is determined by the classification result of the current frame. A segment is considered ended when the frame classification disagrees with the segment's category continuously for the maximum error-tolerance length (100 frames), or when the input sequence ends.
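A simplified sketch of this segmentation rule follows. One simplification is assumed: when the tolerance is exceeded, the new segment's category is taken from the first frame of the disagreeing run (the disagreeing run could in principle contain mixed labels):

```python
def segment_frames(labels, max_tol=100):
    """Split a per-frame classification stream into homogeneous segments.

    Returns (start, end, category) triples. A segment ends when frames
    disagreeing with its category run for max_tol consecutive frames,
    or at the end of the input.
    """
    segments = []
    start, cur, bad = 0, None, 0
    for i, lab in enumerate(labels):
        if cur is None:
            cur, bad = lab, 0                 # first frame fixes the category
        elif lab != cur:
            bad += 1
            if bad >= max_tol:                # tolerance exhausted: close segment
                end = i - bad                 # last frame matching the category
                segments.append((start, end, cur))
                start = end + 1               # new segment starts at the run
                cur, bad = labels[start], 0
        else:
            bad = 0                           # agreement resets the counter
    if cur is not None:
        segments.append((start, len(labels) - 1, cur))
    return segments
```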
(42) The real-time error correction based on the classification result comprises the following processes:
in the system operation process, different parameters are calculated according to the skeleton information and the classification result of each frame, and error correction text information is generated, wherein the relationship between the action type and the calculation parameters is shown in table 1.
TABLE 1. Reference parameters for error-correction text generation by action category

| Action category | Parameter names |
| --- | --- |
| 0 | Relative lifting amplitude of both hands; angle between the upper arm and the forearm |
| 1 | Relative left-right spreading amplitude of both hands; left-right rotation amplitude of the head |
| 2 | Relative longitudinal height of both elbows; relative transverse width of both ... |
| 3 | Relative longitudinal height of both elbows; relative transverse width of both axes |
| 4 | Relative downward-extension amplitude of both hands |
| 5 | Ratio of the relative downward-extension amplitude of both hands to the average shank length |
| 6 | Longitudinal distance between the buttocks and the neck |
| 7 | Ratio of the difference in longitudinal distance between the feet to the longitudinal distance between the buttocks and the neck |
Comparison scoring of segmented sequence fragments in item (5):
During system operation, once a new sequence fragment is determined, sequence comparison scoring is performed according to the fragment's category. The system maintains an expert-group skeleton sequence Q_n for each action category. For a segmented fragment C_m, the similarity between the two sequences is computed with the dynamic time warping (DTW) algorithm, which uses the idea of dynamic programming to find the optimal alignment path between the two sequences and computes the cumulative sum of Euclidean distances between aligned frames along that path:
D(i, j) = sqrt((x_i - x_j)^2 + (y_i - y_j)^2),   (8)
Cost(i, j) = D(i, j) + min[Cost(i-1, j), Cost(i, j-1), Cost(i-1, j-1)],   (9)

where x, y denote two-dimensional coordinates in the sequence fragments, D(i, j) denotes the Euclidean distance between frame i of sequence Q_n and frame j of sequence C_m, Cost(i, j) is the cumulative Euclidean distance sum at position (i, j) of the best alignment path of the two sequences, and Cost(n, m) is the similarity of the two sequences. From the sequence similarity, the alignment score (on a 100-point scale) of the current fragment against the expert-group skeleton sequence is computed. To make the score distribution more uniform, scores are divided into 4 tiers by similarity: 90-100 (similarity 0-2), 75-90 (similarity 2-4), 60-75 (similarity 4-6), and 40-60 (similarity 6-∞). Within each tier, the score is computed as:
score = low + (len - 10 · 2^(cost - highCost)),   (10)

where low and highCost denote the lowest score and the highest similarity of the current tier, len denotes the length of the tier's score range, and cost denotes the similarity of the sequence.
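The DTW similarity of Eqs. (8)-(9) can be computed with a standard dynamic-programming table. This sketch treats each frame as a single 2D point rather than a full 25-keypoint skeleton, which keeps it self-contained:

```python
import math

def dtw_cost(q, c):
    """DTW similarity between two sequences of 2D points (Eqs. 8-9):
    the cumulative Euclidean distance along the best alignment path."""
    n, m = len(q), len(c)
    INF = float("inf")
    # cost[i][j] = Cost(i, j); row/column 0 act as boundary conditions.
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(q[i - 1], c[j - 1])          # D(i, j), Eq. (8)
            cost[i][j] = d + min(cost[i - 1][j],       # Eq. (9)
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]                                  # sequence similarity
```

A lower cost means the patient's fragment aligns more closely with the expert sequence; Cost(n, m) then feeds the tiered score of Eq. (10).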
In summary, the invention realizes an intelligent rehabilitation auxiliary training system for patients with spinal degenerative disease by innovatively combining artificial intelligence technology with traditional Chinese medicine rehabilitation. It comprises: (1) a deep-learning-based real-time classification model for traditional Chinese medicine guidance videos: two-dimensional human skeleton data acquired by OpenPose serve as training data for a deep learning model, supervised training yields a generalized model, the video is then preprocessed in real time by combining OpenPose frame separation detection with sparse optical flow tracking, and the result is fed into the pre-trained model to obtain real-time classification results; (2) video sequence division and evaluation based on human skeleton representation: based on the real-time classification results, skeleton sequences of the same category are segmented and corrected, and the segmented subsequences are scored. In the scoring process, each subsequence is compared with the expert-group video sequence by the dynamic time warping algorithm, and the computed video similarity is converted into a 100-point score. In the correction process, reminder messages are predefined for the characteristics of the two-dimensional skeleton data in each action; based on the skeleton data and the classification result, the patient is reminded by voice broadcast whenever the predefined feature conditions are met.
Experiments show that the real-time classification model provided by the invention realizes good balance on accuracy and speed performance, and the intelligent rehabilitation auxiliary training system for the spinal degenerative disease based on the classification model has high application value.
The invention applies artificial intelligence technology to the field of traditional Chinese medicine guidance rehabilitation training, promoting the recovery of limb motor function by automatically guiding the patient's limb movements on the basis of regulating mind and breath. Through computer vision, deep learning, and related technologies, the system recognizes and evaluates the accuracy of the patient's action posture in real time during traditional Chinese medicine guidance training, gives timely corrections by voice reminders, and, after training ends, scores the patient's training by comparing the training process with the expert-group actions. This intelligent rehabilitation auxiliary training system, designed for patients in the remission stage of spinal degenerative disease, enables patients to perform traditional Chinese medicine guidance exercise training at any time without guidance or intervention from medical personnel. It is suitable for homes and primary medical and health institutions, can greatly relieve the pressure on medical personnel, and improves the flexibility and accuracy of patients' rehabilitation training.
Drawings
Fig. 1 is a general flow diagram of the present invention.
Fig. 2 is a schematic diagram of the BODY_25 joint points in OpenPose on the left, and a schematic diagram of the division of the 25 keypoints into 8 limb parts in the classification method of the invention on the right.
FIG. 3 is an architecture diagram of the hierarchical limb attention LSTM network of the invention, which is divided into four modules from left to right: the view conversion module, the limb-level module, the limb attention module, and the body-level module.
Detailed Description
In the present invention, the structure of the traditional Chinese medicine guidance real-time classification model (the hierarchical limb attention LSTM network) is shown in FIG. 3, where the input skeleton is preprocessed and contains only the two-dimensional coordinate information of 25 keypoints, as shown in FIG. 2. Each skeleton sequence can be expressed as S ∈ R^(T×J×2), where T = 1000 and J = 25 denote the temporal length of the skeleton sequence and the number of skeleton keypoints, respectively.
After the preprocessing process including translation, normalization and zero padding, the original skeleton sequence data is input into the hierarchical limb attention LSTM for classification, and finally the classification score of each frame of the skeleton sequence is output.
The implementation of the invention comprises two parts: the implementation of the training of a classification model (hierarchical limb attention LSTM network) and an intelligent rehabilitation auxiliary training system is as follows:
(1) preparation of data sets
Owing to the unique applicability of the invention, experiments are conducted on a self-collected spinal rehabilitation dataset (RDSD: rehabilitation actions for spinal degenerative disease). RDSD consists of 1012 video segments covering 9 action classes (8 exercise actions and one stance action), where the average length of each video segment is 18.4 seconds (30 frames per second). All videos are flipped left-right as data augmentation, and the data are finally divided into training and test sets at a ratio of 75:25.
(2) Data pre-processing
For the RDSD dataset provided by the invention, the open-source multi-person pose estimation library OpenPose[1] is used to perform two-dimensional pose estimation and extract the skeleton sequences, where each frame of a skeleton sequence consists of the two-dimensional spatial coordinates of 25 keypoints.
The invention preprocesses the dataset with each skeleton sequence as the unit, in three steps: translation, normalization, and zero-padding. First, a translation operation subtracts the MidHip coordinate of the first valid frame from all coordinates of all frames of the skeleton sequence. Then, a normalization operation computes the average shoulder distance d_s over all frames of the sequence and scales the coordinates of all frames by a factor of 1/d_s. Finally, a zero-padding operation pads sequences shorter than 1000 frames with zeros and truncates sequences longer than 1000 frames, so that all skeleton sequences have a temporal length of 1000 frames.
(3) Model training
The main hyper-parameters in the model are: training conditions, batch size, learning rate, discarding rate, LSTM orthogonal initialization multiplication factor and maximum iteration number;
in the invention, the hyper-parameters of the model are set as follows. Training environment: a single GTX 1070 GPU. Batch size: 64. Learning rate: the initial learning rate is 0.005, with ReduceLROnPlateau as the adjustment strategy, i.e. the learning rate is divided by 10 whenever the validation loss of the model has not decreased for 10 consecutive epochs. Dropout rate: 0.1. LSTM orthogonal-initialization gain: 0.001. Maximum number of iterations: 300, with early stopping once the validation loss has not decreased for 30 consecutive iterations; in practice, training generally runs for more than 100 iterations;
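The learning-rate schedule and early stopping described above can be sketched framework-independently; the class and method names below are illustrative, not from the patent:

```python
class PlateauSchedule:
    """Sketch of the training schedule above: start at lr = 0.005, divide
    the learning rate by 10 when validation loss has not improved for 10
    epochs (ReduceLROnPlateau-style), and stop early after 30 stagnant
    epochs. Class and method names are illustrative assumptions."""

    def __init__(self, lr=0.005, reduce_patience=10, stop_patience=30):
        self.lr = lr
        self.reduce_patience = reduce_patience
        self.stop_patience = stop_patience
        self.best = float("inf")
        self.stagnant = 0
        self.stop = False

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.stagnant = 0
        else:
            self.stagnant += 1
            if self.stagnant % self.reduce_patience == 0:
                self.lr /= 10.0          # reduce LR by a factor of 10
            if self.stagnant >= self.stop_patience:
                self.stop = True         # early stopping
        return self.lr, self.stop
```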
(4) results of the experiment
To study the effectiveness of each module in the classification network (the hierarchical limb attention LSTM network), the invention runs ablation experiments that add or remove each module relative to a baseline network. The baseline network, RNNs, consists of three LSTM + Dropout layers. HRNNs consists of one limb-level (Limb-level) LSTM + Dropout layer and two body-level (Body-level) LSTM + Dropout layers. VT-RNNs adds the view-transformation module (View-transformation Module) of FIG. 3 to RNNs; VT-HRNNs adds the same view-transformation module to HRNNs; VT-HRNNs-ATT adds the limb-attention module (Limb-Attention) of FIG. 3 to VT-HRNNs and is the LSTM network of the invention. As can be seen from Table 2, the hierarchical structure (adding the limb level) improves test accuracy by 1.22% over a purely body-level LSTM, adding the view-transformation module improves it by 1.04%, and adding the limb-attention module improves it by 2.41%, demonstrating the effectiveness of each module in the hierarchical limb attention LSTM network of the invention.
Table 2. Ablation experiments of the inventive hierarchical limb attention LSTM network on the RDSD data set
Network architecture | Training accuracy | Test accuracy |
---|---|---|
RNNs(Baseline) | 95.30% | 92.01% |
HRNNs | 96.30% | 93.23% |
VT-RNNs | 95.43% | 93.05% |
VT-HRNNs | 96.06% | 92.93% |
VT-HRNNs-ATT | 96.64% | 95.34% |
(5) Realization of intelligent rehabilitation auxiliary training system
The intelligent rehabilitation auxiliary training system provides the following four functions, as shown in FIG. 1: real-time skeleton-sequence acquisition based on OpenPose [1] frame-skipping detection combined with sparse optical flow tracking, real-time classification, skeleton-sequence segmentation with real-time error correction, and skeleton-sequence segment scoring.
The system receives image frames from an ordinary two-dimensional RGB camera and acquires the skeleton sequence in real time. Specifically, OpenPose [1] is run once every 5 frames to estimate the two-dimensional pose; from the resulting two-dimensional coordinates of the 25 human-body key points, the key-point coordinates of the following 4 frames are obtained by the Lucas-Kanade sparse optical flow tracking method.
Each time the latest 10 frames of the human skeleton sequence are obtained, the classification network of the invention (the hierarchical limb attention LSTM network) is run once to obtain the action class label of that 10-frame sequence. Specifically, the 10 frames are preprocessed (translation, normalization and zero padding) and fed into the pre-trained hierarchical limb attention LSTM network, yielding a real-time classification result. Note that the network maintains the memory state of all LSTM units from the start of system operation and clears it every 30 s.
For each frame, corresponding textual error-correction information is generated from the two-dimensional key-point coordinates of the skeleton sequence and the frame's action class, prompting the patient with the key points of the exercise currently being practiced. Specifically, a different reference parameter is defined for each of the 8 exercise actions and used to dynamically compute and generate the text information shown in Table 1, which is read to the user by voice broadcast.
The system dynamically manages all classified frames that have not yet been assigned to an action segment, i.e., it performs action segmentation. Specifically, the class of an action segment is determined from the action classes of the most recent unassigned frames, tolerating class inconsistency of at most 100 frames.
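The tolerance-based segmentation rule above can be sketched as follows. The function name and the policy of starting the next segment from the mismatched run are illustrative assumptions (the patent tolerates up to 100 inconsistent frames; a smaller tolerance is used in the usage example for readability):

```python
# Sketch of the online action-segmentation rule: frames arrive with a
# predicted class; a segment is closed once the incoming class has
# disagreed with the segment's class for more than `tolerance`
# consecutive frames. Names and the restart policy are illustrative.
def segment_stream(frame_labels, tolerance=100):
    segments = []                        # list of (class, frame_count)
    cur_label, cur_count, tail = None, 0, []
    for lab in frame_labels:
        if cur_label is None:
            cur_label, cur_count = lab, 1
        elif lab == cur_label:
            cur_count += 1 + len(tail)   # tolerated mismatches stay in the segment
            tail = []
        else:
            tail.append(lab)
            if len(tail) > tolerance:    # too many inconsistent frames: close segment
                segments.append((cur_label, cur_count))
                # simplification: assume the mismatch run shares the new label
                cur_label, cur_count, tail = lab, len(tail), []
    if cur_label is not None:
        segments.append((cur_label, cur_count + len(tail)))
    return segments
```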
For each completed action segment, the system selects the pre-loaded expert-group video skeleton sequence of the corresponding action class, computes the similarity distance between the current segment and the expert segment with the dynamic time warping algorithm [2], and then converts that distance into a percentile score for the current segment according to a predefined distance-to-score conversion rule.
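The similarity computation above is standard dynamic time warping [2]; a minimal sketch, with the distance-to-score conversion omitted:

```python
import numpy as np

# Minimal dynamic time warping (DTW) sketch of the similarity computation
# described above [2]: Cost(i, j) accumulates the Euclidean frame distance
# D(i, j) along the optimal alignment path; Cost(n, m) is the similarity.
def dtw_distance(Q, C):
    """Q: (n, d) expert-group sequence; C: (m, d) patient segment."""
    n, m = len(Q), len(C)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(Q[i - 1] - C[j - 1])   # D(i, j)
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])
    return cost[n, m]
```

The distance-to-score conversion then maps Cost(n, m) into one of the percentile grades described in the claims.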
The system is implemented in Python 3, mainly using TensorFlow, OpenCV, multiprocessing and other public libraries. It runs in real time on a portable mini-PC in B/S (browser/server) mode and offers high application value as an intelligent aid for the rehabilitation training of patients with spinal degenerative disease.
Reference to the literature
[1]Cao Z,Hidalgo G,Simon T,et al.OpenPose:Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2018.
[2]Donald J Berndt and James Clifford.1994.Using Dynamic Time Warping to Find Patterns in Time Series.AAAI Workshop on Knowledge Discovery in Databases (KDD-94).pages 359-370.
Claims (5)
1. A spine degenerative disease intelligent rehabilitation auxiliary training system based on deep learning is characterized by comprising a traditional Chinese medicine guidance video real-time classification module based on deep learning and a video sequence dividing and evaluating module based on human body skeleton representation;
the deep-learning-based traditional Chinese medicine guidance video real-time classification module obtains two-dimensional human skeleton data through OpenPose; these data serve as training data for a deep learning model, on which supervised deep learning training is performed to obtain a well-generalized deep learning model; the input video is then preprocessed in real time by OpenPose frame-skipping detection combined with sparse optical flow tracking, and the result is fed into the pre-trained classification model to obtain a real-time per-frame classification result;
and the module based on human body skeleton representation and video sequence partitioning and evaluation segments skeleton sequences of the same class according to the frame classification results of the real-time classification module, performs real-time error correction, and scores the segmented sequence fragments by sequence comparison against expert-group video skeleton sequence fragments of the corresponding class.
2. The intelligent rehabilitation auxiliary training system for spinal degenerative disease based on deep learning as claimed in claim 1, wherein:
the working content of the real-time classification module of the traditional Chinese medicine guidance video based on deep learning is as follows;
(1) acquiring training data of deep learning;
(2) training a deep learning model;
(3) the method comprises the steps of preprocessing videos in real time by combining OpenPose frame separation detection with sparse optical flow tracking and classifying the videos;
for the module based on human body skeleton representation and video sequence partitioning and evaluation, the working contents are as follows:
(4) carrying out segmentation and real-time error correction on the skeleton sequence based on the classification result;
(5) carrying out comparison scoring on the sequence segments after the segmentation is finished;
in the content (1), the specific operation flow of obtaining deep learning training data is as follows:
(11) processing the video data: all video data are clipped according to the design of the rehabilitation training exercises, yielding short video clips no longer than 1000 frames;
(12) performing left and right mirror image turnover on all short video data to serve as data enhancement;
(13) performing two-dimensional pose extraction on the video data samples processed in the process (11) using an OpenPose model pre-trained on the BODY_25 dataset, obtaining a skeleton data sequence represented by the two-dimensional spatial coordinates of 25 key points, namely Nose, Neck, RShoulder, RElbow, RWrist, LShoulder, LElbow, LWrist, MidHip, RHip, RKnee, RAnkle, LHip, LKnee, LAnkle, REye, LEye, REar, LEar, LBigToe, LSmallToe, LHeel, RBigToe, RSmallToe and RHeel;
(14) performing data preprocessing on all skeleton data sequences: first a translation operation, subtracting the mid-hip coordinate of the first valid frame from the coordinates of all frames of the sequence; then a normalization operation, computing the average shoulder-to-shoulder distance d_s of the skeleton sequence and scaling the coordinates of all frames by the factor 1/d_s; finally a zero-padding operation, appending zeros to the skeleton sequence so that its length is fixed at 1000 frames;
in the content (2), the training of the deep learning model specifically includes:
(21) the deep learning model adopts a classification network model, specifically a hierarchical limb attention LSTM network comprising a view-transformation module, a limb-attention module, a limb-level classification module and a body-level classification module; the network takes skeleton data as input; first, the view-transformation module applies a two-dimensional translation transformation to the input sequence coordinates; then, the coordinates of each frame of the skeleton sequence are divided into 8 limb parts, each passing through its own LSTM layer and Dropout layer of the limb-level classification module; next, the feature vectors of the 8 limb parts are concatenated and the limb-attention module computes a spatio-temporal attention weight for each limb; finally, the 8 limb feature vectors are weighted by their attention weights, and the per-frame classification scores of the skeleton sequence are obtained through the LSTM layer, Dropout layer and softmax layer of the body-level classification module;
(22) setting a model hyper-parameter;
the main hyper-parameters of the model are: training environment, batch size, learning rate, dropout rate, LSTM orthogonal-initialization gain and maximum number of iterations;
(23) starting training: taking the validation loss of the model as the criterion, when the validation loss has not decreased for 30 consecutive iterations, the network is considered to have converged and training ends;
(24) adjusting the hyper-parameters for multiple times to obtain a model with the best generalization performance;
in the content (3), action classification is performed on the video to be classified using the trained model; the specific operation flow is as follows:
(31) skeleton sequence acquisition, processing and classification from the input video signal source: skeleton sequence information is acquired in real time by OpenPose frame-skipping detection combined with sparse optical flow tracking; OpenPose pose estimation is performed once every 5 frames to obtain the coordinates of the 25 human-body key points, and the two-dimensional coordinates of those 25 key points are then tracked through the subsequent 4 frames with the Lucas-Kanade method; the specific process is:
St+1=calcOpticalFlowPyrLK(It,It+1,St),
wherein St and It respectively represent the skeleton information and the grayscale image of the t-th frame, and calcOpticalFlowPyrLK is the OpenCV open-source library's implementation of the Lucas-Kanade method;
(32) classification of skeleton sequence frames differs slightly between the inference process and the training process: during inference, skeleton information preprocessing and classification are performed once every time 10 new frames of skeleton information have been obtained, the preprocessing comprising translation, normalization and zero padding;
(33) during inference, all LSTMs in the model retain their state after each classification, i.e., the LSTM stateful mode is enabled; in this mode, the hierarchical limb attention LSTM network keeps the states of all LSTM layers after each inference and uses them as the initial states for the next inference;
in the content (4), the skeleton sequence is segmented and corrected in real time based on the classification result, and the process comprises the following steps:
(41) and (3) skeleton sequence segmentation based on classification results:
after the sequence frames are classified, real-time sequence segmentation is performed; specifically, starting from the first frame of the sequence or the frame following the last frame of the previous sequence segment, the class of the current sequence segment is determined from the classification result of the current frame; a sequence segment ends either when the frame class has been continuously inconsistent with the segment class for the maximum error-tolerance length, or when the input sequence ends;
(42) the real-time error correction based on the classification result comprises the following processes:
calculating different parameters according to the skeleton information and the classification result of each frame, and generating error correction text information;
the comparative scoring of the segmentation-ended sequence segments in the content (5) comprises:
after a new sequence segment is determined, sequence comparison scoring is performed according to the segment's class; first, the system maintains an expert-group skeleton sequence Qn for each action class; for a sequence segment Cm, the similarity between the two sequences is computed by the dynamic time warping (DTW) algorithm [2], which finds the optimal alignment path of the two sequences by dynamic programming and accumulates the Euclidean distances of the sequence frames along that path:
Cost(i,j)=D(i,j)+min[Cost(i-1,j),Cost(i,j-1),Cost(i-1,j-1)],
wherein x, y represent the two-dimensional coordinates of the frames in the sequence segments, D(i, j) represents the Euclidean distance between frame i of sequence Qn and frame j of sequence Cm, Cost(i, j) represents the cumulative Euclidean distance sum at position (i, j) of the optimal alignment path of the two sequences, and Cost(n, m) is the similarity of the two sequences; according to the sequence similarity, the comparison score of the current sequence segment against the expert-group skeleton sequence is computed on a 100-point scale; to make the score distribution more uniform, the scores are divided into 4 grades according to sequence similarity: 90-100, corresponding to similarity 0-2; 75-90, corresponding to similarity 2-4; 60-75, corresponding to similarity 4-6; and 40-60, corresponding to similarity 6 to infinity; within each grade, the score is calculated as:
score = low + (len - 10 * 2^(cost - highCost)),
wherein low and highCost represent the lowest score and the highest similarity of the current grade, len represents the width of the current grade's score range, and cost represents the sequence similarity.
3. The intelligent rehabilitation auxiliary training system for spinal degenerative disease based on deep learning of claim 2, wherein the operation of the view-transformation module in the hierarchical limb attention LSTM network of the step (21) is as follows:
in order to reduce the influence of shooting angle change on the classification performance of the model, the visual angle conversion module is used for carrying out self-adaptive two-dimensional translation and rotation operation on the input skeleton sequence, and coordinates of each frame of the input skeleton sequence are adjusted, wherein the specific calculation process is defined as:
S′t,j=[x′t,j,y′t,j]′=Rt(St,j-dt),
wherein St,j=[xt,j, yt,j]′ represents the two-dimensional coordinates of the j-th key point of the t-th frame of the input skeleton sequence; dt=[dt^x, dt^y]′ represents the translation vector corresponding to the t-th frame; and Rt represents the two-dimensional rotation matrix corresponding to the t-th frame:
Rt = [cos αt, -sin αt; sin αt, cos αt],
wherein dt^x and dt^y respectively represent the translation of all coordinates of the t-th frame of the skeleton sequence along the horizontal axis and the vertical axis, and αt represents the counterclockwise rotation, in radians, of all coordinates of the t-th frame of the skeleton sequence;
the skeleton information output by the view-transformation module is divided, according to the human body layout, into 8 partially overlapping limbs, namely head, left arm, right arm, trunk, left leg, right leg, left foot and right foot; the information of each limb is processed by its own LSTM layer and Dropout layer and then concatenated back into whole-body skeleton information.
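The per-frame translation and rotation of claim 3 can be sketched as follows; in the patent, d_t and α_t are produced adaptively by the view-transformation module, whereas here they are plain arguments for illustration:

```python
import numpy as np

# Sketch of the view-transformation step S' = R_t (S - d_t): each frame's
# 25 keypoints are translated by d and rotated counterclockwise by alpha.
# The function name is illustrative; d and alpha are plain arguments here,
# not learned per frame as in the patent.
def view_transform(frame, d, alpha):
    """frame: (25, 2) keypoints; d: (2,) translation; alpha: radians."""
    R = np.array([[np.cos(alpha), -np.sin(alpha)],
                  [np.sin(alpha),  np.cos(alpha)]])
    return (frame - d) @ R.T   # S' = R (S - d), applied row-wise
```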
4. The intelligent rehabilitation training system for spinal degenerative disease based on deep learning of claim 3, wherein the operation process of the limb attention module in the hierarchical limb attention LSTM network in the step (21) is as follows:
wherein Ht,l represents the l-th limb feature of the t-th frame of the skeleton sequence, l ≤ L (L = 8), Ht represents the feature obtained after the 8 concatenated limb features pass through an LSTM layer and a Dropout layer, W1 and W2 are learnable parameter matrices, and b1 and b2 are bias vectors; a weight at,l for each limb is then computed by softmax activation, the feature vector of each limb is weighted accordingly, and all weighted limb feature vectors are concatenated as the input of the subsequent module:
H′t,l=at,l·Ht,l,
H′t=concat(H′t,1,...,H′t,L),
after passing through two LSTM + Dropout layers and a fully connected layer, the concatenated weighted limb feature vectors are converted into classification scores by softmax activation.
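The attention weighting of claim 4 can be sketched as follows; the score computation (the learnable W1, W2, b1, b2 mapping) is condensed into a given score vector, and the function name is illustrative:

```python
import numpy as np

# Sketch of the limb-attention weighting: a softmax over per-limb scores
# yields weights a_{t,l}; each limb feature is scaled by its weight and
# all weighted features are concatenated (H'_t). The learnable score MLP
# (W1, W2, b1, b2) is replaced here by a given score vector.
def limb_attention(limb_feats, scores):
    """limb_feats: (L, d) per-limb features; scores: (L,) attention scores."""
    e = np.exp(scores - scores.max())       # numerically stable softmax
    a = e / e.sum()                         # a_{t,l}
    weighted = limb_feats * a[:, None]      # H'_{t,l} = a_{t,l} * H_{t,l}
    return np.concatenate(weighted), a      # H'_t = concat(H'_{t,1..L})
```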
5. The intelligent rehabilitation auxiliary training system for spinal degenerative disease based on deep learning as claimed in claim 3, wherein the process (42) calculates different parameters according to the skeleton information and classification result of each frame, the relationship between action class and calculated parameters being as follows:
the action classes are 0, 1, 2, 3, 4, 5, 6 and 7; the corresponding parameter names are, in order: the lifting amplitude of both hands and the included angle between upper and lower arm; the relative left-right spreading amplitude of both hands and the left-right rotation amplitude of the head; the relative longitudinal height of both elbows and the relative transverse width of both wrists; the relative longitudinal height of both elbows and the relative transverse width of both wrists; the relative downward-extension amplitude of both hands; the ratio of the relative downward-extension amplitude of both hands to the average leg length; the longitudinal distance between hip and neck; and the ratio of the difference in the longitudinal positions of the feet to the longitudinal distance between hip and neck.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111295019.XA CN114092854A (en) | 2021-11-03 | 2021-11-03 | Intelligent rehabilitation auxiliary training system for spinal degenerative disease based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114092854A true CN114092854A (en) | 2022-02-25 |
Family
ID=80298777
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114092854A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102903122A (en) * | 2012-09-13 | 2013-01-30 | 西北工业大学 | Video object tracking method based on feature optical flow and online ensemble learning |
CN112560618A (en) * | 2020-12-06 | 2021-03-26 | 复旦大学 | Behavior classification method based on skeleton and video feature fusion |
Non-Patent Citations (1)
Title |
---|
KONG Dezhuang; ZHU Mengyu; YU Jiankun: "Research on the application and methods of facial expression recognition in assistive medical care", Life Science Instruments, no. 02, 25 April 2019 (2019-04-25) *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117958812A (en) * | 2024-03-28 | 2024-05-03 | 广州舒瑞医疗科技有限公司 | Human body posture feedback evaluation method for dynamic vestibular rehabilitation training |
CN117958812B (en) * | 2024-03-28 | 2024-06-14 | 广州舒瑞医疗科技有限公司 | Human body posture feedback evaluation method for dynamic vestibular rehabilitation training |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||