CN114092854A - Intelligent rehabilitation auxiliary training system for spinal degenerative disease based on deep learning - Google Patents

Intelligent rehabilitation auxiliary training system for spinal degenerative disease based on deep learning

Info

Publication number
CN114092854A
CN114092854A
Authority
CN
China
Prior art keywords
sequence
skeleton
frame
classification
limb
Prior art date
Legal status
Pending
Application number
CN202111295019.XA
Other languages
Chinese (zh)
Inventor
路红
沈梦琦
杨博弘
任浩然
张文强
李伟
Current Assignee
Fudan University
Original Assignee
Fudan University
Priority date
Filing date
Publication date
Application filed by Fudan University filed Critical Fudan University
Priority to CN202111295019.XA priority Critical patent/CN114092854A/en
Publication of CN114092854A publication Critical patent/CN114092854A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract


The invention discloses a deep learning-based intelligent rehabilitation auxiliary training system for spinal degenerative diseases. The system comprises a deep learning-based real-time classification module for TCM Daoyin (traditional Chinese medicine guided-exercise) videos and a video sequence segmentation and evaluation module based on human skeleton representation. The former acquires two-dimensional human skeleton data as training data for supervised deep learning, obtaining a generalized deep learning model that yields real-time per-frame classification results. The latter segments skeleton sequences of the same category according to the frame classification results, performs real-time error correction, and scores each segmented sequence against the corresponding expert-group video skeleton sequence segment by sequence alignment. The system requires no guidance or intervention from medical staff, enables patients to practice TCM Daoyin on their own at any time, is suitable for home use and primary medical institutions, relieves pressure on medical staff, and improves the flexibility and accuracy of patient rehabilitation training.


Description

Intelligent rehabilitation auxiliary training system for spinal degenerative diseases based on deep learning

Technical Field

The invention belongs to the technical field of computer vision and video understanding, and specifically relates to a deep learning-based video sequence recognition and evaluation system.

Background Art

TCM Daoyin is a method of health cultivation and treatment with distinctly Chinese characteristics. It promotes the recovery of limb movement by guiding the patient to adjust body form, that is, the principal limb activities, on the basis of regulating mind and breath. With the continuous development of computer hardware and artificial intelligence, intelligent assisted rehabilitation training systems have become a research hotspot at home and abroad. Within computer vision, video action recognition remains highly challenging: methods based on RGB frame input often cannot meet real-time performance requirements, while skeleton-sequence methods have lower time complexity but depend on skeleton extraction during inference. The invention adopts a recurrent neural network with skeleton-sequence input as the basic architecture for video action recognition, acquires skeleton sequences in real time through OpenPose [1] two-dimensional pose estimation combined with sparse optical flow tracking, and finally performs skeleton sequence segmentation, patient exercise scoring, and movement-correction reminders according to the classification results. By combining computer vision with TCM Daoyin rehabilitation training, the invention provides an automatic movement evaluation and auxiliary training system for spinal degenerative diseases, realizing automated, precise, and intelligent rehabilitation training.

SUMMARY OF THE INVENTION

To reduce the burden on TCM rehabilitation staff when patients with spinal degenerative diseases undergo rehabilitation training through TCM Daoyin, and to improve the patients' practice outcomes, the present invention provides a deep learning-based intelligent rehabilitation auxiliary training system for spinal degenerative diseases that, with the aid of artificial intelligence, automatically evaluates and corrects the patient's Daoyin practice.

The deep learning-based intelligent rehabilitation auxiliary training system for spinal degenerative diseases provided by the invention consists of a deep learning-based real-time classification module for TCM Daoyin videos and a video sequence segmentation and evaluation module based on human skeleton representation.

The deep learning-based real-time classification module for TCM Daoyin videos obtains two-dimensional human skeleton data through OpenPose [1] as training data for supervised deep learning, yielding a generalized deep learning model. The input video is then preprocessed in real time by interleaved-frame OpenPose detection combined with sparse optical flow tracking, and the result is fed into the pre-trained classification model to obtain real-time per-frame classification results.

The skeleton-representation-based video sequence segmentation and evaluation module segments skeleton sequences of the same category according to the frame classification results of the real-time classification module, performs real-time error correction, and scores each segmented sequence segment against the corresponding expert-group video skeleton sequence segment by sequence alignment.

In the proposed deep learning-based intelligent rehabilitation auxiliary training system for spinal degenerative diseases:

The work of the deep learning-based real-time classification module for TCM Daoyin videos is as follows:

(1) Acquire training data for deep learning;

(2) Train the deep learning model;

(3) Preprocess and classify the video in real time, combining interleaved-frame OpenPose detection with sparse optical flow tracking.

The work of the skeleton-representation-based video sequence segmentation and evaluation module is as follows:

(4) Segment the skeleton sequence and perform real-time error correction based on the classification results;

(5) Comparatively score the completed sequence segments.

The specific procedure for acquiring deep learning training data in item (1) is as follows:

(11) Process the video data: according to the design of the rehabilitation exercise movements, clip all videos into short clips no longer than 1000 frames;

(12) Mirror-flip all short videos horizontally as data augmentation;

(13) Use the OpenPose model pre-trained on BODY_25 to perform two-dimensional pose extraction on the video samples from step (11), obtaining skeleton data sequences represented by the two-dimensional coordinates of 25 keypoints. In index order 0-24, the keypoints are Nose, Neck, RShoulder, RElbow, RWrist, LShoulder, LElbow, LWrist, MidHip, RHip, RKnee, RAnkle, LHip, LKnee, LAnkle, REye, LEye, REar, LEar, LBigToe, LSmallToe, LHeel, RBigToe, RSmallToe, RHeel, Background;

(14) Preprocess all skeleton data sequences. First, a translation: subtract the MidHip coordinate of the first valid frame from the coordinates of every frame in the sequence. Then, a normalization: compute the sequence's average shoulder-to-shoulder distance $d_s$ and scale all frame coordinates by a factor of $1/d_s$. Finally, a zero-padding: append all-zero frames to the end of the sequence so that its length is fixed at 1000 frames.
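A minimal numpy sketch of this preprocessing, assuming sequences shaped (T, 25, 2) and the BODY_25 indices listed in step (13) (RShoulder = 2, LShoulder = 5, MidHip = 8); the function name and the validity test are illustrative, not taken from the patent:

```python
import numpy as np

def preprocess(seq, max_len=1000, r_shoulder=2, l_shoulder=5, mid_hip=8):
    """Translate, normalize, and zero-pad one skeleton sequence (T, 25, 2)."""
    seq = np.asarray(seq, dtype=np.float32)
    # A frame is treated as valid here if any keypoint is non-zero.
    valid = np.flatnonzero(np.abs(seq).sum(axis=(1, 2)) > 0)
    # Translation: subtract the MidHip coordinate of the first valid frame.
    seq = seq - seq[valid[0], mid_hip]
    # Normalization: scale by 1/ds, the mean shoulder-to-shoulder distance.
    ds = np.linalg.norm(seq[valid, r_shoulder] - seq[valid, l_shoulder],
                        axis=1).mean()
    seq = seq / ds
    # Zero padding (or truncation) to a fixed length of max_len frames.
    out = np.zeros((max_len,) + seq.shape[1:], dtype=np.float32)
    out[:min(len(seq), max_len)] = seq[:max_len]
    return out
```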

The specific procedure for training the deep learning model in item (2) is as follows:

(21) The deep learning model is a classification network, specifically a hierarchical limb-attention LSTM network comprising a view-transformation module, a limb-attention module, a limb-level classification module, and a body-level classification module. The network takes skeleton data as input. First, the view-transformation module applies a two-dimensional translation transformation to the input sequence coordinates. The coordinates of each frame of the skeleton sequence are then divided into 8 limb parts, each passing through the LSTM and Dropout layers of the limb-level classification module. The feature vectors of the 8 limb parts are concatenated and passed through the limb-attention module, which computes a spatio-temporal attention weight for each limb. Finally, the 8 limb feature vectors are weighted by these attention weights and concatenated, and pass through the LSTM, Dropout, and softmax layers of the body-level classification module to obtain the classification score of the skeleton sequence at every frame. A sketch of this structure follows.
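The following hedged Keras sketch illustrates the hierarchical structure just described; the layer widths and the exact limb index groupings are assumptions for illustration (the patent's Figure 2 defines the actual 8-part division), and the limb-attention module is collapsed into the body-level stack for brevity:

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES, T, J = 9, 1000, 25
# Hypothetical overlapping limb groupings over the BODY_25 indices.
LIMBS = [[0, 1, 15, 16, 17, 18],              # head
         [1, 5, 6, 7], [1, 2, 3, 4],          # left arm, right arm
         [1, 8, 9, 12],                       # torso
         [12, 13, 14], [9, 10, 11],           # left leg, right leg
         [14, 19, 20, 21], [11, 22, 23, 24]]  # left foot, right foot

inp = layers.Input(shape=(T, J, 2))
limb_feats = []
for idx in LIMBS:
    # Limb level: gather this limb's keypoints, then LSTM + Dropout.
    part = layers.Lambda(lambda s, i=idx: tf.gather(s, i, axis=2))(inp)
    part = layers.Reshape((T, len(idx) * 2))(part)
    h = layers.LSTM(32, return_sequences=True)(part)
    limb_feats.append(layers.Dropout(0.1)(h))
x = layers.Concatenate()(limb_feats)          # (T, 8 * 32)
# Body level (the limb-attention weighting of eqs. 3-6 would sit here).
x = layers.LSTM(128, return_sequences=True)(x)
x = layers.Dropout(0.1)(x)
x = layers.LSTM(128, return_sequences=True)(x)
out = layers.Dense(NUM_CLASSES, activation="softmax")(x)  # per-frame scores
model = tf.keras.Model(inp, out)
```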

(22) Set the model hyperparameters;

The main hyperparameters of the model are: training conditions, batch size, learning rate, dropout rate, LSTM orthogonal-initialization multiplication factor, and maximum number of iterations;

(23) Start training. Using the model's validation loss as the criterion, when the validation loss stops decreasing for 30 consecutive iterations, the network is considered to have converged and training ends;

(24) Adjust the hyperparameters over multiple runs to obtain the model with the best generalization performance.

The view-transformation module in the hierarchical limb-attention LSTM network of step (21) operates as follows:

To reduce the influence of changes in camera angle on classification performance, the view-transformation module applies an adaptive two-dimensional translation and rotation to the input skeleton sequence, adjusting the coordinates of every frame. The computation is defined as:

$$S'_{t,j} = [x'_{t,j}, y'_{t,j}]' = R_t (S_{t,j} - d_t), \qquad (1)$$

where $S_{t,j} = [x_{t,j}, y_{t,j}]'$ denotes the two-dimensional coordinates of the $j$-th keypoint in frame $t$ of the input skeleton sequence, $d_t = [d_t^x, d_t^y]'$ denotes the translation vector for frame $t$, and $R_t$ denotes the two-dimensional rotation matrix for frame $t$, given by:

$$R_t = \begin{bmatrix} \cos\alpha_t & -\sin\alpha_t \\ \sin\alpha_t & \cos\alpha_t \end{bmatrix}, \qquad (2)$$

where $d_t^x$ and $d_t^y$ denote the translation of all coordinates of frame $t$ along the horizontal and vertical axes, respectively, and $\alpha_t$ denotes the counterclockwise rotation angle, in radians, applied to all coordinates of frame $t$.
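A small numpy sketch of eqs. (1)-(2); in the network, $d_t$ and $\alpha_t$ are produced adaptively per frame, whereas here they are passed in as plain arguments for illustration:

```python
import numpy as np

def view_transform(frame_xy, dx, dy, alpha):
    """Apply S' = R (S - d) to one frame of (25, 2) keypoint coordinates."""
    d = np.array([dx, dy], dtype=np.float32)
    R = np.array([[np.cos(alpha), -np.sin(alpha)],
                  [np.sin(alpha),  np.cos(alpha)]], dtype=np.float32)
    return (frame_xy - d) @ R.T   # row-wise rotation after translation
```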

In step (21), the skeleton information output by the view-transformation module is divided, following the layout of the human body, into 8 partially overlapping limbs: head, left arm, right arm, torso, left leg, right leg, left foot, and right foot. The information of each limb is processed by its own LSTM and Dropout layers and then concatenated back into whole-body skeleton information.

In step (21), the limb-attention module operates as follows:

$$H_t = \mathrm{LSTM}(\mathrm{concat}(H_{t,1}, \dots, H_{t,L})), \qquad (3)$$

$$a_t = W_1 \tanh(W_2 H_t + b_2) + b_1,$$

$$\alpha_{t,l} = \frac{\exp(a_{t,l})}{\sum_{k=1}^{L} \exp(a_{t,k})}, \qquad (4)$$

where $H_{t,i}$ denotes the $i$-th limb feature of the skeleton sequence at frame $t$, $1 \le i \le L$ ($L = 8$), and $H_t$ denotes the feature obtained after the 8 limb features are concatenated and passed through an LSTM layer and a Dropout layer. $W_1, W_2$ are learnable parameter matrices and $b_1, b_2$ are bias vectors. A softmax activation then yields the weight $\alpha_{t,l}$ of each limb, giving the weighted feature vector of each limb; all weighted limb feature vectors are concatenated as the input of the subsequent module:

$$H'_{t,l} = \alpha_{t,l} \cdot H_{t,l}, \qquad (5)$$

$$H'_t = \mathrm{concat}(H'_{t,1}, \dots, H'_{t,L}), \qquad (6)$$

The concatenated weighted limb feature vector passes through two LSTM+Dropout layers and one fully connected layer, and the classification score is obtained through a softmax activation.
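A numpy sketch of the attention weighting of eqs. (3)-(6), with the LSTM of eq. (3) omitted for brevity; W1, W2, b1, b2 stand in for the learned parameters, and all shapes are illustrative:

```python
import numpy as np

def limb_attention(H, W1, W2, b1, b2):
    """H: (L, d) per-limb features at one frame -> weighted concat, (L*d,)."""
    h = H.reshape(-1)                        # concat(H_t,1, ..., H_t,L)
    a = W1 @ np.tanh(W2 @ h + b2) + b1       # one attention score per limb
    alpha = np.exp(a - a.max())
    alpha = alpha / alpha.sum()              # softmax over the L limbs, eq. (4)
    return (H * alpha[:, None]).reshape(-1)  # eqs. (5)-(6)

# Toy usage with L = 8 limbs, d = 16 features per limb, hidden size 32:
rng = np.random.default_rng(0)
L, d, d2 = 8, 16, 32
out = limb_attention(rng.normal(size=(L, d)),
                     rng.normal(size=(L, d2)), rng.normal(size=(d2, L * d)),
                     np.zeros(L), np.zeros(d2))
```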

In item (3), the trained model is used to classify the actions in videos to be classified. The specific procedure is as follows:

(31) Acquire and process the skeleton sequence from the input video source and classify it. Skeleton sequence information is acquired in real time by interleaved-frame OpenPose detection combined with sparse optical flow tracking: OpenPose pose estimation is run once every 5 frames, yielding the coordinates of the 25 body keypoints (indices 0-24: Nose, Neck, RShoulder, RElbow, RWrist, LShoulder, LElbow, LWrist, MidHip, RHip, RKnee, RAnkle, LHip, LKnee, LAnkle, REye, LEye, REar, LEar, LBigToe, LSmallToe, LHeel, RBigToe, RSmallToe, RHeel, Background). The Lucas-Kanade method is then used to track the two-dimensional coordinates of the 25 keypoints in each of the following 4 frames:

$$S_{t+1} = \mathrm{calcOpticalFlowPyrLK}(I_t, I_{t+1}, S_t), \qquad (7)$$

where $I_t$ and $S_t$ denote the grayscale image and the skeleton information of frame $t$, respectively, and calcOpticalFlowPyrLK is the OpenCV open-source library's implementation of the Lucas-Kanade method.
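A minimal OpenCV sketch of eq. (7); cv2.calcOpticalFlowPyrLK is the library function referred to in the text, while the window size, pyramid depth, and failure handling here are assumptions:

```python
import cv2
import numpy as np

def track_keypoints(prev_gray, next_gray, prev_pts):
    """prev_pts: (25, 1, 2) float32 keypoint coordinates from frame t."""
    next_pts, status, err = cv2.calcOpticalFlowPyrLK(
        prev_gray, next_gray, prev_pts, None,
        winSize=(21, 21), maxLevel=3)
    # Keep the old coordinate wherever tracking failed (status == 0).
    failed = status.ravel() == 0
    next_pts[failed] = prev_pts[failed]
    return next_pts
```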

(32) The classification of skeleton sequence frames differs slightly between inference and training. During inference, skeleton information is preprocessed and classified once for every 10 newly acquired frames. The preprocessing again comprises translation, normalization, and zero padding, except that the translation no longer uses the MidHip coordinate of the first frame; instead, the system maintains a MidHip coordinate as the origin while running, updated every 10 seconds. The normalization and zero-padding steps are the same as in step (14).

(33) During inference, classification is performed every 10 frames rather than once per whole video, so all LSTMs in the model must retain their state information after each classification while the system runs. That is, the LSTM stateful mode is enabled: in this mode, the hierarchical limb-attention LSTM network preserves the states of all LSTM layers after each inference and uses them as the initial states for the next inference.
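A hedged Keras sketch of such stateful streaming inference: with stateful=True the LSTM carries its hidden state across successive 10-frame predict() calls instead of resetting; the layer sizes and feature dimension are illustrative assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

CHUNK, FEATS, NUM_CLASSES = 10, 50, 9
model = tf.keras.Sequential([
    layers.InputLayer(batch_input_shape=(1, CHUNK, FEATS)),
    layers.LSTM(128, return_sequences=True, stateful=True),
    layers.Dropout(0.1),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

def classify_chunk(chunk):
    """chunk: (1, 10, FEATS); state carries over from the previous chunk."""
    return model.predict(chunk, verbose=0)

# model.reset_states() clears the carried state, e.g. on the 30 s schedule
# mentioned later in the implementation section.
```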

In item (4), the skeleton sequence is segmented and corrected in real time based on the classification results, as follows:

(41) Skeleton sequence segmentation based on classification results:

While the system runs, the routine performed by the patient generally contains multiple different movements. So that the subsequent scoring can accurately compare against the expert-group skeleton sequence of the corresponding category, real-time sequence segmentation is performed after frame classification. Specifically, at the first frame of the sequence, or once the final frame of the previous segment has been determined, the category of the current segment is decided from the current frame's classification result. A segment is considered ended either when the frame category has been continuously inconsistent with the current segment's category for the maximum error-tolerance length (100 frames), or when the input sequence ends. A sketch of this rule follows.
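A simplified Python sketch of this segmentation rule; the handling of the mismatched run when a segment closes is an assumption, since the text does not specify how those frames are reassigned:

```python
MAX_TOLERANCE = 100  # frames of class mismatch tolerated inside a segment

def segment_stream(frame_labels):
    """Greedy online split of per-frame labels into (class, start, end) runs."""
    segments, seg_class, seg_start, mismatch = [], None, 0, 0
    for t, label in enumerate(frame_labels):
        if seg_class is None:
            seg_class, seg_start, mismatch = label, t, 0
        elif label == seg_class:
            mismatch = 0
        else:
            mismatch += 1
            if mismatch >= MAX_TOLERANCE:
                end = t - mismatch            # last frame of the old segment
                segments.append((seg_class, seg_start, end))
                # Assumption: restart the new segment at the first mismatched
                # frame, labeled with the current frame's class.
                seg_class, seg_start, mismatch = label, end + 1, 0
    if seg_class is not None:                 # flush at end of input
        segments.append((seg_class, seg_start, len(frame_labels) - 1))
    return segments
```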

(42) Real-time error correction based on classification results proceeds as follows:

While the system runs, different parameters are computed from the skeleton information and classification result of each frame, and correction text is generated. The relationship between action categories and the computed parameters is given in Table 1; an illustrative computation of one such parameter follows the table.

Table 1. Reference parameter names for generating correction text for each action category

Action category | Parameter names
0 | Relative upward reach of both hands; angle between upper arm and forearm
1 | Relative side-to-side spread of both hands; left-right head rotation
2 | Relative vertical height of both elbows; relative lateral width of the two axes
3 | Relative vertical height of both elbows; relative lateral width of the two axes
4 | Relative downward reach of both hands
5 | Ratio of the hands' relative downward reach to the average calf length
6 | Vertical distance between hip and neck
7 | Ratio of the difference in the feet's vertical positions to the hip-to-neck vertical distance
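As an illustration of how the Table 1 parameters can be computed from the skeleton, the sketch below derives the angle between the upper arm and forearm (category 0) from BODY_25 keypoints; the index choices and any threshold for triggering a reminder are assumptions:

```python
import numpy as np

def elbow_angle(kps, shoulder=2, elbow=3, wrist=4):
    """kps: (25, 2) keypoints; returns the upper-arm/forearm angle in degrees."""
    u = kps[shoulder] - kps[elbow]          # upper-arm vector
    v = kps[wrist] - kps[elbow]             # forearm vector
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Hypothetical rule: prompt the patient if the arm is not extended enough.
# if elbow_angle(kps) < 150.0: speak("Straighten your arm further.")
```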

In item (5), the completed sequence segments are comparatively scored.

While the system runs, whenever a new sequence segment is determined, sequence alignment scoring is performed according to the segment's category. The system maintains one expert-group skeleton sequence $Q_n$ for each action category. For a sequence segment $C_m$, the similarity of the two sequences is computed with the dynamic time warping (DTW) algorithm [2], which uses dynamic programming to find the best alignment path between the two sequences and accumulates the per-frame Euclidean distances along that path:

$$D(i,j) = \sqrt{\sum_{k=1}^{J} \left[ (x^{Q}_{i,k} - x^{C}_{j,k})^2 + (y^{Q}_{i,k} - y^{C}_{j,k})^2 \right]}, \qquad (8)$$

$$\mathrm{Cost}(i,j) = D(i,j) + \min\left[\mathrm{Cost}(i-1,j),\ \mathrm{Cost}(i,j-1),\ \mathrm{Cost}(i-1,j-1)\right], \qquad (9)$$

where $x, y$ denote two-dimensional coordinates within the sequence segments, $D(i,j)$ is the Euclidean distance between frame $i$ of $Q_n$ and frame $j$ of $C_m$, $\mathrm{Cost}(i,j)$ is the cumulative Euclidean distance at position $(i,j)$ along the best alignment path, and $\mathrm{Cost}(n,m)$ is the similarity of the two sequences. From this similarity, an alignment score (on a 100-point scale) between the current segment and the expert-group skeleton sequence is computed. To make the score distribution more even, scores are divided into 4 bands by similarity: 90-100 (similarity 0-2), 75-90 (similarity 2-4), 60-75 (similarity 4-6), and 40-60 (similarity 6-∞). Within each band, the score is computed as:

$$\mathrm{score} = \mathrm{low} + \left(\mathrm{len} - 10 \cdot 2^{\,\mathrm{cost} - \mathrm{highCost}}\right), \qquad (10)$$

where low and highCost denote the lowest score and the highest similarity of the current band, len denotes the width of the band's score range, and cost denotes the similarity of the sequences.
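A Python sketch of the DTW alignment of eqs. (8)-(9) and the banded score mapping of eq. (10); the band table mirrors the text, while the exact frame-distance form and band boundary conventions are assumptions:

```python
import numpy as np

def dtw_cost(Q, C):
    """Q: (n, 25, 2) expert sequence; C: (m, 25, 2) patient sequence."""
    n, m = len(Q), len(C)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(Q[i - 1] - C[j - 1])   # frame distance, eq. (8)
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]                                    # Cost(n, m), eq. (9)

# Bands from the text: (lowest cost, highest cost, lowest score, band width).
BANDS = [(0.0, 2.0, 90.0, 10.0), (2.0, 4.0, 75.0, 15.0),
         (4.0, 6.0, 60.0, 15.0), (6.0, np.inf, 40.0, 20.0)]

def to_score(cost):
    for low_cost, high_cost, low, length in BANDS:
        if low_cost <= cost < high_cost:
            # eq. (10); note the last band degenerates to 60 under the
            # literal formula, since 2 ** (cost - inf) is 0.
            return low + (length - 10.0 * 2.0 ** (cost - high_cost))
    return 40.0
```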

In summary, by innovatively combining artificial intelligence with TCM rehabilitation medicine, the invention realizes an intelligent rehabilitation auxiliary training system for patients with spinal degenerative diseases. It comprises: (1) a deep learning-based real-time classification model for TCM Daoyin videos: two-dimensional human skeleton data obtained through OpenPose serve as training data for supervised deep learning, yielding a generalized deep learning model; the video is then preprocessed in real time by interleaved-frame OpenPose detection combined with sparse optical flow tracking, and the result is fed into the pre-trained model to obtain real-time classification results; (2) skeleton-representation-based video sequence segmentation and evaluation: based on the real-time classification results, skeleton sequences of the same category are segmented and error-corrected, and the segmented subsequences are scored. In the scoring process, the dynamic time warping algorithm aligns each subsequence with the expert-group video sequence, computes the video similarity, and converts it into a 100-point score. In the correction process, reminder messages are predefined for each movement from features of the two-dimensional skeleton data, and when the predefined feature conditions are met the patient is reminded by voice broadcast. Experiments show that the proposed real-time classification model achieves a good balance between accuracy and speed, and the resulting intelligent rehabilitation auxiliary training system for spinal degenerative diseases has high application value.

The invention applies artificial intelligence to the field of TCM Daoyin rehabilitation training, promoting the recovery of limb motor function by automatically guiding the patient's limb movements on the basis of regulating mind and breath. Through computer vision and deep learning, the system identifies and evaluates the accuracy of the patient's posture in real time during Daoyin training, gives timely corrections in the form of voice reminders, and, when training ends, scores the patient's training by comparing the training process with the expert-group movements. Designed for patients in the remission period of spinal degenerative diseases, the system requires no guidance or intervention from medical staff, enables patients to practice TCM Daoyin on their own at any time, is suitable for home use and primary medical institutions, greatly relieves pressure on medical staff, and improves the flexibility and accuracy of patient rehabilitation training.

Description of the Drawings

Figure 1 is the overall flow diagram of the invention.

Figure 2: left, the joint points of BODY_25 in OpenPose; right, the division of the 25 keypoints into 8 limb parts used by the classification method of the invention.

Figure 3 is the architecture of the hierarchical limb-attention LSTM network of the invention, divided from left to right into four modules: the view-transformation module, the body-level module, the limb-attention module, and the limb-level module.

Detailed Description

In the invention, the structure of the real-time classification model for TCM Daoyin movements (the hierarchical limb-attention LSTM network) is shown in Figure 3. After preprocessing, the input skeleton contains only the two-dimensional coordinates of 25 keypoints, as shown in Figure 2; each skeleton sequence can be written as $S = \{S_{t,j} \in \mathbb{R}^2 \mid 1 \le t \le T,\ 1 \le j \le J\}$, where $T = 1000$ and $J = 25$ denote the temporal length of the sequence and the number of skeleton keypoints, respectively.

After preprocessing comprising translation, normalization, and zero padding, the raw skeleton sequence data are input into the hierarchical limb-attention LSTM for classification, which finally outputs the classification score of each frame of the sequence.

The implementation of the invention comprises two parts: training the classification model (the hierarchical limb-attention LSTM network) and implementing the intelligent rehabilitation auxiliary training system. The specific implementation is as follows:

(1) Dataset preparation

Owing to the invention's specific application, experiments use a self-collected spinal rehabilitation exercise dataset (RDSD: rehabilitation movements for spinal degenerative diseases). RDSD consists of 1012 videos in 9 action categories (8 exercise movements plus standing), with an average video length of 18.4 seconds at 30 frames per second. All videos are mirror-flipped horizontally as data augmentation and finally divided into training and test sets in a 75:25 ratio.

(2) Data preprocessing

For the proposed RDSD dataset, skeleton sequences are extracted by two-dimensional pose estimation with the open-source multi-person pose estimation library OpenPose [1]; each frame of a skeleton sequence consists of the two-dimensional coordinates of 25 keypoints.

Taking each skeleton sequence as the unit, the invention preprocesses the dataset in three steps: translation, normalization, and zero padding. First, a translation: subtract the MidHip coordinate of the first valid frame from all coordinates of all frames of the sequence. Then, a normalization: compute the average shoulder-to-shoulder distance $d_s$ over all frames and scale the coordinates of all frames by a factor of $1/d_s$. Finally, a zero-padding: sequences shorter than 1000 frames are padded with zeros at the end, and sequences longer than 1000 frames are truncated, so that every skeleton sequence has a temporal length of 1000 frames.

(3) Model training

The main hyperparameters of the model are: training conditions, batch size, learning rate, dropout rate, LSTM orthogonal-initialization multiplication factor, and maximum number of iterations.

In the invention, the model hyperparameters are set as follows. Training conditions: a single GTX 1070 GPU. Batch size: 64. Learning rate: initially 0.005, with ReduceLROnPlateau as the adjustment strategy, i.e., if the model's validation loss does not decrease for 10 iterations, the learning rate is reduced by a factor of 10. Dropout rate: 0.1. LSTM orthogonal-initialization multiplication factor: 0.001. Maximum number of iterations: 300; when the validation loss stops decreasing for 30 consecutive iterations, training is terminated early, the total number of training iterations generally exceeding 100.
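A hedged Keras sketch of this training schedule; ReduceLROnPlateau and EarlyStopping are standard tf.keras callbacks matching the described strategy, while the optimizer and loss shown are assumptions:

```python
import tensorflow as tf

# Reduce the learning rate by 10x after 10 stagnant epochs; stop after 30.
callbacks = [
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss",
                                         factor=0.1, patience=10),
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=30),
]
# Assuming `model` and the RDSD splits exist:
# model.compile(optimizer=tf.keras.optimizers.Adam(5e-3),
#               loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(x_train, y_train, batch_size=64, epochs=300,
#           validation_data=(x_val, y_val), callbacks=callbacks)
```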

(4) Experimental results

To study the effectiveness of each module in the classification network (the hierarchical limb-attention LSTM network), the invention compares variants with and without each module against a baseline network. The baseline, RNNs, consists of three LSTM+Dropout layers; HRNNs consists of one limb-level LSTM+Dropout layer and two body-level LSTM+Dropout layers; VT-RNNs adds the view-transformation module of Figure 3 to RNNs; VT-HRNNs adds the view-transformation module of Figure 3 to HRNNs; and VT-HRNNs-ATT adds the limb-attention module of Figure 3 to VT-HRNNs, i.e., the hierarchical limb-attention LSTM network of the invention. As Table 2 shows, adopting the hierarchical structure (adding the limb level) improves test accuracy by 1.22% over using body-level LSTMs throughout, adding the view-transformation module improves test accuracy by 1.04%, and adding the limb-attention module improves test accuracy by 2.41%, demonstrating the effectiveness of each module of the hierarchical limb-attention LSTM network.

Table 2. Ablation of each module of the hierarchical limb-attention LSTM network of the invention on the RDSD dataset

Network structure | Training accuracy | Test accuracy
RNNs (baseline) | 95.30% | 92.01%
HRNNs | 96.30% | 93.23%
VT-RNNs | 95.43% | 93.05%
VT-HRNNs | 96.06% | 92.93%
VT-HRNNs-ATT | 96.64% | 95.34%

(5) Implementation of the intelligent rehabilitation auxiliary training system

The intelligent rehabilitation auxiliary training system of the invention comprises four functions, as shown in Figure 1: real-time skeleton sequence acquisition based on interleaved-frame OpenPose [1] detection combined with sparse optical flow tracking, skeleton sequence classification, skeleton sequence segmentation with real-time error correction, and scoring of skeleton sequence segments.

The system receives image frames from an ordinary two-dimensional RGB camera and acquires skeleton sequences in real time. Specifically, OpenPose [1] two-dimensional pose estimation is run once every 5 frames, and from the resulting two-dimensional coordinates of the 25 body keypoints, the keypoint coordinates of the next 4 frames are obtained by Lucas-Kanade sparse optical flow tracking.

Each time the latest 10 frames of the human skeleton sequence are acquired, the classification network of the invention (the hierarchical limb-attention LSTM network) is run once to obtain the action category labels of those 10 frames. Specifically, the 10-frame skeleton sequence is preprocessed by translation, normalization, and zero padding, then fed into the pre-trained hierarchical limb-attention LSTM network to obtain real-time classification results. Note that the network maintains the memory state of all LSTM units from system start-up and clears this memory state every 30 s.

For each frame of the skeleton sequence, corresponding textual correction information is generated from the two-dimensional keypoint coordinates and the frame's action category, prompting the patient with the key points of the current movement. Specifically, the invention defines different reference parameters for each of the 8 exercise movements, as shown in Table 1; these are used to dynamically compute and generate the text, and the user is reminded by voice broadcast.

The system dynamically manages all classified frames not yet assigned to an action segment, i.e., action segment division. Specifically, the category of an action segment is determined from the action categories of the most recent unassigned frames, and inconsistent action categories are tolerated for at most 100 frames.

For each completed action segment, the system selects the pre-loaded expert-group video skeleton sequence of the corresponding action category, computes the similarity distance between the current segment and the expert-group segment with the dynamic time warping algorithm, and then computes the segment's 100-point score according to the predefined distance-to-score conversion rule.

The system is implemented in Python 3, mainly using open-source libraries such as TensorFlow and OpenCV together with multiprocessing, and runs in real time on a portable mini-PC in a browser/server (B/S) architecture, offering high application value for intelligent assistance in the rehabilitation training of patients with spinal degenerative diseases.

References

[1] Cao Z, Hidalgo G, Simon T, et al. OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.

[2] Berndt D J, Clifford J. Using Dynamic Time Warping to Find Patterns in Time Series [C]. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 1994: 359-370.

Claims (5)

1. A deep learning-based intelligent rehabilitation auxiliary training system for spinal degenerative diseases, characterized by comprising a deep learning-based real-time classification module for traditional Chinese medicine Daoyin videos and a video sequence division and evaluation module based on human skeleton representation;
the deep learning-based real-time classification module for traditional Chinese medicine Daoyin videos obtains two-dimensional human skeleton data through OpenPose as training data of a deep learning model, performs supervised deep learning training, and obtains a generalized deep learning model; the input video is then preprocessed in real time by OpenPose interleaved-frame detection combined with sparse optical flow tracking, and the result is fed as input into the pre-trained classification model to obtain real-time frame classification results;
and the human skeleton representation-based video sequence division and evaluation module performs segmentation and real-time error correction on skeleton sequences of the same category according to the frame classification results of the real-time classification module, and performs sequence comparison scoring between the segmented sequence segments and the video skeleton sequence segments of the corresponding-category expert group.
2. The deep learning-based intelligent rehabilitation auxiliary training system for spinal degenerative diseases according to claim 1, characterized in that:
the work of the deep learning-based real-time classification module for traditional Chinese medicine Daoyin videos is as follows:
(1) acquiring training data for deep learning;
(2) training a deep learning model;
(3) preprocessing and classifying videos in real time by combining OpenPose interleaved-frame detection with sparse optical flow tracking;
and the work of the human skeleton representation-based video sequence division and evaluation module is as follows:
(4) segmenting the skeleton sequence and performing real-time error correction based on the classification results;
(5) comparatively scoring the sequence segments after segmentation is finished;
in item (1), the specific procedure for acquiring deep learning training data is as follows:
(11) processing the video data: clipping all video data according to the design of the rehabilitation training movements to obtain short videos no longer than 1000 frames;
(12) mirror-flipping all short videos horizontally as data augmentation;
(13) performing two-dimensional pose extraction on the video samples processed in step (11) with the OpenPose model pre-trained on BODY_25, to obtain skeleton data sequences represented by the two-dimensional coordinates of 25 keypoints, the 25 keypoints being Nose, Neck, RShoulder, RElbow, RWrist, LShoulder, LElbow, LWrist, MidHip, RHip, RKnee, RAnkle, LHip, LKnee, LAnkle, REye, LEye, REar, LEar, LBigToe, LSmallToe, LHeel, RBigToe, RSmallToe, RHeel, Background;
(14) performing data preprocessing on all skeleton data sequences: first a translation operation, subtracting the MidHip coordinate of the first valid frame from all frame coordinates of the sequence; then a normalization operation, computing the average shoulder distance $d_s$ of the skeleton sequence and scaling all frame coordinates of the sequence by a factor of $1/d_s$; finally a zero-padding operation, filling zeros after the skeleton sequence so that its length is fixed at 1000 frames;
in item (2), training the deep learning model specifically comprises:
(21) the deep learning model adopts a classification network model, specifically a hierarchical limb-attention LSTM network, the network comprising a view-transformation module, a limb-attention module, a limb-level classification module and a body-level classification module; the hierarchical limb-attention LSTM takes skeleton data as input; first, a two-dimensional translation transformation is applied to the input sequence coordinates by the view-transformation module; then the coordinates of each frame of the skeleton sequence are divided into 8 limb parts, which respectively pass through the LSTM layer and Dropout layer of the limb-level classification module; the feature vectors of the 8 limb parts are then concatenated and passed through the limb-attention module to compute the spatio-temporal attention weight of each limb; finally, the feature vectors of the 8 limbs are weighted according to the spatio-temporal attention weights and concatenated, and the classification score of the skeleton sequence at each frame is finally obtained through the LSTM layer, Dropout layer and softmax layer of the body-level classification module;
(22) setting the model hyperparameters, the main hyperparameters of the model being: training conditions, batch size, learning rate, dropout rate, LSTM orthogonal-initialization multiplication factor and maximum number of iterations;
(23) starting training, taking the validation loss of the model during training as the criterion; when the validation loss of the model no longer decreases for 30 consecutive iterations, the network is considered to have converged and training ends;
(24) adjusting the hyperparameters multiple times to obtain the model with the best generalization performance;
in item (3), action classification is performed on videos to be classified with the trained model, the specific procedure being as follows:
(31) acquiring and processing the skeleton sequence and its classification from the input video signal source: skeleton sequence information is acquired in real time by OpenPose interleaved-frame detection combined with sparse optical flow tracking, OpenPose pose estimation being performed once every 5 frames to obtain the coordinates of the 25 body keypoints, and the two-dimensional coordinates of the 25 keypoints in each of the subsequent 4 frames being tracked by the Lucas-Kanade method:
$$S_{t+1} = \mathrm{calcOpticalFlowPyrLK}(I_t, I_{t+1}, S_t),$$
where $I_t$ and $S_t$ respectively denote the grayscale image and the skeleton information of frame $t$, and calcOpticalFlowPyrLK is an implementation of the Lucas-Kanade method in the OpenCV open-source library;
(32) the classification of skeleton sequence frames differing slightly between inference and training: during inference, skeleton information is preprocessed and classified once for every 10 frames of skeleton information acquired, the preprocessing comprising translation, normalization and zero padding;
(33) during inference, all LSTMs in the model keep their own state information after each classification, i.e., the LSTM stateful mode is enabled, in which the hierarchical limb-attention LSTM network retains the states of all LSTM layers after each inference and uses them as the initial states for the next inference;
in item (4), the skeleton sequence is segmented and corrected in real time based on the classification results, the process comprising:
(41) skeleton sequence segmentation based on the classification results:
after the sequence frames are classified, real-time sequence segmentation is performed, specifically: at the first frame of the sequence, or once the final frame of the previous sequence segment has been determined, the category of the current sequence segment is decided according to the classification result of the current frame, and a sequence segment is determined to have ended either when the frame category has been continuously inconsistent with the current segment category for the maximum error-tolerance length, or when the input sequence ends;
(42) real-time error correction based on the classification results, comprising the following process:
calculating different parameters according to the skeleton information and classification result of each frame, and generating error-correction text information;
the comparative scoring of the completed sequence segments in item (5) comprises:
after a new sequence segment is determined, performing sequence comparison scoring according to the category of the sequence segment; the system maintains expert-group skeleton sequence information $Q_n$ for each action category; for a sequence segment $C_m$, the similarity of the two sequences is computed by the dynamic time warping (DTW) algorithm, which finds the optimal alignment path of the two sequences by dynamic programming and computes the sum of the Euclidean distances of the sequence frames on the path:
$$D(i,j) = \sqrt{\sum_{k=1}^{J} \left[ (x^{Q}_{i,k} - x^{C}_{j,k})^2 + (y^{Q}_{i,k} - y^{C}_{j,k})^2 \right]},$$
$$\mathrm{Cost}(i,j) = D(i,j) + \min\left[\mathrm{Cost}(i-1,j),\ \mathrm{Cost}(i,j-1),\ \mathrm{Cost}(i-1,j-1)\right],$$
where $x, y$ denote two-dimensional coordinates in the sequence segments, $D(i,j)$ denotes the Euclidean distance between frame $i$ of sequence $Q_n$ and frame $j$ of sequence $C_m$, $\mathrm{Cost}(i,j)$ is the cumulative Euclidean distance at position $(i,j)$ of the best alignment path of the two sequences, and $\mathrm{Cost}(n,m)$ is the similarity of the two sequences; according to the similarity of the sequences, the comparison score of the current sequence segment against the expert-group skeleton sequence is computed on a 100-point scale; to make the score distribution more even, the scores are divided into 4 bands according to the similarity of the sequences: 90-100, corresponding to similarity 0-2; 75-90, corresponding to similarity 2-4; 60-75, corresponding to similarity 4-6; and 40-60, corresponding to similarity 6-∞; for each band, the score is computed as:
$$\mathrm{score} = \mathrm{low} + \left(\mathrm{len} - 10 \cdot 2^{\,\mathrm{cost} - \mathrm{highCost}}\right),$$
where low and highCost denote the lowest score and the highest similarity of the current band, len denotes the width of the band's score range, and cost denotes the similarity of the sequences.
3. The deep learning-based intelligent rehabilitation auxiliary training system for spinal degenerative diseases according to claim 2, characterized in that the view-transformation module in the hierarchical limb-attention LSTM network of step (21) operates as follows:
to reduce the influence of changes in shooting angle on the classification performance of the model, the view-transformation module applies an adaptive two-dimensional translation and rotation to the input skeleton sequence, adjusting the coordinates of each frame of the input skeleton sequence, the specific computation being defined as:
$$S'_{t,j} = [x'_{t,j}, y'_{t,j}]' = R_t (S_{t,j} - d_t),$$
where $S_{t,j} = [x_{t,j}, y_{t,j}]'$ denotes the two-dimensional coordinates of the $j$-th keypoint of frame $t$ of the input skeleton sequence, $d_t = [d_t^x, d_t^y]'$ denotes the translation vector corresponding to frame $t$, and $R_t$ denotes the two-dimensional rotation matrix corresponding to frame $t$:
$$R_t = \begin{bmatrix} \cos\alpha_t & -\sin\alpha_t \\ \sin\alpha_t & \cos\alpha_t \end{bmatrix},$$
where $d_t^x$ and $d_t^y$ respectively denote the translation of all coordinates of frame $t$ of the skeleton sequence along the horizontal and vertical axes, and $\alpha_t$ denotes the counterclockwise rotation angle, in radians, of all coordinates of frame $t$ of the skeleton sequence;
the skeleton information passing through the view-transformation module is divided, according to the distribution of the human body, into 8 partially overlapping limbs, namely the head, left arm, right arm, torso, left leg, right leg, left foot and right foot, and the information of the different limbs is computed by separate LSTM and Dropout layers and then concatenated again into whole-body skeleton information.
4. The deep learning-based intelligent rehabilitation auxiliary training system for spinal degenerative diseases according to claim 3, characterized in that the limb-attention module in the hierarchical limb-attention LSTM network of step (21) operates as follows:
$$H_t = \mathrm{LSTM}(\mathrm{concat}(H_{t,1}, \dots, H_{t,L})),$$
$$a_t = W_1 \tanh(W_2 H_t + b_2) + b_1,$$
$$\alpha_{t,l} = \frac{\exp(a_{t,l})}{\sum_{k=1}^{L} \exp(a_{t,k})},$$
where $H_{t,i}$ denotes the $i$-th limb information of the skeleton sequence at frame $t$, $1 \le i \le L$ ($L = 8$), $H_t$ denotes the feature information obtained after the 8 limb features are concatenated and passed through an LSTM layer and a Dropout layer, $W_1, W_2$ are learnable parameter matrices and $b_1, b_2$ are bias vectors; the weight $\alpha_{t,l}$ of each limb is then computed through a softmax activation, finally obtaining the weighted feature vector of each limb, and all weighted limb feature vectors are concatenated as the input of the subsequent module:
$$H'_{t,l} = \alpha_{t,l} \cdot H_{t,l},$$
$$H'_t = \mathrm{concat}(H'_{t,1}, \dots, H'_{t,L}),$$
the concatenated weighted limb feature vectors passing through two LSTM+Dropout layers and one fully connected layer, with the classification scores obtained through a softmax activation.
5. The deep learning-based intelligent rehabilitation auxiliary training system for spinal degenerative diseases according to claim 3, characterized in that in process (42) different parameters are calculated according to the skeleton information and classification result of each frame, the relationship between the action categories and the calculated parameters being as follows:
the action categories are 0, 1, 2, 3, 4, 5, 6, 7, with the corresponding parameter names in order: relative upward reach of both hands and the angle between the upper arm and forearm; relative side-to-side spread of both hands and left-right head rotation; relative vertical height of both elbows and relative lateral width of the two axes; relative vertical height of both elbows and relative lateral width of the two axes; relative downward reach of both hands; ratio of the hands' relative downward reach to the average calf length; vertical distance between hip and neck; and ratio of the difference in the feet's vertical positions to the hip-to-neck vertical distance.
CN202111295019.XA 2021-11-03 2021-11-03 Intelligent rehabilitation auxiliary training system for spinal degenerative disease based on deep learning Pending CN114092854A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111295019.XA CN114092854A (en) 2021-11-03 2021-11-03 Intelligent rehabilitation auxiliary training system for spinal degenerative disease based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111295019.XA CN114092854A (en) 2021-11-03 2021-11-03 Intelligent rehabilitation auxiliary training system for spinal degenerative disease based on deep learning

Publications (1)

Publication Number Publication Date
CN114092854A (en) 2022-02-25

Family

ID=80298777

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111295019.XA Pending CN114092854A (en) 2021-11-03 2021-11-03 Intelligent rehabilitation auxiliary training system for spinal degenerative disease based on deep learning

Country Status (1)

Country Link
CN (1) CN114092854A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117152829A (en) * 2023-04-27 2023-12-01 杭州电子科技大学 An industrial boxing action recognition method based on multi-view adaptive skeleton network
CN117958812A (en) * 2024-03-28 2024-05-03 广州舒瑞医疗科技有限公司 Human body posture feedback evaluation method for dynamic vestibular rehabilitation training
CN119007075A (en) * 2024-08-06 2024-11-22 北京肌久健康科技有限公司 Method and device for evaluating user actions

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102903122A (en) * 2012-09-13 2013-01-30 西北工业大学 Video object tracking method based on feature optical flow and online ensemble learning
CN112560618A (en) * 2020-12-06 2021-03-26 复旦大学 Behavior classification method based on skeleton and video feature fusion

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102903122A (en) * 2012-09-13 2013-01-30 西北工业大学 Video object tracking method based on feature optical flow and online ensemble learning
CN112560618A (en) * 2020-12-06 2021-03-26 复旦大学 Behavior classification method based on skeleton and video feature fusion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Kong Dezhuang; Zhu Mengyu; Yu Jiangkun: "Research on the application and methods of facial expression recognition in assistive medicine", Life Science Instruments, no. 02, 25 April 2019 (2019-04-25) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117152829A (en) * 2023-04-27 2023-12-01 杭州电子科技大学 An industrial boxing action recognition method based on multi-view adaptive skeleton network
CN117958812A (en) * 2024-03-28 2024-05-03 广州舒瑞医疗科技有限公司 Human body posture feedback evaluation method for dynamic vestibular rehabilitation training
CN117958812B (en) * 2024-03-28 2024-06-14 广州舒瑞医疗科技有限公司 Human body posture feedback evaluation method for dynamic vestibular rehabilitation training
CN119007075A (en) * 2024-08-06 2024-11-22 北京肌久健康科技有限公司 Method and device for evaluating user actions

Similar Documents

Publication Publication Date Title
Liao et al. A deep learning framework for assessing physical rehabilitation exercises
CN114092854A (en) Intelligent rehabilitation auxiliary training system for spinal degenerative disease based on deep learning
CN111144217B (en) Motion evaluation method based on human body three-dimensional joint point detection
CN106682616B (en) Newborn Pain Expression Recognition Method Based on Dual-channel Feature Deep Learning
CN100485713C (en) Human motion date recognizing method based on integrated Hidden Markov model leaning method
CN112883896B (en) A method of micro-expression detection based on BERT network
CN112200074B (en) Gesture comparison method and terminal
Chen et al. Automated pain detection from facial expressions using facs: A review
Zhang Analyzing body changes of high-level dance movements through biological image visualization technology by convolutional neural network
Qammaz et al. Occlusion-tolerant and personalized 3D human pose estimation in RGB images
Yan et al. A review of basketball shooting analysis based on artificial intelligence
CN104732247B (en) A kind of human face characteristic positioning method
CN112364815A (en) High jump posture detection method for high jump athletes based on three-dimensional model
Jadhav et al. Aasna: kinematic yoga posture detection and correction system using CNN
Abedi et al. Rehabilitation exercise repetition segmentation and counting using skeletal body joints
Mourchid et al. MR-STGN: Multi-residual Spatio temporal graph network using attention fusion for patient action assessment
Biró et al. AI-controlled training method for performance hardening or injury recovery in sports
Wang et al. Human posture recognition method based on skeleton vector with depth sensor
Yingdong Research on different convolutional neural networks in the classification scenes of yoga poses based on openpose extraction
Fu Research on intelligent recognition technology of gymnastics posture based on KNN fusion DTW algorithm based on sensor technology
Huang et al. Optimizing features quality: a normalized covariance fusion framework for skeleton action recognition
Dhanyal et al. Yoga pose annotation and classification by using time-distributed convolutional neural network
CN116386137A (en) A mobile terminal design method for lightweight recognition of Tai Chi movements
Kulkarni et al. Yoga pose detection using long-term recurrent convolutional network
Sadeghi et al. A Novel Sep-Unet architecture of convolutional neural networks to improve dermoscopic image segmentation by training parameters reduction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination