CN110991268B

CN110991268B - Depth image-based Parkinson hand motion quantization analysis method and system

Info

Publication number: CN110991268B
Application number: CN201911110171.9A
Authority: CN
Inventors: 曹治国; 于泰东; 肖阳; 綦浩喆; 张博深
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2019-11-13
Filing date: 2019-11-13
Publication date: 2022-05-20
Anticipated expiration: 2039-11-13
Also published as: CN110991268A

Abstract

The invention discloses a depth image-based Parkinson hand motion quantization analysis method and system, and belongs to the field of computer vision and machine learning. The method comprises the following steps: the method comprises the steps that a detected person makes hand motions according to the requirements of a Parkinson disease rating scale, multiple frames of depth images of the detected person are obtained in the period, and 3D coordinates of the mass center of the hand in each frame of depth image are identified; segmenting the hand point cloud and the noise point cloud of each frame of depth image according to the 3D coordinates of the mass center of the hand; predicting the 3D coordinates of each joint point of the hand in each frame of depth image based on the point cloud of the hand in the single frame of depth image; according to time sequence information between continuous frames and priori knowledge of hand postures, integrally optimizing 3D coordinates of all hand joint points of all frames; extracting hand motion characteristics according to the optimized joint point 3D coordinates; and classifying the extracted hand motion characteristics by using a trained XGboost classifier, and giving a corresponding grading result.

Description

Depth image-based Parkinson hand motion quantization analysis method and system

Technical Field

The invention belongs to the field of computer vision and machine learning, and particularly relates to a depth image-based Parkinson hand motion amount analysis method and system.

Background

Early Parkinson's disease symptoms are not obvious, and a series of judgments are needed in the traditional diagnosis mode. One of the key elements is to make a decision based on the completion status of a series of specified actions. One of the traditional diagnostic modalities is the individual assessment of patients by the parkinson rating scale UPDRS. The physician instructs the patient to perform the activity according to the parkinson rating scale and then scores the patient item by item, perhaps taking 30 minutes or more, depending on the patient's completion. The period may be hurt by self-suspicion and the physician's speech from the psychological factors of physical coordination, stress, etc. The diagnosis mode depends on language communication, the time cost is high, the scoring mainly depends on visual observation of doctors, such as the distance, the amplitude, the frequency and the like of actions, quantitative indexes are lacked, and deviation caused by subjectivity can occur.

The prior art analysis of parkinson hand motion is mainly divided into two categories: 1 is to analyze the three-dimensional position change of the center of the hand using the color and depth images as input. In patent CN105701806A, a detected person wears pure color gloves, the detected person manually selects the approximate position of the hand of the detected person in the picture, obtains color image and depth image information simultaneously through Kinect, identifies and positions the hand position through color image, and obtains spatial position information through depth image conversion. And in the identification process, the color gloves are used for marking, the color filtering technology and the region growing algorithm are combined to complete the identification, each frame is predicted according to the identification result of the previous frame, and the identification is completed after the reference point is determined. In the conversion process, the identification result needs to be converted from two-dimensional coordinates in the color image into three-dimensional coordinates of the hand position in the depth image, four vertexes of the upper left, the lower left, the upper right, the lower right and the center point are selected for recording, the four vertexes represent the plane of the palm when the hand moves in each frame, the center point represents the hand space motion track and the tremor condition, the data file is processed, and the period information is obtained through analysis and fitting. And 2, positioning the two-dimensional plane position of the hand joint point by using the color image as input, and further realizing the evaluation of the hand motion. A new technology for AI-assisted diagnosis of Parkinson's disease is introduced in an Tencent medical artificial intelligence laboratory, and the evaluation of a Parkinson rating scale UPDRS is automatically realized aiming at the motion video of a Parkinson's disease person based on a motion video analysis technology without a wearable sensor. Briefly, the user does not need to wear any sensor, only needs to shoot through a camera (common smart phone can meet the requirements), does simple actions of some Parkinson rating scales, such as stretching a palm and making a fist, rotating hands and other actions, can identify key nodes of body parts in a motion video through the system, quantitatively analyzes action indexes, and completes a diagnosis process.

However, the former technique only analyzes the three-dimensional position change of the central point of the hand and fails to evaluate other motion characteristics of the hand; the latter technique utilizes information of two-dimensional joint points, cannot reflect three-dimensional motion characteristics of the hand in a physical space, and has poor evaluation accuracy. Both techniques use color images and do not protect the privacy of the patient.

Disclosure of Invention

Aiming at the problems that the privacy of a patient cannot be protected and the motion evaluation accuracy is poor in the prior art, the invention provides a depth image-based method and a depth image-based system for analyzing the motion amount of a Parkinson hand, and aims to effectively protect the privacy of the patient based on the analysis of a depth image; the hand motion is quantitatively analyzed based on the three-dimensional motion of the joint points in the physical space, and the 3D coordinates of all the hand joint points of all the frames are integrally optimized according to the time sequence information between the continuous frames and the priori knowledge of the hand postures, so that the evaluation accuracy is improved.

To achieve the above object, according to a first aspect of the present invention, there is provided a depth image-based parkinson hand motion quantization method, comprising:

s1, a detected person makes a hand action according to the requirements of a Parkinson disease rating scale, a plurality of frames of depth images of the detected person are obtained in the period, and 3D coordinates of a hand mass center in each frame of depth image are identified;

s2, segmenting hand point clouds and noise point clouds of each frame of depth image according to the 3D coordinates of the mass center of the hand;

s3, predicting the 3D coordinates of all joint points of the hand in each frame of depth image based on the hand point cloud in the single frame of depth image;

s4, integrally optimizing the 3D coordinates of all hand joint points of all frames according to time sequence information between continuous frames and priori knowledge of hand postures;

s5, extracting hand motion characteristics according to the motion characteristics of hand motions of the detected person according to the comprehensive Parkinson disease rating scale and the optimized joint point 3D coordinates;

and S6, classifying the extracted hand motion characteristics by using the trained XGboost classifier, and giving a corresponding grading result.

Specifically, step S2 includes the steps of:

s21, converting all pixel points with depth values in each frame of depth image into 3D coordinates in space;

s22, defining a 3D target frame with a hand mass center as a center, removing point clouds outside the target frame as noise point clouds, and keeping the point clouds inside the target frame as hand point clouds.

Specifically, step S4 includes the steps of:

s41, calculating the length proportion of the hand bones of the tested person

S42, establishing an optimized objective function E according to time sequence information between continuous frames_T(X, β) adding temporal smoothing constraints between successive frames, X representing the 3D coordinates of 21 joint points in all frames;

s43, establishing an optimized objective function E according to the priori knowledge of the hand gesture_P(X, beta), adding constraints of hand posture prior and position preference;

s44, carrying out integral optimization on the 3D coordinates of the hand joint points of all frames in the video and the parameter beta, so that an optimization objective function E (X, beta) is E_T(X，β)+E_P(X, β) is minimal.

Specifically, the specific calculation method of β is:

wherein j ═ 4+ k (i-1), B_i，kRepresenting the actual length of the kth skeleton of the ith finger, i ∈ {1, 2, 3, 4, 5} respectively representing the thumb, index finger, middle finger, ring finger, little finger, k ∈ {1, 2, 3, 4} respectively representing the 4-membered skeleton from the wrist joint point to the fingertip joint point, B_totalRepresenting the sum of the lengths of all bones.

In particular, the objective function E is optimized_TThe formula for (X, β) is as follows:

where ρ is a robust Huber error function,

3D coordinates representing the ith joint point of the tth frame, T representing the number of video frames, J representing the number of defined hand joint points,

representing the degree of freedom, λ, calculated from the t-th frame hand joint point₁、λ₂Is a set hyper-parameter.

In particular, the objective function E is optimized_PThe formula for (X, β) is as follows:

wherein T represents the number of video frames, J represents the number of defined hand joint points,

representing degrees of freedom calculated from the t-th frame hand joint points, E_J(theta) is a negative log-likelihood function of the Gaussian mixture model, P represents the number of Gaussian models, g_pWeight, μ, representing the p-th Gaussian model_p，∑_pMeans and covariance, λ, representing the p-th Gaussian model_I、λ_βIn order to set the hyper-parameters,

and 3D coordinates of the ith joint point of the tth frame are shown, and the superscript indicates the prediction result of the corresponding variable.

Specifically, step S5 specifically includes: when the motion is designated as finger-tapping, the whole time series of ten finger-tapping times is intercepted, and the calculated distance vector D is equal to (D)₁，d₂，….，d_T)^TPerforming discrete Fourier transform to obtain fixed-length amplitude response vector F ═ F (F) at different frequencies₁，f₂，…，f_N)^TEach component of vector F represents the magnitude of the magnitude at that frequency of distance vector D, where the actual physical distance of the thumb tip and index finger tip in frame t

Respectively representing the 3D coordinates of the thumb and forefinger finger joint points predicted in the T frame, and T representing the video frame number.

Specifically, the XGBoost classifier in step S6 is trained by:

(1) carrying out data cleaning operation on the collected grading result of the real Parkinson disease sample by using low-rank decomposition;

(2) and training the XGboost classifier by using different hand motion characteristics in the obtained scale and corresponding expert scoring results after data cleaning.

To achieve the above object, according to a second aspect of the present invention, there is provided a depth image-based parkinson hand motion quantization system, comprising:

the depth image acquisition module is used for acquiring multi-frame depth images of the detected person and identifying 3D coordinates of a hand centroid in each frame of depth image during the period that the detected person makes hand motions according to the requirements of the Parkinson disease rating scale;

the hand point cloud segmentation module is used for segmenting hand point clouds and noise point clouds of each frame of depth image according to the 3D coordinates of the mass center of the hand;

the joint point prediction module is used for predicting the 3D coordinates of each joint point of the hand in each frame of depth image based on the hand point cloud in the single frame of depth image;

the integral optimization module is used for integrally optimizing the 3D coordinates of all hand joint points of all frames according to the time sequence information between the continuous frames and the priori knowledge of the hand postures;

the hand motion characteristic extraction module is used for extracting hand motion characteristics according to the motion characteristics of hand motions of the detected person according to the comprehensive Parkinson disease rating scale and the optimized joint point 3D coordinates;

and the XGboost classifier is used for classifying the extracted hand motion characteristics and giving a corresponding grading result.

To achieve the above object, according to a third aspect of the present invention, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the depth-image-based parkinson hand motion quantization analysis method according to the first aspect.

Generally, by the above technical solution conceived by the present invention, the following beneficial effects can be obtained:

(1) aiming at the problem that the privacy of a patient is exposed due to the adoption of color images in the prior art, the method obtains the multi-frame depth image of the detected person, and the depth image can not accurately distinguish the identity of the person, so that the privacy of the patient can be effectively protected; the distance between each point of the visible surface of the hand and the camera can be obtained through the depth image, and the space structure and the actual motion characteristic of the hand can be captured.

(2) The hand motion characteristics are extracted by utilizing the 3D coordinates of all joint points of the hand, the extracted motion characteristics can reflect the motion change of the hand in a real physical space, the hand motion characteristics have actual physical meanings, and the hand motion quantitative analysis accuracy can be effectively improved.

(3) According to the invention, the 3D coordinates of all hand joint points of all frames are optimized integrally according to the time sequence information between the continuous frames and the prior knowledge of the hand postures. In the optimization process, the proportional relation between the prior of the hand freedom degree and the skeleton length is considered, the hand posture of a single frame is accurately and reliably analyzed, meanwhile, the joint point shaking phenomenon caused by inter-frame errors can be well smoothed, and the quantitative analysis accuracy of hand motion is effectively improved.

Drawings

Fig. 1 is a flowchart of a parkinson hand motion quantization analysis method based on depth images according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a hand movement sample collection process according to an embodiment of the present invention;

FIG. 3 is a collected depth image of hand movements of a Parkinson's disease patient provided by an embodiment of the invention;

FIG. 4 is a prior art color image corresponding to FIG. 3;

fig. 5 is an imaging result of the segmented hand point cloud on the 2D depth image according to the embodiment of the present invention;

FIG. 6 is a schematic diagram of a hand point cloud in a 3D space according to an embodiment of the present invention;

FIG. 7 is a projection result of predicted 3D joint coordinates on a 2D image according to an embodiment of the present invention;

FIG. 8 is a schematic diagram illustrating a hand skeleton definition according to an embodiment of the present invention;

FIG. 9 is a schematic diagram of hand joint and degree of freedom distribution provided by an embodiment of the present invention;

FIG. 10(a) is a diagram illustrating the distance between the tips of the thumb and the index finger for a hand-clapping action according to an embodiment of the present inventionFrom d_tChange schematic diagram with frame number t;

FIG. 10(b) is a graph showing a fixed-length amplitude response for different frequencies of a hand clapping motion according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.

As shown in fig. 1, the present invention provides a depth image-based parkinson hand motion quantization analysis method, which includes the following steps:

s1, the detected person makes hand motions according to requirements of the Parkinson disease rating scale, multi-frame depth images of the detected person are obtained in the period, and 3D coordinates of a hand centroid in each frame of depth image are identified.

A depth camera, such as Intel SR300, is utilized to acquire three-dimensional imaging information of hand movement of a Parkinson patient, the acquisition process is shown in figure 2, the Parkinson patient is right opposite to the camera, and hand movement in an MDS-UPDRS comprehensive Parkinson disease rating scale is performed within the effective imaging range of the camera according to requirements. The acquired depth image is shown in fig. 3, and the palm of the patient is preferably over against the camera in the image acquisition process, so that the hand imaging quality is better. Compared with the color image shown in fig. 4, the depth image cannot accurately distinguish the identity of an individual, and the privacy of a patient can be effectively protected. And the distance from each point of the visible surface of the hand to the camera can be obtained by the depth image, so that the space structure and the actual motion characteristic of the hand can be captured.

When the depth image is acquired, the 3D coordinates of the centroid of the hand in each frame of depth image under the camera coordinate system are obtained by using a self-contained tracking detection algorithm of a corresponding software development kit (such as IntelRealSenseSDK2.0) or other image algorithms such as a threshold segmentation and position correction method adopted in DeepPrior + +.

And S2, segmenting the hand point cloud and the noise point cloud of each frame of depth image according to the 3D coordinates of the hand centroid.

And defining a physical space range according to the actual size of the hand by using the 3D coordinates of the hand center under a camera coordinate system, and deleting noise point clouds of the point clouds outside the range and not belonging to the hand.

S21, converting all pixel points (u, v, D) with depth values in each frame of depth image into 3D coordinates (x, y, z) in a space by using camera parameters, wherein u and v represent 2D coordinates of the pixel points in an image plane, and D represents the depth values corresponding to the pixel points; x, y, z represent the 3D coordinates of the point cloud in the camera coordinate system.

Under the premise of not considering the imaging distortion of the camera, the conversion formula is as follows:

wherein ppx and ppy are the horizontal and vertical coordinates of the optical center of the camera in the image, f_x、f_yRespectively, the focal length of the camera in the direction of the X, Y axis.

And S22, segmenting the hand point cloud from the converted 3D space based on the 3D coordinates of the mass center of the hand.

Because the hand can appear various forms in the motion process, the palm center can not be ensured to be always over against the camera. In order to segment clean hand point cloud in each frame of image in a 3D space, a 3D target frame taking the mass center of the hand as the center is defined on the basis of the acquired 3D coordinates of the mass center of the hand, the point cloud outside the target frame is removed as noise point cloud, and the interference caused by the noise point cloud in the subsequent steps is removed. In the embodiment, the size of the 3D target block is preferably a three-dimensional frame with the side length of 20-30 cm. The imaging result of the segmented clean hand point cloud on the 2D depth image is shown in fig. 5, fig. 6 is a schematic diagram of the hand point cloud in a 3D space, and the corresponding coordinates are actual coordinates of the point cloud in a camera coordinate system, and the unit is millimeters.

And S3, predicting the 3D coordinates of each joint point of the hand in each frame of depth image based on the hand point cloud in the single frame of depth image.

Step S3 includes the following steps:

and S31, training the 3D hand posture estimation neural network by using the hand data set with the hand joint point labels to obtain the trained 3D hand posture estimation neural network.

In order to enable the prediction result of the 3D hand posture estimation method to be more accurate and the generalization capability of the model to be strong enough, the method utilizes the related hand database to train the neural network, such as the HANDS2017 data set, so that the model can learn enough hand postures. The data set should contain different hand samples as much as possible, the coverage range is wide in the aspects of hand size, posture, shape, visual angle and the like, and the accuracy and robustness of model prediction can be improved by training the model by using the data set.

The 3D hand pose estimation neural network includes but is not limited to: 3DCNNs or V2V-PoseNet method based on 3D convolution, DeepPrior + + method based on 2D convolution and HandPointNet method based on point cloud.

And S32, predicting the 3D coordinates of the hand joint points by using the trained 3D hand posture estimation neural network for the hand point cloud in the single-frame depth image.

The projection result of the predicted 3D joint coordinates on the 2D image is shown in fig. 7.

And S4, integrally optimizing the 3D coordinates of all hand joint points of all frames according to the time sequence information between the continuous frames and the priori knowledge of the hand postures.

In the optimization process, the proportional relation between the prior of the hand degree of freedom and the skeleton length is considered, the hand posture of a single frame is accurately and reliably analyzed, and meanwhile, the good smoothing effect on the joint point shaking phenomenon caused by inter-frame errors can be achieved.

Step S4 includes the following steps:

s41, calculating the length proportion beta of the hand bones of the tested person.

Due to the fact thatThe gesture of the same hand is analyzed in the whole sequence, the lengths and the proportions of the hand bones of different people are different, the lengths of the hand bones of the same person are kept unchanged, and the single hand bone is used in the invention in the above error terms

The method for calculating the length ratio of the hand bones is characterized in that the length ratio of the hand bones is subjected to consistency constraint, and the specific calculation method of the beta comprises the following steps:

S42, establishing an optimized objective function E according to time sequence information between continuous frames_T(X, β), adding temporal smoothing constraints between successive frames.

Since the prediction result of the 3D hand pose estimation method is based on the joint coordinates of a single depth map, joint information of previous and subsequent frames is not considered. In order to make an overall and effective judgment on the single motion sequence of the Parkinson disease patient, the invention smoothly constrains the changes of the 3D positions of the joint points and the freedom degree of hands between the continuous frames, so that the joint points are more consistent with the characteristics of actual motion, the invention is favorable for eliminating the jitter phenomenon caused by the prediction error of the joint points of each frame, and the overall prediction error is further reduced.

Wherein X represents the 3D coordinates of 21 joint points in all frames,

representing a defined hand skeleton scale parameter, p is a robust Huber error function,

the 3D coordinates representing the ith joint point in the tth frame, T ∈ {1, 2, 3, …, T }, T being the number of video frames, i ∈ {1, 2, 3, …, J }, J ═ 21, representing the number of defined hand joint points.

Representing the degrees of freedom calculated from the hand joint points. Lambda [ alpha ]₁、λ₂To set hyper-parameters, E_TThe smaller (X, β) represents the position of the joint points in 3D space, and the angular transformation between joint points is smoother.

As shown in fig. 9, a total of 21 degrees of freedom excluding 6 degrees of freedom in relative space of the wrist joint is obtained as follows:

(a) index finger, middle finger, ring finger, little finger (4 degrees of freedom each, 16 degrees of freedom in total):

1 stDoF: the opening and closing between the distal phalanx and the middle phalanx;

2 ndDoF: the middle phalanx and the proximal phalanx are opened and closed;

3 rdDoF: the opening and closing between the proximal phalanx and the metacarpal bone;

4 thDoF: abduction and adduction between the proximal phalanx and the metacarpal bone;

(b) thumb (5 degrees of freedom total):

1 stDoF: the opening and closing between the distal phalanx and the proximal phalanx;

2 ndDoF: the opening and closing between the proximal phalanx and the metacarpal bone;

3 rdDoF: abduction-adduction between the proximal phalanx and the metacarpal bone;

4 thDoF: the opening and closing between the metacarpal bone and the trapezium bone;

5 thDoF: abduction and adduction between the metacarpal bone and the trapezium bone.

S43, establishing an optimized objective function E according to the priori knowledge of the hand gesture_P(X, β), adding constraints of hand pose priors and position preferences.

In the whole pose optimization, a priori knowledge about hand poses is added, and penalty is given to hand poses which are unlikely to occur from physiological structures and possibly occur in certain frames, wherein the hand poses comprise joint angles and bone length proportion which are unlikely to occur.

Where T represents the number of video frames,

representing degrees of freedom calculated from the t-th frame hand joint points, E_J(theta) is a negative log-likelihood function of the Gaussian mixture model, J represents the number of defined hand joint points, P represents the number of Gaussian models, g_pWeight, μ, representing the p-th Gaussian model_p，∑_pMean and covariance, λ, representing the p-th Gaussian model_I、λ_βIn order to set the hyper-parameters,

and 3D coordinates of the ith joint point of the tth frame are shown, and the superscript indicates the prediction result of the corresponding variable. E_PThe smaller (X, β) is, the closer the position of the obtained joint point is to the possible posture of the hand.

E_I(X，Beta) is the last to encourage final optimization results and per-frame prediction results

As close as possible, the latter term is intended to give the skeleton proportion of the hand in all frames

Keeping the same as much as possible.

And optimizing the 3D coordinates of the hand joint points and the parameter beta of all frames in the video by using an L-BFGS algorithm, and using the prediction result of a single frame and the average value of all frame parameters beta as initial values, wherein the total number of the parameters is 20+63F to be optimized, wherein F represents the number of frames in the video. The algorithm has the advantages of high convergence rate, low memory overhead and the like.

And S5, extracting hand motion characteristics according to the motion characteristics of hand motions of the detected person according to the comprehensive Parkinson disease rating scale and the optimized joint point 3D coordinates.

For example, in the assessment of the term two-handed postural tremor, the present invention may use the absolute distance traveled by the joint points to score. In the evaluation of the finger-clapping action, the tester was asked to open the thumb and forefinger as much as possible and clap ten times at the fastest speed. For the finger flapping of the hand, after the 3D coordinates of the optimized joint points are obtained, the opening and the combination of the finger tips can be characterized by calculating the actual physical distance between the finger tips of the thumb and the finger tips of the index finger:

wherein the content of the first and second substances,

to representPredicted 3D coordinates of the joints of the thumb and index finger tips in the t-th frame, and FIG. 10(a) shows the distance D between the thumb and index finger tips_tThe diagram varies with the number of frames t.

The 3D physical distance can reflect the motion change of the hand in the real physical space and has actual physical meaning. The 2D joint points are coordinates of the reflected hand joint points in the image, the distance refers to the pixel distance (not the real physical distance) between two points in the image plane, meanwhile, in the same hand gesture, the 3D distance calculated from different shooting angles is not changed, but the distance on the projected image is not the same.

Intercepting the whole time sequence of ten times of finger flapping, and obtaining a distance vector D (D)₁，d₂，…，d_T)^TPerforming discrete Fourier transform to obtain fixed-length amplitude response vector F ═ F (F) at different frequencies₁，f₂，…，f_N)^TEach component of the vector F represents the magnitude of the distance vector D at that frequency. When the tapping motion of the tester is more normal, the vector F will have a larger magnitude response in a certain component, as shown in FIG. 10 (b).

And S6, classifying the extracted hand motion characteristics by using an XGboost classifier, and giving a corresponding grading result.

S61, training the XGboost classifier by using a training sample set, wherein each training sample comprises hand motion characteristics of a hand motion sequence, and a label of the training sample is the grade of an expert on the sequence according to a comprehensive Parkinson disease rating scale.

(1) And carrying out data cleaning operation on the collected grading result of the real Parkinson disease sample.

And the collected real Parkinson disease sample scoring result is subjected to data cleaning operation by using low-rank decomposition, so that the uncertainty and noise influence in a diagnosis conclusion of a medical expert are overcome, the subjectivity of the medical expert in the scoring process is eliminated, and the scoring result is more accurate and objective.

Firstly, the collected hand motion sequences of the Parkinson disease patients are submitted to different medical experts for grading, grading results of different actions of different experts are recorded by a matrix G, the number of columns of the matrix G is the number of the experts, and rows of the matrix G represent different actions. And performing low-rank decomposition on the matrix G by adopting a method such as robust principal component analysis (RobustPCA):

G＝A+E

the matrix a is a low rank matrix (correlation between columns is strong) obtained after decomposition, the matrix E is a noise matrix (generally sparse), and the final scoring result is determined through the matrix a.

XGboost is one of Boosting algorithms, and the basic idea of Boosting algorithm is to integrate many weak classifiers together to form a strong classifier. Because the XGBoost is a lifting tree model, it integrates many tree models, one leaf node of each tree corresponds to one score, and finally, only the score corresponding to each tree needs to be added up to be the predicted value of the sample.

S62, predicting the hand characteristics of the hand motion sequence of the tested person by using the trained XGboost classifier, and giving a grading result of the corresponding hand motion of the Parkinson patient.

It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. A depth image-based Parkinson hand motion quantization analysis method is characterized by comprising the following steps:

s6, classifying the extracted hand motion characteristics by using a trained XGboost classifier, and giving a corresponding grading result;

step S4 includes the following steps:

s41, calculating the length proportion of the hand bones of the tested person

The specific calculation method of the beta is as follows:

wherein j ═ 4+ k (i-1), B_i，kRepresenting the actual length of the kth skeleton of the ith finger, i ∈ {1, 2, 3, 4, 5} respectively representing the thumb, index finger, middle finger, ring finger, little finger, k ∈ {1, 2, 3, 4} respectively representing the 4-membered skeleton from the wrist joint point to the fingertip joint point, B_totalRepresents the sum of the lengths of all bones;

s42, establishing an optimized objective function E according to time sequence information between continuous frames_T(X) adding a temporal smoothing constraint between successive frames, X representing the 3D coordinates of 21 joint points in all frames; optimizing an objective function E_T(X) the calculation formula is as follows:

where ρ is a robust Huber error function,

representing the degree of freedom, λ, calculated from the t-th frame hand joint point₁、λ₂Is a set hyper-parameter;

s43, establishing an optimized objective function E according to the priori knowledge of the hand gesture_P(X, beta), adding constraints of hand posture prior and position preference; optimizing an objective function E_PThe formula for (X, β) is as follows:

wherein E is_J(theta) is a negative log-likelihood function of the Gaussian mixture model, P represents the number of Gaussian models, g_pWeight, μ, representing the p-th Gaussian model_p，∑_pMeans and covariance, λ, representing the p-th Gaussian model_I、λ_βFor a set hyper-parameter, superscript-indicates the prediction result of the corresponding variable, topMark t represents the corresponding t frame;

s44, carrying out integral optimization on the 3D coordinates of the hand joint points of all frames in the video and the parameter beta, so that an optimization objective function E (X, beta) is E_T(X)+E_P(X, β) is minimal.

2. The method of claim 1, wherein the step S2 includes the steps of:

3. The method according to claim 1, wherein step S5 is specifically: when the motion is designated as finger-tapping, the whole time series of ten finger-tapping times is intercepted, and the calculated distance vector D is equal to (D)₁，d₂，...，d_T)^TPerforming discrete Fourier transform to obtain fixed-length amplitude response vector F ═ F (F) at different frequencies₁，f₂，...，f_N)^TEach component of vector F represents the magnitude of the magnitude at that frequency of distance vector D, where the actual physical distance of the thumb tip and index finger tip in frame t

4. The method of claim 1, wherein the XGBoost classifier in step S6 is trained by:

5. A depth image based parkinsonian hand motion quantification analysis system, the system comprising:

the XGboost classifier is used for classifying the extracted hand motion characteristics and giving a corresponding grading result;

the global optimization module performs global optimization on the 3D coordinates of all hand joint points of all frames by:

(1) calculating the length ratio of the hand skeleton of the tested person

The specific calculation method of the beta is as follows:

(2) establishing an optimized objective function E according to time sequence information between continuous frames_T(X) adding a temporal smoothing constraint between successive frames, X representing the 3D coordinates of 21 joint points in all frames; optimizing an objective function E_T(X) the calculation formula is as follows:

where ρ is a robust Huber error function,

(3) establishing an optimized objective function E according to the priori knowledge of the hand gesture_P(X, beta), adding constraints of hand posture prior and position preference; optimizing an objective function E_PThe formula for (X, β) is as follows:

wherein E is_J(theta) is a negative log-likelihood function of the Gaussian mixture model, P represents the number of Gaussian models, g_pWeight, μ, representing the p-th Gaussian model_p，∑_pMeans and covariance, λ, representing the p-th Gaussian model_I、λ_βFor the set hyper-parameter, the superscript represents the prediction result of the corresponding variable, and the superscript t represents the corresponding t-th frame;

(4) performing overall optimization on the 3D coordinates of the hand joint points and the parameter beta of all frames in the video, so that an optimization objective function E (X, beta) is E_T(X)+E_P(X, β) is minimal.

6. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when executed by a processor, the computer program implements the depth image-based Parkinson hand motion quantification analysis method according to any one of claims 1 to 4.