CN110991268B - Depth image-based Parkinson hand motion quantization analysis method and system - Google Patents

Depth image-based Parkinson hand motion quantization analysis method and system

Info

Publication number
CN110991268B
CN110991268B (application CN201911110171.9A)
Authority
CN
China
Prior art keywords
hand
coordinates
frame
depth image
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911110171.9A
Other languages
Chinese (zh)
Other versions
CN110991268A (en)
Inventor
曹治国 (Cao Zhiguo)
于泰东 (Yu Taidong)
肖阳 (Xiao Yang)
綦浩喆 (Qi Haozhe)
张博深 (Zhang Boshen)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN201911110171.9A priority Critical patent/CN110991268B/en
Publication of CN110991268A publication Critical patent/CN110991268A/en
Application granted granted Critical
Publication of CN110991268B publication Critical patent/CN110991268B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/0012 Biomedical image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/60 Analysis of geometric attributes
    • G06T7/66 Analysis of geometric attributes of image moments or centre of gravity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing

Abstract

The invention discloses a depth image-based Parkinson hand motion quantitative analysis method and system, belonging to the fields of computer vision and machine learning. The method comprises the following steps: while the detected person performs hand actions as required by the Parkinson's disease rating scale, multiple frames of depth images of the person are acquired and the 3D coordinates of the hand centroid are identified in each frame; the hand point cloud and the noise point cloud of each frame are segmented according to the 3D coordinates of the hand centroid; the 3D coordinates of each hand joint point in each frame are predicted from the hand point cloud of the single frame; the 3D coordinates of all hand joint points of all frames are jointly optimized according to the temporal information between consecutive frames and prior knowledge of hand postures; hand motion features are extracted from the optimized 3D joint coordinates; and the extracted features are classified by a trained XGBoost classifier, which outputs the corresponding grading result.

Description

Depth image-based Parkinson hand motion quantization analysis method and system
Technical Field
The invention belongs to the fields of computer vision and machine learning, and particularly relates to a depth image-based Parkinson hand motion quantitative analysis method and system.
Background
Early Parkinson's disease symptoms are not obvious, and the traditional diagnostic procedure requires a series of judgments. One key element is assessment based on how well the patient completes a set of specified actions. A traditional diagnostic modality is individual assessment of patients with the Unified Parkinson's Disease Rating Scale (UPDRS): the physician instructs the patient to perform activities according to the rating scale and then scores the patient item by item, which may take 30 minutes or more depending on the patient's completion. During this period the patient's performance can be affected by psychological factors such as stress and impaired physical coordination, as well as by self-doubt and the physician's remarks. This diagnostic mode relies on verbal communication and has a high time cost; the scoring depends mainly on the physician's visual observation of, for example, the range, amplitude and frequency of the actions, lacks quantitative indices, and is prone to deviation caused by subjectivity.
Prior-art analysis of Parkinson hand motion falls mainly into two categories. The first analyzes the three-dimensional position change of the hand center using color and depth images as input. In patent CN105701806A, the detected person wears solid-color gloves, an operator manually selects the approximate position of the person's hand in the picture, color image and depth image information is obtained simultaneously through a Kinect, the hand position is recognized and located in the color image, and spatial position information is obtained by conversion through the depth image. In the recognition process, the colored gloves serve as markers; recognition combines a color filtering technique with a region-growing algorithm, each frame is predicted from the recognition result of the previous frame, and recognition is completed once the reference point is determined. In the conversion process, the recognition result is converted from two-dimensional coordinates in the color image into three-dimensional coordinates of the hand position in the depth image; the upper-left, lower-left, upper-right and lower-right vertices and the center point are recorded, where the four vertices represent the palm plane in each frame of the hand movement and the center point represents the spatial motion trajectory and tremor condition of the hand, and period information is obtained by processing, analyzing and fitting the data file. The second category locates the two-dimensional plane positions of the hand joint points using color images as input, and evaluates hand motion on that basis. Tencent's medical artificial intelligence laboratory has introduced a new technology for AI-assisted diagnosis of Parkinson's disease: based on motion-video analysis and without wearable sensors, it automatically performs the UPDRS evaluation on motion videos of Parkinson's disease patients. Briefly, the user wears no sensor and only needs to be filmed by a camera (an ordinary smartphone suffices) while performing simple actions from the Parkinson rating scale, such as opening the palm and making a fist, or rotating the hands; the system then identifies the key nodes of the body parts in the motion video, quantitatively analyzes the action indices, and completes the diagnostic process.
However, the former technique analyzes only the three-dimensional position change of the hand center point and cannot evaluate other motion characteristics of the hand, while the latter technique uses two-dimensional joint-point information, cannot reflect the three-dimensional motion characteristics of the hand in physical space, and has poor evaluation accuracy. Both techniques use color images and therefore do not protect the patient's privacy.
Disclosure of Invention
Aiming at the prior art's failure to protect patient privacy and its poor motion-evaluation accuracy, the invention provides a depth image-based Parkinson hand motion quantitative analysis method and system. The aims are to protect patient privacy effectively by basing the analysis on depth images; to analyze hand motion quantitatively from the three-dimensional motion of the joint points in physical space; and to jointly optimize the 3D coordinates of all hand joint points of all frames according to the temporal information between consecutive frames and prior knowledge of hand postures, thereby improving evaluation accuracy.
To achieve the above object, according to a first aspect of the present invention, there is provided a depth image-based Parkinson hand motion quantitative analysis method, comprising:
S1, a detected person performs hand actions according to the requirements of the Parkinson's disease rating scale; during this period, multiple frames of depth images of the detected person are acquired, and the 3D coordinates of the hand centroid in each frame of depth image are identified;
S2, segmenting the hand point cloud and the noise point cloud of each frame of depth image according to the 3D coordinates of the hand centroid;
S3, predicting the 3D coordinates of each hand joint point in each frame of depth image based on the hand point cloud of the single frame;
S4, jointly optimizing the 3D coordinates of all hand joint points of all frames according to the temporal information between consecutive frames and prior knowledge of hand postures;
S5, extracting hand motion features from the optimized 3D joint coordinates, according to the motion characteristics of the hand actions specified in the comprehensive Parkinson's disease rating scale;
and S6, classifying the extracted hand motion features with the trained XGBoost classifier and outputting the corresponding grading result.
Specifically, step S2 includes the steps of:
S21, converting all pixel points with depth values in each frame of depth image into 3D coordinates in space;
S22, defining a 3D target frame centered on the hand centroid, removing the point cloud outside the target frame as noise, and keeping the point cloud inside the target frame as the hand point cloud.
Specifically, step S4 includes the steps of:
S41, calculating the hand bone length proportion β of the tested person;
S42, establishing an optimization objective function E_T(X, β) according to the temporal information between consecutive frames, adding temporal smoothing constraints between consecutive frames, X denoting the 3D coordinates of the 21 joint points in all frames;
S43, establishing an optimization objective function E_P(X, β) according to prior knowledge of hand postures, adding constraints of hand posture priors and position preferences;
S44, jointly optimizing the 3D coordinates of the hand joint points of all frames in the video together with the parameter β, such that the optimization objective function E(X, β) = E_T(X, β) + E_P(X, β) is minimized.
Specifically, β is calculated as:

$$\beta_j = \frac{B_{i,k}}{B_{\mathrm{total}}}, \qquad B_{\mathrm{total}} = \sum_{i=1}^{5}\sum_{k=1}^{4} B_{i,k}, \qquad j = 4(i-1)+k$$

where B_{i,k} denotes the actual length of the k-th bone of the i-th finger, i ∈ {1, 2, 3, 4, 5} indexes the thumb, index, middle, ring and little finger respectively, k ∈ {1, 2, 3, 4} indexes the four bones from the wrist joint point to the fingertip joint point, and B_total denotes the sum of the lengths of all bones.
In particular, the optimization objective function E_T(X, β) is computed as:

$$E_T(X,\beta) = \lambda_1 \sum_{t=2}^{T}\sum_{i=1}^{J} \rho\left(x_i^t - x_i^{t-1}\right) + \lambda_2 \sum_{t=2}^{T} \rho\left(\theta^t - \theta^{t-1}\right)$$

where ρ is a robust Huber error function, x_i^t denotes the 3D coordinates of the i-th joint point in frame t, T denotes the number of video frames, J denotes the number of defined hand joint points, θ^t denotes the degrees of freedom calculated from the hand joint points of frame t, and λ_1, λ_2 are set hyper-parameters.
In particular, the optimization objective function E_P(X, β) is computed as:

$$E_P(X,\beta) = \sum_{t=1}^{T} E_J(\theta^t) + E_I(X,\beta)$$

$$E_J(\theta) = -\log \sum_{p=1}^{P} g_p\, \mathcal{N}\!\left(\theta;\, \mu_p, \Sigma_p\right)$$

$$E_I(X,\beta) = \lambda_I \sum_{t=1}^{T}\sum_{i=1}^{J} \rho\left(x_i^t - \hat{x}_i^t\right) + \lambda_\beta \sum_{t=1}^{T} \rho\left(\beta - \hat{\beta}^t\right)$$

where T denotes the number of video frames, J denotes the number of defined hand joint points, θ^t denotes the degrees of freedom calculated from the hand joint points of frame t, E_J(θ) is the negative log-likelihood function of a Gaussian mixture model, P denotes the number of Gaussian components, g_p denotes the weight of the p-th Gaussian component, μ_p, Σ_p denote the mean and covariance of the p-th Gaussian component, λ_I, λ_β are set hyper-parameters, x̂_i^t denotes the predicted 3D coordinates of the i-th joint point in frame t, and a hat (^) marks the prediction result of the corresponding variable.
Specifically, step S5 specifically comprises: when the specified action is finger tapping, the whole time series of ten finger taps is intercepted, and a discrete Fourier transform is performed on the calculated distance vector D = (d_1, d_2, …, d_T)^T to obtain a fixed-length amplitude response vector F = (f_1, f_2, …, f_N)^T over different frequencies, each component of F representing the amplitude of the distance vector D at that frequency, where the actual physical distance between the thumb tip and the index-finger tip in frame t is

$$d_t = \left\| x_{\mathrm{thumb}}^t - x_{\mathrm{index}}^t \right\|_2$$

with x_thumb^t and x_index^t denoting the 3D coordinates of the thumb-tip and index-finger-tip joint points predicted in frame t, and T denoting the number of video frames.
Specifically, the XGBoost classifier in step S6 is trained by:
(1) performing a data cleaning operation, using low-rank decomposition, on the collected grading results of real Parkinson's disease samples;
(2) and training the XGBoost classifier with the different hand motion features from the obtained scale and the corresponding expert grading results after data cleaning.
To achieve the above object, according to a second aspect of the present invention, there is provided a depth image-based Parkinson hand motion quantitative analysis system, comprising:
the depth image acquisition module, for acquiring multi-frame depth images of the detected person while the person performs hand actions according to the requirements of the Parkinson's disease rating scale, and identifying the 3D coordinates of the hand centroid in each frame of depth image;
the hand point cloud segmentation module, for segmenting the hand point cloud and the noise point cloud of each frame of depth image according to the 3D coordinates of the hand centroid;
the joint point prediction module, for predicting the 3D coordinates of each hand joint point in each frame of depth image based on the hand point cloud of the single frame;
the overall optimization module, for jointly optimizing the 3D coordinates of all hand joint points of all frames according to the temporal information between consecutive frames and prior knowledge of hand postures;
the hand motion feature extraction module, for extracting hand motion features from the optimized 3D joint coordinates, according to the motion characteristics of the hand actions specified in the comprehensive Parkinson's disease rating scale;
and the XGBoost classifier, for classifying the extracted hand motion features and outputting the corresponding grading result.
To achieve the above object, according to a third aspect of the present invention, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the depth image-based Parkinson hand motion quantitative analysis method according to the first aspect.
Generally, by the above technical solution conceived by the present invention, the following beneficial effects can be obtained:
(1) Aiming at the problem that prior-art color images expose the patient's privacy, the method acquires multi-frame depth images of the detected person; because a depth image does not allow accurate identification of an individual, the patient's privacy is effectively protected. The depth image also gives the distance from each point of the visible hand surface to the camera, so the spatial structure and actual motion characteristics of the hand can be captured.
(2) The hand motion features are extracted using the 3D coordinates of all the hand joint points; the extracted features reflect the hand's motion changes in real physical space and have actual physical meaning, which effectively improves the accuracy of quantitative hand motion analysis.
(3) According to the invention, the 3D coordinates of all hand joint points of all frames are optimized jointly according to the temporal information between consecutive frames and prior knowledge of hand postures. The optimization takes into account the hand degree-of-freedom prior and the proportional relation of the bone lengths, analyzing the single-frame hand posture accurately and reliably while smoothing the joint jitter caused by inter-frame errors, effectively improving the accuracy of quantitative hand motion analysis.
Drawings
Fig. 1 is a flowchart of the depth image-based Parkinson hand motion quantitative analysis method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a hand movement sample collection process according to an embodiment of the present invention;
FIG. 3 is a collected depth image of hand movements of a Parkinson's disease patient provided by an embodiment of the invention;
FIG. 4 is a prior art color image corresponding to FIG. 3;
fig. 5 is an imaging result of the segmented hand point cloud on the 2D depth image according to the embodiment of the present invention;
FIG. 6 is a schematic diagram of a hand point cloud in a 3D space according to an embodiment of the present invention;
FIG. 7 is a projection result of predicted 3D joint coordinates on a 2D image according to an embodiment of the present invention;
FIG. 8 is a schematic diagram illustrating a hand skeleton definition according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of hand joint and degree of freedom distribution provided by an embodiment of the present invention;
FIG. 10(a) is a schematic diagram showing how the distance d_t between the thumb tip and the index-finger tip varies with the frame number t for a finger-tapping action according to an embodiment of the present invention;
FIG. 10(b) is a graph showing the fixed-length amplitude response over different frequencies for a finger-tapping action according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
As shown in Fig. 1, the present invention provides a depth image-based Parkinson hand motion quantitative analysis method, which includes the following steps:
s1, the detected person makes hand motions according to requirements of the Parkinson disease rating scale, multi-frame depth images of the detected person are obtained in the period, and 3D coordinates of a hand centroid in each frame of depth image are identified.
A depth camera, such as the Intel SR300, is used to acquire three-dimensional imaging information of the Parkinson patient's hand movement. The acquisition process is shown in Fig. 2: the patient faces the camera and performs the hand actions of the MDS-UPDRS comprehensive Parkinson's disease rating scale as required, within the effective imaging range of the camera. An acquired depth image is shown in Fig. 3; during image acquisition the palm preferably faces the camera so that the hand is imaged well. Compared with the color image shown in Fig. 4, a depth image does not allow accurate identification of the individual and thus effectively protects the patient's privacy, while still giving the distance from each point of the visible hand surface to the camera, so the spatial structure and actual motion characteristics of the hand can be captured.
When the depth images are acquired, the 3D coordinates of the hand centroid in each frame, in the camera coordinate system, are obtained using the tracking and detection algorithm built into the corresponding software development kit (such as Intel RealSense SDK 2.0), or other image algorithms such as the threshold-segmentation and position-correction method adopted in DeepPrior++.
And S2, segmenting the hand point cloud and the noise point cloud of each frame of depth image according to the 3D coordinates of the hand centroid.
Using the 3D coordinates of the hand center in the camera coordinate system, a physical space range is defined according to the actual size of the hand, and point clouds outside this range, which do not belong to the hand, are deleted as noise.
S21, converting all pixel points (u, v, d) with depth values in each frame of depth image into 3D coordinates (x, y, z) in space using the camera parameters, where u and v denote the 2D coordinates of a pixel in the image plane, d denotes the depth value of that pixel, and x, y, z denote the 3D coordinates of the point cloud in the camera coordinate system.
Ignoring camera imaging distortion, the conversion formula is:

$$x = \frac{(u - ppx)\, d}{f_x}, \qquad y = \frac{(v - ppy)\, d}{f_y}, \qquad z = d$$

where ppx and ppy are the horizontal and vertical image coordinates of the camera optical center, and f_x, f_y are the focal lengths of the camera along the X and Y axes, respectively.
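As a minimal illustration (not part of the claimed method), this back-projection can be written in a few lines of Python with NumPy; the intrinsics fx, fy, ppx, ppy and the depth scale are assumed inputs, e.g. as reported by the depth camera's SDK:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, ppx, ppy, depth_scale=0.001):
    """Back-project a depth image (H x W array of raw depth units) into an
    N x 3 point cloud in the camera coordinate system, ignoring distortion.
    depth_scale converts raw units to metres (assumed, e.g. 0.001 for mm)."""
    v, u = np.nonzero(depth)                 # rows/cols of valid depth pixels
    d = depth[v, u].astype(np.float64) * depth_scale
    x = (u - ppx) * d / fx                   # x = (u - ppx) * d / fx
    y = (v - ppy) * d / fy                   # y = (v - ppy) * d / fy
    return np.stack([x, y, d], axis=1)       # z = d
```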
And S22, segmenting the hand point cloud from the converted 3D space based on the 3D coordinates of the hand centroid.
Because the hand takes various poses during motion, the palm cannot be guaranteed to always face the camera. To segment a clean hand point cloud from each frame in 3D space, a 3D target frame centered on the hand centroid is defined on the basis of the acquired centroid 3D coordinates, and the point cloud outside the target frame is removed as noise, eliminating its interference with the subsequent steps. In this embodiment, the 3D target frame is preferably a cube with a side length of 20-30 cm. The imaging result of the segmented clean hand point cloud on the 2D depth image is shown in Fig. 5; Fig. 6 is a schematic diagram of the hand point cloud in 3D space, where the coordinates are the actual coordinates of the point cloud in the camera coordinate system, in millimeters.
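A sketch of this segmentation step, under the same assumptions, follows; the default cube side of 0.25 m is taken from the 20-30 cm range given above:

```python
import numpy as np

def segment_hand(points, centroid, box_side=0.25):
    """Split an N x 3 point cloud (metres) into a hand cloud and a noise
    cloud using an axis-aligned cube of side box_side centred on the hand
    centroid (box_side is an assumed parameter within the stated range)."""
    inside = np.all(np.abs(points - np.asarray(centroid)) <= box_side / 2.0,
                    axis=1)
    return points[inside], points[~inside]   # (hand cloud, noise cloud)
```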
And S3, predicting the 3D coordinates of each joint point of the hand in each frame of depth image based on the hand point cloud in the single frame of depth image.
Step S3 includes the following steps:
and S31, training the 3D hand posture estimation neural network by using the hand data set with the hand joint point labels to obtain the trained 3D hand posture estimation neural network.
In order to make the prediction of the 3D hand pose estimation method more accurate and give the model sufficient generalization ability, the invention trains the neural network on a relevant hand database, such as the HANDS 2017 dataset, so that the model can learn enough hand postures. The dataset should contain as many different hand samples as possible, with wide coverage of hand sizes, postures, shapes and viewing angles; training the model on such a dataset improves the accuracy and robustness of its predictions.
The 3D hand pose estimation neural network includes, but is not limited to: 3D-convolution-based methods such as 3D CNNs and V2V-PoseNet, the 2D-convolution-based DeepPrior++ method, and the point-cloud-based HandPointNet method.
And S32, predicting the 3D coordinates of the hand joint points by using the trained 3D hand posture estimation neural network for the hand point cloud in the single-frame depth image.
The projection result of the predicted 3D joint coordinates on the 2D image is shown in fig. 7.
And S4, jointly optimizing the 3D coordinates of all hand joint points of all frames according to the temporal information between consecutive frames and prior knowledge of hand postures.
The optimization takes into account the hand degree-of-freedom prior and the proportional relation of the bone lengths, analyzing the single-frame hand posture accurately and reliably while also smoothing well the joint jitter caused by inter-frame errors.
Step S4 includes the following steps:
S41, calculating the hand bone length proportion β of the tested person.
Since the entire sequence analyzes the posture of the same hand, and since the lengths and proportions of hand bones differ between people while the bone lengths of one person's hand remain unchanged, the invention imposes a consistency constraint on the hand bone length proportion β in the error terms below. β is calculated as:

$$\beta_j = \frac{B_{i,k}}{B_{\mathrm{total}}}, \qquad B_{\mathrm{total}} = \sum_{i=1}^{5}\sum_{k=1}^{4} B_{i,k}, \qquad j = 4(i-1)+k$$

where B_{i,k} denotes the actual length of the k-th bone of the i-th finger, i ∈ {1, 2, 3, 4, 5} indexes the thumb, index, middle, ring and little finger respectively, k ∈ {1, 2, 3, 4} indexes the four bones from the wrist joint point to the fingertip joint point, and B_total denotes the sum of the lengths of all bones.
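For illustration, β can be computed as sketched below; the joint ordering (wrist first, then four joints per finger from knuckle to fingertip) is an assumed skeleton layout, not an indexing the patent prescribes:

```python
import numpy as np

def bone_length_proportion(joints):
    """joints: (21, 3) array of 3D joint coordinates; index 0 is assumed to
    be the wrist and indices 4*i+1 .. 4*i+4 the four joints of finger i,
    ordered towards the fingertip. Returns the 20-dim proportion vector
    beta with beta_j = B_{i,k} / B_total."""
    beta = np.empty(20)
    for i in range(5):                # thumb, index, middle, ring, little
        prev = joints[0]              # each bone chain starts at the wrist
        for k in range(4):            # four bones per finger
            cur = joints[4 * i + k + 1]
            beta[4 * i + k] = np.linalg.norm(cur - prev)
            prev = cur
    return beta / beta.sum()          # normalise by the total bone length
```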
S42, establishing an optimization objective function E_T(X, β) according to the temporal information between consecutive frames, adding temporal smoothing constraints between consecutive frames.
Since the 3D hand pose estimation method predicts joint coordinates from a single depth map, it takes no account of the joint information of preceding and following frames. To judge a Parkinson patient's motion sequence effectively as a whole, the invention smoothly constrains the changes of the 3D joint positions and the hand degrees of freedom between consecutive frames, making them more consistent with the characteristics of actual motion; this helps eliminate the jitter caused by the per-frame joint prediction errors and further reduces the overall prediction error.

$$E_T(X,\beta) = \lambda_1 \sum_{t=2}^{T}\sum_{i=1}^{J} \rho\left(x_i^t - x_i^{t-1}\right) + \lambda_2 \sum_{t=2}^{T} \rho\left(\theta^t - \theta^{t-1}\right)$$

where X denotes the 3D coordinates of the 21 joint points in all frames, β denotes the defined hand bone proportion parameter, ρ is a robust Huber error function, x_i^t denotes the 3D coordinates of the i-th joint point in frame t, t ∈ {1, 2, 3, …, T} with T the number of video frames, i ∈ {1, 2, 3, …, J} with J = 21 the number of defined hand joint points, θ^t denotes the degrees of freedom calculated from the hand joint points, and λ_1, λ_2 are set hyper-parameters. The smaller E_T(X, β) is, the more smoothly the 3D joint positions and the angles between joints vary over time.
As shown in Fig. 9, there are 21 degrees of freedom in total, excluding the 6 degrees of freedom of the wrist joint relative to space, obtained as follows:
(a) index finger, middle finger, ring finger, little finger (4 degrees of freedom each, 16 in total):
1st DoF: flexion/extension between the distal phalanx and the middle phalanx;
2nd DoF: flexion/extension between the middle phalanx and the proximal phalanx;
3rd DoF: flexion/extension between the proximal phalanx and the metacarpal;
4th DoF: abduction/adduction between the proximal phalanx and the metacarpal;
(b) thumb (5 degrees of freedom in total):
1st DoF: flexion/extension between the distal phalanx and the proximal phalanx;
2nd DoF: flexion/extension between the proximal phalanx and the metacarpal;
3rd DoF: abduction/adduction between the proximal phalanx and the metacarpal;
4th DoF: flexion/extension between the metacarpal and the trapezium;
5th DoF: abduction/adduction between the metacarpal and the trapezium.
S43, establishing an optimization objective function E_P(X, β) according to prior knowledge of hand postures, adding constraints of hand posture priors and position preferences.
In the overall pose optimization, prior knowledge about hand postures is added, penalizing hand poses that are physiologically impossible yet may appear in certain frames, including impossible joint angles and bone length proportions.

$$E_P(X,\beta) = \sum_{t=1}^{T} E_J(\theta^t) + E_I(X,\beta)$$

$$E_J(\theta) = -\log \sum_{p=1}^{P} g_p\, \mathcal{N}\!\left(\theta;\, \mu_p, \Sigma_p\right)$$

$$E_I(X,\beta) = \lambda_I \sum_{t=1}^{T}\sum_{i=1}^{J} \rho\left(x_i^t - \hat{x}_i^t\right) + \lambda_\beta \sum_{t=1}^{T} \rho\left(\beta - \hat{\beta}^t\right)$$

where T denotes the number of video frames, θ^t denotes the degrees of freedom calculated from the hand joint points of frame t, E_J(θ) is the negative log-likelihood function of a Gaussian mixture model, J denotes the number of defined hand joint points, P denotes the number of Gaussian components, g_p denotes the weight of the p-th Gaussian component, μ_p, Σ_p denote the mean and covariance of the p-th Gaussian component, λ_I, λ_β are set hyper-parameters, and x̂_i^t denotes the predicted 3D coordinates of the i-th joint point in frame t (a hat marks the prediction result of the corresponding variable). The smaller E_P(X, β) is, the closer the obtained joint positions are to a plausible hand posture.
The first term of E_I(X, β) encourages the final optimization result to stay as close as possible to the per-frame predictions x̂_i^t, while the second term encourages the hand bone proportion to remain as consistent as possible with the per-frame estimates β̂^t across all frames.
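The mixture prior E_J can be evaluated as sketched below; the mixture parameters (g_p, μ_p, Σ_p) would be fitted offline on a hand-pose dataset, which is an assumption here rather than a detail the patent specifies:

```python
import numpy as np

def gmm_neg_log_likelihood(theta, weights, means, covs):
    """E_J: negative log-likelihood of the degree-of-freedom vector theta
    under a Gaussian mixture hand-pose prior with component weights g_p,
    means mu_p and covariances Sigma_p (assumed to be fitted offline)."""
    d = theta.shape[0]
    likelihood = 0.0
    for g, mu, cov in zip(weights, means, covs):
        diff = theta - mu
        norm = 1.0 / np.sqrt(((2.0 * np.pi) ** d) * np.linalg.det(cov))
        likelihood += g * norm * np.exp(-0.5 * diff @ np.linalg.solve(cov, diff))
    return -np.log(likelihood + 1e-300)      # guard against log(0)
```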
S44, jointly optimizing the 3D coordinates of the hand joint points of all frames in the video together with the parameter β, such that the optimization objective function E(X, β) = E_T(X, β) + E_P(X, β) is minimized.
The 3D coordinates of the hand joint points of all frames in the video and the parameter β are optimized with the L-BFGS algorithm, using the single-frame prediction results and the mean of the per-frame parameters β̂^t as initial values; in total 20 + 63F parameters are optimized, where F denotes the number of frames in the video. The algorithm has the advantages of fast convergence and low memory overhead.
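How this joint optimization could be driven with SciPy's L-BFGS implementation is sketched below; `energy` stands in for the full E_T + E_P objective of steps S42 and S43, and the packing of 20 β parameters plus 63 coordinates per frame follows the parameter count stated above. A production implementation would also supply analytic gradients rather than rely on numerical differentiation:

```python
import numpy as np
from scipy.optimize import minimize

def optimise_sequence(X0, beta0, energy):
    """Jointly refine beta (20,) and the joint coordinates X0 (F, 21, 3)
    of all F frames by minimising energy(X, beta) with L-BFGS, using the
    per-frame predictions and the mean proportion as initial values."""
    F = X0.shape[0]
    z0 = np.concatenate([beta0, X0.ravel()])   # 20 + 63*F parameters

    def objective(z):
        beta, X = z[:20], z[20:].reshape(F, 21, 3)
        return energy(X, beta)

    res = minimize(objective, z0, method="L-BFGS-B")
    return res.x[:20], res.x[20:].reshape(F, 21, 3)
```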
And S5, extracting hand motion features from the optimized 3D joint coordinates, according to the motion characteristics of the hand actions specified in the comprehensive Parkinson's disease rating scale.
For example, in the assessment of the item "postural tremor of the hands", the invention may score using the absolute distance traveled by the joint points. In the evaluation of the finger-tapping action, the examinee is asked to open the thumb and index finger as wide as possible and tap them together ten times at the fastest speed. For finger tapping, once the optimized 3D joint coordinates are obtained, the opening and closing of the fingertips can be characterized by the actual physical distance between the thumb tip and the index-finger tip:

$$d_t = \left\| x_{\mathrm{thumb}}^t - x_{\mathrm{index}}^t \right\|_2$$

where x_thumb^t and x_index^t denote the predicted 3D coordinates of the thumb-tip and index-finger-tip joint points in frame t. Fig. 10(a) shows how the distance d_t varies with the frame number t.
This 3D physical distance reflects the motion change of the hand in real physical space and has actual physical meaning. By contrast, 2D joint points reflect the coordinates of the hand joints in the image, where distance means the pixel distance between two points in the image plane rather than the real physical distance; moreover, for the same hand pose the 3D distance computed from different shooting angles is unchanged, whereas the distance in the projected image differs.
The whole time series of the ten finger taps is intercepted, and a discrete Fourier transform is performed on the distance vector D = (d_1, d_2, …, d_T)^T to obtain a fixed-length amplitude response vector F = (f_1, f_2, …, f_N)^T over different frequencies; each component of F represents the amplitude of the distance vector D at that frequency. The more normal the examinee's tapping motion, the larger the amplitude response of F at a particular component, as shown in Fig. 10(b).
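The distance series and its fixed-length spectrum can be computed as sketched below; the thumb-tip and index-tip joint indices and the feature length N are assumptions about the skeleton layout and feature design:

```python
import numpy as np

def thumb_index_distance(X, thumb_tip=4, index_tip=8):
    """d_t = ||x_thumb^t - x_index^t||_2 for each frame of X (T, 21, 3);
    the joint indices are an assumed skeleton layout."""
    return np.linalg.norm(X[:, thumb_tip] - X[:, index_tip], axis=1)

def tapping_spectrum(d, n_bins=64):
    """Fixed-length amplitude response F = (f_1, ..., f_N): magnitude of
    the DFT of the distance series d, zero-padded so that sequences of
    different length T all yield n_bins frequency bins."""
    return np.abs(np.fft.rfft(d, n=2 * n_bins))[:n_bins]
```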
And S6, classifying the extracted hand motion features with the XGBoost classifier and outputting the corresponding grading result.
S61, training the XGBoost classifier with a training sample set, where each training sample comprises the hand motion features of one hand motion sequence, and its label is the expert's grade of that sequence according to the comprehensive Parkinson's disease rating scale.
(1) Performing a data cleaning operation on the collected grading results of real Parkinson's disease samples.
Low-rank decomposition is used to clean the collected grading results, overcoming the uncertainty and noise in the medical experts' diagnostic conclusions and eliminating the experts' subjectivity in scoring, so that the grading results are more accurate and objective.
First, the collected hand motion sequences of Parkinson's disease patients are submitted to different medical experts for grading, and the grading results for the different actions are recorded in a matrix G, whose columns correspond to the experts and whose rows correspond to the different actions. The matrix G is then decomposed with a low-rank method such as Robust Principal Component Analysis (Robust PCA):

G = A + E

where the matrix A obtained from the decomposition is low-rank (its columns are strongly correlated), the matrix E is a (generally sparse) noise matrix, and the final grading result is determined from the matrix A.
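A minimal Robust PCA sketch for the decomposition G = A + E is given below, using a standard inexact augmented-Lagrangian scheme; the regularisation weight λ = 1/√max(m, n) and the other constants are common defaults from the Robust PCA literature, not values the patent specifies:

```python
import numpy as np

def robust_pca(G, n_iter=500, tol=1e-7):
    """Split the expert-score matrix G into a low-rank consensus A and a
    sparse noise matrix E with G = A + E (inexact-ALM-style iteration)."""
    G = np.asarray(G, dtype=float)
    m, n = G.shape
    lam = 1.0 / np.sqrt(max(m, n))           # sparsity weight (default)
    mu = 1.25 / (np.linalg.norm(G, 2) + 1e-12)
    A = np.zeros_like(G)
    E = np.zeros_like(G)
    Y = np.zeros_like(G)                     # Lagrange multipliers
    for _ in range(n_iter):
        # singular-value thresholding -> low-rank part A
        U, s, Vt = np.linalg.svd(G - E + Y / mu, full_matrices=False)
        A = (U * np.maximum(s - 1.0 / mu, 0.0)) @ Vt
        # entrywise soft-thresholding -> sparse noise E
        R = G - A + Y / mu
        E = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0.0)
        Y += mu * (G - A - E)                # dual ascent on G = A + E
        if np.linalg.norm(G - A - E) <= tol * (np.linalg.norm(G) + 1e-12):
            break
    return A, E
```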
(2) Training the XGBoost classifier with the different hand motion features from the scale and the corresponding expert grading results after data cleaning.
XGBoost is one of the Boosting algorithms; the basic idea of Boosting is to combine many weak classifiers into a strong classifier. Since XGBoost is a gradient-boosted tree model, it integrates many tree models: each leaf node of each tree corresponds to a score, and the prediction for a sample is simply the sum of the scores of the leaves it reaches across all trees.
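A training sketch with the xgboost Python package is shown below; the file names, feature layout and hyper-parameters are illustrative placeholders, not the patent's configuration:

```python
import numpy as np
import xgboost as xgb

# Hypothetical inputs: one feature vector per hand-motion sequence, plus the
# cleaned expert grade (e.g. a 0-4 rating-scale score) for each sequence.
features = np.load("hand_motion_features.npy")   # (n_samples, n_features)
grades = np.load("cleaned_expert_grades.npy")    # (n_samples,) ints in 0..4

clf = xgb.XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
clf.fit(features, grades)          # boosted trees; leaf scores sum to class scores
pred = clf.predict(features[:1])   # predicted grade for one sequence
print(pred)
```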
S62, using the trained XGBoost classifier to predict on the hand features of the tested person's hand motion sequence and output the grading result of the corresponding hand action of the Parkinson patient.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (6)

1. A depth image-based Parkinson hand motion quantization analysis method, characterized by comprising the following steps:
s1, a detected person performs hand actions according to the requirements of the Parkinson's disease rating scale; during this period, multiple frames of depth images of the detected person are acquired, and the 3D coordinates of the hand centroid in each frame of depth image are identified;
s2, segmenting the hand point cloud and the noise point cloud of each frame of depth image according to the 3D coordinates of the hand centroid;
s3, predicting the 3D coordinates of each hand joint point in each frame of depth image based on the hand point cloud of the single frame;
s4, jointly optimizing the 3D coordinates of all hand joint points of all frames according to the temporal information between consecutive frames and prior knowledge of hand postures;
s5, extracting hand motion features from the optimized 3D joint coordinates, according to the motion characteristics of the hand actions specified in the comprehensive Parkinson's disease rating scale;
s6, classifying the extracted hand motion features with a trained XGBoost classifier and outputting a corresponding grading result;
step S4 includes the following steps:
s41, calculating the hand bone length proportion β of the tested person, where β is calculated as:

$$\beta_j = \frac{B_{i,k}}{B_{\mathrm{total}}}, \qquad B_{\mathrm{total}} = \sum_{i=1}^{5}\sum_{k=1}^{4} B_{i,k}, \qquad j = 4(i-1)+k$$

where B_{i,k} denotes the actual length of the k-th bone of the i-th finger, i ∈ {1, 2, 3, 4, 5} indexes the thumb, index, middle, ring and little finger respectively, k ∈ {1, 2, 3, 4} indexes the four bones from the wrist joint point to the fingertip joint point, and B_total denotes the sum of the lengths of all bones;
s42, establishing an optimization objective function E_T(X) according to the temporal information between consecutive frames, adding temporal smoothing constraints between consecutive frames, X denoting the 3D coordinates of the 21 joint points in all frames; the optimization objective function E_T(X) is computed as:

$$E_T(X) = \lambda_1 \sum_{t=2}^{T}\sum_{i=1}^{J} \rho\left(x_i^t - x_i^{t-1}\right) + \lambda_2 \sum_{t=2}^{T} \rho\left(\theta^t - \theta^{t-1}\right)$$

where ρ is a robust Huber error function, x_i^t denotes the 3D coordinates of the i-th joint point in frame t, T denotes the number of video frames, J denotes the number of defined hand joint points, θ^t denotes the degrees of freedom calculated from the hand joint points of frame t, and λ_1, λ_2 are set hyper-parameters;
s43, establishing an optimization objective function E_P(X, β) according to prior knowledge of hand postures, adding constraints of hand posture priors and position preferences; the optimization objective function E_P(X, β) is computed as:

$$E_P(X,\beta) = \sum_{t=1}^{T} E_J(\theta^t) + E_I(X,\beta)$$

$$E_J(\theta) = -\log \sum_{p=1}^{P} g_p\, \mathcal{N}\!\left(\theta;\, \mu_p, \Sigma_p\right)$$

$$E_I(X,\beta) = \lambda_I \sum_{t=1}^{T}\sum_{i=1}^{J} \rho\left(x_i^t - \hat{x}_i^t\right) + \lambda_\beta \sum_{t=1}^{T} \rho\left(\beta - \hat{\beta}^t\right)$$

where E_J(θ) is the negative log-likelihood function of a Gaussian mixture model, P denotes the number of Gaussian components, g_p denotes the weight of the p-th Gaussian component, μ_p, Σ_p denote the mean and covariance of the p-th Gaussian component, λ_I, λ_β are set hyper-parameters, a hat (^) marks the prediction result of the corresponding variable, and the superscript t indicates the corresponding frame t;
s44, jointly optimizing the 3D coordinates of the hand joint points of all frames in the video together with the parameter β, such that the optimization objective function E(X, β) = E_T(X) + E_P(X, β) is minimized.
2. The method of claim 1, wherein step S2 includes the steps of:
s21, converting all pixel points with depth values in each frame of depth image into 3D coordinates in space;
s22, defining a 3D target frame centered on the hand centroid, removing the point cloud outside the target frame as noise, and keeping the point cloud inside the target frame as the hand point cloud.
3. The method according to claim 1, wherein step S5 specifically comprises: when the specified action is finger tapping, intercepting the whole time series of ten finger taps, and performing a discrete Fourier transform on the calculated distance vector D = (d_1, d_2, …, d_T)^T to obtain a fixed-length amplitude response vector F = (f_1, f_2, …, f_N)^T over different frequencies, each component of F representing the amplitude of the distance vector D at that frequency, where the actual physical distance between the thumb tip and the index-finger tip in frame t is

$$d_t = \left\| x_{\mathrm{thumb}}^t - x_{\mathrm{index}}^t \right\|_2$$

with x_thumb^t and x_index^t denoting the 3D coordinates of the thumb-tip and index-finger-tip joint points predicted in frame t, and T denoting the number of video frames.
4. The method of claim 1, wherein the XGBoost classifier in step S6 is trained by:
(1) performing a data cleaning operation, using low-rank decomposition, on the collected grading results of real Parkinson's disease samples;
(2) and training the XGBoost classifier with the different hand motion features from the obtained scale and the corresponding expert grading results after data cleaning.
5. A depth image-based Parkinson hand motion quantification analysis system, characterized in that the system comprises:
the depth image acquisition module, for acquiring multi-frame depth images of the detected person while the person performs hand actions according to the requirements of the Parkinson's disease rating scale, and identifying the 3D coordinates of the hand centroid in each frame of depth image;
the hand point cloud segmentation module, for segmenting the hand point cloud and the noise point cloud of each frame of depth image according to the 3D coordinates of the hand centroid;
the joint point prediction module, for predicting the 3D coordinates of each hand joint point in each frame of depth image based on the hand point cloud of the single frame;
the overall optimization module, for jointly optimizing the 3D coordinates of all hand joint points of all frames according to the temporal information between consecutive frames and prior knowledge of hand postures;
the hand motion feature extraction module, for extracting hand motion features from the optimized 3D joint coordinates, according to the motion characteristics of the hand actions specified in the comprehensive Parkinson's disease rating scale;
the XGBoost classifier, for classifying the extracted hand motion features and outputting the corresponding grading result;
the overall optimization module performs the overall optimization of the 3D coordinates of all hand joint points of all frames by:
(1) calculating the hand bone length proportion β of the tested person, where β is calculated as:

$$\beta_j = \frac{B_{i,k}}{B_{\mathrm{total}}}, \qquad B_{\mathrm{total}} = \sum_{i=1}^{5}\sum_{k=1}^{4} B_{i,k}, \qquad j = 4(i-1)+k$$

where B_{i,k} denotes the actual length of the k-th bone of the i-th finger, i ∈ {1, 2, 3, 4, 5} indexes the thumb, index, middle, ring and little finger respectively, k ∈ {1, 2, 3, 4} indexes the four bones from the wrist joint point to the fingertip joint point, and B_total denotes the sum of the lengths of all bones;
(2) establishing an optimization objective function E_T(X) according to the temporal information between consecutive frames, adding temporal smoothing constraints between consecutive frames, X denoting the 3D coordinates of the 21 joint points in all frames; the optimization objective function E_T(X) is computed as:

$$E_T(X) = \lambda_1 \sum_{t=2}^{T}\sum_{i=1}^{J} \rho\left(x_i^t - x_i^{t-1}\right) + \lambda_2 \sum_{t=2}^{T} \rho\left(\theta^t - \theta^{t-1}\right)$$

where ρ is a robust Huber error function, x_i^t denotes the 3D coordinates of the i-th joint point in frame t, T denotes the number of video frames, J denotes the number of defined hand joint points, θ^t denotes the degrees of freedom calculated from the hand joint points of frame t, and λ_1, λ_2 are set hyper-parameters;
(3) establishing an optimization objective function E_P(X, β) according to prior knowledge of hand postures, adding constraints of hand posture priors and position preferences; the optimization objective function E_P(X, β) is computed as:

$$E_P(X,\beta) = \sum_{t=1}^{T} E_J(\theta^t) + E_I(X,\beta)$$

$$E_J(\theta) = -\log \sum_{p=1}^{P} g_p\, \mathcal{N}\!\left(\theta;\, \mu_p, \Sigma_p\right)$$

$$E_I(X,\beta) = \lambda_I \sum_{t=1}^{T}\sum_{i=1}^{J} \rho\left(x_i^t - \hat{x}_i^t\right) + \lambda_\beta \sum_{t=1}^{T} \rho\left(\beta - \hat{\beta}^t\right)$$

where E_J(θ) is the negative log-likelihood function of a Gaussian mixture model, P denotes the number of Gaussian components, g_p denotes the weight of the p-th Gaussian component, μ_p, Σ_p denote the mean and covariance of the p-th Gaussian component, λ_I, λ_β are set hyper-parameters, a hat (^) marks the prediction result of the corresponding variable, and the superscript t indicates the corresponding frame t;
(4) jointly optimizing the 3D coordinates of the hand joint points of all frames in the video together with the parameter β, such that the optimization objective function E(X, β) = E_T(X) + E_P(X, β) is minimized.
6. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when executed by a processor, the computer program implements the depth image-based Parkinson hand motion quantification analysis method according to any one of claims 1 to 4.
CN201911110171.9A 2019-11-13 2019-11-13 Depth image-based Parkinson hand motion quantization analysis method and system Active CN110991268B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911110171.9A CN110991268B (en) 2019-11-13 2019-11-13 Depth image-based Parkinson hand motion quantization analysis method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911110171.9A CN110991268B (en) 2019-11-13 2019-11-13 Depth image-based Parkinson hand motion quantization analysis method and system

Publications (2)

Publication Number Publication Date
CN110991268A CN110991268A (en) 2020-04-10
CN110991268B true CN110991268B (en) 2022-05-20

Family

ID=70084248

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911110171.9A Active CN110991268B (en) 2019-11-13 2019-11-13 Depth image-based Parkinson hand motion quantization analysis method and system

Country Status (1)

Country Link
CN (1) CN110991268B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112753210A (en) * 2020-04-26 2021-05-04 深圳市大疆创新科技有限公司 Movable platform, control method thereof and storage medium
CN111539941B (en) * 2020-04-27 2022-08-16 上海交通大学 Parkinson's disease leg flexibility task evaluation method and system, storage medium and terminal
CN114264628A (en) * 2021-12-16 2022-04-01 北京航空航天大学 Psoriasis arthritis imaging system based on terahertz spectroscopy imaging
CN115170616B (en) * 2022-09-08 2022-11-18 欣诚信息技术有限公司 Personnel trajectory analysis method, device, terminal and storage medium
CN117084835B (en) * 2023-10-20 2024-03-12 北京大学 Intelligent artificial limb system and control method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101601660B1 (en) * 2014-11-07 2016-03-10 재단법인대구경북과학기술원 Hand part classification method using depth images and apparatus thereof
CN105701806A (en) * 2016-01-11 2016-06-22 上海交通大学 Depth image-based parkinson's tremor motion characteristic detection method and system
CN108960178A (en) * 2018-07-13 2018-12-07 清华大学 A kind of manpower Attitude estimation method and system
CN110223317A (en) * 2019-04-26 2019-09-10 中国矿业大学 A kind of Moving target detection based on image procossing and trajectory predictions method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101601660B1 (en) * 2014-11-07 2016-03-10 재단법인대구경북과학기술원 Hand part classification method using depth images and apparatus thereof
CN105701806A (en) * 2016-01-11 2016-06-22 上海交通大学 Depth image-based parkinson's tremor motion characteristic detection method and system
CN108960178A (en) * 2018-07-13 2018-12-07 清华大学 A kind of manpower Attitude estimation method and system
CN110223317A (en) * 2019-04-26 2019-09-10 中国矿业大学 A kind of Moving target detection based on image procossing and trajectory predictions method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A2J: Anchor-to-Joint Regression Network for 3D Articulated Pose Estimation from a Single Depth Image; Fu Xiong et al.; Computer Vision and Pattern Recognition; 2019-08-27; 1-11 *
Kinect-based recognition of gait asymmetry in Parkinson's disease; Zhang Youan; Chinese Journal of Rehabilitation Theory and Practice; 2018-07-25; 795-801 *

Also Published As

Publication number Publication date
CN110991268A (en) 2020-04-10

Similar Documents

Publication Publication Date Title
CN110991268B (en) Depth image-based Parkinson hand motion quantization analysis method and system
CN111144217B (en) Motion evaluation method based on human body three-dimensional joint point detection
CN110711374B (en) Multi-modal dance action evaluation method
WO2018120964A1 (en) Posture correction method based on depth information and skeleton information
US8824802B2 (en) Method and system for gesture recognition
CN109598219B (en) Adaptive electrode registration method for robust electromyography control
CN106600626A (en) Three-dimensional human body movement capturing method and system
US11663845B2 (en) Method and apparatus for privacy protected assessment of movement disorder video recordings
CN111460976B (en) Data-driven real-time hand motion assessment method based on RGB video
CN116507276A (en) Method and apparatus for machine learning to analyze musculoskeletal rehabilitation from images
CN106846372B (en) Human motion quality visual analysis and evaluation system and method thereof
Loureiro et al. Using a skeleton gait energy image for pathological gait classification
JP5604249B2 (en) Human body posture estimation device, human body posture estimation method, and computer program
CN111883229A (en) Intelligent movement guidance method and system based on visual AI
CN114550299A (en) System and method for evaluating daily life activity ability of old people based on video
Kumar et al. Human Activity Recognition (HAR) Using Deep Learning: Review, Methodologies, Progress and Future Research Directions
CN113974612A (en) Automatic assessment method and system for upper limb movement function of stroke patient
CN112907635A (en) Method for extracting eye abnormal motion characteristics based on geometric analysis
CN110503056A (en) It is applied to the body action identification method of cognitive function assessment based on AR technology
Nguyen et al. Vision-Based Global Localization of Points of Gaze in Sport Climbing
Guerreiro et al. Detection of Osteoarthritis from Multimodal Hand Data
Leng et al. Fine-grained Human Activity Recognition Using Virtual On-body Acceleration Data
Skurowski et al. Functional body mesh representation, a simplified kinematic model, its inference and applications
CN112102358B (en) Non-invasive animal behavior characteristic observation method
Drory Computer Vision and Machine Learning for Biomechanics Applications: Human Detection, Pose and Shape Estimation and Tracking in Unconstrained Environment from Uncalibrated Images, Videos and Depth

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant