CN111814615A - Parkinson non-contact intelligent detection method based on instruction video - Google Patents
- Publication number
- CN111814615A (application number CN202010596575.XA)
- Authority
- CN
- China
- Prior art keywords
- key points
- eye
- parkinson
- mouth
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/40—Detecting, measuring or recording for evaluating the nervous system
- A61B5/4076—Diagnosing or monitoring particular conditions of the nervous system
- A61B5/4082—Diagnosing or monitoring movement diseases, e.g. Parkinson, Huntington or Tourette
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/254—Fusion techniques of classification results, e.g. of results related to same input data
- G06F18/256—Fusion techniques of classification results, e.g. of results related to same input data of results relating to different input data, e.g. multimodal recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
Abstract
The invention provides a Parkinson non-contact intelligent detection method and system based on instruction video. The method comprises the following steps: acquiring an instruction-type video data set of Parkinson patients and non-patients; constructing a face model and calibrating key points; determining eye feature vectors according to the eye key points of the face model; determining a mouth feature vector according to the mouth key points of the face model; constructing a fusion network model; training an optimal model according to the mouth feature vector, the eye feature vector and the fusion network model; and determining the Parkinson patient according to the optimal model. The method comprehensively analyzes the mouth and eye features, introduces the idea of differencing into dynamic feature extraction, designs frame segmentation according to the instructions for the statistical calculation of features, and finally trains a model with a support vector machine algorithm, thereby improving the accuracy of Parkinson detection.
Description
Technical Field
The invention relates to the field of Parkinson non-contact intelligent detection, in particular to a Parkinson non-contact intelligent detection method and system based on instruction video.
Background
Parkinson's Disease (PD) is a common degenerative disease of the nervous system. With the development of face recognition and natural language processing technology, video-based medical applications for disease diagnosis are emerging, and scenes such as online inquiry, intelligent diagnosis guidance, and patient communication increasingly require symptom detection to be "concise", "efficient", and "multidimensional".
The Parkinson "mask face" refers to reduced facial expression in Parkinson patients caused by dyskinesia; its clinical manifestations, from light to heavy, are: normal, dull face, poor facial expression, involuntary mouth opening, and complete loss of expression. As Parkinson's disease progresses, the stiffness becomes more apparent when the facial muscles move. The mask face is an important clinical index for judging whether a patient suffers from Parkinson's disease.
Based on the characteristics of the Parkinson patient's "mask face", an instruction-type Parkinson detection method can be designed with the following features: firstly, clear instruction tasks fully guide the patient to complete a simple expression task, which is more accurate and more vivid than the traditional complex expression-imitation task and suits a hospital's intelligent diagnosis-guide platform; secondly, because a single instruction corresponds to the movement of a single part, dynamic feature extraction according to the instruction is more targeted during feature analysis, and the effect of different feature sources on Parkinson detection is compared by training a Support Vector Machine (SVM), improving the detection accuracy.
Disclosure of Invention
The invention aims to provide a Parkinson non-contact intelligent detection method and system based on instruction video, so as to synthesize facial features and improve detection efficiency.
In order to achieve the purpose, the invention provides the following scheme:
a Parkinson non-contact intelligent detection method based on instruction video comprises the following steps:
acquiring an instruction type video data set of a Parkinson patient and a non-patient;
constructing a face model and calibrating key points;
determining eye feature vectors according to the eye key points of the face model;
determining a mouth feature vector according to the mouth key points of the face model;
constructing a fusion network model;
training an optimal model according to the mouth feature vector, the eye feature vector and the fusion network model;
and determining the Parkinson patient according to the optimal model.
Optionally, the constructing a face model and calibrating the key points specifically include:
Based on the multi-task interface for face recognition and key point calibration provided by the dlib library, 68 facial key points are extracted frame by frame from the subject's instruction video. Among the 68 key points, the 6 key points outlining the left eye, the 6 outlining the right eye, and the 20 outlining the mouth are extracted as the main target points for feature extraction.
Optionally, the determining an eye feature vector according to the eye key points of the face model specifically includes:
In order to describe the opening and closing condition of the eyelids at a given moment, an eyelid opening-and-closing rate eye_ratio is defined based on the distance between the upper and lower eyelids and the distance between the inner and outer canthus: the Euclidean distance between the upper- and lower-eyelid key points is divided by the Euclidean distance between the inner- and outer-canthus key points.
The eyelid opening rate eye_ratio of all frames in the instruction video and the frame-to-frame opening difference Δeye_ratio are extracted, and an eye feature vector of dimension 14 is calculated based on the 7 statistical interfaces provided by pandas.
Optionally, determining a mouth feature vector according to the mouth key point of the face model specifically includes:
The included angle between the line joining the left mouth corner key point p[0] and the right mouth corner key point p[4] and the horizontal axis is defined as α; the included angle between the line joining adjacent key points p[i], p[j] and the horizontal axis is defined as β(p[i], p[j]). The mouth compensation angle θ(p[i], p[j]) is calculated by summing α and β(p[i], p[j]), giving the 8 mouth feature components θ(p[0], p[1]), θ(p[1], p[2]), …, θ(p[6], p[7]), θ(p[7], p[0]).
The relative distance Eudis(p[2], p[6]) between the upper and lower key points is divided by the relative distance Eudis(p[0], p[4]) between the left and right key points to calculate the lip opening-and-closing rate mth_ratio.
Optionally, constructing a converged network model according to the method specifically includes:
the method comprises the steps of constructing a fusion network model consisting of a feature fusion stage and a full-connection stage, wherein the feature fusion stage comprises an input layer and an output layer, and the fusion full-connection stage comprises an input layer, a first hidden layer, a second hidden layer and an output layer.
Optionally, the training an optimal model according to the mouth feature vector, the eye feature vector, and the fusion network model specifically includes:
and setting a penalty factor C, wherein the parameter represents the fault tolerance of the classifier to a 'slack variable', namely the 'tolerance' to misclassification, and selecting the default 'rbf' of the kernel function kernel.
Different models are trained based on the extracted features and the SVC method provided by sklearn. By cross-validation, the data set D is divided into f mutually exclusive subsets of similar size; when each subset D_i is used for testing, the remaining data are used to train the model, yielding the test result r_i on D_i. Averaging all r_i gives the test result under f-fold cross-validation. The f_b-fold cross-validation results r_bk of the different parameters C(k) are compared, and the optimal model under the current data sample is selected.
A Parkinson non-contact intelligent detection system based on instruction video comprises:
the data set acquisition module is used for acquiring audio and video data sets of Parkinson patients and non-Parkinson patients;
the face model building module is used for building a face model and marking key points;
the eye feature vector determining module is used for determining eye feature vectors according to the eye key points of the face model;
the mouth feature vector determining module is used for determining mouth feature vectors according to the mouth key points of the face model;
the fusion network model building module is used for building a fusion network model;
the optimal model training module is used for training an optimal model by the mouth feature vector, the eye feature vector and the fusion network model;
and the Parkinson patient determination module is used for determining the Parkinson patient according to the optimal model.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the method comprehensively analyzes the mouth characteristics and the eye characteristics, introduces the difference idea into dynamic characteristic extraction, designs frame segmentation according to instructions to carry out statistical calculation of the characteristics, and finally trains a model by using a support vector machine algorithm, thereby improving the accuracy of the Parkinson detection and improving the detection accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise.
FIG. 1 is a flow chart of a Parkinson non-contact intelligent detection method based on instruction video according to the invention;
FIG. 2 is a structural diagram of a Parkinson non-contact intelligent detection system based on instruction video according to the invention;
FIG. 3 is a face keypoint calibration graph of the present invention;
FIG. 4 is a schematic diagram of the ocular key of the present invention;
FIG. 5 is a schematic diagram of the key points of the mouth of the present invention;
FIG. 6 is a diagram of a prediction confusion matrix of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide a Parkinson non-contact intelligent detection method and system based on instruction video, which can comprehensively analyze mouth features and eye features and improve interactivity and detection efficiency.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
FIG. 1 is a flow chart of a Parkinson non-contact intelligent detection method based on instruction video. As shown in fig. 1, a parkinson non-contact intelligent detection method based on instruction video includes:
step 101: instructional video data sets are acquired for both parkinson and non-parkinson patients.
The invention constructs a clinically validated data set consisting of 2N subjects, with a 1:1 ratio of Parkinson patients to non-patients. The subject is asked to act along with the step-by-step "eyes → mouth" instructions, instruction 1 "please relax and look straight ahead" and instruction 2 "please smile and expose your teeth"; the subject's motion is recorded, and the most effective 15 s of the recording is selected and spliced as the video-stream data set of the method.
Step 102: constructing a face model and marking key points, and specifically comprising the following steps:
Based on the multi-task interface for face recognition and key point calibration provided by the dlib library, 68 facial key points are extracted frame by frame from the subject's instruction video. Among the 68 key points, the 6 key points outlining the left eye, the 6 outlining the right eye, and the 20 outlining the mouth are extracted as the main target points for feature extraction.
Step 103: determining eye feature vectors according to the eye key points of the face model, specifically comprising:
In order to describe the opening and closing condition of the eyelids at a given moment, an eyelid opening-and-closing rate eye_ratio is defined based on the distance between the upper and lower eyelids and the distance between the inner and outer canthus: the Euclidean distance between the upper- and lower-eyelid key points is divided by the Euclidean distance between the inner- and outer-canthus key points.
The eyelid opening rate eye_ratio of all frames in the instruction video and the frame-to-frame opening difference Δeye_ratio are extracted, and an eye feature vector of dimension 14 is calculated based on the 7 statistical interfaces provided by pandas.
Step 104: determining a mouth feature vector according to the mouth key points of the face model, specifically comprising:
The included angle between the line joining the left mouth corner key point p[0] and the right mouth corner key point p[4] and the horizontal axis is defined as α; the included angle between the line joining adjacent key points p[i], p[j] and the horizontal axis is defined as β(p[i], p[j]). Summing α and β(p[i], p[j]) gives the mouth compensation angle θ(p[i], p[j]), yielding the 8 mouth feature components θ(p[0], p[1]), θ(p[1], p[2]), …, θ(p[6], p[7]), θ(p[7], p[0]).
The relative distance Eudis(p[2], p[6]) between the upper and lower key points is divided by the relative distance Eudis(p[0], p[4]) between the left and right key points to calculate the lip opening-and-closing rate mth_ratio.
Step 105: constructing a fusion network model, which specifically comprises the following steps:
the method comprises the steps of constructing a fusion network model consisting of a feature fusion stage and a full-connection stage, wherein the feature fusion stage comprises an input layer and an output layer, and the fusion full-connection stage comprises an input layer, a first hidden layer, a second hidden layer and an output layer.
Step 106: training an optimal model according to the mouth feature vector, the eye feature vector and the fusion network model, specifically comprising:
and setting a penalty factor C, wherein the parameter represents the fault tolerance of the classifier to a 'slack variable', namely the 'tolerance' to misclassification, and selecting the default 'rbf' of the kernel function kernel.
Different models are trained based on the extracted features and the SVC method provided by sklearn. By cross-validation, the data set D is divided into f mutually exclusive subsets of similar size; when each subset D_i is used for testing, the remaining data are used to train the model, yielding the test result r_i on D_i. Averaging all r_i gives the test result under f-fold cross-validation. The f_b-fold cross-validation results r_bk of the different parameters C(k) are compared, and the optimal model under the current data sample is selected.
Step 107: and determining the Parkinson patient according to the optimal model.
The method comprehensively analyzes the mouth and eye features, introduces the idea of differencing into dynamic feature extraction, designs frame segmentation according to the instructions for the statistical calculation of features, and finally trains a model with a support vector machine algorithm, thereby improving the accuracy of Parkinson detection.
FIG. 2 is a structural diagram of a Parkinson non-contact intelligent detection system based on instruction video. As shown in fig. 2, a parkinson non-contact intelligent detection system based on instruction video includes:
a data set acquisition module 201, configured to acquire audio and video data sets of parkinson patients and non-parkinson patients;
the face model construction module 202 is used for constructing a face model and marking key points;
the eye feature vector determination module 203 is configured to determine an eye feature vector according to the eye key points of the face model;
a mouth feature vector determining module 204, configured to determine a mouth feature vector according to the mouth key point of the face model;
a converged network model construction module 205, configured to construct a converged network model;
an optimal model training module 206, configured to train an optimal model based on the mouth feature vector, the eye feature vector, and the fusion network model;
a parkinson patient determination module 207, configured to determine a parkinson patient according to the optimal model.
Example 1:
for a more detailed discussion of the present invention, a specific example is provided below, comprising the following steps:
step one, acquiring instruction video data sets of Parkinson patients and non-Parkinson patients:
this example constructed a clinically validated data set consisting of 200 subjects with a parkinson to non-patient ratio of 1: 1. the subject is required to move along with the step-by-step command mode of the command 1 'please relax and look straight ahead' and the command 2 'please smile and expose teeth' and 'eyes-mouth', the process of the motion of the subject is recorded, and the most effective duration of the recorded video selection and splicing is 15s to be used as a video stream data set of the method.
Step two, constructing a face model, and calibrating key points:
based on the multitask interface provided by the dlib library for face recognition and keypoint targeting, 68 personal face keypoints are extracted from the video of the subject frame by frame. Due to the targeted instruction design, of the 68 key points of the whole face, only the key points of the eyes and the mouth really need to be concerned, specifically, the first 32 key points in the method, wherein No.37-No.42 links the left eye, No.43-No.48 links the right eye, and No. 49-No. 68 links the mouth. As shown in fig. 3, which is the extraction of the coordinates (x, y) of the key points in the frame sequence, where the blue circles represent the key point locations and the numbers represent their corresponding serial numbers.
Thirdly, determining eye feature vectors according to the eye key points of the face model:
Based on the 12 key-point coordinates of the left and right eyes obtained in step two, the static and dynamic information of the eyes in the key-frame sequence (the first 1/3 segment of the total video) is considered, as shown in fig. 4.
To describe the eyelid opening and closing condition at a given moment, we define the eyelid opening-and-closing rate based on the distance between the upper and lower eyelids and the distance between the inner and outer canthus:

eye_ratio = (Eudis(p[1], p[5]) + Eudis(p[2], p[4])) / (2 · Eudis(p[0], p[3]))

s.t. Eudis(p[0], p[3]) ≠ 0

where p[·] represents an eye key point with coordinates of the form (x, y), and Eudis denotes the Euclidean distance between two points.
Since eye motion runs through the entire video, eye_ratio is calculated for all video frames together with the frame-to-frame difference Δeye_ratio, which reflects the variation of the eyelid opening-and-closing rate over time:

Δeye_ratio(i) = eye_ratio(i+1) − eye_ratio(i), i = 1, 2, …, m−1

where m is the total number of frames and dropna(·) is the deletion function: if its argument is empty it is discarded, otherwise it is kept. The eye feature vector eyefeat of dimension 14 is then computed based on the 7 statistical feature interfaces provided by pandas,
eyefeat=(ef1;ef2;…;ef7;ef8;…;ef14)
where ef1–ef7 are derived from eye_ratio and ef8–ef14 from Δeye_ratio.
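The eye-feature computation above can be sketched as follows. This is a minimal sketch under stated assumptions: the pairing of the 6 eye points (corners p[0], p[3]; upper lid p[1], p[2]; lower lid p[4], p[5]) and the choice of the 7 pandas statistics (mean, var, skew, kurt, max, min, ptp, matching the Δmr feature set reported in the results) are not fixed by the text:

```python
import math
import pandas as pd

def eudis(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.dist(p, q)

def eye_ratio(p):
    """Eyelid opening-and-closing rate from the 6 eye key points p[0]..p[5]:
    upper/lower eyelid distances over the inner/outer canthus distance.
    The exact pairing is an assumption consistent with Eudis(p[0], p[3]) != 0."""
    return (eudis(p[1], p[5]) + eudis(p[2], p[4])) / (2 * eudis(p[0], p[3]))

def eye_features(ratios):
    """14-dim eye feature vector: 7 statistics of eye_ratio over all frames,
    plus the same 7 statistics of its frame-to-frame difference
    (dropna discards the empty first difference)."""
    s = pd.Series(ratios)
    d = s.diff().dropna()
    stats = lambda x: [x.mean(), x.var(), x.skew(), x.kurt(),
                       x.max(), x.min(), x.max() - x.min()]
    return stats(s) + stats(d)
```

Feeding the per-frame ratios of one instruction video into eye_features yields the 14-dim eyefeat vector.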
Step four, determining a mouth feature vector according to the mouth key points of the face model:
based on the 8 key point coordinates of the mouth obtained in step two, a "smile elevation angle" α is defined, as shown in fig. 5.
The truly valuable reference quantity when studying mouth movement is the angle between the line joining adjacent outer-mouth key points and the mouth-corner line, i.e. the angle θ shown in fig. 5; the angle β is the angle between the line joining adjacent points and the horizontal. Obviously α, β, θ satisfy the relation:

θ = α + β  (4-1)
We denote the θ corresponding to the two points p[0], p[1] as θ(p[0], p[1]); likewise β is written β(p[i], p[j]), and so on; the coordinates of a point p[·] are denoted p[·]_x and p[·]_y. Then

α = atan((p[0]_y − p[4]_y) / (p[0]_x − p[4]_x))  (4-2)

s.t. p[0]_x − p[4]_x ≠ 0

where atan(·) is the inverse of tan(·), mapping a tangent value to radians. In the same way,

β(p[0], p[1]) = atan((p[0]_y − p[1]_y) / (p[0]_x − p[1]_x))  (4-3)

s.t. p[0]_x − p[1]_x ≠ 0

Substituting formula (4-2) and formula (4-3) into formula (4-1) gives

θ(p[0], p[1]) = α + β(p[0], p[1])  (4-4)

s.t. p[0]_x − p[1]_x ≠ 0

p[0]_x − p[4]_x ≠ 0
From the practical situation we have reason to believe that the above constraint is always true.
The formula for calculating the lip opening-and-closing rate is written as

mth_ratio = Eudis(p[2], p[6]) / Eudis(p[0], p[4])

s.t. Eudis(p[0], p[4]) ≠ 0
Following the principles of formulas (4-1), (4-2), and (4-3), the 8 opening-and-closing angles θ(p[0], p[1]), θ(p[1], p[2]), θ(p[2], p[3]), θ(p[3], p[4]), θ(p[4], p[5]), θ(p[5], p[6]), θ(p[6], p[7]), θ(p[7], p[0]) together with mth_ratio serve as the 9-valued feature corresponding to the key frames (the middle 1/3 segment of the total video), giving the feature vector

mthfeat = (mt1; mt2; …; mt8; mt9)

where mt1–mt8 derive from θ and mt9 from mth_ratio.
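A pure-Python sketch of these mouth computations, assuming p[0]..p[7] run around the outer lip with mouth corners at p[0], p[4] and vertical midline points at p[2], p[6] (this layout is an assumption consistent with the ratio definition):

```python
import math

def eudis(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.dist(p, q)

def alpha(p):
    """Smile elevation angle: inclination of the mouth-corner line p[0]-p[4]."""
    return math.atan((p[0][1] - p[4][1]) / (p[0][0] - p[4][0]))

def beta(p, i, j):
    """Inclination of the line joining adjacent outer-mouth points p[i], p[j]."""
    return math.atan((p[i][1] - p[j][1]) / (p[i][0] - p[j][0]))

def mouth_features(p):
    """9-dim mouth feature: the 8 compensated angles theta = alpha + beta around
    the outer lip p[0]..p[7], plus the lip opening-and-closing rate mth_ratio."""
    a = alpha(p)
    thetas = [a + beta(p, i, (i + 1) % 8) for i in range(8)]
    mth_ratio = eudis(p[2], p[6]) / eudis(p[0], p[4])
    return thetas + [mth_ratio]
```

The constraints in formulas (4-2)–(4-4) correspond to the nonzero denominators assumed here.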
Step five, constructing a fusion network model:
the method comprises the steps of constructing a fusion network model consisting of a feature fusion stage and a full-connection stage, wherein the feature fusion stage comprises an input layer and an output layer, and the fusion full-connection stage comprises an input layer, a first hidden layer, a second hidden layer and an output layer.
Step six, training an optimal model according to the mouth feature vector, the eye feature vector and the fusion network model:
and training different models based on the extracted features and the SVC method provided by the skrlearn. The SVC method needs to set a penalty factor C, which represents the fault tolerance of the classifier to "slack variables", i.e. the "tolerance" to misclassification: when C tends to infinity, the fault tolerance is small, and all samples are required to meet the constraint
s.t.yi(wTxi+ b) is not less than 1, i is 1,2, …, m is (6-1)
Therefore, overfitting is easy to happen, and the trained model is weak in generalization ability; when C takes a finite value, some samples are allowed to fail the constraint. Based on past experience, let
In this section, the default 'rbf' of the kernel function kernel is chosen.
By means of cross-validation, the data set D is divided into f mutually exclusive subsets of similar size; when each subset D_i is used for testing, the remaining data are used to train the model, yielding the test result r_i on D_i, and averaging all r_i gives the result under f-fold cross-validation. In this section, the optimal fold number f_b is selected for each classifier M(C(k)), i.e. the f_b whose corresponding cross-validation mean result is best, with f_b ∈ {3, 4, 5, 6}. The f_b-fold cross-validation results r_bk of the different parameters C(k) are compared, the optimal model under the current data sample is selected, and r_k = max(r_bk) is recorded. The optimal models trained on different input features were compared; the specific results are shown in Table 5-1.
Table 5-1 Results corresponding to different input features
It can be seen that, whether the features come from the mouth or the eyes, computing the frame-to-frame difference generally performs better than not differencing; among the mouth feature items, the included angles θ and Δθ between the lines connecting adjacent key points and the mouth corner do not train as well as the difference of the mouth opening rate, Δmr; and for instruction-based intelligent Parkinson detection, the mouth is clearly of greater research value than the eyes, with a better training effect. After multiple comparisons, this work finally adopts the feature vector Δmr = (mean, var, skew, kurt, max, min, ptp), whose prediction confusion matrix is shown in Fig. 6.
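The seven statistics of the Δmr series can be computed as below. This is a simplified sketch using plain population moments; the pandas interfaces named in the text apply bias corrections (sample variance, adjusted skewness and kurtosis), so the exact values differ slightly. The mouth-opening-rate series `mr` is illustrative:

```python
import math

def seven_stats(series):
    """Seven summary statistics forming the feature vector
    (mean, var, skew, kurt, max, min, ptp). Plain population moments are
    used here; pandas' equivalents are bias-corrected."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    std = math.sqrt(var)
    skew = sum((x - mean) ** 3 for x in series) / (n * std ** 3) if std else 0.0
    kurt = sum((x - mean) ** 4 for x in series) / (n * std ** 4) - 3 if std else 0.0
    return (mean, var, skew, kurt,
            max(series), min(series), max(series) - min(series))

# Frame-to-frame difference of an illustrative mouth opening rate series.
mr = [0.30, 0.35, 0.50, 0.45, 0.30]
delta_mr = [b - a for a, b in zip(mr, mr[1:])]
features = seven_stats(delta_mr)
print(len(features))  # 7
```

With pandas this is simply `s.mean(), s.var(), s.skew(), s.kurt(), s.max(), s.min(), s.max() - s.min()` on a `Series` of Δmr values.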
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the others, and the parts that are the same or similar across embodiments may be cross-referenced. Since the system disclosed in the embodiments corresponds to the method disclosed therein, its description is relatively brief; for relevant details, refer to the description of the method.
The principles and embodiments of the present invention have been described herein using specific examples, which serve only to help understand the method and core concept of the invention. Meanwhile, a person skilled in the art may, following the idea of the present invention, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.
Claims (7)
1. A Parkinson non-contact intelligent detection method based on instruction video is characterized by comprising the following steps:
acquiring an instruction type video data set of a Parkinson patient and a non-patient;
constructing a face model and calibrating key points;
determining eye feature vectors according to the eye key points of the face model;
determining a mouth feature vector according to the mouth key points of the face model;
constructing a fusion network model;
training an optimal model according to the mouth feature vector, the eye feature vector and the fusion network model;
and determining the Parkinson patient according to the optimal model.
2. The parkinson non-contact intelligent detection method based on instruction video according to claim 1, wherein the constructing a face model and calibrating key points specifically comprises:
Based on the multi-task interface for face recognition and key point calibration provided by the dlib library, 68 facial key points are extracted frame by frame from the subject's instruction video, and the 6 left-eye key points, 6 right-eye key points and 21 mouth key points among the 68 are taken as the main target points for feature extraction.
3. The Parkinson non-contact intelligent detection method based on instruction video according to claim 1, wherein the determining eye feature vectors according to the eye key points of the face model specifically comprises:
In order to describe the opening and closing state of the eyelids at a given moment, an eyelid opening rate eye_ratio is defined based on the distance between the upper and lower eyelids and the distance between the inner and outer canthi: it is calculated by comparing the Euclidean distance between the 2 upper- and lower-eyelid key points with the Euclidean distance between the 2 inner- and outer-canthus key points.
The eyelid opening rate eye_ratio of every frame in the instruction video and its frame-to-frame difference Δeye_ratio are extracted, and a 14-dimensional eye feature vector is calculated based on the 7 statistical interfaces provided by pandas.
4. The Parkinson non-contact intelligent detection method based on instruction video according to claim 1, wherein the determining mouth feature vectors according to the mouth key points of the face model specifically comprises:
The included angle between the left mouth-corner key point p[0] and the horizontal axis is defined as α, the included angle between key point p[i] and another key point p[j] is defined as β(p[i], p[j]), and the mouth compensation angle θ(p[i], p[j]) is calculated by summing α and β(p[i], p[j]), giving 8 mouth feature components θ(p[0], p[1]), θ(p[1], p[2]), …, θ(p[6], p[7]), θ(p[7], p[0]).
The relative distance Eudis(p[2], p[6]) between the upper and lower key points is compared with the relative distance Eudis(p[0], p[4]) between the left and right key points to calculate the lip opening rate mth_ratio.
5. The Parkinson non-contact intelligent detection method based on instruction video according to claim 1, wherein the constructing a fusion network model specifically comprises:
the method comprises the steps of constructing a fusion network model consisting of a feature fusion stage and a full-connection stage, wherein the feature fusion stage comprises an input layer and an output layer, and the fusion full-connection stage comprises an input layer, a first hidden layer, a second hidden layer and an output layer.
6. The Parkinson non-contact intelligent detection method based on instruction video according to claim 1, wherein the training of the optimal model according to the mouth feature vector, the eye feature vector and the fusion network model specifically comprises:
A penalty factor C is set; this parameter represents the classifier's tolerance of the "slack variables", i.e. its "tolerance" of misclassification. The kernel function kernel is kept at its default, 'rbf'.
Different models are trained based on the extracted features and the SVC method provided by sklearn. By means of cross-validation, the data set D is divided into f mutually exclusive subsets of similar size; when each subset D_i is used for testing, the remaining (f-1)/f of the data is used to train the model, yielding the test result r_i on D_i, and all r_i are averaged to obtain the result under f-fold cross-validation. The f_b-fold cross-validation results r_bk corresponding to different parameters C(k) are compared, and the optimal model under the current data sample is selected.
7. A Parkinson non-contact intelligent detection system based on instruction video, characterized by comprising:
the data set acquisition module is used for acquiring instruction-type video data sets of Parkinson patients and non-patients;
the face model building module is used for building a face model and marking key points;
the eye feature vector determining module is used for determining eye feature vectors according to the eye key points of the face model;
the mouth feature vector determining module is used for determining mouth feature vectors according to the mouth key points of the face model;
the fusion network model building module is used for building a fusion network model;
the optimal model training module is used for training an optimal model by the mouth feature vector, the eye feature vector and the fusion network model;
and the Parkinson patient determination module is used for determining the Parkinson patient according to the optimal model.
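The eyelid opening rate of claim 3 and the lip opening rate of claim 4 can be sketched as follows. The key-point indexing follows dlib's 6-point eye convention (indices 0..5 per eye), and averaging the two vertical eyelid distances is an assumption of this sketch rather than something fixed by the claims; the coordinates are toy values, not real landmarks:

```python
import math

def eudis(p, q):
    """Euclidean distance between two (x, y) key points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def eye_ratio(eye):
    """Eyelid opening rate: upper/lower-eyelid distance over the
    inner/outer-canthus distance. Averaging the two vertical pairs of
    dlib's 6 eye points is an assumption of this sketch."""
    vertical = (eudis(eye[1], eye[5]) + eudis(eye[2], eye[4])) / 2.0
    return vertical / eudis(eye[0], eye[3])

def mth_ratio(mouth):
    """Lip opening rate per claim 4: distance between the upper and lower
    key points p[2], p[6] over the corner distance p[0], p[4]."""
    return eudis(mouth[2], mouth[6]) / eudis(mouth[0], mouth[4])

# Toy coordinates (illustrative only).
eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
mouth = [(0, 0), (1, 1), (2, 2), (3, 1), (4, 0), (3, -1), (2, -2), (1, -1)]
print(round(eye_ratio(eye), 3))   # 0.667
print(round(mth_ratio(mouth), 3)) # 1.0
```

Applying these per frame and differencing across frames yields the Δeye_ratio and Δmr series whose statistics form the feature vectors of claims 3 and 4.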
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010596575.XA CN111814615B (en) | 2020-06-28 | 2020-06-28 | Parkinson non-contact intelligent detection method based on instruction video |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111814615A true CN111814615A (en) | 2020-10-23 |
CN111814615B CN111814615B (en) | 2024-04-12 |
Family
ID=72856436
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010596575.XA Active CN111814615B (en) | 2020-06-28 | 2020-06-28 | Parkinson non-contact intelligent detection method based on instruction video |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111814615B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113076813A (en) * | 2021-03-12 | 2021-07-06 | 首都医科大学宣武医院 | Mask face feature recognition model training method and device |
CN116392086A (en) * | 2023-06-06 | 2023-07-07 | 浙江多模医疗科技有限公司 | Method, system, terminal and storage medium for detecting stimulus |
CN117137442A (en) * | 2023-09-04 | 2023-12-01 | 佳木斯大学 | Parkinsonism auxiliary detection system based on biological characteristics and machine-readable medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105096528A (en) * | 2015-08-05 | 2015-11-25 | 广州云从信息科技有限公司 | Fatigue driving detection method and system |
CN106781282A (en) * | 2016-12-29 | 2017-05-31 | 天津中科智能识别产业技术研究院有限公司 | A kind of intelligent travelling crane driver fatigue early warning system |
CN106997451A (en) * | 2016-01-26 | 2017-08-01 | 北方工业大学 | Lip contour positioning method |
CN109919049A (en) * | 2019-02-21 | 2019-06-21 | 北京以萨技术股份有限公司 | Fatigue detection method based on deep learning human face modeling |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113076813A (en) * | 2021-03-12 | 2021-07-06 | 首都医科大学宣武医院 | Mask face feature recognition model training method and device |
CN113076813B (en) * | 2021-03-12 | 2024-04-12 | 首都医科大学宣武医院 | Training method and device for mask face feature recognition model |
CN116392086A (en) * | 2023-06-06 | 2023-07-07 | 浙江多模医疗科技有限公司 | Method, system, terminal and storage medium for detecting stimulus |
CN116392086B (en) * | 2023-06-06 | 2023-08-25 | 浙江多模医疗科技有限公司 | Method, terminal and storage medium for detecting stimulation |
CN117137442A (en) * | 2023-09-04 | 2023-12-01 | 佳木斯大学 | Parkinsonism auxiliary detection system based on biological characteristics and machine-readable medium |
CN117137442B (en) * | 2023-09-04 | 2024-03-29 | 佳木斯大学 | Parkinsonism auxiliary detection system based on biological characteristics and machine-readable medium |
Also Published As
Publication number | Publication date |
---|---|
CN111814615B (en) | 2024-04-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zhou et al. | Visually interpretable representation learning for depression recognition from facial images | |
Wu et al. | Transfer learning for EEG-based brain–computer interfaces: A review of progress made since 2016 | |
CN111814615A (en) | Parkinson non-contact intelligent detection method based on instruction video | |
Gunes et al. | Categorical and dimensional affect analysis in continuous input: Current trends and future directions | |
Bilakhia et al. | The MAHNOB Mimicry Database: A database of naturalistic human interactions | |
Al Osman et al. | Multimodal affect recognition: Current approaches and challenges | |
CN110491502A (en) | Microscope video stream processing method, system, computer equipment and storage medium | |
Liu et al. | Adaptive multilayer perceptual attention network for facial expression recognition | |
Tolosana et al. | DeepFakes detection across generations: Analysis of facial regions, fusion, and performance evaluation | |
Liu et al. | Dual-stream generative adversarial networks for distributionally robust zero-shot learning | |
Li et al. | Learning representations for facial actions from unlabeled videos | |
Chetty et al. | A multilevel fusion approach for audiovisual emotion recognition | |
Zhu et al. | Hybrid feature-based analysis of video’s affective content using protagonist detection | |
Liu et al. | PRA-Net: Part-and-Relation Attention Network for depression recognition from facial expression | |
Jingchao et al. | Recognition of classroom student state features based on deep learning algorithms and machine learning | |
Shen et al. | Multi-modal feature fusion for better understanding of human personality traits in social human–robot interaction | |
Li et al. | Depression severity prediction from facial expression based on the DRR_DepressionNet network | |
Wang et al. | Semi-supervised classification-aware cross-modal deep adversarial data augmentation | |
Li et al. | Emotion recognition of Chinese paintings at the thirteenth national exhibition of fines arts in China based on advanced affective computing | |
Zheng et al. | Bridging clip and stylegan through latent alignment for image editing | |
Aslam et al. | Privileged knowledge distillation for dimensional emotion recognition in the wild | |
Cao et al. | Concept-centric Personalization with Large-scale Diffusion Priors | |
S'adan et al. | Deep learning techniques for depression assessment | |
Singh et al. | Multi-modal Expression Detection (MED): A cutting-edge review of current trends, challenges and solutions | |
Ning et al. | ICGNet: An intensity-controllable generation network based on covering learning for face attribute synthesis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
TA01 | Transfer of patent application right |
Effective date of registration: 20210609 Address after: 100000 No. 6 South Road, Zhongguancun Academy of Sciences, Beijing, Haidian District Applicant after: Institute of Computing Technology, Chinese Academy of Sciences Applicant after: XIANGTAN University Address before: Xiangtan University, yanggutang street, Yuhu District, Xiangtan City, Hunan Province Applicant before: XIANGTAN University |
|
GR01 | Patent grant ||