CN111814615A - Parkinson non-contact intelligent detection method based on instruction video - Google Patents
- Publication number
- CN111814615A (application number CN202010596575.XA)
- Authority
- CN
- China
- Prior art keywords
- key points
- eye
- parkinson
- mouth
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/40—Detecting, measuring or recording for evaluating the nervous system
- A61B5/4076—Diagnosing or monitoring particular conditions of the nervous system
- A61B5/4082—Diagnosing or monitoring movement diseases, e.g. Parkinson, Huntington or Tourette
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/254—Fusion techniques of classification results, e.g. of results related to same input data
- G06F18/256—Fusion techniques of classification results, e.g. of results related to same input data of results relating to different input data, e.g. multimodal recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
Abstract
The invention provides a Parkinson non-contact intelligent detection method and system based on instruction video. The method comprises the following steps: acquiring an instruction-type video data set of Parkinson patients and non-patients; constructing a face model and calibrating key points; determining eye feature vectors according to the eye key points of the face model; determining a mouth feature vector according to the mouth key points of the face model; constructing a fusion network model; training an optimal model according to the mouth feature vector, the eye feature vector and the fusion network model; and determining the Parkinson patient according to the optimal model. The method comprehensively analyzes the mouth and eye features, introduces the idea of differencing into dynamic feature extraction, designs frame segmentation according to the instructions for the statistical calculation of features, and finally trains a model with a support vector machine algorithm, thereby improving the accuracy of Parkinson detection.
Description
Technical Field
The invention relates to the field of Parkinson non-contact intelligent detection, in particular to a Parkinson non-contact intelligent detection method and system based on instruction video.
Background
Parkinson's Disease (PD) is a common degenerative disease of the nervous system. With the development of face recognition and natural language processing technology, video-based medical applications for disease diagnosis are emerging, and scenes such as online inquiry, intelligent diagnosis guidance, and patient communication increasingly require symptom detection to be "concise", "efficient", and "multidimensional".
The Parkinson "mask face" refers to reduced facial expression in Parkinson patients caused by dyskinesia; its clinical manifestations, from light to heavy, are: normal, dull face, poor facial expression, involuntary mouth opening, and complete loss of expression. As Parkinson's disease progresses, the stiffness becomes more apparent when the facial muscles move. The mask face is an important clinical index for judging whether a patient suffers from Parkinson's disease.
Based on the characteristics of the Parkinson patient's "mask face", an instruction-type Parkinson detection method can be designed with the following features: firstly, clear instruction tasks fully guide the patient to complete a simple expression task, which is more accurate and more vivid than the traditional complex expression-imitation task and suits a hospital's intelligent diagnosis-guide platform; secondly, because a single instruction corresponds to the movement of a single part, dynamic feature extraction according to the instruction is more targeted during feature analysis, and the effect of different feature sources on Parkinson detection is compared by training a Support Vector Machine (SVM), improving the detection accuracy.
Disclosure of Invention
The invention aims to provide a Parkinson non-contact intelligent detection method and system based on instruction video, so as to synthesize facial features and improve detection efficiency.
In order to achieve the purpose, the invention provides the following scheme:
a Parkinson non-contact intelligent detection method based on instruction video comprises the following steps:
acquiring an instruction type video data set of a Parkinson patient and a non-patient;
constructing a face model and calibrating key points;
determining eye feature vectors according to the eye key points of the face model;
determining a mouth feature vector according to the mouth key points of the face model;
constructing a fusion network model;
training an optimal model according to the mouth feature vector, the eye feature vector and the fusion network model;
and determining the Parkinson patient according to the optimal model.
Optionally, the constructing a face model and calibrating the key points specifically include:
Based on the multi-task interface for face recognition and key point calibration provided by the dlib library, 68 facial key points are extracted frame by frame from the subject's instruction video. Among the 68 key points, the 6 key points outlining the left eye, the 6 outlining the right eye, and the 20 outlining the mouth are extracted as the main target points for feature extraction.
Optionally, the determining an eye feature vector according to the eye key points of the face model specifically includes:
In order to describe the opening and closing condition of the eyelids at a given moment, an eyelid opening-and-closing rate eye_ratio is defined based on the distance between the upper and lower eyelids and the distance between the inner and outer canthus: the Euclidean distance between the upper- and lower-eyelid key points is divided by the Euclidean distance between the inner- and outer-canthus key points.
The eyelid opening rate eye_ratio of all frames in the instruction video and the frame-to-frame opening difference Δeye_ratio are extracted, and an eye feature vector of dimension 14 is calculated based on the 7 statistical interfaces provided by pandas.
Optionally, determining a mouth feature vector according to the mouth key point of the face model specifically includes:
The included angle between the line joining the left mouth corner key point p[0] and the right mouth corner key point p[4] and the horizontal axis is defined as α; the included angle between the line joining adjacent key points p[i], p[j] and the horizontal axis is defined as β(p[i], p[j]). The mouth compensation angle θ(p[i], p[j]) is calculated by summing α and β(p[i], p[j]), giving the 8 mouth feature components θ(p[0], p[1]), θ(p[1], p[2]), …, θ(p[6], p[7]), θ(p[7], p[0]).
The relative distance Eudis(p[2], p[6]) between the upper and lower key points is divided by the relative distance Eudis(p[0], p[4]) between the left and right key points to calculate the lip opening-and-closing rate mth_ratio.
Optionally, constructing a converged network model according to the method specifically includes:
the method comprises the steps of constructing a fusion network model consisting of a feature fusion stage and a full-connection stage, wherein the feature fusion stage comprises an input layer and an output layer, and the fusion full-connection stage comprises an input layer, a first hidden layer, a second hidden layer and an output layer.
Optionally, the training an optimal model according to the mouth feature vector, the eye feature vector, and the fusion network model specifically includes:
and setting a penalty factor C, wherein the parameter represents the fault tolerance of the classifier to a 'slack variable', namely the 'tolerance' to misclassification, and selecting the default 'rbf' of the kernel function kernel.
Different models are trained based on the extracted features and the SVC method provided by sklearn. By cross-validation, the data set D is divided into f mutually exclusive subsets of similar size; when each subset D_i is used for testing, the remaining data are used to train the model, yielding the test result r_i on D_i. Averaging all r_i gives the test result under f-fold cross-validation. The f_b-fold cross-validation results r_bk of the different parameters C(k) are compared, and the optimal model under the current data sample is selected.
A Parkinson non-contact intelligent detection system based on instruction video comprises:
the data set acquisition module is used for acquiring audio and video data sets of Parkinson patients and non-Parkinson patients;
the face model building module is used for building a face model and marking key points;
the eye feature vector determining module is used for determining eye feature vectors according to the eye key points of the face model;
the mouth feature vector determining module is used for determining mouth feature vectors according to the mouth key points of the face model;
the fusion network model building module is used for building a fusion network model;
the optimal model training module is used for training an optimal model by the mouth feature vector, the eye feature vector and the fusion network model;
and the Parkinson patient determination module is used for determining the Parkinson patient according to the optimal model.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the method comprehensively analyzes the mouth characteristics and the eye characteristics, introduces the difference idea into dynamic characteristic extraction, designs frame segmentation according to instructions to carry out statistical calculation of the characteristics, and finally trains a model by using a support vector machine algorithm, thereby improving the accuracy of the Parkinson detection and improving the detection accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise.
FIG. 1 is a flow chart of a Parkinson non-contact intelligent detection method based on instruction video according to the invention;
FIG. 2 is a structural diagram of a Parkinson non-contact intelligent detection system based on instruction video according to the invention;
FIG. 3 is a face keypoint calibration graph of the present invention;
FIG. 4 is a schematic diagram of the ocular key of the present invention;
FIG. 5 is a schematic diagram of the key points of the mouth of the present invention;
FIG. 6 is a diagram of a prediction confusion matrix of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide a Parkinson non-contact intelligent detection method and system based on instruction video, which can comprehensively analyze mouth features and eye features and improve interactivity and detection efficiency.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
FIG. 1 is a flow chart of a Parkinson non-contact intelligent detection method based on instruction video. As shown in fig. 1, a parkinson non-contact intelligent detection method based on instruction video includes:
step 101: instructional video data sets are acquired for both parkinson and non-parkinson patients.
The invention constructs a clinically validated data set consisting of 2N subjects, with a 1:1 ratio of Parkinson patients to non-patients. The subject is asked to act along with the step-by-step "eyes → mouth" instructions, instruction 1 "please relax and look straight ahead" and instruction 2 "please smile and expose your teeth"; the subject's motion is recorded, and the most effective 15 s of the recording is selected and spliced as the video-stream data set of the method.
Step 102: constructing a face model and marking key points, and specifically comprising the following steps:
Based on the multi-task interface for face recognition and key point calibration provided by the dlib library, 68 facial key points are extracted frame by frame from the subject's instruction video. Among the 68 key points, the 6 key points outlining the left eye, the 6 outlining the right eye, and the 20 outlining the mouth are extracted as the main target points for feature extraction.
Step 103: determining eye feature vectors according to the eye key points of the face model, specifically comprising:
In order to describe the opening and closing condition of the eyelids at a given moment, an eyelid opening-and-closing rate eye_ratio is defined based on the distance between the upper and lower eyelids and the distance between the inner and outer canthus: the Euclidean distance between the upper- and lower-eyelid key points is divided by the Euclidean distance between the inner- and outer-canthus key points.
The eyelid opening rate eye_ratio of all frames in the instruction video and the frame-to-frame opening difference Δeye_ratio are extracted, and an eye feature vector of dimension 14 is calculated based on the 7 statistical interfaces provided by pandas.
Step 104: determining a mouth feature vector according to the mouth key points of the face model, specifically comprising:
The included angle between the line joining the left mouth corner key point p[0] and the right mouth corner key point p[4] and the horizontal axis is defined as α; the included angle between the line joining adjacent key points p[i], p[j] and the horizontal axis is defined as β(p[i], p[j]). Summing α and β(p[i], p[j]) gives the mouth compensation angle θ(p[i], p[j]), yielding the 8 mouth feature components θ(p[0], p[1]), θ(p[1], p[2]), …, θ(p[6], p[7]), θ(p[7], p[0]).
The relative distance Eudis(p[2], p[6]) between the upper and lower key points is divided by the relative distance Eudis(p[0], p[4]) between the left and right key points to calculate the lip opening-and-closing rate mth_ratio.
Step 105: constructing a fusion network model, which specifically comprises the following steps:
the method comprises the steps of constructing a fusion network model consisting of a feature fusion stage and a full-connection stage, wherein the feature fusion stage comprises an input layer and an output layer, and the fusion full-connection stage comprises an input layer, a first hidden layer, a second hidden layer and an output layer.
Step 106: training an optimal model according to the mouth feature vector, the eye feature vector and the fusion network model, specifically comprising:
and setting a penalty factor C, wherein the parameter represents the fault tolerance of the classifier to a 'slack variable', namely the 'tolerance' to misclassification, and selecting the default 'rbf' of the kernel function kernel.
Different models are trained based on the extracted features and the SVC method provided by sklearn. By cross-validation, the data set D is divided into f mutually exclusive subsets of similar size; when each subset D_i is used for testing, the remaining data are used to train the model, yielding the test result r_i on D_i. Averaging all r_i gives the test result under f-fold cross-validation. The f_b-fold cross-validation results r_bk of the different parameters C(k) are compared, and the optimal model under the current data sample is selected.
Step 107: and determining the Parkinson patient according to the optimal model.
The method comprehensively analyzes the mouth and eye features, introduces the idea of differencing into dynamic feature extraction, designs frame segmentation according to the instructions for the statistical calculation of features, and finally trains a model with a support vector machine algorithm, thereby improving the accuracy of Parkinson detection.
FIG. 2 is a structural diagram of a Parkinson non-contact intelligent detection system based on instruction video. As shown in fig. 2, a parkinson non-contact intelligent detection system based on instruction video includes:
a data set acquisition module 201, configured to acquire audio and video data sets of parkinson patients and non-parkinson patients;
the face model construction module 202 is used for constructing a face model and marking key points;
the eye feature vector determination module 203 is configured to determine an eye feature vector according to the eye key points of the face model;
a mouth feature vector determining module 204, configured to determine a mouth feature vector according to the mouth key point of the face model;
a converged network model construction module 205, configured to construct a converged network model;
an optimal model training module 206, configured to train an optimal model based on the mouth feature vector, the eye feature vector, and the fusion network model;
a parkinson patient determination module 207, configured to determine a parkinson patient according to the optimal model.
Example 1:
for a more detailed discussion of the present invention, a specific example is provided below, comprising the following steps:
step one, acquiring instruction video data sets of Parkinson patients and non-Parkinson patients:
this example constructed a clinically validated data set consisting of 200 subjects with a parkinson to non-patient ratio of 1: 1. the subject is required to move along with the step-by-step command mode of the command 1 'please relax and look straight ahead' and the command 2 'please smile and expose teeth' and 'eyes-mouth', the process of the motion of the subject is recorded, and the most effective duration of the recorded video selection and splicing is 15s to be used as a video stream data set of the method.
Step two, constructing a face model, and calibrating key points:
based on the multitask interface provided by the dlib library for face recognition and keypoint targeting, 68 personal face keypoints are extracted from the video of the subject frame by frame. Due to the targeted instruction design, of the 68 key points of the whole face, only the key points of the eyes and the mouth really need to be concerned, specifically, the first 32 key points in the method, wherein No.37-No.42 links the left eye, No.43-No.48 links the right eye, and No. 49-No. 68 links the mouth. As shown in fig. 3, which is the extraction of the coordinates (x, y) of the key points in the frame sequence, where the blue circles represent the key point locations and the numbers represent their corresponding serial numbers.
Thirdly, determining eye feature vectors according to the eye key points of the face model:
Based on the 12 key-point coordinates of the left and right eyes obtained in step two, the static and dynamic information of the eyes in the key-frame sequence (the first 1/3 segment of the total video) is considered, as shown in fig. 4.
To describe the eyelid opening and closing condition at a given moment, we define the eyelid opening-and-closing rate based on the distance between the upper and lower eyelids and the distance between the inner and outer canthus:

eye_ratio = (Eudis(p[1], p[5]) + Eudis(p[2], p[4])) / (2 · Eudis(p[0], p[3]))

s.t. Eudis(p[0], p[3]) ≠ 0

where p[·] represents an eye key point with coordinates of the form (x, y), and Eudis denotes the Euclidean distance between two points.
Since eye motion runs through the entire video, eye_ratio is calculated for all video frames together with the frame-to-frame difference Δeye_ratio, which reflects the variation of the eyelid opening-and-closing rate over time:

Δeye_ratio(i) = eye_ratio(i+1) − eye_ratio(i), i = 1, 2, …, m−1

where m is the total number of frames and dropna(·) is the deletion function: if its argument is empty it is discarded, otherwise it is kept. The eye feature vector eyefeat of dimension 14 is then computed based on the 7 statistical feature interfaces provided by pandas,
eyefeat=(ef1;ef2;…;ef7;ef8;…;ef14)
where ef1–ef7 are derived from eye_ratio and ef8–ef14 from Δeye_ratio.
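The eye-feature computation above can be sketched as follows. This is a minimal sketch under stated assumptions: the pairing of the 6 eye points (corners p[0], p[3]; upper lid p[1], p[2]; lower lid p[4], p[5]) and the choice of the 7 pandas statistics (mean, var, skew, kurt, max, min, ptp, matching the Δmr feature set reported in the results) are not fixed by the text:

```python
import math
import pandas as pd

def eudis(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.dist(p, q)

def eye_ratio(p):
    """Eyelid opening-and-closing rate from the 6 eye key points p[0]..p[5]:
    upper/lower eyelid distances over the inner/outer canthus distance.
    The exact pairing is an assumption consistent with Eudis(p[0], p[3]) != 0."""
    return (eudis(p[1], p[5]) + eudis(p[2], p[4])) / (2 * eudis(p[0], p[3]))

def eye_features(ratios):
    """14-dim eye feature vector: 7 statistics of eye_ratio over all frames,
    plus the same 7 statistics of its frame-to-frame difference
    (dropna discards the empty first difference)."""
    s = pd.Series(ratios)
    d = s.diff().dropna()
    stats = lambda x: [x.mean(), x.var(), x.skew(), x.kurt(),
                       x.max(), x.min(), x.max() - x.min()]
    return stats(s) + stats(d)
```

Feeding the per-frame ratios of one instruction video into eye_features yields the 14-dim eyefeat vector.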
Step four, determining a mouth feature vector according to the mouth key points of the face model:
based on the 8 key point coordinates of the mouth obtained in step two, a "smile elevation angle" α is defined, as shown in fig. 5.
The truly valuable reference quantity when studying mouth movement is the angle between the line joining adjacent outer-mouth key points and the mouth-corner line, i.e. the angle θ shown in fig. 5; the angle β is the angle between the line joining adjacent points and the horizontal. Obviously α, β, θ satisfy the relation:

θ = α + β  (4-1)
We denote the θ corresponding to the two points p[0], p[1] as θ(p[0], p[1]); likewise β is written β(p[i], p[j]), and so on; the coordinates of a point p[·] are denoted p[·]_x and p[·]_y. Then

α = atan((p[0]_y − p[4]_y) / (p[0]_x − p[4]_x))  (4-2)

s.t. p[0]_x − p[4]_x ≠ 0

where atan(·) is the inverse of tan(·), mapping a tangent value to radians. In the same way,

β(p[0], p[1]) = atan((p[0]_y − p[1]_y) / (p[0]_x − p[1]_x))  (4-3)

s.t. p[0]_x − p[1]_x ≠ 0

Substituting formula (4-2) and formula (4-3) into formula (4-1) gives

θ(p[0], p[1]) = α + β(p[0], p[1])  (4-4)

s.t. p[0]_x − p[1]_x ≠ 0

p[0]_x − p[4]_x ≠ 0
From the practical situation we have reason to believe that the above constraint is always true.
The formula for calculating the lip opening-and-closing rate is written as

mth_ratio = Eudis(p[2], p[6]) / Eudis(p[0], p[4])

s.t. Eudis(p[0], p[4]) ≠ 0
Following the principles of formulas (4-1), (4-2), and (4-3), the 8 opening-and-closing angles θ(p[0], p[1]), θ(p[1], p[2]), θ(p[2], p[3]), θ(p[3], p[4]), θ(p[4], p[5]), θ(p[5], p[6]), θ(p[6], p[7]), θ(p[7], p[0]) together with mth_ratio serve as the 9-valued feature corresponding to the key frames (the middle 1/3 segment of the total video), giving the feature vector

mthfeat = (mt1; mt2; …; mt8; mt9)

where mt1–mt8 derive from θ and mt9 from mth_ratio.
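A pure-Python sketch of these mouth computations, assuming p[0]..p[7] run around the outer lip with mouth corners at p[0], p[4] and vertical midline points at p[2], p[6] (this layout is an assumption consistent with the ratio definition):

```python
import math

def eudis(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.dist(p, q)

def alpha(p):
    """Smile elevation angle: inclination of the mouth-corner line p[0]-p[4]."""
    return math.atan((p[0][1] - p[4][1]) / (p[0][0] - p[4][0]))

def beta(p, i, j):
    """Inclination of the line joining adjacent outer-mouth points p[i], p[j]."""
    return math.atan((p[i][1] - p[j][1]) / (p[i][0] - p[j][0]))

def mouth_features(p):
    """9-dim mouth feature: the 8 compensated angles theta = alpha + beta around
    the outer lip p[0]..p[7], plus the lip opening-and-closing rate mth_ratio."""
    a = alpha(p)
    thetas = [a + beta(p, i, (i + 1) % 8) for i in range(8)]
    mth_ratio = eudis(p[2], p[6]) / eudis(p[0], p[4])
    return thetas + [mth_ratio]
```

The constraints in formulas (4-2)–(4-4) correspond to the nonzero denominators assumed here.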
Step five, constructing a fusion network model:
the method comprises the steps of constructing a fusion network model consisting of a feature fusion stage and a full-connection stage, wherein the feature fusion stage comprises an input layer and an output layer, and the fusion full-connection stage comprises an input layer, a first hidden layer, a second hidden layer and an output layer.
Step six, training an optimal model according to the mouth feature vector, the eye feature vector and the fusion network model:
and training different models based on the extracted features and the SVC method provided by the skrlearn. The SVC method needs to set a penalty factor C, which represents the fault tolerance of the classifier to "slack variables", i.e. the "tolerance" to misclassification: when C tends to infinity, the fault tolerance is small, and all samples are required to meet the constraint
s.t.yi(wTxi+ b) is not less than 1, i is 1,2, …, m is (6-1)
Therefore, overfitting is easy to happen, and the trained model is weak in generalization ability; when C takes a finite value, some samples are allowed to fail the constraint. Based on past experience, let
In this section, the default 'rbf' of the kernel function kernel is chosen.
By means of cross-validation, the data set D is divided into f mutually exclusive subsets of similar size; when each subset D_i is used for testing, the remaining data are used to train the model, yielding the test result r_i on D_i, and averaging all r_i gives the result under f-fold cross-validation. In this section, the optimal fold number f_b is selected for each classifier M(C(k)), i.e. the f_b whose corresponding cross-validation mean result is best, with f_b ∈ {3, 4, 5, 6}. The f_b-fold cross-validation results r_bk of the different parameters C(k) are compared, the optimal model under the current data sample is selected, and r_k = max(r_bk) is recorded. The optimal models trained on different input features were compared; the specific results are shown in Table 5-1.
Table 5-1 Results corresponding to different input features
It can be seen that, whether the features come from the mouth or the eyes, computing the frame-to-frame difference generally performs better than not differencing; among the mouth feature items, the included angles θ and Δθ between the lines connecting adjacent key points and the mouth corner do not train as well as the difference of the mouth opening rate, Δmr; and for instruction-based intelligent Parkinson detection, the mouth is clearly of greater research value than the eyes, with a better training effect. After multiple comparisons, this work finally adopts the feature vector Δmr = (mean, var, skew, kurt, max, min, ptp), whose prediction confusion matrix is shown in Fig. 6.
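The seven statistics of the Δmr series can be computed as below. This is a simplified sketch using plain population moments; the pandas interfaces named in the text apply bias corrections (sample variance, adjusted skewness and kurtosis), so the exact values differ slightly. The mouth-opening-rate series `mr` is illustrative:

```python
import math

def seven_stats(series):
    """Seven summary statistics forming the feature vector
    (mean, var, skew, kurt, max, min, ptp). Plain population moments are
    used here; pandas' equivalents are bias-corrected."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    std = math.sqrt(var)
    skew = sum((x - mean) ** 3 for x in series) / (n * std ** 3) if std else 0.0
    kurt = sum((x - mean) ** 4 for x in series) / (n * std ** 4) - 3 if std else 0.0
    return (mean, var, skew, kurt,
            max(series), min(series), max(series) - min(series))

# Frame-to-frame difference of an illustrative mouth opening rate series.
mr = [0.30, 0.35, 0.50, 0.45, 0.30]
delta_mr = [b - a for a, b in zip(mr, mr[1:])]
features = seven_stats(delta_mr)
print(len(features))  # 7
```

With pandas this is simply `s.mean(), s.var(), s.skew(), s.kurt(), s.max(), s.min(), s.max() - s.min()` on a `Series` of Δmr values.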
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the others, and the parts that are the same or similar across embodiments may be cross-referenced. Since the system disclosed in the embodiments corresponds to the method disclosed therein, its description is relatively brief; for relevant details, refer to the description of the method.
The principles and embodiments of the present invention have been described herein using specific examples, which serve only to help understand the method and core concept of the invention. Meanwhile, a person skilled in the art may, following the idea of the present invention, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.
Claims (7)
1. A Parkinson non-contact intelligent detection method based on instruction video is characterized by comprising the following steps:
acquiring an instruction type video data set of a Parkinson patient and a non-patient;
constructing a face model and calibrating key points;
determining eye feature vectors according to the eye key points of the face model;
determining a mouth feature vector according to the mouth key points of the face model;
constructing a fusion network model;
training an optimal model according to the mouth feature vector, the eye feature vector and the fusion network model;
and determining the Parkinson patient according to the optimal model.
2. The parkinson non-contact intelligent detection method based on instruction video according to claim 1, wherein the constructing a face model and calibrating key points specifically comprises:
Based on the multi-task interface for face recognition and key point calibration provided by the dlib library, 68 facial key points are extracted frame by frame from the subject's instruction video, and the 6 left-eye key points, 6 right-eye key points and 21 mouth key points among the 68 are taken as the main target points for feature extraction.
3. The Parkinson non-contact intelligent detection method based on instruction video according to claim 1, wherein the determining eye feature vectors according to the eye key points of the face model specifically comprises:
In order to describe the opening and closing state of the eyelids at a given moment, an eyelid opening rate eye_ratio is defined based on the distance between the upper and lower eyelids and the distance between the inner and outer canthi: it is calculated by comparing the Euclidean distance between the 2 upper- and lower-eyelid key points with the Euclidean distance between the 2 inner- and outer-canthus key points.
The eyelid opening rate eye_ratio of every frame in the instruction video and its frame-to-frame difference Δeye_ratio are extracted, and a 14-dimensional eye feature vector is calculated based on the 7 statistical interfaces provided by pandas.
4. The Parkinson non-contact intelligent detection method based on instruction video according to claim 1, wherein the determining mouth feature vectors according to the mouth key points of the face model specifically comprises:
The included angle between the left mouth-corner key point p[0] and the horizontal axis is defined as α, the included angle between key point p[i] and another key point p[j] is defined as β(p[i], p[j]), and the mouth compensation angle θ(p[i], p[j]) is calculated by summing α and β(p[i], p[j]), giving 8 mouth feature components θ(p[0], p[1]), θ(p[1], p[2]), …, θ(p[6], p[7]), θ(p[7], p[0]).
The relative distance Eudis(p[2], p[6]) between the upper and lower key points is compared with the relative distance Eudis(p[0], p[4]) between the left and right key points to calculate the lip opening rate mth_ratio.
5. The Parkinson non-contact intelligent detection method based on instruction video according to claim 1, wherein the constructing a fusion network model specifically comprises:
the method comprises the steps of constructing a fusion network model consisting of a feature fusion stage and a full-connection stage, wherein the feature fusion stage comprises an input layer and an output layer, and the fusion full-connection stage comprises an input layer, a first hidden layer, a second hidden layer and an output layer.
6. The Parkinson non-contact intelligent detection method based on instruction video according to claim 1, wherein the training of the optimal model according to the mouth feature vector, the eye feature vector and the fusion network model specifically comprises:
A penalty factor C is set; this parameter represents the classifier's tolerance of the "slack variables", i.e. its "tolerance" of misclassification. The kernel function kernel is kept at its default, 'rbf'.
Different models are trained based on the extracted features and the SVC method provided by sklearn. By means of cross-validation, the data set D is divided into f mutually exclusive subsets of similar size; when each subset D_i is used for testing, the remaining (f-1)/f of the data is used to train the model, yielding the test result r_i on D_i, and all r_i are averaged to obtain the result under f-fold cross-validation. The f_b-fold cross-validation results r_bk corresponding to different parameters C(k) are compared, and the optimal model under the current data sample is selected.
7. A Parkinson non-contact intelligent detection system based on instruction video, characterized by comprising:
the data set acquisition module is used for acquiring instruction-type video data sets of Parkinson patients and non-patients;
the face model building module is used for building a face model and marking key points;
the eye feature vector determining module is used for determining eye feature vectors according to the eye key points of the face model;
the mouth feature vector determining module is used for determining mouth feature vectors according to the mouth key points of the face model;
the fusion network model building module is used for building a fusion network model;
the optimal model training module is used for training an optimal model by the mouth feature vector, the eye feature vector and the fusion network model;
and the Parkinson patient determination module is used for determining the Parkinson patient according to the optimal model.
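The eyelid opening rate of claim 3 and the lip opening rate of claim 4 can be sketched as follows. The key-point indexing follows dlib's 6-point eye convention (indices 0..5 per eye), and averaging the two vertical eyelid distances is an assumption of this sketch rather than something fixed by the claims; the coordinates are toy values, not real landmarks:

```python
import math

def eudis(p, q):
    """Euclidean distance between two (x, y) key points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def eye_ratio(eye):
    """Eyelid opening rate: upper/lower-eyelid distance over the
    inner/outer-canthus distance. Averaging the two vertical pairs of
    dlib's 6 eye points is an assumption of this sketch."""
    vertical = (eudis(eye[1], eye[5]) + eudis(eye[2], eye[4])) / 2.0
    return vertical / eudis(eye[0], eye[3])

def mth_ratio(mouth):
    """Lip opening rate per claim 4: distance between the upper and lower
    key points p[2], p[6] over the corner distance p[0], p[4]."""
    return eudis(mouth[2], mouth[6]) / eudis(mouth[0], mouth[4])

# Toy coordinates (illustrative only).
eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
mouth = [(0, 0), (1, 1), (2, 2), (3, 1), (4, 0), (3, -1), (2, -2), (1, -1)]
print(round(eye_ratio(eye), 3))   # 0.667
print(round(mth_ratio(mouth), 3)) # 1.0
```

Applying these per frame and differencing across frames yields the Δeye_ratio and Δmr series whose statistics form the feature vectors of claims 3 and 4.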
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010596575.XA CN111814615B (en) | 2020-06-28 | 2020-06-28 | Parkinson non-contact intelligent detection method based on instruction video |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111814615A true CN111814615A (en) | 2020-10-23 |
CN111814615B CN111814615B (en) | 2024-04-12 |
Family
ID=72856436
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010596575.XA Active CN111814615B (en) | 2020-06-28 | 2020-06-28 | Parkinson non-contact intelligent detection method based on instruction video |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111814615B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113076813A (en) * | 2021-03-12 | 2021-07-06 | 首都医科大学宣武医院 | Mask face feature recognition model training method and device |
CN116392086A (en) * | 2023-06-06 | 2023-07-07 | 浙江多模医疗科技有限公司 | Method, system, terminal and storage medium for detecting stimulus |
CN117137442A (en) * | 2023-09-04 | 2023-12-01 | 佳木斯大学 | Parkinsonism auxiliary detection system based on biological characteristics and machine-readable medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105096528A (en) * | 2015-08-05 | 2015-11-25 | 广州云从信息科技有限公司 | Fatigue driving detection method and system |
CN106781282A (en) * | 2016-12-29 | 2017-05-31 | 天津中科智能识别产业技术研究院有限公司 | A kind of intelligent travelling crane driver fatigue early warning system |
CN106997451A (en) * | 2016-01-26 | 2017-08-01 | 北方工业大学 | Lip contour positioning method |
CN109919049A (en) * | 2019-02-21 | 2019-06-21 | 北京以萨技术股份有限公司 | Fatigue detection method based on deep learning human face modeling |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113076813A (en) * | 2021-03-12 | 2021-07-06 | 首都医科大学宣武医院 | Mask face feature recognition model training method and device |
CN113076813B (en) * | 2021-03-12 | 2024-04-12 | 首都医科大学宣武医院 | Training method and device for mask face feature recognition model |
CN116392086A (en) * | 2023-06-06 | 2023-07-07 | 浙江多模医疗科技有限公司 | Method, system, terminal and storage medium for detecting stimulus |
CN116392086B (en) * | 2023-06-06 | 2023-08-25 | 浙江多模医疗科技有限公司 | Method, terminal and storage medium for detecting stimulation |
CN117137442A (en) * | 2023-09-04 | 2023-12-01 | 佳木斯大学 | Parkinsonism auxiliary detection system based on biological characteristics and machine-readable medium |
CN117137442B (en) * | 2023-09-04 | 2024-03-29 | 佳木斯大学 | Parkinsonism auxiliary detection system based on biological characteristics and machine-readable medium |
Also Published As
Publication number | Publication date |
---|---|
CN111814615B (en) | 2024-04-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zhou et al. | Visually interpretable representation learning for depression recognition from facial images | |
Wu et al. | Transfer learning for EEG-based brain–computer interfaces: A review of progress made since 2016 | |
CN111814615A (en) | Parkinson non-contact intelligent detection method based on instruction video | |
Gunes et al. | Categorical and dimensional affect analysis in continuous input: Current trends and future directions | |
Bilakhia et al. | The MAHNOB Mimicry Database: A database of naturalistic human interactions | |
Al Osman et al. | Multimodal affect recognition: Current approaches and challenges | |
CN110491502A (en) | Microscope video stream processing method, system, computer equipment and storage medium | |
Liu et al. | Adaptive multilayer perceptual attention network for facial expression recognition | |
Tolosana et al. | DeepFakes detection across generations: Analysis of facial regions, fusion, and performance evaluation | |
Liu et al. | Dual-stream generative adversarial networks for distributionally robust zero-shot learning | |
Li et al. | Learning representations for facial actions from unlabeled videos | |
Chetty et al. | A multilevel fusion approach for audiovisual emotion recognition | |
Zhu et al. | Hybrid feature-based analysis of video’s affective content using protagonist detection | |
Liu et al. | PRA-Net: Part-and-Relation Attention Network for depression recognition from facial expression | |
Jingchao et al. | Recognition of classroom student state features based on deep learning algorithms and machine learning | |
Shen et al. | Multi-modal feature fusion for better understanding of human personality traits in social human–robot interaction | |
Li et al. | Depression severity prediction from facial expression based on the DRR_DepressionNet network | |
Wang et al. | Semi-supervised classification-aware cross-modal deep adversarial data augmentation | |
Li et al. | Emotion recognition of Chinese paintings at the thirteenth national exhibition of fines arts in China based on advanced affective computing | |
Zheng et al. | Bridging clip and stylegan through latent alignment for image editing | |
Aslam et al. | Privileged knowledge distillation for dimensional emotion recognition in the wild | |
Cao et al. | Concept-centric Personalization with Large-scale Diffusion Priors | |
S'adan et al. | Deep learning techniques for depression assessment | |
Singh et al. | Multi-modal Expression Detection (MED): A cutting-edge review of current trends, challenges and solutions | |
Ning et al. | ICGNet: An intensity-controllable generation network based on covering learning for face attribute synthesis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
TA01 | Transfer of patent application right |
Effective date of registration: 20210609 Address after: 100000 No. 6 South Road, Zhongguancun Academy of Sciences, Beijing, Haidian District Applicant after: Institute of Computing Technology, Chinese Academy of Sciences Applicant after: XIANGTAN University Address before: Xiangtan University, yanggutang street, Yuhu District, Xiangtan City, Hunan Province Applicant before: XIANGTAN University |
|
GR01 | Patent grant ||