CN111814615A - Parkinson non-contact intelligent detection method based on instruction video - Google Patents

Parkinson non-contact intelligent detection method based on instruction video

Info

Publication number
CN111814615A
Authority
CN
China
Prior art keywords
key points
eye
parkinson
mouth
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010596575.XA
Other languages
Chinese (zh)
Other versions
CN111814615B (en)
Inventor
邹娟
房海鹏
陈钢
曾碧霄
向懿
王求真
郭建强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Xiangtan University
Original Assignee
Xiangtan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiangtan University filed Critical Xiangtan University
Priority to CN202010596575.XA priority Critical patent/CN111814615B/en
Publication of CN111814615A publication Critical patent/CN111814615A/en
Application granted granted Critical
Publication of CN111814615B publication Critical patent/CN111814615B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/40 Detecting, measuring or recording for evaluating the nervous system
    • A61B5/4076 Diagnosing or monitoring particular conditions of the nervous system
    • A61B5/4082 Diagnosing or monitoring movement diseases, e.g. Parkinson, Huntington or Tourette
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/254 Fusion techniques of classification results, e.g. of results related to same input data
    • G06F18/256 Fusion techniques of classification results, e.g. of results related to same input data, of results relating to different input data, e.g. multimodal recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Abstract

The invention provides a Parkinson non-contact intelligent detection method and system based on instruction video. The method comprises the following steps: acquiring an instruction-type video data set of Parkinson patients and non-patients; constructing a face model and calibrating key points; determining eye feature vectors according to the eye key points of the face model; determining mouth feature vectors according to the mouth key points of the face model; constructing a fusion network model; training an optimal model according to the mouth feature vectors, the eye feature vectors and the fusion network model; and determining Parkinson patients according to the optimal model. The method jointly analyzes the mouth features and the eye features, introduces the idea of frame-to-frame differences into dynamic feature extraction, segments frames according to the instructions for the statistical computation of the features, and finally trains a model with a support vector machine algorithm, thereby improving the accuracy of Parkinson detection.

Description

Parkinson non-contact intelligent detection method based on instruction video
Technical Field
The invention relates to the field of Parkinson non-contact intelligent detection, in particular to a Parkinson non-contact intelligent detection method and system based on instruction video.
Background
Parkinson's Disease (PD) is a common degenerative disease of the nervous system. With the development of face recognition and natural language processing technology, video-based medical applications for disease diagnosis are emerging, and scenarios such as online consultation, intelligent diagnosis guidance and doctor-patient communication place increasingly "concise", "efficient" and "multidimensional" requirements on symptom detection.
The Parkinson "mask face" refers to the reduction of facial expression in Parkinson patients caused by dyskinesia; its clinical manifestations, from mild to severe, are: normal, dull face, reduced facial expression, involuntary mouth opening, and complete loss of expression. As Parkinson's disease progresses, the stiffness of the facial muscles during movement becomes more apparent. The mask face is therefore an important clinical index for judging whether a patient suffers from Parkinson's disease.
Based on the "mask face" characteristics of Parkinson patients, an instruction-type Parkinson detection method can be designed with the following features: first, clear instruction tasks fully guide the patient through simple expression tasks, which are more accurate and more vivid than traditional, complex expression-imitation tasks and suit a hospital's intelligent diagnosis guidance platform; second, because a single instruction corresponds to the movement of a single facial part, instruction-based dynamic feature extraction is more targeted during feature analysis, the effect of different feature sources on Parkinson detection is compared by training a Support Vector Machine (SVM), and the detection accuracy is improved.
Disclosure of Invention
The invention aims to provide a Parkinson non-contact intelligent detection method and system based on instruction video, so as to comprehensively analyze facial features and improve detection efficiency.
In order to achieve the purpose, the invention provides the following scheme:
a Parkinson non-contact intelligent detection method based on instruction video comprises the following steps:
acquiring an instruction type video data set of a Parkinson patient and a non-patient;
constructing a face model and calibrating key points;
determining eye feature vectors according to the eye key points of the face model;
determining a mouth feature vector according to the mouth key points of the face model;
constructing a fusion network model;
training an optimal model according to the mouth feature vector, the eye feature vector and the fusion network model;
and determining the Parkinson patient according to the optimal model.
Optionally, the constructing a face model and calibrating the key points specifically include:
based on a multi-task interface provided by the dlib library, which is used for face recognition and key point calibration, 68 personal face key points are extracted from a subject instruction video frame by frame. And 6 left-eye link key points, 6 right-eye link key points and 21 mouth link key points in the 68 key points are extracted as main target points for extracting features.
Optionally, the determining an eye feature vector according to the eye key points of the face model specifically includes:
In order to describe the opening and closing of the eyelids at a given moment, an eyelid opening and closing rate eye_ratio is defined based on the distance between the upper and lower eyelids and the distance between the inner and outer canthus; it is calculated by comparing the Euclidean distance between the upper and lower eyelid key points with the Euclidean distance between the two key points of the inner and outer canthus.
The eyelid opening and closing rate eye_ratio of every frame in the instruction video and its frame-to-frame difference Δeye_ratio are extracted, and an eye feature vector of dimension 14 is calculated based on the 7 statistical interfaces provided by pandas.
Optionally, determining a mouth feature vector according to the mouth key point of the face model specifically includes:
The included angle between the line connecting the left mouth corner key point p[0] and the right mouth corner key point p[4] and the horizontal axis is defined as α; the included angle between the line connecting adjacent key points p[i] and p[j] and the horizontal axis is defined as β(p[i], p[j]); the mouth compensation angle θ(p[i], p[j]) is obtained by summing α and β(p[i], p[j]); the 8 mouth feature angles θ(p[0], p[1]), θ(p[1], p[2]), ..., θ(p[6], p[7]), θ(p[7], p[0]) are then calculated.
The lip opening and closing rate mth_ratio is calculated by comparing the relative distance Eudis(p[2], p[6]) between the upper and lower key points with the relative distance Eudis(p[0], p[4]) between the left and right key points.
Optionally, the constructing a fusion network model specifically includes:
the method comprises the steps of constructing a fusion network model consisting of a feature fusion stage and a full-connection stage, wherein the feature fusion stage comprises an input layer and an output layer, and the fusion full-connection stage comprises an input layer, a first hidden layer, a second hidden layer and an output layer.
Optionally, the training an optimal model according to the mouth feature vector, the eye feature vector, and the fusion network model specifically includes:
and setting a penalty factor C, wherein the parameter represents the fault tolerance of the classifier to a 'slack variable', namely the 'tolerance' to misclassification, and selecting the default 'rbf' of the kernel function kernel.
Different models are trained with the SVC method provided by sklearn on the extracted features. By cross-validation, the data set D is divided into f mutually exclusive subsets of similar size; when each subset D_i is used for testing, the remaining data are used to train the model, giving the test result r_i on D_i, and all r_i are averaged to obtain the test result under f-fold cross-validation. The f_b-fold cross-validation results r_bk corresponding to the different parameters C(k) are compared, and the optimal model under the current data sample is selected.
A Parkinson non-contact intelligent detection system based on instruction video comprises:
the data set acquisition module is used for acquiring audio and video data sets of Parkinson patients and non-Parkinson patients;
the face model building module is used for building a face model and marking key points;
the eye feature vector determining module is used for determining eye feature vectors according to the eye key points of the face model;
the mouth feature vector determining module is used for determining mouth feature vectors according to the mouth key points of the face model;
the fusion network model building module is used for building a fusion network model;
the optimal model training module is used for training an optimal model by the mouth feature vector, the eye feature vector and the fusion network model;
and the Parkinson patient determination module is used for determining the Parkinson patient according to the optimal model.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the method comprehensively analyzes the mouth characteristics and the eye characteristics, introduces the difference idea into dynamic characteristic extraction, designs frame segmentation according to instructions to carry out statistical calculation of the characteristics, and finally trains a model by using a support vector machine algorithm, thereby improving the accuracy of the Parkinson detection and improving the detection accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise.
FIG. 1 is a flow chart of a Parkinson non-contact intelligent detection method based on instruction video according to the invention;
FIG. 2 is a structural diagram of a Parkinson non-contact intelligent detection system based on instruction video according to the invention;
FIG. 3 is a face keypoint calibration graph of the present invention;
FIG. 4 is a schematic diagram of the eye key points of the present invention;
FIG. 5 is a schematic diagram of the key points of the mouth of the present invention;
FIG. 6 is a diagram of a prediction confusion matrix of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide a Parkinson non-contact intelligent detection method and system based on instruction video, which can comprehensively analyze mouth features and eye features and improve interactivity and detection efficiency.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
FIG. 1 is a flow chart of a Parkinson non-contact intelligent detection method based on instruction video. As shown in fig. 1, a parkinson non-contact intelligent detection method based on instruction video includes:
step 101: instructional video data sets are acquired for both parkinson and non-parkinson patients.
The invention constructs a clinically validated data set consisting of 2N subjects, with a 1:1 ratio of Parkinson patients to non-patients. Following the step-by-step "eyes, then mouth" instruction mode, with instruction 1 "please relax and look straight ahead" and instruction 2 "please smile and show your teeth", the subject performs the corresponding movements while being recorded; the most effective 15 s of the recording is selected and spliced as the video-stream data set of the method.
Step 102: constructing a face model and marking key points, and specifically comprising the following steps:
based on a multi-task interface provided by the dlib library, which is used for face recognition and key point calibration, 68 personal face key points are extracted from a subject instruction video frame by frame. And 6 left-eye link key points, 6 right-eye link key points and 21 mouth link key points in the 68 key points are extracted as main target points for extracting features.
Step 103: determining eye feature vectors according to the eye key points of the face model, specifically comprising:
In order to describe the opening and closing of the eyelids at a given moment, an eyelid opening and closing rate eye_ratio is defined based on the distance between the upper and lower eyelids and the distance between the inner and outer canthus; it is calculated by comparing the Euclidean distance between the upper and lower eyelid key points with the Euclidean distance between the two key points of the inner and outer canthus.
The eyelid opening and closing rate eye_ratio of every frame in the instruction video and its frame-to-frame difference Δeye_ratio are extracted, and an eye feature vector of dimension 14 is calculated based on the 7 statistical interfaces provided by pandas.
Step 104: determining a mouth feature vector according to the mouth key points of the face model, specifically comprising:
The included angle between the line connecting the left mouth corner key point p[0] and the right mouth corner key point p[4] and the horizontal axis is defined as α; the included angle between the line connecting adjacent key points p[i] and p[j] and the horizontal axis is defined as β(p[i], p[j]); the mouth compensation angle θ(p[i], p[j]) is obtained by summing α and β(p[i], p[j]); the 8 mouth feature angles θ(p[0], p[1]), θ(p[1], p[2]), ..., θ(p[6], p[7]), θ(p[7], p[0]) are then obtained.
The lip opening and closing rate mth_ratio is calculated by comparing the relative distance Eudis(p[2], p[6]) between the upper and lower key points with the relative distance Eudis(p[0], p[4]) between the left and right key points.
Step 105: constructing a fusion network model, which specifically comprises the following steps:
the method comprises the steps of constructing a fusion network model consisting of a feature fusion stage and a full-connection stage, wherein the feature fusion stage comprises an input layer and an output layer, and the fusion full-connection stage comprises an input layer, a first hidden layer, a second hidden layer and an output layer.
Step 106: training an optimal model according to the mouth feature vector, the eye feature vector and the fusion network model, specifically comprising:
and setting a penalty factor C, wherein the parameter represents the fault tolerance of the classifier to a 'slack variable', namely the 'tolerance' to misclassification, and selecting the default 'rbf' of the kernel function kernel.
Different models are trained with the SVC method provided by sklearn on the extracted features. By cross-validation, the data set D is divided into f mutually exclusive subsets of similar size; when each subset D_i is used for testing, the remaining data are used to train the model, giving the test result r_i on D_i, and all r_i are averaged to obtain the test result under f-fold cross-validation. The f_b-fold cross-validation results r_bk corresponding to the different parameters C(k) are compared, and the optimal model under the current data sample is selected.
Step 107: and determining the Parkinson patient according to the optimal model.
The method jointly analyzes the mouth features and the eye features, introduces the idea of frame-to-frame differences into dynamic feature extraction, segments frames according to the instructions for the statistical computation of the features, and finally trains a model with a support vector machine algorithm, thereby improving the accuracy of Parkinson detection.
FIG. 2 is a structural diagram of a Parkinson non-contact intelligent detection system based on instruction video. As shown in fig. 2, a parkinson non-contact intelligent detection system based on instruction video includes:
a data set acquisition module 201, configured to acquire video data sets of Parkinson patients and non-Parkinson patients;
the face model construction module 202 is used for constructing a face model and marking key points;
the eye feature vector determination module 203 is configured to determine an eye feature vector according to the eye key points of the face model;
a mouth feature vector determining module 204, configured to determine a mouth feature vector according to the mouth key point of the face model;
a fusion network model construction module 205, configured to construct a fusion network model;
an optimal model training module 206, configured to train an optimal model based on the mouth feature vector, the eye feature vector, and the fusion network model;
a parkinson patient determination module 207, configured to determine a parkinson patient according to the optimal model.
Example 1:
for a more detailed discussion of the present invention, a specific example is provided below, comprising the following steps:
step one, acquiring instruction video data sets of Parkinson patients and non-Parkinson patients:
This example constructs a clinically validated data set consisting of 200 subjects, with a 1:1 ratio of Parkinson patients to non-patients. Following the step-by-step "eyes, then mouth" instruction mode, with instruction 1 "please relax and look straight ahead" and instruction 2 "please smile and show your teeth", the subject performs the corresponding movements while being recorded; the most effective 15 s of the recording is selected and spliced as the video-stream data set of the method.
Step two, constructing a face model, and calibrating key points:
based on the multitask interface provided by the dlib library for face recognition and keypoint targeting, 68 personal face keypoints are extracted from the video of the subject frame by frame. Due to the targeted instruction design, of the 68 key points of the whole face, only the key points of the eyes and the mouth really need to be concerned, specifically, the first 32 key points in the method, wherein No.37-No.42 links the left eye, No.43-No.48 links the right eye, and No. 49-No. 68 links the mouth. As shown in fig. 3, which is the extraction of the coordinates (x, y) of the key points in the frame sequence, where the blue circles represent the key point locations and the numbers represent their corresponding serial numbers.
Thirdly, determining eye feature vectors according to the eye key points of the face model:
Based on the 12 key point coordinates of the left and right eyes obtained in step two, the static and dynamic information of the eyes is considered over the key-frame sequence (the first 1/3 segment of the total video), as shown in fig. 4.
To describe the eyelid opening and closing conditions at a certain moment, we define the eyelid opening and closing rate based on the distance between the upper and lower eyelids and the distance between the inner and outer canthus.
eye_ratio = [Eudis(p[1], p[5]) + Eudis(p[2], p[4])] / (2 · Eudis(p[0], p[3]))
s.t. Eudis(p[0], p[3]) ≠ 0
where p[·] denotes an eye key point with coordinates of the form (x, y), and Eudis denotes the Euclidean distance between two points:
Eudis(p[i], p[j]) = √((p[i]x − p[j]x)² + (p[i]y − p[j]y)²)
Since eye motion runs through the entire video, eye_ratio and the frame-to-frame difference Δeye_ratio, which reflects the variation of the eyelid opening and closing rate over time, are calculated for all video frames; Δeye_ratio is calculated as follows:
Δeye_ratio(i) = dropna(eye_ratio(i+1) − eye_ratio(i)), i = 1, 2, ..., m − 1
where m is the total number of frames and dropna(·) is the deletion function: if its argument is empty the value is discarded, otherwise it is kept. The eye feature vector eyefeat of dimension 14 is then computed based on the 7 statistical feature interfaces provided by pandas,
eyefeat = (ef1; ef2; …; ef7; ef8; …; ef14)
where ef1~ef7 are derived from eye_ratio and ef8~ef14 are derived from Δeye_ratio.
Step four, determining a mouth feature vector according to the mouth key points of the face model:
based on the 8 key point coordinates of the mouth obtained in step two, a "smile elevation angle" α is defined, as shown in fig. 5.
The truly valuable reference quantity when studying mouth movement is the angle between the line connecting adjacent mouth key points and the line connecting the two mouth corners, i.e. the angle θ shown in fig. 5; the angle β is the angle between the line connecting the adjacent points and the horizontal line. Obviously, α, β and θ satisfy the following relation:
θ = α + β    (4-1)
The θ corresponding to two points p[i], p[j] is denoted θ(p[i], p[j]); likewise β is denoted β(p[i], p[j]), and so on. The x and y coordinates of a point p[·] are denoted p[·]x and p[·]y respectively. Then:
α = atan((p[0]y − p[4]y) / (p[0]x − p[4]x))    (4-2)
s.t. p[0]x − p[4]x ≠ 0
where atan() is the inverse function of tan(), used to map a tangent value back to its radian. In the same way,
β(p[0], p[1]) = atan((p[0]y − p[1]y) / (p[0]x − p[1]x))    (4-3)
s.t. p[0]x − p[1]x ≠ 0
Substituting formula (4-2) and formula (4-3) into formula (4-1) gives
θ(p[0], p[1]) = α + β(p[0], p[1])    (4-4)
s.t. p[0]x − p[1]x ≠ 0
p[0]x − p[4]x ≠ 0
In practical situations we have reason to believe that the above constraints always hold.
The formula for calculating the lip opening and closing rate is written as
mth_ratio = Eudis(p[2], p[6]) / Eudis(p[0], p[4])
s.t. Eudis(p[0], p[4]) ≠ 0
Following the principles of formula (4-1), formula (4-2) and formula (4-3), the 8 opening and closing angles θ(p[0], p[1]), θ(p[1], p[2]), θ(p[2], p[3]), θ(p[3], p[4]), θ(p[4], p[5]), θ(p[5], p[6]), θ(p[6], p[7]), θ(p[7], p[0]) together with mth_ratio form the 9-valued feature of the key frames (the middle 1/3 segment of the total video), giving the feature vector mthfeat,
mthfeat = (mt1; mt2; …; mt8; mt9)
where mt1~mt8 are derived from θ and mt9 is derived from mth_ratio.
Step five, constructing a fusion network model:
the method comprises the steps of constructing a fusion network model consisting of a feature fusion stage and a full-connection stage, wherein the feature fusion stage comprises an input layer and an output layer, and the fusion full-connection stage comprises an input layer, a first hidden layer, a second hidden layer and an output layer.
Step six, training an optimal model according to the mouth feature vector, the eye feature vector and the fusion network model:
and training different models based on the extracted features and the SVC method provided by the skrlearn. The SVC method needs to set a penalty factor C, which represents the fault tolerance of the classifier to "slack variables", i.e. the "tolerance" to misclassification: when C tends to infinity, the fault tolerance is small, and all samples are required to meet the constraint
s.t.yi(wTxi+ b) is not less than 1, i is 1,2, …, m is (6-1)
Therefore, overfitting is easy to happen, and the trained model is weak in generalization ability; when C takes a finite value, some samples are allowed to fail the constraint. Based on past experience, let
Figure BDA0002557566960000091
In this section, the default 'rbf' of the kernel function kernel is chosen.
By means of cross-validation, the data set D is divided into f mutually exclusive subsets of similar size; when each subset D_i is used for testing, the remaining data are used to train the model, giving the test result r_i on D_i, and all r_i are averaged to obtain the test result under f-fold cross-validation. In this section, the optimal fold number f_b is selected for each classifier M(C(k)), i.e. the f_b whose cross-validation mean is best, with f_b ∈ {3, 4, 5, 6}. The f_b-fold cross-validation results r_bk corresponding to the different parameters C(k) are compared, the optimal model under the current data sample is selected, and r_k = max(r_bk) is recorded. The optimal models trained with different input features are compared; the specific results are shown in Table 5-1.
Table 5-1 Results corresponding to different input features
It can be seen that, whether the feature source is the mouth or the eyes, computing the frame-to-frame difference generally performs better than not doing so; among the mouth feature items, the angles θ and Δθ between the lines of adjacent key points and the mouth-corner line train less well than the difference Δmr of the mouth opening and closing rate; and for instruction-type intelligent Parkinson detection, the research value of the mouth is clearly superior to that of the eyes, with a better training effect. After multiple comparisons, the feature scheme Δmr = (mean, var, skew, kurt, max, min, ptp) is finally adopted; its prediction confusion matrix is shown in fig. 6.
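A sketch of the model selection of step six and of the evaluation behind Table 5-1 and Fig. 6 is given below; the fold range {3, 4, 5, 6} and the 'rbf' kernel come from the text, while the candidate penalty factors listed here are placeholders for the C(k) values the patent derives "from past experience".

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score, cross_val_predict
from sklearn.metrics import confusion_matrix

def select_best_svc(X, y, candidate_C=(0.1, 1.0, 10.0, 100.0), folds=(3, 4, 5, 6)):
    best = None  # (mean test result, C, fold count)
    for C in candidate_C:
        clf = SVC(C=C, kernel="rbf")
        for f in folds:
            r = cross_val_score(clf, X, y, cv=f).mean()  # f-fold cross-validation result
            if best is None or r > best[0]:
                best = (r, C, f)
    r_best, C_best, f_best = best
    model = SVC(C=C_best, kernel="rbf")
    # prediction confusion matrix (cf. Fig. 6) from cross-validated predictions
    y_pred = cross_val_predict(model, X, y, cv=f_best)
    return model.fit(X, y), confusion_matrix(y, y_pred), (r_best, C_best, f_best)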
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (7)

1. A Parkinson non-contact intelligent detection method based on instruction video is characterized by comprising the following steps:
acquiring an instruction type video data set of a Parkinson patient and a non-patient;
constructing a face model and calibrating key points;
determining eye feature vectors according to the eye key points of the face model;
determining a mouth feature vector according to the mouth key points of the face model;
constructing a fusion network model;
training an optimal model according to the mouth feature vector, the eye feature vector and the fusion network model;
and determining the Parkinson patient according to the optimal model.
2. The parkinson non-contact intelligent detection method based on instruction video according to claim 1, wherein the constructing a face model and calibrating key points specifically comprises:
based on a multi-task interface provided by the dlib library, which is used for face recognition and key point calibration, 68 personal face key points are extracted from a subject instruction video frame by frame. And 6 left-eye link key points, 6 right-eye link key points and 21 mouth link key points in the 68 key points are extracted as main target points for extracting features.
3. The Parkinson non-contact intelligent detection method based on instruction video according to claim 1, wherein the determining of eye feature vectors according to the eye key points of the face model specifically comprises:
In order to describe the opening and closing of the eyelids at a given moment, an eyelid opening and closing rate eye_ratio is defined based on the distance between the upper and lower eyelids and the distance between the inner and outer canthus; it is calculated by comparing the Euclidean distance between the upper and lower eyelid key points with the Euclidean distance between the two key points of the inner and outer canthus.
The eyelid opening and closing rate eye_ratio of every frame in the instruction video and its frame-to-frame difference Δeye_ratio are extracted, and an eye feature vector of dimension 14 is calculated based on the 7 statistical interfaces provided by pandas.
4. The Parkinson non-contact intelligent detection method based on instruction video according to claim 1, wherein the determining mouth feature vectors according to the mouth key points of the face model specifically comprises:
The included angle between the line connecting the left mouth corner key point p[0] and the right mouth corner key point p[4] and the horizontal axis is defined as α; the included angle between the line connecting adjacent key points p[i] and p[j] and the horizontal axis is defined as β(p[i], p[j]); the mouth compensation angle θ(p[i], p[j]) is obtained by summing α and β(p[i], p[j]); the 8 mouth feature angles θ(p[0], p[1]), θ(p[1], p[2]), ..., θ(p[6], p[7]), θ(p[7], p[0]) are then obtained.
The lip opening and closing rate mth_ratio is calculated by comparing the relative distance Eudis(p[2], p[6]) between the upper and lower key points with the relative distance Eudis(p[0], p[4]) between the left and right key points.
5. The Parkinson non-contact intelligent detection method based on instruction video according to claim 1, wherein the constructing a fusion network model specifically comprises:
the method comprises the steps of constructing a fusion network model consisting of a feature fusion stage and a full-connection stage, wherein the feature fusion stage comprises an input layer and an output layer, and the fusion full-connection stage comprises an input layer, a first hidden layer, a second hidden layer and an output layer.
6. The Parkinson non-contact intelligent detection method based on instruction video according to claim 1, wherein the training of the optimal model according to the mouth feature vector, the eye feature vector and the fusion network model specifically comprises:
and setting a penalty factor C, wherein the parameter represents the fault tolerance of the classifier to a 'slack variable', namely the 'tolerance' to misclassification, and selecting the default 'rbf' of the kernel function kernel.
Different models are trained with the SVC method provided by sklearn on the extracted features. By cross-validation, the data set D is divided into f mutually exclusive subsets of similar size; when each subset D_i is used for testing, the remaining data are used to train the model, giving the test result r_i on D_i, and all r_i are averaged to obtain the test result under f-fold cross-validation. The f_b-fold cross-validation results r_bk corresponding to the different parameters C(k) are compared, and the optimal model under the current data sample is selected.
7. A Parkinson non-contact intelligent detection system based on instruction video, characterized by comprising:
the data set acquisition module is used for acquiring audio and video data sets of Parkinson patients and non-Parkinson patients;
the face model building module is used for building a face model and marking key points;
the eye feature vector determining module is used for determining eye feature vectors according to the eye key points of the face model;
the mouth feature vector determining module is used for determining mouth feature vectors according to the mouth key points of the face model;
the fusion network model building module is used for building a fusion network model;
the optimal model training module is used for training an optimal model by the mouth feature vector, the eye feature vector and the fusion network model;
and the Parkinson patient determination module is used for determining the Parkinson patient according to the optimal model.
CN202010596575.XA 2020-06-28 2020-06-28 Parkinson non-contact intelligent detection method based on instruction video Active CN111814615B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010596575.XA CN111814615B (en) 2020-06-28 2020-06-28 Parkinson non-contact intelligent detection method based on instruction video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010596575.XA CN111814615B (en) 2020-06-28 2020-06-28 Parkinson non-contact intelligent detection method based on instruction video

Publications (2)

Publication Number Publication Date
CN111814615A true CN111814615A (en) 2020-10-23
CN111814615B CN111814615B (en) 2024-04-12

Family

ID=72856436

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010596575.XA Active CN111814615B (en) 2020-06-28 2020-06-28 Parkinson non-contact intelligent detection method based on instruction video

Country Status (1)

Country Link
CN (1) CN111814615B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113076813A (en) * 2021-03-12 2021-07-06 首都医科大学宣武医院 Mask face feature recognition model training method and device
CN116392086A (en) * 2023-06-06 2023-07-07 浙江多模医疗科技有限公司 Method, system, terminal and storage medium for detecting stimulus
CN117137442A (en) * 2023-09-04 2023-12-01 佳木斯大学 Parkinsonism auxiliary detection system based on biological characteristics and machine-readable medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105096528A (en) * 2015-08-05 2015-11-25 广州云从信息科技有限公司 Fatigue driving detection method and system
CN106781282A (en) * 2016-12-29 2017-05-31 天津中科智能识别产业技术研究院有限公司 A kind of intelligent travelling crane driver fatigue early warning system
CN106997451A (en) * 2016-01-26 2017-08-01 北方工业大学 Lip contour positioning method
CN109919049A (en) * 2019-02-21 2019-06-21 北京以萨技术股份有限公司 Fatigue detection method based on deep learning human face modeling

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105096528A (en) * 2015-08-05 2015-11-25 广州云从信息科技有限公司 Fatigue driving detection method and system
CN106997451A (en) * 2016-01-26 2017-08-01 北方工业大学 Lip contour positioning method
CN106781282A (en) * 2016-12-29 2017-05-31 天津中科智能识别产业技术研究院有限公司 A kind of intelligent travelling crane driver fatigue early warning system
CN109919049A (en) * 2019-02-21 2019-06-21 北京以萨技术股份有限公司 Fatigue detection method based on deep learning human face modeling

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113076813A (en) * 2021-03-12 2021-07-06 首都医科大学宣武医院 Mask face feature recognition model training method and device
CN113076813B (en) * 2021-03-12 2024-04-12 首都医科大学宣武医院 Training method and device for mask face feature recognition model
CN116392086A (en) * 2023-06-06 2023-07-07 浙江多模医疗科技有限公司 Method, system, terminal and storage medium for detecting stimulus
CN116392086B (en) * 2023-06-06 2023-08-25 浙江多模医疗科技有限公司 Method, terminal and storage medium for detecting stimulation
CN117137442A (en) * 2023-09-04 2023-12-01 佳木斯大学 Parkinsonism auxiliary detection system based on biological characteristics and machine-readable medium
CN117137442B (en) * 2023-09-04 2024-03-29 佳木斯大学 Parkinsonism auxiliary detection system based on biological characteristics and machine-readable medium

Also Published As

Publication number Publication date
CN111814615B (en) 2024-04-12

Similar Documents

Publication Publication Date Title
Zhou et al. Visually interpretable representation learning for depression recognition from facial images
Wu et al. Transfer learning for EEG-based brain–computer interfaces: A review of progress made since 2016
CN111814615A (en) Parkinson non-contact intelligent detection method based on instruction video
Gunes et al. Categorical and dimensional affect analysis in continuous input: Current trends and future directions
Bilakhia et al. The MAHNOB Mimicry Database: A database of naturalistic human interactions
Al Osman et al. Multimodal affect recognition: Current approaches and challenges
CN110491502A (en) Microscope video stream processing method, system, computer equipment and storage medium
Liu et al. Adaptive multilayer perceptual attention network for facial expression recognition
Tolosana et al. DeepFakes detection across generations: Analysis of facial regions, fusion, and performance evaluation
Liu et al. Dual-stream generative adversarial networks for distributionally robust zero-shot learning
Li et al. Learning representations for facial actions from unlabeled videos
Chetty et al. A multilevel fusion approach for audiovisual emotion recognition
Zhu et al. Hybrid feature-based analysis of video’s affective content using protagonist detection
Liu et al. PRA-Net: Part-and-Relation Attention Network for depression recognition from facial expression
Jingchao et al. Recognition of classroom student state features based on deep learning algorithms and machine learning
Shen et al. Multi-modal feature fusion for better understanding of human personality traits in social human–robot interaction
Li et al. Depression severity prediction from facial expression based on the DRR_DepressionNet network
Wang et al. Semi-supervised classification-aware cross-modal deep adversarial data augmentation
Li et al. Emotion recognition of Chinese paintings at the thirteenth national exhibition of fines arts in China based on advanced affective computing
Zheng et al. Bridging clip and stylegan through latent alignment for image editing
Aslam et al. Privileged knowledge distillation for dimensional emotion recognition in the wild
Cao et al. Concept-centric Personalization with Large-scale Diffusion Priors
S'adan et al. Deep learning techniques for depression assessment
Singh et al. Multi-modal Expression Detection (MED): A cutting-edge review of current trends, challenges and solutions
Ning et al. ICGNet: An intensity-controllable generation network based on covering learning for face attribute synthesis

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210609

Address after: 100000 No. 6 South Road, Zhongguancun Academy of Sciences, Beijing, Haidian District

Applicant after: Institute of Computing Technology, Chinese Academy of Sciences

Applicant after: XIANGTAN University

Address before: Xiangtan University, yanggutang street, Yuhu District, Xiangtan City, Hunan Province

Applicant before: XIANGTAN University

TA01 Transfer of patent application right
GR01 Patent grant
GR01 Patent grant