WO2023116736A1 - Auxiliary screening system for tic disorder based on video data - Google Patents
Auxiliary screening system for tic disorder based on video data
- Publication number
- WO2023116736A1 (PCT/CN2022/140523)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- tic
- module
- video data
- instance
- Prior art date
Classifications
- G16H50/20 — ICT specially adapted for medical diagnosis, medical simulation or medical data mining, for computer-aided diagnosis, e.g. based on medical expert systems
- A61B5/0077 — Devices for viewing the surface of the body, e.g. camera, magnifying lens
- A61B5/1128 — Measuring movement of the entire body or parts thereof, e.g. head or hand tremor, using a particular sensing technique using image analysis
- A61B5/4094 — Diagnosing or monitoring seizure diseases, e.g. epilepsy
- A61B5/7264 — Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
- G06F18/2411 — Classification techniques relating to the classification model based on the proximity to a decision surface, e.g. support vector machines
- G06F18/25 — Fusion techniques
- G06N3/044 — Recurrent networks, e.g. Hopfield networks
- G06N3/045 — Combinations of networks
- G06N3/08 — Learning methods (neural networks)
- G16H50/70 — ICT specially adapted for mining of medical data, e.g. analysing previous cases of other patients
Abstract
A video-data-based auxiliary screening system for tic disorder uses video data to automatically identify tic symptoms. Through a three-dimensional convolutional neural network based on multi-instance learning, combined 3D channel attention and 3D spatial attention modules optimize the features learned by the 3D CNN, and a temporal smoothing constraint optimizes the loss function, improving the model's ability to detect tics. Combined with health information questionnaire data derived from clinical consultation, this forms an auxiliary screening system for tic disorder that improves screening and identification efficiency and, through non-direct contact, reduces the tension and discomfort patients feel in unfamiliar environments. Video data collection and tic detection simplify the most time-consuming symptom observation step, while data fusion analysis and visualization give screened patients a preliminary understanding of the disease and provide a reference and basis for doctors' subsequent diagnosis and treatment.
Description
The invention relates to the technical field of medical and health information, and in particular to a video-data-based auxiliary screening system for tic disorder.
According to the Expert Consensus on the Diagnosis and Treatment of Tic Disorders in Children (2017 Practical Version) issued by the Neurology Group of the Pediatrics Branch of the Chinese Medical Association [1], Tourette syndrome (TS) can be diagnosed when onset occurs before the age of 18, multiple motor tics and one or more vocal tics are present within one year, and other medical conditions (e.g., post-viral encephalitis) or substance effects (e.g., cocaine) are excluded; the sustained observation and exploratory interview parts of this process take considerable time. However, children are naturally active, and the onset of tic symptoms rarely attracts parents' attention, so most affected children are diagnosed only after the condition has progressed considerably, which compromises treatment outcomes. Moreover, symptom severity varies widely between patients and the long-term prognosis is difficult to estimate accurately, so regular follow-up visits to the hospital are also required.
Artificial intelligence and machine learning techniques are already widely used in medicine. In the field of tic recognition and detection, methods such as detecting tic movements from the cortical network activity data of Tourette patients [2] and detecting tics by recording and analyzing movement data of Tourette patients during standing and walking with wearable devices [3] have been applied with good results, but video data has so far seen little use. Analysis of video data of tic patients simulates the doctor's observation of the patient during clinical diagnosis, and in daily life video data is easy to acquire and simple to record. To address the difficulty of early detection of tic disorder, the present invention uses a three-dimensional convolutional neural network model from the field of deep learning to detect abnormal tic movements in frontally recorded videos, combines this with comprehensive analysis of clinical outpatient health information, and proposes a video-data-based tic detection method and auxiliary screening system for tic disorder.
Under the existing diagnostic process, the doctor needs to spend a long time observing and confirming the patient's tic characteristics, and must question the patient and family members to confirm recent and earlier tic characteristics, eating habits, living habits, family medical history, and so on. During the consultation, the patient may enter a suppressed state because of the new environment or contact with strangers, which hinders accurate diagnosis and assessment. At present, identifying tic symptoms relies mainly on the complex clinical diagnostic process for tic disorder, and the tics of many patients are not easy to notice; existing tic detection methods rely on deep brain stimulation or wearable devices to collect data, and these data collection approaches are relatively complex.
[1] Neurology Group, Pediatrics Branch of the Chinese Medical Association. Expert consensus on the diagnosis and treatment of tic disorders in children (2017 practical version) [J]. Chinese Journal of Applied Clinical Pediatrics, 2017, 32(15): 1137–1140.
[2] Jonathan B. Shute et al., "Thalamocortical network activity enables chronic tic detection in humans with Tourette syndrome," NeuroImage: Clinical, vol. 12, pp. 165–172, Feb. 2016, doi: 10.1016/j.nicl.2016.06.015.
[3] Michel Bernabei et al., "Automatic detection of tic activity in the Tourette Syndrome," in 2010 Annual International Conference of the IEEE Engineering in Medicine and Biology, Aug. 2010, pp. 422–425, doi: 10.1109/IEMBS.2010.5627374.
Summary of the Invention
The purpose of the present invention is to address the deficiencies of the prior art by proposing a video-data-based auxiliary screening system for tic disorder. The system uses video data to automatically identify tic symptoms through a three-dimensional convolutional neural network based on multi-instance learning: combined 3D channel attention and 3D spatial attention modules optimize the features learned by the 3D CNN, and a temporal smoothing constraint optimizes the loss function, improving the model's tic detection ability. Combined with health information questionnaire data derived from clinical consultation, it forms an auxiliary screening system for tic disorder that improves screening and identification efficiency and, through non-direct contact, reduces the tension and discomfort patients feel in unfamiliar environments. Through video data collection and tic detection, the present invention simplifies the most time-consuming symptom observation step, and through data fusion analysis and visualization it gives screened patients a preliminary understanding of the disease while also providing a reference and basis for doctors' subsequent diagnosis and treatment.
The purpose of the present invention is achieved through the following technical solution: a video-data-based auxiliary screening system for tic disorder, comprising a tic movement detection module, a health information collection and processing module, a visual data acquisition module and a fusion analysis module.
The visual data acquisition module collects facial video data of the screened person and inputs it to the tic movement detection module.
The tic movement detection module comprises a data preprocessing module, a visual feature analysis module, a tic anomaly score generation module and a multi-instance strategy training module.
The data preprocessing module processes the video data collected by the visual data acquisition module into time-series image data suitable for the deep learning network, and inputs it to the visual feature analysis module.
The visual feature analysis module performs video data feature analysis through a three-dimensional convolutional neural network model based on 3D channel attention and 3D spatial attention. The 3D CNN model consists of p sequentially connected convolution blocks containing one 3D convolutional layer and q convolution blocks containing two 3D convolutional layers; each of the q two-layer convolution blocks is connected in parallel to a 3D channel attention module and a 3D spatial attention module, which extract the 3D channel attention features and 3D spatial attention features of the feature map after convolution. The generated feature map is input to the tic anomaly score generation module, composed of a fully connected network model, to obtain the tic anomaly score value, and threshold analysis of the anomaly score determines whether a tic movement is present; at the same time, the anomaly score values form time-series data that is input to the fusion analysis module.
The multi-instance strategy training module trains the network model in the visual feature analysis module with a multi-instance learning strategy based on control-group instances and tic-group instances, each obtained by extracting several segments of fixed consecutive frames from the respective video data. The tic anomaly scores of the different instances of the tic group and the control group are obtained through the visual feature analysis module, the loss value of each training step is computed with a ranking loss function, and the network model parameters in the visual feature analysis module are updated.
The health information collection and processing module collects and compiles the screened person's health information based on the clinical diagnostic process for tic disorder, converts the collected health information data into numerical form, and inputs it to the fusion analysis module.
The fusion analysis module passes the numerically processed health information data and the time-series data formed by the anomaly scores through classification models to obtain recognition probabilities of tic or normal, then fuses the two results with the Bayesian additive fusion rule and takes the category corresponding to the maximum value as the judgment result. A peak detection algorithm yields the number of tic peaks and their time points; tracing the frame sequence back to the original video locates the peak times and gives the tic occurrence times; thresholding selects the interval before and after each tic peak to locate the duration of each tic. From the anomaly score values, tic occurrence times and tic durations, the tic anomaly score curve and tic action heat map of the analyzed video are drawn, and the per-minute tic frequency and duration are computed from the original video length. The analysis results of the fusion analysis module provide the patient with suggestions for further examination and feedback on their own tics, and also provide the doctor with auxiliary screening information about the patient's tics.
Further, the data preprocessing module preprocesses video data as follows: the collected facial video data is passed through the face detection algorithm OpenFace to locate the face region in each video frame, environmental information unrelated to tic movements is removed from the original frames to focus on the screened person's facial tic movements, and the processed images are saved.
Further, the 3D channel attention module compresses the feature map F_3D of size (Channel, Dimension, Height, Width) obtained after convolution and pooling into an average temporal feature F_3D′ of size (Channel, 1, 1, 1) via average pooling, then predicts the importance of each channel through a multi-layer perceptron (MLP) and a Sigmoid activation function to obtain the 3D channel attention feature F_3D-C, computed as:
F_3D-C = Sigmoid(MLP(F_3D′))
Further, the 3D spatial attention module compresses the feature F_3D of size (Channel, Dimension, Height, Width) obtained after convolution and pooling into an average spatial feature F_3D″ of size (1, Dimension, Height, Width) via average pooling, then obtains the spatial attention feature F_3D-S through the Sigmoid activation function, computed as:
F_3D-S = Sigmoid(F_3D″)
Further, the 3D channel attention module and the 3D spatial attention module are connected in parallel into the convolution blocks containing two 3D convolutional layers; their outputs are each multiplied element-wise with the feature F_3D and the two products are added, giving the output feature F_A as:
F_A = F_3D ⊗ F_3D-C + F_3D ⊗ F_3D-S
where ⊗ denotes element-wise multiplication with broadcasting.
Further, the training process of the multi-instance strategy training module is as follows: the tic-group instances and the control-group instances form a tic multi-instance bag and a control multi-instance bag respectively. The network model in the visual feature analysis module yields the tic anomaly score sets {k_a} and {k_n} of all instances of the tic group and the control group, from which the maximum anomaly scores in the tic bag and the control bag are computed. The anomaly scores of the maximum and of the two instances on either side of it are then taken: these spans represent the duration stage of the most probable suspected tic movement in the tic bag and the control bag, and the average anomaly score over the duration stage represents the anomaly score of the most probable suspected tic movement. The maximum probability of a suspected tic in the tic bag, k̄_a^max, and in the control bag, k̄_n^max, are computed as:
k̄_a^max = mean(k_a^{i−2}, …, k_a^{i+2}),  k̄_n^max = mean(k_n^{j−2}, …, k_n^{j+2})
where i is the i-th instance corresponding to the maximum anomaly score in the tic bag and j is the j-th instance corresponding to the maximum anomaly score in the control bag. During computation, if i−2 ≤ 0, i−1 ≤ 0, i+1 > N_a, i+2 > N_a, j−2 ≤ 0, j−1 ≤ 0, j+1 > N_n or j+2 > N_n occurs, the corresponding instance's anomaly score does not exist and is excluded from the mean; N_a is the number of instances in the tic bag and N_n the number of instances in the control bag.
The loss value L of each training step is computed through the ranking loss function, and the network parameters in the visual feature analysis module are updated through gradient descent and backpropagation. The ranking loss L_1 based on the multi-instance learning strategy is computed from k̄_a^max and k̄_n^max, and a smoothing constraint term L_2 is added to the loss function. The overall ranking loss function based on the multi-instance learning strategy is expressed as:
L = L_1 + λL_2
where λ is the penalty coefficient; the higher its value, the heavier the penalty on the smoothing constraint.
Further, during training of the multi-instance strategy training module, the learning rate Lr is iterated with an exponential decay function:
Lr = 0.95^epoch_t × lr
where epoch_t is the current training epoch and lr = 0.001 is the initial learning rate.
Further, during training of the multi-instance strategy training module, the video data of the control-group instances and of the tic-group instances are augmented by adding random Gaussian noise, random color jitter, random rotation and random cropping, simulating the imaging quality changes, color changes, face orientation changes and lens distance changes that occur during video data collection.
Further, the health information collected by the health information collection and processing module includes demographic information, living habits, eating habits, family history and family observation records.
Further, in the fusion analysis module, a Gaussian-kernel SVM classifier is trained with the health information data numerically processed by the health information collection and processing module to obtain a recognition probability; the data output by the visual feature analysis module contains temporal information, so an LSTM network and a Softmax function are used for training and analysis to obtain a recognition probability.
Beneficial effects of the present invention:
1. The present invention collects video data in a non-implantable, non-wearable manner; the approach is convenient, camera equipment is universally available, and the system is easy to embed into existing workflows.
2. The present invention uses video data to analyze and detect tic movements, so patients need not communicate face to face with doctors, reducing the tension and discomfort of unfamiliar environments and better revealing the true condition.
3. Through fused analysis of video data and health information data, the tic screening results can popularize disease knowledge for patients and parents, and can also give doctors a reference for condition assessment and management.
4. The present invention can perform remote tic recognition and detection over a communication network, reducing the number of trips patients and parents make to specialized hospitals and saving time and travel costs.
Figure 1 is a schematic diagram of the 3D convolutional neural network structure combining the channel attention and spatial attention modules.
Figure 2 is a schematic diagram of the visual model analysis and training process.
Figure 3 is a schematic diagram of the machine-vision-based auxiliary screening system for tic disorder.
Figure 4 is a schematic diagram of the fusion analysis and visualization module and a visualization example of screening results.
The specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
In view of the limitations of tic disorder screening and diagnosis, the present invention proposes a video-data-based auxiliary screening system for tic disorder, comprising a tic movement detection module, a health information collection and processing module, a visual data acquisition module and a fusion analysis module.
The visual data acquisition module collects the visual data required for analysis in two ways: first, by capturing real-time frontal facial video of the screened person with the camera equipment configured in the system; second, by importing previously recorded frontal video through a local upload interface. To allow the subsequent analysis to proceed smoothly, the collected video must be at least 60 seconds long, with no upper limit. The collected video data is input to the tic movement detection module.
The tic movement detection module comprises a data preprocessing module, a visual feature analysis module, a tic anomaly score generation module and a multi-instance strategy training module.
The data preprocessing module processes the video data collected by the visual data acquisition module into time-series image data suitable for the deep learning network. Specifically, the collected facial video is passed through the face detection algorithm OpenFace to locate the face region in each frame; environmental information unrelated to tic movements is removed from the original frames to focus on the screened person's facial tics, and the cropped face regions are saved in frame order as 128×128 images. During subsequent training, data augmentation such as random Gaussian noise, random color jitter, random rotation and random cropping increases the amount of training data and simulates the imaging quality changes, color changes, face orientation changes and lens distance changes that occur during video recording, strengthening feature extraction; the images are finally saved at 112×112 and input to the visual feature analysis module.
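As an illustration of this preprocessing and augmentation step, a minimal sketch in Python using PyTorch/torchvision; the specific noise level, jitter strengths and rotation angle are illustrative assumptions, since the text does not specify them:

```python
import torch
from torchvision import transforms

def add_gaussian_noise(img: torch.Tensor, std: float = 0.02) -> torch.Tensor:
    """Add random Gaussian noise to a float image tensor in [0, 1]."""
    return (img + std * torch.randn_like(img)).clamp(0.0, 1.0)

# Augmentation pipeline mirroring the description: random Gaussian noise,
# random color jitter, random rotation and random cropping of 128x128 face
# crops down to the 112x112 training size. Parameter values are assumptions.
train_transform = transforms.Compose([
    transforms.ToTensor(),                  # 128x128 face crop -> CxHxW float tensor
    transforms.Lambda(add_gaussian_noise),  # simulate imaging quality changes
    transforms.ColorJitter(0.2, 0.2, 0.2),  # simulate color changes
    transforms.RandomRotation(10),          # simulate face orientation changes
    transforms.RandomCrop(112),             # simulate lens distance / framing changes
])
```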
The visual feature analysis module performs video data feature analysis through a 3D convolutional neural network model based on 3D channel attention and 3D spatial attention. A 3D convolution kernel convolves over time-series data and can consider temporal and spatial features simultaneously, making it suitable for video analysis. Since different tic patients do not necessarily twitch in the same place, beyond extracting features of the whole face the model must pay particular attention to local tic-site features; the 3D CNN is therefore improved by combining a 3D channel attention module (3D-Channel Attention) and a 3D spatial attention module (3D-Spatial Attention) to enhance its visual feature extraction. As shown in Figure 1, the 3D CNN consists of five sequentially connected 3D convolution blocks: two ConvBlock-A (3D convolution combination A), each composed of one 3D convolutional layer and one max pooling layer, and three ConvBlock-B (3D convolution combination B), each composed of two 3D convolutional layers, one max pooling layer, a 3D channel attention module and a 3D spatial attention module.
The 3D channel attention module compresses the feature F_3D of size (Channel, Dimension, Height, Width), produced after convolution and pooling in the convolution block, into an average temporal feature F_3D′ of size (Channel, 1, 1, 1) via average pooling, then predicts the importance of each channel through a multi-layer perceptron MLP and a Sigmoid activation function to obtain the 3D channel attention feature F_3D-C, computed as:
F_3D-C = Sigmoid(MLP(F_3D′))
The 3D spatial attention module compresses the feature F_3D of size (Channel, Dimension, Height, Width), produced after convolution and pooling in the convolution block, into an average spatial feature F_3D″ of size (1, Dimension, Height, Width) via average pooling, then obtains the spatial attention feature F_3D-S through the Sigmoid activation function, computed as:
F_3D-S = Sigmoid(F_3D″)
To reduce model complexity and parameter computation, the 3D channel attention module and the 3D spatial attention module are connected into the 3D convolution block structure in parallel; each is multiplied element-wise with the preceding feature F_3D and the two products are added, giving the visual feature F_A output by ConvBlock-B as:
F_A = F_3D ⊗ F_3D-C + F_3D ⊗ F_3D-S
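A minimal PyTorch sketch of the two attention modules and their parallel fusion, assuming ⊗ is broadcast element-wise multiplication; the MLP hidden width (reduction ratio) is an assumption, as the text does not state it:

```python
import torch
import torch.nn as nn

class ChannelAttention3D(nn.Module):
    """Average-pool (C, D, H, W) to (C, 1, 1, 1), then MLP + Sigmoid -> F_3D-C."""
    def __init__(self, channels: int, reduction: int = 8):  # reduction is assumed
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, f3d: torch.Tensor) -> torch.Tensor:
        b, c = f3d.shape[:2]
        w = self.mlp(self.pool(f3d).view(b, c))      # per-channel importance
        return torch.sigmoid(w).view(b, c, 1, 1, 1)  # F_3D-C

class SpatialAttention3D(nn.Module):
    """Average over channels to (1, D, H, W), then Sigmoid -> F_3D-S."""
    def forward(self, f3d: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(f3d.mean(dim=1, keepdim=True))

def attention_fusion(f3d, f3dc, f3ds):
    # F_A = F_3D (x) F_3D-C + F_3D (x) F_3D-S, with broadcasting
    return f3d * f3dc + f3d * f3ds
```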
The tic anomaly score generation module feeds the visual feature F_A output by the visual feature analysis module into the tic anomaly score generation network for further analysis. This network consists of three fully connected layers with 512, 64 and 1 neurons respectively; the first two layers use ReLU activation and the last uses Sigmoid, finally producing the tic anomaly score used for subsequent learning and training.
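The 512-64-1 fully connected score head could look as follows; nn.LazyLinear is used because the flattened size of F_A depends on the convolutional output shape, which the text does not state:

```python
import torch.nn as nn

# Three fully connected layers with 512, 64 and 1 neurons; ReLU on the first
# two, Sigmoid on the last, yielding a tic anomaly score in [0, 1].
score_head = nn.Sequential(
    nn.Flatten(),
    nn.LazyLinear(512), nn.ReLU(),  # input size inferred from flattened F_A
    nn.Linear(512, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Sigmoid(),
)
```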
The multi-instance strategy training module trains the network model in the visual feature analysis module through a ranking loss function (Ranking Loss) based on the multi-instance learning (MIL) strategy.
Under the classical multi-instance approach, the model learns a classifier from a set of training bags, each bag composed of multiple training instances; a positive bag contains at least one positive instance, while all instances of a negative bag are negative. The video to be analyzed is treated as a bag in the multi-instance learning strategy and divided into consecutive, non-overlapping 16-frame time-series segments as the instances in the bag. The constructed 3D convolutional network performs feature learning on the time-series data of each instance, and the constructed tic anomaly score generation network produces each instance's score as its tic anomaly score. The score ranges from 0 to 1, where 0 means no tic movement and 1 means a tic movement; the magnitude of the score indicates the likelihood of a tic. The score of the highest-scoring instance represents the probability that a tic movement is present in the whole bag (i.e., the whole video).
In the model training phase, 200 minutes of frontal facial video in a natural state were collected in advance for each of the tic patient group and the normal control group, with every 1 minute forming an instance bag and every 16 frames forming an instance. The data set was randomly split into a 70% training set and a 30% test set; the training set is used for model training and the test set for model testing. As shown in Figure 2, each training step uses a pair of tic-group and control-group data, and the 3D CNN models on the control-group and tic-group video paths share model parameters. The tic group and the control group form a tic multi-instance bag and a control multi-instance bag respectively; the network in the visual feature analysis module yields the tic anomaly score sets {k_a} and {k_n} of all instances, from which the maximum anomaly scores in the tic bag and the control bag are computed. Because a tic movement persists for some length of time, the anomaly scores of the maximum and of the two instances on either side of it are taken to represent the duration stage of the most probable suspected tic in the tic bag and the control bag, and the average anomaly score over the duration stage represents the anomaly score of the most probable suspected tic, excluding brief events caused by ordinary actions such as blinking. The maximum probability of a suspected tic in the tic bag, k̄_a^max, and in the control bag, k̄_n^max, are computed as:
k̄_a^max = mean(k_a^{i−2}, …, k_a^{i+2}),  k̄_n^max = mean(k_n^{j−2}, …, k_n^{j+2})
where i is the i-th instance corresponding to the maximum anomaly score in the tic bag and j is the j-th instance corresponding to the maximum anomaly score in the control bag. During computation, if i−2 ≤ 0, i−1 ≤ 0, i+1 > N_a, i+2 > N_a, j−2 ≤ 0, j−1 ≤ 0, j+1 > N_n or j+2 > N_n occurs, the corresponding instance's anomaly score does not exist and is excluded from the mean; N_a is the number of instances in the tic bag and N_n the number of instances in the control bag.
The loss value L of each training step is computed through the ranking loss function, and the network parameters in the visual feature analysis module are updated through gradient descent and backpropagation; the ranking loss L_1 based on the multi-instance learning strategy is computed from k̄_a^max and k̄_n^max. In addition, since the instances in a bag are themselves temporally ordered, the tic anomaly scores of adjacent instances in the tic bag should be smooth, so a smoothing constraint term L_2 is added to the loss function. The final ranking loss function based on the multi-instance learning strategy is:
L = L_1 + λL_2
where λ is the penalty coefficient; the higher its value, the heavier the penalty on the smoothing constraint. The present invention uses λ = 0.5 for model training.
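A sketch of this training loss in PyTorch. The duration-stage averaging follows the description above; the hinge form of L_1 and the squared-difference form of L_2 are assumptions (standard choices for MIL ranking losses), since the explicit formulas are not reproduced in this text:

```python
import torch

def mil_ranking_loss(k_a: torch.Tensor, k_n: torch.Tensor, lam: float = 0.5):
    """k_a, k_n: anomaly scores of all instances in the tic bag and control bag."""
    def duration_mean(k: torch.Tensor) -> torch.Tensor:
        i = int(torch.argmax(k))                       # instance with the maximum score
        lo, hi = max(i - 2, 0), min(i + 3, k.numel())  # two instances on either side
        return k[lo:hi].mean()                         # average over the duration stage

    # L1: assumed hinge ranking loss separating the two bag-level maxima.
    l1 = torch.relu(1.0 - duration_mean(k_a) + duration_mean(k_n))
    # L2: assumed temporal smoothness on adjacent tic-bag instance scores.
    l2 = ((k_a[1:] - k_a[:-1]) ** 2).sum()
    return l1 + lam * l2                               # L = L1 + lambda * L2
```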
During model training, a high-resolution model performs well but trains slowly, while a low-resolution model performs worse but trains quickly. The multigrid training method from numerical analysis is therefore used: model parameters such as the batch size B, the number of sample frames K and the length H and width W of the video frame images form a parameter grid, and parameters are optimized from coarse-grained to fine-grained. The default values are B = 8, K = 16, H = 112, W = 112; during training, the (B, K, H, W) parameter-grid configurations are loaded into the model in sequence for iterative training, each parameter group lasting 2 epochs, for a total of 50 epochs.
To improve convergence efficiency, the learning rate Lr is iterated with an exponential decay function:
Lr = 0.95^epoch_t × lr
where epoch_t is the current training epoch and lr = 0.001 is the initial learning rate.
After the above network model has been trained, in the model testing phase each video to be analyzed is treated as a multi-instance bag and divided into instances of 16 frames each; each instance passes through the learned 3D CNN to obtain visual features and through the tic anomaly score generation network to obtain an anomaly score. The largest anomaly score among all instances serves as the overall tic anomaly score of the video; based on statistical probability, 0.5 is used as the threshold to judge whether a tic movement is present, and the anomaly scores of all instances form time-series data input to the fusion analysis module. The test results are shown in Table 1:
Table 1

 | Accuracy | Precision | Recall
---|---|---|---
Baseline | 0.7798 (±0.017) | 0.8368 (±0.032) | 0.7886 (±0.016)
Present invention | 0.9302* (±0.026) | 0.9144* (±0.040) | 0.9396* (±0.032)
The baseline method uses a model composed of an unmodified 3D convolutional neural network and a cross-entropy loss; * indicates a statistically significant difference between the results of the present invention and the baseline, demonstrating the effectiveness of the present invention for tic detection in video data.
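As an illustration of the testing phase described above, a sketch assuming a `model` callable that maps a 16-frame clip to a score in [0, 1]:

```python
import numpy as np

def screen_video(frames, model, clip_len=16, threshold=0.5):
    """Split preprocessed frames into consecutive non-overlapping 16-frame
    instances, score each, and threshold the maximum score at 0.5."""
    n_clips = len(frames) // clip_len
    scores = np.array([
        float(model(frames[t * clip_len:(t + 1) * clip_len]))
        for t in range(n_clips)
    ])
    video_score = scores.max()              # overall tic anomaly score of the video
    return video_score > threshold, scores  # verdict + time series for fusion
```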
The auxiliary screening system for tic disorder integrates health questionnaire data analysis and visual analysis. As shown in Figure 3, the health information collection and processing module collects, following the clinical diagnostic process, health information including demographic information, living habits, eating habits, family history and family observation records; specifically: gender (male 1, female 0), age, whether abnormal tic movements have been observed (yes 1, no 0), whether a family member has tic symptoms (yes 1, no 0), whether sleep is normal (yes 1, no 0), whether the subject goes to bed late (yes 1, no 0), whether the subject likes tea or coffee (yes 1, no 0), whether the subject exercises regularly (yes 1, no 0), and so on. Statistical distribution plots are drawn from this information, the collected data is converted to numerical values according to the coding in parentheses, and the result is input to the fusion analysis module.
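For example, the questionnaire coding described above might be implemented as follows; the field names are hypothetical, while the 1/0 coding follows the parenthesized values in the text:

```python
def encode_health_info(q: dict) -> list:
    """Convert a questionnaire dict to the numeric feature vector."""
    yes_no = lambda v: 1 if v else 0
    return [
        1 if q["gender"] == "male" else 0,    # male 1, female 0
        q["age"],
        yes_no(q["abnormal_tics_observed"]),  # yes 1, no 0
        yes_no(q["family_tic_history"]),
        yes_no(q["normal_sleep"]),
        yes_no(q["sleeps_late"]),
        yes_no(q["likes_tea_or_coffee"]),
        yes_no(q["exercises_regularly"]),
    ]
```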
As shown in Figure 4, the fusion analysis module performs fusion analysis on the numerically processed health information data and the time-series data formed by the anomaly scores. In the data fusion stage, the health information data and the anomaly-score time series of the same individual X are each passed through a classification model to obtain recognition probabilities of tic or normal, and the two results are then additively fused with the Bayesian fusion rule. A Gaussian-kernel SVM classifier is trained with the numerically processed health information data to obtain the recognition probability of each class i, where i is tic or normal; the time series formed by the anomaly scores contains temporal information, so a single-layer LSTM network with 128 neurons followed by a Softmax function is built for training and analysis, yielding the recognition probability of each class i, where i is tic or normal. Since the two sets of features are mutually independent, the additive fusion rule of Bayesian theory is used to compute the overall recognition probability, in which the class prior probability P_x takes the value 0.5 and the total number of classes M takes the value 2; the final judgment (tic or normal) is the class corresponding to the maximum overall recognition probability. A peak detection algorithm yields the number of tic peaks and their time points, and tracing the frame sequence back to the original video locates the peak times, giving the tic occurrence times; thresholding selects the interval before and after each tic peak, locating the duration of each tic. From the anomaly scores, tic occurrence times and durations, the tic anomaly score curve and tic action heat map of the analyzed video are drawn, and the per-minute tic frequency and duration are computed from the original video length, serving as a reference for the severity of the patient's tics. The final judgment of the fusion analysis module, the tic anomaly score curve, the tic action heat map and the statistical distribution plots of the health information together form visualized analysis results that give the patient suggestions for further examination and feedback on their own tics, and also give the doctor an overview of the patient's tics as auxiliary information for subsequent diagnosis and treatment.
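A sketch of this fusion and peak analysis stage using SciPy; the simple probability sum stands in for the Bayesian additive fusion rule, whose exact prior-weighted form is not reproduced in this text, and the frame rate and threshold handling are assumptions:

```python
import numpy as np
from scipy.signal import find_peaks

def fuse_and_locate(p_svm, p_lstm, scores, fps=25.0, clip_len=16, level=0.5):
    # Additive fusion of the two per-class probability vectors (tic, normal).
    fused = np.asarray(p_svm) + np.asarray(p_lstm)
    verdict = ("tic", "normal")[int(np.argmax(fused))]

    # Tic peaks and their time points, traced back to video time in seconds.
    scores = np.asarray(scores)
    peaks, _ = find_peaks(scores, height=level)
    onset_times = peaks * clip_len / fps

    # Duration: extent of the above-threshold interval around each peak.
    durations = []
    for p in peaks:
        lo, hi = p, p
        while lo > 0 and scores[lo - 1] > level:
            lo -= 1
        while hi < len(scores) - 1 and scores[hi + 1] > level:
            hi += 1
        durations.append((hi - lo + 1) * clip_len / fps)
    return verdict, onset_times, durations
```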
Implementation example:
The screened person first enters the health information collection and processing module through the system of the present invention and enters health data such as age, gender, disease history and living habits; they then record a 1–5 minute frontal video through the visual data acquisition module, or import a frontal video saved on a personal phone through the upload button. The system performs a preliminary check to judge whether the video meets the analysis requirements; once confirmed, the video data goes through preprocessing, video feature analysis and tic detection in the visual feature analysis module, yielding anomaly scores with temporal characteristics and tic detection results, and the fusion analysis module then gives the screening result. If the screening result is positive, the system prompts follow-up examination and diagnosis and supplies the visualized analysis results and tic clips for the clinician's reference; if the result is negative, it indicates that no tic abnormality was found, and the related test data is supplied for the clinician's reference.
The above embodiments serve to explain the present invention rather than to limit it; any modification or change made to the present invention within its spirit and the protection scope of the claims falls within the protection scope of the present invention.
Claims (10)
- A video-data-based auxiliary screening system for tic disorder, characterized in that the system comprises a tic movement detection module, a health information collection and processing module, a visual data acquisition module and a fusion analysis module; the visual data acquisition module collects facial video data of the screened person and inputs it to the tic movement detection module; the tic movement detection module comprises a data preprocessing module, a visual feature analysis module, a tic anomaly score generation module and a multi-instance strategy training module; the data preprocessing module processes the video data collected by the visual data acquisition module into time-series image data suitable for the deep learning network and inputs it to the visual feature analysis module; the visual feature analysis module performs video data feature analysis through a three-dimensional convolutional neural network model based on 3D channel attention and 3D spatial attention; the 3D CNN model has p sequentially connected convolution blocks containing one 3D convolutional layer and q convolution blocks containing two 3D convolutional layers; each of the q convolution blocks containing two 3D convolutional layers is connected in parallel to a 3D channel attention module and a 3D spatial attention module, which extract the 3D channel attention feature and 3D spatial attention feature of the feature map after convolution; the generated feature map is input to the tic anomaly score generation module composed of a fully connected network model to obtain the tic anomaly score value, and anomaly score threshold analysis judges whether a tic movement exists; at the same time, the anomaly score values form time-series data input to the fusion analysis module; the multi-instance strategy training module performs multi-instance learning strategy training of the network model in the visual feature analysis module based on control-group instances and tic-group instances, each obtained by extracting several segments of fixed consecutive frames from the respective video data; the tic anomaly scores of the different instances of the tic group and the control group are obtained through the visual feature analysis module, the loss value of each training step is computed based on the ranking loss function, and the network model parameters in the visual feature analysis module are updated; the health information collection and processing module collects and compiles the screened person's health information based on the clinical diagnostic process for tic disorder, converts the collected health information data into numerical form and inputs it to the fusion analysis module; the fusion analysis module computes, through classification models, the recognition probabilities of tic or normal from the numerically processed health information data and the time-series data formed by the anomaly scores, then performs additive fusion of the two results with the Bayesian additive fusion rule, taking the category corresponding to the maximum value as the judgment result; a peak detection algorithm yields the number of tic peaks and their time points, tracing the frame sequence back to the original video locates the peak times and gives the tic occurrence times, and thresholding selects the interval before and after each tic peak to locate the duration of each tic; from the anomaly score values, tic occurrence times and tic durations, the tic anomaly score curve and tic action heat map of the analyzed video are drawn, and the per-minute tic frequency and duration are computed from the original video length; the analysis results of the fusion analysis module provide the patient with suggestions for further examination and feedback on their own tics, and also provide the doctor with auxiliary screening information about the patient's tics.
- The video-data-based auxiliary screening system for tic disorder according to claim 1, characterized in that the data preprocessing module preprocesses video data as follows: the collected facial video data is passed through the face detection algorithm OpenFace to locate the face region in each video frame, environmental information unrelated to tic movements is removed from the original frames to focus on the screened person's facial tic movements, and the processed images are saved.
- The video-data-based auxiliary screening system for tic disorder according to claim 1, characterized in that the 3D channel attention module compresses the feature map F_3D of size (Channel, Dimension, Height, Width) obtained after convolution and pooling into an average temporal feature F_3D′ of size (Channel, 1, 1, 1) via average pooling, and predicts the importance of each channel through the multi-layer perceptron MLP and the Sigmoid activation function to obtain the 3D channel attention feature F_3D-C, computed as: F_3D-C = Sigmoid(MLP(F_3D′)).
- The video-data-based auxiliary screening system for tic disorder according to claim 3, characterized in that the 3D spatial attention module compresses the feature F_3D of size (Channel, Dimension, Height, Width) obtained after convolution and pooling into an average spatial feature F_3D″ of size (1, Dimension, Height, Width) via average pooling, and then obtains the spatial attention feature F_3D-S through the Sigmoid activation function, computed as: F_3D-S = Sigmoid(F_3D″).
- The video-data-based auxiliary screening system for tic disorder according to claim 1, characterized in that the training process of the multi-instance strategy training module is as follows: the tic-group instances and the control-group instances form a tic multi-instance bag and a control multi-instance bag respectively; the network model in the visual feature analysis module yields the tic anomaly score sets {k_a} and {k_n} of all instances of the tic group and the control group, from which the maximum anomaly scores in the tic bag and the control bag are computed; the anomaly scores of the maximum and of the two instances on either side of it are taken to represent the duration stage of the most probable suspected tic movement in the tic bag and the control bag, and the average anomaly score over the duration stage represents the anomaly score of the most probable suspected tic movement; the maximum probability of a suspected tic in the tic bag and in the control bag are computed as k̄_a^max = mean(k_a^{i−2}, …, k_a^{i+2}) and k̄_n^max = mean(k_n^{j−2}, …, k_n^{j+2}), where i is the i-th instance corresponding to the maximum anomaly score in the tic bag and j is the j-th instance corresponding to the maximum anomaly score in the control bag; during computation, if i−2 ≤ 0, i−1 ≤ 0, i+1 > N_a, i+2 > N_a, j−2 ≤ 0, j−1 ≤ 0, j+1 > N_n or j+2 > N_n occurs, the corresponding instance's anomaly score does not exist and is not included in the mean; N_a is the number of instances in the tic bag and N_n is the number of instances in the control bag; the loss value L of each training step is computed through the ranking loss function, and the network parameters in the visual feature analysis module are updated through gradient descent and backpropagation; the ranking loss L_1 is based on the multi-instance learning strategy, and a smoothing constraint term L_2 is added to the loss function; the ranking loss function L based on the multi-instance learning strategy is expressed as: L = L_1 + λL_2, where λ is the penalty coefficient; the higher its value, the heavier the penalty on the smoothing constraint.
- The video-data-based auxiliary screening system for tic disorder according to claim 6, characterized in that during training of the multi-instance strategy training module, the learning rate Lr is iterated with an exponential decay function: Lr = 0.95^epoch_t × lr, where epoch_t is the current training epoch and lr = 0.001 is the initial learning rate.
- The video-data-based auxiliary screening system for tic disorder according to claim 6, characterized in that during training of the multi-instance strategy training module, the video data of the control-group instances and the video data of the tic-group instances are augmented by adding random Gaussian noise, random color jitter, random rotation and random cropping, simulating the imaging quality changes, color changes, face orientation changes and lens distance changes that occur during video data collection.
- The video-data-based auxiliary screening system for tic disorder according to claim 1, characterized in that the health information collected by the health information collection and processing module includes demographic information, living habits, eating habits, family history and family observation records.
- The video-data-based auxiliary screening system for tic disorder according to claim 1, characterized in that in the fusion analysis module, a Gaussian-kernel SVM classifier is trained with the health information data numerically processed by the health information collection and processing module to obtain a recognition probability; the data output by the visual feature analysis module contains temporal information, and an LSTM network and a Softmax function are used for training and analysis to obtain a recognition probability.
Applications Claiming Priority (2)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202111594285.2 | 2021-12-24 | |
CN202111594285.2A | 2021-12-24 | 2021-12-24 | Auxiliary screening system for tic disorder based on video data
Publications (1)

Publication Number | Publication Date
---|---
WO2023116736A1 | 2023-06-29
Family
ID=79734204
Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
PCT/CN2022/140523 | Auxiliary screening system for tic disorder based on video data | 2021-12-24 | 2022-12-21
Country Status (2)

Country | Link
---|---
CN (1) | CN113990494B
WO | WO2023116736A1
Families Citing this family (6)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN113990494B | 2021-12-24 | 2022-03-25 | 浙江大学 | Auxiliary screening system for tic disorder based on video data
CN114496235B | 2022-04-18 | 2022-07-19 | 浙江大学 | Auxiliary dry-weight adjustment system for hemodialysis patients based on deep reinforcement learning
CN115105075A | 2022-05-17 | 2022-09-27 | 清华大学 | Tic disorder detection method and apparatus
CN115714016B | 2022-11-16 | 2024-01-19 | 内蒙古卫数数据科技有限公司 | Method for improving the brucellosis screening rate based on machine learning
CN117437678A | 2023-11-01 | 2024-01-23 | 烟台持久钟表有限公司 | Frontal face duration statistics method, system, apparatus and storage medium
CN117807154B | 2024-02-28 | 2024-04-30 | 成都菲宇科技有限公司 | Time-series data visualization method, device and medium for display systems
Citations (5)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
US20110301447A1 | 2010-06-07 | 2011-12-08 | Sti Medical Systems, Llc | Versatile video interpretation, visualization, and management system
CN110516611A | 2019-08-28 | 2019-11-29 | 中科人工智能创新技术研究院(青岛)有限公司 | Autism detection system and autism detection apparatus
CN111528859A | 2020-05-13 | 2020-08-14 | 浙江大学人工智能研究所德清研究院 | Children ADHD screening and assessment system based on multimodal deep learning
CN111870253A | 2020-07-27 | 2020-11-03 | 上海大学 | Tic disorder condition monitoring method and system based on vision and speech fusion
CN113990494A | 2021-12-24 | 2022-01-28 | 浙江大学 | Auxiliary screening system for tic disorder based on video data
Family Cites Families (4)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
US9530452B2 | 2013-02-05 | 2016-12-27 | Alc Holdings, Inc. | Video preview creation with link
CN214128817U | 2020-08-17 | 2021-09-07 | 浙江大学 | Fixing device for immobilizing the limbs of tic disorder patients
CN113066576A | 2021-05-12 | 2021-07-02 | 北京大学深圳医院 | Lung cancer screening method based on a 3D mask region-based convolutional neural network
CN113611411B | 2021-10-09 | 2021-12-31 | 浙江大学 | Auxiliary decision system for physical examinations based on false-negative sample identification
- 2021-12-24: Application CN202111594285.2A filed in China; granted as CN113990494B (status: Active)
- 2022-12-21: International application PCT/CN2022/140523 filed; published as WO2023116736A1 (status: unknown)
Cited By (3)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN118155835A | 2024-05-11 | 2024-06-07 | 成都中医药大学附属医院(四川省中医医院) | Tic disorder detection method, system and storage medium based on contrastive learning
CN118172800A | 2024-05-15 | 2024-06-11 | 沈阳新维盛科生物科技有限公司 | Improved laboratory animal behavior image processing method
CN118172800B | 2024-05-15 | 2024-08-16 | 沈阳医学院 | Improved laboratory animal behavior image processing method
Also Published As
Publication number | Publication date |
---|---|
CN113990494A | 2022-01-28
CN113990494B | 2022-03-25
Similar Documents

Publication | Title
---|---
WO2023116736A1 | Auxiliary screening system for tic disorder based on video data
WO2022042122A1 | EEG signal classification method, classification model training method, apparatus and medium
CN111990989A | ECG signal recognition method based on generative adversarial and convolutional recurrent networks
Haidar et al. | Convolutional neural networks on multiple respiratory channels to detect hypopnea and obstructive apnea events
CN110619322A | Multi-lead abnormal ECG signal recognition method and system based on multi-stream convolutional recurrent neural networks
Wang et al. | A novel multi-scale dilated 3D CNN for epileptic seizure prediction
Chen et al. | A new deep learning framework based on blood pressure range constraint for continuous cuffless BP estimation
Kuo et al. | Automatic sleep staging based on a hybrid stacked LSTM neural network: verification using large-scale dataset
CN115530847A | Automatic sleep staging method for EEG signals based on multi-scale attention
CN114732409A | Emotion recognition method based on EEG signals
CN115336973A | Construction method of a sleep staging system based on a self-attention mechanism and single-lead ECG signals, and sleep staging system
Shu et al. | Data augmentation for seizure prediction with generative diffusion model
Taghizadegan et al. | Prediction of obstructive sleep apnea using ensemble of recurrence plot convolutional neural networks (RPCNNs) from polysomnography signals
Luo et al. | Exploring adaptive graph topologies and temporal graph networks for EEG-based depression detection
Prabha et al. | A Novel Analysis and Detection of Autism Spectrum Disorder in Artificial Intelligence Using Hybrid Machine Learning
Wang et al. | Pay attention and watch temporal correlation: a novel 1-D convolutional neural network for ECG record classification
CN118044813B | Mental health status assessment method and system based on multi-task learning
Mohammadi et al. | Two-step deep learning for estimating human sleep pose occluded by bed covers
Sangeetha et al. | A CNN based similarity learning for cardiac arrhythmia prediction
Tyagi et al. | Systematic review of automated sleep apnea detection based on physiological signal data using deep learning algorithm: a meta-analysis approach
CN115349821A | Sleep staging method and system based on multimodal physiological signal fusion
Melinda et al. | A Novel Autism Spectrum Disorder Children Dataset Based on Thermal Imaging
Bao et al. | A Feature Fusion Model Based on Temporal Convolutional Network for Automatic Sleep Staging Using Single-Channel EEG
CN117958759B | Multi-channel sleep monitoring system for multiple populations
Bakhtyari et al. | Combination of ConvLSTM and attention mechanism to diagnose ADHD based on EEG signals
Legal Events

Date | Code | Title | Description
---|---|---|---
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22910048; Country of ref document: EP; Kind code of ref document: A1
 | NENP | Non-entry into the national phase | Ref country code: DE