WO2023116736A1 - Auxiliary screening system for tic disorder based on video data - Google Patents

Auxiliary screening system for tic disorder based on video data

Info

Publication number
WO2023116736A1
Authority
WO
WIPO (PCT)
Prior art keywords
tic
module
video data
instance
twitch
Prior art date
Application number
PCT/CN2022/140523
Other languages
English (en)
French (fr)
Inventor
李劲松
周天舒
田雨
吴君雅
Original Assignee
浙江大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浙江大学 (Zhejiang University)
Publication of WO2023116736A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/0059 Measuring for diagnostic purposes; Identification of persons using light, e.g. diagnosis by transillumination, diascopy, fluorescence
    • A61B5/0077 Devices for viewing the surface of the body, e.g. camera, magnifying lens
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/103 Measuring devices for testing the shape, pattern, colour, size or movement of the body or parts thereof, for diagnostic purposes
    • A61B5/11 Measuring movement of the entire body or parts thereof, e.g. head or hand tremor or mobility of a limb
    • A61B5/1126 Measuring movement of the entire body or parts thereof, e.g. head or hand tremor or mobility of a limb using a particular sensing technique
    • A61B5/1128 Measuring movement of the entire body or parts thereof, e.g. head or hand tremor or mobility of a limb using a particular sensing technique using image analysis
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/40 Detecting, measuring or recording for evaluating the nervous system
    • A61B5/4076 Diagnosing or monitoring particular conditions of the nervous system
    • A61B5/4094 Diagnosing or monitoring seizure diseases, e.g. epilepsy
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/72 Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7235 Details of waveform analysis
    • A61B5/7264 Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Definitions

  • The invention relates to the technical field of medical and health information, and in particular to an auxiliary screening system for tic disorder based on video data.
  • The invention uses a three-dimensional convolutional neural network from the field of deep learning to detect abnormal tic movements in frontally recorded videos and, combined with a comprehensive analysis of the health information gathered in clinical outpatient practice, proposes a video-based tic movement detection method and an auxiliary screening system for tic disorder.
  • The doctor needs to spend a long time observing and confirming the patient's tic characteristics, and needs to ask the patient and family members about recent and past tic characteristics, eating habits, living habits, family medical history, and so on.
  • Patients may suppress their tics when they enter a new environment or come into contact with strangers, which hinders accurate diagnosis and evaluation of the disease.
  • Identifying tic symptoms relies mainly on the complex clinical diagnosis process for tic disorder, and the symptoms of many tic patients are not easy to detect.
  • Existing tic detection methods rely on deep brain stimulation or wearable devices to collect data, which makes data collection relatively complex.
  • The purpose of the present invention is to address the deficiencies of the prior art by proposing a video data-based auxiliary screening system for tic disorder. The system uses video data to identify tic symptoms automatically with a three-dimensional convolutional neural network based on multi-instance learning. Three-dimensional channel attention and three-dimensional spatial attention modules optimize the features learned by the network, and a temporal smoothing constraint optimizes the loss function, which improves the model's ability to detect tics. Combined with health information questionnaire data derived from the clinical consultation, this forms the auxiliary screening system for tic disorder, which improves the efficiency of screening and identification and, by avoiding direct contact, reduces the tension and discomfort patients feel in unfamiliar environments.
  • The present invention simplifies the most time-consuming step, symptom observation, through video data collection and tic detection. Through data fusion analysis and visualization, it gives the people being screened a preliminary understanding of the condition and gives doctors a reference and basis for follow-up diagnosis and treatment.
  • A video data-based auxiliary screening system for tic disorder, comprising a tic movement detection module, a health information collection and processing module, a visual data acquisition module and a fusion analysis module.
  • The visual data acquisition module collects facial video data of the subject being screened and inputs it to the tic movement detection module.
  • The tic movement detection module comprises a data preprocessing module, a visual feature analysis module, a tic abnormality score generation module and a multi-instance strategy training module.
  • The data preprocessing module processes the video data collected by the visual data acquisition module into time-series image data suitable for the deep learning network and inputs it to the visual feature analysis module.
  • The visual feature analysis module performs video data feature analysis through a three-dimensional convolutional neural network model based on three-dimensional channel attention and three-dimensional spatial attention. The network model comprises convolutional blocks containing one three-dimensional convolutional layer and q convolutional blocks containing two three-dimensional convolutional layers. Each of the q two-layer blocks is connected in parallel to a three-dimensional channel attention module and a three-dimensional spatial attention module, which extract the three-dimensional channel attention feature and the three-dimensional spatial attention feature of the feature map after the convolution calculation. The resulting feature map is input into the tic abnormality score generation module, a fully connected network model, to obtain the tic abnormality score; comparing the score with a threshold determines whether a tic movement is present. The abnormality scores also form time-series data that is input to the fusion analysis module.
  • The multi-instance strategy training module trains the network model in the visual feature analysis module with a multi-instance learning strategy, using instances of the control group and instances of the tic group. Each instance is a fixed number of consecutive frames extracted from the corresponding video data. The visual feature analysis module produces the tic abnormality scores of the instances of both groups, the loss of each training step is calculated with a ranking loss function, and the network model parameters in the visual feature analysis module are updated.
  • The health information collection and processing module collects and tallies the subject's health information following the clinical diagnosis process for tic disorder, converts the collected data into numerical values, and inputs it to the fusion analysis module.
  • The fusion analysis module passes the numerically encoded health information data and the time-series data formed by the abnormality scores through their respective classification models to obtain recognition probabilities for tic or normal, then fuses the two results with the Bayesian additive fusion rule and takes the category with the maximum value as the judgment result. A peak detection algorithm gives the number of tic peaks and their positions in the time series; tracing each peak back through the frame sequence to the original video gives the time at which the tic occurred. A threshold applied to the interval before and after each peak locates the duration of each tic. From the abnormality scores, tic occurrence times and tic durations, the module draws the abnormality score curve and the tic movement heat map for the analyzed video, and calculates the tic frequency per minute and the tic duration from the length of the original video. The analysis results of the fusion analysis module provide patients with suggestions for the next step of examination and feedback information on their own
  • The data preprocessing module preprocesses the video data as follows: the face detection algorithm OpenFace locates the face region in each frame of the collected facial video, environmental information unrelated to tic movements is removed from the original frame so that the analysis focuses on the subject's facial tic movements, and the processed image is saved.
  • The three-dimensional channel attention module compresses the feature map $F_{3D}$ of size (Channel, Dimension, Height, Width), obtained after convolution and pooling, into an average temporal feature $F_{3D}'$ of size (Channel, 1, 1, 1). A multi-layer perceptron (MLP) and the Sigmoid activation function then predict the importance of each channel, giving the three-dimensional channel attention feature $F_{3D\text{-}C}$:
  • $F_{3D\text{-}C} = \mathrm{Sigmoid}(\mathrm{MLP}(F_{3D}'))$.
  • The three-dimensional spatial attention module compresses the feature $F_{3D}$ of size (Channel, Dimension, Height, Width), obtained after convolution and pooling, into an average spatial feature $F_{3D}''$ of size (1, Dimension, Height, Width) through average pooling. The Sigmoid activation function then gives the spatial attention feature $F_{3D\text{-}S}$:
  • $F_{3D\text{-}S} = \mathrm{Sigmoid}(F_{3D}'')$.
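The two attention computations can be illustrated with a minimal NumPy sketch. It follows the shapes given above; the MLP structure (one hidden layer with ReLU and half as many hidden units as channels), the random weights and all function names are assumptions made for illustration, since the text does not specify them.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention_3d(f3d, w1, w2):
    """f3d has shape (C, D, H, W); returns channel weights of shape (C, 1, 1, 1)."""
    pooled = f3d.mean(axis=(1, 2, 3))       # average over D, H, W -> (C,)
    hidden = np.maximum(w1 @ pooled, 0.0)   # assumed MLP hidden layer with ReLU
    return sigmoid(w2 @ hidden).reshape(-1, 1, 1, 1)

def spatial_attention_3d(f3d):
    """f3d has shape (C, D, H, W); returns spatial weights of shape (1, D, H, W)."""
    pooled = f3d.mean(axis=0, keepdims=True)  # average over channels
    return sigmoid(pooled)

rng = np.random.default_rng(0)
C, D, H, W = 8, 16, 7, 7
f3d = rng.standard_normal((C, D, H, W))
w1 = rng.standard_normal((C // 2, C)) * 0.1   # hypothetical MLP weights
w2 = rng.standard_normal((C, C // 2)) * 0.1

f_c = channel_attention_3d(f3d, w1, w2)
f_s = spatial_attention_3d(f3d)
print(f_c.shape, f_s.shape)  # (8, 1, 1, 1) (1, 16, 7, 7)
```

Both outputs broadcast against the original (C, D, H, W) feature, which is what allows them to be applied to it as weights.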
  • The three-dimensional channel attention module and the three-dimensional spatial attention module are connected in parallel to each convolutional block containing two three-dimensional convolutional layers. Their outputs are each multiplied with the feature $F_{3D}$ and the results are added to obtain the output feature $F_A$, which is thus calculated from $F_{3D}$, $F_{3D\text{-}C}$ and $F_{3D\text{-}S}$.
  • The training process of the multi-instance strategy training module is as follows: the instances of the tic group and the instances of the control group form the tic multi-instance bag and the control multi-instance bag, respectively.
  • The tic abnormality score sets $\{k_a\}$ and $\{k_n\}$ of all instances in the tic group and the control group are obtained, and the maximum abnormality score in the tic multi-instance bag and in the control multi-instance bag is calculated.
  • Here $i$ is the index of the instance with the maximum abnormality score in the tic multi-instance bag, and $j$ is the index of the instance with the maximum abnormality score in the control multi-instance bag.
  • If a neighboring instance does not exist, its abnormality score is not included in the calculation of the mean.
  • $N_a$ is the number of instances in the tic multi-instance bag, and $N_n$ is the number of instances in the control multi-instance bag.
  • $m$ indexes the instances of the tic multi-instance bag.
  • The ranking loss function $L$ is based on the multi-instance learning strategy.
  • An exponential decay function is used to update the learning rate Lr during training, where epoch_t is the current training round.
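The decay expression itself is not reproduced in this text. A common form of exponential decay, shown here only as an assumed illustration, multiplies an initial rate by a fixed factor once per round; `lr_initial` and `decay_rate` are hypothetical names and values, not taken from the patent.

```python
def decayed_learning_rate(lr_initial, decay_rate, epoch_t):
    """Assumed form: Lr = lr_initial * decay_rate ** epoch_t."""
    return lr_initial * decay_rate ** epoch_t

# Hypothetical settings: start at 0.01 and halve the rate every round.
schedule = [decayed_learning_rate(0.01, 0.5, t) for t in range(4)]
print(schedule)
```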
  • During training, the video data of the control group instances and the tic group instances is augmented with random Gaussian noise, random color jitter, random rotation and random cropping, which simulates the changes in image quality, color, face orientation and lens distance that occur during video data collection.
  • The health information collected by the health information collection and processing module includes demographic information, living habits, eating habits, family history and family observation records.
  • A Gaussian-kernel SVM classifier is trained on the health information data processed by the health information collection and processing module to obtain a recognition probability. The data output by the visual feature analysis module contains time-series information, so an LSTM network with a Softmax function is trained on it to obtain a recognition probability.
  • The present invention collects video data in a non-implanted, non-wearable manner, which is convenient; camera equipment is widely available, and the system is highly portable.
  • The present invention analyzes and detects tic movements from video data, so patients do not need to meet the doctor face to face. This reduces the tension and discomfort patients feel in unfamiliar environments and lets their real condition show more clearly.
  • The tic screening results can educate patients and parents about the condition and give doctors a reference for disease assessment and management.
  • The present invention can perform remote tic recognition and detection over a communication network, reducing the number of visits patients and parents make to specialized hospitals and the associated time and travel costs.
  • Figure 1 is a schematic diagram of the three-dimensional convolutional neural network structure of the combined channel attention and spatial attention modules.
  • Figure 2 is a schematic diagram of the visual model analysis training process.
  • Figure 3 is a schematic diagram of the auxiliary screening system for tic disorder based on machine vision.
  • Figure 4 is a schematic diagram of a fusion analysis and visualization module and a visualization example of screening results.
  • The present invention proposes an auxiliary screening system for tic disorder based on video data, which includes a tic movement detection module, a health information collection and processing module, a visual data acquisition module and a fusion analysis module.
  • The visual data acquisition module collects the visual data the system needs for analysis in two ways: real-time facial video of the subject captured by the camera equipment configured in the system, or a frontal video that the subject has previously recorded and saved, submitted through the local upload interface. To ensure that the subsequent analysis can proceed, the collected video must be at least 60 seconds long; there is no upper limit.
  • The collected video data is input to the tic movement detection module.
  • The tic movement detection module includes a data preprocessing module, a visual feature analysis module, a tic abnormality score generation module and a multi-instance strategy training module.
  • The data preprocessing module processes the video data collected by the visual data acquisition module into time-series image data suitable for the deep learning network. Specifically, the face detection algorithm OpenFace locates the face region in each frame of the collected facial video; environmental information unrelated to tic movements is removed from the original frame so that the analysis focuses on the subject's facial tic movements; and the face region is cropped and saved, in frame order, as 128*128 images.
  • During training, data augmentation methods such as random Gaussian noise, random color jitter, random rotation and random cropping increase the amount of data and simulate the changes in image quality, color, face orientation and lens distance that occur during video recording, which strengthens feature extraction. The images are finally saved at a size of 112*112 and input to the visual feature analysis module.
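The augmentation step can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: the noise level, the jitter range and the function name are hypothetical, random rotation is left out because it needs an image library, and the same crop window is applied to every frame of a clip so that the frames stay aligned over time.

```python
import numpy as np

def augment_clip(clip, rng, noise_std=0.02, jitter=0.1, out_size=112):
    """clip: float array (T, H, W, 3) with values in [0, 1], e.g. 16 frames of 128*128."""
    _, h, w, _ = clip.shape
    # Random Gaussian noise simulates changes in image quality.
    out = clip + rng.normal(0.0, noise_std, size=clip.shape)
    # A random per-channel gain simulates color changes.
    gain = 1.0 + rng.uniform(-jitter, jitter, size=(1, 1, 1, 3))
    out = np.clip(out * gain, 0.0, 1.0)
    # A random 112*112 crop simulates changes in lens distance and framing.
    top = int(rng.integers(0, h - out_size + 1))
    left = int(rng.integers(0, w - out_size + 1))
    return out[:, top:top + out_size, left:left + out_size, :]

rng = np.random.default_rng(0)
clip = rng.uniform(0.0, 1.0, size=(16, 128, 128, 3))
augmented = augment_clip(clip, rng)
print(augmented.shape)  # (16, 112, 112, 3)
```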
  • The visual feature analysis module performs video data feature analysis through a three-dimensional convolutional neural network model based on three-dimensional channel attention and three-dimensional spatial attention. The three-dimensional convolution kernel convolves time-series data and therefore considers temporal and spatial features at the same time, which suits video data analysis. Because the body parts affected differ between tic patients, the model must attend to local tic features in addition to features of the whole face. The three-dimensional convolutional neural network is therefore improved with three-dimensional channel attention and three-dimensional spatial attention (3D-Spatial Attention) to strengthen the model's ability to extract visual features.
  • The three-dimensional convolutional neural network is composed of five three-dimensional convolutional blocks connected in sequence: two ConvBlock-A (three-dimensional convolution combination A) and three ConvBlock-B (three-dimensional convolution combination B), the latter each consisting of two three-dimensional convolutional layers, one max pooling layer, a three-dimensional channel attention module and a three-dimensional spatial attention module.
  • The three-dimensional channel attention module compresses the feature $F_{3D}$ of size (Channel, Dimension, Height, Width), obtained after convolution and pooling in the convolutional block, into an average temporal feature $F_{3D}'$ of size (Channel, 1, 1, 1). A multi-layer perceptron (MLP) and the Sigmoid activation function then predict the importance of each channel, giving the three-dimensional channel attention feature $F_{3D\text{-}C} = \mathrm{Sigmoid}(\mathrm{MLP}(F_{3D}'))$.
  • The three-dimensional spatial attention module compresses the feature $F_{3D}$ of size (Channel, Dimension, Height, Width), obtained after convolution and pooling in the convolutional block, into an average spatial feature $F_{3D}''$ of size (1, Dimension, Height, Width) through average pooling. The Sigmoid activation function then gives the spatial attention feature $F_{3D\text{-}S} = \mathrm{Sigmoid}(F_{3D}'')$.
  • The three-dimensional channel attention module and the three-dimensional spatial attention module are connected to the three-dimensional convolutional block in parallel; their outputs are each multiplied with the preceding feature $F_{3D}$ and the results are added.
  • The visual feature $F_A$ output by ConvBlock-B is calculated from $F_{3D}$, $F_{3D\text{-}C}$ and $F_{3D\text{-}S}$.
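The exact expression for $F_A$ is not reproduced in this text. One reading of "multiplied and added" is that each attention map weights $F_{3D}$ by broadcasting and the two weighted features are summed; the sketch below shows that reading only as an assumption, with a hypothetical function name and toy values.

```python
import numpy as np

def fuse_attention(f3d, f_c, f_s):
    """Assumed combination: F_A = F_3D * F_3D-C + F_3D * F_3D-S, with broadcasting."""
    return f3d * f_c + f3d * f_s

f3d = np.ones((4, 2, 3, 3))            # (Channel, Dimension, Height, Width)
f_c = np.full((4, 1, 1, 1), 0.5)       # channel attention weights
f_s = np.full((1, 2, 3, 3), 0.25)      # spatial attention weights
f_a = fuse_attention(f3d, f_c, f_s)
print(f_a.shape, float(f_a[0, 0, 0, 0]))  # (4, 2, 3, 3) 0.75
```

Another plausible reading keeps a residual term and adds $F_{3D}$ itself to the sum; the text here is not specific enough to decide between the two.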
  • The tic abnormality score generation module passes the visual feature $F_A$ output by the visual feature analysis module to the abnormality score generation network for further analysis.
  • The tic abnormality score generation network consists of three fully connected layers with 512, 64 and 1 neurons. The first two layers use ReLU activation and the last layer uses Sigmoid activation; the network outputs the tic abnormality score used in subsequent learning and training.
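A forward pass through such a score network can be sketched in NumPy. The layer widths (512, 64, 1) and activations follow the description; the input feature length of 256, the random initialization and the function names are assumptions for illustration, and no training is shown.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_score_head(in_dim, rng):
    """Random (weight, bias) pairs for fully connected layers of width 512, 64 and 1."""
    sizes = [in_dim, 512, 64, 1]
    return [(rng.standard_normal((o, i)) * np.sqrt(1.0 / i), np.zeros(o))
            for i, o in zip(sizes[:-1], sizes[1:])]

def score_head(feature, params):
    """Maps a flattened visual feature to an abnormality score in (0, 1)."""
    (w1, b1), (w2, b2), (w3, b3) = params
    x = relu(w1 @ feature.ravel() + b1)   # first layer, ReLU
    x = relu(w2 @ x + b2)                 # second layer, ReLU
    return float(sigmoid(w3 @ x + b3)[0])  # last layer, Sigmoid

rng = np.random.default_rng(0)
params = init_score_head(256, rng)         # 256 is a hypothetical feature length
score = score_head(rng.standard_normal(256), params)
print(score)
```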
  • The multi-instance strategy training module trains the network model in the visual feature analysis module with a ranking loss function (Ranking Loss) based on the multi-instance learning (Multi-Instance Learning, MIL) strategy.
  • In multi-instance learning, the model learns a classifier from a set of training bags. Each bag consists of multiple training instances; a positive bag contains at least one positive instance, and every instance in a negative bag is negative.
  • The video to be analyzed is treated as a bag in the multi-instance learning strategy, and the video is divided into consecutive, non-overlapping 16-frame segments that serve as the instances in the bag.
  • The three-dimensional convolutional neural network learns features from each instance, and the tic abnormality score generation network produces a score for each instance as its tic abnormality score.
  • The score ranges from 0 to 1, where 0 means no tic movement and 1 means a tic movement; the magnitude of the score indicates how likely a tic movement is.
  • The highest score among all instances represents the probability that a tic movement is present in the whole bag (that is, the whole video).
  • In the model training phase, 200 minutes of frontal face video data were collected in advance, in a natural state, for the tic patient group and the normal control group. Each 1-minute segment is used as one bag, and every 16 frames form one instance.
  • The data set is randomly divided into a training set (70%) and a test set (30%).
  • The training set is used for model training, and the test set is used for model testing.
  • Each training step uses two sets of data, one from the tic group and one from the control group.
  • The three-dimensional convolutional neural network models on the control group video path and the tic group video path share their parameters.
  • The instances of the tic group and the control group form the tic multi-instance bag and the control multi-instance bag, respectively.
  • The tic abnormality score sets $\{k_a\}$ and $\{k_n\}$ of all instances in the tic group and the control group are obtained, and the maximum abnormality score in the tic multi-instance bag and in the control multi-instance bag is calculated.
  • The abnormality scores of the two instances immediately before and after the maximum-score instance are then taken in both the tic multi-instance bag and the control multi-instance bag. These represent the continuation stage of the most probable suspected tic movement in each bag, and the mean abnormality score over the continuation stage is used as the abnormality score of the most probable suspected tic movement, which excludes short-lived actions caused by common movements such as blinking. The maximum probability of suspected tic action in the
  • Here $i$ is the index of the instance with the maximum abnormality score in the tic multi-instance bag, and $j$ is the index of the instance with the maximum abnormality score in the control multi-instance bag.
  • If a neighboring instance does not exist, its abnormality score is not included in the calculation of the mean.
  • $N_a$ is the number of instances in the tic multi-instance bag, and $N_n$ is the number of instances in the control multi-instance bag.
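The neighbor-averaged maximum can be sketched in plain Python. The sketch assumes that the continuation stage consists of the maximum-score instance plus its immediate predecessor and successor, and it leaves out any neighbor that does not exist; the function name is hypothetical.

```python
def smoothed_max_score(scores):
    """Mean of the maximum score and its existing immediate neighbors."""
    i = max(range(len(scores)), key=scores.__getitem__)   # index of the maximum
    window = [scores[k] for k in (i - 1, i, i + 1) if 0 <= k < len(scores)]
    return sum(window) / len(window)

# An isolated spike (a blink, for example) is pulled down by its low neighbors,
# while a sustained movement keeps a high value.
print(smoothed_max_score([0.1, 0.9, 0.1]))
print(smoothed_max_score([0.8, 0.9, 0.85]))
```

When the maximum sits at the first or last instance of the bag, only one neighbor exists and the mean is taken over two values.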
  • The penalty coefficient weights the smoothness constraint in the loss: the higher its value, the heavier the penalty.
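The loss expression itself is not reproduced in this text. As an assumed illustration only, the sketch below uses a hinge-style ranking term that pushes the tic bag score above the control bag score, plus a temporal smoothness term over adjacent tic instance scores weighted by the penalty coefficient. The hinge form, the margin of 1 and all names are assumptions, not the patent's formula.

```python
def ranking_loss(s_tic, s_ctrl, tic_scores, penalty, margin=1.0):
    """Assumed form: max(0, margin - s_tic + s_ctrl) + penalty * sum of squared
    differences between adjacent instance scores in the tic bag."""
    hinge = max(0.0, margin - s_tic + s_ctrl)
    smooth = sum((a - b) ** 2 for a, b in zip(tic_scores, tic_scores[1:]))
    return hinge + penalty * smooth

# Hypothetical values: a well-separated pair of bags with smooth tic scores
# gives a lower loss than a poorly separated pair with jumpy scores.
good = ranking_loss(0.9, 0.1, [0.8, 0.9, 0.85], penalty=0.1)
bad = ranking_loss(0.5, 0.4, [0.1, 0.9, 0.1], penalty=0.1)
print(good < bad)  # True
```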
  • Training at high resolution gives good model performance but is slow, whereas training at low resolution is fast but gives poorer performance. The multi-grid training method from numerical analysis is therefore used: model parameters such as the batch size B, the number of sampled frames K, and the height H and width W of the video frames form a parameter grid, and the parameters are optimized from coarse-grained to fine-grained.
  • Each video to be analyzed is treated as a multi-instance bag and divided into instances of 16 frames each. Each instance passes through the trained three-dimensional convolutional neural network to obtain its visual features and then through the tic abnormality score generation network to obtain its abnormality score.
  • The largest of the instance scores is used as the overall tic abnormality score of the video to be analyzed. With 0.5 as the threshold, threshold analysis determines whether a tic movement is present, and the abnormality scores of all instances form time-series data that is input to the fusion analysis module.
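The inference steps can be sketched as follows. `score_instance` is a hypothetical stand-in for the trained network and score generator, and two details are assumptions the text does not settle: leftover frames that do not fill a 16-frame instance are dropped, and a score exactly equal to the threshold counts as a tic.

```python
import numpy as np

def screen_video(frames, score_instance, instance_len=16, threshold=0.5):
    """Split a video into non-overlapping instances, score each, and decide."""
    n = len(frames) // instance_len
    instances = frames[: n * instance_len].reshape(n, instance_len, *frames.shape[1:])
    scores = np.array([score_instance(inst) for inst in instances])
    overall = float(scores.max())          # bag-level score is the maximum
    return scores, overall, overall >= threshold

# Toy video: 70 frames of 112*112, with frames 32..47 "active".
frames = np.zeros((70, 112, 112))
frames[32:48] = 1.0
scores, overall, has_tic = screen_video(frames, lambda inst: float(inst.mean()))
print(scores.tolist(), overall, has_tic)  # [0.0, 0.0, 1.0, 0.0] 1.0 True
```

The returned `scores` array is the time series that would be passed on to the fusion analysis.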
  • The test results are shown in Table 1:
  • The baseline method is a model composed of an unmodified three-dimensional convolutional neural network and a cross-entropy function. An asterisk (*) indicates a statistically significant difference between the results of the present invention and the baseline results, which demonstrates the effectiveness of the present invention for tic detection in video data.
  • The auxiliary screening system for tic disorder integrates health questionnaire data analysis and visual analysis, as shown in Figure 3. Following the clinical diagnosis process, the health information collection and processing module collects health information including demographic information, living habits, eating habits, family history and family observation records. Specifically, this covers gender (male 1, female 0), age, whether abnormal tic movements have been observed (yes 1, no 0), whether a family member has tic symptoms (yes 1, no 0), whether sleep is normal (yes 1, no 0), whether the subject goes to sleep late (yes 1, no 0), whether the subject likes to drink tea or coffee (yes 1, no 0), whether the subject exercises regularly (yes 1, no 0), and so on. A statistical distribution map is drawn from the collected information, and the data is converted into numerical values according to the values in the parentheses and input to the fusion analysis module.
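The numerical conversion can be sketched in plain Python. The 0/1 coding follows the values listed above; the field names and the order of the resulting vector are hypothetical choices made for illustration.

```python
# Hypothetical field names for the yes/no questionnaire items listed above.
BINARY_FIELDS = [
    "tics_observed",
    "family_history",
    "sleep_normal",
    "sleeps_late",
    "likes_tea_or_coffee",
    "exercises_regularly",
]

def encode_health_record(record):
    """Encode one questionnaire as a numeric vector: gender, age, then yes/no items."""
    vector = [1.0 if record["gender"] == "male" else 0.0, float(record["age"])]
    vector += [1.0 if record[name] else 0.0 for name in BINARY_FIELDS]
    return vector

record = {
    "gender": "male", "age": 9,
    "tics_observed": True, "family_history": False, "sleep_normal": True,
    "sleeps_late": False, "likes_tea_or_coffee": False, "exercises_regularly": True,
}
print(encode_health_record(record))
# [1.0, 9.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0]
```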
  • The fusion analysis module performs fusion analysis on the numerically encoded health information data and the time-series data formed by the abnormality scores.
  • For the same individual X, the health information data and the abnormality score time series are each passed through a classification model to obtain a recognition probability for tic or normal, and the two results are then added and fused with the Bayesian fusion rule.
  • A Gaussian-kernel SVM classifier is trained on the numerically encoded health information data to obtain a recognition probability, where i is tic or normal. Because the abnormality scores form a time series, a single-layer LSTM network with 128 neurons and a Softmax function is constructed for training and analysis to obtain a recognition probability, where i is tic or normal.
  • The overall recognition probability is calculated with the additive fusion rule of Bayesian theory, where $P_x$ is the prior probability of the category, with value 0.5, and $M$ is the total number of categories, with value 2. The category with the maximum overall recognition probability is taken as the final judgment result, where i is tic or normal.
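The fusion expression is not reproduced in this text. The sketch below uses the classical sum rule for combining classifier posteriors, which for R classifiers scores each class as (1 - R) times the prior plus the sum of the posteriors, and takes the class with the largest score. Treating the count in that factor as the number of fused classifiers (two here, which matches the stated value of 2) is an assumption, as are the function name and the example probabilities.

```python
def fuse_sum_rule(posteriors, prior=0.5):
    """posteriors: one dict per classifier, mapping class name -> probability."""
    r = len(posteriors)
    classes = list(posteriors[0])
    fused = {c: (1 - r) * prior + sum(p[c] for p in posteriors) for c in classes}
    return max(fused, key=fused.get), fused

# Hypothetical outputs of the health-data classifier and the time-series classifier.
svm_prob = {"tic": 0.6, "normal": 0.4}
lstm_prob = {"tic": 0.8, "normal": 0.2}
label, fused = fuse_sum_rule([svm_prob, lstm_prob])
print(label)  # tic
```

With equal priors the prior term is the same for every class, so the decision reduces to comparing the summed posteriors.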
  • A peak detection algorithm gives the number of tic peaks and their positions in the time series; tracing each peak back through the frame sequence to the original video gives the time at which the tic occurred. A threshold applied to the interval before and after each peak locates the duration of each tic. From the abnormality scores, tic occurrence times and tic durations, the module draws the tic abnormality score curve and the tic movement heat map for the analyzed video, and calculates the tic frequency per minute and the tic duration from the length of the original video.
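The text does not name a specific peak detection algorithm, so the following plain-Python sketch uses a simple local-maximum rule as a stand-in. The thresholds, the use of the instance start as the peak time and the function names are assumptions for illustration.

```python
def find_tic_events(scores, fps, instance_len=16,
                    peak_threshold=0.5, extent_threshold=0.5):
    """Locate score peaks and the above-threshold interval around each one."""
    sec_per_instance = instance_len / fps
    events = []
    for i, s in enumerate(scores):
        left = scores[i - 1] if i > 0 else float("-inf")
        right = scores[i + 1] if i + 1 < len(scores) else float("-inf")
        if s >= peak_threshold and s > left and s >= right:   # local maximum
            start = i
            while start > 0 and scores[start - 1] >= extent_threshold:
                start -= 1
            end = i
            while end + 1 < len(scores) and scores[end + 1] >= extent_threshold:
                end += 1
            events.append({
                "peak_time_s": i * sec_per_instance,           # traced back to the video
                "duration_s": (end - start + 1) * sec_per_instance,
            })
    return events

def tics_per_minute(events, video_seconds):
    return len(events) * 60.0 / video_seconds

scores = [0.1, 0.6, 0.9, 0.7, 0.2, 0.1, 0.8, 0.1]
events = find_tic_events(scores, fps=16)   # 16 frames at 16 fps = 1 s per instance
print(events)
print(tics_per_minute(events, video_seconds=120))  # 1.0
```

Two peaks that fall inside the same above-threshold stretch would report overlapping durations under this simple rule; a production version would need to merge or split such intervals.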
  • the reference basis for the degree based on the final judgment result of the fusion analysis module, the change curve of the abnormal tic score, the thermal map of the tic movement, and the statistical distribution map of the health information, the visualized analysis results are formed to provide the patient with next-step inspection suggestions and provide their own tic situation Feedback information can also provide doctors with an overview of patient tics, and provide auxiliary information for the next step of diagnosis and treatment
  • Screeners first enter the health information collection and processing module through the system of the present invention, input age, gender, disease history, living habits and other health data in the system, and then record 1-5 minutes of frontal video through the visual data acquisition module or through the upload button
  • the frontal video saved on the personal mobile phone is imported, and the system judges whether the video meets the analysis requirements through preliminary inspection.
  • the video data is preprocessed, video data feature analysis, twitch detection and other processes are carried out through the visual feature analysis module. Obtain the abnormal score value and tic detection results with time series characteristics, and then give the screening results according to the fusion analysis module.
  • screening results are positive, it will prompt for follow-up examination and diagnosis, and give the visual analysis results and tic fragments to the clinic Doctor's reference; if the screening result is negative, it indicates that no tic abnormality and related test data were found, which is for clinician's reference.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Pathology (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Veterinary Medicine (AREA)
  • Animal Behavior & Ethology (AREA)
  • Surgery (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Physiology (AREA)
  • Computational Linguistics (AREA)
  • Neurology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Neurosurgery (AREA)
  • Databases & Information Systems (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Fuzzy Systems (AREA)
  • Psychiatry (AREA)
  • Signal Processing (AREA)

Abstract

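A video-data-based auxiliary screening system for tic disorders that uses video data to identify tic symptoms automatically. The system uses a three-dimensional convolutional neural network based on multi-instance learning; the features learned by the network are refined with combined three-dimensional channel attention and three-dimensional spatial attention modules, and the loss function is refined with a temporal smoothness constraint, which improves the model's ability to detect tics. Combined with health information questionnaire data derived from clinical interviews, this forms an auxiliary screening system for tic disorders that improves screening efficiency and, being contact-free, reduces the tension and discomfort patients feel in unfamiliar environments. Video data acquisition and tic detection simplify symptom observation, which is the most time-consuming step, and data fusion analysis and visualization give the person being screened an initial understanding of the condition and give doctors a reference for subsequent diagnosis and treatment.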

Description

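A video-data-based auxiliary screening system for tic disorders
Technical Field
The present invention relates to the field of medical and health information technology, and in particular to a video-data-based auxiliary screening system for tic disorders.
Background
According to the "Expert Consensus on the Diagnosis and Treatment of Tic Disorders in Children (2017 Practical Edition)" [1] issued by the Neurology Group of the Pediatrics Branch of the Chinese Medical Association, Tourette syndrome (TS) can be diagnosed when onset occurs before the age of 18, multiple motor tics and one or more vocal tics are present together within one year, and other medical conditions (such as post-viral encephalitis) and substance effects (such as cocaine) have been excluded. The sustained observation and examination interview parts of this process take a long time. Children, however, are naturally active, so parents often fail to take notice when tic symptoms appear. As a result, most affected children already have fairly severe symptoms by the time they are diagnosed, which impairs treatment. In addition, symptom severity varies widely between patients and the long-term prognosis is hard to estimate accurately, so patients also need regular follow-up visits to hospital.
Artificial intelligence and machine learning are widely applied in medicine. In tic detection, methods that detect tics from cortical network activity data of patients [2] and methods that detect tics from motion data recorded by wearable devices while patients stand and walk [3] have both worked well, but video data are still rarely used. Analysis of patient video data mimics the doctor's observation of the patient during clinical diagnosis, and in daily life video data are easy to obtain and the procedure is simple. To address the difficulty of detecting tic disorders early, the present invention uses a three-dimensional convolutional neural network model from deep learning to detect abnormal tic movements in frontally recorded video and, combined with a comprehensive analysis of the health information gathered in outpatient practice, proposes a video-based tic detection method and an auxiliary screening system for tic disorders.
Under the existing diagnostic workflow, doctors need considerable time to observe and confirm a patient's tic characteristics and must ask the patient and family about recent and past tic features, dietary habits, living habits, family history and so on. During the consultation, moreover, the patient may enter a suppressed state because of the new environment or contact with strangers, which hinders diagnosis and assessment of the actual condition. At present, identifying tic symptoms relies mainly on the complex clinical diagnostic workflow, and many patients' tics are hard to notice; existing tic detection methods rely on deep brain stimulation or wearable devices to collect data, which makes data collection complicated.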
[1] Neurology Group, Pediatrics Branch, Chinese Medical Association. Expert consensus on the diagnosis and treatment of tic disorders in children (2017 practical edition) [J]. Chinese Journal of Applied Clinical Pediatrics (中华实用儿科临床杂志), 2017, 32(15): 1137–1140.
[2] Jonathan B. Shute et al., “Thalamocortical network activity enables chronic tic detection in humans with Tourette syndrome,” NeuroImage: Clinical, vol. 12, pp. 165–172, Feb. 2016, doi: 10.1016/j.nicl.2016.06.015.
[3] Michel Bernabei et al., “Automatic detection of tic activity in the Tourette Syndrome,” in 2010 Annual International Conference of the IEEE Engineering in Medicine and Biology, Aug. 2010, pp. 422–425, doi: 10.1109/IEMBS.2010.5627374.
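Summary of the Invention
The object of the present invention is to address the shortcomings of the prior art by providing a video-data-based auxiliary screening system for tic disorders that uses video data to identify tic symptoms automatically. The system uses a three-dimensional convolutional neural network based on multi-instance learning; the features learned by the network are refined with combined three-dimensional channel attention and three-dimensional spatial attention modules, and the loss function is refined with a temporal smoothness constraint, which improves the model's ability to detect tics. Combined with health information questionnaire data derived from clinical interviews, this forms an auxiliary screening system for tic disorders that improves screening efficiency and, being contact-free, reduces the tension and discomfort patients feel in unfamiliar environments. Video data acquisition and tic detection simplify symptom observation, which is the most time-consuming step, and data fusion analysis and visualization give the person being screened an initial understanding of the condition and give doctors a reference for subsequent diagnosis and treatment.
The object of the invention is achieved by the following technical solution: a video-data-based auxiliary screening system for tic disorders comprising a tic action detection module, a health information collection and processing module, a visual data acquisition module and a fusion analysis module;
The visual data acquisition module collects facial video data of the person being screened and feeds it to the tic action detection module;
The tic action detection module comprises a data preprocessing module, a visual feature analysis module, a tic abnormality score generation module and a multi-instance strategy training module;
The data preprocessing module converts the video data collected by the visual data acquisition module into time-series image data suitable for a deep learning network and feeds it to the visual feature analysis module;
The visual feature analysis module analyzes video features with a three-dimensional convolutional neural network model based on three-dimensional channel attention and three-dimensional spatial attention. The model has p convolution blocks, each containing one three-dimensional convolution layer, followed in sequence by q convolution blocks, each containing two three-dimensional convolution layers. Each of the q two-layer blocks incorporates a three-dimensional channel attention module and a three-dimensional spatial attention module in parallel, which extract the channel attention features and spatial attention features of the feature map produced by the convolutions. The resulting feature map is fed to the tic abnormality score generation module, which consists of a fully connected network, to obtain the tic abnormality score; comparing the score with a threshold determines whether a tic is present. The abnormality scores also form a time series that is fed to the fusion analysis module;
The multi-instance strategy training module trains the network model in the visual feature analysis module with a multi-instance learning strategy based on control-group instances and tic-group instances, each obtained by extracting several segments of a fixed number of consecutive frames from the respective video data. The visual feature analysis module produces tic abnormality scores for the instances of the tic group and the control group, the loss of each training step is computed with a ranking loss function, and the network model parameters in the visual feature analysis module are updated;
The health information collection and processing module collects and tabulates the health information of the person being screened following the clinical diagnostic process for tic disorders, converts the collected data to numerical form and feeds it to the fusion analysis module;
The fusion analysis module passes the numerically encoded health information data and the time series formed by the abnormality scores through separate classification models to obtain the recognition probability of tic or normal, combines the two results with the Bayesian additive fusion rule, and takes the class with the largest value as the judgment. A peak detection algorithm yields the number of tic peaks and their positions in the time series; tracing the frame sequence back to the original video gives the time of each peak and hence the time at which each tic occurred. The interval around each peak is screened with a threshold to locate the duration of each tic. The abnormality scores, tic times and tic durations are used to draw the abnormality score curve and the tic action heat map for the analyzed video, and the tic frequency per minute and the tic duration are calculated from the length of the original video. The analysis results of the fusion analysis module give the patient suggestions for further examination and feedback on their own tics, and give doctors auxiliary screening information about the patient's tics.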
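The peak detection and duration steps described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the source does not name a specific peak detection algorithm, so the sketch treats every contiguous run of scores at or above a threshold as one tic event and takes the maximum in the run as its peak. The function names, the frame rate `fps` and the threshold value are all hypothetical.

```python
def find_tic_events(scores, fps=25.0, frames_per_instance=16, threshold=0.5):
    """Locate tic events in a per-instance abnormality score series.

    Assumption: one score per block of `frames_per_instance` frames, and each
    contiguous run of scores >= threshold counts as one event.
    """
    scores = list(scores)
    sec_per_instance = frames_per_instance / fps
    events, start = [], None
    # A sentinel below any threshold closes a run that reaches the end.
    for t, s in enumerate(scores + [float("-inf")]):
        if s >= threshold and start is None:
            start = t                      # run begins
        elif s < threshold and start is not None:
            run = scores[start:t]          # run ended just before t
            peak = start + run.index(max(run))
            events.append({
                "onset_s": start * sec_per_instance,
                "peak_s": peak * sec_per_instance,
                "duration_s": (t - start) * sec_per_instance,
                "peak_score": max(run),
            })
            start = None
    return events


def tics_per_minute(events, video_seconds):
    """Tic frequency per minute, computed from the original video length."""
    return len(events) * 60.0 / video_seconds
```

The returned onset and duration values are in seconds of the original video, which is what the score curve and heat map would be drawn against.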
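Further, the data preprocessing module preprocesses the video data as follows: the collected facial video data are passed through the face detection algorithm OpenFace to locate the face region in each video frame, environmental information unrelated to tic movements is removed from the original frames so as to focus on the facial tics of the person being screened, and the processed images are saved.
Further, the three-dimensional channel attention module takes the feature map F_{3D} of size (Channel, Dimension, Height, Width) obtained after convolution and pooling, compresses it by average pooling into an average temporal feature F_{3D}′ of size (Channel, 1, 1, 1), and predicts the importance of each channel with a multilayer perceptron (MLP) and a Sigmoid activation function to obtain the three-dimensional channel attention feature F_{3D-C}:
F_{3D-C} = Sigmoid(MLP(F_{3D}′)).
Further, the three-dimensional spatial attention module takes the feature F_{3D} of size (Channel, Dimension, Height, Width) obtained after convolution and pooling, compresses it by average pooling into an average spatial feature F_{3D}″ of size (1, Dimension, Height, Width), and applies a Sigmoid activation function to obtain the spatial attention feature F_{3D-S}:
F_{3D-S} = Sigmoid(F_{3D}″).
Further, the three-dimensional channel attention module and the three-dimensional spatial attention module are incorporated in parallel into the three-dimensional convolution blocks that contain two three-dimensional convolution layers; each attention feature is multiplied with the feature F_{3D} and the two products are added. The output feature F_A is computed as:
Figure PCTCN2022140523-appb-000001
where (Figure PCTCN2022140523-appb-000002) denotes the multiplication operation.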
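Further, the training process of the multi-instance strategy training module is as follows. The tic-group instances and the control-group instances form the tic multi-instance bag (Figure PCTCN2022140523-appb-000003) and the control multi-instance bag (Figure PCTCN2022140523-appb-000004), respectively. The network model in the visual feature analysis module yields the tic abnormality score sets {k_a} and {k_n} for all instances of the tic group and the control group, from which the maximum abnormality scores in the tic bag and in the control bag are computed:
Figure PCTCN2022140523-appb-000005
Figure PCTCN2022140523-appb-000006
The abnormality scores of the two instances before and the two instances after each maximum are then taken. The sets
Figure PCTCN2022140523-appb-000007
Figure PCTCN2022140523-appb-000008
denote the sustained phase of the most probable suspected tic in the tic bag and in the control bag, respectively, and the mean abnormality score over the sustained phase is used as the abnormality score of the most probable suspected tic. The maximum probability of a suspected tic in the tic bag (Figure PCTCN2022140523-appb-000009) and the maximum probability of a suspected tic in the control bag (Figure PCTCN2022140523-appb-000010) are computed as follows:
Figure PCTCN2022140523-appb-000011
Figure PCTCN2022140523-appb-000012
where i is the index of the instance with the maximum abnormality score in the tic bag and j is the index of the instance with the maximum abnormality score in the control bag. If, during the calculation, i-2≤0, i-1≤0, i+1>N_a, i+2>N_a, j-2≤0, j-1≤0, j+1>N_n or j+2>N_n, the abnormality score of the corresponding instance does not exist and is excluded from the mean. N_a is the number of instances in the tic bag and N_n is the number of instances in the control bag;
The loss L of each training step is computed with the ranking loss function, and the network parameters in the visual feature analysis module are updated by gradient descent and backpropagation. The ranking loss L_1 based on the multi-instance learning strategy is:
Figure PCTCN2022140523-appb-000013
A smoothness constraint term L_2 is added to the loss function:
Figure PCTCN2022140523-appb-000014
where m denotes the m-th instance in the tic bag (Figure PCTCN2022140523-appb-000015) and N_a denotes the number of instances in the tic bag (Figure PCTCN2022140523-appb-000016);
The ranking loss function L based on the multi-instance learning strategy is:
L = L_1 + λL_2
where λ is the penalty coefficient; the larger its value, the heavier the penalty from the smoothness constraint term.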
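Further, during training of the multi-instance strategy training module, the learning rate Lr is updated with an exponential decay function:
Lr = 0.95^{epoch_t} × lr
where epoch_t is the current training epoch and lr = 0.001 is the initial learning rate.
Further, during training of the multi-instance strategy training module, the video data of the control-group instances and of the tic-group instances are augmented by adding random Gaussian noise, random color jitter, random rotation and random cropping, which simulate the variation in imaging quality, color, face orientation and camera distance that occurs during video acquisition.
Further, the health information collected by the health information collection and processing module includes demographic information, living habits, dietary habits, family history and family observation records.
Further, in the fusion analysis module, the health information data numerically encoded by the health information collection and processing module are used to train a Gaussian-kernel SVM classifier, which yields a recognition probability; the data output by the visual feature analysis module carry temporal information and are trained and analyzed with an LSTM network and a Softmax function, which yields a second recognition probability.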
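Beneficial effects of the invention:
1. Video data are collected in a non-implanted, non-wearable manner, which is convenient; camera equipment is widely available, and the system is easy to embed in other platforms.
2. Tics are detected by analyzing video data, so the patient does not need to talk to a doctor face to face; this reduces tension and discomfort in unfamiliar environments and better reflects the true condition.
3. Through fused analysis of video data and health information data, the screening results can help educate patients and parents about the condition and give doctors a reference for assessing and managing it.
4. Tics can be detected remotely over a communication network, which reduces the number of visits patients and parents make to specialist hospitals and saves time and travel costs.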
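Brief Description of the Drawings
Fig. 1 is a schematic diagram of the three-dimensional convolutional neural network structure with combined channel attention and spatial attention modules.
Fig. 2 is a schematic diagram of the analysis and training workflow of the visual model.
Fig. 3 is a schematic diagram of the machine-vision-based auxiliary screening system for tic disorders.
Fig. 4 is a schematic diagram of the fusion analysis and visualization module with an example visualization of screening results.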
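Detailed Description
Specific embodiments of the invention are described in further detail below with reference to the drawings.
In view of the limitations of current tic disorder screening and diagnosis, the invention provides a video-data-based auxiliary screening system for tic disorders comprising a tic action detection module, a health information collection and processing module, a visual data acquisition module and a fusion analysis module;
The visual data acquisition module collects the visual data needed for analysis in one of two ways: either the camera configured with the system records frontal facial video of the person being screened in real time, or a local upload interface is used to import frontal video that was recorded and stored earlier. So that subsequent analysis can proceed smoothly, the video must be at least 60 seconds long; no upper limit is set. The collected video data are fed to the tic action detection module;
The tic action detection module comprises a data preprocessing module, a visual feature analysis module, a tic abnormality score generation module and a multi-instance strategy training module;
The data preprocessing module converts the video data collected by the visual data acquisition module into time-series image data suitable for a deep learning network. Specifically, the collected facial video data are passed through the face detection algorithm OpenFace to locate the face region in each video frame; environmental information unrelated to tic movements is removed from the original frames so as to focus on the facial tics of the person being screened; and the face region is cropped and saved, in frame order, as 128*128 images. During subsequent training, data augmentation by adding random Gaussian noise, random color jitter, random rotation and random cropping increases the amount of training data and simulates the variation in imaging quality, color, face orientation and camera distance that occurs during recording, which strengthens feature extraction. The images are finally saved at 112*112 and fed to the visual feature analysis module.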
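Two of the augmentation steps named above, random cropping from 128*128 to 112*112 and random Gaussian noise, can be sketched in plain Python. This is a rough illustration, not the patent's implementation: frames are assumed to be grayscale nested lists with pixel values in [0, 255], and the noise level `sigma` is a hypothetical value. Face detection is left out because the source only names OpenFace without giving its interface.

```python
import random


def random_crop(frame, out_size=112, rng=random):
    """Cut a random out_size x out_size window from a 2D frame (list of rows)."""
    height, width = len(frame), len(frame[0])
    top = rng.randint(0, height - out_size)    # randint is inclusive on both ends
    left = rng.randint(0, width - out_size)
    return [row[left:left + out_size] for row in frame[top:top + out_size]]


def add_gaussian_noise(frame, sigma=5.0, rng=random):
    """Add zero-mean Gaussian noise to every pixel and clip to [0, 255]."""
    return [[min(255.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
            for row in frame]
```

For a video clip one would normally reuse the same crop offsets for every frame of an instance so that the face stays aligned over time; that choice is an assumption the source does not spell out.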
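The visual feature analysis module analyzes video features with a three-dimensional convolutional neural network model based on three-dimensional channel attention and three-dimensional spatial attention. Convolving a three-dimensional kernel over time-series data captures temporal and spatial features together, which suits video analysis. Because different patients do not necessarily have tics in the same place, the model must attend to local tic regions as well as extract features from the whole face. The three-dimensional convolutional neural network is therefore improved with a combined three-dimensional channel attention module (3D-Channel Attention) and three-dimensional spatial attention module (3D-Spatial Attention) to strengthen visual feature extraction. As shown in Fig. 1, the network consists of five three-dimensional convolution blocks connected in sequence: two ConvBlock-A blocks (three-dimensional convolution combination A), each made up of one three-dimensional convolution layer and one max pooling layer, and three ConvBlock-B blocks (three-dimensional convolution combination B), each made up of two three-dimensional convolution layers, one max pooling layer, one three-dimensional channel attention module and one three-dimensional spatial attention module.
The three-dimensional channel attention module takes the feature F_{3D} of size (Channel, Dimension, Height, Width) produced in the convolution block after convolution and pooling, compresses it by average pooling into an average temporal feature F_{3D}′ of size (Channel, 1, 1, 1), and predicts the importance of each channel with a multilayer perceptron (MLP) and a Sigmoid activation function to obtain the three-dimensional channel attention feature F_{3D-C}:
F_{3D-C} = Sigmoid(MLP(F_{3D}′))
The three-dimensional spatial attention module takes the feature F_{3D} of size (Channel, Dimension, Height, Width) produced in the convolution block after convolution and pooling, compresses it by average pooling into an average spatial feature F_{3D}″ of size (1, Dimension, Height, Width), and applies a Sigmoid activation function to obtain the spatial attention feature F_{3D-S}:
F_{3D-S} = Sigmoid(F_{3D}″)
To reduce model complexity and parameter computation, the three-dimensional channel attention module and the three-dimensional spatial attention module are incorporated into the convolution block in parallel; each attention feature is multiplied with the preceding feature F_{3D} and the two products are added. The visual feature F_A output by ConvBlock-B is computed as:
Figure PCTCN2022140523-appb-000017
where (Figure PCTCN2022140523-appb-000018) denotes the multiplication operation.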
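Read from the prose, the parallel attention step amounts to F_A = F_{3D} * F_{3D-C} + F_{3D} * F_{3D-S}, with the channel weights broadcast over (Dimension, Height, Width) and the spatial weights broadcast over channels. The sketch below works through that arithmetic on nested lists. It is an illustration under assumptions: the exact formula is given only as a figure in the source, elementwise multiplication with broadcasting is assumed, and the MLP is replaced by an identity placeholder (`mlp`) because its layer sizes are not stated.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def attention_fuse(f, mlp=lambda v: v):
    """Apply channel and spatial attention in parallel to f[C][D][H][W].

    `mlp` maps a list of C pooled values to C values; the identity default
    is a placeholder, not the patent's trained perceptron.
    """
    n_c, n_d, n_h, n_w = len(f), len(f[0]), len(f[0][0]), len(f[0][0][0])
    volume = n_d * n_h * n_w
    # Channel attention: average over (D, H, W), then MLP and sigmoid -> C weights.
    pooled_c = [
        sum(f[c][d][h][w] for d in range(n_d) for h in range(n_h) for w in range(n_w)) / volume
        for c in range(n_c)
    ]
    att_c = [sigmoid(v) for v in mlp(pooled_c)]
    # Spatial attention: average over channels, then sigmoid -> D x H x W weights.
    att_s = [[[sigmoid(sum(f[c][d][h][w] for c in range(n_c)) / n_c)
               for w in range(n_w)] for h in range(n_h)] for d in range(n_d)]
    # Multiply the feature by each attention map and add the two products.
    return [[[[f[c][d][h][w] * att_c[c] + f[c][d][h][w] * att_s[d][h][w]
               for w in range(n_w)] for h in range(n_h)] for d in range(n_d)]
            for c in range(n_c)]
```

In a real model the same computation would be a few tensor operations in a deep learning framework; the nested-list form is used here only to keep the sketch dependency-free.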
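The tic abnormality score generation module feeds the visual feature F_A output by the visual feature analysis module into a tic abnormality score generation network for further analysis. This network consists of three fully connected layers with 512, 64 and 1 neurons, respectively; the first two layers use the ReLU activation function and the last layer uses the Sigmoid function. The network outputs the tic abnormality score, which is used in subsequent training.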
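The forward pass of such a 512-64-1 head can be sketched in a few lines. The layer widths and activations follow the text; everything else is assumed for illustration, including the input feature length, the random initialization and the helper names.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights (n_out rows of n_in values) and zero biases; illustrative only."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def make_score_head(n_features, seed=0):
    """Three fully connected layers with 512, 64 and 1 neurons."""
    rng = random.Random(seed)
    sizes = [n_features, 512, 64, 1]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def tic_score(features, layers):
    """ReLU after the first two layers, Sigmoid after the last; returns a value in (0, 1)."""
    h = list(features)
    for idx, (weights, bias) in enumerate(layers):
        h = [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(weights, bias)]
        if idx < len(layers) - 1:
            h = [max(0.0, v) for v in h]
    return 1.0 / (1.0 + math.exp(-h[0]))
```

With untrained weights the score is meaningless; the point is only the shape of the computation that maps a flattened feature vector to a single score between 0 and 1.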
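The multi-instance strategy training module trains the network model in the visual feature analysis module with a ranking loss function (Ranking Loss) based on the multi-instance learning (MIL) strategy.
In the classical multi-instance setting, the model learns a classifier from a set of training bags. Each bag consists of several training instances; a positive bag contains at least one positive instance, and all instances in a negative bag are negative. The video to be analyzed is treated as a bag, and the video is divided into consecutive, non-overlapping 16-frame segments that serve as the instances in the bag. The model uses the constructed three-dimensional convolutional network to learn features from the time-series data of each instance, and the tic abnormality score generation network produces a score for each instance as its tic abnormality score. Scores range from 0 to 1, where 0 means no tic and 1 means a tic, so the score expresses how likely a tic is. The score of the highest-scoring instance represents the likelihood that the whole bag (that is, the whole video) contains a tic.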
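The bag-and-instance bookkeeping just described can be sketched as follows. The 16-frame instance length and the use of the maximum instance score as the bag score come from the text, and the 0.5 threshold is the one given for the test stage; dropping a trailing incomplete segment and using `>=` at the threshold are assumptions.

```python
def split_into_instances(frames, frames_per_instance=16):
    """Split a frame sequence into consecutive, non-overlapping instances.

    Assumption: a trailing segment shorter than frames_per_instance is dropped.
    """
    n_full = len(frames) // frames_per_instance
    return [frames[k * frames_per_instance:(k + 1) * frames_per_instance]
            for k in range(n_full)]


def bag_score(instance_scores):
    """The bag (whole video) score is the highest instance score."""
    return max(instance_scores)


def has_tic(instance_scores, threshold=0.5):
    """Flag the video when the bag score reaches the threshold."""
    return bag_score(instance_scores) >= threshold
```

For example, a 50-frame clip yields three instances covering frames 0-47, and a score list such as `[0.1, 0.7, 0.2]` would be flagged because its maximum exceeds 0.5.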
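For model training, 200 minutes each of frontal facial video recorded in a natural state were collected in advance from a group of tic disorder patients and from a healthy control group. Each 1-minute segment forms one bag and every 16 frames form one instance. The dataset was randomly split into a training set (70%) used for model training and a test set (30%) used for model testing. As shown in Fig. 2, each training step uses data from both the tic group and the control group, and the three-dimensional convolutional neural network models on the control-video path and the tic-video path share parameters. The tic group and the control group form the tic multi-instance bag (Figure PCTCN2022140523-appb-000019) and the control multi-instance bag (Figure PCTCN2022140523-appb-000020), respectively. The network in the visual feature analysis module yields the tic abnormality score sets {k_a} and {k_n} for all instances of the tic group and the control group, from which the maximum abnormality scores in the tic bag and in the control bag are computed:
Figure PCTCN2022140523-appb-000021
Figure PCTCN2022140523-appb-000022
Because a tic persists for a certain length of time, the abnormality scores of the two instances before and the two instances after each maximum are also taken. The sets
Figure PCTCN2022140523-appb-000023
Figure PCTCN2022140523-appb-000024
denote the sustained phase of the most probable suspected tic in the tic bag and in the control bag, respectively, and the mean abnormality score over the sustained phase is used as the abnormality score of the most probable suspected tic. This rules out brief movements caused by ordinary actions such as blinking. The maximum probability of a suspected tic in the tic bag (Figure PCTCN2022140523-appb-000025) and the maximum probability of a suspected tic in the control bag (Figure PCTCN2022140523-appb-000026) are computed as follows:
Figure PCTCN2022140523-appb-000027
Figure PCTCN2022140523-appb-000028
where i is the index of the instance with the maximum abnormality score in the tic bag and j is the index of the instance with the maximum abnormality score in the control bag. If, during the calculation, i-2≤0, i-1≤0, i+1>N_a, i+2>N_a, j-2≤0, j-1≤0, j+1>N_n or j+2>N_n, the abnormality score of the corresponding instance does not exist and is excluded from the mean. N_a is the number of instances in the tic bag and N_n is the number of instances in the control bag;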
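The windowed mean around the highest-scoring instance, including the rule that out-of-range neighbors are left out of the average, can be sketched as below. The function name is hypothetical, and the code uses 0-based indices where the text uses 1-based ones; the same function applies to both the tic bag and the control bag.

```python
def sustained_peak_score(scores, radius=2):
    """Mean score over the peak instance and up to `radius` neighbors on each side.

    Neighbors that fall outside the bag are simply not counted, matching the
    boundary rule in the text.
    """
    peak = max(range(len(scores)), key=scores.__getitem__)
    lo = max(0, peak - radius)
    hi = min(len(scores), peak + radius + 1)
    window = scores[lo:hi]
    return sum(window) / len(window)
```

With scores `[0.1, 0.2, 0.9, 0.4, 0.3, 0.0]` the window covers the first five values and the mean is 0.38, whereas a lone spike at the start of a bag is averaged over only three values. An isolated high score such as a blink is thus pulled down by its low neighbors.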
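The loss L of each training step is computed with the ranking loss function, and the network parameters in the visual feature analysis module are updated by gradient descent and backpropagation. The ranking loss L_1 based on the multi-instance learning strategy is:
Figure PCTCN2022140523-appb-000029
In addition, because the instances within a bag are ordered in time, the tic abnormality scores of successive instances in the tic bag should vary smoothly. A smoothness constraint term L_2 is therefore added to the loss function:
Figure PCTCN2022140523-appb-000030
where m denotes the m-th instance in the tic bag (Figure PCTCN2022140523-appb-000031) and N_a denotes the number of instances in the tic bag (Figure PCTCN2022140523-appb-000032).
The final ranking loss function based on the multi-instance learning strategy is L = L_1 + λL_2, written out as:
Figure PCTCN2022140523-appb-000033
where λ is the penalty coefficient; the larger its value, the heavier the penalty from the smoothness constraint term. The invention uses λ = 0.5 for model training.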
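The exact expressions for L_1 and L_2 appear only as figures, so the sketch below fills them in with forms commonly used for multi-instance ranking losses: a hinge term that pushes the tic-bag peak score above the control-bag peak score by a margin, and a sum of squared differences between neighboring tic-bag scores. Both forms, the margin of 1 and the function names are assumptions; only the combination L = L_1 + λL_2 with λ = 0.5 is taken from the text.

```python
def ranking_loss(tic_peak, control_peak, margin=1.0):
    """Assumed hinge form: zero once the tic peak exceeds the control peak by `margin`."""
    return max(0.0, margin - tic_peak + control_peak)


def smoothness_term(tic_scores):
    """Assumed form: squared differences between consecutive tic-bag instance scores."""
    return sum((a - b) ** 2 for a, b in zip(tic_scores, tic_scores[1:]))


def total_loss(tic_peak, control_peak, tic_scores, lam=0.5):
    """L = L_1 + lambda * L_2, with lambda = 0.5 as stated in the text."""
    return ranking_loss(tic_peak, control_peak) + lam * smoothness_term(tic_scores)
```

Here `tic_peak` and `control_peak` stand for the sustained-phase mean scores of the two bags. In training, these values would be differentiable tensors so that gradient descent and backpropagation can update the network.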
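During training, a model trained at high resolution performs well but trains slowly, whereas a model trained at low resolution trains quickly but performs poorly. The multigrid training approach from numerical analysis is therefore used: model parameters such as the batch size B, the number of frames per instance K, and the height H and width W of the video frames are treated as a parameter grid and optimized from coarse to fine. The default values are B=8, K=16, H=112 and W=112. During training the parameter grids are loaded into the model in the order [Figure PCTCN2022140523-appb-000034, Figure PCTCN2022140523-appb-000035, (B,K,H,W)] for iterative training; each parameter set lasts 2 epochs, and training stops after 50 epochs in total.
To improve convergence, the learning rate Lr is updated with an exponential decay function:
Lr = 0.95^{epoch_t} × lr
where epoch_t is the current training epoch and lr = 0.001 is the initial learning rate.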
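As a quick check of the schedule, the decay can be written as a one-line function. The function name is illustrative; the constants 0.95 and 0.001 are those given in the text, and epochs are assumed to be counted from 0.

```python
def learning_rate(epoch_t, lr=0.001, decay=0.95):
    """Exponentially decayed learning rate: Lr = decay ** epoch_t * lr."""
    return (decay ** epoch_t) * lr
```

Under this schedule the rate starts at 0.001, drops to 0.00095 after one epoch, and is below 0.0001 by the end of the 50 epochs described above.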
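Once the network model has been trained, in the test stage each video to be analyzed is treated as one multi-instance bag and divided into instances of 16 frames each. Each instance passes through the trained three-dimensional convolutional neural network to obtain visual features and through the tic abnormality score generation network to obtain an abnormality score. The largest of the instance scores is taken as the overall tic abnormality score of the video; with 0.5 as the threshold, chosen on the basis of statistical probability, comparing this score with the threshold determines whether a tic is present. The abnormality scores of all instances also form a time series that is fed to the fusion analysis module. The test results are shown in Table 1:
Table 1
Method             Accuracy          Precision         Recall
Baseline           0.7798 (±0.017)   0.8368 (±0.032)   0.7886 (±0.016)
Present invention  0.9302* (±0.026)  0.9144* (±0.040)  0.9396* (±0.032)
The baseline is an unmodified three-dimensional convolutional neural network with a cross-entropy loss. An asterisk (*) marks a statistically significant difference between the result of the invention and the baseline result, which demonstrates the effectiveness of the invention for tic detection from video data.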
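For reference, the three metrics reported in Table 1 are defined from the confusion counts as sketched below. The function name and the label convention (1 for tic, 0 for normal) are illustrative; the source does not show how its figures were computed beyond naming the metrics.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0   # share of flagged videos that are tics
    recall = tp / (tp + fn) if tp + fn else 0.0      # share of tic videos that are flagged
    return accuracy, precision, recall
```

For a screening tool, recall is usually the more critical of the two, since a missed tic case is a patient who is not referred for follow-up.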
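The auxiliary screening system for tic disorders combines health questionnaire data analysis with visual analysis. As shown in Fig. 3, the health information collection and processing module follows the clinical diagnostic process and collects health information including demographic information, living habits, dietary habits, family history and family observation records. Specifically, it records sex (male 1, female 0), age, whether abnormal tic movements have ever been observed (yes 1, no 0), whether any family member has tic symptoms (yes 1, no 0), whether sleep is normal (yes 1, no 0), whether the person goes to bed late (yes 1, no 0), whether the person likes tea or coffee (yes 1, no 0), whether the person exercises regularly (yes 1, no 0), and so on. A statistical distribution chart is drawn from these data, and the collected data are converted to numerical form according to the codes in parentheses and fed to the fusion analysis module.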
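The numerical encoding above maps directly onto a fixed-order feature vector. In the sketch below the dictionary keys and the field order are hypothetical names chosen for illustration; only the 1/0 codes and the use of age as a plain number come from the text.

```python
# Hypothetical field names for the yes/no items, in the order listed in the text.
YES_NO_FIELDS = [
    "tics_observed",
    "family_history",
    "sleep_normal",
    "late_bedtime",
    "likes_tea_or_coffee",
    "regular_exercise",
]


def encode_health_record(record):
    """Turn one questionnaire record into a numeric feature vector."""
    vector = [1 if record["sex"] == "male" else 0, record["age"]]
    vector += [1 if record[name] else 0 for name in YES_NO_FIELDS]
    return vector
```

For instance, a 9-year-old boy with observed tics, normal sleep and regular exercise but no other positive answers would be encoded as `[1, 9, 1, 0, 1, 0, 0, 1]`. Age is on a different scale from the binary items, so some form of scaling before the SVM would be a reasonable addition, although the source does not mention it.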
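As shown in Fig. 4, the fusion analysis module performs fusion analysis on the numerically encoded health information data and the time series formed by the abnormality scores. In the data fusion stage, the health information data and the abnormality score time series of the same individual X are each passed through a classification model to obtain the recognition probability of tic or normal, and the two results are then combined with the Bayesian additive fusion rule. The numerically encoded health information data are used to train a Gaussian-kernel SVM classifier, which yields the recognition probability (Figure PCTCN2022140523-appb-000036), where i is tic or normal. Because the time series formed by the abnormality scores carries temporal information, a single-layer LSTM network with 128 neurons followed by a Softmax function is built for training and analysis, which yields the recognition probability (Figure PCTCN2022140523-appb-000037), where i is tic or normal. Because the two sets of features are mutually independent, the overall recognition probability (Figure PCTCN2022140523-appb-000038) is calculated with the additive fusion rule of Bayesian theory, where P_x is the class prior probability (0.5) and M is the total number of classes (2). Finally, the class with the largest overall recognition probability is taken as the result according to the rule (Figure PCTCN2022140523-appb-000039), giving the final judgment, where i is tic or normal.
A peak detection algorithm yields the number of tic peaks and their positions in the time series; tracing the frame sequence back to the original video gives the time of each peak and hence the time at which each tic occurred. The interval around each peak is screened with a threshold to locate the duration of each tic. The abnormality scores, tic times and tic durations are used to draw the abnormality score curve and the tic action heat map for the analyzed video, and the tic frequency per minute and the tic duration are calculated from the length of the original video; these serve as a reference for the severity of the patient's tics. The final judgment of the fusion analysis module, the abnormality score curve, the tic action heat map and the statistical distribution chart of the health information together form a visualized analysis result that gives the patient suggestions for further examination and feedback on their own tics, and gives doctors an overview of the patient's tics as auxiliary information for subsequent diagnosis and treatment.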
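The fusion formula itself is given only as a figure, so the sketch below uses the classic sum rule for combining classifiers as a stand-in: for R classifiers, each class receives (1 - R) times its prior plus the sum of the classifiers' posterior probabilities, and the class with the largest value wins. That this matches the patent's exact expression is an assumption; the prior of 0.5, the two classes and the arg-max decision are taken from the text, and the function and key names are illustrative.

```python
def sum_rule_fusion(p_health, p_video, prior=0.5):
    """Fuse two classifiers' class probabilities with the sum rule (assumed form).

    p_health: output of the health-data classifier, e.g. {"tic": 0.4, "normal": 0.6}
    p_video:  output of the score-series classifier, same keys
    """
    classes = ("tic", "normal")
    n_classifiers = 2
    fused = {c: (1 - n_classifiers) * prior + p_health[c] + p_video[c] for c in classes}
    decision = max(fused, key=fused.get)   # class with the largest fused value
    return decision, fused
```

With equal priors the constant term is the same for both classes, so the decision reduces to comparing the summed probabilities; a confident video classifier can therefore outweigh an uncertain questionnaire classifier, and the reverse.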
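Example:
The person being screened first enters the health information collection and processing module of the system and inputs health data such as age, gender, disease history and daily routine. They then record a 1-5 minute frontal video through the visual data acquisition module, or use the upload button to import a frontal video saved on a personal mobile phone. The system runs a preliminary check to judge whether the video meets the analysis requirements. Once the video is confirmed to meet them, the visual feature analysis module preprocesses the video data and carries out video feature analysis and tic detection, producing abnormality scores with time-series characteristics and tic detection results, and the fusion analysis module then gives the screening result. If the screening result is positive, the system prompts for follow-up examination and diagnosis and provides the visualized analysis results and tic segments for the clinician's reference; if the screening result is negative, it reports that no tic abnormality was found, together with the related detection data, for the clinician's reference.
The above embodiment is intended to explain the invention, not to limit it. Any modification or change made to the invention within its spirit and the scope of protection of the claims falls within the scope of protection of the invention.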

Claims (10)

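  1. A video-data-based auxiliary screening system for tic disorders, characterized in that the system comprises a tic action detection module, a health information collection and processing module, a visual data acquisition module and a fusion analysis module;
    the visual data acquisition module collects facial video data of the person being screened and feeds it to the tic action detection module;
    the tic action detection module comprises a data preprocessing module, a visual feature analysis module, a tic abnormality score generation module and a multi-instance strategy training module;
    the data preprocessing module converts the video data collected by the visual data acquisition module into time-series image data suitable for a deep learning network and feeds it to the visual feature analysis module;
    the visual feature analysis module analyzes video features with a three-dimensional convolutional neural network model based on three-dimensional channel attention and three-dimensional spatial attention; the model has p convolution blocks each containing one three-dimensional convolution layer, followed in sequence by q convolution blocks each containing two three-dimensional convolution layers; each of the q two-layer blocks incorporates a three-dimensional channel attention module and a three-dimensional spatial attention module in parallel, which extract the channel attention features and spatial attention features of the feature map produced by the convolutions; the resulting feature map is fed to the tic abnormality score generation module, which consists of a fully connected network, to obtain the tic abnormality score, and comparing the score with a threshold determines whether a tic is present; the abnormality scores also form a time series that is fed to the fusion analysis module;
    the multi-instance strategy training module trains the network model in the visual feature analysis module with a multi-instance learning strategy based on control-group instances and tic-group instances, each obtained by extracting several segments of a fixed number of consecutive frames from the respective video data; the visual feature analysis module produces tic abnormality scores for the instances of the tic group and the control group, the loss of each training step is computed with a ranking loss function, and the network model parameters in the visual feature analysis module are updated;
    the health information collection and processing module collects and tabulates the health information of the person being screened following the clinical diagnostic process for tic disorders, converts the collected data to numerical form and feeds it to the fusion analysis module;
    the fusion analysis module passes the numerically encoded health information data and the time series formed by the abnormality scores through separate classification models to obtain the recognition probability of tic or normal, combines the two results with the Bayesian additive fusion rule, and takes the class with the largest value as the judgment; a peak detection algorithm yields the number of tic peaks and their positions in the time series, and tracing the frame sequence back to the original video gives the time of each peak and hence the time at which each tic occurred; the interval around each peak is screened with a threshold to locate the duration of each tic; the abnormality scores, tic times and tic durations are used to draw the abnormality score curve and the tic action heat map for the analyzed video, and the tic frequency per minute and the tic duration are calculated from the length of the original video; the analysis results of the fusion analysis module give the patient suggestions for further examination and feedback on their own tics, and give doctors auxiliary screening information about the patient's tics.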
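  2. The video-data-based auxiliary screening system for tic disorders according to claim 1, characterized in that the data preprocessing module preprocesses the video data as follows: the collected facial video data are passed through the face detection algorithm OpenFace to locate the face region in each video frame, environmental information unrelated to tic movements is removed from the original frames so as to focus on the facial tics of the person being screened, and the processed images are saved.
  3. The video-data-based auxiliary screening system for tic disorders according to claim 1, characterized in that the three-dimensional channel attention module takes the feature map F_{3D} of size (Channel, Dimension, Height, Width) obtained after convolution and pooling, compresses it by average pooling into an average temporal feature F_{3D}′ of size (Channel, 1, 1, 1), and predicts the importance of each channel with a multilayer perceptron (MLP) and a Sigmoid activation function to obtain the three-dimensional channel attention feature F_{3D-C}:
    F_{3D-C} = Sigmoid(MLP(F_{3D}′)).
  4. The video-data-based auxiliary screening system for tic disorders according to claim 3, characterized in that the three-dimensional spatial attention module takes the feature F_{3D} of size (Channel, Dimension, Height, Width) obtained after convolution and pooling, compresses it by average pooling into an average spatial feature F_{3D}″ of size (1, Dimension, Height, Width), and applies a Sigmoid activation function to obtain the spatial attention feature F_{3D-S}:
    F_{3D-S} = Sigmoid(F_{3D}″).
  5. The video-data-based auxiliary screening system for tic disorders according to claim 4, characterized in that the three-dimensional channel attention module and the three-dimensional spatial attention module are incorporated in parallel into the three-dimensional convolution blocks that contain two three-dimensional convolution layers; each attention feature is multiplied with the feature F_{3D} and the two products are added, and the output feature F_A is computed as:
    Figure PCTCN2022140523-appb-100001
    where (Figure PCTCN2022140523-appb-100002) denotes the multiplication operation.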
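  6. The video-data-based auxiliary screening system for tic disorders according to claim 1, characterized in that the training process of the multi-instance strategy training module is as follows: the tic-group instances and the control-group instances form the tic multi-instance bag (Figure PCTCN2022140523-appb-100003) and the control multi-instance bag (Figure PCTCN2022140523-appb-100004), respectively; the network model in the visual feature analysis module yields the tic abnormality score sets {k_a} and {k_n} for all instances of the tic group and the control group, from which the maximum abnormality scores in the tic bag and in the control bag are computed:
    Figure PCTCN2022140523-appb-100005
    Figure PCTCN2022140523-appb-100006
    the abnormality scores of the two instances before and the two instances after each maximum are then taken; the sets
    Figure PCTCN2022140523-appb-100007
    Figure PCTCN2022140523-appb-100008
    denote the sustained phase of the most probable suspected tic in the tic bag and in the control bag, respectively, and the mean abnormality score over the sustained phase is used as the abnormality score of the most probable suspected tic; the maximum probability of a suspected tic in the tic bag (Figure PCTCN2022140523-appb-100009) and the maximum probability of a suspected tic in the control bag (Figure PCTCN2022140523-appb-100010) are computed as follows:
    Figure PCTCN2022140523-appb-100011
    Figure PCTCN2022140523-appb-100012
    where i is the index of the instance with the maximum abnormality score in the tic bag and j is the index of the instance with the maximum abnormality score in the control bag; if, during the calculation, i-2≤0, i-1≤0, i+1>N_a, i+2>N_a, j-2≤0, j-1≤0, j+1>N_n or j+2>N_n, the abnormality score of the corresponding instance does not exist and is excluded from the mean; N_a is the number of instances in the tic bag and N_n is the number of instances in the control bag;
    the loss L of each training step is computed with the ranking loss function, and the network parameters in the visual feature analysis module are updated by gradient descent and backpropagation; the ranking loss L_1 based on the multi-instance learning strategy is:
    Figure PCTCN2022140523-appb-100013
    a smoothness constraint term L_2 is added to the loss function:
    Figure PCTCN2022140523-appb-100014
    where m denotes the m-th instance in the tic bag (Figure PCTCN2022140523-appb-100015) and N_a denotes the number of instances in the tic bag (Figure PCTCN2022140523-appb-100016);
    the ranking loss function L based on the multi-instance learning strategy is:
    L = L_1 + λL_2
    where λ is the penalty coefficient; the larger its value, the heavier the penalty from the smoothness constraint term.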
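  7. The video-data-based auxiliary screening system for tic disorders according to claim 6, characterized in that, during training of the multi-instance strategy training module, the learning rate Lr is updated with an exponential decay function:
    Lr = 0.95^{epoch_t} × lr
    where epoch_t is the current training epoch and lr = 0.001 is the initial learning rate.
  8. The video-data-based auxiliary screening system for tic disorders according to claim 6, characterized in that, during training of the multi-instance strategy training module, the video data of the control-group instances and of the tic-group instances are augmented by adding random Gaussian noise, random color jitter, random rotation and random cropping, which simulate the variation in imaging quality, color, face orientation and camera distance that occurs during video acquisition.
  9. The video-data-based auxiliary screening system for tic disorders according to claim 1, characterized in that the health information collected by the health information collection and processing module includes demographic information, living habits, dietary habits, family history and family observation records.
  10. The video-data-based auxiliary screening system for tic disorders according to claim 1, characterized in that, in the fusion analysis module, the health information data numerically encoded by the health information collection and processing module are used to train a Gaussian-kernel SVM classifier, which yields a recognition probability; the data output by the visual feature analysis module carry temporal information and are trained and analyzed with an LSTM network and a Softmax function, which yields a second recognition probability.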
PCT/CN2022/140523 2021-12-24 2022-12-21 Video-data-based auxiliary screening system for tic disorders WO2023116736A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111594285.2 2021-12-24
CN202111594285.2A CN113990494B (zh) 2021-12-24 2021-12-24 Video-data-based auxiliary screening system for tic disorders

Publications (1)

Publication Number Publication Date
WO2023116736A1 true WO2023116736A1 (zh) 2023-06-29

Family

ID=79734204

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/140523 WO2023116736A1 (zh) 2021-12-24 2022-12-21 Video-data-based auxiliary screening system for tic disorders

Country Status (2)

Country Link
CN (1) CN113990494B (zh)
WO (1) WO2023116736A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
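CN118155835A (zh) * 2024-05-11 2024-06-07 成都中医药大学附属医院(四川省中医医院) Tic disorder detection method, system and storage medium based on contrastive learning
CN118172800A (zh) * 2024-05-15 2024-06-11 沈阳新维盛科生物科技有限公司 Improved image processing method for laboratory animal behavior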

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
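CN113990494B (zh) * 2021-12-24 2022-03-25 浙江大学 Video-data-based auxiliary screening system for tic disorders
CN114496235B (zh) * 2022-04-18 2022-07-19 浙江大学 Deep-reinforcement-learning-based auxiliary dry weight adjustment system for hemodialysis patients
CN115105075A (zh) * 2022-05-17 2022-09-27 清华大学 Tic disorder detection method and device
CN115714016B (zh) * 2022-11-16 2024-01-19 内蒙古卫数数据科技有限公司 Machine-learning-based method for improving the brucellosis screening rate
CN117437678A (zh) * 2023-11-01 2024-01-23 烟台持久钟表有限公司 Method, system, device and storage medium for frontal-face duration statistics
CN117807154B (zh) * 2024-02-28 2024-04-30 成都菲宇科技有限公司 Time-series data visualization method, device and medium for a display system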

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110301447A1 (en) * 2010-06-07 2011-12-08 Sti Medical Systems, Llc Versatile video interpretation, visualization, and management system
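CN110516611A (zh) * 2019-08-28 2019-11-29 中科人工智能创新技术研究院(青岛)有限公司 Autism detection system and autism detection device
CN111528859A (zh) * 2020-05-13 2020-08-14 浙江大学人工智能研究所德清研究院 Child ADHD screening and assessment system based on multimodal deep learning
CN111870253A (zh) * 2020-07-27 2020-11-03 上海大学 Method and system for monitoring tic disorders based on fused vision and speech technology
CN113990494A (zh) * 2021-12-24 2022-01-28 浙江大学 Video-data-based auxiliary screening system for tic disorders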

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9530452B2 (en) * 2013-02-05 2016-12-27 Alc Holdings, Inc. Video preview creation with link
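CN214128817U (zh) * 2020-08-17 2021-09-07 浙江大学 Fixing device for immobilizing the limbs of patients with tic disorders
CN113066576A (zh) * 2021-05-12 2021-07-02 北京大学深圳医院 Lung cancer screening method based on a three-dimensional mask region convolutional neural network
CN113611411B (zh) * 2021-10-09 2021-12-31 浙江大学 Physical examination decision support system based on false-negative sample identification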

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
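CN118155835A (zh) * 2024-05-11 2024-06-07 成都中医药大学附属医院(四川省中医医院) Tic disorder detection method, system and storage medium based on contrastive learning
CN118172800A (zh) * 2024-05-15 2024-06-11 沈阳新维盛科生物科技有限公司 Improved image processing method for laboratory animal behavior
CN118172800B (zh) * 2024-05-15 2024-08-16 沈阳医学院 Improved image processing method for laboratory animal behavior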

Also Published As

Publication number Publication date
CN113990494A (zh) 2022-01-28
CN113990494B (zh) 2022-03-25

Similar Documents

Publication Publication Date Title
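WO2023116736A1 (zh) Video-data-based auxiliary screening system for tic disorders
WO2022042122A1 (zh) EEG signal classification method, classification model training method, apparatus and medium
CN111990989A (zh) ECG signal recognition method based on generative adversarial and convolutional recurrent networks
Haidar et al. Convolutional neural networks on multiple respiratory channels to detect hypopnea and obstructive apnea events
CN110619322A (zh) Multi-lead abnormal ECG signal recognition method and system based on a multi-stream convolutional recurrent neural network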
Wang et al. A novel multi-scale dilated 3D CNN for epileptic seizure prediction
Chen et al. A new deep learning framework based on blood pressure range constraint for continuous cuffless BP estimation
Kuo et al. Automatic sleep staging based on a hybrid stacked LSTM neural network: verification using large-scale dataset
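CN115530847A (zh) Automatic sleep staging method for EEG signals based on multi-scale attention
CN114732409A (zh) Emotion recognition method based on EEG signals
CN115336973A (zh) Method for constructing a sleep staging system based on a self-attention mechanism and single-lead ECG signals, and sleep staging system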
Shu et al. Data augmentation for seizure prediction with generative diffusion model
Taghizadegan et al. Prediction of obstructive sleep apnea using ensemble of recurrence plot convolutional neural networks (RPCNNs) from polysomnography signals
Luo et al. Exploring adaptive graph topologies and temporal graph networks for EEG-based depression detection
Prabha et al. A Novel Analysis and Detection of Autism Spectrum Disorder in Artificial Intelligence Using Hybrid Machine Learning
Wang et al. Pay attention and watch temporal correlation: a novel 1-D convolutional neural network for ECG record classification
CN118044813B (zh) Mental health assessment method and system based on multi-task learning
Mohammadi et al. Two-step deep learning for estimating human sleep pose occluded by bed covers
Sangeetha et al. A CNN based similarity learning for cardiac arrhythmia prediction
Tyagi et al. Systematic review of automated sleep apnea detection based on physiological signal data using deep learning algorithm: a meta-analysis approach
CN115349821A (zh) Sleep staging method and system based on multimodal physiological signal fusion
Melinda et al. A Novel Autism Spectrum Disorder Children Dataset Based on Thermal Imaging
Bao et al. A Feature Fusion Model Based on Temporal Convolutional Network for Automatic Sleep Staging Using Single-Channel EEG
CN117958759B (zh) Polysomnography monitoring system for multiple populations
Bakhtyari et al. Combination of ConvLSTM and attention mechanism to diagnose ADHD based on EEG signals

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22910048

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE