CN114093501A - Intelligent auxiliary analysis method for children's motor epilepsy based on synchronous video and electroencephalogram (Google Patents)
Publication number: CN114093501A
Application number: CN202111216851.6A
Authority: CN (China)
Prior art keywords: video, electroencephalogram, children, feature, convolution
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications

G16H50/20 - Healthcare informatics; ICT specially adapted for medical diagnosis, medical simulation or medical data mining for computer-aided diagnosis, e.g. based on medical expert systems
A61B5/0033 - Features or image-related aspects of imaging apparatus, e.g. for MRI, optical tomography or impedance tomography apparatus; arrangements of imaging apparatus in a room
A61B5/0077 - Devices for viewing the surface of the body, e.g. camera, magnifying lens
A61B5/1118 - Determining activity level
A61B5/1128 - Measuring movement of the entire body or parts thereof using image analysis
A61B5/369 - Electroencephalography [EEG]
A61B5/372 - Analysis of electroencephalograms
A61B5/4094 - Diagnosing or monitoring seizure diseases, e.g. epilepsy
A61B5/7235 - Details of waveform analysis
A61B5/725 - Details of waveform analysis using specific filters therefor, e.g. Kalman or adaptive filters
A61B5/7264 - Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
A61B5/7267 - Classification of physiological signals or data involving training the classification device
G06F18/23213 - Non-hierarchical clustering techniques using statistics or function optimisation with a fixed number of clusters, e.g. K-means clustering
G06F18/2411 - Classification techniques based on the proximity to a decision surface, e.g. support vector machines
G06F18/253 - Fusion techniques of extracted features
G06N3/045 - Neural network architectures; combinations of networks
G06N3/08 - Neural network learning methods
Abstract
The invention discloses an intelligent auxiliary analysis method for children's motor epilepsy based on synchronous video and electroencephalogram. The method combines features of the electroencephalogram signal with features of the synchronized video data in order to achieve accurate and reliable seizure detection in pediatric epileptic patients. Electroencephalogram and video features are fused to train a classification model, which improves the recognition rate of seizure detection; data balancing is applied to the fused features, overcoming the problem that far less data is available from seizure periods than from seizure intervals. A YOLO target detection method locates the pediatric patient in the video before further processing, addressing the fact that video data collected by hospital cameras is not ideal in practice. After the spatiotemporal interest points are extracted, a screening module is introduced that reduces feature extraction from redundant information in the video, simplifying the video features and improving the children's seizure detection rate.
Description
Technical Field
The invention belongs to the fields of electroencephalogram signal processing, computer vision and intelligent medical auxiliary analysis, and relates to an intelligent auxiliary analysis method for children's motor epilepsy based on synchronous video and electroencephalogram.
Background
Epilepsy is a chronic neurological disease that is very common in childhood. Epileptic patients account for approximately 1% of the world population; most of them are children, and most live in developing countries. According to statistics, the incidence of epilepsy in children is 10 to 15 times that in adults, and long-term seizures seriously jeopardize children's growth and the development of their cognitive abilities. Therefore, the treatment of epilepsy in children has become a concern of countries around the world. Although the analysis of epilepsy in children has become more sophisticated, the randomness, repetitiveness and unpredictability of seizures remain a difficulty in epilepsy analysis. Deficiencies in practical medical conditions, such as poorly performing medical equipment and the lack of effective auxiliary epilepsy detection and diagnosis systems, are also reasons why childhood epilepsy remains so serious. A reliable and efficient automatic detection system for children's motor epileptic seizures therefore has strong social and economic benefits: on the one hand, it can assist doctors in preliminary screening and detection, reducing their workload especially for large-scale offline data; on the other hand, it can provide auxiliary diagnosis training and learning for doctors in lower-level hospitals such as community clinics. This also helps to further reveal the regularity of seizures, thereby providing more effective treatment for pediatric epileptic patients.
The electroencephalogram signal is the overall reflection of the electrical activity of nerve cells in the brain and contains a large amount of information on physiological and mental activity. Generally, electroencephalogram signals provide an important basis for the clinical diagnosis and analysis of childhood epilepsy. However, conventional electroencephalogram signals have great limitations in time and space; to assist in detecting children's epileptic seizures more quickly and effectively, the invention combines conventional electroencephalogram signals with video signals synchronized with them.
Staging of epilepsy: based on the clinical electroencephalogram signals of epileptic patients, the signals are divided into two stages, the seizure interval and the seizure period; the corresponding synchronized video data is divided consistently with these two stages, and the patient's state is detected on the basis of this staging.
Seizure detection: first, electroencephalogram and video segments of fixed length (covering both seizure periods and seizure intervals) are intercepted from the patient's electroencephalogram signal and synchronized video signal; then the feature differences between seizure periods and seizure intervals are analyzed with feature extraction and learning algorithms; finally, seizure detection is carried out.
In motor epilepsy, although the duration of a seizure period is short relative to the seizure interval, motor seizures of a pediatric patient show obvious movement changes in video, so detecting them through video is very important. In addition, the electroencephalogram signal during a seizure period also presents a form different from that during the seizure interval, which lays a foundation for detecting children's seizures by fusing electroencephalogram and video features. However, current research on the detection of children's seizures has several major problems: 1) existing research mainly focuses on detecting seizures using electroencephalogram signals or video alone, and research using both signals together is scarce; 2) in existing video-based research on children's seizure detection, the video data is processed under idealized conditions, with only the epileptic patient remaining after the videos are cut in advance, whereas videos collected in hospitals contain not only the patient but also other noise such as the presence of other people; 3) existing children's seizure detection methods do not effectively screen out information in the video that is irrelevant to the patient, which increases the training difficulty.
Disclosure of Invention
Aiming at these problems of existing children's seizure detection methods, the invention combines the electroencephalogram signal and the video signal, adopts a YOLO target detection algorithm, introduces a method for screening spatiotemporal interest points, and provides an intelligent auxiliary analysis method for children's motor epilepsy based on synchronous video and electroencephalogram.
In the invention, the multichannel electroencephalogram data is first segmented, electroencephalogram signals of the seizure interval and the seizure period are intercepted, and a series of preprocessing steps are applied to obtain electroencephalogram signals with less interference. The preprocessing of the electroencephalogram signals is as follows: first, a band-pass filter removes frequency bands outside 0.5-70 Hz; then a notch filter eliminates the 50 Hz power-frequency interference; finally, artifacts are removed. MFCC (Mel Frequency Cepstral Coefficients) and LPCC (Linear Predictive Cepstral Coefficients) features are then extracted from the preprocessed signals. Meanwhile, the synchronized video data is intercepted, the resolution of the video is adjusted to 416 x 416 pixels, and the frame rate is set to 20 frames per second. The position of the patient is then detected with a YOLO target detection neural network, spatiotemporal interest points are extracted from the video, and the interest points outside the detection frame are screened out by a screening mechanism. For each remaining interest point, the 9 blocks centered on it are reserved, the 9 blocks at the same position in two consecutive frames form a cube, and HOG, HOF, LBP and MBH features are extracted from the cube. All extracted video features are further processed with a bag-of-words model to obtain the final video features, which are then fused with the electroencephalogram features. Because the numbers of seizure-period and seizure-interval samples are unequal, SMOTE + Tomek Links data equalization is applied to the feature data, and the equalized features are fed into a machine learning model for training. 10-fold cross-validation is used to evaluate model performance and finally obtain the seizure detection results.
The technical scheme of the invention mainly comprises the following steps:
step 1, intercepting segments of the seizure interval and the seizure period from the original electroencephalogram signal in the raw data, dividing them into 6-second data frames, filtering out frequency bands outside 0.5-70 Hz with a band-pass filter, removing the 50 Hz power-frequency interference with a notch filter, and removing artifacts, finally obtaining an electroencephalogram signal with less interference;
step 2, extracting MFCC characteristics and LPCC characteristics of the preprocessed brain electrical signals, and visualizing the MFCC characteristics and the LPCC characteristics in a histogram form;
step 3, intercepting video segments synchronous with the electroencephalogram signals from the video data in the original data, wherein each segment of video data is also 6 seconds, adjusting the resolution of the video to 416 × 416 pixels, and adjusting the frame rate of the video to 20 frames per second, namely, the total frame number of one video is 120 frames;
step 4, inputting the video data processed in step 3 into a YOLO target detection neural network, which identifies the position of the pediatric epileptic patient in the video and marks it with a red frame;
step 5, detecting spatiotemporal interest points in the video data processed in step 4 to obtain the position of the interest points in each frame, comparing them with the position of the pediatric patient, namely the red frame in the video, and screening out the interest points that are not inside the red frame;
step 6, reserving the 9 blocks centered on each spatiotemporal interest point, forming the 9 blocks at the same position in two consecutive frames into a cube, and extracting HOG, HOF, LBP and MBH features from the cube;
step 7, putting the extracted features of the spatiotemporal interest points into a bag-of-words model, constructing a vocabulary of 50 words, classifying the features of each interest point and storing them as a word-frequency histogram, finally obtaining a 50-dimensional video feature;
step 8, fusing the extracted electroencephalogram signal characteristics and video data characteristics;
step 9, performing SMOTE + Tomek Links data equalization on the fused features so that the number of seizure-period samples equals the number of seizure-interval samples, obtaining the final children's epilepsy features;
and step 10, putting the final children's epilepsy features into a machine learning algorithm for training, evaluating the model performance with 10-fold cross-validation, and obtaining the seizure detection result.
Further, the specific process of step 2 is as follows:
21, segmenting the preprocessed electroencephalogram signal every 6 s, performing pre-emphasis, framing and windowing on each segment, then performing a discrete Fourier transform to obtain the spectrum of each segment, squaring the spectrum to obtain the power spectrum, processing the power spectrum with a Mel triangular filter bank, and performing a discrete cosine transform to obtain Mel frequency cepstral coefficient features of size 21 x 12;
22, meanwhile, obtaining the LPC coefficients of the preprocessed signals with the lpc function in Matlab, and obtaining 21 x 16-dimensional linear predictive cepstral coefficient features by computing the cepstrum.
Further, the specific process of step 3 is as follows:
31, intercepting the seizure-interval and seizure-period segments synchronized with the electroencephalogram signal from the video data, each segment also being 6 seconds; setting the resolution of the video to 416 x 416 pixels and the frame rate to 20 frames per second, so that one 6-second video has 120 frames in total.
Further, the specific process of step 4 is as follows:
41, inputting the processed video data into a backbone network for feature extraction; the backbone network used is Darknet53, the size of an input matrix is 416 x 416 x 3, and there are 2922 (total number of videos) x 120 (frames per video) inputs in total; the Darknet53 network extracts features through 53 convolutional layers with residual convolutions, and residual blocks are used to alleviate the vanishing-gradient problem caused by increasing depth in a deep neural network; in the backbone network, the residual convolution first performs a convolution with a 3 x 3 kernel and stride 2, which compresses the width and height of the input feature layer to obtain a new feature layer, called layer; a 1 x 1 convolution and a 3 x 3 convolution are then performed, and the result is added to layer itself; each convolution part of the backbone network uses the nn.Conv2d function, nn.BatchNorm2d for batch normalization, and the nn.LeakyReLU function as activation;
42, dividing the process of obtaining a prediction result from the features into two parts, namely constructing an FPN feature pyramid for reinforced feature extraction and predicting three effective feature layers by using a YOLO HEAD;
43, decoding the obtained three prediction results, reading the positions of the prediction frames from the numerical values of the stored frame coordinates, and performing score sorting and nonmaximum inhibition screening on the frame confidence degrees corresponding to the final multiple prediction results to obtain the final positions of the prediction frames;
and 44, obtaining the final position of the prediction frame and then drawing the final position on an original image, so that the position of the children epileptic patient in the video can be detected.
Further, the step of constructing the FPN feature pyramid for enhanced feature extraction in step 42 is as follows:
421, in the feature extraction part, YOLO extracts three feature layers in total for target detection, located respectively at the middle, middle-lower and bottom layers of the trunk Darknet53; their shapes are (52, 52, 256), (26, 26, 512) and (13, 13, 1024);
422, constructing the FPN layer from the three effective feature layers: the 13 x 13 x 1024 feature layer is convolved 5 times; the result is used by the YOLO HEAD to obtain a prediction and is also upsampled with the nn.Upsample function and combined with the 26 x 26 x 512 feature layer, giving a combined feature layer of 26 x 26 x 768;
423, in the same way, the combined feature layer is again convolved 5 times; the result is used by the YOLO HEAD to obtain a prediction and is upsampled and combined with the shallower 52 x 52 x 256 feature layer, giving a combined feature layer of 52 x 52 x 384;
424, finally, the combined feature layer is convolved 5 times and used by the YOLO HEAD to obtain a prediction, so that three feature layers are obtained in total for the YOLO HEAD;
the step of predicting three effective feature layers by using the YOLO HEAD in the step 42 is as follows:
firstly, a 3 x 3 convolution is applied to each of the three enhanced feature layers for feature integration, then a 1 x 1 convolution adjusts the number of channels, finally giving the three prediction results of the output layer; since the VOC data set has 20 classes, plus 4 values storing the frame coordinates and one value storing the frame confidence, there are 25 values in total, and since YOLO assigns 3 prior frames to each feature point of each feature layer, the number of channels of the prediction result is 25 x 3 = 75; in practice the input is N pictures of 416 x 416, so the shapes of the three prediction results are (N, 13, 13, 75), (N, 26, 26, 75) and (N, 52, 52, 75), corresponding to the positions of 3 prior frames on the 13 x 13, 26 x 26 and 52 x 52 grids of each picture.
Further, the loss function used in training comprises a target localization offset loss L_{loc}, a target confidence loss L_{conf} and a target classification loss L_{cla}:

Loss = L_{loc} + L_{conf} + L_{cla} (9)

(1) The target confidence is the probability that a target exists inside the predicted rectangular frame, and the target confidence loss adopts a binary cross-entropy loss:

L_{conf}(o, c) = −Σ_{i=0}^{S×S} Σ_{j=0}^{B} 1_{ij}^{obj} [Ĉ_{i}^{j} ln C_{i}^{j} + (1 − Ĉ_{i}^{j}) ln(1 − C_{i}^{j})] − λ_{noobj} Σ_{i=0}^{S×S} Σ_{j=0}^{B} 1_{ij}^{noobj} [Ĉ_{i}^{j} ln C_{i}^{j} + (1 − Ĉ_{i}^{j}) ln(1 − C_{i}^{j})]

where S denotes the grid size, with S × S grid divisions in total (here 13, 26 and 52 respectively); B denotes the prediction boxes generated by each grid cell (here 3); C_{i}^{j} is the probability score that the jth prediction box of the ith grid cell contains the target (0 representing absence, 1 presence); Ĉ_{i}^{j} represents its corresponding true value; λ_{noobj} takes the value 0.5; and 1_{ij}^{obj} indicates whether the jth prediction box of the ith grid cell is responsible for the target, taking the value 1 if it is and 0 otherwise;

(2) the target classification loss L_{cla}(O, C) also uses a binary cross-entropy loss:

L_{cla}(O, C) = −Σ_{i,j} 1_{ij}^{obj} Σ_{c∈classes} [Ô_{ij}(c) ln O_{ij}(c) + (1 − Ô_{ij}(c)) ln(1 − O_{ij}(c))]

where O_{ij}(c) represents the probability that the jth prediction box on the ith grid cell belongs to class c, and Ô_{ij}(c) represents the true class value of the labeled box, taking the value 1 if it belongs to class c and 0 otherwise;

(3) the target localization loss L_{loc}(l, g) uses the sum of squared differences between the true offset values and the predicted offset values:

L_{loc}(l, g) = Σ_{i,j} 1_{ij}^{obj} Σ_{m∈{x,y,w,h}} (l̂_{ij}^{m} − ĝ_{ij}^{m})²

where (l̂_{ij}^{x}, l̂_{ij}^{y}) and (ĝ_{ij}^{x}, ĝ_{ij}^{y}) denote the predicted and true rectangular-box center coordinates, (l̂_{ij}^{w}, l̂_{ij}^{h}) and (ĝ_{ij}^{w}, ĝ_{ij}^{h}) the predicted and true box width and height, and i and j index the jth prediction box of the ith grid cell.
Further, the specific process of step 5 is as follows:
51, detecting spatiotemporal interest points in the processed video data; first the video sequence is scale-transformed, converting the video into a linear scale-space representation by convolving the video sequence f with a Gaussian kernel:

L(·; σ_{l}², τ_{l}²) = g(·; σ_{l}², τ_{l}²) * f(·) (18)

where f(·) denotes the video sequence, g(·; σ_{l}², τ_{l}²) the Gaussian kernel, σ_{l}² the variance in the spatial domain, and τ_{l}² the variance in the time domain;

52, after the scale transformation, averaging the products of the first-order spatial and temporal derivatives with a Gaussian weighting function to obtain the 3 × 3 spatiotemporal second-order matrix μ:

μ = g(·; σ_{i}², τ_{i}²) * ( L_{x}²      L_{x}L_{y}  L_{x}L_{t}
                             L_{x}L_{y}  L_{y}²      L_{y}L_{t}
                             L_{x}L_{t}  L_{y}L_{t}  L_{t}² )

where the integration scales σ_{i}², τ_{i}² of the spatiotemporal second-order matrix are related to the local spatial and temporal scales σ_{l}², τ_{l}²; g is g(·; σ_{i}², τ_{i}²), f is f(·), and L_{x}, L_{y}, L_{t} are the first-order derivatives of L;

53, computing the three eigenvalues λ_{1}, λ_{2}, λ_{3} of the spatiotemporal second-order matrix; the Harris corner function extended to the time-space domain then takes the form:

H = det(μ) − k·trace³(μ) = λ_{1}λ_{2}λ_{3} − k(λ_{1} + λ_{2} + λ_{3})³ (16)

where k is 0.001, σ_{l}² is 4, τ_{l}² is 2, σ_{i}² is 8 and τ_{i}² is 4;

54, computing the positive local maxima of the function H to obtain the positions of the spatiotemporal interest points in each frame; the number of interest points during a seizure period of the pediatric patient is thus found to increase markedly compared with the seizure interval;

55, comparing the interest-point positions with the position of the pediatric patient, namely the prediction box obtained by the YOLO target detection algorithm in the video, and screening out the interest points that are not inside the prediction box.
Further, the specific process of step 6 is as follows:
61, reserving the 9 blocks centered on each spatiotemporal interest point, the 9 blocks at the same position in two consecutive frames forming a cube for feature extraction;
62, extracting HOG, HOF, LBP and MBH features for each cube;
63, concatenating the HOG, HOF, LBP and MBH features, so that each interest point yields a 1620-dimensional feature.
Further, the specific process of step 7 is as follows:
71, putting the features of all videos into a bag-of-words model, randomly generating 50 cluster centers with the K-means algorithm, computing the distance between each spatiotemporal interest-point feature and each cluster center, and assigning each interest point to the category of its nearest cluster center;
72, repeatedly recomputing the cluster center of each category from the classified data and reclassifying the interest-point features by their distances to the new cluster centers, until the positions of the cluster centers no longer change;
73, counting the number of spatiotemporal interest points in each category with a histogram, normalizing it to obtain the final histogram of each video, and converting it into a feature vector, so that the final feature of each video is 50-dimensional.
Further, the specific process of step 9 is as follows:
91, generating new samples with the synthetic minority oversampling technique:

x_{new} = x + rand(0, 1) × (x_{n} − x)

where x_{new} is the newly generated sample, x is a sample of the minority class, x_{n} is a sample among the K nearest neighbors of x, and rand(0, 1) is a uniform random number in [0, 1];
92, to eliminate the blurring effect of interpolating minority samples into the majority class, sample pairs in a glued state are removed with the Tomek Links method, defined as follows: a pair (x_{i}, x_{j}), with x_{i} belonging to class C_{1} and x_{j} to class C_{2}, forms a Tomek link if there is no sample x_{k} such that d(x_{i}, x_{k}) < d(x_{i}, x_{j}) or d(x_{j}, x_{k}) < d(x_{i}, x_{j}), where d(·) denotes the distance between two samples and C_{1}, C_{2} represent the two different classes; the numbers of seizure-interval and seizure-period samples of the children's epilepsy data are finally matched by combining the two methods.
The invention has the following beneficial effects:
the method for assisting in analyzing and detecting the children's motor epilepsy based on the synchronous video and the electroencephalogram has the advantages that 1) the position of the children's epilepsy patient in the real video collected by a hospital can be detected by using a YOLO target detection algorithm, and the video can be used for detecting the seizures of the children's epilepsy patient for nonideal data, so that the accuracy of the seizures detection is improved. 2) The YOLO target detection frame spatial and temporal interest point screening module can remove redundant information in a video, further narrow the range of feature extraction, pay more attention to the feature difference between the seizure and nonseizure of children epileptics, and better play the advantages of machine learning, so that the accuracy of epileptic seizure detection is improved. 3) The accuracy rate of the detection of the children epileptic seizure of the machine learning model is improved through data equalization processing of fusion characteristics of the video and the electroencephalogram signals.
The method intercepts seizure-interval and seizure-period data from the original electroencephalogram and video data, applies a series of preprocessing steps such as filtering to the original electroencephalogram signals, and extracts their MFCC and LPCC features. Meanwhile, the position of the pediatric patient in the video is detected with the YOLO target detection algorithm, spatiotemporal interest points are extracted, those outside the prediction frame are screened out, the 9 blocks centered on each remaining interest point are reserved, the 9 blocks at the same position in two consecutive frames form a cube, and HOG, HOF, LBP and MBH features of the cube are extracted. The extracted features are classified with a bag-of-words model to obtain a word-frequency histogram of each video as the final video feature. The electroencephalogram and video features are then fused, the fused features are equalized with SMOTE + Tomek Links, the resulting features are put into a machine learning algorithm for training, and model performance is evaluated with 10-fold cross-validation to obtain the seizure detection results. Experiments show that, without screening the spatiotemporal interest points, an SVM model trained on the fused EEG and video HOW features reaches a test classification accuracy close to 97.86%; after screening, the accuracy of the model trained on the fused features is 98.34%. Using the video HOW features alone, the test accuracies before and after interest-point screening are 86.3% and 92.3% respectively; the test accuracies of SVM models trained on the EEG features alone, the screened video HOW features alone, and their fusion are 97.5%, 92.3% and 98.34% respectively. Thus, with fused EEG and video HOW features, interest-point screening improves the test accuracy by 0.48%; with video features alone, screening improves it by 6%; and the fusion improves on the EEG features alone and the screened HOW features alone by 0.84% and 6.04% respectively. These data indicate that the method can effectively detect seizures in pediatric epileptic patients. The method achieves accurate recognition for children's seizure detection and can support an accurate, reliable and efficient automatic seizure detection auxiliary system, especially for large-scale offline data. This also helps to further reveal the regularity of seizures, thereby providing more effective treatment for pediatric epileptic patients.
Drawings
FIG. 1: flow chart of the invention.
Detailed description of the invention
The following detailed description of embodiments of the invention refers to the accompanying drawings.
The invention provides a method for auxiliary analysis and detection of children's motor epilepsy based on synchronous video and electroencephalogram, which combines features of the electroencephalogram signal with features of the synchronized video data in order to achieve accurate and reliable seizure detection in pediatric epileptic patients. The innovations of the method are as follows: (1) the electroencephalogram and video features are fused to train a classification model, improving the recognition rate of seizure detection, while data balancing is applied to the fused features, overcoming the problem that far less data is available from seizure periods than from seizure intervals; (2) a YOLO target detection method locates the pediatric patient in the video before further processing, addressing the fact that video data collected by hospital cameras is not ideal in practice; (3) after the spatiotemporal interest points are extracted, a screening module is introduced, reducing feature extraction from redundant information in the video, simplifying the video features, and improving the children's seizure detection rate.
As shown in fig. 1, the implementation steps of the child motor epilepsy auxiliary analysis and detection method based on the synchronous video and electroencephalogram are introduced in detail in the invention content, that is, the technical scheme of the invention mainly comprises the following steps:
Step 1, intercepting segments of the seizure interval and the seizure period from the original electroencephalogram signal in the raw data, dividing them into 6-second data frames, filtering out frequency bands outside 0.5-70 Hz with a band-pass filter, eliminating the 50 Hz power-frequency interference with a notch filter, and removing artifacts, finally obtaining an electroencephalogram signal with less interference.
And 2, extracting the MFCC characteristics and the LPCC characteristics of the preprocessed brain electrical signals, and visualizing the MFCC characteristics and the LPCC characteristics in a histogram mode.
And 3, intercepting video segments synchronous with the electroencephalogram signals from the video data in the original data, wherein each segment of video data is also 6 seconds, adjusting the resolution of the video to 416 × 416 pixels, and adjusting the frame rate of the video to 20 frames per second, namely, the total frame number of one video is 120 frames.
And 4, inputting the video data processed in step 3 into a YOLO target detection neural network, which identifies the position of the pediatric epileptic patient in the video and marks it with a red frame.
And 5, detecting spatiotemporal interest points in the video data processed in step 4 to obtain the position of the interest points in each frame, comparing them with the position of the pediatric patient, namely the red frame in the video, and screening out the interest points that are not inside the red frame.
And 6, reserving the 9 blocks centered on each spatiotemporal interest point, forming the 9 blocks at the same position in two consecutive frames into a cube, and extracting HOG (Histogram of Oriented Gradients), HOF (Histogram of Optical Flow), LBP (Local Binary Pattern) and MBH (Motion Boundary Histogram) features from the cube.
And 7, putting the extracted features of the spatiotemporal interest points into a bag-of-words model, constructing a vocabulary of 50 words, classifying the features of each interest point and storing them as a word-frequency histogram, finally obtaining a 50-dimensional video feature.
And 8, fusing the extracted electroencephalogram signal characteristics and the video data characteristics.
And 9, performing SMOTE + Tomek Links data equalization on the fused features so that the number of seizure-period samples equals the number of seizure-interval samples, obtaining the final children's epilepsy features.
And step 10, putting the final children's epilepsy features into a machine learning algorithm for training, evaluating the model performance with a 10-fold cross-validation method, and obtaining the seizure detection result.
The specific process of the step 1 is as follows:
11, the original EEG signal comprises 21 channels with a sampling frequency of 1000 Hz. First, the seizure-interval and seizure-period segments are intercepted from the original signal; then a band-pass filter removes frequency bands outside 0.5-70 Hz, a notch filter eliminates the 50 Hz power-frequency interference, and artifacts are removed to obtain the preprocessed electroencephalogram signal.
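For illustration, a minimal Python sketch of this preprocessing chain is given below. The Butterworth filter order, the notch quality factor and the use of zero-phase filtering are assumptions; the patent fixes only the 0.5-70 Hz pass band, the 50 Hz notch and the 1000 Hz sampling rate.

import numpy as np
from scipy.signal import butter, filtfilt, iirnotch

FS = 1000  # sampling frequency of the 21-channel EEG, Hz

def preprocess_eeg(eeg: np.ndarray) -> np.ndarray:
    """eeg: array of shape (n_channels, n_samples)."""
    # 4th-order Butterworth band-pass keeping 0.5-70 Hz (order is assumed)
    b_bp, a_bp = butter(4, [0.5, 70.0], btype="bandpass", fs=FS)
    x = filtfilt(b_bp, a_bp, eeg, axis=-1)
    # notch filter removing the 50 Hz power-frequency interference
    b_n, a_n = iirnotch(w0=50.0, Q=30.0, fs=FS)
    return filtfilt(b_n, a_n, x, axis=-1)

def frame_6s(eeg: np.ndarray) -> np.ndarray:
    """Cut the filtered signal into 6-second frames of 6000 samples each."""
    n = (eeg.shape[-1] // (6 * FS)) * 6 * FS
    return eeg[..., :n].reshape(eeg.shape[0], -1, 6 * FS)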
The specific process of the step 2 is as follows:
21, cutting the preprocessed electroencephalogram signal every 6 s, pre-emphasizing, framing and windowing each segment, then performing a Discrete Fourier Transform (DFT) to obtain the spectrum of each segment, squaring the spectrum to obtain the power spectrum, processing the power spectrum with a Mel triangular filter bank, and then performing a Discrete Cosine Transform (DCT) to obtain Mel Frequency Cepstral Coefficient (MFCC) features of size 21 x 12.
Further, the step of extracting the MFCC features by using the Mel triangular filter bank comprises the following steps:
(1) Pre-emphasis is applied to the processed signal, i.e. the signal is passed through a first-order high-pass filter, defined as follows:

H(z) = 1 − 0.97 z^{−1} (1)
(2) setting the frame length (wlen) as 200 and the frame shift (inc) as 80, calculating time and framing, and obtaining the frame number after framing;
(3) calculating the time frame corresponding to each frame, wherein the calculation formula is as follows:
frametime = (((1:fn) − 1) × inc + wlen/2) / fs (2)

where frametime represents the time corresponding to each frame, fn the number of frames, inc the frame shift, wlen the frame length, and fs the sampling frequency; M denotes the length of the intercepted signal.
(4) A windowing function is selected that is windowed to improve continuity between its frames. The window function selects the hamming window, which is defined as follows:
ω(n) = 0.54 − 0.46 cos(2πn/(N − 1)), 0 ≤ n ≤ N − 1 (3)

where N is the window length.
(5) A Discrete Fourier Transform (DFT) is performed to obtain the spectrum of the signal, and the spectrum is squared to obtain the energy spectrum:

X(k) = Σ_{n=0}^{N−1} x(n) e^{−j2πnk/N}, 0 ≤ k ≤ N − 1

where x(n) is the input signal and N represents the number of Fourier transform points.
(6) The energy spectrum is passed through a set of Mel-scale triangular filter banks, where the frequency response of the mth Mel-scale triangular filter is defined as:

H_{m}(k) = 0, for k < f(m − 1)
H_{m}(k) = (k − f(m − 1)) / (f(m) − f(m − 1)), for f(m − 1) ≤ k ≤ f(m)
H_{m}(k) = (f(m + 1) − k) / (f(m + 1) − f(m)), for f(m) < k ≤ f(m + 1)
H_{m}(k) = 0, for k > f(m + 1)

where f(m) is the center frequency of each triangular filter.
(7) The logarithmic energy output by each filter bank is calculated as:

s(m) = ln( Σ_{k=0}^{N−1} |X(k)|² H_{m}(k) ), 0 ≤ m < M
(8) The logarithmic energies are passed through a discrete cosine transform to obtain the 12th-order MFCC coefficients:

C(n) = Σ_{m=0}^{M−1} s(m) cos( πn(m + 0.5) / M ), n = 1, 2, …, L

where L refers to the MFCC coefficient order, here taken as 12, and M refers to the number of triangular filters.
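A minimal NumPy/SciPy sketch of steps (1) to (8) follows. The FFT length and the 26-filter Mel bank are assumptions; the patent fixes only wlen = 200, inc = 80 and the 12 DCT coefficients.

import numpy as np
from scipy.fftpack import dct

def mfcc(sig, fs=1000, wlen=200, inc=80, n_filt=26, n_ceps=12, nfft=256):
    """sig: 1-D float array (one channel of one 6-second segment)."""
    sig = np.append(sig[0], sig[1:] - 0.97 * sig[:-1])    # (1) pre-emphasis
    fn = 1 + (len(sig) - wlen) // inc                     # (2) number of frames
    frames = np.stack([sig[i * inc : i * inc + wlen] for i in range(fn)])
    frames = frames * np.hamming(wlen)                    # (4) Hamming window
    pspec = np.abs(np.fft.rfft(frames, nfft)) ** 2        # (5) energy spectrum
    # (6) triangular Mel filter bank spanning 0 to fs/2
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = imel(np.linspace(mel(0.0), mel(fs / 2.0), n_filt + 2))
    bins = np.floor((nfft + 1) * pts / fs).astype(int)
    fbank = np.zeros((n_filt, nfft // 2 + 1))
    for m in range(1, n_filt + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    log_e = np.log(pspec @ fbank.T + np.finfo(float).eps) # (7) log energy
    return dct(log_e, type=2, axis=1, norm="ortho")[:, 1 : n_ceps + 1]  # (8)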
22, meanwhile, the LPC coefficients of the preprocessed signals are obtained with the lpc function in Matlab, and 21 x 16-dimensional Linear Predictive Cepstral Coefficient (LPCC) features are obtained by computing the cepstrum.
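The following sketch stands in for Matlab's lpc function: order-16 LPC by the autocorrelation (Toeplitz) method, followed by the standard LPC-to-cepstrum recursion. The exact cepstrum variant used in the patent is an assumption.

import numpy as np
from scipy.linalg import solve_toeplitz

def lpcc(frame: np.ndarray, order: int = 16) -> np.ndarray:
    """frame: 1-D float array; returns `order` cepstral coefficients."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = solve_toeplitz(r[:order], r[1:order + 1])  # LPC coefficients a_1..a_p
    c = np.zeros(order)
    for n in range(1, order + 1):   # recursion: c_n = a_n + sum (k/n) c_k a_{n-k}
        c[n - 1] = a[n - 1] + sum((k / n) * c[k - 1] * a[n - k - 1]
                                  for k in range(1, n))
    return c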
The specific process of the step 3 is as follows:
31, intercepting the seizure-interval and seizure-period segments synchronized with the electroencephalogram signal from the video data, each segment also being 6 seconds; setting the resolution of the video to 416 x 416 pixels and the frame rate to 20 frames per second, so that one 6-second video has 120 frames in total.
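A hedged OpenCV sketch of this resampling step is given below; the uniform frame-index resampling used to reach 20 frames per second is an assumption about how the rate is adjusted.

import cv2
import numpy as np

def load_clip(path: str, size: int = 416, fps: int = 20, seconds: int = 6) -> np.ndarray:
    """Read a clip and return (fps*seconds, size, size, 3) uint8 frames."""
    cap = cv2.VideoCapture(path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.resize(frame, (size, size)))  # 416 x 416 pixels
    cap.release()
    # pick fps*seconds frames uniformly, approximating a 20 fps re-encode
    idx = np.linspace(0, len(frames) - 1, fps * seconds).astype(int)
    return np.stack([frames[i] for i in idx])  # e.g. (120, 416, 416, 3)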
The specific process of the step 4 is as follows:
41, inputting the processed video data into a backbone network for feature extraction; the backbone network used in the method is Darknet53, the size of an input matrix is 416 x 416 x 3, and there are 2922 (total number of videos) x 120 (frames per video) inputs in total. The Darknet53 network extracts features through 53 convolutional layers with residual convolutions, and residual blocks are used to alleviate the vanishing-gradient problem caused by increasing depth in a deep neural network. In the backbone network, the residual convolution first performs a convolution with a 3 x 3 kernel and stride 2, which compresses the width and height of the input feature layer to obtain a new feature layer, called layer; a 1 x 1 convolution and a 3 x 3 convolution are then performed, and the result is added to layer itself. Each convolution part of the backbone network uses the nn.Conv2d function, nn.BatchNorm2d for batch normalization, and the nn.LeakyReLU function as activation. The mathematical expression of the Leaky ReLU can be written as:

y = x, for x ≥ 0;  y = x / a, for x < 0

where a takes the value 10 and x, y correspond to values on the x and y axes.
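The following PyTorch sketch illustrates the building blocks named above: the nn.Conv2d + nn.BatchNorm2d + nn.LeakyReLU unit, and the residual block that applies a 1 x 1 and then a 3 x 3 convolution and adds the result back onto its input layer. The channel widths in the usage lines are illustrative.

import torch
import torch.nn as nn

def conv_bn_leaky(c_in, c_out, k, stride=1):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, stride, padding=k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.1),  # negative slope 1/a with a = 10
    )

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            conv_bn_leaky(channels, channels // 2, 1),  # 1 x 1 convolution
            conv_bn_leaky(channels // 2, channels, 3),  # 3 x 3 convolution
        )

    def forward(self, layer):
        return layer + self.body(layer)  # add the result to "layer" itself

# 3 x 3 convolution with stride 2 compresses width and height, then a residual block
stage = nn.Sequential(conv_bn_leaky(32, 64, 3, stride=2), ResidualBlock(64))
print(stage(torch.randn(1, 32, 416, 416)).shape)  # torch.Size([1, 64, 208, 208])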
42, the process of obtaining the prediction result from the features can be divided into two parts, namely constructing an FPN feature pyramid to perform enhanced feature extraction and predicting three effective feature layers by using a YOLO HEAD.
Further, the step of constructing the FPN feature pyramid for enhanced feature extraction is as follows:
(1) In the feature extraction part, YOLO extracts three feature layers in total for target detection, located respectively at the middle, middle-lower and bottom layers of the trunk Darknet53; their shapes are (52, 52, 256), (26, 26, 512) and (13, 13, 1024).
(2) The FPN layer is constructed from the three effective feature layers: the 13 x 13 x 1024 feature layer is convolved 5 times; the result is used by the YOLO HEAD to obtain a prediction and is also upsampled with the nn.Upsample function and combined with the 26 x 26 x 512 feature layer, giving a combined feature layer of 26 x 26 x 768.
(3) In the same way, the combined feature layer is again convolved 5 times; the result is used by the YOLO HEAD to obtain a prediction and is upsampled and combined with the shallower 52 x 52 x 256 feature layer, giving a combined feature layer of 52 x 52 x 384.
(4) Finally, the combined feature layer is convolved 5 times and used by the YOLO HEAD to obtain a prediction, so that three feature layers are obtained in total for the YOLO HEAD.
Further, the steps of predicting the three effective feature layers with the YOLO HEAD are as follows: firstly, a 3 x 3 convolution is applied for feature integration, then a 1 x 1 convolution adjusts the number of channels, finally giving the three prediction results of the output layer. Since the VOC data set has 20 classes, plus 4 values storing the frame coordinates and one value storing the frame confidence, there are 25 values in total, and since YOLO assigns 3 prior frames to each feature point of each feature layer, the number of channels of the prediction result is 25 x 3 = 75. In practice the input is N pictures of 416 x 416, so the shapes of the three prediction results are (N, 13, 13, 75), (N, 26, 26, 75) and (N, 52, 52, 75), corresponding to the positions of 3 prior frames on the 13 x 13, 26 x 26 and 52 x 52 grids of each picture.
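A short PyTorch sketch of the YOLO HEAD described above follows; the intermediate channel widths are the usual YOLOv3 choices and are assumptions here.

import torch
import torch.nn as nn

def yolo_head(c_in, c_mid):
    return nn.Sequential(
        nn.Conv2d(c_in, c_mid, 3, padding=1),  # 3 x 3 feature integration
        nn.Conv2d(c_mid, 75, 1),               # 1 x 1 channel adjustment, 25 * 3 = 75
    )

feats = [torch.randn(1, 512, 13, 13),
         torch.randn(1, 256, 26, 26),
         torch.randn(1, 128, 52, 52)]
for f in feats:
    print(yolo_head(f.shape[1], f.shape[1] * 2)(f).shape)
# (N, 75, 13, 13), (N, 75, 26, 26), (N, 75, 52, 52); PyTorch puts channels first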
And 43, decoding the three prediction results obtained in the previous step, reading the positions of the prediction frames from the numerical values of the stored frame coordinates, and performing score sorting and nonmaximum inhibition screening on the frame confidence degrees corresponding to the final multiple prediction results to obtain the final positions of the prediction frames.
And 44, obtaining the final position of the prediction frame, and then drawing the final position on an original image (namely a frame image in the corresponding video), so that the position of the children epileptic in the video can be detected.
The loss function used in the training is mainly divided into three parts: a target localization offset loss L_{loc}, a target confidence loss L_{conf} and a target classification loss L_{cla}.
Loss＝L_{loc}+L_{conf}+L_{cla} (9)
(1) The target confidence is the probability that a target exists inside the predicted rectangular frame, and the target confidence loss adopts a Binary Cross Entropy loss:

L_{conf}(o, c) = −Σ_{i=0}^{S×S} Σ_{j=0}^{B} 1_{ij}^{obj} [Ĉ_{i}^{j} ln C_{i}^{j} + (1 − Ĉ_{i}^{j}) ln(1 − C_{i}^{j})] − λ_{noobj} Σ_{i=0}^{S×S} Σ_{j=0}^{B} 1_{ij}^{noobj} [Ĉ_{i}^{j} ln C_{i}^{j} + (1 − Ĉ_{i}^{j}) ln(1 − C_{i}^{j})]

where S denotes the grid size, with S × S grid divisions in total (here 13, 26 and 52 respectively); B denotes the prediction boxes generated by each grid cell (here 3); C_{i}^{j} is the probability score that the jth prediction box of the ith grid cell contains the target (0 indicating absence, 1 presence); Ĉ_{i}^{j} represents its corresponding true value; λ_{noobj} takes the value 0.5; and 1_{ij}^{obj} indicates whether the jth prediction box of the ith grid cell is responsible for the target, taking the value 1 if it is and 0 otherwise.

(2) The target classification loss L_{cla}(O, C) also uses a binary cross-entropy loss:

L_{cla}(O, C) = −Σ_{i,j} 1_{ij}^{obj} Σ_{c∈classes} [Ô_{ij}(c) ln O_{ij}(c) + (1 − Ô_{ij}(c)) ln(1 − O_{ij}(c))]

where O_{ij}(c) represents the probability that the jth prediction box on the ith grid cell belongs to class c, and Ô_{ij}(c) represents the true class value of the labeled box, taking the value 1 if it belongs to class c and 0 otherwise.

(3) The target localization loss L_{loc}(l, g) uses the sum of squared differences between the true offset values and the predicted offset values:

L_{loc}(l, g) = Σ_{i,j} 1_{ij}^{obj} Σ_{m∈{x,y,w,h}} (l̂_{ij}^{m} − ĝ_{ij}^{m})²

where (l̂_{ij}^{x}, l̂_{ij}^{y}) and (ĝ_{ij}^{x}, ĝ_{ij}^{y}) denote the predicted and true rectangular-box center coordinates, (l̂_{ij}^{w}, l̂_{ij}^{h}) and (ĝ_{ij}^{w}, ĝ_{ij}^{h}) the predicted and true box width and height, and i and j index the jth prediction box of the ith grid cell.
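A hedged PyTorch sketch of the three-part loss follows. The flat tensor layout, with one row per prediction box and confidences already passed through a sigmoid, is a simplifying assumption.

import torch
import torch.nn.functional as F

def yolo_loss(pred_box, true_box, pred_conf, true_conf, pred_cls, true_cls,
              obj_mask, lambda_noobj=0.5):
    """pred_box/true_box: (n, 4); pred_conf/true_conf: (n,) in (0, 1);
    pred_cls/true_cls: (n, 20); obj_mask: (n,) bool, True if responsible."""
    noobj = ~obj_mask
    l_conf = (F.binary_cross_entropy(pred_conf[obj_mask], true_conf[obj_mask],
                                     reduction="sum")
              + lambda_noobj
              * F.binary_cross_entropy(pred_conf[noobj], true_conf[noobj],
                                       reduction="sum"))
    l_cla = F.binary_cross_entropy(pred_cls[obj_mask], true_cls[obj_mask],
                                   reduction="sum")
    l_loc = ((pred_box[obj_mask] - true_box[obj_mask]) ** 2).sum()
    return l_loc + l_conf + l_cla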
The specific process of the step 5 is as follows:
and 51, detecting the spacetime interest points of the processed video data. Firstly, carrying out scale transformation on the video sequence, and converting the video into linear scale space representation through convolution operation of a video sequence f and a Gaussian kernel:
L(·:σ_{l} ^{2},τ_{l} ^{2})＝g(·:σ_{l} ^{2},τ_{l} ^{2})*f(·) (18)
where f () denotes a video sequence, g () sigma_{l} ^{2},τ_{l} ^{2}) Representing the Gaussian kernel, σ_{l} ^{2}Representing the variance, τ, of the spatial domain_{l} ^{2}Representing the variance of the time domain.
52, after the scale transformation, averaging the firstorder spatial and time derivatives by a Gaussian weighting function to obtain a spacetime secondorder matrix mu of 3 x 3, wherein the spacetime secondorder matrix mu is as follows:
wherein the temporal and spatial dimensions σ of the spatiotemporal second order matrix_{i} ^{2},τ_{i} ^{2}From the local time scale and spatial scale sigma_{l} ^{2},τ_{l} ^{2}It is related. g is g (.: sigma)_{i} ^{2},τ_{i} ^{2}) And f is f (·).
53, calculating three eigenvalues lambda of spacetime second order matrix_{1},λ_{2},λ_{3}Then the expression form of the Harris corner function in the timespace domain is as follows:
H＝det(μ)k·trace^{3}(μ)＝λ_{1}λ_{2}λ_{3}k(λ_{1}+λ_{2}+λ_{3})^{3} (16)
in this invention, k is 0.001, σ_{l} ^{2}Is 4, tau_{l} ^{2}Is 2, σ_{i} ^{2}Is 8, τ_{i} ^{2}Is 4.
54, calculating the positive maximum value of the function H, namely obtaining the position of the spatiotemporal interest points in each frame, thereby finding that the number of the spatiotemporal interest points in the attack period of the children epileptic is obviously increased compared with the interattack period.
And 55, comparing the positions of the children epileptic patients, namely a prediction box obtained by a YOLO target detection algorithm in the video, and screening out the spacetime interest points which are not in the prediction box.
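The following NumPy/SciPy sketch illustrates steps 51 to 54 with the parameters given above; the neighborhood size used to pick local maxima is an assumption.

import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def stip_response(video, sl2=4.0, tl2=2.0, si2=8.0, ti2=4.0, k=0.001):
    """video: gray-scale array of shape (t, y, x)."""
    # linear scale-space representation L = g(.; sl2, tl2) * f
    L = gaussian_filter(video.astype(float),
                        sigma=(np.sqrt(tl2), np.sqrt(sl2), np.sqrt(sl2)))
    Lt, Ly, Lx = np.gradient(L)
    g = lambda img: gaussian_filter(img, sigma=(np.sqrt(ti2), np.sqrt(si2),
                                                np.sqrt(si2)))
    # entries of the 3 x 3 matrix mu, smoothed at the integration scale
    xx, yy, tt = g(Lx * Lx), g(Ly * Ly), g(Lt * Lt)
    xy, xt, yt = g(Lx * Ly), g(Lx * Lt), g(Ly * Lt)
    det = (xx * (yy * tt - yt * yt) - xy * (xy * tt - yt * xt)
           + xt * (xy * yt - yy * xt))
    trace = xx + yy + tt
    return det - k * trace ** 3  # H = det(mu) - k * trace(mu)^3

def interest_points(video, size=5):
    H = stip_response(video)
    peaks = (H == maximum_filter(H, size=size)) & (H > 0)  # positive maxima
    return np.argwhere(peaks)  # (t, y, x) positions of the interest points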
The specific process of the step 6 is as follows:
61, reserving 9 blocks taking each spacetime interest point as the center, and combining the 9 blocks at the same position of two continuous frames into a cube for extracting the features.
62, extracting the features of HOG, HOF, LBP and MBH for each cube.
(1) HOG (Histogram of Oriented Gradients) feature. First the magnitude and direction of the image gradient are calculated:

G_{x}(x, y) = H(x + 1, y) − H(x − 1, y) (17)
G_{y}(x, y) = H(x, y + 1) − H(x, y − 1) (18)
G(x, y) = √( G_{x}(x, y)² + G_{y}(x, y)² ) (19)
α(x, y) = arctan( G_{y}(x, y) / G_{x}(x, y) ) (20)

where G_{x}(x, y), G_{y}(x, y) and H(x, y) respectively denote the horizontal gradient, the vertical gradient and the pixel value at point (x, y), and G(x, y) and α(x, y) are the gradient magnitude and direction at (x, y).
And then dividing 360 degrees into 8 parts, performing weight projection on the gradient direction of one point according to the gradient amplitude to obtain a gradient direction histogram, and performing normalization processing on the obtained histogram. Each interest point gets a feature of 9 x 8 x 2.
(2) HOF (Histogram of Optical Flow) feature. First the optical flow vector of the image is calculated, with the corresponding formulas:

I(x, y, t) = I(x + dx, y + dy, t + dt) (21)
I_{x}u + I_{y}v + I_{t} = 0 (22)

where I(x, y, t) is the light intensity at point (x, y) of the tth frame; dx and dy are the distances the pixel moves in time dt to reach the same pixel in the next frame; u = dx/dt and v = dy/dt are the optical flow components; and I_{x}, I_{y}, I_{t} respectively denote the partial derivatives of the gray level of a pixel along the x, y and t directions.
And (u, v) is obtained as an optical flow vector. And then dividing 360 degrees into 8 parts, performing weight projection on the gradient direction of the optical flow vector according to the gradient amplitude to obtain an optical flow direction histogram, and performing normalization processing on the obtained histogram. Each interest point gets a feature of 9 x 8 x 2.
(3) MBH (Motion Boundary Histogram) feature. The MBH feature treats the optical flow maps in the x and y directions as two gray-scale images; therefore, the optical flow vectors used in computing the HOF feature only need to be split into the x and y directions, after which gradient-orientation histograms of these gray-scale images are extracted and normalized. With one histogram per direction, each interest point gets a feature of 9 x 8 x 2 x 2.
(4) Equivalent LBP (Local Binary Pattern) feature. The LBP operator is usually defined in a 3 x 3 window: the center pixel of the window is taken as a threshold, and the gray values of the 8 neighboring pixels are compared with it; if a neighboring pixel's value is greater than the center pixel's value, that position is marked 1, otherwise 0. The 8 points in the 3 x 3 neighborhood thus produce an 8-bit binary number which, read in order, gives the LBP value of the window's center pixel and reflects the texture information of the region. The equivalent (uniform) LBP reduces the 256 possible patterns to 58. The LBP values are collected into a histogram recording the frequency of each decimal LBP value, and the histogram is normalized. Each interest point gets a feature of 9 x 2 x 58.
63, the HOG, HOF, LBP and MBH features are concatenated, so that each interest point yields a 1620-dimensional feature.
The specific process of the step 7 is as follows:
71, putting the features of all videos into a bag-of-words model, randomly generating 50 cluster centers with the K-means algorithm, computing the distance between each spatiotemporal interest-point feature and each cluster center, and assigning each interest point to the category of its nearest cluster center.
72, repeatedly calculating the clustering center point of each category according to the classified data, and classifying according to the distance between the characteristics of the spatiotemporal interest points and each new clustering center point until the position of each clustering center point is not changed any more.
73, counting the number of the spacetime interest points in each category by using a histogram, carrying out homogenization treatment to obtain a final histogram of each video, and converting the final histogram into a feature vector, namely the final feature of each video is 50dimensional.
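A compact sketch of this bag-of-words pipeline, assuming scikit-learn's KMeans; the variable names are hypothetical, with point_features_per_video a list of per-video (n_points, 1620) arrays:

import numpy as np
from sklearn.cluster import KMeans

def build_bow_features(point_features_per_video, n_words=50, seed=0):
    """Return an (n_videos, n_words) matrix of normalized word-frequency histograms."""
    all_points = np.vstack(point_features_per_video)
    kmeans = KMeans(n_clusters=n_words, random_state=seed).fit(all_points)
    histograms = []
    for feats in point_features_per_video:
        words = kmeans.predict(feats)          # nearest cluster center per point
        hist = np.bincount(words, minlength=n_words).astype(np.float64)
        histograms.append(hist / max(hist.sum(), 1.0))
    return np.array(histograms)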
The specific process of step 8 is as follows:
81. The feature vector extracted from the electroencephalogram information (588 dimensions) and the feature vector extracted from the video data (50 dimensions) are spliced together for feature fusion, giving a 638-dimensional fused feature.
The specific process of step 9 is as follows:
91. New samples are generated with the synthetic minority oversampling technique (SMOTE):

x_new = x + rand(0, 1) × (x_n − x) (23)

where x_new is the newly generated sample, x is a sample of the minority class, x_n is a sample among the K nearest neighbors of x, and rand(0, 1) is a uniform random number in (0, 1).

92. To remove the blurring effect of minority samples interpolated into the majority class, sample pairs in a glued state are eliminated with the Tomek-Links method: a pair (x_i, x_j), with x_i ∈ C_1 and x_j ∈ C_2 belonging to two different classes C_1, C_2, forms a Tomek link if there is no sample x_k such that

d(x_i, x_k) < d(x_i, x_j) or d(x_j, x_k) < d(x_i, x_j) (24)

where d(·) denotes the distance between two samples.

Together the two methods finally balance the number of seizure-period and inter-seizure samples of the children's epilepsy data; the equalized data are used in the subsequent classification training.
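In practice, steps 81–92 can be prototyped with the imbalanced-learn package, whose SMOTETomek class combines both resamplers; the sketch below, with synthetic placeholder data, is an assumption about tooling rather than the patent's implementation:

import numpy as np
from imblearn.combine import SMOTETomek  # pip install imbalanced-learn

rng = np.random.default_rng(0)
eeg_features = rng.normal(size=(200, 588))    # placeholder EEG features
video_features = rng.normal(size=(200, 50))   # placeholder video bag-of-words features
labels = (rng.random(200) < 0.2).astype(int)  # imbalanced: roughly 20% ictal samples

fused = np.hstack([eeg_features, video_features])   # 638-dim fused feature
X_bal, y_bal = SMOTETomek(random_state=0).fit_resample(fused, labels)
print(np.bincount(y_bal))   # class counts are (approximately) equalized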
The specific process of step 10 is as follows:
101. The final features are put into a machine learning algorithm for training, the performance of the model is evaluated with 10-fold cross-validation, and the epileptic seizure detection result is obtained.
102. Accuracy, Precision, Recall and F_1 are adopted to evaluate the training effect of the network. Each metric is calculated as follows:
Accuracy = (TP + TN) / (TP + TN + FP + FN) (25)
Precision = TP / (TP + FP) (26)
Recall = TP / (TP + FN) (27)
F_1 = 2 × Precision × Recall / (Precision + Recall) (28)
where TP denotes a positive sample predicted as positive by the model, TN a negative sample predicted as negative, FP a negative sample predicted as positive, and FN a positive sample predicted as negative. Accuracy is the ratio of the number of samples correctly classified by the classifier to the total number of samples; the higher the accuracy, the better the classifier. Precision is the fraction of samples predicted as positive that are truly positive. Recall is the fraction of true positives that the classifier recovers; the higher the recall, the lower the rate of missed diagnoses. F_1 is the harmonic mean of precision and recall.
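A hedged sketch of this evaluation step, assuming scikit-learn and an RBF-kernel SVM (the experiments below mention an SVM model; the kernel choice here is an assumption):

from sklearn.model_selection import cross_validate
from sklearn.svm import SVC

def evaluate_10fold(X, y):
    """10-fold cross-validation of an SVM on the balanced fused features."""
    scoring = ["accuracy", "precision", "recall", "f1"]
    scores = cross_validate(SVC(kernel="rbf"), X, y, cv=10, scoring=scoring)
    return {m: scores[f"test_{m}"].mean() for m in scoring}

# e.g. evaluate_10fold(X_bal, y_bal) with the balanced features from step 9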
The invention also provides a children's motor epilepsy auxiliary analysis and detection system based on synchronous video and electroencephalogram, which comprises an electroencephalogram feature extraction and learning module, a video feature extraction and learning module, and a feature fusion and seizure detection module.
the electroencephalogram feature extraction and learning module is specifically realized as follows: intercepting the segment of the attack interval and the attack period in the original electroencephalogram signal, dividing the segment into data frames of 6 seconds, filtering out the frequency band beyond 0.570 Hz by using a bandpass filter, eliminating the power frequency interference of 50Hz by using a notch filter, removing the artifact, and finally obtaining the electroencephalogram signal with less interference. And extracting MFCC (Mel frequency cepstrum coefficient) features and LPCC features of the preprocessed brain electrical signals, and visualizing the features in a histogram mode.
The video feature extraction and learning module is implemented as follows: from the video data in the original data, video segments synchronized with the electroencephalogram signal are intercepted, each 6 seconds long; the resolution of the video is adjusted to 416 × 416 pixels and the frame rate to 20 frames per second, so that one video contains 120 frames in total. The processed video data are input into a YOLO neural network for target detection, which identifies the position of the pediatric epileptic patient in the video and marks it with a red frame. Space-time interest points are then detected in the processed video data to obtain their position in each frame; these positions are compared against the position of the patient, i.e. the red frame in the video, and the space-time interest points outside the red frame are screened out. The 9 blocks centered on each remaining space-time interest point are retained, the 9 blocks at the same position in two consecutive frames form a cube, and HOG, HOF, LBP and MBH features are extracted from the cube. The extracted interest-point features are put into a bag-of-words model, a vocabulary of 50 words is constructed, the features of each interest point are classified and stored as a word-frequency histogram, and a 50-dimensional video feature is finally obtained.
The feature fusion and seizure detection module is implemented as follows: the extracted electroencephalogram signal features and video data features are fused. SMOTE + Tomek-Links data equalization is applied to the fused features so that the number of attack-period samples equals the number of inter-attack samples, giving the final children's epilepsy features. The final features are put into a machine learning algorithm for training, the performance of the model is evaluated with 10-fold cross-validation, and the epileptic seizure detection result is obtained.
The method intercepts the attack-period and inter-attack data from the original electroencephalogram and video data, applies a series of preprocessing steps such as filtering to the original electroencephalogram signal, and extracts its MFCC and LPCC features. In parallel, the position of the epileptic patient in the video is detected with the YOLO target detection algorithm; space-time interest points are extracted from the video, those outside the prediction frame are screened out, the 9 blocks centered on each remaining interest point are retained, the 9 blocks at the same position in two consecutive frames form a cube, HOG, HOF, LBP and MBH features are extracted from the cube, and the extracted features are classified with a bag-of-words model to obtain the word-frequency histogram of each video as the final video feature. The electroencephalogram and video features are then fused, the fused features are balanced with SMOTE + Tomek-Links, and the resulting features are put into a machine learning algorithm for training; model performance is evaluated with 10-fold cross-validation to obtain the seizure detection result.

Experiments show that, without screening the space-time interest points, an SVM model trained on the fused EEG and video HOW features reaches a test classification accuracy close to 97.86% for seizure detection; after screening the space-time interest points, the test accuracy of the model trained on the fused features is 98.34%. Using the video HOW features alone, the test accuracies before and after interest-point screening are 86.3% and 92.3% respectively. The test accuracies of SVM models trained on the EEG features alone, on the video HOW features after interest-point screening, and on the fusion of the two are 97.5%, 92.3% and 98.34% respectively.

Thus, with the EEG and video HOW features fused, interest-point screening improves the test classification accuracy by 0.48 percentage points; with the video features alone, screening improves it by 6.0 percentage points; and the fused features improve on the EEG features alone and on the screened video HOW features alone by 0.84 and 6.04 percentage points respectively. These data indicate that the method can effectively detect seizures in pediatric epileptic patients. The method can accurately identify children's epileptic seizures and can establish an accurate, reliable and efficient automatic seizure detection auxiliary system, especially for large-scale offline data. This also helps to further reveal the regularity of seizures, thereby providing more effective treatment for pediatric epileptic patients.
Claims (10)
1. The intelligent auxiliary analysis method for the children's motor epilepsy based on the synchronous video and the electroencephalogram is characterized by comprising the following steps:
step 1, intercepting the segments of the attack interval and the attack period of the original electroencephalogram signal in the original data, dividing them into 6-second data frames, filtering out the frequency bands outside 0.5–70 Hz with a band-pass filter, removing the 50 Hz power-frequency interference with a notch filter, and removing artifacts, finally obtaining an electroencephalogram signal with little interference;
step 2, extracting MFCC characteristics and LPCC characteristics of the preprocessed brain electrical signals, and visualizing the MFCC characteristics and the LPCC characteristics in a histogram form;
step 3, intercepting video segments synchronous with the electroencephalogram signals from the video data in the original data, wherein each segment of video data is also 6 seconds, adjusting the resolution of the video to 416 × 416 pixels, and adjusting the frame rate of the video to 20 frames per second, namely, the total frame number of one video is 120 frames;
step 4, inputting the video data processed in step 3 into a YOLO neural network for target detection, wherein the neural network identifies the position of the children's epileptic patient in the video and marks it with a red frame;
step 5, detecting space-time interest points in the video data processed in step 4 to obtain their position in each frame, comparing against the position of the children's epileptic patient, i.e. the red frame in the video, and screening out the space-time interest points that are not inside the red frame;
step 6, reserving 9 blocks with each spacetime interest point as the center, forming a cube by the 9 blocks at the same position of two continuous frames, and extracting features of HOG, HOF, LBP and MBH of the cube;
step 7, putting the extracted features of the space-time interest points into a bag-of-words model, constructing a vocabulary of 50 words, classifying the features of each space-time interest point and storing them as a word-frequency histogram, finally obtaining a 50-dimensional video feature;
step 8, fusing the extracted electroencephalogram signal characteristics and video data characteristics;
step 9, carrying out SMOTE + Tomek-Links data equalization processing on the fused features so that the number of samples in the attack period equals the number of samples in the attack interval, obtaining the final children's epilepsy features;
and step 10, putting the final children epileptic features into a machine learning algorithm for training, evaluating the performance of the model by adopting 10fold cross validation, and obtaining an epileptic seizure detection result.
2. The intelligent auxiliary analysis method for children's motor epilepsy based on synchronous video and electroencephalogram as claimed in claim 1, wherein the specific process of step 2 is as follows:
21. segmenting the preprocessed electroencephalogram signal every 6 s; for each segment, performing pre-emphasis, framing and windowing, applying a discrete Fourier transform to obtain the spectrum of each segment, squaring the spectrum to obtain the power spectrum, passing the power spectrum through a Mel triangular filter bank, and applying a discrete cosine transform to obtain Mel-frequency cepstral coefficient features of size 21 × 12;
22. meanwhile, obtaining the LPC coefficients of the preprocessed signal with the lpc function in Matlab, and computing the cepstrum to obtain 21 × 16-dimensional linear prediction cepstral coefficient features.
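For orientation, a sketch of the MFCC extraction of step 21 using librosa instead of the Matlab toolchain named in step 22; the 256 Hz sampling rate and the n_fft/hop_length values are assumptions, chosen here only so the output approximates the 21 × 12 size stated above:

import numpy as np
import librosa

def eeg_mfcc(frame, fs=256, n_mfcc=12):
    """12 MFCCs per analysis window of one 6-second EEG frame (assumed 256 Hz)."""
    m = librosa.feature.mfcc(y=frame.astype(np.float32), sr=fs,
                             n_mfcc=n_mfcc, n_fft=256, hop_length=75)
    return m.T   # shape: (n_windows, 12), here (21, 12) for a 1536-sample frame

frame = np.random.randn(6 * 256)   # placeholder 6 s frame
print(eeg_mfcc(frame).shape)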
3. The intelligent auxiliary analysis method for children's motor epilepsy based on synchronous video and electroencephalogram according to claim 1 or 2, characterized in that the specific process of step 3 is as follows:
31. intercepting the attack-interval and attack-period segments synchronized with the electroencephalogram signal from the video data, each video segment also being 6 seconds; setting the resolution of the video to 416 × 416 pixels and the frame rate to 20 frames per second, so that one 6-second video contains 120 frames in total.
4. The intelligent auxiliary analysis method for children's motor epilepsy based on synchronous video and electroencephalogram as claimed in claim 3, characterized in that the specific process of step 4 is as follows:
41. inputting the processed video data into a backbone network for feature extraction, the backbone used being Darknet-53; the size of each input matrix is 416 × 416 × 3, and there are 2922 (the total number of videos) × 120 (the number of frames per video) such matrices in all; the Darknet-53 neural network extracts features through 53 convolutional layers with residual convolutions, the residual blocks being used to relieve the vanishing-gradient problem caused by increasing depth in a deep neural network; in the backbone, a residual stage first applies a convolution with a 3 × 3 kernel and stride 2, which compresses the width and height of the input feature layer to give a new feature layer, called layer; a 1 × 1 convolution and a 3 × 3 convolution then follow, and layer itself is added to the result (a sketch of this unit is given after this claim); in the backbone each convolution uses the nn.Conv2d function, nn.BatchNorm2d performs batch normalization, and the activation function is realized with nn.LeakyReLU;
42, dividing the process of obtaining a prediction result from the features into two parts, namely constructing an FPN feature pyramid for reinforced feature extraction and predicting three effective feature layers by using a YOLO HEAD;
43. decoding the three obtained prediction results, reading the positions of the prediction boxes from the stored box-coordinate values, and applying score sorting and non-maximum suppression to the box confidences of the resulting multiple predictions to obtain the final positions of the prediction boxes;
and 44, obtaining the final position of the prediction frame and then drawing the final position on an original image, so that the position of the children epileptic patient in the video can be detected.
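The convolution unit named in step 41 (nn.Conv2d + nn.BatchNorm2d + nn.LeakyReLU) and the stride-2-plus-residual pattern can be sketched in PyTorch as follows; the channel widths and LeakyReLU slope are illustrative assumptions, not the patent's code:

import torch
import torch.nn as nn

def conv_bn_leaky(c_in, c_out, kernel_size, stride=1):
    """Conv2d + BatchNorm2d + LeakyReLU unit used throughout Darknet-53."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size, stride,
                  padding=kernel_size // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.1),
    )

class ResidualBlock(nn.Module):
    """1x1 then 3x3 convolution, with the input added back to the output."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = conv_bn_leaky(channels, channels // 2, 1)
        self.conv2 = conv_bn_leaky(channels // 2, channels, 3)

    def forward(self, x):
        return x + self.conv2(self.conv1(x))

# Downsampling 3x3/stride-2 convolution followed by a residual block:
down = conv_bn_leaky(32, 64, 3, stride=2)
block = ResidualBlock(64)
y = block(down(torch.randn(1, 32, 416, 416)))   # -> (1, 64, 208, 208)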
5. The intelligent auxiliary analysis method for children's motor epilepsy based on synchronous video and electroencephalogram as claimed in claim 4, characterized in that:
the step of constructing the FPN characteristic pyramid for extracting the reinforced characteristic in the step 42 comprises the following steps:
421. in the feature extraction part, YOLO extracts three feature layers in total for target detection, located at the middle, lower-middle and bottom layers of the Darknet-53 trunk; the shapes of the three feature layers are (52, 52, 256), (26, 26, 512) and (13, 13, 1024);
422. the FPN layer is constructed from the three effective feature layers: the 13 × 13 × 1024 feature layer is convolved 5 times, used on one branch by the YOLO HEAD to obtain a prediction result, and on the other branch upsampled with the nn.Upsample function and combined with the 26 × 26 × 512 feature layer, the combined feature layer being 26 × 26 × 768;
423. the combined feature layer is likewise convolved 5 times, used by the YOLO HEAD to obtain a prediction result, and upsampled and combined with the next higher feature layer, the 52 × 52 × 256 layer, the combined feature layer being 52 × 52 × 384;
424. finally, the combined feature layer is convolved 5 times and used by the YOLO HEAD to obtain a prediction result, so that in total three feature layers are obtained and fed to the YOLO HEAD for prediction;
the step of predicting three effective feature layers by using the YOLO HEAD in the step 42 is as follows:
firstly, each of the three enhanced feature layers undergoes one 3 × 3 convolution for feature integration and then one 1 × 1 convolution to adjust the number of channels, giving the three output-layer prediction results; since the VOC data set has 20 classes, plus 4 values storing the box coordinates and one value storing the box confidence, there are 25 values in total, and since YOLO places 3 prior boxes at each feature point of each feature layer, the channel number of the prediction result is 25 × 3 = 75; in practice the input is N pictures of 416 × 416, so the shapes of the three prediction results are (N, 13, 13, 75), (N, 26, 26, 75) and (N, 52, 52, 75), corresponding, for each picture, to 3 prior boxes at each position of the 13 × 13, 26 × 26 and 52 × 52 grids.
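A minimal PyTorch sketch of one YOLO HEAD as just described (a 3 × 3 integration convolution, then a 1 × 1 convolution to 75 channels for 20 VOC classes and 3 prior boxes); the intermediate channel width is an assumption:

import torch
import torch.nn as nn

def yolo_head(c_in, n_anchors=3, n_classes=20):
    """3x3 feature-integration conv, then 1x1 conv to (4+1+n_classes)*n_anchors channels."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_in * 2, 3, padding=1),
        nn.LeakyReLU(0.1),
        nn.Conv2d(c_in * 2, n_anchors * (4 + 1 + n_classes), 1),
    )

head = yolo_head(512)
out = head(torch.randn(1, 512, 13, 13))   # -> (1, 75, 13, 13)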
6. The intelligent auxiliary analysis method for children's motor epilepsy based on synchronized video and electroencephalogram as claimed in claim 4 or 5, wherein the loss function used in training comprises the target localization offset loss L_loc, the target confidence loss L_conf and the target classification loss L_cla:

Loss = L_loc + L_conf + L_cla (9)

(1) The target confidence loss, which predicts the probability that a target exists in the target rectangular box, uses a binary cross-entropy loss:

L_conf(O, Ô) = −Σ_{i=0}^{S×S} Σ_{j=0}^{B} [1_ij^obj + λ_noobj·(1 − 1_ij^obj)] · [Ô_ij·ln(O_ij) + (1 − Ô_ij)·ln(1 − O_ij)]

where S denotes the grid size, the image being divided into S × S grid cells (here 13, 26 and 52 respectively); B denotes the number of prediction boxes generated by each grid cell, here 3; O_ij is the predicted probability score that the j-th prediction box of the i-th grid cell contains a target (0 meaning absent, 1 meaning present); Ô_ij is the corresponding true value; λ_noobj takes the value 0.5; and 1_ij^obj indicates whether the j-th prediction box of the i-th grid cell is responsible for a target, being 1 if responsible and 0 otherwise;

(2) the target classification loss L_cla(O, C) also uses a binary cross-entropy loss:

L_cla(O, C) = −Σ_{i∈obj} Σ_{c∈classes} [Ĉ_ij^c·ln(C_ij^c) + (1 − Ĉ_ij^c)·ln(1 − C_ij^c)]

where C_ij^c is the predicted probability that the j-th prediction box on the i-th grid cell belongs to class c, and Ĉ_ij^c is the true class value of the labeled box, being 1 if it belongs to class c and 0 otherwise;

(3) for the target localization loss L_loc(l, g), the loss function is the sum of squares of the differences between the true offset values and the predicted offset values:

L_loc(l, g) = Σ_i Σ_j 1_ij^obj Σ_{m∈{x,y,w,h}} (l_ij^m − g_ij^m)²
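The confidence term, with its λ_noobj down-weighting of boxes not responsible for any object, can be sketched as a masked binary cross-entropy in PyTorch; the tensor shapes and toy data below are illustrative:

import torch
import torch.nn.functional as F

def confidence_loss(pred_conf, true_conf, obj_mask, lambda_noobj=0.5):
    """Binary cross-entropy over every prediction box; boxes not responsible
    for any object are down-weighted by lambda_noobj, as in the claim."""
    bce = F.binary_cross_entropy(pred_conf, true_conf, reduction="none")
    weight = obj_mask + lambda_noobj * (1.0 - obj_mask)
    return (weight * bce).sum()

# Toy usage: 3 prior boxes on a 13 x 13 grid for a batch of 2 images.
pred = torch.rand(2, 3, 13, 13)
truth = (torch.rand(2, 3, 13, 13) > 0.9).float()
print(confidence_loss(pred, truth, obj_mask=truth))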
7. The intelligent auxiliary analysis method for children's motor epilepsy based on synchronous video and electroencephalogram as claimed in claim 6, characterized in that the specific process of step 5 is as follows:
51. detecting space-time interest points in the processed video data; the video sequence is first scale-transformed, converting the video into its linear scale-space representation by convolving the video sequence f with a Gaussian kernel:

L(·; σ_l², τ_l²) = g(·; σ_l², τ_l²) * f(·) (18)

where f(·) denotes the video sequence, g(·; σ_l², τ_l²) denotes the Gaussian kernel, σ_l² the variance in the spatial domain, and τ_l² the variance in the temporal domain;

52. after the scale transformation, the first-order spatial and temporal derivatives are averaged with a Gaussian weighting function, giving the 3 × 3 space-time second-order matrix μ:

μ = g(·; σ_i², τ_i²) * ( L_x²     L_x·L_y  L_x·L_t
                          L_x·L_y  L_y²     L_y·L_t
                          L_x·L_t  L_y·L_t  L_t²   )

where the temporal and spatial integration scales σ_i², τ_i² of the space-time second-order matrix are tied to the local temporal and spatial scales σ_l², τ_l²; g is g(·; σ_i², τ_i²) and f is f(·);
53. calculating the three eigenvalues λ_1, λ_2, λ_3 of the space-time second-order matrix; the Harris corner function extended to the time-space domain is then:

H = det(μ) − k·trace³(μ) = λ_1·λ_2·λ_3 − k·(λ_1 + λ_2 + λ_3)³ (16)

where k is 0.001, σ_l² is 4, τ_l² is 2, σ_i² is 8, and τ_i² is 4;
54. computing the positive local maxima of the function H to obtain the positions of the space-time interest points in each frame; it is thereby found that the number of space-time interest points during the attack period of pediatric epileptic patients increases markedly compared with the attack interval;
and 55. comparing against the position of the children's epileptic patient, i.e. the prediction box obtained by the YOLO target detection algorithm in the video, and screening out the space-time interest points that are not inside the prediction box.
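A sketch of steps 51–54 under stated assumptions: SciPy Gaussian filtering, a (t, y, x) grayscale video array, and standard deviations derived from the variances quoted in the claim (σ_l² = 4 → σ_l = 2, τ_l² = 2 → τ_l ≈ 1.4, with integration variances twice the local ones):

import numpy as np
from scipy.ndimage import gaussian_filter

def harris3d_response(video, sigma_l=2.0, tau_l=1.4, s=2.0, k=0.001):
    """Space-time Harris response H = det(mu) - k*trace(mu)**3 for a
    (t, y, x) grayscale video; interest points are its positive local maxima."""
    L = gaussian_filter(video.astype(np.float64), (tau_l, sigma_l, sigma_l))
    Lt, Ly, Lx = np.gradient(L)
    # Gaussian-weighted averaging at the integration scale (variance s * local).
    sig_i, tau_i = np.sqrt(s) * sigma_l, np.sqrt(s) * tau_l
    w = lambda a: gaussian_filter(a, (tau_i, sig_i, sig_i))
    m = [[w(Lx * Lx), w(Lx * Ly), w(Lx * Lt)],
         [w(Lx * Ly), w(Ly * Ly), w(Ly * Lt)],
         [w(Lx * Lt), w(Ly * Lt), w(Lt * Lt)]]
    mu = np.stack([np.stack(row, -1) for row in m], -2)   # (..., 3, 3)
    return np.linalg.det(mu) - k * np.trace(mu, axis1=-2, axis2=-1) ** 3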
8. The intelligent auxiliary analysis method for children's motor epilepsy based on synchronous video and electroencephalogram as claimed in claim 7, characterized in that the specific process of step 6 is as follows:
61, reserving 9 blocks taking each spacetime interest point as a center, and forming 9 blocks at the same position of two continuous frames into a cube for extracting features;
62, extracting features of HOG, HOF, LBP and MBH for each cube;
63. concatenating the HOG, HOF, LBP and MBH features, i.e. each interest point yields a feature of 144 + 144 + 1044 + 288 = 1620 dimensions.
9. The intelligent auxiliary analysis method for children's motor epilepsy based on synchronous video and electroencephalogram according to claim 7 or 8, characterized in that the specific process of step 7 is as follows:
71. putting the features of all videos into a bag-of-words model, randomly generating 50 cluster centers with the K-means algorithm, calculating the distance between each space-time interest point feature and every cluster center, and taking the nearest cluster center as the category of that interest point;
72. recomputing the cluster center of each category from the assigned data and reassigning the space-time interest point features to the nearest new center, until the position of every cluster center no longer changes;
73. counting the number of space-time interest points in each category with a histogram, normalizing it to obtain the final histogram of each video, and converting it into a feature vector, i.e. the final feature of each video is 50-dimensional.
10. The intelligent auxiliary analysis method for children's motor epilepsy based on synchronous video and electroencephalogram according to claim 9, characterized in that the specific process of step 9 is as follows:
91. generating new samples with the synthetic minority oversampling technique (SMOTE):

x_new = x + rand(0, 1) × (x_n − x) (23)

where x_new is the newly generated sample, x is a sample of the minority class, and x_n is a sample among the K nearest neighbors of x;

92. to remove the blurring effect of minority samples interpolated into the majority class, eliminating sample pairs in a glued state with the Tomek-Links method: a pair (x_i, x_j), with x_i ∈ C_1 and x_j ∈ C_2 belonging to two different classes C_1, C_2, forms a Tomek link if there is no sample x_k such that

d(x_i, x_k) < d(x_i, x_j) or d(x_j, x_k) < d(x_i, x_j) (24)

where d(·) denotes the distance between two samples.