CN116580832A - Auxiliary diagnosis system and method for senile dementia based on video data - Google Patents
- Publication number
- CN116580832A (application number CN202310497550.8A)
- Authority
- CN
- China
- Prior art keywords
- training
- data
- video data
- network
- sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention discloses an auxiliary diagnosis system and method for senile dementia based on video data, relating in particular to the field of video analysis. The invention provides a novel intelligent auxiliary diagnosis and early-warning means for senile dementia: it achieves high diagnostic accuracy, reduces the workload of doctors, and facilitates early warning and diagnosis of cognitive abnormalities for users in communities or at home.
Description
Technical Field
The invention relates to the field of video analysis, in particular to an auxiliary diagnosis system and method for senile dementia based on video data.
Background
As society ages, the prevalence of dementia increases. According to the latest global dementia prevalence report issued by the World Health Organization, more than 50 million people were living with dementia in 2020, and this figure doubles every 20 years. China is the country with the largest number of dementia patients, and the medical treatment, care, and management of elderly people with dementia have become an important national public health problem.
At present, early senile dementia is mainly diagnosed by comprehensive evaluation of clinical symptoms using cognitive tests and clinical questionnaires. However, the psychometric scales that serve as the main diagnostic standard have technical shortcomings, such as being influenced by the subjective psychological state of patients. In addition to clinical cognitive, psychological, and other questionnaires, biomarker detection and neuroimaging techniques are also increasingly used in clinical diagnostic practice. However, such detection relies on specialized equipment and medical personnel and can only be performed at specific locations.
With the accumulation of medical big data and the development of artificial intelligence technology, applications of artificial intelligence in the medical field have made considerable progress. Intelligent auxiliary diagnosis and treatment is one of the most important and core application scenarios of artificial intelligence in medicine: through large amounts of image data and diagnostic data, deep learning continuously trains neural networks until they master diagnostic capability, which greatly helps reduce the medical burden. The invention provides an auxiliary diagnosis system and method for senile dementia based on video data.
Disclosure of Invention
Therefore, the present invention provides an auxiliary diagnosis system and method for senile dementia based on video data with high diagnostic accuracy, which reduces the workload of doctors and facilitates early warning and diagnosis of cognitive abnormalities for users in communities or at home.
In order to achieve the above object, the present invention provides the following technical solutions: the senile dementia auxiliary diagnosis system based on the video data comprises a terminal, a network, a server and a database;
the terminal is communicatively connected with the server through the network and is used for collecting video of the subject's tea-making task operation process and uploading it to the server through the network; the server performs character interaction behavior recognition and senile dementia health-state diagnosis on the images to be detected, and the related data and original video data information are stored in the database;
the server comprises a sample acquisition module and a model training generation module;
the sample acquisition module is used for acquiring the original video data, converting and preprocessing the data, and generating the sample data and text labels used for model training;
the model training generation module is used for inputting training sample data and text labels into the neural training network for training.
Further, the sample acquisition module comprises an original video data acquisition unit, a video data key frame interception unit, an image sample generation unit and an image sample preprocessing unit;
the original video data acquisition unit is an electronic device with a camera shooting function;
the video data key frame interception unit reads the original video data stream, extracts video frames at equal intervals determined by the video frame rate (fps) and a sampling interval t, and stores them as static images in png or jpg format;
the image sample generation unit manually screens and labels, via interactive labeling, the static original images stored by the video data key frame interception unit; the manual screening randomly samples images of key character interaction behaviors during the subject's tea-making operation, where the key interaction behaviors include turning on the power switch, boiling water, adding tea leaves, pouring hot water, and pouring tea;
the image sample preprocessing unit resizes all samples to a consistent size after the data are loaded, converts them into the Tensor form required by the neural network, and divides the data set into a training set and a test set according to a set proportion; data enhancement techniques are also used to reduce the influence of an insufficient amount of data and improve the robustness of the model.
Further, the model training generation module comprises a character interaction behavior recognition model training module and a character interaction behavior recognition model prediction module;
the character interaction behavior recognition model training module adopts a CLIP model trained on a large number of image-caption pairs; the features of the images and texts are obtained through an Image Encoder and a Text Encoder respectively, and the similarity between the texts and images in a batch is computed by dot product to obtain a batch_size x batch_size similarity matrix, where the values on the diagonal are the similarities of the positive samples, so the optimization target during training is to make the positive-sample similarity values as large as possible. The model architecture is divided into two parts, an image encoder and a text encoder. The image encoder considers two different architectures, a residual network (ResNet) or a Vision Transformer (ViT): the residual-network models are ResNet-50, ResNet-101, and, following the EfficientNet scaling idea, three variants of ResNet-50 scaled to roughly 4x, 16x, and 64x the compute of ResNet-50: ResNet-50x4, ResNet-50x16, and ResNet-50x64; for ViT, the three pre-trained models ViT-B/32, ViT-B/16, and ViT-L/14 are used. The text encoder uses a Transformer encoder with depth 12 and width 512, with eight attention heads, whose weights come from the pre-trained CLIP text encoder;
when the character interaction behavior recognition model prediction module performs inference for the character interaction behavior classification task on an input picture, the model first converts the category labels into sentences of the same form as those used in pre-training, i.e., the sentence corresponding to each category is obtained through a prompt operation; finally, the similarity between the input picture and the sentence of each category is computed, and the category whose sentence has the highest similarity is the predicted category;
the category labels include: turning on a power switch, boiling water, adding tea leaves, pouring hot water, and pouring tea; the prompt template uses "A photo of a [mask]", with the mask replaced by the category label.
Further, the neural network training method comprises the following steps:
S210: preprocess the character interaction behavior information sequence;
S220: perform feature vectorization on the preprocessing result of the character interaction behavior information sequence;
S230: input the feature vector samples into the neural network for training;
S240: save the trained network model and perform inference prediction of cognitive dysfunction.
Further, S210 performs a deduplication operation on the sequence, retaining only the key data points at which two adjacent values differ, i.e., retaining the character interaction behavior change information during the subject's tea-making task; a sequence element is retained only when it differs from the immediately preceding element;
S220 performs feature vectorization conversion on the deduplicated sequence of key operation behaviors from the subject's tea-making task, generating samples that meet the input data format requirements of the neural network;
S230 completes training and verification of the neural network model. The candidate network models are MLP, 1D-CNN, RNN, LSTM, GRU, CNN+LSTM, TextCNN, BiLSTM, Attention, MultiHeadAttention, Attention+BiLSTM, BiGRU+Attention, Transformer, and PositionalEmbedding+Transformer. The optimizer is Adam, the batch size is 64, the number of training epochs is 10, the initial learning rate is set to 0.001, and the training/test split ratio is 9:1.
The invention also comprises a diagnosis method of the senile dementia auxiliary diagnosis system based on the video data, which comprises the following specific steps:
step S110: collecting video of the operation process of the tea making task of the subject through the terminal;
step S120: uploading video data of the tea making operation of the subject to a server through a network;
step S130: the server converts the video data frames of the tea making operation of the subject into static pictures;
step S140: the server preprocesses the converted static pictures to generate sample data and text labels for model training, and then inputs the training sample data and text labels into the neural network for training;
step S150: the server performs character interaction behavior recognition on the images to be detected and diagnoses the senile dementia health state;
step S160: the server stores the relevant data and the original video data information in a database.
The invention has the following advantages:
1. The invention provides a novel intelligent auxiliary diagnosis and early-warning means for senile dementia: it achieves high diagnostic accuracy, reduces the workload of doctors, and facilitates early warning and diagnosis of cognitive abnormalities for users in communities or at home. The pre-trained CLIP model adopted by the invention has strong transfer-learning capability on small downstream data sets, which greatly reduces training cost;
2. The tea-making task adopted by the invention is derived from daily life, is easy for the elderly to understand and accept, and is simple to operate; it avoids long and tedious evaluations, produces objective evaluation results, and avoids result differences caused by differences among evaluators.
Drawings
FIG. 1 is an application scenario diagram of the auxiliary diagnosis system and method for senile dementia based on video data provided by the invention;
fig. 2 is a diagram of an auxiliary diagnosis system for senile dementia based on video data provided by the invention;
FIG. 3 is a flowchart of an auxiliary diagnosis method for senile dementia based on video data provided by the invention;
FIG. 4 is a flowchart of a training and predicting method for identifying a neural network for human interactive behavior provided by the invention;
FIG. 5 is a flow chart of a method for training and predicting a cognitive dysfunction neural network provided by the present invention;
in the figure: 101 terminals, 102 networks, 103 servers and 104 databases;
the system comprises a 310 sample acquisition module, a 311 original video data acquisition unit, a 312 video data key frame interception unit, a 313 image sample generation unit and a 314 image sample preprocessing unit;
320 model training generation module, 321 human interaction behavior recognition model training module, 322 human interaction behavior recognition model prediction module.
Detailed Description
Other advantages and benefits of the present invention will become apparent to those skilled in the art from the following detailed description, which describes certain specific embodiments by way of illustration, but not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
Referring to fig. 1 of the specification, an embodiment of the invention provides an application scenario diagram of the auxiliary diagnosis system and method for senile dementia based on video data. In this scenario, the interaction behavior between the subject and the tea-making objects is captured by a camera. With the character interaction behavior recognition method provided by the embodiment of the invention, the subject's operation behavior, information about the tea-making objects, and the interaction between the subject and the objects can be detected. The objects in the tea-making scene may include a kettle, a teapot, teacups, a tea caddy filled with tea leaves, mineral water, cola, a socket with an indicator lamp, a table, chairs, and the like. A staff member asks the elderly subject to select the proper articles to complete the tea-making task. The specific steps are as follows: 1. the staff member asks the subject to sit down and informs them: the tools required for making tea in daily life are placed on the table in front of you; please operate in the way you consider correct, with the goal of making a cup of hot tea; 2. when the subject is ready, recording starts after the staff member calls "begin"; the recording covers the subject's entire operation process and stops after the staff member calls "end"; no prompts are given during the process.
Referring to fig. 2 of the specification, an embodiment of the present invention provides an auxiliary diagnosis system diagram for senile dementia based on video data, which includes a terminal 101, a network 102, a server 103 and a database 104;
the terminal 101 and the server 103 are connected to each other by a network 102. The terminal 101 may be various forms of image capturing devices, such as a video camera, a still camera, a mobile phone, etc. The server 103 may be an independent server, deployed with the senile dementia auxiliary diagnosis system platform provided by the invention, or may be a server group formed by a plurality of servers, where each server is deployed with a module of the senile dementia auxiliary diagnosis system and the method thereof provided by the invention. Of course, the server 103 may also be a cloud server, and the senile dementia auxiliary diagnosis system platform provided by the invention is deployed on the cloud server. The terminal 101 collects the video of the operation process of the tea making task of the subject, uploads the video to the server 103 through the network 102, the server 103 performs interactive character recognition and diagnosis of the disease health state of the senile dementia on the image to be detected, and information such as related data, original video data and the like is stored in the database 104.
Referring to fig. 3 of the specification, an embodiment of the present invention provides a flowchart of an auxiliary diagnosis method for senile dementia based on video data, including:
step S110: collecting video of the operation process of the tea making task of the subject through the terminal 101;
step S120: uploading the subject tea making operation video data to the server 103 through the network 102;
step S130: the server 103 converts the frames of the subject's tea-making video data into static pictures;
step S140: the server 103 preprocesses the converted static pictures to generate sample data and text labels for model training, and then inputs the training sample data and text labels into the neural network for training;
step S150: the server 103 performs character interaction behavior recognition on the images to be detected and diagnoses the senile dementia health state;
step S160: the server 103 saves the relevant data and the original video data information to the database 104.
Referring to fig. 4 of the drawings, an embodiment of the present invention provides a flowchart of the neural network training and prediction method of the auxiliary diagnosis system for senile dementia based on video data; the method involves a sample acquisition module 310 and a model training generation module 320;
the sample acquiring module 310 is used for acquiring original video data, converting the data, preprocessing the data, and generating sample data and text labels for training the module;
the model training generation module 320 is configured to input training sample data and text labels into a neural training network for training.
On the basis of the above embodiment, the sample acquisition module 310 includes an original video data acquisition unit 311, a video data key frame capture unit 312, an image sample generation unit 313, and an image sample preprocessing unit 314;
the original video data acquisition unit 311 is an electronic device with a camera function, such as a video camera, mobile phone, or camera; in the acquisition scene, video of the subject's tea-making operation should as far as possible be collected under stable lighting conditions, so that the shooting environment is stable and the captured resolution and color fidelity are high, improving the accuracy of the subsequent analysis;
the video data key frame interception unit 312 reads the original video data stream, extracts video frames at equal intervals determined by the video frame rate (fps) and a sampling interval t, and stores them as static images in png or jpg format;
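As a concrete illustration of the interception rule just described (a minimal sketch under assumed details — the patent names no library, and `key_frame_indices` is a hypothetical helper), the indices of the frames to save can be derived from the frame rate and the interval t; decoding the video and writing the images, e.g. with OpenCV, is omitted:

```python
# Hypothetical helper: given a video's total frame count, its frame rate (fps),
# and a sampling interval t in seconds, return the indices of the frames that
# the key-frame interception unit would save as static png/jpg images.
def key_frame_indices(total_frames: int, fps: float, t: float) -> list[int]:
    step = max(1, round(fps * t))  # number of frames between two saved frames
    return list(range(0, total_frames, step))
```

For a 25 fps video sampled once per second, for instance, every 25th frame is kept.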
the image sample generation unit 313 manually screens and labels, via interactive labeling, the static original images stored by the video data key frame interception unit 312; the manual screening randomly samples images of key character interaction behaviors during the subject's tea-making operation, where the key interaction behaviors include turning on the power switch, boiling water, adding tea leaves, pouring hot water, pouring tea, and the like;
the image sample preprocessing unit 314 resizes all samples to a consistent size after the data are loaded, converts them into the Tensor form required by the neural network, and divides the data set into a training set and a test set according to a set proportion; data enhancement techniques are also used to reduce the influence of an insufficient amount of data and improve the robustness of the model. The available data enhancement means include size changes, pixel-value changes, and viewing-angle changes, i.e., rotation, saturation and brightness adjustment, flipping, center cropping, and the like.
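Two of the steps above can be sketched as follows (an illustrative reconstruction, not the patent's code: a horizontal flip stands in for the augmentation operations, and nested lists stand in for image Tensors; a real implementation would typically use torchvision transforms):

```python
import random

# Sketch of two preprocessing steps: a horizontal flip used as a data
# enhancement, and the split of the data set into training and test portions
# according to a set ratio.
def hflip(image_rows):
    # flip each pixel row left-to-right
    return [row[::-1] for row in image_rows]

def train_test_split(samples, labels, train_ratio=0.9, seed=0):
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * train_ratio)
    train = [(samples[i], labels[i]) for i in idx[:cut]]
    test = [(samples[i], labels[i]) for i in idx[cut:]]
    return train, test
```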
On the basis of the above embodiment, the model training generating module 320 includes a character interaction behavior recognition model training module 321 and a character interaction behavior recognition model prediction module 322;
the character interaction behavior recognition model training module 321 adopts a CLIP model trained on a large number of image-caption pairs; for the collected image sample-text label pairs, the features of the images and texts are obtained through an Image Encoder and a Text Encoder respectively, and the similarity between the texts and images in a batch is computed by dot product to obtain a batch_size x batch_size similarity matrix, where the values on the diagonal are the similarities of the positive samples, so the optimization target during training is to make the positive-sample similarity values as large as possible. The model architecture is divided into two parts, an image encoder and a text encoder. The image encoder considers two different architectures, a residual network (ResNet) or a Vision Transformer (ViT): the residual-network models are ResNet-50, ResNet-101, and, following the EfficientNet scaling idea, three variants of ResNet-50 scaled to roughly 4x, 16x, and 64x the compute of ResNet-50: ResNet-50x4, ResNet-50x16, and ResNet-50x64; for ViT, the three pre-trained models ViT-B/32, ViT-B/16, and ViT-L/14 are used. The text encoder uses a Transformer encoder with depth 12 and width 512, with eight attention heads, whose weights come from the pre-trained CLIP text encoder.
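The batch-level objective described above can be sketched as follows (a simplified, assumption-laden illustration of a CLIP-style loss, not the patent's implementation: features are plain Python lists supplied directly, and no learned encoders appear):

```python
import math

# Dot-product similarity between every image feature and every text feature in
# a batch: the result is a batch_size x batch_size matrix whose diagonal holds
# the positive (matched) pairs.
def similarity_matrix(img_feats, txt_feats):
    return [[sum(a * b for a, b in zip(img, txt)) for txt in txt_feats]
            for img in img_feats]

# Symmetric cross-entropy over rows (image->text) and columns (text->image);
# minimizing it pushes the diagonal similarities to be as large as possible.
def contrastive_loss(sim):
    n = len(sim)
    def row_loss(row, target):
        m = max(row)
        logsum = m + math.log(sum(math.exp(v - m) for v in row))
        return logsum - row[target]
    img2txt = sum(row_loss(sim[i], i) for i in range(n)) / n
    cols = [[sim[i][j] for i in range(n)] for j in range(n)]
    txt2img = sum(row_loss(cols[j], j) for j in range(n)) / n
    return (img2txt + txt2img) / 2
```

A batch whose diagonal dominates more strongly yields a lower loss, which is exactly the stated optimization target.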
When the character interaction behavior recognition model prediction module 322 performs inference for the character interaction behavior classification task on an input picture, the model first converts the category labels into sentences of the same form as those used in pre-training, i.e., the sentence corresponding to each category is obtained through a prompt operation; finally, the similarity between the input picture and the sentence of each category is computed, and the category whose sentence has the highest similarity is the predicted category.
The category labels include: turning on a power switch, boiling water, adding tea leaves, pouring hot water, pouring tea, and the like; the prompt template uses "A photo of a [mask]", with the mask replaced by the category label.
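The prompt-and-match inference described above can be sketched as below (illustrative only: the CLIP encoders are stubbed out, feature vectors are supplied directly, and the label wording is taken from this translated text):

```python
# Build "A photo of a [mask]" sentences from the category labels, then predict
# the category whose (pre-computed) text feature is most similar to the image
# feature by dot product.
LABELS = ["turning on a power switch", "boiling water", "adding tea leaves",
          "pouring hot water", "pouring tea"]

def build_prompts(labels, template="A photo of a {}"):
    return [template.format(lab) for lab in labels]

def predict(image_feat, text_feats, labels):
    sims = [sum(a * b for a, b in zip(image_feat, t)) for t in text_feats]
    return labels[max(range(len(sims)), key=sims.__getitem__)]
```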
In this embodiment, the dementia recognition system may be integrated into an electronic device such as a computer, mobile phone, tablet computer, or traditional Chinese medicine four-diagnosis instrument.
In this embodiment, the model training generation module 320 may use the pre-trained model as-is, train it separately, or train it jointly to optimize the overall process. A linear-probe approach may be used, fixing (freezing) the pre-trained network to extract features and adding a trainable linear classifier on top to complete training; alternatively, fine-tuning may be used to tune the entire network, so that all learnable parameter weights in the network are updated.
Referring to fig. 5 of the specification, an embodiment of the present invention provides a flowchart of a training and prediction method for the cognitive dysfunction neural network of the auxiliary diagnosis system for senile dementia based on video data.
On the basis of the above embodiment, the neural network training method further includes:
S210: preprocessing the character interaction behavior information sequence;
S220: performing feature vectorization on the preprocessing result of the character interaction behavior information sequence;
S230: inputting the feature vector samples into the neural network for training;
S240: saving the trained network model and performing inference-based prediction of cognitive dysfunction.
Specifically, S210 performs a deduplication operation on the sequence, retaining only the key data points where two adjacent values differ; that is, it retains the character interaction behavior change information from the operation process of the subject's tea-making task. The algorithm for judging whether a sequence element is retained is as follows:
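The patent's retention-judgment algorithm itself is not reproduced above; a minimal sketch of such a keep-on-change rule might look like:

```python
def deduplicate(sequence):
    # Keep an element only when it differs from the previously kept one,
    # so the result records behavior *changes* during the tea-making task.
    kept = []
    for item in sequence:
        if not kept or item != kept[-1]:
            kept.append(item)
    return kept
```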
S220 performs feature vectorization on the deduplicated sequence of key operation behaviors from the subject's tea-making task, generating samples that meet the input data format required by the neural network. For convenience of processing, the sequence data can be converted into a fixed-length feature vector of length L_seq: sequences shorter than this length are zero-padded at the tail, and longer sequences are truncated.
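The pad/truncate step can be sketched as follows (the helper name and pad value are illustrative):

```python
def to_fixed_length(sequence, l_seq, pad_value=0):
    # Truncate sequences longer than l_seq; zero-pad shorter ones at the tail
    if len(sequence) >= l_seq:
        return list(sequence[:l_seq])
    return list(sequence) + [pad_value] * (l_seq - len(sequence))
```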
S230 completes training and validation of the neural network model. Candidate network models include MLP, 1DCNN, RNN, LSTM, GRU, CNN+LSTM, TextCNN, BiLSTM, Attention, MultiHeadAttention, Attention+BiLSTM, BiGRU+Attention, Transformer, PositionalEmbedding+Transformer, and the like. The optimizer is Adam, the batch size is 64, the number of training epochs is 10, the initial learning rate is set to 0.001, and the training set and test set are divided in a 9:1 ratio.
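A sketch of the stated setup (batch size 64, 10 epochs, initial learning rate 0.001, 9:1 train/test split); the config dictionary and split helper are illustrative rather than the patent's code:

```python
import numpy as np

# Hyperparameters as stated in the text; names are illustrative
CONFIG = {"optimizer": "Adam", "batch_size": 64,
          "epochs": 10, "learning_rate": 1e-3}

def train_test_split(n_samples, train_ratio=0.9, seed=0):
    # Shuffle sample indices and split 9:1 into training and test sets
    idx = np.random.default_rng(seed).permutation(n_samples)
    cut = int(n_samples * train_ratio)
    return idx[:cut], idx[cut:]
```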
According to the auxiliary diagnosis system and method for senile dementia based on video data, the adopted tea-making task is derived from daily life, so elderly people can easily understand and accept it. The operation is simple and convenient, avoiding long and tedious evaluations, and the evaluation result is objective, avoiding result differences caused by subjective differences between evaluators.
In addition, the invention has high diagnosis accuracy, reduces the workload of doctors, and facilitates early cognitive-abnormality warning and diagnosis for users in communities or at home. The pre-trained CLIP model adopted by the invention has strong transfer learning capability on small downstream datasets, greatly reducing training cost.
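The CLIP contrastive setup described in this document (in-batch dot-product similarities forming a batch_size x batch_size matrix with positive pairs on the diagonal) can be sketched as:

```python
import numpy as np

def similarity_matrix(img_feats, txt_feats):
    # L2-normalize, then dot products give a batch_size x batch_size matrix;
    # entry (i, j) is the similarity of image i with text j, and the
    # diagonal holds the matched (positive) pairs that training maximizes.
    img = img_feats / np.linalg.norm(img_feats, axis=1, keepdims=True)
    txt = txt_feats / np.linalg.norm(txt_feats, axis=1, keepdims=True)
    return img @ txt.T
```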
While the invention has been described in detail in the foregoing general description and specific examples, it will be apparent to those skilled in the art that modifications and improvements can be made thereto. Accordingly, such modifications or improvements may be made without departing from the spirit of the invention and are intended to be within the scope of the invention as claimed.
Claims (6)
1. The auxiliary diagnosis system for senile dementia based on video data is characterized in that:
comprises a terminal (101), a network (102), a server (103) and a database (104);
the terminal (101) is in communication connection with the server (103) through the network (102); the terminal (101) is used for collecting video of the operation process of the subject's tea-making task and uploading it to the server (103) through the network (102); the server (103) performs character interaction recognition and senile dementia health-state diagnosis on the images to be detected, and stores the relevant data and original video data information in the database (104);
the server (103) comprises a sample acquisition module (310) and a model training generation module (320);
the sample acquisition module (310) is used for acquiring original video data, converting the data and preprocessing the data, and generating sample data and text labels for training the module;
the model training generation module (320) is used for inputting training sample data and text labels into a neural training network for training.
2. The auxiliary diagnosis system for senile dementia based on video data according to claim 1, wherein: the sample acquisition module (310) comprises an original video data acquisition unit (311), a video data key frame interception unit (312), an image sample generation unit (313) and an image sample preprocessing unit (314);
the original video data acquisition unit (311) is an electronic device with a camera shooting function;
the video data key frame intercepting unit (312) is used for reading the original video data stream, intercepting video frames at equal intervals according to the video fps and an interception time interval t, and saving them as static images in png or jpg format;
the image sample generation unit (313) manually screens and labels, via interactive labeling, the static original images saved by the video data key frame interception unit (312); the manual screening randomly samples images of key character interaction behaviors during the operation process of the subject's tea-making task, the key behaviors including turning on a power switch, boiling water, adding tea leaves, pouring hot water, and pouring tea;
the image sample preprocessing unit (314) resizes all samples to a consistent size after the data are loaded, converts the data into the Tensor form required by the neural training network, and divides the data set into a training set and a test set according to a set ratio; meanwhile, data enhancement techniques are used to reduce the impact of insufficient data volume and improve the robustness of the model.
3. The auxiliary diagnosis system for senile dementia based on video data according to claim 1, wherein: the model training generation module (320) comprises a character interaction behavior recognition model training module (321) and a character interaction behavior recognition model prediction module (322);
the character interaction behavior recognition model training module (321) adopts a CLIP model trained on a large number of image-caption pairs; image and text features are obtained through an Image-Encoder and a Text-Encoder respectively, and the similarity between the texts and images in a batch is computed via dot products, yielding a batch_size x batch_size similarity matrix whose diagonal entries are the similarity values of the positive samples, so the optimization target during training is to make the positive-sample similarity values as large as possible; the model architecture is divided into two parts, an image encoder and a text encoder; the image encoder considers 2 different architectures, namely a residual network (ResNet) or a Vision Transformer (ViT); the residual network variants are ResNet-50, ResNet-101, and, following the EfficientNet scaling idea, models scaled to 4x, 16x, and 64x the compute of ResNet-50: ResNet-50x4, ResNet-50x16, ResNet-50x64; for ViT, 3 pre-trained models are used: ViT-B/32, ViT-B/16, and ViT-L/14; the text encoder uses a Transformer encoder with depth 12 and width 512, with eight attention heads, whose weights are taken from the pre-trained CLIP text encoder;
when the character interaction behavior recognition model prediction module (322) performs inference for the character interaction behavior classification task on an input picture, the model first converts each category label into a sentence of the same form used in pre-training, i.e., the sentence corresponding to each category is obtained through the Prompt operation; the similarity between the input picture and the sentence for each category is then computed, and the category whose sentence has the highest similarity is the predicted category;
the category labels include: turning on a power switch, boiling water, adding tea leaves, pouring hot water, and pouring tea; the Prompt template is "A photo of a [mask]", where [mask] is replaced by the category label.
4. The auxiliary diagnosis system for senile dementia based on video data according to claim 1, wherein the training method of the neural training network comprises the following steps:
S210: preprocessing the character interaction behavior information sequence;
S220: performing feature vectorization on the preprocessing result of the character interaction behavior information sequence;
S230: inputting the feature vector samples into the neural network for training;
S240: saving the trained network model and performing inference-based prediction of cognitive dysfunction.
5. The auxiliary diagnosis system for senile dementia based on video data according to claim 4, wherein: S210 performs a deduplication operation on the sequence, retaining only the key data points where two adjacent values differ, that is, retaining the character interaction behavior change information from the operation process of the subject's tea-making task; the algorithm for judging whether a sequence element is retained is as follows:
S220 performs feature vectorization on the deduplicated sequence of key operation behaviors from the subject's tea-making task, generating samples that meet the input data format required by the neural network;
S230 completes training and validation of the neural network model; candidate network models include MLP, 1DCNN, RNN, LSTM, GRU, CNN+LSTM, TextCNN, BiLSTM, Attention, MultiHeadAttention, Attention+BiLSTM, BiGRU+Attention, Transformer, and PositionalEmbedding+Transformer; the optimizer is Adam, the batch size is 64, the number of training epochs is 10, the initial learning rate is set to 0.001, and the training set and test set are divided in a 9:1 ratio.
6. A diagnostic method of the auxiliary diagnosis system for senile dementia based on video data as claimed in any one of claims 1 to 5, characterized in that the method comprises the following steps:
Step S110: collecting video of the operation process of the tea making task of the subject through a terminal (101);
step S120: uploading the subject tea making operation video data to a server (103) through a network (102);
step S130: the server (103) converts frames of the subject's tea-making operation video data into static pictures;
step S140: the server (103) preprocesses the converted static picture to generate sample data and text labels for module training, and then inputs the training sample data and the text labels into a neural training network for training;
step S150: the server (103) performs character interaction recognition on the images to be detected and diagnoses the health state regarding senile dementia;
step S160: the server (103) stores the relevant data and the original video data information in a database (104).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310497550.8A CN116580832A (en) | 2023-05-05 | 2023-05-05 | Auxiliary diagnosis system and method for senile dementia based on video data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116580832A true CN116580832A (en) | 2023-08-11 |
Family
ID=87543720
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310497550.8A Pending CN116580832A (en) | 2023-05-05 | 2023-05-05 | Auxiliary diagnosis system and method for senile dementia based on video data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116580832A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109473173A (en) * | 2018-09-30 | 2019-03-15 | 华中科技大学 | A kind of the elderly's Cognitive deficiency assessment system and device based on video |
CN110674773A (en) * | 2019-09-29 | 2020-01-10 | 燧人(上海)医疗科技有限公司 | Dementia recognition system, device and storage medium |
KR20200137161A (en) * | 2019-05-29 | 2020-12-09 | 주식회사 허그케어앤테라퓨틱스 | Method for conitive therapy based on artifical intelligence |
CN113239869A (en) * | 2021-05-31 | 2021-08-10 | 西安电子科技大学 | Two-stage behavior identification method and system based on key frame sequence and behavior information |
CN114388143A (en) * | 2021-12-27 | 2022-04-22 | 国家康复辅具研究中心附属康复医院 | Method and device for acquiring facial data of Alzheimer's disease based on game interaction |
CN114511043A (en) * | 2022-04-18 | 2022-05-17 | 苏州浪潮智能科技有限公司 | Image understanding method, device, equipment and medium |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117649718A (en) * | 2024-01-29 | 2024-03-05 | 四川大学华西医院 | Intelligent arrival reporting method, device, apparatus and medium for hospital |
CN117649718B (en) * | 2024-01-29 | 2024-04-23 | 四川大学华西医院 | Intelligent arrival reporting method, device, apparatus and medium for hospital |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Chen et al. | Chinesefoodnet: A large-scale image dataset for chinese food recognition | |
CN110210542B (en) | Picture character recognition model training method and device and character recognition system | |
Rajashekar et al. | GAFFE: A gaze-attentive fixation finding engine | |
CN107798653B (en) | Image processing method and device | |
US20140270431A1 (en) | Characterizing pathology images with statistical analysis of local neural network responses | |
WO2017036092A1 (en) | Super-resolution method and system, server, user equipment, and method therefor | |
CN108596046A (en) | A kind of cell detection method of counting and system based on deep learning | |
CN110135461B (en) | Hierarchical attention perception depth measurement learning-based emotion image retrieval method | |
CN111599438B (en) | Real-time diet health monitoring method for diabetics based on multi-mode data | |
CN110490242B (en) | Training method of image classification network, fundus image classification method and related equipment | |
CN111950528B (en) | Graph recognition model training method and device | |
CN112818975A (en) | Text detection model training method and device and text detection method and device | |
CN110135242B (en) | Emotion recognition device and method based on low-resolution infrared thermal imaging depth perception | |
CN106845434B (en) | Image type machine room water leakage monitoring method based on support vector machine | |
CN111954250B (en) | Lightweight Wi-Fi behavior sensing method and system | |
CN116580832A (en) | Auxiliary diagnosis system and method for senile dementia based on video data | |
WO2023004546A1 (en) | Traditional chinese medicine constitution recognition method and apparatus, and electronic device, storage medium and program | |
CN114528411A (en) | Automatic construction method, device and medium for Chinese medicine knowledge graph | |
CN114332911A (en) | Head posture detection method and device and computer equipment | |
Burkapalli et al. | TRANSFER LEARNING: INCEPTION-V3 BASED CUSTOM CLASSIFICATION APPROACH FOR FOOD IMAGES. | |
CN109359543B (en) | Portrait retrieval method and device based on skeletonization | |
CN117115505A (en) | Emotion enhancement continuous training method combining knowledge distillation and contrast learning | |
Nahar et al. | A robust model for translating arabic sign language into spoken arabic using deep learning | |
CN115019367A (en) | Genetic disease face recognition device and method | |
CN114998252A (en) | Image quality evaluation method based on electroencephalogram signals and memory characteristics |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||