CN117893528B

CN117893528B - Method and device for constructing cardiovascular and cerebrovascular disease classification model

Info

Publication number: CN117893528B
Application number: CN202410283400.1A
Authority: CN
Inventors: 赖小波
Original assignee: Yunnan Dean Medical Laboratory Co ltd
Current assignee: Yunnan Dean Medical Laboratory Co ltd
Priority date: 2024-03-13
Filing date: 2024-03-13
Publication date: 2024-05-17
Anticipated expiration: 2044-03-13
Also published as: CN117893528A

Abstract

The application provides a method and a device for constructing a cardiovascular and cerebrovascular disease classification model, and relates to the field of intelligent medical treatment. The method comprises: collecting physiological characteristic parameters of a plurality of patients and the corresponding patient disease labels to construct a sample data set; training the cardiovascular and cerebrovascular disease classification model on the training sample subset, and evaluating model performance indexes of the trained model on the verification sample subset to tune and optimize its hyperparameters; and evaluating, based on the test sample subset, whether the model meets a preset model convergence condition. The cardiovascular and cerebrovascular disease classification model comprises a data input layer, a feature extraction module, a data fusion analysis layer and a classification prediction layer. By fusing multidimensional data with advanced deep learning techniques, the application significantly improves the accuracy and the degree of personalization of the cardiovascular and cerebrovascular disease classification model, which can play an important role in clinical application.

Description

Method and device for constructing cardiovascular and cerebrovascular disease classification model

Technical Field

The application relates to the technical field of intelligent medical information processing, in particular to a method and a device for constructing a cardiovascular and cerebrovascular disease classification model.

Background

Cardiovascular and cerebrovascular diseases, among the major health problems worldwide, pose a serious threat to human life and health. Their diagnosis is complex and requires comprehensive consideration of various physiological signals and clinical indexes.

The diagnosis of cardiovascular and cerebrovascular diseases depends on the analysis of a large number of physiological signals, yet existing artificial intelligence diagnosis models are mainly based on traditional machine learning techniques such as the support vector machine (SVM) and random forests. Although these methods work well in specific cases, they cannot process complex data such as electrocardiograms (ECG) and have limited ability to identify nonlinear patterns; as a result, model performance is poor and the methods cannot be widely adopted.

No satisfactory technical solution to the above problems has yet been proposed.

Disclosure of Invention

The application provides a method and a device for constructing a cardiovascular and cerebrovascular disease classification model, intended at least to solve the problems that prior-art intelligent diagnosis models for cardiovascular and cerebrovascular diseases cannot process highly complex data and perform poorly.

The application provides a method for constructing a cardiovascular and cerebrovascular disease classification model, comprising the following steps: collecting physiological characteristic parameters of a plurality of patients and the corresponding patient disease labels to construct a sample data set, wherein the physiological characteristic parameters comprise ECG data, blood pressure data, patient basic information and blood biochemical indexes, the patient basic information comprises the patient's age and sex, and the patient disease label is a cardiovascular and cerebrovascular disease category; dividing the sample data set into a training sample subset, a verification sample subset and a test sample subset according to a preset ratio; training the cardiovascular and cerebrovascular disease classification model on the training sample subset, and evaluating model performance indexes of the trained model on the verification sample subset to tune and optimize its hyperparameters, the model performance indexes comprising at least the model prediction accuracy; and evaluating, based on the test sample subset, whether the cardiovascular and cerebrovascular disease classification model meets a preset model convergence condition. The cardiovascular and cerebrovascular disease classification model comprises a data input layer, a feature extraction module, a data fusion analysis layer and a classification prediction layer. The data input layer preprocesses each input sample; the preprocessing comprises normalization and standardization of the ECG data, blood pressure data and blood biochemical indexes in the input sample, followed by wavelet transformation and Fourier transformation of the normalized ECG data to obtain multi-scale information of the ECG data. An enhanced CNN layer in the feature extraction module processes the multi-scale information of the ECG data in the input sample in parallel to extract the corresponding scale feature representations; a VAE layer in the feature extraction module extracts key features from the blood biochemical indexes in the input sample and generates a target latent variable; and a time-series data processing layer in the feature extraction module processes the ECG data of the input sample with a multi-head self-attention mechanism to extract key feature representations of the time series. The data fusion analysis layer combines the outputs of the enhanced CNN layer, the VAE layer and the time-series data processing layer to determine a comprehensive feature representation. The classification prediction layer updates the comprehensive feature representation according to the patient basic information in the input sample and determines the label prediction result of the input sample from the updated comprehensive feature representation; the classification prediction layer is a conditional fully connected layer.

Optionally, processing, based on the enhanced CNN layer in the feature extraction module, the multi-scale information of the ECG data in the input sample in parallel to extract the corresponding scale feature representations comprises:

$$F_{ECG} = \mathrm{Conv}(X_{ECG}; W_1, b_1),\quad F_{W} = \mathrm{Conv}(X_{W}; W_2, b_2),\quad F_{F} = \mathrm{Conv}(X_{F}; W_3, b_3),$$

wherein $X_{ECG} \in \mathbb{R}^{T \times d_e}$ represents the raw ECG data, $X_{W}$ represents the electrocardiogram after wavelet transformation, $X_{F}$ represents the electrocardiogram after Fourier transformation, $T$ is the length of the time series, and $d_e$ and $d_b$ are the feature dimensions of the electrocardiogram and the blood pressure, respectively; $F_{ECG}$, $F_{W}$ and $F_{F}$ represent the corresponding scale feature representations, and $W_i$ and $b_i$ ($i = 1, 2, 3$) represent the weight and bias of the corresponding convolutional layer;

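Extracting, based on the VAE layer in the feature extraction module, key features from the blood biochemical indexes in the input sample and generating the target latent variable comprises:

The encoder structure of the VAE layer is: $(\mu, \sigma^2) = \mathrm{Encoder}(X_{bio}; \theta)$,

The decoder structure of the VAE layer is: $\hat{X}_{bio} = \mathrm{Decoder}(z; \phi)$,

$Z_{bio} = z = \mu + \sigma \odot \tau,\quad \tau \sim \mathcal{N}(0, I)$,

wherein $X_{bio} \in \mathbb{R}^{d_{bio}}$ represents the blood biochemical indexes, $d_{bio}$ represents the dimension of the blood biochemical indexes, $\mu$ and $\sigma^2$ are the mean and variance produced by the encoder network, $z$ represents the latent variable, $\tau$ represents the sampling noise term, $Z_{bio}$ represents the target latent variable, $\hat{X}_{bio}$ represents the blood biochemical indexes reconstructed by the decoder, and $\theta$ and $\phi$ are network parameters of the VAE layer;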

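Processing, based on the time-series data processing layer in the feature extraction module, the ECG data of the input sample with the multi-head self-attention mechanism to extract key feature representations of the time series comprises:

$$\mathrm{head}_i = \mathrm{softmax}\left(\frac{(X_{ECG}W_i^Q)(X_{ECG}W_i^K)^{\top}}{\sqrt{d_k}}\right)X_{ECG}W_i^V,\qquad F_{T} = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\,W^O,$$

wherein $W_i^Q$, $W_i^K$ and $W_i^V$ represent the corresponding self-attention weight matrices of the $i$-th of $h$ heads, $W^O$ is the output projection matrix, $d_k$ represents the dimension of the keys, and $F_{T}$ represents the key feature representations of the time series.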

Optionally, the structure of the data fusion analysis layer is as follows:

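$$F_{fusion} = \mathrm{GRU}\left(\mathrm{Concat}(F_{ECG}, F_{W}, F_{F}, Z_{bio}, F_{T}, X_{BP})\right),$$

wherein GRU represents a gated recurrent unit, $X_{BP}$ represents the blood pressure data, and $F_{fusion}$ represents the comprehensive feature representation.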
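The fusion formula above can be illustrated with a minimal pure-Python sketch. It assumes that every branch output has already been aligned to the same number of time steps, that the target latent variable $Z_{bio}$ is repeated at each step, and that the GRU follows the common update-gate/reset-gate formulation. The helper names (`make_params`, `gru_step`, `fuse`) and the constant weight initialization are hypothetical and are not taken from the application.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def make_params(input_size, hidden_size, value=0.0):
    """Hypothetical helper: constant-initialised GRU weights (W: input, U: recurrent, b: bias)."""
    p = {}
    for gate in ("z", "r", "n"):
        p["W_" + gate] = [[value] * input_size for _ in range(hidden_size)]
        p["U_" + gate] = [[value] * hidden_size for _ in range(hidden_size)]
        p["b_" + gate] = [0.0] * hidden_size
    return p


def affine(w, x, u, h, b):
    """Compute W x + U h + b for one gate, with weights stored as lists of rows."""
    return [sum(a * c for a, c in zip(w_row, x)) + sum(a * c for a, c in zip(u_row, h)) + bias
            for w_row, u_row, bias in zip(w, u, b)]


def gru_step(x, h, p):
    """One GRU step: update gate z, reset gate r, candidate state n."""
    z = [sigmoid(v) for v in affine(p["W_z"], x, p["U_z"], h, p["b_z"])]
    r = [sigmoid(v) for v in affine(p["W_r"], x, p["U_r"], h, p["b_r"])]
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    n = [math.tanh(v) for v in affine(p["W_n"], x, p["U_n"], reset_h, p["b_n"])]
    # New state is a gate-weighted mix of the previous state and the candidate.
    return [(1.0 - zi) * hi + zi * ni for zi, hi, ni in zip(z, h, n)]


def fuse(cnn_seq, attn_seq, bp_seq, z_bio, p, hidden_size):
    """Concatenate branch features per time step and return the final GRU hidden state."""
    h = [0.0] * hidden_size
    for c_t, a_t, bp_t in zip(cnn_seq, attn_seq, bp_seq):
        x_t = c_t + a_t + bp_t + z_bio  # list concatenation; z_bio is repeated at every step
        h = gru_step(x_t, h, p)
    return h
```

The last hidden state is used here as $F_{fusion}$; returning the whole hidden-state sequence would be an equally plausible reading of the formula.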

Optionally, the structure of the classification prediction layer is:

$$\hat{Y} = \mathrm{softmax}\left(W_C\,F_{fusion} + b_C\right),$$

wherein $C$ represents the additional condition information determined from the patient basic information in the input sample, $W_C$ and $b_C$ represent the condition-dependent weights and bias of the fully connected layer, respectively, and $\hat{Y}$ represents the label prediction result.
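One possible reading of the condition-dependent weights $W_C$ and $b_C$ is sketched below. It assumes that the condition vector $C$ consists of a scaled age and a one-hot sex code, and that $W_C$ and $b_C$ are linear combinations of per-condition weight sets. The encoding, the helper names `encode_condition` and `conditional_fc`, and the weight layout are illustrative assumptions, not details given in the application.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def encode_condition(age, sex, age_scale=100.0):
    """Hypothetical condition vector C: scaled age, then one-hot sex (0 = female, 1 = male)."""
    return [age / age_scale, 1.0 if sex == 0 else 0.0, 1.0 if sex == 1 else 0.0]


def conditional_fc(features, cond, w_base, b_base, w_cond, b_cond):
    """W_C = W_base + sum_k C_k W_k and b_C = b_base + sum_k C_k b_k; returns softmax(W_C F + b_C)."""
    n_out, n_in = len(w_base), len(features)
    w = [[w_base[o][i] + sum(c * w_k[o][i] for c, w_k in zip(cond, w_cond)) for i in range(n_in)]
         for o in range(n_out)]
    b = [b_base[o] + sum(c * b_k[o] for c, b_k in zip(cond, b_cond)) for o in range(n_out)]
    logits = [sum(w[o][i] * features[i] for i in range(n_in)) + b[o] for o in range(n_out)]
    return softmax(logits)
```

Because the condition changes the layer's parameters rather than only being appended to the input, two patients with identical physiological features but different ages or sexes can receive different predictions.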

Optionally, the cardiovascular and cerebrovascular disease category is any one of the following: arrhythmia, myocardial infarction, hypertension, coronary artery disease and ventricular tachycardia.

Optionally, the cardiovascular and cerebrovascular disease classification model uses a weighted multi-class focal loss as the model loss function, expressed as follows:

For each cardiovascular and cerebrovascular disease category $c$, the loss $L_{i,c}$ of each sample $i$ is calculated as:

$$L_{i,c} = -\alpha_c\,(1 - p_{i,c})^{\gamma}\,y_{i,c}\log(p_{i,c}),$$

The losses of all samples over all cardiovascular and cerebrovascular disease categories are aggregated to determine the model loss $L$:

$$L = \frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{K} L_{i,c},$$

wherein the model prediction output is $P = \{p_{i,c}\}$, $N$ is the batch size, $K$ represents the total number of cardiovascular and cerebrovascular disease categories, and $p_{i,c}$ represents the probability, predicted by the model, that sample $i$ belongs to category $c$; $\alpha_c$ represents the weight of cardiovascular and cerebrovascular disease category $c$; $\gamma$ represents the focusing parameter; and $y_{i,c}$ represents the label value of sample $i$ for cardiovascular and cerebrovascular disease category $c$.
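The loss above translates directly into code. This sketch assumes one-hot labels and probabilities that already sum to one per sample (for example a softmax output), clips probabilities to avoid $\log(0)$, and averages over the batch as in the formula; the `eps` clipping value is an assumption.

```python
import math


def weighted_focal_loss(probs, labels, alpha, gamma=2.0, eps=1e-12):
    """probs: N x K predicted probabilities; labels: N x K one-hot; alpha: K class weights."""
    n = len(probs)
    total = 0.0
    for p_i, y_i in zip(probs, labels):
        for c, (p, y) in enumerate(zip(p_i, y_i)):
            p = min(max(p, eps), 1.0)  # clip so log(p) stays finite
            # L_{i,c} = -alpha_c * (1 - p)^gamma * y * log(p)
            total += -alpha[c] * (1.0 - p) ** gamma * y * math.log(p)
    return total / n  # mean over the batch of size N
```

With $\gamma = 0$ and all $\alpha_c = 1$ the function reduces to ordinary cross-entropy, which is a quick sanity check for the implementation; increasing $\gamma$ shrinks the contribution of samples the model already classifies confidently.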

Optionally, the model convergence condition is determined jointly by a loss threshold $L_{th}$, an overall accuracy threshold $A_{th}$ and a key-category accuracy threshold $A_{key}$. Evaluating, based on the test sample subset, whether the cardiovascular and cerebrovascular disease classification model meets the preset model convergence condition comprises: when the test loss of the model is lower than $L_{th}$ and its overall accuracy exceeds $A_{th}$, determining that the model has preliminarily converged; then determining whether the identification accuracy for the preset key cardiovascular and cerebrovascular disease categories exceeds $A_{key}$; and, when it does, determining that the model meets the model convergence condition.
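The two-stage convergence check can be sketched as follows. Per-category accuracy is taken here to mean the fraction of a category's test samples that are classified correctly; this interpretation, the helper names, and the threshold values in the usage note are assumptions standing in for $L_{th}$, $A_{th}$ and $A_{key}$.

```python
def per_class_accuracy(y_true, y_pred, classes):
    """Share of samples of each class that are identified correctly (per-class recall)."""
    result = {}
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        result[c] = sum(1 for i in idx if y_pred[i] == c) / len(idx) if idx else 0.0
    return result


def meets_convergence(test_loss, overall_acc, key_class_acc, loss_th, acc_th, key_acc_th):
    """Stage 1: loss below L_th and overall accuracy above A_th; stage 2: every key class above A_key."""
    if not (test_loss < loss_th and overall_acc > acc_th):
        return False  # not even preliminarily converged
    return all(acc > key_acc_th for acc in key_class_acc.values())
```

For example, with hypothetical thresholds, `meets_convergence(0.21, 0.92, {"myocardial infarction": 0.95}, loss_th=0.3, acc_th=0.9, key_acc_th=0.9)` returns `True`, whereas a key-category accuracy of 0.85 would fail the second stage.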

Optionally, collecting the physiological characteristic parameters of the plurality of patients and the corresponding patient disease labels to construct the sample data set comprises: acquiring electrocardiogram sampling information of a plurality of patients, the electrocardiogram sampling information comprising ECG data and the corresponding patient disease labels; generating, with a conditional generative adversarial network (cGAN), synthetic ECG data for at least one condition label, each condition label corresponding to a cardiovascular and cerebrovascular disease category; and constructing the sample data set from the synthetic ECG data with their condition labels and the physiological characteristic parameters with their corresponding patient disease labels.
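How the condition label enters a conditional GAN, and how synthetic and real samples are merged into one data set, can be sketched as below. This is not a trained cGAN: the generator is passed in as any callable, the noise dimension and the index order of the categories are illustrative assumptions, and the adversarial training of the generator and discriminator is omitted.

```python
import random

# Category order is an assumption; the names are the categories listed in the application.
CATEGORIES = ["arrhythmia", "myocardial infarction", "hypertension",
              "coronary artery disease", "ventricular tachycardia"]


def one_hot(index, num_classes):
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def generator_input(label, noise_dim, rng, num_classes=len(CATEGORIES)):
    """cGAN conditioning: Gaussian noise vector concatenated with the one-hot condition label."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return noise + one_hot(label, num_classes)


def build_sample_dataset(real_samples, generator, labels, per_label, noise_dim, seed=0):
    """real_samples: list of (features, label); generator: any callable from input vector to ECG."""
    rng = random.Random(seed)
    synthetic = [(generator(generator_input(label, noise_dim, rng)), label)
                 for label in labels for _ in range(per_label)]
    return list(real_samples) + synthetic
```

Conditioning on the label lets the generator be asked for samples of a specific, possibly under-represented disease category, which is the point of augmenting the data set this way.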

The application also provides a device for constructing a cardiovascular and cerebrovascular disease classification model, comprising: a data set construction unit, configured to collect physiological characteristic parameters of a plurality of patients and the corresponding patient disease labels to construct a sample data set, wherein the physiological characteristic parameters comprise ECG data, blood pressure data, patient basic information and blood biochemical indexes, the patient basic information comprises the patient's age and sex, and the patient disease label is a cardiovascular and cerebrovascular disease category; a subset division unit, configured to divide the sample data set into a training sample subset, a verification sample subset and a test sample subset according to a preset ratio; a model training unit, configured to train the cardiovascular and cerebrovascular disease classification model on the training sample subset and to evaluate model performance indexes of the trained model on the verification sample subset so as to tune and optimize its hyperparameters, the model performance indexes comprising at least the model prediction accuracy; and a model test unit, configured to evaluate, based on the test sample subset, whether the cardiovascular and cerebrovascular disease classification model meets a preset model convergence condition. The cardiovascular and cerebrovascular disease classification model comprises a data input layer, a feature extraction module, a data fusion analysis layer and a classification prediction layer. The data input layer preprocesses each input sample; the preprocessing comprises normalization and standardization of the ECG data, blood pressure data and blood biochemical indexes in the input sample, followed by wavelet transformation and Fourier transformation of the normalized ECG data to obtain multi-scale information of the ECG data. An enhanced CNN layer in the feature extraction module processes the multi-scale information of the ECG data in the input sample in parallel to extract the corresponding scale feature representations; a VAE layer in the feature extraction module extracts key features from the blood biochemical indexes in the input sample and generates a target latent variable; and a time-series data processing layer in the feature extraction module processes the ECG data of the input sample with a multi-head self-attention mechanism to extract key feature representations of the time series. The data fusion analysis layer combines the outputs of the enhanced CNN layer, the VAE layer and the time-series data processing layer to determine a comprehensive feature representation. The classification prediction layer updates the comprehensive feature representation according to the patient basic information in the input sample and determines the label prediction result of the input sample from the updated comprehensive feature representation; the classification prediction layer is a conditional fully connected layer.

The application also provides an electronic device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements any one of the above methods for constructing a cardiovascular and cerebrovascular disease classification model.

The present application also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method of constructing a classification model of cardiovascular and cerebrovascular diseases as described in any of the above.

The application also provides a computer program product, comprising a computer program which, when executed by a processor, implements the above method for constructing a cardiovascular and cerebrovascular disease classification model.

The method, device, electronic device, non-transitory computer readable storage medium and computer program product for constructing a cardiovascular and cerebrovascular disease classification model provided by the application achieve at least the following technical effects:

(1) An efficient and accurate cardiovascular and cerebrovascular disease classification model is constructed. It jointly considers several physiological and biochemical parameters closely related to cardiovascular and cerebrovascular diseases, combining ECG data, blood pressure data, patient basic information (age, sex) and blood biochemical indexes; multidimensional feature fusion yields a more complete disease profile and improves the accuracy of the classification results predicted by the AI diagnosis model.

(2) Training on the training sample subset and evaluating on the verification sample subset allow the model's hyperparameters to be tuned effectively and ensure good generalization; evaluation on the test sample subset shows how the model will perform in practice and ensures the reliability of the classification model's performance.

(3) Multi-scale analysis is applied to ECG data, the critical data dimension in cardiovascular and cerebrovascular disease classification. Wavelet and Fourier transformation of the ECG data let the model capture details and frequency-domain features that may be missed in the raw ECG signal, improving recognition of complex electrocardiogram patterns and broadening the model's applicability and accuracy.

(4) To exploit the time-series nature of ECG data, a multi-head self-attention mechanism analyzes the electrocardiogram from multiple perspectives, improving sensitivity to and recognition of dynamic changes in cardiovascular and cerebrovascular diseases.

(5) The enhanced convolutional neural network (CNN), which processes the multi-scale ECG information in parallel, extracts and identifies nonlinear patterns in the electrocardiogram more effectively; extracting key features from blood biochemical indexes with a variational autoencoder (VAE) further reveals complex biomarker patterns associated with cardiovascular and cerebrovascular diseases. Together these improve the nonlinear recognition capability of the classification model.

(6) The conditional fully connected layer adjusts the feature representation according to the patient basic information, allowing the model to give more personalized diagnostic results that account for individual differences (such as age and sex) and improving the generality and accuracy of the classification model.

The embodiments of the application not only remedy the shortcomings of traditional machine learning methods in processing highly complex data and recognizing nonlinear patterns, but also, by fusing multidimensional data with advanced deep learning techniques, significantly improve the accuracy and degree of personalization of cardiovascular and cerebrovascular disease classification, and can play an important role in clinical application.

Drawings

In order to more clearly illustrate the application or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the application, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flowchart showing an example of a method of constructing a classification model of cardiovascular and cerebrovascular diseases according to an embodiment of the present application;

FIG. 2 shows a block diagram of an example of a classification model of cardiovascular and cerebrovascular diseases according to an embodiment of the application;

FIG. 3 is an operation flowchart of an example of step S140 in FIG. 1;

FIG. 4 is an operation flowchart of an example of step S110 in FIG. 1;

FIG. 5 is a block diagram showing an example of an apparatus for constructing a cardiovascular and cerebrovascular disease classification model according to an embodiment of the present application;

FIG. 6 is a schematic structural diagram of an electronic device provided by the present application.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

At present, the diagnosis of cardiovascular and cerebrovascular diseases mainly depends on physicians' analysis of physiological signals such as the electrocardiogram (ECG), blood pressure and pulse waveform. However, existing classification methods have several limitations:

1) Data processing complexity. The diagnosis of cardiovascular and cerebrovascular diseases involves a large amount of physiological data and clinical information, and the processing and analysis of these data are extremely complex, requiring powerful data processing capability and expertise.

2) Limited pattern recognition capability. Traditional machine learning methods such as SVM and random forests perform well on linear or simple nonlinear problems, but their performance is often limited on complex, nonlinear physiological signals.

3) Insufficient capability for large-scale data sets. With advances in medical technology, the volume of data available for diagnosing cardiovascular and cerebrovascular diseases is growing rapidly, and traditional methods process large-scale data sets inefficiently, making rapid and accurate diagnosis difficult.

4) Handling of individual differences. Patients with cardiovascular and cerebrovascular diseases differ significantly from one another, yet current diagnostic models tend to ignore these differences, which limits the generalization of the diagnostic results.

5) Inadequate dynamic data analysis. Cardiovascular and cerebrovascular disease develops dynamically, but most existing models cannot effectively process and analyze time-varying data such as continuously monitored electrocardiographic data.

In view of the above challenges, there is an urgent need for a more advanced and efficient classification model for cardiovascular and cerebrovascular diseases.

Fig. 1 is a flowchart showing an example of a method of constructing a classification model of cardiovascular and cerebrovascular diseases according to an embodiment of the present application.

The method of this embodiment may be executed by any electronic device with processing and computing capabilities, such as a computer, a mobile phone or a server. It constructs a new artificial intelligence diagnosis model that can effectively process and analyze large-scale, complex physiological signal data, has stronger nonlinear pattern recognition capability, adapts to individual differences and dynamic changes, and provides more objective, accurate and personalized analysis results for reference by clinical staff.

As shown in FIG. 1, in step S110, physiological characteristic parameters of a plurality of patients and the corresponding patient disease labels are collected to construct a sample data set.

Here, the physiological characteristic parameters include ECG data, blood pressure data, patient basic information and blood biochemical indexes. The patient basic information includes the patient's age and sex, and the patient disease label is a cardiovascular and cerebrovascular disease category.

For example, where patients have authorized data access, a patient database is queried to collect the physiological characteristic parameters and the corresponding cardiovascular and cerebrovascular disease categories of a plurality of patients, and each patient's record is turned into one sample: the sample features are determined from the patient's physiological characteristic parameters, and the sample label is determined from the patient's cardiovascular and cerebrovascular disease category.

It should be understood that the above types of physiological characteristic parameters are merely examples; other types not described here, such as pulse signals, may also be used. The input data may also be preprocessed, for example by noise removal and interpolation of missing data, so that it meets the model's processing requirements. In addition, the raw data may be augmented or derived in various ways when constructing the data set to increase its diversity, as described in more detail below.

In step S120, the sample data set is divided into a training sample subset, a verification sample subset, and a test sample subset according to a preset ratio.

Illustratively, the training sample subset, the verification sample subset and the test sample subset are divided in proportions of 70%, 15% and 15%, respectively.
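A 70/15/15 split such as the one above can be sketched with the Python standard library. The fixed seed and the plain random shuffle are assumptions; the application does not say whether the split is stratified by disease category.

```python
import random


def split_dataset(samples, ratios=(0.70, 0.15, 0.15), seed=42):
    """Shuffle samples and split them into training / verification / test subsets."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n = len(shuffled)
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test subset
    return train, val, test
```

For the 1000-patient example discussed later in this description, this yields 700, 150 and 150 samples.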

In step S130, the cardiovascular and cerebrovascular disease classification model is trained on the training sample subset, and model performance indexes of the trained model are evaluated on the verification sample subset to tune and optimize its hyperparameters, the model performance indexes comprising at least the model prediction accuracy.

Specifically, the training sample subset is divided into a plurality of batches, each batch is passed through the model to compute the output and the loss, and the model weights are updated by backpropagation and an optimizer. After each epoch, the model's accuracy, recall, F1 score and other metrics are evaluated on the verification sample subset, and hyperparameters such as the learning rate and regularization strength are adjusted accordingly to prevent overfitting.
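The verification metrics named above (accuracy, recall, F1 score) can be computed as in the following sketch. Macro-averaging over categories is an assumption made here because disease categories may be imbalanced; the application does not specify how per-class scores are averaged.

```python
def classification_metrics(y_true, y_pred, num_classes):
    """Accuracy plus macro-averaged recall and F1 over integer class labels 0..num_classes-1."""
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    recalls, f1s = [], []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        recalls.append(recall)
        f1s.append(f1)
    return {"accuracy": accuracy,
            "macro_recall": sum(recalls) / num_classes,
            "macro_f1": sum(f1s) / num_classes}
```

In a tuning loop, the hyperparameter setting with the best verification score would be kept; which metric drives that choice is left open by the application.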

In step S140, based on the subset of test samples, it is evaluated whether the cardiovascular and cerebrovascular disease classification model satisfies a preset model convergence condition.

In some embodiments, a separate test sample subset is used for the final assessment of the cardiovascular and cerebrovascular disease classification model, and misclassified cases are analyzed to identify the model's deficiencies. The model structure or parameters are then adjusted according to the test results, and the training and verification steps are repeated until the model convergence condition is met.

Here, the model convergence condition may take various forms and can be adjusted to actual requirements, for example convergence of the loss function, of the model parameters, or of the gradients.

As an example of an embodiment of the application, assume a cardiovascular and cerebrovascular disease data set containing data of 1000 patients. Each patient's data includes an electrocardiogram, blood pressure readings, blood biochemical indexes and basic information. The data is preprocessed, labeled and divided into a training set, a verification set and a test set. In the training stage, the model is trained on the training set, performance indexes are monitored on the verification set, and necessary adjustments are made. Finally, the model is evaluated on the test set. Assume the model reaches 90% accuracy on the test set; this indicates that it can classify cardiovascular and cerebrovascular diseases effectively. The model structure or training process is then further tuned to address problems found in testing, to improve performance and accuracy.

Fig. 2 shows a block diagram of an example of a classification model of cardiovascular and cerebrovascular diseases according to an embodiment of the application.

As shown in fig. 2, the cardiovascular and cerebrovascular disease classification model 200 includes a cascade of a data input layer 210, a feature extraction module 220, a data fusion analysis layer 230, and a classification prediction layer 240.

Based on the data input layer 210, the input samples are preprocessed. Here, the preprocessing includes normalization of the ECG data, the blood pressure data and the blood biochemical indexes in the input sample, followed by wavelet transformation and Fourier transformation of the normalized ECG data to obtain multi-scale information of the ECG data.

This ensures the consistency and comparability of the multidimensional input data and provides a sound basis for the subsequent deep learning processing.

The feature extraction module 220 includes an enhanced CNN (convolutional neural network) layer 221, a VAE (variational autoencoder) layer 222 and a temporal data processing layer 223. The enhanced CNN layer 221 processes the multi-scale information of the ECG data in the input sample in parallel to extract the corresponding scale feature representations. The VAE layer 222 extracts key features from the blood biochemical indexes in the input sample and generates the target latent variable. The temporal data processing layer 223 processes the ECG data of the input sample with a multi-head self-attention mechanism to extract key feature representations of the time series.

In a first aspect of the embodiments of the present application, extracting multi-scale electrocardiogram features helps capture complex physiological signal patterns.

More specifically, the enhanced CNN layer 221 has the following structure:
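The preprocessing step can be sketched for a single ECG lead as follows. The application does not name a wavelet family or a transform implementation, so a one-level Haar wavelet and a direct discrete Fourier transform are assumed here purely for illustration; which signals receive min-max normalization and which receive z-score standardization is likewise an assumption.

```python
import cmath
import math


def min_max_normalize(x):
    """Scale values to [0, 1]; a constant signal maps to all zeros."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return [0.0] * len(x)
    return [(v - lo) / (hi - lo) for v in x]


def z_score_standardize(x):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = sum(x) / len(x)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    if std == 0:
        return [0.0] * len(x)
    return [(v - mean) / std for v in x]


def haar_level1(x):
    """One level of the Haar discrete wavelet transform: (approximation, detail)."""
    if len(x) % 2:
        x = list(x) + [x[-1]]  # pad odd-length input by repeating the last sample
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail


def dft_magnitude(x):
    """Magnitude spectrum |X_k| via a direct O(n^2) discrete Fourier transform."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]


def multi_scale_views(ecg):
    """Normalise one ECG lead, then derive its wavelet and Fourier views."""
    x = min_max_normalize(ecg)
    approx, detail = haar_level1(x)
    return {"raw": x, "wavelet": approx + detail, "fourier": dft_magnitude(x)}
```

In practice a library FFT and a multi-level wavelet decomposition would replace the direct loops, but the three returned views correspond to the raw, wavelet-transformed and Fourier-transformed ECG inputs of the enhanced CNN layer.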

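$$F_{ECG} = \mathrm{Conv}(X_{ECG}; W_1, b_1),\quad F_{W} = \mathrm{Conv}(X_{W}; W_2, b_2),\quad F_{F} = \mathrm{Conv}(X_{F}; W_3, b_3),$$

wherein $X_{ECG} \in \mathbb{R}^{T \times d_e}$ represents the raw ECG data, $X_{W}$ represents the electrocardiogram after wavelet transformation, $X_{F}$ represents the electrocardiogram after Fourier transformation, $T$ is the length of the time series, and $d_e$ and $d_b$ are the feature dimensions of the electrocardiogram and the blood pressure, respectively; $F_{ECG}$, $F_{W}$ and $F_{F}$ represent the corresponding scale feature representations, and $W_i$ and $b_i$ ($i = 1, 2, 3$) represent the weight and bias of the corresponding convolutional layer.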
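The three parallel branches can be sketched as independent 1-D convolutions, one per ECG view. This assumes a single channel per view, "valid" convolution and a ReLU activation; the application does not state the kernel sizes, channel counts or activation function, so those choices, and the helper names, are hypothetical.

```python
def conv1d(x, kernel, bias):
    """'Valid' 1-D cross-correlation (what CNN layers compute) for one channel."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) + bias for i in range(len(x) - k + 1)]


def relu(values):
    return [max(0.0, v) for v in values]


def multi_scale_features(views, params):
    """views: {'raw', 'wavelet', 'fourier'} -> signal; params: same keys -> (kernel, bias).

    Each view has its own weights and no branch reads another's output,
    so the three branches could run in parallel.
    """
    return {name: relu(conv1d(signal, *params[name])) for name, signal in views.items()}
```

Giving each view its own kernel ($W_1$, $W_2$, $W_3$ in the formula) lets each branch specialize in time-domain, wavelet or frequency-domain structure.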

Thus, a multi-scale ECG analysis is performed for the critical data dimension "ECG data" in the cardiovascular and cerebrovascular disease classification scenario. The ECG data is subjected to wavelet transformation and Fourier transformation, so that the model can capture details and frequency domain features possibly missing in the original ECG signal, thereby improving the recognition capability of complex electrocardiogram modes and being beneficial to expanding the application range and accuracy of the model. In addition, aiming at the time sequence characteristics of the ECG data, the model can analyze the electrocardiogram from multiple angles through a multi-head self-attention mechanism, so that the recognition capability of the complex heart activity mode is improved.

In a second aspect of the embodiments of the present application, the multi-head self-attention mechanism lets the model focus on key moments and patterns in the ECG data, such as the features at moments of abnormal heart rate, enabling more effective extraction and identification of nonlinear patterns in the electrocardiogram.

More specifically, the structure of the model layer based on the multi-head self-attention mechanism is as follows:

[Formula not reproduced in the source text.]
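Because the layer formula is not reproduced, the following is a minimal sketch of standard multi-head scaled dot-product self-attention, softmax(Q K^T / sqrt(d_k)) V per head, with the head outputs concatenated and projected. It assumes the ECG sequence is given as one feature row per time step, that each head has its own query, key and value weight matrices, and that an output projection w_o exists; all names are hypothetical and no masking is applied.

```python
import math


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as nested lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over the time steps of x."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])                               # key dimension
    k_t = [list(col) for col in zip(*k)]          # transpose of K
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]    # attention weights per time step
    return matmul(weights, v)


def multi_head_self_attention(x, heads, w_o):
    """Concatenate the per-head outputs for each time step, then project with w_o."""
    outputs = [attention_head(x, *h) for h in heads]
    concat = [sum((out[t] for out in outputs), []) for t in range(len(x))]
    return matmul(concat, w_o)
```

Each head can learn to attend to a different kind of event (for example, one head to abnormal beats and another to rhythm), which is the motivation for using several heads rather than one.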

In a third aspect of the embodiments of the present application, the deep features extracted by the VAE layer help reveal the internal relationships among blood biochemical indicators. Learning which blood biochemical indicators drive the classification result can further reveal complex biomarker patterns associated with cardiovascular and cerebrovascular diseases, which improves the nonlinear recognition capability of the classification model.

More specifically, the model layer structure of the VAE layer is as follows:

The encoder structure of the VAE layer is: [Formula not reproduced in the source text.]

The decoder structure of the VAE layer is: [Formula not reproduced in the source text.]

where the symbols denote, in turn, the blood biochemical indicators, the dimension of the blood biochemical indicators, the mean and variance produced by the encoder network, the latent variable, and the network parameters of the VAE layer.

In a VAE, the latent variable represents a point in the latent space. This concept is central to the VAE: the latent space is the compressed representation of the data that the VAE tries to learn. Initially, the latent variable is treated as a random variable whose distribution is parameterized by the encoder part of the VAE. The encoder learns the distribution of the input data and maps the data onto a distribution in the latent space. In the VAE formulation, the encoder outputs two parameters, a mean and a variance, which define the distribution of the latent variable. Through this parameterized representation, the VAE can capture the intrinsic structure of the input data. Furthermore, so that the VAE can be trained by gradient descent with errors backpropagated through the sampling step, the VAE uses the so-called reparameterization trick: the latent variable is not sampled directly from the encoder's distribution but is computed from a noise term τ drawn from a standard normal distribution together with the encoder's mean and variance (in the usual form, the mean plus the standard deviation multiplied by τ), which makes effective training by stochastic gradient descent possible. In this way, the VAE automatically extracts key features from the blood biochemical indicators and expresses them as the final target latent variable, which captures key characteristics of the input data for subsequent disease classification.
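The reparameterization step described above can be sketched as follows, assuming the encoder has already produced a per-dimension mean and variance for the blood biochemical indicators (the encoder network itself is omitted). The KL term is the usual VAE regularizer toward a standard normal prior; the function names are hypothetical.

```python
import math
import random


def reparameterize(mean, variance, tau):
    """z = mean + sqrt(variance) * tau, with tau ~ N(0, 1) supplied from outside.

    Because the randomness lives in tau, gradients can flow through mean and variance.
    """
    return [m + math.sqrt(v) * t for m, v, t in zip(mean, variance, tau)]


def kl_to_standard_normal(mean, variance):
    """KL( N(mean, diag(variance)) || N(0, I) ), the usual VAE regularizer."""
    return 0.5 * sum(v + m * m - 1.0 - math.log(v) for m, v in zip(mean, variance))


def sample_latent(mean, variance, rng):
    """Draw the noise term from a standard normal and apply the reparameterization."""
    tau = [rng.gauss(0.0, 1.0) for _ in mean]
    return reparameterize(mean, variance, tau)
```

For example, reparameterize([0.0, 1.0], [1.0, 4.0], [0.5, -1.0]) gives [0.5, -1.0]: the second dimension has standard deviation 2, so a noise value of -1 moves it two units below its mean.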

The outputs of the enhanced CNN layer 221, VAE layer 222, and temporal data processing layer 223 are combined based on the data fusion analysis layer 230 to determine the corresponding composite feature representation.

In some embodiments, the data fusion analysis layer 230 weights each of these outputs to obtain the corresponding composite feature representation. Illustratively, the data fusion analysis layer 230 may use a GRU unit to fuse features from different sources dynamically, adaptively adjusting the fusion weights according to the content of the current input. This dynamic fusion mechanism lets the model combine the various features effectively, which improves its prediction accuracy and robustness.

Specifically, the structure of the data fusion analysis layer 230 is:

[Formula not reproduced in the source text.]

where GRU denotes a gated recurrent unit, used to dynamically determine the fusion weights of data from different sources, and the output denotes the composite feature representation.

Thus, rather than simply concatenating the features, the fusion uses the gating mechanism of the GRU to dynamically determine the weights of the data from the different sources (i.e., the outputs of the enhanced CNN layer, the VAE layer and the temporal data processing layer), combining the various types of information more effectively and achieving context-aware feature fusion.
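The source does not state how the three sources are fed to the GRU. As one possible reading, assumed here, each source's features are first projected to a common dimension and then passed to a GRU cell as a short sequence (CNN, VAE, attention), with the final hidden state taken as the composite representation. The sketch uses one common GRU formulation; the parameter names (Wz, Uz, bz and so on) are hypothetical.

```python
import math


def _sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def _affine(w, x, u, h, b):
    """Row-wise W x + U h + b for nested-list matrices."""
    return [sum(wi * xi for wi, xi in zip(w_row, x))
            + sum(ui * hi for ui, hi in zip(u_row, h)) + bi
            for w_row, u_row, bi in zip(w, u, b)]


def gru_step(x, h, p):
    """One GRU update; the gates decide how much of each new source to absorb."""
    z = [_sigmoid(v) for v in _affine(p["Wz"], x, p["Uz"], h, p["bz"])]   # update gate
    r = [_sigmoid(v) for v in _affine(p["Wr"], x, p["Ur"], h, p["br"])]   # reset gate
    rh = [ri * hi for ri, hi in zip(r, h)]
    n = [math.tanh(v) for v in _affine(p["Wn"], x, p["Un"], rh, p["bn"])]  # candidate state
    return [(1.0 - zi) * ni + zi * hi for zi, hi, ni in zip(z, h, n)]


def gru_fuse(source_features, p, hidden_size):
    """Feed the per-source feature vectors in order; the final state is the fused vector."""
    h = [0.0] * hidden_size
    for feat in source_features:
        h = gru_step(feat, h, p)
    return h
```

Compared with plain concatenation, the update gate makes the contribution of each source depend on the input itself, which is the "dynamic fusion weight" behavior the text describes.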

The classification prediction layer 240 updates the composite feature representation according to the patient basic information in the input sample and determines the label prediction result for the input sample from the updated composite feature representation. The classification prediction layer 240 adopts a conditional fully connected layer.

In some embodiments, the conditional fully connected layer applies a weighted adjustment to the composite feature representation based on the patient's basic personal information (e.g., age and sex) to obtain the final classification prediction. By adding this prediction calibration mechanism based on basic personal information, the model can provide more accurate classification results that reflect individual differences between patients, which enhances its applicability in practice.

More specifically, the structure of the classification prediction layer is:

[Formula not reproduced in the source text.]

Therefore, by introducing a conditioning mechanism into the fully connected layer, the weights of the layer can be adjusted dynamically according to additional information such as the patient's age and sex, enabling personalized prediction based on the patient's specific condition.
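The exact conditioning formula is not reproduced. A minimal sketch, assuming a FiLM-style (feature-wise linear modulation) mechanism swapped in for illustration: a scale and a shift computed from the encoded age and sex modulate the composite features before a linear layer and softmax. The age/sex encoding and all parameter names are hypothetical.

```python
import math


def encode_condition(age, sex):
    """Hypothetical encoding: age scaled to [0, 1] over 0-100 years, sex as 0/1."""
    return [min(max(age / 100.0, 0.0), 1.0), 1.0 if sex == "male" else 0.0]


def conditional_fc(h, cond, w, b, w_gamma, w_beta):
    """FiLM-style conditioning (an assumed stand-in for the patent's layer).

    gamma = 1 + W_gamma c and beta = W_beta c modulate the features h, then a linear
    layer and softmax give per-category probabilities.
    """
    gamma = [1.0 + sum(g * c for g, c in zip(row, cond)) for row in w_gamma]
    beta = [sum(bb * c for bb, c in zip(row, cond)) for row in w_beta]
    h_mod = [g * x + bt for g, x, bt in zip(gamma, h, beta)]
    logits = [sum(wi * xi for wi, xi in zip(row, h_mod)) + bi for row, bi in zip(w, b)]
    m = max(logits)                       # stabilize the softmax
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

With w_gamma and w_beta set to zero the layer reduces to an ordinary fully connected classifier, so the conditioning only adds a patient-specific correction on top of it.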

The following example, based on the embodiments of the present application, details how the cardiovascular and cerebrovascular disease classification model predicts the disease category corresponding to a patient's physiological data:

Suppose a patient (45 years old, male) undergoes cardiovascular and cerebrovascular disease screening, and his electrocardiogram data, blood pressure readings and blood biochemical indicators are provided. The classification model first processes these data: the electrocardiogram data pass through the multi-scale transformations and the CNN layer, which extract key features such as heart rate variability. Meanwhile, the VAE extracts latent health risk factors from the blood biochemical indicators. Next, the multi-head self-attention mechanism analyzes important temporal patterns in the electrocardiogram, such as signs of arrhythmia. All of these features are then dynamically fused by the GRU unit. Finally, the fully connected layer, taking the patient's age and sex into account, classifies whether the patient has a cardiovascular or cerebrovascular disease and which disease types are possible. For example, if the model identifies a risk of arrhythmia and mild hypertension during data processing, it further predicts, in combination with the patient's age and sex, the likelihood of coronary heart disease. The model's output can thus assist doctors in reaching an accurate diagnosis quickly and in providing more targeted medical advice to patients.

It should be noted that the cardiovascular and cerebrovascular disease classification model of the embodiments of the present application can classify not only common cardiovascular and cerebrovascular diseases (such as arrhythmia, myocardial infarction and hypertension) but also rare ones (such as coronary artery disease and ventricular tachycardia).

Because rare cardiovascular and cerebrovascular diseases have too few samples for the model to fully learn their characteristic features, the embodiments of the present application further provide an improvement to the cardiovascular and cerebrovascular disease classification model.

In particular, cardiovascular and cerebrovascular disease classification usually involves multiple disease types, each of which may occur with different frequencies and severities. In this setting, the conventional focal loss may not adequately handle class imbalance together with differing class importance. A "weighted multi-class focal loss" is therefore proposed, which accounts not only for sample difficulty but also for the importance of each class.

In an embodiment of the present application, the cardiovascular and cerebrovascular disease classification model uses the weighted multi-class focal loss as its loss function, defined as follows:

For each cardiovascular and cerebrovascular disease category, the loss corresponding to each sample is calculated as:

[Formula not reproduced in the source text.]

where the model prediction output is computed over a batch, and the symbols denote the batch size, the total number of cardiovascular and cerebrovascular disease categories, the probability predicted by the model that a sample belongs to a given category, the weight of that category, the focusing parameter, and the label of the sample for that category, which is 1 if the sample truly belongs to the category and 0 otherwise.

The category weights can be configured according to the business scenario; in particular, rare or more important cardiovascular and cerebrovascular disease categories can be given higher weights. For example, if a rare disease type has a greater impact on patient health, it should receive a higher weight even though it occurs less frequently in the data set. The focusing parameter, in turn, adjusts how much attention the model pays to samples of different difficulty, on top of the category weights, which helps the model learn hard-to-classify, high-importance categories more effectively.
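Since the loss formula itself is not reproduced, the sketch below assumes the standard category-weighted focal loss built from the symbols defined above: the batch mean of the sum over categories of -alpha_c * y_ic * (1 - p_ic)^gamma * log(p_ic). The function name and the clipping constant eps are hypothetical.

```python
import math


def weighted_multiclass_focal_loss(probs, targets, alpha, gamma, eps=1e-12):
    """Assumed form: mean over the batch of -sum_c alpha_c * y_ic * (1 - p_ic)^gamma * log(p_ic).

    probs:   N x C predicted probabilities (rows sum to 1)
    targets: N x C one-hot labels
    alpha:   length-C category weights (larger for rare or severe diseases)
    gamma:   focusing parameter (gamma = 0 with all alpha = 1 gives cross-entropy)
    """
    n = len(probs)
    total = 0.0
    for p_row, y_row in zip(probs, targets):
        for p, y, a in zip(p_row, y_row, alpha):
            if y:
                p = min(max(p, eps), 1.0)  # clip to avoid log(0)
                total -= a * (1.0 - p) ** gamma * math.log(p)
    return total / n
```

With a true-class probability of 0.5, gamma = 2 scales the cross-entropy term by (1 - 0.5)^2 = 0.25, while a category weight of 3 triples it; confident, easy samples are damped and important categories are amplified.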

In practical cardiovascular and cerebrovascular disease classification within the business scenario of the embodiments of the present application, some disease types (such as coronary artery disease) may be less common than others (such as hypertension) yet have more serious health consequences for patients. With the weighted multi-class focal loss provided by the embodiments of the present application, higher weights can be assigned to these rare but severe disease types, so that the model concentrates more on correctly identifying them during training and provides more accurate and comprehensive diagnostic support in practice.

Fig. 3 shows an operation flowchart of an example of step S140 in fig. 1. Here, the model convergence condition is determined jointly by a loss threshold, an overall accuracy threshold, and a key-category accuracy threshold.

As shown in fig. 3, in step S310, when the test loss of the cardiovascular and cerebrovascular disease classification model is below the loss threshold and its overall accuracy exceeds the overall accuracy threshold, the model is determined to have preliminarily converged.

In step S320, it is determined whether the recognition accuracy for the preset key cardiovascular and cerebrovascular disease categories exceeds the key-category accuracy threshold.

In step S330, if it does, the cardiovascular and cerebrovascular disease classification model is determined to meet the model convergence condition.

Otherwise, if it does not exceed the key-category accuracy threshold, the model is determined not to have converged, and iterative training continues or the model hyperparameters are adjusted.

It should be noted that the key cardiovascular and cerebrovascular disease categories may be defined according to business requirements; for example, a dedicated recognition accuracy threshold can be set for rare but serious disease types of particular concern. The classification model must then reach higher accuracy on these key categories before it is considered converged. Beyond the conventional criteria of reducing loss and improving accuracy, this also takes into account the particular sensitivity and importance, in the medical field, of specific rare but serious disease categories.
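The convergence check of steps S310 to S330 can be sketched as below, assuming (as the note above suggests) a separate accuracy threshold per key category stored in a dictionary. The function name, the return format and the example threshold values are hypothetical.

```python
def check_convergence(test_loss, overall_acc, key_acc, loss_thr, acc_thr, key_thr):
    """Return (converged, reason) following steps S310-S330.

    key_acc / key_thr: dicts mapping each key disease category to its
    measured accuracy / required accuracy threshold.
    """
    # S310: preliminary convergence needs low loss AND high overall accuracy
    if not (test_loss < loss_thr and overall_acc > acc_thr):
        return False, "not preliminarily converged"
    # S320: every key category must exceed its own threshold
    lagging = sorted(c for c, thr in key_thr.items() if key_acc.get(c, 0.0) <= thr)
    if lagging:
        return False, "key categories below threshold: " + ", ".join(lagging)
    # S330: all conditions met
    return True, "converged"
```

Returning the lagging categories, rather than only a boolean, makes it easy to direct the supplementary training at exactly those categories.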

Furthermore, the supplementary training for the key cardiovascular and cerebrovascular disease categories may use an enhanced early stopping strategy: if the accuracy on the key categories does not improve over a preset number of consecutive epochs, training is stopped.

Specifically, the iterations are stopped when the change in key-category accuracy over those epochs is smaller than a preset positive number, which determines whether a change in accuracy is significant enough.
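A minimal sketch of this early stopping rule, assuming "improvement" means the best key-category accuracy in the last patience epochs exceeds the best earlier value by at least delta; the function name and this exact comparison are assumptions, since the stopping formula is not reproduced.

```python
def should_stop_early(key_acc_history, patience, delta):
    """Stop when the key-category accuracy has not improved by at least `delta`
    over the best earlier value during the last `patience` epochs."""
    if len(key_acc_history) <= patience:
        return False  # not enough epochs yet to judge
    best_before = max(key_acc_history[:-patience])
    best_recent = max(key_acc_history[-patience:])
    return best_recent - best_before < delta
```

For example, with the history [0.70, 0.75, 0.80, 0.801, 0.802, 0.799], patience 3 and delta 0.005, the best recent gain is only 0.002, so training would stop.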

By emphasizing the accuracy on key categories, the embodiments of the present application help ensure that the model achieves higher diagnostic accuracy on the most important and most dangerous disease types in medicine. In addition, combining the overall loss and accuracy thresholds with the early stopping strategy helps prevent overfitting and preserves the model's generalization to unseen data. Setting a specific accuracy threshold for the key categories also makes training focus more on those categories, which increases the model's value in practical medical applications.

Fig. 4 shows an operation flowchart according to an example of step S110 in fig. 1.

It should be noted that ECG data contain rich information about the electrophysiological activity of the heart and can reflect the characteristics of various cardiovascular and cerebrovascular diseases. Interpreting an electrocardiogram, however, requires recognizing complex waveform patterns, which is a considerable challenge for machine learning models. Moreover, some rare cardiac conditions have fairly specific electrocardiogram manifestations but are seldom encountered in the real world, so augmenting the training set with synthesized ECG data is particularly valuable.

As shown in fig. 4, in step S410, electrocardiogram sample information of a plurality of patients is acquired, the electrocardiogram sample information containing ECG data and corresponding patient disease labels.

In step S420, a conditional generative adversarial network (cGAN) is used to generate synthetic ECG data for at least one condition label, each condition label corresponding to a cardiovascular and cerebrovascular disease category.

It should be noted that condition labels are generally discrete and correspond to cardiovascular and cerebrovascular disease types such as "coronary heart disease", "myocardial infarction" and "ventricular tachycardia". In a conditional generative adversarial network, these condition labels are encoded as one-hot vectors (one-hot encoding) or embedding vectors (embedding) so that they can be combined with the cGAN input (the noise vector).
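For illustration, the one-hot option can be sketched as follows: the condition label becomes a one-hot vector that is concatenated with a standard-normal noise vector to form the generator input. The label list, the noise dimension and the function names are hypothetical.

```python
import random

# Hypothetical label vocabulary; in practice it would list every condition label used.
DISEASE_LABELS = ["coronary heart disease", "myocardial infarction", "ventricular tachycardia"]


def one_hot(label, labels=DISEASE_LABELS):
    """Encode a discrete condition label as a one-hot vector."""
    vec = [0.0] * len(labels)
    vec[labels.index(label)] = 1.0
    return vec


def generator_input(label, noise_dim, rng):
    """Concatenate a standard-normal noise vector with the one-hot condition label."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return noise + one_hot(label)
```

An embedding table would replace one_hot with a learned dense vector per label; concatenation with the noise works the same way.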

In step S430, the sample data set is constructed from the synthetic ECG data for each condition label together with the physiological characteristic parameters and their corresponding patient disease labels.

In the embodiments of the present application, conditional generative adversarial networks (cGANs) are used to generate more diverse electrocardiogram (ECG) data. This increases the diversity of the data set and, particularly for rare disease cases, helps the model generalize better and become more robust.

More specifically, building the cGAN involves its structural design and its training process. Structurally, the cGAN includes a generator network and a discriminator network. The generator takes a random noise vector and a condition label (e.g., a disease type) as input and outputs generated electrocardiogram data. The discriminator's task is to distinguish generated electrocardiogram data from real electrocardiogram data. The cGAN is trained adversarially under conditional constraints, meaning that the condition label is used during training to ensure that the generated data correspond to the specified cardiovascular or cerebrovascular disease. During adversarial training, the fake data produced by the generator and the real data are fed to the discriminator, which is trained to classify them as real or fake, while the generator is trained to fool the discriminator so that the generated data become as close to the real data as possible.
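The adversarial objective described above can be sketched through its two losses, assuming the common binary cross-entropy formulation with the non-saturating generator loss. The discriminator outputs d_real and d_fake are probabilities that a (signal, condition label) pair is real; the networks themselves are omitted and the function names are hypothetical.

```python
import math


def discriminator_loss(d_real, d_fake, eps=1e-12):
    """BCE for D: real (ECG, label) pairs should score 1, generated pairs 0."""
    real = -sum(math.log(max(p, eps)) for p in d_real) / len(d_real)
    fake = -sum(math.log(max(1.0 - p, eps)) for p in d_fake) / len(d_fake)
    return real + fake


def generator_loss(d_fake, eps=1e-12):
    """Non-saturating generator loss: G is rewarded when D scores its
    label-conditioned samples as real."""
    return -sum(math.log(max(p, eps)) for p in d_fake) / len(d_fake)
```

When the discriminator cannot tell real from generated pairs (all outputs 0.5), its loss is 2 ln 2 ≈ 1.386, the usual equilibrium value for this formulation.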

In the business application scenario of the embodiments of the present application, researchers found that cases of certain rare diseases (such as particular types of arrhythmia, e.g., ventricular tachycardia and rare variants of atrial fibrillation) are very limited in cardiovascular and cerebrovascular disease classification studies. To improve the model's performance on these rare diseases, the researchers used cGANs to generate a large amount of synthetic electrocardiogram data for such arrhythmias. These synthetic data closely resemble real electrocardiograms in their visual and statistical characteristics and contain the disease-specific signal patterns.

It should be noted that the blood biochemical indicators and blood pressure data in the data set are generally relatively stable, and rare variations are less frequent than in electrocardiograms. In addition, synthesizing blood biochemical or blood pressure data is less complex than synthesizing electrocardiograms, because these data are typically numerical rather than time series. The blood biochemical indicators and blood pressure data can therefore be augmented in other ways, for example, but not limited to, conventional augmentation such as random perturbation, data interpolation or data resampling, to generate derived data.
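The three conventional augmentations named above can be sketched on numeric feature vectors (for example, blood pressure readings or biochemical indicators). The sketch assumes multiplicative Gaussian noise for perturbation, linear interpolation between two samples with the same disease label, and bootstrap resampling; the names and these specific choices are illustrative.

```python
import random


def random_perturbation(sample, rel_scale, rng):
    """Multiply each indicator by (1 + Gaussian noise with std rel_scale)."""
    return [v * (1.0 + rng.gauss(0.0, rel_scale)) for v in sample]


def interpolate(sample_a, sample_b, lam):
    """Linear interpolation between two samples of the same disease label (0 <= lam <= 1)."""
    return [(1.0 - lam) * a + lam * b for a, b in zip(sample_a, sample_b)]


def resample(samples, n, rng):
    """Bootstrap resampling: draw n samples with replacement (copies, not references)."""
    return [list(rng.choice(samples)) for _ in range(n)]
```

Interpolating only within one disease label keeps the derived sample's label unambiguous, which matters when the derived data are added to the labelled sample data set.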

The device for constructing the cardiovascular and cerebrovascular disease classification model provided by the present application is described below; the device described below and the method for constructing the cardiovascular and cerebrovascular disease classification model described above may be cross-referenced with each other.

Fig. 5 shows a block diagram of an example of a construction apparatus of a classification model of cardiovascular and cerebrovascular diseases according to an embodiment of the present application.

As shown in fig. 5, a device 500 for constructing a classification model of cardiovascular and cerebrovascular diseases includes a data set constructing unit 510, a subset dividing unit 520, a model training unit 530, and a model testing unit 540.

The data set construction unit 510 is configured to collect physiological characteristic parameters of a plurality of patients and the corresponding patient disease labels to construct a sample data set; the physiological characteristic parameters include ECG data, blood pressure data, patient basic information and blood biochemical indicators; the patient basic information includes patient age and patient sex; the patient disease label is a cardiovascular and cerebrovascular disease category.

The subset dividing unit 520 is configured to divide the sample data set into a training sample subset, a verification sample subset, and a test sample subset according to a preset ratio.
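A minimal sketch of the split by a preset ratio, assuming a seeded random shuffle and a 70/15/15 default; the source does not give the ratio, so the default and the function name are hypothetical. A stratified split by disease label would be a natural refinement for rare categories.

```python
import random


def split_dataset(samples, ratios=(0.7, 0.15, 0.15), seed=0):
    """Shuffle and split into training / validation / test subsets by the given ratios."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # fixed seed makes the split reproducible
    n_train = int(len(shuffled) * ratios[0])
    n_val = int(len(shuffled) * ratios[1])
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])     # remainder goes to the test subset
```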

The model training unit 530 is configured to train the cardiovascular and cerebrovascular disease classification model based on the training sample subset, and evaluate the model performance index of the trained cardiovascular and cerebrovascular disease classification model on the verification sample subset, so as to adjust and optimize the hyper-parameters of the cardiovascular and cerebrovascular disease classification model; the model performance index at least comprises model prediction accuracy.

The model test unit 540 is configured to evaluate whether the cardiovascular and cerebrovascular disease classification model meets a preset model convergence condition based on the subset of test samples.

The cardiovascular and cerebrovascular disease classification model comprises a data input layer, a feature extraction module, a data fusion analysis layer and a classification prediction layer.

The data input layer preprocesses each input sample. The preprocessing includes normalization and standardization of the ECG data, blood pressure data and blood biochemical indicators in the input sample, followed by wavelet transformation and Fourier transformation of the normalized ECG data to obtain multi-scale information corresponding to the ECG data.
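The normalization and standardization step can be sketched as min-max scaling to [0, 1] and z-score standardization. Which of the two is applied to which data type, and how constant inputs are handled, are assumptions made here for illustration.

```python
import statistics


def min_max_normalize(values):
    """Scale values to [0, 1]; a constant signal maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score_standardize(values):
    """Zero mean, unit (population) standard deviation; constant input maps to zeros."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]
```

In practice the minimum, maximum, mean and standard deviation would be computed on the training subset only and reused for the validation and test subsets, to avoid leaking information across the split.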

The enhanced CNN layer in the feature extraction module processes the multi-scale information corresponding to the ECG data in the input sample in parallel to extract the corresponding scale feature representations; the VAE layer in the feature extraction module extracts key features from the blood biochemical indicators in the input sample and generates the target latent variables; and the temporal data processing layer in the feature extraction module processes the ECG data of the input sample with a multi-head self-attention mechanism to extract key feature representations in the time series.

The data fusion analysis layer combines the outputs of the enhanced CNN layer, the VAE layer and the temporal data processing layer to determine the corresponding composite feature representation.

The classification prediction layer updates the composite feature representation according to the patient basic information in the input sample and determines the label prediction result for the input sample from the updated composite feature representation; the classification prediction layer adopts a conditional fully connected layer.

In some embodiments, embodiments of the present application provide a non-transitory computer readable storage medium, in which one or more programs including execution instructions are stored, where the execution instructions can be read and executed by an electronic device (including, but not limited to, a computer, a server, or a network device, etc.) to perform the method for constructing a cardiovascular and cerebrovascular disease classification model according to the present application.

In some embodiments, embodiments of the present application also provide a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the above-described method of constructing a classification model of cardiovascular and cerebrovascular diseases.

In some embodiments, the present application further provides an electronic device, including: the system comprises at least one processor and a memory communicatively connected with the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute a method for constructing a cardiovascular and cerebrovascular disease classification model.

Fig. 6 is a schematic hardware structure of an electronic device for executing a method for constructing a classification model of cardiovascular and cerebrovascular diseases according to another embodiment of the present application, as shown in fig. 6, the device includes:

one or more processors 610 and a memory 620, with one processor 610 shown as an example in fig. 6.

The apparatus for performing the method of constructing the classification model of cardiovascular and cerebrovascular diseases may further include: an input device 630 and an output device 640.

The processor 610, the memory 620, the input device 630 and the output device 640 may be connected by a bus or in another manner; fig. 6 takes a bus connection as an example.

The memory 620 is used as a non-volatile computer readable storage medium, and can be used to store non-volatile software programs, non-volatile computer executable programs, and modules, such as program instructions/modules corresponding to the method for constructing a classification model of cardiovascular and cerebrovascular diseases in the embodiment of the present application. The processor 610 executes various functional applications of the server and data processing by running nonvolatile software programs, instructions and modules stored in the memory 620, that is, implements the method for constructing the cardiovascular and cerebrovascular disease classification model according to the above method embodiment.

Memory 620 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created according to the use of the electronic device, etc. In addition, memory 620 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, memory 620 optionally includes memory remotely located relative to processor 610, which may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input device 630 may receive input digital or character information and generate signals related to user settings and function control of the electronic device. The output device 640 may include a display device such as a display screen.

The one or more modules are stored in the memory 620, which when executed by the one or more processors 610, perform the method of constructing a classification model of cardiovascular and cerebrovascular diseases in any of the method embodiments described above.

The product can execute the method for constructing the cardiovascular and cerebrovascular disease classification model provided by the embodiment of the application, and has the corresponding functional modules and beneficial effects of the execution method. Technical details not described in detail in this embodiment may be found in the methods provided in the embodiments of the present application.

The electronic device of the embodiments of the present application exists in a variety of forms including, but not limited to:

(1) Mobile communication devices, which are characterized by mobile communication functions and mainly provide voice and data communication. Such terminals include smartphones, multimedia phones, feature phones, low-end phones, and the like.

(2) Ultra-mobile personal computer devices, which belong to the category of personal computers, have computing and processing functions, and generally also support mobile Internet access. Such terminals include PDA, MID, and UMPC devices, etc.

(3) Portable entertainment devices, which can display and play multimedia content. Such devices include audio players, video players, handheld game consoles, e-book readers, smart toys and portable in-vehicle navigation devices.

(4) Other on-board electronic devices with data interaction functions, such as on-board devices mounted on vehicles.

The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

From the above description of embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a general purpose hardware platform, or may be implemented by hardware. Based on such understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the related art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform the method described in the respective embodiments or some parts of the embodiments.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims

1. A method for constructing a cardiovascular and cerebrovascular disease classification model comprises the following steps:

Collecting physiological characteristic parameters of a plurality of patients and corresponding patient disease labels to construct a sample data set; the physiological characteristic parameters comprise ECG data, blood pressure data, patient basic information and blood biochemical indicators; the patient basic information includes a patient age and a patient sex; the patient disease label is a cardiovascular and cerebrovascular disease type;

Dividing the sample data set into a training sample subset, a verification sample subset and a test sample subset according to a preset proportion;

Training the cardiovascular and cerebrovascular disease classification model based on the training sample subset, and evaluating model performance indexes of the trained cardiovascular and cerebrovascular disease classification model on the verification sample subset to adjust and optimize super parameters of the cardiovascular and cerebrovascular disease classification model; the model performance index at least comprises model prediction accuracy;

based on the test sample subset, evaluating whether the cardiovascular and cerebrovascular disease classification model meets a preset model convergence condition;

the cardiovascular and cerebrovascular disease classification model comprises a data input layer, a feature extraction module, a data fusion analysis layer and a classification prediction layer;

based on the data input layer, preprocessing an input sample; the preprocessing operation comprises normalization and standardization of ECG data, blood pressure data and blood biochemical indexes in input samples, and further wavelet transformation and Fourier transformation are carried out on the normalized ECG data so as to obtain multi-scale information corresponding to the ECG data;

based on an enhanced CNN layer in the feature extraction module, processing the multi-scale information corresponding to the ECG data in the input sample in parallel to extract corresponding scale feature representations; based on a VAE layer in the feature extraction module, extracting key features from the blood biochemical indicators in the input sample and generating target latent variables; and based on a temporal data processing layer in the feature extraction module, processing the ECG data of the input sample using a multi-head self-attention mechanism to extract key feature representations in the time series;

based on the data fusion analysis layer, combining the outputs of the enhanced CNN layer, the VAE layer and the time-series data processing layer to determine the corresponding composite feature representations;

2. The method of claim 1, wherein processing, based on the enhanced CNN layer in the feature extraction module, the multi-scale information corresponding to the ECG data in the input sample in parallel to extract the respective scale feature representations comprises:

[formula not reproduced]

wherein the symbols in the formula denote, in order: the raw ECG data; the electrocardiogram after wavelet transformation; the electrocardiogram after Fourier transformation; the length of the time series; the feature dimensions of the electrocardiogram and of the blood pressure, respectively; the three corresponding scale feature representations; and the weight and bias of the corresponding convolution layer;
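The following sketch assumes a common multi-scale design consistent with the definitions above: one convolution branch per input view (time-domain, wavelet, Fourier), each computing ReLU(W * X + b). The ReLU activation, valid padding, single-channel kernels and the function names are assumptions, not details taken from the source.

```python
def conv1d_relu(x, kernel, bias):
    """Valid 1-D convolution (cross-correlation, as in CNN layers) + ReLU."""
    k = len(kernel)
    if k > len(x):
        raise ValueError("kernel longer than input")
    out = []
    for i in range(len(x) - k + 1):
        z = sum(w * x[i + j] for j, w in enumerate(kernel)) + bias
        out.append(max(0.0, z))  # ReLU
    return out

def multiscale_features(branches, params):
    """Apply one conv branch per input view.

    branches: {view_name: signal}; params: {view_name: (kernel, bias)}.
    The branches share no state, so they could run concurrently; this
    sketch evaluates them one after another.
    """
    return {name: conv1d_relu(signal, *params[name])
            for name, signal in branches.items()}
```

Keeping a separate kernel and bias per view lets each branch specialize in its own scale before the outputs are fused.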

The encoder structure of the VAE layer is: [formula not reproduced]

The decoder structure of the VAE layer is: [formula not reproduced]
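Assuming the VAE layer follows the standard formulation (an encoder producing a mean and log-variance, a reparameterized latent sample, and a decoder trained with a reconstruction term plus a KL term), its sampling and loss can be sketched as below. The squared-error reconstruction term and the function names are assumptions; the encoder and decoder networks themselves are not shown.

```python
import math
import random

def reparameterize(mu, logvar, eps=None, rng=None):
    """Sample z = mu + sigma * eps with sigma = exp(logvar / 2), eps ~ N(0, 1)."""
    if eps is None:
        rng = rng or random.Random(0)
        eps = [rng.gauss(0.0, 1.0) for _ in mu]
    return [m + math.exp(0.5 * lv) * e for m, lv, e in zip(mu, logvar, eps)]

def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, sigma^2) || N(0, I)) for a diagonal Gaussian."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))

def vae_loss(x, x_recon, mu, logvar):
    """Squared-error reconstruction term plus the KL regularizer."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + kl_to_standard_normal(mu, logvar)
```

The reparameterization keeps sampling differentiable with respect to the mean and log-variance, which is what allows the latent variables for the biochemical indexes to be learned by gradient descent.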

[formula not reproduced]

wherein the symbols in the formula denote, in order: the corresponding self-attention weight matrices; the dimension of the keys; and the key feature representation in the time series.
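The definitions above match standard scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. A pure-Python sketch of multi-head self-attention over an ECG sequence follows; splitting the projected features evenly across heads, omitting the final output projection, and all function names are simplifying assumptions.

```python
import math

def matmul(a, b):
    """Matrix product of nested lists: (n x m) @ (m x p) -> (n x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def softmax(row):
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V, one softmax per query row."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)

def multi_head_self_attention(x, w_q, w_k, w_v, num_heads):
    """x: T x d_model sequence; w_*: d_model x d_model projection matrices."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_model = len(q[0])
    if d_model % num_heads:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    heads = []
    for h in range(num_heads):
        cols = slice(h * d_head, (h + 1) * d_head)  # this head's feature slice
        heads.append(scaled_dot_product_attention(
            [r[cols] for r in q], [r[cols] for r in k], [r[cols] for r in v]))
    # Concatenate the heads along the feature axis at every time step.
    return [sum((head[t] for head in heads), []) for t in range(len(x))]
```

Each output row is a weighted mixture of all time steps, which is how the layer can relate, for example, a QRS complex to distant beats in the same recording.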

3. The method of claim 2, wherein the data fusion analysis layer has a structure of:

[formula not reproduced]
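The fusion formula is not available in the text, so this sketch assumes the simplest common choice: concatenating the CNN features, the VAE latent variables and the attention features, then applying one fully connected layer with ReLU. All names and the layer shape are hypothetical.

```python
def concat(*feature_lists):
    """Concatenate feature vectors from several branches into one vector."""
    out = []
    for f in feature_lists:
        out.extend(f)
    return out

def relu(v):
    return max(0.0, v)

def dense(x, weights, bias, activation=None):
    """Fully connected layer: y_j = act(sum_i W[j][i] * x[i] + b[j])."""
    if any(len(row) != len(x) for row in weights):
        raise ValueError("weight rows must match input length")
    y = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [activation(v) for v in y] if activation else y

def fuse(cnn_feat, vae_latent, attn_feat, weights, bias):
    """Composite representation = ReLU(W [cnn; vae; attn] + b)."""
    return dense(concat(cnn_feat, vae_latent, attn_feat), weights, bias, relu)
```

Concatenation preserves every branch's output intact and leaves the weighting between ECG, biochemical and temporal features to the learned matrix W.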

4. The method of claim 3, wherein the structure of the classification prediction layer is:

[formula not reproduced]
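A sketch of a typical classification head, assuming a linear layer followed by softmax over the five categories listed in claim 5. The function names and the class ordering are hypothetical, and any weights passed in are illustrative rather than trained values.

```python
import math

CLASSES = ["arrhythmia", "myocardial infarction", "hypertension",
           "coronary artery disease", "ventricular tachycardia"]

def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(h, weights, bias, classes=CLASSES):
    """Linear layer + softmax on the composite feature h.

    Returns (predicted class name, list of class probabilities).
    """
    logits = [sum(w * v for w, v in zip(row, h)) + b
              for row, b in zip(weights, bias)]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return classes[best], probs
```

Returning the full probability vector, not only the top class, is what a probability-based loss such as focal loss needs during training.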

5. The method of claim 4, wherein the cardiovascular and cerebrovascular disease category is any one of: arrhythmia, myocardial infarction, hypertension, coronary artery disease and ventricular tachycardia.

6. The method of claim 5, wherein the cardiovascular and cerebrovascular disease classification model uses a weighted multi-class focal loss as its loss function, the loss function being:

[formula not reproduced]
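Assuming the loss follows the standard focal-loss form with per-class weights, FL = -(1/N) Σ α_y (1 - p_y)^γ log p_y, it can be sketched as below. The default γ = 2, the clamping constant and the function name are assumptions, not values from the source.

```python
import math

def weighted_focal_loss(probs, targets, alpha, gamma=2.0, eps=1e-12):
    """Mean of -alpha[y] * (1 - p_y)^gamma * log(p_y) over the batch.

    probs: per-sample class-probability lists (e.g. softmax outputs);
    targets: integer class indices; alpha: per-class weights.
    """
    if len(probs) != len(targets) or not targets:
        raise ValueError("need equal, non-zero numbers of predictions and targets")
    total = 0.0
    for p, y in zip(probs, targets):
        p_t = min(max(p[y], eps), 1.0)  # clamp so log() stays finite
        total += -alpha[y] * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(targets)
```

With γ = 0 and all α equal to 1 this reduces to ordinary cross-entropy; larger γ shrinks the contribution of well-classified samples, and larger α for rare categories counteracts class imbalance.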

7. The method of claim 6, wherein the model convergence condition is determined jointly by a loss threshold, an overall accuracy threshold and a key-category accuracy threshold;

wherein evaluating, based on the test sample subset, whether the cardiovascular and cerebrovascular disease classification model meets the preset model convergence condition comprises:

When the test loss of the cardiovascular and cerebrovascular disease classification model is lower than the loss threshold and its overall accuracy exceeds the overall accuracy threshold, determining that the model has preliminarily converged;

determining whether the recognition accuracy for a preset key cardiovascular and cerebrovascular disease category exceeds the key-category accuracy threshold;

When it does, determining that the cardiovascular and cerebrovascular disease classification model meets the model convergence condition.
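The two-stage test in claim 7 can be sketched as a small decision function. Requiring every preset key category to exceed the key-category threshold is an assumption, as are the parameter names and any threshold values used with it.

```python
def check_convergence(test_loss, overall_acc, key_class_acc,
                      loss_threshold, acc_threshold, key_threshold):
    """Two-stage convergence test on the test subset.

    key_class_acc maps each preset key disease category to its recognition
    accuracy. Returns "not converged", "preliminary" or "converged".
    """
    # Stage 1: loss below its threshold and overall accuracy above its threshold.
    if not (test_loss < loss_threshold and overall_acc > acc_threshold):
        return "not converged"
    # Stage 2: every key category must also clear its own accuracy threshold.
    if all(acc > key_threshold for acc in key_class_acc.values()):
        return "converged"
    return "preliminary"
```

Separating the stages means a model with high overall accuracy can still be rejected if it underperforms on a clinically important category that is rare in the test subset.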

8. The method of claim 1, wherein collecting the physiological characteristic parameters of the plurality of patients and the corresponding patient disease labels to construct the sample data set comprises:

Acquiring electrocardiogram sampling information of a plurality of patients, wherein the electrocardiogram sampling information comprises ECG data and corresponding patient disease labels;

Generating, using a conditional generative adversarial network, synthetic ECG data for at least one condition label, each condition label corresponding to a respective cardiovascular and cerebrovascular disease category; and

Constructing the sample data set based on the synthetic ECG data carrying the respective condition labels and the physiological characteristic parameters carrying the corresponding patient disease labels.
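A sketch of the data-assembly step, assuming the usual cGAN conditioning in which the generator receives a noise vector concatenated with a one-hot encoding of the condition label. The `generator` argument is a hypothetical placeholder for a trained generator network (cGAN training is not shown), and representing synthetic samples as ECG-only records is likewise an assumption.

```python
import random

def one_hot(label, classes):
    """Encode a condition label as a one-hot vector over the disease categories."""
    vec = [0.0] * len(classes)
    vec[classes.index(label)] = 1.0
    return vec

def generator_input(noise_dim, label, classes, rng):
    """cGAN generator input: noise vector z concatenated with the label encoding."""
    z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return z + one_hot(label, classes)

def build_sample_dataset(real_samples, generator, labels_to_synthesize, classes,
                         noise_dim=16, seed=0):
    """Merge real labelled samples with generator output for each condition label.

    generator: hypothetical callable mapping a conditioned input vector to an
    ECG trace. labels_to_synthesize: {label: number of synthetic samples}.
    Synthetic samples carry only ECG data and their condition label.
    """
    rng = random.Random(seed)
    dataset = [dict(s, synthetic=False) for s in real_samples]
    for label, count in labels_to_synthesize.items():
        for _ in range(count):
            ecg = generator(generator_input(noise_dim, label, classes, rng))
            dataset.append({"ecg": ecg, "label": label, "synthetic": True})
    return dataset
```

Flagging synthetic records makes it straightforward to keep them out of the validation and test subsets, so that performance is measured on real patients only.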

9. A device for constructing a classification model of cardiovascular and cerebrovascular diseases, comprising:

The data set construction unit is used for collecting physiological characteristic parameters of a plurality of patients and the corresponding patient disease labels to construct a sample data set, wherein the physiological characteristic parameters comprise ECG data, blood pressure data, basic patient information and blood biochemical indexes; the basic patient information includes the patient's age and sex; and each patient disease label is a cardiovascular and cerebrovascular disease type;

The subset dividing unit is used for dividing the sample data set into a training sample subset, a validation sample subset and a test sample subset according to a preset ratio;

The model training unit is used for training the cardiovascular and cerebrovascular disease classification model on the training sample subset and evaluating performance indexes of the trained model on the validation sample subset to tune the hyperparameters of the cardiovascular and cerebrovascular disease classification model, wherein the model performance indexes include at least the model's prediction accuracy;

The model test unit is used for evaluating, based on the test sample subset, whether the cardiovascular and cerebrovascular disease classification model meets a preset model convergence condition;