CN109907751B

CN109907751B - Laboratory chest pain data inspection auxiliary identification method based on artificial intelligence supervised learning

Info

Publication number: CN109907751B
Application number: CN201910147228.6A
Authority: CN
Inventors: 严洋; 严金川
Original assignee: Individual
Current assignee: Individual
Priority date: 2019-02-27
Filing date: 2019-02-27
Publication date: 2021-02-02
Anticipated expiration: 2039-02-27
Also published as: CN109907751A

Abstract

The invention discloses a laboratory chest pain data examination auxiliary identification method based on artificial intelligence supervised learning, belonging to the technical field of artificial intelligence in medicine. The method comprises an AI system that acquires clinician information, patient information and database information. The AI system retrieves the diagnostic criteria for the disease from the database and compares each criterion with the actual findings collected after the patient is admitted. If the comparison results match, the warning system is not triggered; if they are inconsistent, the AI system issues an alarm to remind the clinician to recheck the patient's diagnosis. The invention combines artificial intelligence with chest pain examination, realizes intelligent medical examination, and increases the sensitivity and accuracy of judgment.

Description

Laboratory chest pain data inspection auxiliary identification method based on artificial intelligence supervised learning

Technical Field

The invention belongs to the technical field of medical treatment integrating artificial intelligence and machine learning, and particularly relates to a laboratory chest pain data examination auxiliary identification method based on artificial intelligence supervised learning.

Background

In recent years, the field of artificial intelligence has developed rapidly and its applications have become increasingly widespread. Artificial intelligence is inseparable from data analysis and machine learning, and the theory and methods of intelligent data analysis have become one of its necessary foundations. Artificial intelligence attempts to understand the essence of intelligence and to produce intelligent machines that can react in a manner similar to human intelligence; research in the field includes robotics, language recognition, image recognition, natural language processing and expert systems.

Chest pain is a common and potentially life-threatening symptom. Its causes are complex and diverse and include acute coronary syndrome (ACS), aortic dissection, pulmonary embolism (PE), pericardial tamponade and others. Among these, ACS accounts for the highest proportion of serious life-threatening conditions. The misdiagnosis rate of acute myocardial infarction (AMI) is 3%-5%. The incidence of aortic dissecting aneurysm is about 0.5-1 per 100,000 people, and if it is misdiagnosed the mortality rate exceeds 90%. The incidence of PE is about 70 per 100,000, that of spontaneous pneumothorax is 2.5-18 per 100,000, and that of esophageal rupture is 12.5 per 100,000. In 2009, the Beijing acute chest pain registry study enrolled 5666 patients; the results showed that chest pain patients accounted for 4% of emergency patients, with ACS accounting for 27.4%. Diagnosing and distinguishing ACS and the other fatal causes of chest pain quickly and accurately is therefore both the main difficulty and the key task of emergency treatment.

Therefore, it is important to combine artificial intelligence with chest pain examination, in particular with the electrocardiographic data and routine blood test data obtained in the laboratory, in order to realize intelligent medical identification and to increase the sensitivity and accuracy of judgment.

Disclosure of Invention

To address the shortcomings of the prior art, the invention discloses a laboratory chest pain data examination auxiliary identification method based on artificial intelligence supervised learning. The method combines artificial intelligence with chest pain examination and, in particular, fuses and identifies electrocardiographic data, troponin, myoglobin, creatine kinase isoenzyme content, D-dimer and arterial blood oxygen partial pressure data, thereby realizing intelligent medical identification and increasing the sensitivity and accuracy of auxiliary judgment of chest pain diseases.

The invention is realized by the following technical scheme:

The laboratory chest pain data examination auxiliary identification method based on artificial intelligence supervised learning comprises an AI system. The AI system acquires clinician diagnosis information, patient laboratory examination information and database information, and performs fusion analysis on the electrocardiographic data, troponin, myoglobin, creatine kinase isoenzyme content, D-dimer and arterial blood oxygen partial pressure data from the patient's laboratory examination. The AI system retrieves from the database the diagnostic criteria that apply to the patient's laboratory examination information and compares the result with the clinician's diagnosis made after the patient is admitted. If the two match, the warning system is not triggered; if they are inconsistent, the AI system issues an alarm to remind the clinician to examine the patient again.
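The comparison step described above can be sketched in a few lines. This is only an illustrative sketch: the function name `review_diagnosis` and the use of plain text labels for diagnoses are assumptions, not part of the disclosed method.

```python
def review_diagnosis(criteria_diagnosis, clinician_diagnosis):
    """Return True when an alarm should be raised, i.e. when the diagnosis
    implied by the database criteria differs from the clinician's diagnosis.
    Diagnoses are hypothetical free-text labels compared case-insensitively."""
    def normalize(label):
        return label.strip().lower()
    return normalize(criteria_diagnosis) != normalize(clinician_diagnosis)


# Matching results: no alarm. Inconsistent results: alarm.
print(review_diagnosis("ACS", "acs"))
print(review_diagnosis("pulmonary embolism", "ACS"))
```

A real system would compare structured diagnosis codes rather than strings; the sketch only shows that the alarm is driven by disagreement, not by the diagnosis itself.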

Further, the fusion analysis processing process comprises:

The AI system uses an improved Tri-Training algorithm to perform semi-supervised learning on the troponin, myoglobin, creatine kinase isoenzyme content, D-dimer and arterial blood oxygen partial pressure data:

The algorithm input is: sample data for troponin, myoglobin, creatine kinase isoenzyme content, D-dimer and arterial blood oxygen partial pressure obtained through data sampling, divided into an unlabeled sample set U, a Must-link constraint set M and a Cannot-link constraint set C, with the maximum number of labeled samples being $N_{max}$. The algorithm output is: the two D-dimer intervals, the myocardial injury marker intervals and the three blood gas analysis intervals, expressed as a labeled sample set R' and an updated unlabeled sample set U';

A1: initialize the number of labeled samples N = 0 and set

$U' = U$;

A2: then, for each sample pair $(x_i, x_j) \in M$, the corresponding inter-sample distance is calculated by the following formula:

where d is the number of sample attributes. The sample pair $(x_{i'}, x_{j'})$ with the largest distance value is determined from M; $x_{i'}$ is submitted to the user for labeling and $y_{i'}$ is the class obtained. $(x_{i'}, y_{i'})$ and $(x_{j'}, y_{i'})$ are added to R', $x_{i'}$ and $x_{j'}$ are deleted from U', and $N = N + 2$;

A3: for each sample pair $(x_p, x_q) \in C$, the corresponding inter-sample distance is calculated by the following formula:

where the sample pair $(x_{p'}, x_{q'})$ with the smallest distance value is determined from C; $x_{p'}$ is submitted to the user for labeling and $y_{p'}$ is the class obtained. $(x_{p'}, y_{p'})$ and $(x_{q'}, -y_{p'})$ are added to R', where $-y_{p'}$ is the class opposite to $y_{p'}$; $x_{p'}$ and $x_{q'}$ are deleted from U', and $N = N + 2$;

A4: if $N < N_{max}$, go to step A2; otherwise the algorithm ends;
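Steps A1 to A4 can be sketched as follows. The distance formula is not reproduced in this text, so the sketch assumes a plain Euclidean distance over the d attributes; the names `seed_labels` and `oracle` (standing in for the user who labels a sample) are hypothetical, and stopping early when no usable constraint pair remains is an added assumption.

```python
import math


def euclidean(a, b):
    # Assumed distance: Euclidean over the d sample attributes.
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def seed_labels(U, M, C, oracle, n_max):
    """U: dict sample_id -> feature tuple; M, C: lists of (id, id) pairs;
    oracle: callable returning +1 or -1 for a sample id (the user's label)."""
    R = {}                      # labeled set R'
    U_prime = dict(U)           # A1: U' = U, N = 0
    n = 0
    while n < n_max:            # A4: loop while N < N_max
        progressed = False
        # A2: farthest Must-link pair still in U' shares one label.
        pairs = [(i, j) for i, j in M if i in U_prime and j in U_prime]
        if pairs:
            i, j = max(pairs, key=lambda p: euclidean(U_prime[p[0]], U_prime[p[1]]))
            y = oracle(i)
            R[i], R[j] = y, y
            del U_prime[i], U_prime[j]
            n += 2
            progressed = True
        # A3: closest Cannot-link pair still in U' gets opposite labels.
        pairs = [(p, q) for p, q in C if p in U_prime and q in U_prime]
        if pairs:
            p, q = min(pairs, key=lambda pr: euclidean(U_prime[pr[0]], U_prime[pr[1]]))
            y = oracle(p)
            R[p], R[q] = y, -y
            del U_prime[p], U_prime[q]
            n += 2
            progressed = True
        if not progressed:      # assumption: stop when no pair is usable
            break
    return R, U_prime


U = {"a": (0, 0), "b": (1, 0), "c": (5, 5), "d": (0, 1), "e": (9, 9)}
R, rest = seed_labels(U, M=[("a", "b"), ("a", "c")], C=[("d", "e")],
                      oracle=lambda sample_id: 1, n_max=4)
print(R, sorted(rest))
```

Each user query labels two samples at once, which is the point of selecting pairs from the constraint sets.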

A clustering algorithm is introduced to reduce the unnecessary sample misjudgments that arise when processing the myocardial injury marker, D-dimer and blood gas analysis data, and to generate a more appropriate data partition;

Input: labeled samples

unlabeled samples

where u is the number of unlabeled samples; let the number of labeled samples be labelN, the total number of samples be allN, the number of positive samples be posN, and the number of negative samples be negN;

First, for each labeled data point, the Euclidean distances Kdist to the other points are calculated and sorted from smallest to largest:

Second, the difference between two successive distances is calculated for the labeled data:

rate＝abs(Kdist(i，j+1)-Kdist(i，j))；

Third, the neighborhood parameter Eps and the input parameter MinPts of the clustering algorithm are obtained from the distance differences. For $x_j$ belonging to the data set $D = \{x_1, x_2, \ldots, x_m\}$, the Eps-neighborhood of $x_j$ contains the samples in D whose distance from $x_j$ is not greater than Eps, and MinPts is the minimum number of objects in the Eps-neighborhood of a sample point. For a given MinPts = j, preliminary convergence is considered reached when the change between two successive values is less than 0.01;

Finally, the labels are obtained: noise points are taken as negative samples and the remaining points as positive samples. The output is $y \in \{1, -1\}^{u}$, i.e. the label assignment of the unlabeled data. Denote

as the prediction results of the clustering algorithm on the unlabeled data under different class proportions, where T is the number of class proportions and $y^{middle}$ is the prediction that combines the positive and negative samples in the unlabeled data. Integrating the multiple prediction results under the worst case, the output $y^{*}$ of the clustering algorithm can be expressed as:

Further, the fusion analysis also comprises: the AI system processes the electrocardiographic signals in the patient information using an improved support vector machine algorithm, specifically as follows:

S1: preprocess the acquired electrocardiographic data, filter out noise, extract the time-domain features of the electrocardiographic data, and generate an electrocardiographic training sample set; the features extracted from the acquired electrocardiographic data comprise the normal P wave, QRS complex, T wave, PR interval, RR interval and ST segment, with convex-upward ("arched") ST-segment elevation (or depression) and abnormally tall T waves with two asymmetric limbs used as additional extracted features;

S2: classify the electrocardiographic training sample set: set the parameters $Z$ and $Z^{*}$, train on the labeled samples $(x_1, y_1), \ldots, (x_n, y_n)$ with the support vector machine algorithm to build an initial classifier, and then set the number $n_{abn}$ of unlabeled examples $x_1^{*}, \ldots, x_k^{*}$ that are to receive a positive label, where $Z$ and $Z^{*}$ are parameters specified for training;

S3: classify the unlabeled examples $x_1^{*}, \ldots, x_k^{*}$ with the classifier obtained in S2 and assign a label to each unlabeled sample according to the output value of $w \cdot x_j^{*} + b$, where w is the weight vector and b is a constant parameter; the $n_{abn}$ unlabeled samples with the largest output values are designated $y_j^{*} = 1$ and the remaining samples are designated $y_j^{*} = -1$; then set the parameters $Z^{*}_{n}$ and $Z^{*}_{abn}$ and retrain on the samples to obtain a second classifier; with $Z^{*}$ set, find a pair of test examples with different label values and exchange their label values so that the value of the optimization objective function is reduced as much as possible, and repeat this step until the condition is no longer met;

S4: gradually increase the adjustment parameters $Z^{*}_{n}$ and $Z^{*}_{abn}$ and return to S3; when $Z^{*}_{n} > Z^{*}$ and $Z^{*}_{abn} > Z^{*}$ the algorithm ends, so as to identify convex-upward ST-segment elevation (or depression) in 2-3 adjacent leads and abnormally tall T waves with two asymmetric limbs.
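The label assignment in S3 can be illustrated with a minimal sketch. It assumes the usual transductive SVM convention that the $n_{abn}$ unlabeled samples with the largest decision values receive the label +1 and all others receive -1; the function name `assign_labels` and the sample data are hypothetical, and the retraining and label-swapping loop is not shown.

```python
def assign_labels(w, b, unlabeled, n_abn):
    """Label the n_abn unlabeled samples with the largest w.x + b as +1
    and the rest as -1 (assumed convention; see lead-in)."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in unlabeled]
    # Indices ordered from the largest decision value to the smallest.
    order = sorted(range(len(unlabeled)), key=lambda j: scores[j], reverse=True)
    labels = [-1] * len(unlabeled)
    for j in order[:n_abn]:
        labels[j] = 1
    return labels


# Hypothetical two-feature samples scored by a hypothetical linear classifier.
labels = assign_labels(w=[1.0, -1.0], b=0.0,
                       unlabeled=[(2, 0), (0, 2), (1, 0), (0, 0)], n_abn=2)
print(labels)
```

Fixing the number of positive labels in advance keeps the classifier from assigning every unlabeled sample to one class.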

Further, the AI system comprises the following steps in the process of extracting the characteristics of the collected electrocardio data:

S1.1: construct a sparse binary random matrix and use it as the observation matrix Q; observe the preprocessed electrocardiographic signal X according to the compressive sensing model Y = QX to obtain the compressed values Y of the electrocardiographic signal;

S1.2: skip the electrocardiographic signal reconstruction step and extract features directly from the compressed electrocardiographic data using an improved principal component analysis method to obtain the feature vector of the electrocardiographic signal;

S1.3: use the feature vectors of the normal P wave, QRS complex, T wave, PR interval, RR interval and ST segment extracted in S1.2 as the input of the classifier.
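The observation step S1.1 can be sketched as below. The construction of the sparse binary matrix is not specified in the text, so this sketch assumes one common choice (a fixed number of ones placed at random rows in each column); the names `sparse_binary_matrix` and `compress` and all sizes are hypothetical.

```python
import random


def sparse_binary_matrix(m, n, ones_per_column, seed=0):
    """Build an m x n observation matrix Q whose columns each contain
    exactly `ones_per_column` ones at random rows (assumed construction)."""
    rng = random.Random(seed)
    Q = [[0] * n for _ in range(m)]
    for col in range(n):
        for row in rng.sample(range(m), ones_per_column):
            Q[row][col] = 1
    return Q


def compress(Q, x):
    """Compute the compressed measurement Y = QX for one signal segment x."""
    return [sum(q * v for q, v in zip(row, x)) for row in Q]


Q = sparse_binary_matrix(m=4, n=8, ones_per_column=2, seed=1)
y = compress(Q, [1.0] * 8)   # 8 samples reduced to 4 measurements
print(y)
```

Because Q contains only zeros and ones, each measurement is just a sum of a few signal samples, which is why this kind of matrix suits low-power acquisition.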

Further, in S1.2 a convolutional neural network (CNN) may also be used to extract features from the electrocardiographic data. The CNN comprises three stages: the first consists of convolutional layer Conv1, pooling layer Pooling1 and normalization layer BN1, where Conv1 has a 6 × 6 convolution kernel, a stride of 3 and 166 convolution kernels in total; the second consists of convolutional layer Conv2 and normalization layer BN2, where Conv2 has a 5 × 5 convolution kernel, a stride of 3 and 128 convolution kernels in total; the third consists of pooling layer Pooling3 and normalization layer BN3, where Conv3 has a 3 × 3 convolution kernel, a stride of 1 and 128 convolution kernels in total. These are followed in sequence by an Inception structure, convolutional layer Conv4 and a global pooling layer. The output is classified by a Softmax classifier to improve the precision and efficiency of electrocardiographic data classification.
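To see how the kernel sizes and strides above shrink the feature map, the standard output-size formula for an unpadded convolution, floor((n - k) / s) + 1, can be applied layer by layer. The input size of 128 is purely an assumption (the text does not state one), padding is assumed to be zero, and the pooling layers are ignored because their sizes are not given.

```python
def conv_output_size(n, kernel, stride, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1


# (kernel, stride) pairs taken from the description of Conv1, Conv2 and Conv3.
layers = [("Conv1", 6, 3), ("Conv2", 5, 3), ("Conv3", 3, 1)]

size = 128                      # hypothetical input width and height
sizes = {}
for name, kernel, stride in layers:
    size = conv_output_size(size, kernel, stride)
    sizes[name] = size
print(sizes)
```

The two stride-3 layers account for most of the reduction, so the later layers operate on a much smaller map than the input.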

Further, the preprocessing of the collected electrocardiographic data also comprises the following steps:

S41: extract the attribute features from the data by writing a recursive function in SQL statements to extract keywords; record 1 if the disease represented by a keyword appears and 0 otherwise;

S42: merge and integrate the data distributed over different databases or data tables;

S43: discretize the feature attributes obtained in S42;

S44: clean the data obtained in S43 by deleting repeated, abnormal and redundant data.
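Steps S41, S42 and S44 can be illustrated with a small sketch. The text performs keyword extraction with a recursive function written in SQL; the version below swaps that for plain Python string matching purely for illustration, and the function names, field names and sample records are all hypothetical.

```python
def keyword_flags(text, keywords):
    """S41: 1 if the disease keyword appears in the free text, else 0."""
    lowered = text.lower()
    return {k: int(k.lower() in lowered) for k in keywords}


def merge_records(*tables):
    """S42: merge per-patient fields from several tables keyed by patient id."""
    merged = {}
    for table in tables:
        for patient_id, fields in table.items():
            merged.setdefault(patient_id, {}).update(fields)
    return merged


def drop_duplicates(rows):
    """S44 (partial): remove exact duplicate rows, keeping the first copy."""
    seen, kept = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


flags = keyword_flags("Diabetes, hypertension", ["diabetes", "hepatitis"])
merged = merge_records({"p1": {"age": 60}}, {"p1": {"tni": 0.1}})
unique = drop_duplicates([{"a": 1}, {"a": 1}, {"a": 2}])
print(flags, merged, unique)
```

Simple substring matching would also flag negated mentions such as "no diabetes", so a production version would need more careful text handling.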

Further, during fusion analysis the data must be processed in the following order:

Step a: divide the D-dimer into two intervals, 0 < D-dimer < 500 μg/L and D-dimer > 500 μg/L; proceed to step b when 0 < D-dimer < 500 μg/L;

Step b: divide the myocardial injury markers into intervals, with reference ranges of TNI (troponin I) 0-0.05 ng/mL, myoglobin 0-107 ng/mL and CKMB (creatine kinase isoenzyme) 0-4.3 ng/mL; proceed to step c when TNI > 0.05 ng/mL, whether alone or together with one or both of the other two markers exceeding their ranges;

Step c: divide the blood gas analysis result into three intervals: 1) 83-108 mmHg; 2) less than 83 mmHg; 3) greater than 108 mmHg; proceed to step d when the arterial blood oxygen partial pressure is less than 83 mmHg;

Step d: analyze the electrocardiographic data: 2-3 adjacent leads show convex-upward ("arched") ST-segment elevation (or depression) of at least 1 mm, or the electrocardiogram shows abnormally tall T waves with two asymmetric limbs.
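The ordering of steps a to d can be sketched as a chain of gates using the thresholds stated above. The function name `staged_screen`, the argument names and the return format are hypothetical, and what the system does when a gate is not passed is not specified in the text, so the sketch simply reports how far the sequence got.

```python
def staged_screen(d_dimer_ug_l, tni_ng_ml, pao2_mmhg, ecg_criteria_met):
    """Walk steps a-d in order; return the steps reached and whether the
    ECG criteria of step d were met. Thresholds are those given in the text."""
    steps = ["a"]
    if not (0 < d_dimer_ug_l < 500):      # step a gate
        return steps, False
    steps.append("b")
    if not tni_ng_ml > 0.05:              # step b gate
        return steps, False
    steps.append("c")
    if not pao2_mmhg < 83:                # step c gate
        return steps, False
    steps.append("d")
    return steps, bool(ecg_criteria_met)  # step d outcome


print(staged_screen(300, 0.08, 75, True))
print(staged_screen(800, 0.08, 75, True))
```

Ordering the cheap laboratory checks before the electrocardiographic analysis means the more expensive signal processing only runs for patients who pass the earlier gates.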

Further, the database runs in a Windows 7 environment, using the Microsoft SQL Server 2000 database management system as the development tool and Microsoft Visual C++ 6.0 as the database front end; the hardware is an AMD XP1800+ CPU, 3 GB of Kingston DDR memory and a 600 GB Dall hard disk.

Further, the AI system is a PC with an Intel Core i5-8500 boxed processor (CPU base frequency 3.0 GHz) and 16 GB of memory, running the Windows 7 x64 operating system, with Matlab 2010 as the development tool.

The invention has the beneficial effects that:

The innovation of the invention is that the electrocardiographic data, troponin, myoglobin, creatine kinase isoenzyme content, D-dimer and arterial blood oxygen partial pressure data from the patient's laboratory examination are subjected to fusion analysis, so that chest pain laboratory data can be identified quickly and the efficiency of auxiliary identification is improved. This helps doctors judge chest pain diseases such as ACS step by step, improves the overall fusion of the artificial intelligence algorithms, and increases the sensitivity and accuracy of the auxiliary judgment.

The invention combines artificial intelligence with chest pain examination, realizes intelligent medical identification, and increases the sensitivity and accuracy of judgment. In experiments on a database, the method achieved an accuracy of 97.85%. The experimental results also show that, compared with electrocardiographic signal classification in the uncompressed domain, about 33% of the electrocardiographic data can be compressed. The method is therefore feasible for wearable health monitoring systems with low-power and real-time requirements, and lays a good foundation for future research on electrocardiographic signal processing in the compressed domain.

Semi-supervised clustering is introduced on the basis of the improved Tri-Training algorithm. This addresses the drop in classification performance caused by the classifier's excessive dependence on labeled data, as well as the unnecessary sample misjudgments that arise when processing the myocardial injury marker, D-dimer and blood gas analysis data. Introducing the algorithm effectively improves learning performance and stability, generates a more appropriate data partition, makes data sample processing more accurate, and improves the diagnostic accuracy for heart diseases. In classifying and identifying myocardial infarction, the generalization ability and recognition precision of the improved SVM algorithm meet the requirement of effectively identifying ST-segment and T-wave abnormalities; a convolutional neural network is used for feature extraction before SVM classification, which further improves the precision and efficiency of auxiliary diagnosis of myocardial infarction.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a schematic flow diagram of the present invention;

FIG. 2 is a flow chart of an improved Tri-Training algorithm according to an embodiment of the present invention;

FIG. 3 is a flow chart of the training of the electrocardiographic data of the SVM algorithm according to the embodiment of the present invention;

FIG. 4 is a comparison chart of the classification result of the improved Tri-Training algorithm in the embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

As shown in FIG. 1, the laboratory chest pain data examination auxiliary identification method based on artificial intelligence supervised learning comprises an AI system. The AI system acquires clinician diagnosis information, patient laboratory examination information and database information, and performs fusion analysis on the electrocardiographic data, troponin, myoglobin, creatine kinase isoenzyme content, D-dimer and arterial blood oxygen partial pressure data from the patient's laboratory examination. The AI system retrieves from the database the diagnostic criteria that apply to the patient's laboratory examination information and compares the result with the clinician's diagnosis made after the patient is admitted. If the two match, the warning system is not triggered; if they are inconsistent, the AI system issues an alarm to remind the clinician to examine the patient again.

To improve the efficiency of auxiliary identification, enable step-by-step judgment and improve the overall fusion of the artificial intelligence algorithms, the AI system must analyze the collected data in a fixed order:

The database runs in a Windows 7 environment, using the Microsoft SQL Server 2000 database management system as the development tool and Microsoft Visual C++ 6.0 as the database front end; the hardware is an AMD XP1800+ CPU, 3 GB of Kingston DDR memory and a 600 GB Dall hard disk.

The AI system is a PC with an Intel Core i5-8500 boxed processor (CPU base frequency 3.0 GHz) and 16 GB of memory, running the Windows 7 x64 operating system, with Matlab 2010 as the development tool.

The patient information data comprise the ST-segment waveform and T wave of the electrocardiographic data; the troponin, myoglobin and creatine kinase isoenzyme content; the D-dimer; and the arterial blood oxygen partial pressure. The features stored in the database include those of suspected pulmonary embolism, acute coronary syndrome and myocardial infarction.

In this embodiment, artificial intelligence is combined with chest pain data examination; in particular, the electrocardiographic data, troponin, myoglobin, creatine kinase isoenzyme content, D-dimer and arterial blood oxygen partial pressure data are fused and identified. This realizes intelligent medical identification of pulmonary embolism, acute coronary syndrome (ACS) and myocardial infarction within ACS, and increases the sensitivity and accuracy of judgment.

When the AI system is used for laboratory chest pain examination, the index information collected from the patient is first compared and judged. The index information comprises the electrocardiographic data (mainly ST-segment waveforms and T waves); the troponin, myoglobin and creatine kinase isoenzyme content; the D-dimer; and the arterial blood oxygen partial pressure. The diagnostic criteria retrieved from the database for the patient's laboratory examination information are:

The D-dimer is divided into intervals: when the D-dimer is less than 500 μg/L, acute pulmonary thromboembolism can essentially be excluded; when the D-dimer is greater than 500 μg/L, contrast-enhanced CT can be used to determine whether pulmonary embolism is present;

The myocardial injury markers are judged: when acute coronary syndrome occurs, the three myocardial infarction markers are abnormally elevated. The reference ranges are TNI (troponin I) 0-0.05 ng/mL, myoglobin 0-107 ng/mL and CKMB (creatine kinase isoenzyme) 0-4.3 ng/mL; TNI > 0.05 ng/mL, alone or together with one or both of the other two markers exceeding their ranges, indicates acute coronary syndrome;

Blood gas analysis is performed: 1) the normal adult arterial blood oxygen partial pressure is 83-108 mmHg; 2) a value below 83 mmHg indicates hypoxemia and a value above 108 mmHg indicates hyperoxemia;

Whether the acute coronary syndrome is a myocardial infarction is then judged in combination with the electrocardiographic data: if 2-3 adjacent leads show convex-upward ("arched") ST-segment elevation (or depression) of at least 1 mm, or the electrocardiogram shows abnormally tall T waves with two asymmetric limbs, the condition is ST-elevation acute myocardial infarction.

As shown in FIG. 2, during application of the algorithm the semi-supervised learning in steps 1-3 above is implemented with an improved Tri-Training algorithm; the specific process is as follows:

The algorithm input is: sample data for troponin, myoglobin, creatine kinase isoenzyme content, D-dimer and arterial blood oxygen partial pressure obtained through data sampling, divided into an unlabeled sample set U, a Must-link constraint set M and a Cannot-link constraint set C, with the maximum number of selected labeled samples being $N_{max}$. The algorithm output is: the two D-dimer intervals, the myocardial injury marker intervals and the blood gas analysis intervals. This partition yields the classification of the common chest pain diseases in laboratory chest pain examination, namely pulmonary embolism and acute coronary syndrome, and the two diseases are placed in a labeled sample set R' and an updated unlabeled sample set U':

A1: initialize the number of labeled samples N = 0 and set

$U' = U$;

A2: then, for each sample pair $(x_i, x_j) \in M$, the corresponding inter-sample distance is calculated by the following formula:

A3: for each sample pair $(x_p, x_q) \in C$, the corresponding inter-sample distance is calculated by the following formula:

where the sample pair $(x_{p'}, x_{q'})$ with the smallest distance value is determined from C; $x_{p'}$ is submitted to the user for labeling and $y_{p'}$ is the class obtained. $(x_{p'}, y_{p'})$ and $(x_{q'}, -y_{p'})$ are added to R', where $-y_{p'}$ is the class opposite to $y_{p'}$; $x_{p'}$ and $x_{q'}$ are deleted from U', and $N = N + 2$;

A4: if $N < N_{max}$, go to step A2; otherwise the algorithm ends.

Semi-supervised clustering is introduced on the basis of the improved Tri-Training algorithm. This addresses the drop in classification performance caused by the classifier's excessive dependence on labeled data, as well as the unnecessary sample misjudgments that arise when processing the myocardial injury marker, D-dimer and blood gas analysis data. Introducing the algorithm effectively improves learning performance and stability, generates a more appropriate data partition, makes data sample processing more accurate, and improves the diagnostic accuracy for heart diseases, as shown in FIG. 4.

Input: labeled samples

unlabeled samples

where u is the number of unlabeled samples; let the number of labeled samples be labelN, the total number of samples be allN, the number of positive samples be posN, and the number of negative samples be negN;

rate＝abs(Kdist(i，j+1)-Kdist(i，j))；

Third, the neighborhood parameter Eps and the input parameter MinPts of the clustering algorithm are obtained from the distance differences. For $x_j$ belonging to the data set $D = \{x_1, x_2, \ldots, x_m\}$, the Eps-neighborhood of $x_j$ contains the samples in D whose distance from $x_j$ is not greater than Eps, and MinPts is the minimum number of objects in the Eps-neighborhood of a sample point. For a given MinPts = j, preliminary convergence is considered reached when the change between two successive values is less than 0.01; if this occurs more than twice, MinPts and Eps are given by the following statements:

MinPts = j;

Eps1 = mean(Kdist(1:posN, j));

Eps2 = mean(Kdist(posN+1:posN+negN, j));

Eps3 = mean(Kdist(1:posN+negN, j));
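The Kdist, rate and Eps statements above can be sketched in Python as follows. This is an illustrative translation only: the helper names `sorted_kdist`, `rate` and `mean_kdist` and the sample points are hypothetical, and the labeled samples are assumed to be listed first, positives before negatives, as the index ranges in the statements suggest.

```python
import math


def sorted_kdist(points, n_labeled):
    """For each of the first n_labeled points, return the ascending list of
    Euclidean distances to every other point (row i stands for Kdist(i, :))."""
    rows = []
    for i in range(n_labeled):
        dists = [math.dist(points[i], points[k])
                 for k in range(len(points)) if k != i]
        rows.append(sorted(dists))
    return rows


def rate(kdist, i, j):
    """abs(Kdist(i, j+1) - Kdist(i, j)) with 1-based j; i is a 0-based row."""
    return abs(kdist[i][j] - kdist[i][j - 1])


def mean_kdist(kdist, rows, j):
    """Mean j-th smallest distance (1-based j) over the selected rows,
    mirroring mean(Kdist(rows, j))."""
    values = [kdist[i][j - 1] for i in rows]
    return sum(values) / len(values)


points = [(0.0, 0.0), (3.0, 4.0), (0.0, 1.0), (6.0, 8.0)]  # hypothetical data
kdist = sorted_kdist(points, n_labeled=2)   # 1 positive, then 1 negative
pos_n, neg_n, j = 1, 1, 2                   # MinPts = j
eps_pos = mean_kdist(kdist, range(0, pos_n), j)
eps_neg = mean_kdist(kdist, range(pos_n, pos_n + neg_n), j)
eps_all = mean_kdist(kdist, range(0, pos_n + neg_n), j)
print(rate(kdist, 0, 1), eps_pos, eps_neg, eps_all)
```

Averaging the j-th nearest distance ties the neighborhood radius to the local density of the labeled points instead of requiring it to be chosen by hand.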

The prediction results of the clustering algorithm on the unlabeled data are obtained under different class proportions, where T is the number of class proportions and $y^{middle}$ is the prediction that combines the positive and negative samples in the unlabeled data. Integrating the multiple prediction results under the worst case, the output $y^{*}$ of the semi-supervised clustering algorithm can be expressed as:

As shown in FIG. 3, the AI system processes the electrocardiographic signals of the patient information in step 4 using an improved support vector machine algorithm, specifically as follows:

S1: preprocess the acquired electrocardiographic data, filter out noise, extract the time-domain features of the electrocardiographic data, and generate an electrocardiographic training sample set; the training sample set comprises the normal P wave, QRS complex, T wave, PR interval, RR interval and ST segment, with convex-upward ("arched") ST-segment elevation (or depression) and abnormally tall T waves with two asymmetric limbs used as additional extracted features;

S2: classify the electrocardiographic training sample set: set the two parameters $Z$ and $Z^{*}$, train on the labeled samples $(x_1, y_1), \ldots, (x_n, y_n)$ with the support vector machine algorithm to build an initial classifier, and then set the number $n_{abn}$ of unlabeled examples $x_1^{*}, \ldots, x_k^{*}$ that are to receive a positive label, where $Z$ and $Z^{*}$ are parameters specified for training;

S3: classify the unlabeled examples $x_1^{*}, \ldots, x_k^{*}$ with the classifier obtained in S2 and assign a label to each unlabeled sample according to the output value of $w \cdot x_j^{*} + b$; the $n_{abn}$ unlabeled samples with the largest output values are designated $y_j^{*} = 1$ and the remaining samples are designated $y_j^{*} = -1$; then set the parameters $Z^{*}_{n}$ and $Z^{*}_{abn}$ and retrain on the samples to obtain a second classifier; with $Z^{*}$ set, find a pair of test examples with different label values and exchange their label values so that the value of the optimization objective function is reduced as much as possible, and repeat this step until the condition is no longer met;

S4: gradually increase the adjustment parameters $Z^{*}_{n}$ and $Z^{*}_{abn}$ and return to S3; when $Z^{*}_{n} > Z^{*}$ and $Z^{*}_{abn} > Z^{*}$ the algorithm ends and the final classification result is obtained, so as to identify convex-upward ST-segment elevation (or depression) in 2-3 adjacent leads and abnormally tall T waves with two asymmetric limbs.

The AI system comprises the following steps in the process of extracting the characteristics of the collected electrocardio data:

As an embodiment of the present invention, a convolutional neural network (CNN) may also be used to extract features from the electrocardiographic data. The CNN comprises three stages: the first consists of convolutional layer Conv1, pooling layer Pooling1 and normalization layer BN1, where Conv1 has a 6 × 6 convolution kernel, a stride of 3 and 166 convolution kernels in total; the second consists of convolutional layer Conv2 and normalization layer BN2, where Conv2 has a 5 × 5 convolution kernel, a stride of 3 and 128 convolution kernels in total; the third consists of pooling layer Pooling3 and normalization layer BN3, where Conv3 has a 3 × 3 convolution kernel, a stride of 1 and 128 convolution kernels in total. These are followed in sequence by an Inception structure, convolutional layer Conv4 and a global pooling layer. The output is classified by a Softmax classifier to improve the precision and efficiency of electrocardiographic data classification.

The database information comprises data preprocessing and feature selection, and specifically comprises the following steps:

s43 discretizing the characteristic attribute in S42;

As an example of laboratory chest pain examination, if a patient has had a myocardial infarction, the following must be satisfied: 2-3 adjacent leads of the electrocardiogram show convex-upward ("arched") ST-segment elevation (or depression) of at least 1 mm, or the electrocardiogram shows abnormally tall T waves with two asymmetric limbs. In classifying and identifying the electrocardiographic data, the generalization ability and recognition precision of the improved SVM algorithm meet the requirement of effectively identifying ST-segment and T-wave abnormalities, which further improves the auxiliary diagnosis of myocardial infarction.
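The adjacent-lead ST criterion can be illustrated with a rule-based check. This sketch is not the SVM classifier described in the text; it only shows what the criterion means. The function name, the ordering of leads used to decide adjacency (here the precordial leads V1 to V6) and the per-lead ST values in millimetres are all assumptions.

```python
def has_adjacent_st_deviation(st_mm, lead_order, threshold_mm=1.0, min_run=2):
    """True if at least `min_run` consecutive leads in `lead_order` show ST
    deviation of at least `threshold_mm` in the same direction
    (positive = elevation, negative = depression)."""
    run, prev_sign = 0, 0
    for lead in lead_order:
        value = st_mm.get(lead, 0.0)
        # +1 for elevation, -1 for depression, 0 when below the threshold.
        sign = (value >= threshold_mm) - (value <= -threshold_mm)
        if sign != 0 and sign == prev_sign:
            run += 1
        elif sign != 0:
            run = 1
        else:
            run = 0
        prev_sign = sign
        if run >= min_run:
            return True
    return False


order = ["V1", "V2", "V3", "V4", "V5", "V6"]
print(has_adjacent_st_deviation({"V2": 1.5, "V3": 1.2}, order))
print(has_adjacent_st_deviation({"V2": 1.5, "V4": 1.2}, order))
```

Requiring the deviation in neighbouring leads, and in the same direction, guards against a single noisy lead triggering the criterion.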

In this embodiment, the data must be preprocessed before the electrocardiographic data information is collected and classified. The chest pain data set has a complex structure: it contains purely numerical data such as examination and test results as well as text data such as the patient's past history, personal history and disease course records. Text data cannot be input directly as features. In addition, the indexes in the test tables have different units and the numerical magnitudes of different indexes are not on the same scale; if the raw data were used directly as input to a classification model, attributes with larger values could carry greater weight in classification and degrade the result.
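One common way to put indexes with different magnitudes on the same scale is min-max normalization. The text does not name a specific scaling method, so the choice here is an assumption, as are the function name and the sample values.

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly to the range [0, 1].
    A constant column is mapped to all zeros to avoid division by zero."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


d_dimer = [0, 250, 500]          # hypothetical values in ug/L
troponin = [0.0, 0.025, 0.05]    # hypothetical values in ng/mL
print(min_max_scale(d_dimer))
print(min_max_scale(troponin))
```

After scaling, both columns span the same range, so the larger raw magnitude of the D-dimer no longer dominates distance-based or margin-based classifiers.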

The admission record table and the disease course record table in the chest pain data set are stored as text; for example, the past history item in the admission record table may read: diabetes, hypertension, hepatitis. The attribute features are therefore extracted from the text first, mainly by a recursive function written in SQL statements that extracts keywords: if the disease represented by a keyword appears, the corresponding attribute is marked 1, and otherwise 0. The data on a patient's chest pain are distributed across different data tables, such as the patient's personal information table and the various examination tables. To obtain a more comprehensive view of the data, the data scattered across these tables must therefore be integrated. In diagnosing a heart disease patient, some characteristic information does not require a specific numerical value; it is sufficient to determine the range in which the value falls.
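The keyword marking and table integration described above can be sketched as follows. The text uses a recursive function written in SQL; this Python version is a substitute chosen only for illustration, and the feature names, table layout and patient identifiers are hypothetical.

```python
# Sketch only: Python stands in for the SQL function named in the text.
# The keywords follow the example above; the feature names are hypothetical.

HISTORY_KEYWORDS = {
    "hypertension": "hypertension_history",
    "diabetes": "diabetes_history",
    "hepatitis": "hepatitis_history",
}


def history_flags(past_history_text):
    """Mark each disease keyword 1 if it appears in the text, otherwise 0."""
    text = past_history_text.lower()
    return {
        feature: int(keyword in text)
        for keyword, feature in HISTORY_KEYWORDS.items()
    }


def integrate_tables(*tables):
    """Merge several tables (dicts keyed by patient id) into one record
    per patient, so that scattered attributes end up side by side."""
    merged = {}
    for table in tables:
        for patient_id, fields in table.items():
            merged.setdefault(patient_id, {}).update(fields)
    return merged
```

Plain substring matching does not handle negated phrases such as "no history of diabetes", so a real extraction step would need additional handling for them.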

When the electrocardiographic data are acquired, the relevant characteristic attributes are discretized. The discretized data are sparser, which can speed up the classifier during training of the classification model. Each data table of the electrocardiographic data contains a large amount of data that is useless or of low diagnostic value for the current disease; such data interfere with subsequent analysis and reduce its accuracy, so the electrocardiographic data need to be cleaned. In addition, the chest pain medical record data have many feature dimensions, some of which are useless or of little use for chest pain classification; these features are deleted, and the features with greater influence on the classification result are retained.
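A minimal sketch of the discretization step is shown below, assuming a numeric attribute is mapped to an interval index and then to a sparse one-hot vector. The cut points are hypothetical placeholders and are not clinical reference ranges.

```python
from bisect import bisect_right

# Sketch only: hypothetical cut points for illustration, not clinical ranges.
AGE_EDGES = [40, 60, 80]


def discretize(value, edges):
    """Return the index of the interval containing value.

    Interval 0 is value < edges[0]; interval i covers
    edges[i - 1] <= value < edges[i]; the last interval is
    value >= edges[-1].
    """
    return bisect_right(edges, value)


def one_hot(index, n_bins):
    """Sparse 0/1 encoding of an interval index."""
    return [1 if i == index else 0 for i in range(n_bins)]
```

For instance, `one_hot(discretize(63, AGE_EDGES), len(AGE_EDGES) + 1)` yields a vector with a single 1, which illustrates the sparsity mentioned above.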

Attributes that are useless or of low value for chest pain diagnosis are deleted from each table. The following attributes are selected: from the patient's personal information table, 2 attributes (age and sex); from the past history, 3 attributes related to chest pain (history of hypertension, diabetes and hepatitis); from the personal history, 1 attribute (whether the patient smokes); from the physical examination, 5 attributes (body temperature, pulse, respiration, blood pressure and nutritional state); from the common symptoms, 10 attributes (chest pain, dyspnea, palpitation, cyanosis, syncope, edema, cough, hemoptysis, heart failure and arrhythmia); from the common physical signs, 9 attributes (hypertension, hypotension, cardiac tremor, tachycardia, bradycardia, heart sound change, heart murmur, pulse abnormality and heart enlargement); and, as examination and test items, the typical troponin, myoglobin and creatine kinase isoenzyme contents, D-dimer, and arterial blood partial pressure.
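The selection above can be written down as a lookup table that is applied to each record. The following Python sketch is illustrative: the table keys and attribute identifiers are hypothetical renderings of the attributes listed in the text.

```python
# Sketch only: identifiers are hypothetical names for the listed attributes.
SELECTED_ATTRIBUTES = {
    "personal_information": ["age", "sex"],
    "past_history": ["hypertension", "diabetes", "hepatitis"],
    "personal_history": ["smoking"],
    "physical_examination": [
        "body_temperature", "pulse", "respiration",
        "blood_pressure", "nutritional_state",
    ],
    "common_symptoms": [
        "chest_pain", "dyspnea", "palpitation", "cyanosis", "syncope",
        "edema", "cough", "hemoptysis", "heart_failure", "arrhythmia",
    ],
    "common_signs": [
        "hypertension", "hypotension", "cardiac_tremor", "tachycardia",
        "bradycardia", "heart_sound_change", "heart_murmur",
        "pulse_abnormality", "heart_enlargement",
    ],
    "laboratory_tests": [
        "troponin", "myoglobin", "creatine_kinase_isoenzyme",
        "d_dimer", "arterial_blood_partial_pressure",
    ],
}


def select_attributes(record, table):
    """Keep only the attributes selected for the given table."""
    keep = SELECTED_ATTRIBUTES[table]
    return {name: record[name] for name in keep if name in record}
```

Keeping the selection in one place makes it easy to verify the attribute counts stated in the text (2, 3, 1, 5, 10 and 9) against the data actually fed to the classifier.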

In the present invention, the patient is first diagnosed by a clinician. Based on the clinician's diagnosis, the artificial intelligence retrieves the diagnostic criteria A for the disease from the database and compares each criterion with the actual condition B collected after the patient was admitted. If the comparison results match, the warning system is not activated. If they are inconsistent, the artificial intelligence issues an alarm reminding the clinician to review the diagnosis. Experiments were performed on the database, and the method achieved an accuracy of 97.85%. The experimental results also show that about 33% of the electrocardiographic data can be compressed, compared with electrocardiographic signal classification in the non-compressed domain. The method is therefore feasible to a certain extent for wearable health monitoring systems with low-power and real-time requirements, and lays a good foundation for future research on electrocardiographic signal processing in the compressed domain.
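The comparison of criteria A against findings B can be sketched as follows. This is a minimal illustration under stated assumptions: each criterion is represented as a named predicate, a missing finding counts as a mismatch, and the example criteria names are hypothetical rather than taken from the database described in the text.

```python
# Sketch only: the criteria representation and names are assumptions.

def find_mismatches(criteria_a, findings_b):
    """Return the names of criteria in A that the findings B do not satisfy.

    criteria_a maps a criterion name to a predicate over the finding value;
    a finding that is missing from B is treated as a mismatch.
    """
    mismatches = []
    for name, predicate in criteria_a.items():
        if name not in findings_b or not predicate(findings_b[name]):
            mismatches.append(name)
    return mismatches


def should_alarm(criteria_a, findings_b):
    """True if any criterion is unmet, i.e. the clinician should be alerted."""
    return bool(find_mismatches(criteria_a, findings_b))


# Hypothetical criteria for a clinician's diagnosis of myocardial infarction.
EXAMPLE_CRITERIA = {
    "st_criterion_met": lambda value: value is True,
    "troponin_elevated": lambda value: value is True,
}

if __name__ == "__main__":
    findings = {"st_criterion_met": True, "troponin_elevated": False}
    if should_alarm(EXAMPLE_CRITERIA, findings):
        print("Review diagnosis; unmet:", find_mismatches(EXAMPLE_CRITERIA, findings))
```

Returning the list of unmet criteria, rather than only a yes/no flag, lets the alarm tell the clinician which findings disagree with the diagnosis.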

The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. The laboratory chest pain data examination auxiliary recognition system based on artificial intelligence supervised learning is characterized by comprising an AI system, wherein the AI system acquires clinician diagnosis information, patient laboratory examination information and database information and performs fusion analysis processing on electrocardiogram data, troponin, myoglobin, creatine kinase isoenzyme content, D-dimer and arterial blood sample partial pressure data examined by a patient laboratory;

the fusion analysis processing includes: the AI system adopts an improved Tri-Training algorithm to realize semi-supervised learning of troponin, myoglobin, creatine kinase isoenzyme content, D-dimer and arterial blood sample partial pressure data; the algorithm inputs are: obtaining troponin, myoglobin, creatine kinase isoenzyme content and D-dimer through data sampling; dividing the sample data of arterial blood partial pressure into an unlabeled sample set $U$, a Must-link constraint set $M$ and a Cannot-link constraint set $C$, wherein the maximum number of labeled samples is $N_{max}$; the algorithm output is: dividing two D-dimer intervals, dividing a myocardial injury marker interval and analyzing blood gas into three intervals, namely a labeled sample set $R'$ and an updated unlabeled sample set $U'$;

the AI system processes the electrocardiographic signals in the patient information with an improved support vector machine algorithm, comprising: preprocessing the collected electrocardiographic data, filtering noise, extracting time-domain features of the electrocardiographic data, and generating an electrocardiographic data training sample set; the data obtained by feature extraction from the collected electrocardiographic data comprise normal P waves, QRS complexes, T waves, PR intervals, RR intervals and ST segments, wherein upward-arched elevation or depression of the ST segment and T waves with two asymmetric limbs are used as additionally extracted features; classifying the electrocardiographic data training sample set: setting parameters $Z$ and $Z^*$, training on the labeled samples $(x_1, y_1), \ldots, (x_n, y_n)$ with the support vector machine algorithm to build an initial classifier, and then setting the number $n_{abn}$ of unlabeled examples $x_1^*, \ldots, x_k^*$ that are to receive a positive label value, wherein $Z$ and $Z^*$ are parameters specified for training; classifying the unlabeled examples $x_1^*, \ldots, x_k^*$ with the classifier, and assigning a label to each unlabeled sample according to the output value of $w \cdot x_j^* + b$, where $w$ is the weight and $b$ is a constant parameter, the $n_{abn}$ unlabeled samples with the largest output values being designated $y_j^* = 1$ and the remaining samples being designated $y_j^* = -1$; then setting the parameters $Z^*_n$ and $Z^*_{abn}$; retraining on the samples to obtain a second classifier; setting $Z^*$, finding a group of test examples with different label values, and exchanging their label values so that the value of the optimization objective function in the formula is reduced to the greatest extent; increasing the adjustment parameters $Z^*_n$ and $Z^*_{abn}$ step by step, and ending the algorithm when $Z^*_n > Z^*$ and $Z^*_{abn} > Z^*$, thereby identifying upward-arched elevation or depression of the ST segment in 2-3 adjacent leads and abnormally tall T waves with two asymmetric limbs in the electrocardiogram;

extracting features of the electrocardiographic data with a convolutional neural network (CNN), wherein the CNN comprises three layer groups: the first group consists of convolutional layer one Conv1, pooling layer one Pooling1 and normalization layer one BN1, wherein Conv1 has a 6 × 6 convolution kernel, a stride of 3, and 166 convolution kernels in total; the second group consists of convolutional layer two Conv2 and normalization layer two BN2, wherein Conv2 has a 5 × 5 convolution kernel, a stride of 3, and 128 convolution kernels in total; the third group consists of pooling layer three Pooling3 and normalization layer three BN3, wherein Conv3 has a 3 × 3 convolution kernel, a stride of 1, and 128 convolution kernels in total; an Inception structure, convolutional layer four Conv4 and a global pooling layer are then connected in sequence, and the output is classified by a classifier, wherein the classifier adopts Softmax classification;

the AI system retrieves from the database the diagnostic criteria for the patient's laboratory examination information and, after the patient is admitted, compares the diagnostic criteria with the analysis criteria of the clinician's diagnostic information; if the comparison results match, the warning system is not activated, and if the comparison results are inconsistent, the artificial intelligence issues an alarm to remind the clinician to examine the patient again.

2. The system of claim 1, wherein the database runs in a Windows 7 environment, the development tool is the Microsoft SQL Server 2000 database management system, the database front end is Microsoft VC++ 6.0, and the hardware is: CPU AMD XP1800+, memory Kingston 3G DDR, hard disk Dall 600G.

3. The laboratory chest pain data examination auxiliary recognition system based on artificial intelligence supervised learning of claim 1, wherein the AI system runs on a PC with an Intel Core i5-8500 boxed processor, CPU main frequency 3.0 GHz, 16 GB memory, operating system Windows 7 x64, and development tool Matlab 2010.