CN109907751B - Laboratory chest pain data inspection auxiliary identification method based on artificial intelligence supervised learning - Google Patents

Info

Publication number
CN109907751B
Authority
CN
China
Prior art keywords
data
sample
patient
artificial intelligence
laboratory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910147228.6A
Other languages
Chinese (zh)
Other versions
CN109907751A (en)
Inventor
严洋
严金川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201910147228.6A
Publication of CN109907751A
Application granted
Publication of CN109907751B
Legal status: Active

Landscapes

  • Measurement And Recording Of Electrical Phenomena And Electrical Characteristics Of The Living Body (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The invention discloses a laboratory chest pain data examination auxiliary identification method based on artificial intelligence supervised learning, which belongs to the technical field of artificial intelligence in medicine. The method comprises an AI system that acquires clinician information, patient information and database information. The AI system finds the diagnostic criteria for the disease in the database and compares each criterion with the actual conditions collected after the patient is admitted. If the comparison results match, the warning system is not triggered; if they are inconsistent, the artificial intelligence issues an alarm reminding the clinician to recheck the patient's diagnosis. The invention combines artificial intelligence with chest pain examination, realizes intelligent medical examination, and increases the sensitivity and accuracy of judgment.

Description

Laboratory chest pain data inspection auxiliary identification method based on artificial intelligence supervised learning
Technical Field
The invention belongs to the technical field of medical treatment integrating artificial intelligence and machine learning, and particularly relates to a laboratory chest pain data examination auxiliary identification method based on artificial intelligence supervised learning.
Background
In recent years, artificial intelligence has developed rapidly and its applications have become increasingly widespread. Artificial intelligence is inseparable from data analysis and machine learning, and the theory and methods of intelligent data analysis have become one of its essential foundations. The field attempts to understand the essence of intelligence and to produce intelligent machines that respond in a manner similar to human intelligence; its research areas include robotics, language recognition, image recognition, natural language processing, and expert systems.
Chest pain is a common and potentially life-threatening condition. Its causes are complex and diverse and include acute coronary syndrome (ACS), aortic dissection, pulmonary embolism (PE), pericardial tamponade and others, among which ACS accounts for the highest proportion of serious life-threatening cases. The misdiagnosis rate of acute myocardial infarction (AMI) is 3%-5%. The incidence of aortic dissecting aneurysm is about 0.5-1 per 100,000 people, and if it is misdiagnosed the mortality rate exceeds 90%. The incidence of PE is about 70 per 100,000, that of spontaneous pneumothorax is 2.5-18 per 100,000, and that of esophageal rupture is 12.5 per 100,000. In 2009, the Beijing acute chest pain registry study enrolled 5666 patients; the results showed that chest pain patients accounted for 4% of emergency patients, with ACS accounting for 27.4%. Quickly and accurately diagnosing and distinguishing ACS from other fatal causes of chest pain has therefore become both the difficulty and the focus of emergency treatment.
It is therefore important to combine artificial intelligence with chest pain examination, and in particular with laboratory electrocardiographic data and routine blood test data, in order to realize intelligent medical identification and increase the sensitivity and accuracy of judgment.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention discloses a laboratory chest pain data examination auxiliary identification method based on artificial intelligence supervised learning. The method combines artificial intelligence with chest pain examination and, in particular, fuses and identifies ECG data, troponin, myoglobin, creatine kinase isoenzyme content, D-dimer and arterial blood partial pressure data, thereby realizing intelligent medical identification and increasing the sensitivity and accuracy of the auxiliary judgment of chest pain diseases.
The invention is realized by the following technical scheme:
The laboratory chest pain data examination auxiliary identification method based on artificial intelligence supervised learning comprises an AI system. The AI system acquires clinician diagnosis information, patient laboratory examination information and database information, and performs fusion analysis processing on the ECG data, troponin, myoglobin, creatine kinase isoenzyme content, D-dimer and arterial blood partial pressure data from the patient's laboratory examination. The AI system finds the diagnostic criteria for the patient's laboratory examination information in the database and compares them with the analysis of the clinician's diagnosis information after the patient is admitted; if the comparison results match, the warning system is not triggered, and if they are inconsistent, the artificial intelligence issues an alarm reminding the clinician to examine the patient again.
Further, the fusion analysis processing process comprises:
the AI system adopts an improved Tri-Training algorithm to realize semi-supervised learning of troponin, myoglobin, creatine kinase isoenzyme content, D-dimer and arterial blood sample partial pressure data:
The algorithm inputs are: sample data for troponin, myoglobin, creatine kinase isoenzyme content, D-dimer and arterial blood partial pressure obtained through data sampling, divided into an unlabeled sample set $U$, a Must-link constraint set $M$ and a Cannot-link constraint set $C$, together with the maximum number of labeled samples $N_{max}$. The algorithm outputs are: the division of D-dimer into two intervals, the interval division of the myocardial injury markers and the division of the blood gas analysis into three intervals, namely a labeled sample set $R'$ and an updated unlabeled sample set $U'$;
A1, initialize the number of labeled samples $N = 0$, $R' = \emptyset$, $U' = U$;
A2, then, for every sample pair $(x_i, x_j) \in M$, the corresponding inter-sample distance is calculated by the following formula:
Figure BDA0001980407870000023
where $d$ is the number of sample attributes. The sample pair $(x_{i'}, x_{j'})$ with the largest distance value is determined from $M$; $x_{i'}$ is the sample annotated by the user and $y_{i'}$ is the class obtained. $(x_{i'}, y_{i'})$ and $(x_{j'}, y_{i'})$ are added to $R'$, $x_{i'}$ and $x_{j'}$ are deleted from $U'$, and $N = N + 2$;
A3, for every sample pair $(x_p, x_q) \in C$, the corresponding inter-sample distance is calculated by the following formula:
Figure BDA0001980407870000025
The sample pair $(x_{p'}, x_{q'})$ with the smallest distance value is determined from $C$; $x_{p'}$ is the sample annotated by the user and $y_{p'}$ is the class obtained. $(x_{p'}, y_{p'})$ and $(x_{q'}, -y_{p'})$ are added to $R'$, where $-y_{p'}$ is the class opposite to $y_{p'}$; $x_{p'}$ and $x_{q'}$ are deleted from $U'$, and $N = N + 2$;
A4, if $N < N_{max}$, go to step A2; otherwise, the algorithm ends;
A clustering algorithm is introduced to address the unnecessary sample misjudgments that arise when processing the myocardial injury marker, D-dimer and blood gas analysis data, and to generate a more appropriate data division;
Input: labeled samples
Figure BDA0001980407870000031
Unlabeled samples
Figure BDA0001980407870000032
where $u$ is the number of unlabeled samples; let the number of labeled samples be labelN, the total number of samples be allN, the number of positive samples be posN, and the number of negative samples be negN;
First, for each labeled data point, the Euclidean distance Kdist to the other points is calculated and sorted from small to large:
Figure BDA0001980407870000033
Second, the difference rate between two consecutive distances is calculated for the labeled data:
rate=abs(Kdist(i,j+1)-Kdist(i,j));
Third, the neighborhood parameter Eps and the input parameter MinPts of the clustering algorithm are obtained from the distance differences. For $x_j$ belonging to the data set $D = \{x_1, x_2, \ldots, x_m\}$, the Eps-neighborhood of $x_j$ contains the samples in $D$ whose distance from $x_j$ is not greater than Eps, and MinPts is the minimum number of objects in the Eps-neighborhood of a sample point. For MinPts = j, when the change between the last two values is less than 0.01, preliminary convergence is considered to have been reached;
Finally, the labels are obtained: noise points are taken as negative samples and the rest as positive samples. The output is $y \in \{1, -1\}^u$, i.e., the label assignment of the unlabeled data. Let
Figure BDA0001980407870000034
be the prediction results of the clustering algorithm on the unlabeled data under different class proportions, where $T$ is the number of class proportions and $y_{middle}$ is the prediction result obtained by combining the positive and negative samples in the unlabeled data. For the clustering algorithm, the worst-case integration $y^*$ of the multiple prediction results can be expressed as:
Figure BDA0001980407870000035
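The distance-sorting and difference-rate steps of the clustering procedure above can be illustrated with a minimal Python sketch. It only mirrors the two stated operations (sorting each point's Euclidean distances in ascending order and taking `abs(Kdist(i,j+1) - Kdist(i,j))`); the function names and the sample points are assumptions made for illustration and do not come from the patent.

```python
import math

def sorted_distances(points, i):
    """Ascending Euclidean distances from points[i] to every other point (one row of Kdist)."""
    return sorted(math.dist(points[i], p) for k, p in enumerate(points) if k != i)

def difference_rates(kdist_row):
    """rate = abs(Kdist(i, j+1) - Kdist(i, j)) for consecutive sorted distances."""
    return [abs(b - a) for a, b in zip(kdist_row, kdist_row[1:])]

# Hypothetical two-attribute samples, used only to exercise the functions.
points = [(0, 0), (3, 4), (6, 8), (0, 1)]
row = sorted_distances(points, 0)
rates = difference_rates(row)
```

A small, stable rate at position j is what the text uses as the signal for choosing MinPts = j.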
Further, during the fusion analysis processing, the AI system processes the ECG signals of the patient information using an improved support vector machine algorithm, specifically comprising the following steps:
S1, preprocess the acquired ECG data, filter out noise, extract the time-domain features of the ECG data, and generate an ECG training sample set. The features extracted from the acquired ECG data comprise the normal P wave, QRS complex, T wave, PR interval, RR interval and ST segment; upward-arched ST-segment elevation (or depression) and T waves with two asymmetric limbs are used as additional extracted features;
S2, classify the ECG training sample set. Set the parameters $Z$ and $Z^*$, train an initial classifier with the support vector machine algorithm on the labeled samples $(x_1, y_1), \ldots, (x_n, y_n)$, and then set the number $n_{abn}$ of unlabeled examples $x_1^*, \ldots, x_k^*$ that are to receive a positive label value; $Z$ and $Z^*$ are parameters specified for training;
S3, use the classifier obtained in S2 to classify the unlabeled examples $x_1^*, \ldots, x_k^*$ and assign a label to each unlabeled sample according to the output value of $w \cdot x_j^* + b$, where $w$ is the weight and $b$ is a constant parameter. The $n_{abn}$ unlabeled samples with the largest output values are assigned $y_j^* = 1$ and the remaining samples are assigned $y_j^* = -1$; the parameters $Z^*_n$ and $Z^*_{abn}$ are then set. Retrain on the samples to obtain a second classifier, find a group of test examples with different label values, and exchange their label values so that the value of the optimization objective function is reduced as much as possible; repeat this step until the condition is no longer met;
S4, gradually increase the adjusting parameters $Z^*_n$ and $Z^*_{abn}$ and return to S3; when $Z^*_n > Z^*$ and $Z^*_{abn} > Z^*$, the algorithm ends, so as to identify upward-arched ST-segment elevation (or depression) in 2-3 adjacent leads and abnormally tall T waves with two asymmetric limbs.
Further, the AI system comprises the following steps in the process of extracting the characteristics of the collected electrocardio data:
S1.1, construct a sparse binary random matrix and use it as the observation matrix $Q$; observe the preprocessed ECG signal $X$ according to the compressive sensing model $Y = QX$ to obtain the compressed values $Y$ of the ECG signal;
S1.2, skip the ECG signal reconstruction step and directly extract features from the compressed ECG data using an improved principal component analysis method to obtain the feature vector of the ECG signal;
S1.3, take the feature vectors of the normal P wave, QRS complex, T wave, PR interval, RR interval and ST segment extracted in S1.2 as the input of the classifier.
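The observation step S1.1 can be sketched in a few lines of Python. The patent specifies only that the observation matrix is a sparse binary random matrix and that the model is Y = QX; the fixed number of ones per column, the matrix dimensions and the function names below are assumptions chosen for illustration.

```python
import random

def sparse_binary_matrix(m, n, ones_per_column, seed=0):
    """Build an m x n 0/1 matrix with a fixed number of ones in each column.

    The fixed column weight is an assumption; the source only says
    'sparse binary random matrix'.
    """
    rng = random.Random(seed)
    q = [[0] * n for _ in range(m)]
    for col in range(n):
        for row in rng.sample(range(m), ones_per_column):
            q[row][col] = 1
    return q

def observe(q, x):
    """Compressed measurement Y = QX for a signal x of length n."""
    return [sum(qij * xj for qij, xj in zip(row, x)) for row in q]

# Hypothetical example: compress an 8-sample segment to 4 measurements.
q = sparse_binary_matrix(4, 8, ones_per_column=2)
y = observe(q, [1] * 8)
```

Because Q contains only zeros and ones, each measurement is a plain sum of a few signal samples, which is why this kind of matrix suits low-power acquisition.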
Further, in S1.2, a convolutional neural network (CNN) may also be used for feature extraction from the ECG data. The CNN comprises three layer groups: the first comprises convolutional layer Conv1, pooling layer Pooling1 and normalization layer BN1, where Conv1 has a 6 × 6 convolution kernel, a stride of 3 and 166 convolution kernels in total; the second comprises convolutional layer Conv2 and normalization layer BN2, where Conv2 has a 5 × 5 convolution kernel, a stride of 3 and 128 convolution kernels in total; the third comprises pooling layer Pooling3 and normalization layer BN3, with Conv3 having a 3 × 3 convolution kernel, a stride of 1 and 128 convolution kernels in total. An Incep structure, convolutional layer Conv4 and a global pooling layer are then connected in sequence, and the output is classified by a Softmax classifier to improve the classification precision and efficiency of the ECG data.
Further, the preprocessing of the acquired ECG data also comprises the following steps:
S41, use SQL statements to extract attribute features and write a recursive function to extract keywords, recording 1 if the disease represented by a keyword appears and 0 if it does not;
S42, merge and integrate the data distributed across different databases or data tables;
S43, discretize the feature attributes from S42;
S44, clean the data from S43 and delete repeated, abnormal and redundant data.
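The preprocessing steps S41, S43 and S44 can be illustrated with a short Python sketch. The patent performs the extraction with SQL statements and a recursive function; the plain-Python helpers below are a substitute written for illustration, and their names, the keyword list and the interval edges are assumptions.

```python
def keyword_flags(text, keywords):
    """S41 analogue: record 1 if a disease keyword appears in the text, else 0."""
    lowered = text.lower()
    return {k: int(k.lower() in lowered) for k in keywords}

def discretize(value, edges):
    """S43 analogue: index of the interval that value falls into, given ascending edges."""
    return sum(value >= e for e in edges)

def deduplicate(records):
    """S44 analogue: drop exact duplicate records while keeping the first occurrence."""
    seen, unique = set(), []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

# Hypothetical inputs used only to exercise the helpers.
flags = keyword_flags("History of pulmonary embolism",
                      ["pulmonary embolism", "myocardial infarction"])
interval = discretize(600, [500])
cleaned = deduplicate([{"id": 1}, {"id": 1}, {"id": 2}])
```

Removal of abnormal values is not shown, since the source does not state what counts as abnormal.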
Further, during the fusion analysis processing, the data must be processed in the following order:
Step a, divide D-dimer into two intervals: 0 < D-dimer < 500 µg/L and D-dimer > 500 µg/L; proceed to step b when 0 < D-dimer < 500 µg/L;
Step b, divide the myocardial injury markers into intervals: TNI (troponin I) 0-0.05 ng/ml, myoglobin 0-107 ng/ml, and CKMB (creatine kinase isoenzyme) 0-4.3 ng/ml; proceed to step c when TNI > 0.05 ng/ml, or when TNI > 0.05 ng/ml and one or both of the other two markers also exceed their ranges;
Step c, divide the blood gas analysis into three intervals: 1) 83-108 mmHg; 2) less than 83 mmHg; 3) greater than 108 mmHg; proceed to step d when the arterial blood partial pressure is less than 83 mmHg;
Step d, ECG data analysis: 2-3 adjacent leads show ST-segment elevation (or depression) of at least 1 mm with an upward-arched contour, or the electrocardiogram shows abnormally tall T waves with two asymmetric limbs.
Further, the database runs in a Windows 7 environment, with Microsoft SQL Server 2000 as the database management system and development tool and Microsoft VC++ 6.0 as the database front end; hardware: CPU AMD XP1800+, memory Kingston 3G DDR, hard disk Dall 600G.
Further, the AI system is a PC with an Intel Core i5-8500 boxed processor (CPU main frequency 3.0 GHz), 16 GB of memory, Windows 7 x64 as the operating system, and Matlab 2010 as the development tool.
The invention has the beneficial effects that:
The innovation of the invention is that the ECG data, troponin, myoglobin, creatine kinase isoenzyme content, D-dimer and arterial blood partial pressure data from the patient's laboratory examination are subjected to fusion analysis processing, so that chest pain laboratory data can be identified quickly. This improves the efficiency of auxiliary identification, helps physicians make a stepwise judgment of chest pain diseases such as ACS, improves the overall fusion of the algorithms, and increases the sensitivity and accuracy of the auxiliary judgment.
The invention combines artificial intelligence with chest pain examination to realize intelligent medical identification and increase the sensitivity and accuracy of judgment. In experiments on a database, the method achieved an accuracy of 97.85%. The experimental results also show that, compared with classifying ECG signals in the uncompressed domain, about 33% of the ECG data can be compressed. The method is therefore feasible for wearable health monitoring systems with low-power and real-time requirements, and lays a foundation for future research on ECG signal processing in the compressed domain.
Semi-supervised clustering is introduced on the basis of the improved Tri-Training algorithm. This addresses the problem that a classifier which depends excessively on labeled data suffers reduced classification performance, as well as the unnecessary sample misjudgments that arise when processing the myocardial injury marker, D-dimer and blood gas analysis data. Introducing the algorithm effectively improves learning performance and stability, produces a more appropriate data division, makes data sample processing more accurate, and improves the diagnostic accuracy for heart diseases. In classifying and identifying myocardial infarction, the generalization ability and recognition accuracy of the improved SVM algorithm are sufficient to identify ST-segment and T-wave abnormalities effectively; a convolutional neural network performs feature extraction before SVM classification, further improving the accuracy and efficiency of the auxiliary diagnosis of myocardial infarction.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a flow chart of the improved Tri-Training algorithm according to an embodiment of the present invention;
FIG. 3 is a flow chart of the training on the electrocardiographic data with the SVM algorithm according to an embodiment of the present invention;
FIG. 4 is a comparison chart of the classification results of the improved Tri-Training algorithm in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, the laboratory chest pain data examination auxiliary identification method based on artificial intelligence supervised learning comprises an AI system. The AI system acquires clinician diagnosis information, patient laboratory examination information and database information, and performs fusion analysis processing on the ECG data, troponin, myoglobin, creatine kinase isoenzyme content, D-dimer and arterial blood partial pressure data from the patient's laboratory examination. The AI system finds the diagnostic criteria for the patient's laboratory examination information in the database and compares them with the analysis of the clinician's diagnosis information after the patient is admitted; if the comparison results match, the warning system is not triggered, and if they are inconsistent, the artificial intelligence issues an alarm reminding the clinician to examine the patient again.
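The comparison described above can be reduced to a very small decision rule. The Python sketch below is illustrative only: it assumes that both the AI result and the clinician's diagnosis are available as short text labels, which the patent does not specify, and the function names are hypothetical.

```python
def needs_recheck(ai_finding: str, clinician_diagnosis: str) -> bool:
    """True when the AI result and the clinician's diagnosis are inconsistent."""
    return ai_finding.strip().lower() != clinician_diagnosis.strip().lower()

def review(ai_finding: str, clinician_diagnosis: str) -> str:
    """Return an alert message on mismatch; otherwise report that no alert is raised."""
    if needs_recheck(ai_finding, clinician_diagnosis):
        return "ALERT: please re-examine the patient"
    return "no alert"

# Hypothetical labels, used only to exercise the rule.
matched = review("ACS", "acs ")
mismatched = review("pulmonary embolism", "ACS")
```

The system only prompts a recheck; the diagnosis itself remains with the clinician, as in the text.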
To improve the efficiency of auxiliary identification, realize stepwise judgment and improve the overall fusion identification of the algorithms, the AI system must analyze the collected data in the following order:
Step a, divide D-dimer into two intervals: 0 < D-dimer < 500 µg/L and D-dimer > 500 µg/L; proceed to step b when 0 < D-dimer < 500 µg/L;
Step b, divide the myocardial injury markers into intervals: TNI (troponin I) 0-0.05 ng/ml, myoglobin 0-107 ng/ml, and CKMB (creatine kinase isoenzyme) 0-4.3 ng/ml; proceed to step c when TNI > 0.05 ng/ml, or when TNI > 0.05 ng/ml and one or both of the other two markers also exceed their ranges;
Step c, divide the blood gas analysis into three intervals: 1) 83-108 mmHg; 2) less than 83 mmHg; 3) greater than 108 mmHg; proceed to step d when the arterial blood partial pressure is less than 83 mmHg;
Step d, ECG data analysis: 2-3 adjacent leads show ST-segment elevation (or depression) of at least 1 mm with an upward-arched contour, or the electrocardiogram shows abnormally tall T waves with two asymmetric limbs.
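The ordering of steps a to d can be sketched as a staged check in Python. The thresholds are the ones stated in the steps above; the `LabPanel` record, its field names and the two helper functions are assumptions introduced for illustration, and the ECG analysis of step d is not modeled.

```python
from dataclasses import dataclass

@dataclass
class LabPanel:
    d_dimer_ug_l: float
    tni_ng_ml: float
    myoglobin_ng_ml: float
    ckmb_ng_ml: float
    arterial_pressure_mmhg: float

def stages_reached(p: LabPanel) -> list:
    """List the steps (a-d) that a panel reaches under the stated ordering."""
    stages = ["a"]
    if not (0 < p.d_dimer_ug_l < 500):      # step a gate
        return stages
    stages.append("b")
    if not (p.tni_ng_ml > 0.05):            # step b gate: TNI must exceed its range
        return stages
    stages.append("c")
    if not (p.arterial_pressure_mmhg < 83):  # step c gate
        return stages
    stages.append("d")                       # ECG analysis would follow here
    return stages

def elevated_markers(p: LabPanel) -> list:
    """Myocardial injury markers above the reference ranges given in step b."""
    limits = [("TNI", p.tni_ng_ml, 0.05),
              ("myoglobin", p.myoglobin_ng_ml, 107),
              ("CKMB", p.ckmb_ng_ml, 4.3)]
    return [name for name, value, upper in limits if value > upper]

panel = LabPanel(300.0, 0.08, 120.0, 3.0, 75.0)  # hypothetical values
```

Returning the list of stages reached, rather than a single verdict, keeps the stepwise character of the procedure visible.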
The database runs in a Windows 7 environment, with Microsoft SQL Server 2000 as the database management system and development tool and Microsoft VC++ 6.0 as the database front end; hardware: CPU AMD XP1800+, memory Kingston 3G DDR, hard disk Dall 600G.
The AI system is a PC with an Intel Core i5-8500 boxed processor (CPU main frequency 3.0 GHz), 16 GB of memory, Windows 7 x64 as the operating system, and Matlab 2010 as the development tool.
The data of the patient information comprises an ST-segment waveform and a T wave of the electrocardiogram data; troponin, myoglobin, creatine kinase isoenzyme content; d-dimer and arterial blood partial pressure. The characteristics of the database include suspected pulmonary embolism, acute coronary syndrome and myocardial infarction characteristics.
In this embodiment, artificial intelligence is combined with chest pain data examination; in particular, the ECG data, troponin, myoglobin, creatine kinase isoenzyme content, D-dimer and arterial blood partial pressure data are fused and identified, so that intelligent medical identification of pulmonary embolism, acute coronary syndrome (ACS) and myocardial infarction within ACS is realized and the sensitivity and accuracy of judgment are increased.
When the AI system is used for laboratory chest pain examination, the index information collected from the patient is first compared and judged. The index information comprises ECG data (mainly the ST-segment waveform and T wave); troponin, myoglobin and creatine kinase isoenzyme content; D-dimer; and arterial blood partial pressure. The diagnostic criteria found in the database for the patient's laboratory examination information are:
D-dimer is divided into intervals: when D-dimer is less than 500 µg/L, acute pulmonary thromboembolism is basically excluded; when D-dimer is greater than 500 µg/L, enhanced CT examination can be used to determine whether pulmonary embolism is present;
The myocardial injury markers are judged: when acute coronary syndrome occurs, the three myocardial infarction tests are abnormally elevated. The reference ranges are TNI (troponin I) 0-0.05 ng/ml, myoglobin 0-107 ng/ml and CKMB (creatine kinase isoenzyme) 0-4.3 ng/ml; TNI > 0.05 ng/ml, alone or together with one or both of the other two markers exceeding their ranges, indicates acute coronary syndrome;
Blood gas analysis is performed: 1) the normal adult arterial blood partial pressure is 83-108 mmHg; 2) a value below 83 mmHg indicates hypoxemia, and a value above 108 mmHg indicates hyperoxemia;
ECG data analysis is then used to further judge whether the acute coronary syndrome is a myocardial infarction: if 2-3 adjacent leads show ST-segment elevation (or depression) of at least 1 mm with an upward-arched contour, or the electrocardiogram shows abnormally tall T waves with two asymmetric limbs, the finding is ST-elevation acute myocardial infarction.
As shown in fig. 2, when the algorithm is applied, the semi-supervised learning in the first three steps above (steps a to c) is implemented with an improved Tri-Training algorithm. The specific process is as follows:
The algorithm inputs are: sample data for troponin, myoglobin, creatine kinase isoenzyme content, D-dimer and arterial blood partial pressure obtained through data sampling, divided into an unlabeled sample set $U$, a Must-link constraint set $M$ and a Cannot-link constraint set $C$, together with the maximum number of selected labeled samples $N_{max}$. The algorithm outputs are: the division of D-dimer into two intervals, the interval division of the myocardial injury markers, and the blood gas analysis. This division yields the classification of the common chest pain diseases in laboratory chest pain examination, namely pulmonary embolism and acute coronary syndrome, and the two diseases are divided into a labeled sample set $R'$ and an updated unlabeled sample set $U'$:
A1, initialize the number of labeled samples $N = 0$, $R' = \emptyset$, $U' = U$;
A2, then, for every sample pair $(x_i, x_j) \in M$, the corresponding inter-sample distance is calculated by the following formula:
Figure BDA0001980407870000093
where $d$ is the number of sample attributes. The sample pair $(x_{i'}, x_{j'})$ with the largest distance value is determined from $M$; $x_{i'}$ is the sample annotated by the user and $y_{i'}$ is the class obtained. $(x_{i'}, y_{i'})$ and $(x_{j'}, y_{i'})$ are added to $R'$, $x_{i'}$ and $x_{j'}$ are deleted from $U'$, and $N = N + 2$;
A3, for every sample pair $(x_p, x_q) \in C$, the corresponding inter-sample distance is calculated by the following formula:
Figure BDA0001980407870000095
The sample pair $(x_{p'}, x_{q'})$ with the smallest distance value is determined from $C$; $x_{p'}$ is the sample annotated by the user and $y_{p'}$ is the class obtained. $(x_{p'}, y_{p'})$ and $(x_{q'}, -y_{p'})$ are added to $R'$, where $-y_{p'}$ is the class opposite to $y_{p'}$; $x_{p'}$ and $x_{q'}$ are deleted from $U'$, and $N = N + 2$;
A4, if $N < N_{max}$, go to step A2; otherwise, the algorithm ends.
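Steps A1 to A4 can be sketched in Python as follows. Several details are assumptions rather than part of the patent: the distance is taken to be Euclidean (the original formula is an image that is not reproduced here), a constraint pair is dropped once it has been used so that the loop makes progress, the user annotation is modeled by a hypothetical `oracle` callback returning +1 or -1, and the loop also stops when both constraint sets are exhausted.

```python
import math

def euclidean(a, b):
    """Assumed inter-sample distance over the d attributes of two samples."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def seed_labels(samples, must_link, cannot_link, oracle, n_max):
    """Build the labeled set R' and the reduced unlabeled set U' from pair constraints."""
    labeled = {}                      # R': sample id -> class (+1 / -1)
    unlabeled = set(samples)          # U' = U
    m_pairs, c_pairs = list(must_link), list(cannot_link)
    n = 0                             # A1: N = 0
    dist = lambda pair: euclidean(samples[pair[0]], samples[pair[1]])
    while n < n_max and (m_pairs or c_pairs):   # A4: repeat while N < N_max
        if m_pairs:                   # A2: farthest Must-link pair shares one class
            i, j = max(m_pairs, key=dist)
            m_pairs.remove((i, j))
            y = oracle(i)
            labeled[i], labeled[j] = y, y
            unlabeled -= {i, j}
            n += 2
        if c_pairs:                   # A3: closest Cannot-link pair gets opposite classes
            p, q = min(c_pairs, key=dist)
            c_pairs.remove((p, q))
            y = oracle(p)
            labeled[p], labeled[q] = y, -y
            unlabeled -= {p, q}
            n += 2
    return labeled, unlabeled

# Hypothetical two-attribute samples and constraints.
samples = {"a": (0, 0), "b": (0, 1), "c": (5, 5), "d": (0, 3), "e": (4, 4)}
labeled, unlabeled = seed_labels(samples, [("a", "b"), ("a", "d")],
                                 [("b", "c"), ("d", "e")], lambda s: 1, n_max=4)
```

Choosing the farthest Must-link pair and the closest Cannot-link pair targets the constraints that a distance-based learner would be most likely to get wrong.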
Semi-supervised clustering is introduced on the basis of the improved Tri-Training algorithm. This addresses the problem that a classifier which depends excessively on labeled data suffers reduced classification performance, as well as the unnecessary sample misjudgments that arise when processing the myocardial injury marker, D-dimer and blood gas analysis data. Introducing the algorithm effectively improves learning performance and stability, produces a more appropriate data division, makes data sample processing more accurate, and improves the diagnostic accuracy for heart diseases. The comparison of classification results is shown in fig. 4.
Input: labeled samples
Figure BDA0001980407870000096
Unlabeled samples
Figure BDA0001980407870000097
where $u$ is the number of unlabeled samples; let the number of labeled samples be labelN, the total number of samples be allN, the number of positive samples be posN, and the number of negative samples be negN;
First, for each labeled data point, the Euclidean distance Kdist to the other points is calculated and sorted from small to large:
Figure BDA0001980407870000098
Second, the difference rate between two consecutive distances is calculated for the labeled data:
rate=abs(Kdist(i,j+1)-Kdist(i,j));
Third, the neighborhood parameter Eps and the input parameter MinPts of the clustering algorithm are obtained from the distance differences. For $x_j$ belonging to the data set $D = \{x_1, x_2, \ldots, x_m\}$, the Eps-neighborhood of $x_j$ contains the samples in $D$ whose distance from $x_j$ is not greater than Eps, and MinPts is the minimum number of objects in the Eps-neighborhood of a sample point. For MinPts = j, when the change between the last two values is less than 0.01, preliminary convergence is considered to have been reached; if this occurs more than twice, MinPts and Eps are given by the statements:
MinPts=j;
Eps1=mean(Kdist(1:posN,j));
Eps2=mean(Kdist(posN+1:posN+negN,j));
Eps3=mean(Kdist(1:posN+negN,j));
Finally, the labels are obtained: noise points are taken as negative samples and the rest as positive samples. The output is $y \in \{1, -1\}^u$, i.e., the label assignment of the unlabeled data. Let
Figure BDA0001980407870000101
be the prediction results of the clustering algorithm on the unlabeled data under different class proportions, where $T$ is the number of class proportions and $y_{middle}$ is the prediction result obtained by combining the positive and negative samples in the unlabeled data. For the semi-supervised clustering algorithm, the worst-case integration $y^*$ of the multiple prediction results can be expressed as:
Figure BDA0001980407870000102
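The final labeling rule of the clustering step (noise points become negative samples, the rest positive) can be sketched with a simplified density test in Python. This is not the patent's full procedure: it assumes a DBSCAN-style definition in which a point is noise when it is neither a core point nor inside the Eps-neighborhood of a core point, it counts a point as a member of its own neighborhood, and it takes Eps and MinPts as given instead of deriving them from Kdist.

```python
import math

def label_by_density(points, eps, min_pts):
    """Return +1 for points in dense regions and -1 for noise points."""
    n = len(points)
    # Eps-neighborhood of each point (the point itself is included: an assumption).
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    core = [len(nb) >= min_pts for nb in neighbors]
    labels = []
    for i in range(n):
        reachable = core[i] or any(core[j] for j in neighbors[i])
        labels.append(1 if reachable else -1)
    return labels

# Hypothetical data: four clustered points and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
labels = label_by_density(points, eps=1.5, min_pts=3)
```

The worst-case integration of several such label vectors over different class proportions is not shown, because its formula is not reproduced in the text.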
as shown in fig. 3, the AI system extracts the electrocardiographic signal of the patient information in step 4 by improving the support vector machine algorithm, and specifically includes the following steps:
s1, preprocessing the acquired electrocardiogram data, filtering noise, extracting time domain characteristics of the electrocardiogram data, and generating an electrocardiogram data training sample set; the electrocardio data training sample set comprises a normal P wave, a QRS wave group, a T wave, a PR interval, an RR interval and an ST segment, wherein the ST segment is lifted upwards (or pressed downwards) from the back of the bow, and two asymmetric T waves are used as additional extraction features;
S2, classifying the ECG training sample set: two parameters Z and Z* are set, an initial classifier is trained on the labeled samples (x_1, y_1), ..., (x_n, y_n) with the support vector machine algorithm, and the number n_abn of unlabeled examples x_1*, ..., x_k* that are to receive a positive label value is then set, where Z and Z* are parameters specified by training;
S3, classifying the unlabeled examples x_1*, ..., x_k* with the classifier obtained in S2 and assigning a label to each unlabeled sample according to the output value w × x_j* + b: the n_abn unlabeled samples with the largest output values are assigned y_j* = 1 and the remaining samples are assigned y_j* = -1; the parameters Z*_n and Z*_abn are then set; the samples are retrained to obtain a second classifier; with Z* set, a pair of test examples with different label values is found and their label values are exchanged so that the value of the optimization objective function in the formula is reduced as much as possible; this step is repeated until no pair satisfies the condition;
S4, gradually increasing the adjustment parameters Z*_n and Z*_abn and returning to S3; when Z*_n > Z* and Z*_abn > Z*, the algorithm ends and the final classification result is obtained, so as to recognize arched ST-segment elevation (or depression) in 2-3 adjacent leads and an abnormally tall T wave with two asymmetric limbs in the electrocardiogram.
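The first part of S3, assigning labels to the unlabeled samples from the classifier output, can be sketched as below. This is only an illustration under stated assumptions: the weight vector w and offset b are taken as already produced by the initial classifier of S2, and the function name and toy values are hypothetical. The retraining, label-swapping and parameter-increase loops of S3 and S4 are not shown.

```python
def assign_initial_labels(w, b, unlabeled, n_abn):
    """Give +1 to the n_abn unlabeled samples with the largest w.x + b, -1 to the rest."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in unlabeled]
    # Indices of the samples ordered from largest to smallest output value.
    ranked = sorted(range(len(unlabeled)), key=lambda i: scores[i], reverse=True)
    positive = set(ranked[:n_abn])
    return [1 if i in positive else -1 for i in range(len(unlabeled))]

# Toy example (hypothetical values): four unlabeled samples, two positives requested.
w, b = [1.0, 0.0], 0.0
unlabeled = [[3.0, 0.0], [1.0, 0.0], [2.0, 0.0], [-1.0, 0.0]]
initial_labels = assign_initial_labels(w, b, unlabeled, n_abn=2)
```

Fixing the number of positive labels in advance keeps the procedure from collapsing all unlabeled samples into a single class during the later label exchanges.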
When extracting features from the collected ECG data, the AI system performs the following steps:
S1.1, constructing a sparse binary random matrix, using it as the observation matrix Q, and observing the preprocessed ECG signal X according to the compressive sensing model Y = QX to obtain the compressed measurement Y of the ECG signal;
S1.2, skipping the ECG signal reconstruction step and extracting features directly from the compressed ECG data with an improved principal component analysis method to obtain the feature vector of the ECG signal;
As an embodiment of the present invention, a convolutional neural network (CNN) may also be used to extract the features of the ECG data. The CNN comprises three layers: layer one consists of convolutional layer Conv1, pooling layer Pooling1 and normalization layer BN1, where Conv1 has a 6 × 6 convolution kernel, a stride of 3 and 166 convolution kernels in total; layer two consists of convolutional layer Conv2 and normalization layer BN2, where Conv2 has a 5 × 5 convolution kernel, a stride of 3 and 128 convolution kernels in total; layer three consists of pooling layer Pooling3 and normalization layer BN3, where Conv3 has a 3 × 3 convolution kernel, a stride of 1 and 128 convolution kernels in total. An Inception structure, convolutional layer Conv4 and a global pooling layer are then connected in sequence, and the output is classified by a Softmax classifier to improve the classification precision and efficiency of the ECG data.
S1.3, taking the feature vectors of the normal P wave, QRS complex, T wave, PR interval, RR interval and ST segment extracted in S1.2 as the input of the classifier.
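The observation step of S1.1 can be sketched as follows. The matrix size, the number of ones per column and the fixed seed are assumptions made for illustration; the source does not state them. Only the measurement Y = QX is shown, and the improved principal component analysis of S1.2 is not reproduced here.

```python
import random

def sparse_binary_matrix(m, n, ones_per_column, seed=0):
    """Build an m x n observation matrix with a fixed number of ones in every column."""
    rng = random.Random(seed)
    q = [[0] * n for _ in range(m)]
    for col in range(n):
        for row in rng.sample(range(m), ones_per_column):
            q[row][col] = 1
    return q

def observe(q, x):
    """Compute the compressed measurement y = Q x."""
    return [sum(qij * xj for qij, xj in zip(row, x)) for row in q]

# Toy signal (hypothetical): 12 samples compressed to 8 measurements.
q = sparse_binary_matrix(m=8, n=12, ones_per_column=2)
x = [float(i) for i in range(12)]
y = observe(q, x)
```

Because Q contains only zeros and ones, each measurement is a plain sum of a few signal samples, which is what makes this kind of observation matrix attractive on low-power hardware.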
Processing of the database information comprises data preprocessing and feature selection, and specifically comprises the following steps:
S41, extracting the attribute features from the data: a recursive function is written with SQL statements to extract keywords, and a keyword is recorded as 1 if the disease it represents appears and as 0 otherwise;
S42, merging and integrating the data distributed across different databases or data tables;
S43, discretizing the feature attributes obtained in S42;
S44, cleaning the data obtained in S43 and deleting duplicate, abnormal and redundant data.
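Steps S41, S42 and S44 can be sketched as follows. The source performs the keyword extraction with SQL statements and a recursive function; this illustration swaps in plain Python for readability. The keyword list comes from the past-history example in the description, while the table layout, patient identifiers and function names are hypothetical. Discretization (S43) is omitted here.

```python
KEYWORDS = ("diabetes", "hypertension", "hepatitis")

def history_flags(text, keywords=KEYWORDS):
    """S41: record 1 if a keyword appears in the free text, else 0.
    Note: simple substring matching does not handle negations such as 'no diabetes'."""
    lowered = text.lower()
    return {kw: int(kw in lowered) for kw in keywords}

def merge_tables(*tables):
    """S42: merge rows from several tables, each a dict keyed by patient id."""
    merged = {}
    for table in tables:
        for patient_id, row in table.items():
            merged.setdefault(patient_id, {}).update(row)
    return merged

def drop_duplicates(rows):
    """S44: remove rows that repeat an earlier row exactly, keeping the first copy."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

# Toy records (hypothetical).
personal = {"p1": {"age": 63, "sex": "M"}}
history = {"p1": history_flags("Diabetes, hypertension")}
record = merge_tables(personal, history)["p1"]
```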
As an example of a chest pain laboratory test, if a patient has had a myocardial infarction, the following must be satisfied: the electrocardiogram shows an ST-segment deviation of at least 1 mm in 2-3 adjacent leads, with arched (convex-upward) elevation (or depression); or the electrocardiogram shows an abnormally tall T wave with two asymmetric limbs. In classifying and identifying the ECG data, the generalization capability and identification precision of the improved SVM algorithm are sufficient to identify ST-segment and T-wave abnormalities effectively, which further improves the auxiliary diagnosis of myocardial infarction.
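The adjacent-lead criterion in this example can be expressed as a simple rule check. The sketch below assumes that the ST deviations are given in millimetres, one value per lead, in an order where neighbouring list entries are adjacent leads; that ordering, the function name and the sample values are assumptions, and the check is an illustration rather than a diagnostic tool.

```python
def has_adjacent_st_shift(st_mm, threshold=1.0, min_leads=2):
    """True if at least min_leads neighbouring leads all show elevation
    (>= threshold) or all show depression (<= -threshold)."""
    run, last_sign = 0, 0
    for deviation in st_mm:
        # +1 for elevation, -1 for depression, 0 when below the threshold.
        sign = (deviation >= threshold) - (deviation <= -threshold)
        if sign == 0:
            run = 0
        elif sign == last_sign:
            run += 1
        else:
            run = 1
        last_sign = sign
        if run >= min_leads:
            return True
    return False

# Hypothetical ST deviations (mm) for four neighbouring leads.
example_positive = has_adjacent_st_shift([0.2, 1.2, 1.5, 0.1])
example_negative = has_adjacent_st_shift([1.2, 0.1, 1.3, 0.0])
```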
In this embodiment, the data must be preprocessed before the ECG data are collected and classified. The chest pain data set has a complex type structure: it contains purely numerical data, such as examination and test results, as well as text data, such as the patient's past history, personal history and disease course records. Text data cannot be input directly as features. In addition, the indexes in a test table have different units and their values are not on the same order of magnitude, so if the raw data are used directly as input parameters of a classification model, attributes with larger values may carry more weight in the classification process and degrade the classification result.
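One common way to put indexes of different magnitudes on the same scale is min-max normalization. The source does not name the scaling method it uses, so the sketch below is an assumption chosen for illustration, and the sample values are hypothetical.

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical raw values of one index across three patients.
scaled = min_max_scale([10, 20, 30])
```

After scaling, every index contributes values in the same range, so no attribute dominates the classifier merely because of its unit.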
The admission record table and the disease course record table in the chest pain data set exist as text; for example, the past history item in the admission record table may read: diabetes, hypertension, hepatitis. Therefore, the attribute features in the data are extracted first: a recursive function is written with SQL statements to extract keywords, and a keyword is marked as 1 if the disease it represents appears and as 0 otherwise. The data on a patient's chest pain are distributed across different data tables, such as the patient's personal information table and the various examination tables, so the scattered data must be integrated to give a more complete picture. When a heart disease patient is diagnosed, some feature information does not require specific values; only the range into which a value falls needs to be determined.
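Mapping a value to the range it falls in can be sketched with a sorted list of interval boundaries. The boundaries used below are hypothetical placeholders, not clinical thresholds from the source.

```python
from bisect import bisect_right

def discretize(value, edges):
    """Return the index of the interval that value falls in.
    edges must be sorted ascending; interval 0 is below edges[0]."""
    return bisect_right(edges, value)

# Hypothetical age boundaries: below 40, 40 to below 60, 60 and above.
AGE_EDGES = [40, 60]
age_bins = [discretize(age, AGE_EDGES) for age in (35, 40, 72)]
```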
When the ECG data are acquired, the relevant feature attributes are discretized; the discretized data are sparser, which speeds up the classifier during training of the classification model. Each ECG data table contains a large amount of data that is useless or of low diagnostic value for the current disease; such data interfere with subsequent analysis and reduce its accuracy, so the ECG data need to be cleaned. Likewise, the chest pain medical record data have many feature dimensions, some of which are of little or no use for chest pain classification; these features are deleted and the features with greater influence on the classification result are retained.
Attributes that are useless or of low value for chest pain diagnosis are deleted from each table. From the patient's personal information table, the 2 attributes of age and sex are selected; from the past history, the 3 attributes related to chest pain, namely history of hypertension, history of diabetes and history of hepatitis; from the personal history, the 1 attribute of whether the patient smokes; from the physical examination, the 5 attributes of body temperature, pulse, respiration, blood pressure and nutritional state; from the common symptoms, the 10 attributes of chest pain, dyspnea, palpitation, cyanosis, syncope, edema, cough, hemoptysis, heart failure and arrhythmia; and from the common physical signs, the 9 attributes of hypertension, hypotension, cardiac tremor, tachycardia, bradycardia, heart sound change, heart murmur, pulse abnormality and heart enlargement. The examination and test items selected are the typical ones: troponin, myoglobin and creatine kinase isoenzyme content; D-dimer; and arterial blood partial pressure.
In the present invention, a diagnosis is first made by the clinician. Based on the clinician's diagnosis, the artificial intelligence finds the diagnostic criteria A for the disease in the database and compares each criterion with the actual conditions B collected after the patient was admitted. If the comparison results match, the warning system is not activated. If the results are inconsistent, the artificial intelligence issues an alarm reminding the clinician to review his or her diagnosis. Experiments were performed on the database, and the method achieved an accuracy of 97.85%. The experimental results show that, compared with ECG signal classification in the non-compressed domain, about 33% of the ECG data can be compressed. The method is therefore feasible for wearable health monitoring systems with low-power and real-time requirements, and lays a foundation for future research on ECG signal processing in the compressed domain.
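The comparison of the diagnostic criteria A with the collected conditions B can be sketched as a per-criterion check. The representation of criteria as named boolean flags, the criterion names and the function names are assumptions made for illustration only.

```python
def mismatched_criteria(criteria_a, findings_b):
    """Return the names of criteria in A that the collected findings B do not match.
    A criterion that is missing from B counts as a mismatch."""
    return [name for name, expected in criteria_a.items()
            if findings_b.get(name) != expected]

def should_alarm(criteria_a, findings_b):
    """The warning is raised only when at least one criterion is inconsistent."""
    return bool(mismatched_criteria(criteria_a, findings_b))

# Hypothetical criteria for the clinician's diagnosis and findings after admission.
criteria_a = {"st_shift_in_adjacent_leads": True, "troponin_elevated": True}
findings_b = {"st_shift_in_adjacent_leads": True, "troponin_elevated": False}
alarm = should_alarm(criteria_a, findings_b)
```

Returning the list of mismatched criteria, rather than only a yes or no answer, lets the alarm tell the clinician which finding disagrees with the diagnosis.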
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (3)

1. The laboratory chest pain data examination auxiliary recognition system based on artificial intelligence supervised learning is characterized by comprising an AI system, wherein the AI system acquires clinician diagnosis information, patient laboratory examination information and database information and performs fusion analysis processing on electrocardiogram data, troponin, myoglobin, creatine kinase isoenzyme content, D-dimer and arterial blood sample partial pressure data examined by a patient laboratory;
the fusion analysis processing includes: the AI system adopts an improved Tri-Training algorithm to realize semi-supervised learning of the troponin, myoglobin, creatine kinase isoenzyme content, D-dimer and arterial blood sample partial pressure data; the algorithm input is: the sample data of troponin, myoglobin, creatine kinase isoenzyme content, D-dimer and arterial blood partial pressure obtained through data sampling, divided into an unlabeled sample set U, a Must-link constraint set M and a Cannot-link constraint set C, with the maximum number of labeled samples being N_max; the algorithm output is: the D-dimer divided into two intervals, the myocardial injury markers divided into intervals and the blood gas analysis divided into three intervals, namely a labeled sample set R' and an updated unlabeled sample set U';
the AI system processes the ECG signal in the patient information with an improved support vector machine algorithm, which comprises: preprocessing the collected ECG data, filtering out noise, extracting time-domain features of the ECG data and generating an ECG training sample set; the data obtained by feature extraction from the collected ECG data comprise the normal P wave, QRS complex, T wave, PR interval, RR interval and ST segment, and arched ST-segment elevation or depression and an abnormally tall T wave with two asymmetric limbs are used as additional extracted features; classifying the ECG training sample set: parameters Z and Z* are set, an initial classifier is trained on the labeled samples (x_1, y_1), ..., (x_n, y_n) with the support vector machine algorithm, and the number n_abn of unlabeled examples x_1*, ..., x_k* that are to receive a positive label value is then set, where Z and Z* are parameters specified by training; the unlabeled examples x_1*, ..., x_k* are classified by the classifier, and a label is assigned to each unlabeled sample according to the output value w × x_j* + b, where w is the weight and b is a constant parameter: the n_abn unlabeled samples with the largest output values are assigned y_j* = 1 and the remaining samples are assigned y_j* = -1; the parameters Z*_n and Z*_abn are then set; the samples are retrained to obtain a second classifier; with Z* set, a pair of test examples with different label values is found and their label values are exchanged so that the value of the optimization objective function in the formula is reduced as much as possible; the adjustment parameters Z*_n and Z*_abn are increased step by step, and when Z*_n > Z* and Z*_abn > Z* the algorithm ends, thereby identifying arched ST-segment elevation or depression in 2-3 adjacent leads and an abnormally tall T wave with two asymmetric limbs in the electrocardiogram;
the ECG data features are extracted with a convolutional neural network (CNN), wherein the CNN comprises three layers: layer one consists of convolutional layer Conv1, pooling layer Pooling1 and normalization layer BN1, where Conv1 has a 6 × 6 convolution kernel, a stride of 3 and 166 convolution kernels in total; layer two consists of convolutional layer Conv2 and normalization layer BN2, where Conv2 has a 5 × 5 convolution kernel, a stride of 3 and 128 convolution kernels in total; layer three consists of pooling layer Pooling3 and normalization layer BN3, where Conv3 has a 3 × 3 convolution kernel, a stride of 1 and 128 convolution kernels in total; an Inception structure, convolutional layer Conv4 and a global pooling layer are then connected in sequence, and the output is classified by a classifier, wherein the classifier adopts Softmax classification;
the AI system finds the diagnostic criteria for the patient's laboratory examination information in the database and compares them with the criteria derived from the clinician's diagnosis information after the patient is admitted; if the comparison results match, the warning system is not activated, and if the comparison results are inconsistent, the artificial intelligence issues an alarm reminding the clinician to examine the patient again.
2. The system of claim 1, wherein the database runs in a Windows 7 environment, the development tool is the Microsoft SQL Server 2000 database management system, the database front end is Microsoft VC++ 6.0, and the hardware is: CPU AMD XP1800+, Kingston 3G DDR memory, hard disk Dall 600G.
3. The laboratory chest pain data examination auxiliary recognition system based on artificial intelligence supervised learning of claim 1, wherein the AI system is a PC with an Intel Core i5-8500 boxed processor, CPU main frequency 3.0 GHz, 16G memory, operating system Windows 7 x64, and development tool Matlab 2010.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910147228.6A CN109907751B (en) 2019-02-27 2019-02-27 Laboratory chest pain data inspection auxiliary identification method based on artificial intelligence supervised learning

Publications (2)

Publication Number Publication Date
CN109907751A CN109907751A (en) 2019-06-21
CN109907751B true CN109907751B (en) 2021-02-02

Family

ID=66962562

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910147228.6A Active CN109907751B (en) 2019-02-27 2019-02-27 Laboratory chest pain data inspection auxiliary identification method based on artificial intelligence supervised learning

Country Status (1)

Country Link
CN (1) CN109907751B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111128320B (en) * 2019-11-19 2023-08-01 四川好医生云医疗科技有限公司 System for determining medical label based on test result and artificial intelligence method
CN113113133A (en) * 2021-03-31 2021-07-13 上海深至信息科技有限公司 Medical service providing system and method
CN116189899B (en) * 2023-04-26 2023-07-07 淄博市中心医院 Emergency critical illness auxiliary evaluation system based on machine learning

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1791355A (en) * 2003-05-19 2006-06-21 伊舍米娅技术公司 Apparatus and methods for risk stratification of patients with chest pain of suspected cardiac origin
CN101568837A (en) * 2006-08-07 2009-10-28 比奥-拉德巴斯德公司 Method for the prediction of vascular events and the diagnosis of acute coronary syndrome
CN203133863U (en) * 2012-11-23 2013-08-14 北京倍肯恒业科技发展有限责任公司 Digital cardiovascular accident early diagnosis system
US20130345581A1 (en) * 2012-06-20 2013-12-26 Randox Laboratories Limited Combination for early exclusion of acute myocardial infarction
CN105232028A (en) * 2015-09-29 2016-01-13 滕大志 Method and apparatus for monitoring cardiovascular testing data
CN108766562A (en) * 2018-05-31 2018-11-06 深圳市零度智控科技有限公司 Medical intelligent diagnostics platform and its operation method and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant