CN113539473A - Method and system for diagnosing brucellosis only by using blood routine test data - Google Patents

Method and system for diagnosing brucellosis only by using blood routine test data

Info

Publication number
CN113539473A
Authority
CN
China
Prior art keywords
data
brucellosis
sample
layer
classifier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110519657.9A
Other languages
Chinese (zh)
Inventor
陈超
宋彪
王哲
罗祎斐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inner Mongolia Weishu Data Technology Co ltd
Original Assignee
Inner Mongolia Weishu Data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inner Mongolia Weishu Data Technology Co ltd
Priority to CN202110519657.9A
Publication of CN113539473A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G16H50/80 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for detecting, monitoring or modelling epidemics or pandemics, e.g. flu

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The application provides a method and system for diagnosing brucellosis using only routine blood test data, and belongs to the field of laboratory medicine. The overall architecture comprises a data acquisition layer, a data processing layer, a model prediction layer and a reinforcement learning layer. The method overcomes the long turnaround time, complex operation and low accuracy of traditional detection, meets the needs of large-scale brucellosis screening, and makes laboratory testing more intelligent and automated.

Description

Method and system for diagnosing brucellosis only by using blood routine test data
Technical Field
The invention relates to the field of laboratory medicine, and in particular to a method and system for diagnosing brucellosis using only routine blood test data.
Background
Brucellosis, also known as Brucella disease, is a widespread and highly harmful zoonosis caused by infection with Brucella bacteria. Currently, etiological testing mainly consists of directly isolating Brucella from the blood, bone marrow, other tissues or body fluids of the patient, which is the gold standard for confirming the disease. The outcome of the test depends on the bacterial load in the sample, the isolation and culture method, the stage of the disease, whether antibiotics have been used, and other factors. This traditional method relies on the quality of the bacterial sample obtained. Because the culture period is long, isolating and culturing the bacteria from blood or bone marrow usually takes 5 days or more, and the success rate is low in the chronic stage of the disease or after antibiotics have been used. Handling patient tissue also increases the risk that laboratory technicians become infected with Brucella. In addition, results are strongly affected by the laboratory environment and the experience of the physician, so missed diagnoses are frequent. In short, existing brucellosis diagnosis remains slow, inaccurate, complex to perform and costly, which hampers timely diagnosis and treatment, large-scale screening and epidemiological investigation.
Disclosure of Invention
In view of the above, the invention establishes a convenient, rapid, sensitive, economical and practical laboratory detection method and system for brucellosis. It aims to solve the problems that existing diagnostic methods are inefficient and inaccurate at identifying the disease, are costly and experience-dependent, and cannot meet brucellosis prevention and control needs where medical resources are unevenly distributed, and to provide a sound basis for early clinical diagnosis of brucellosis.
To achieve the above objects, the invention provides a method and system for diagnosing brucellosis using only routine blood test data. The overall framework is divided into four layers: a data acquisition layer, a data processing layer, a model prediction layer and a reinforcement learning layer.
Data acquisition layer: collects data samples of the patient's routine blood test items; each sample comprises 22 indicators.
Data processing layer: analyzes the factors that influence the test sample data, determines and extracts feature dimensions, removes special values, unifies units of measurement, filters outliers, standardizes the data, enhances features, and fills missing values in the test sample data.
Model prediction layer: generates one or more classifiers suitable for intelligent identification of brucellosis from the standardized sample data, deploys them to the hospital laboratory information system (LIS), and receives real-time test data to complete intelligent identification of brucellosis.
Reinforcement learning layer: in practical use, samples confirmed as brucellosis are added to the training set as positive samples in real time; once enough positive samples have accumulated, the model is automatically retrained and its parameters updated to improve accuracy.
For the classifier, the sample data are divided into a positive group and a negative group according to the historical diagnosis associated with the test indicators, and after pairing, a training set, a validation set and a test set are established. The model is trained with the random forest algorithm, its parameters are tuned by grid search, and it is evaluated by the area under the ROC curve (AUC).
Compared with the prior art, the invention has the following beneficial effects: it can further improve the identification of brucellosis-positive cases, reduce testing cost, simplify the testing process, and reduce the risk of missed diagnoses in brucellosis prevention and control.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
Fig. 1 is a system framework of the present invention.
FIG. 2 is a random forest algorithm framework.
FIG. 3 is a flow of a random forest algorithm.
Fig. 4 is an effect diagram of an on-line hospital scene according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In current medical testing, the conventional serological Rose Bengal plate test and the serum agglutination test (SAT) remain the main methods for diagnosing brucellosis in most countries. Traditional methods depend on the quality of the bacterial sample obtained; because the bacterial culture period is long, they are also strongly affected by factors such as the laboratory environment and the experience of the physician, they are costly, and missed diagnoses occur easily.
To solve the above technical problems, the invention provides a method and system for diagnosing brucellosis using only routine blood test data. Test sample data are acquired and used to generate one or more classifiers suitable for intelligent identification of brucellosis; the classifiers are deployed to the hospital laboratory information system (LIS) platform, where they receive real-time test data and identify brucellosis in real time. Accumulating positive and negative control samples for reinforcement improves the classification accuracy of the brucellosis classifier. Models for each enumerated parameter setting are evaluated by the area under the ROC curve (AUC), and the best classifier is trained to classify test data intelligently, which lowers the cost of identifying, diagnosing and preventing brucellosis and further improves identification accuracy.
The method and system for diagnosing brucellosis using only routine blood test data of the present invention comprise the following steps:
1) acquiring test sample data, preprocessing the data and performing feature engineering, wherein the test sample data are labeled positive or negative according to the historical diagnosis result;
2) generating one or more classifiers suitable for intelligent identification of brucellosis based on the test sample data;
3) deploying the classifier to the hospital LIS platform and receiving real-time test data to complete real-time identification of brucellosis.
In one embodiment of the invention, the test sample data are extracted from the LIS for patients seen within the past three years. Furthermore, to improve the accuracy of brucellosis identification based on routine blood test data, samples are selected with reference to the historical diagnosis result: the 22 routine blood test indicators measured from three days before to two months after the diagnosis are selected.
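The selection window described above can be sketched as follows. This is a minimal illustration, assuming each blood routine record carries a collection date and each patient has a known diagnosis date. The function name `in_training_window`, the record layout, and the reading of "two months" as 60 days are assumptions made for illustration and are not stated in the application.

```python
from datetime import date, timedelta

# Assumed window bounds: the text gives "three days before" and "two months after";
# treating two months as 60 days is an assumption for illustration.
DAYS_BEFORE = 3
DAYS_AFTER = 60

def in_training_window(test_date: date, diagnosis_date: date) -> bool:
    """Return True if a blood routine test falls inside the selection window."""
    start = diagnosis_date - timedelta(days=DAYS_BEFORE)
    end = diagnosis_date + timedelta(days=DAYS_AFTER)
    return start <= test_date <= end

# Hypothetical records: (patient_id, test_date)
records = [
    ("p1", date(2020, 3, 1)),   # 4 days before diagnosis: outside the window
    ("p1", date(2020, 3, 3)),   # 2 days before diagnosis: inside
    ("p1", date(2020, 4, 20)),  # 46 days after diagnosis: inside
    ("p1", date(2020, 6, 1)),   # 88 days after diagnosis: outside
]
diagnosis = {"p1": date(2020, 3, 5)}

selected = [r for r in records if in_training_window(r[1], diagnosis[r[0]])]
```

Only the two records inside the window are retained as candidate training data for that patient.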
After the test sample data are acquired, they are preprocessed so that they meet the requirements of classifier training. Preprocessing comprises region transposition, variable screening and missing value filling. To handle the large number of test indicators, this embodiment uses principal component analysis (PCA) for variable screening. Missing values are filled using traditional methods such as the median, mean and mode and, in addition, an adversarial strategy that generates the missing data.
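The traditional filling methods named above (median, mean, mode) can be illustrated with a short sketch. It covers only those baseline strategies; the generative strategy and the PCA screening mentioned in the text are not shown. The function name `fill_missing` and the example values are hypothetical.

```python
from statistics import mean, median, mode

def fill_missing(column, strategy="median"):
    """Replace None entries in one indicator column with a summary statistic."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    # Map the strategy name to the corresponding standard-library function.
    fill = {"median": median, "mean": mean, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in column]

# Hypothetical white blood cell counts with two missing entries.
wbc = [5.0, None, 7.0, 6.0, None]
filled = fill_missing(wbc, "median")
```

In practice the statistic would be computed on the training set only and reused for validation, test and real-time data, so that no information leaks from held-out samples.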
Further, data filtering is a key step in feature engineering. According to the characteristics of clinical data, the data are divided into high-peak distribution data and low-peak distribution data. For high-peak distribution data, sample values are densely concentrated within a given threshold range; for low-peak distribution data, values within the same range are less concentrated. Error detection is slower on low-peak distribution data than on high-peak distribution data, and data filtering can improve the computational efficiency of the algorithm. The invention selects a relatively conservative data filtering method, the isolation forest algorithm, to retain as much usable data as possible and to ensure generalization in clinical settings.
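To make the filtering idea concrete, here is a simplified, standard-library sketch of isolation-forest-style outlier filtering. It is not the applicant's implementation: each tree is grown on the full data set rather than a subsample, rows are assumed to be tuples of numbers, and the names `filter_outliers` and `contamination` as well as the example values are assumptions for illustration.

```python
import math
import random

def _avg_path(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

def _grow(rows, depth, max_depth, rng):
    """Recursively isolate rows with random axis-aligned splits."""
    if depth >= max_depth or len(rows) <= 1:
        return ("leaf", len(rows))
    feature = rng.randrange(len(rows[0]))
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:
        return ("leaf", len(rows))
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return ("node", feature, split,
            _grow(left, depth + 1, max_depth, rng),
            _grow(right, depth + 1, max_depth, rng))

def _path_length(tree, row, depth=0):
    """Depth at which `row` ends up, plus a correction for unsplit leaves."""
    if tree[0] == "leaf":
        return depth + _avg_path(tree[1])
    _, feature, split, left, right = tree
    branch = left if row[feature] < split else right
    return _path_length(branch, row, depth + 1)

def filter_outliers(rows, contamination=0.05, n_trees=100, seed=0):
    """Drop the `contamination` fraction of rows with the highest anomaly score."""
    rng = random.Random(seed)
    max_depth = math.ceil(math.log2(max(len(rows), 2)))
    forest = [_grow(rows, 0, max_depth, rng) for _ in range(n_trees)]
    norm = _avg_path(len(rows)) or 1.0

    def score(row):
        # Short average paths (easy to isolate) give scores close to 1.
        mean_path = sum(_path_length(t, row) for t in forest) / n_trees
        return 2.0 ** (-mean_path / norm)

    n_drop = round(contamination * len(rows))
    dropped = set(sorted(rows, key=score, reverse=True)[:n_drop])
    return [r for r in rows if r not in dropped]

# Hypothetical two-indicator samples: 20 clustered rows and one extreme row.
inliers = [(4.0 + 0.1 * i, 120.0 + i) for i in range(20)]
rows = inliers + [(50.0, 400.0)]
kept = filter_outliers(rows, contamination=0.05)
```

A small `contamination` value corresponds to the conservative filtering the text describes: only the most easily isolated samples are removed, and the bulk of the data is kept for training.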
Data filtering uses the isolation forest algorithm, in which the proportion of data filtered out is controlled by a hyperparameter, the anomaly proportion (threshold). The labeled positive and negative data sets are paired and randomly divided into a training set, a validation set and a test set in a 7:1:2 ratio.
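A random 7:1:2 division can be sketched as below. The pairing of positive and negative samples is not reproduced here; the function name `split_7_1_2` and the fixed seed are assumptions for illustration.

```python
import random

def split_7_1_2(samples, seed=0):
    """Shuffle samples and split them into training, validation and test sets (7:1:2)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 7 // 10
    n_val = n // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder, roughly 20%
    return train, val, test

samples = list(range(100))  # stand-ins for labeled sample records
train, val, test = split_7_1_2(samples)
```

Fixing the seed makes the split reproducible, which matters when the same test set is used to compare classifiers trained with different parameters.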
In the embodiment, a random forest algorithm is adopted, and the parameters of the classifier are finally determined by a parameter grid search method.
The random forest algorithm is one of the most commonly used algorithms in classification analysis. Proposed by Leo Breiman and Adele Cutler, it combines bootstrap aggregation (bagging) with the random subspace method to construct a set of decision trees. Each decision tree learns and grows independently, without pruning, and makes its own prediction. The trees are then combined by an ensemble strategy to solve a single classification task. As a result, the prediction of the random forest is generally better than that of any individual tree in it.
The random forest framework is shown in Fig. 2. It comprises multiple decision tree classifiers, and the output class is determined by the mode of all the trees' classification results. When constructing a single decision tree, the random forest algorithm uses two random selection steps: the first randomly selects training samples, and the second randomly selects feature attributes of the samples. After all decision trees are constructed, the final classification result is decided by weighted voting. The random forest algorithm offers high accuracy, can evaluate feature importance, can balance errors on class-imbalanced data, learns quickly and resists overfitting.
As shown in Fig. 3, the random forest algorithm proceeds as follows. First, the number of input classes, the tree size and depth, the hyperparameters, the attribute filtering rule and the termination rule of the random forest model are determined. For each tree, a training set of size N is drawn with replacement using the bootstrap method. At each node, k features are randomly selected and compared, and the best one is chosen to split the data set. Each decision tree is generated recursively without pruning, yielding the random forest classifier model.
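The two random selection steps, bootstrap sampling of training rows and random selection of candidate features at a node, can be sketched as follows. The helper names and the choice of 4 candidate features out of 22 are assumptions for illustration; the application does not state the value used.

```python
import random

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement: the per-tree training set of size N."""
    return [rng.choice(rows) for _ in rows]

def candidate_features(n_features, k, rng):
    """Pick k distinct feature indices to compare at one node."""
    return rng.sample(range(n_features), k)

rng = random.Random(42)
rows = [(i, i % 2) for i in range(10)]     # hypothetical (sample_id, label) rows
sample = bootstrap_sample(rows, rng)       # some rows repeat, some are left out
features = candidate_features(22, 4, rng)  # 22 indicators, 4 candidates (assumed)
```

Because each tree sees a different bootstrap sample and a different feature subset at every node, the trees make partly independent errors, which is what the voting step exploits.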
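In the test set, the probability that an unknown sample $x$ is classified into class $c$ is:
$$P(c \mid x) = \frac{1}{k} \sum_{i=1}^{k} I\big(h_i(x) = c\big)$$
where $k$ is the number of trees, $h_i(x)$ is the class predicted by the $i$-th tree and $I(\cdot)$ is the indicator function. The class is determined by majority voting:
$$\hat{c} = \arg\max_{c} P(c \mid x)$$
The classification error is calculated at the same time.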
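A minimal sketch of the voting step, assuming the probability of class c is taken as the fraction of the k trees that vote for c. The function names and the example votes are hypothetical.

```python
from collections import Counter

def class_probabilities(tree_votes):
    """Fraction of trees voting for each class, given one predicted label per tree."""
    k = len(tree_votes)
    counts = Counter(tree_votes)
    return {label: n / k for label, n in counts.items()}

def majority_vote(tree_votes):
    """Return the class with the highest vote fraction."""
    probs = class_probabilities(tree_votes)
    return max(probs, key=probs.get)

# Hypothetical predictions of k = 5 trees for one unknown sample.
votes = ["positive", "negative", "positive", "positive", "negative"]
probs = class_probabilities(votes)
prediction = majority_vote(votes)
```

Here three of the five trees vote "positive", so the sample is assigned to the positive class with an estimated probability of 0.6.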
When optimizing the random forest algorithm, the two parameters with the greatest influence, the number of trees and the tree depth, are enumerated, and the best setting is chosen from the experimental results across the depths tried. The number of trees is traversed from 50 to 500 in steps of 10, and the tree depth is likewise traversed from 50 to 500 in steps of 10. The area under the ROC curve (AUC) is computed for every combination and used as the evaluation metric.
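The enumeration can be sketched as an exhaustive loop over both ranges. The scoring function below is a stand-in with a made-up shape; in a real run it would train a random forest with the given settings and return its validation AUC. The names `grid_search` and `toy_auc` are hypothetical.

```python
from itertools import product

# Ranges taken from the text: 50 to 500 inclusive, in steps of 10.
N_TREES = range(50, 501, 10)
MAX_DEPTH = range(50, 501, 10)

def grid_search(evaluate_auc):
    """Return (best_auc, n_trees, max_depth) over every combination in the grid."""
    best = None
    for n_trees, max_depth in product(N_TREES, MAX_DEPTH):
        auc = evaluate_auc(n_trees, max_depth)
        if best is None or auc > best[0]:
            best = (auc, n_trees, max_depth)
    return best

def toy_auc(n_trees, max_depth):
    """Stand-in score (hypothetical), peaking at 200 trees and depth 100."""
    return 0.9 - abs(n_trees - 200) / 1000 - abs(max_depth - 100) / 1000

best_auc, best_trees, best_depth = grid_search(toy_auc)
```

Each range has 46 values, so the full grid contains 46 × 46 = 2116 combinations, each requiring one training run.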
When evaluating the classifier, training is judged successful if the AUC is greater than 0.8. If the AUC is less than 0.8, training is judged unsuccessful, and the process returns to the data acquisition stage to collect data and train the classifier again.
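The acceptance check can be illustrated with a direct AUC computation: the AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The function names and example scores are hypothetical, and treating an AUC of exactly 0.8 as unsuccessful is an assumption, since the text does not cover that case.

```python
AUC_THRESHOLD = 0.8  # acceptance threshold given in the text

def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def training_succeeded(labels, scores):
    """Apply the acceptance rule: success only if AUC exceeds the threshold."""
    return roc_auc(labels, scores) > AUC_THRESHOLD

# Hypothetical test-set labels (1 = brucellosis) and classifier scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

In this example 8 of the 9 positive/negative pairs are ranked correctly, giving an AUC of about 0.89, so the classifier would be accepted.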
In one embodiment of the invention, after training is completed, the classifier is deployed on the hospital's LIS platform. Each testing instrument in the laboratory transmits its data to the LIS platform through middleware, and the classifier performs real-time intelligent identification of brucellosis on the test data.
During intelligent identification, the system collects each routine blood test indicator for the patient and makes a prediction. When the classifier detects a brucellosis-positive result, "suspected brucellosis" is displayed in real time on the relevant LIS interface and compared with the result of the laboratory physician's conventional examination. If the two are inconsistent, the system alerts the physician that a diagnosis may have been missed and advises a recheck to rule out problems arising from other factors. When the classifier does not detect a positive result, nothing is shown on the LIS interface, indicating that the test data do not show features of brucellosis. Samples confirmed as brucellosis are added to the training set as positive samples in real time, and once enough have accumulated, the model is automatically retrained and its parameters updated to improve accuracy.
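The decision logic on the LIS side and the retraining trigger can be sketched as follows. The message strings, the class name `TrainingPool` and the retraining threshold are assumptions for illustration; the text only says that retraining happens once samples have accumulated "to a certain degree".

```python
def lis_message(classifier_positive, physician_positive):
    """Return the text to show on the LIS interface, or None to show nothing."""
    if not classifier_positive:
        return None  # no brucellosis features detected: nothing is displayed
    if physician_positive:
        return "suspected brucellosis"
    # The classifier and the conventional examination disagree.
    return "suspected brucellosis: possible missed diagnosis, recheck advised"

class TrainingPool:
    """Collects confirmed positive samples and signals when to retrain."""

    def __init__(self, retrain_after):
        self.retrain_after = retrain_after  # assumed threshold, not from the text
        self.positives = []
        self.since_last_training = 0

    def add_confirmed_positive(self, sample):
        """Store a confirmed positive; return True when retraining is due."""
        self.positives.append(sample)
        self.since_last_training += 1
        if self.since_last_training >= self.retrain_after:
            self.since_last_training = 0
            return True
        return False

pool = TrainingPool(retrain_after=3)
triggers = [pool.add_confirmed_positive({"sample_id": i}) for i in range(4)]
```

With a threshold of 3, the third confirmed sample triggers retraining and the counter restarts, while all confirmed samples remain in the pool for future training runs.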
With the method and system for diagnosing brucellosis using only routine blood test data, the long turnaround time, complex operation and low accuracy of traditional detection can be overcome and the needs of large-scale brucellosis screening can be met. Laboratory testing becomes more intelligent and automated, the accuracy and efficiency of brucellosis identification improve, the risk of missed cases falls, and the workload of laboratory staff is reduced.
Although the present application has been described in detail with reference to preferred embodiments, those skilled in the art will understand that various modifications and equivalent arrangements can be made without departing from the spirit and scope of the present application.

Claims (7)

1. A method and system for diagnosing brucellosis using only blood routine test data, the system comprising:
a data acquisition layer, used for collecting data samples of the patient's routine blood test items, each sample comprising 22 indicators;
a data processing layer, used for analyzing the factors that influence the test sample data, removing special values, unifying units of measurement, filtering outliers, standardizing the data, enhancing features and filling missing values in the test sample data;
a model prediction layer, used for generating one or more classifiers suitable for intelligent identification of brucellosis from standardized sample data, deploying the classifiers to the hospital LIS, and receiving real-time test data to complete intelligent identification of brucellosis;
a reinforcement learning layer, used in practical application for adding samples confirmed as brucellosis to the training set as positive samples in real time, and for automatically retraining and updating the model parameters once enough samples have accumulated, thereby improving model accuracy.
2. The method and system for diagnosing brucellosis using only blood routine test data as claimed in claim 1, wherein in the data acquisition layer, the units of measurement follow standards scientifically determined from the relevant indicators used in China.
3. In the model training scenario, the 22 routine blood test indicators measured from three days before to two months after the diagnosis are selected as the data source, and the historical diagnosis result is used as the label; in the real-time detection scenario, only data samples of the patient's routine blood test items are collected, each comprising 22 indicators.
4. The method and system for diagnosing brucellosis using only blood routine test data as claimed in claim 1, wherein in the data processing layer, data preprocessing comprises region transposition, variable screening and missing value filling of the test sample data; missing values are filled using traditional methods such as the median, mean and mode and, in addition, an adversarial strategy that generates the missing data; and during feature engineering, the isolation forest algorithm is used to filter the data.
5. The method and system for diagnosing brucellosis using only blood routine test data as claimed in claim 1, wherein in the model prediction layer, the model is trained with the random forest algorithm, classifier parameters such as the number of trees and the tree depth are tuned by grid search, and the area under the ROC curve (AUC) is computed for every combination and used as the evaluation metric.
6. In the intelligent brucellosis identification process, when the classifier detects a brucellosis-positive result, the case is judged as suspected brucellosis and compared with the result of the laboratory physician's conventional examination; if the two are inconsistent, the case is judged to be a missed diagnosis by the physician; when the classifier does not detect a positive result, the test data are judged not to show features of brucellosis.
7. The method and system for diagnosing brucellosis using only blood routine test data as claimed in claim 1, wherein in the reinforcement learning layer, samples confirmed as brucellosis are added to the training set as positive samples in real time, and once enough positive samples have accumulated, the model parameters are automatically retrained and updated to improve model accuracy.
CN202110519657.9A 2021-05-12 2021-05-12 Method and system for diagnosing brucellosis only by using blood routine test data Pending CN113539473A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110519657.9A CN113539473A (en) 2021-05-12 2021-05-12 Method and system for diagnosing brucellosis only by using blood routine test data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110519657.9A CN113539473A (en) 2021-05-12 2021-05-12 Method and system for diagnosing brucellosis only by using blood routine test data

Publications (1)

Publication Number Publication Date
CN113539473A true CN113539473A (en) 2021-10-22

Family

ID=78095405

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110519657.9A Pending CN113539473A (en) 2021-05-12 2021-05-12 Method and system for diagnosing brucellosis only by using blood routine test data

Country Status (1)

Country Link
CN (1) CN113539473A (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112635069A (en) * 2020-12-14 2021-04-09 内蒙古卫数数据科技有限公司 Intelligent pulmonary tuberculosis identification method based on conventional test data

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115019916A (en) * 2022-05-27 2022-09-06 山东大学 Method and system for predicting blood stream infection pathogenic bacteria
CN115223709A (en) * 2022-07-26 2022-10-21 内蒙古卫数数据科技有限公司 Missing value filling migration learning method based on disease distribution diagnosis neural network model
CN115223709B (en) * 2022-07-26 2024-01-23 内蒙古卫数数据科技有限公司 Deficiency value filling migration learning method based on cloth disease diagnosis neural network model

Similar Documents

Publication Publication Date Title
CN108389201A (en) The good pernicious sorting technique of Lung neoplasm based on 3D convolutional neural networks and deep learning
CN107066791A (en) A kind of aided disease diagnosis method based on patient's assay
CN108511055B (en) Ventricular premature beat recognition system and method based on classifier fusion and diagnosis rules
WO2022198761A1 (en) Asthma diagnosis system based on decision tree and improved smote algorithms
CN108304887A (en) Naive Bayesian data processing system and method based on the synthesis of minority class sample
CN112652361B (en) GBDT model-based myeloma high-risk screening method and application thereof
CN113539473A (en) Method and system for diagnosing brucellosis only by using blood routine test data
CN107169284A (en) A kind of biomedical determinant attribute system of selection
CN112635069A (en) Intelligent pulmonary tuberculosis identification method based on conventional test data
CN111370126B (en) ICU mortality prediction method and system based on punishment integration model
CN113392894A (en) Cluster analysis method and system for multi-group mathematical data
CN113470816A (en) Machine learning-based diabetic nephropathy prediction method, system and prediction device
WO2023198224A1 (en) Method for constructing magnetic resonance image preliminary screening model for mental disorders
CN112950614A (en) Breast cancer detection method based on multi-scale cavity convolution
CN107545133A (en) A kind of Gaussian Blur cluster calculation method for antidiastole chronic bronchitis
CN114970637A (en) Lightweight arrhythmia classification method based on deep learning
CN117116477A (en) Construction method and system of prostate cancer disease risk prediction model based on random forest and XGBoost
CN113744869B (en) Method for establishing early screening light chain type amyloidosis based on machine learning and application thereof
Nagadeepa et al. Artificial intelligence based cervical cancer risk prediction using m1 algorithms
CN112767349B (en) Reticulocyte identification method and system
Ingle et al. Lung cancer types prediction using machine learning approach
CN111257558B (en) Machine learning-based chronic lymphocytic leukemia tumor cell identification method
CN112116559A (en) Digital pathological image intelligent analysis method based on deep learning
Beaulah et al. Lung Melanoma Recognition through Platelet Count Composition Using Supervised ML Technique
CN117116475A (en) Method, system, terminal and storage medium for predicting risk of ischemic cerebral apoplexy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20211022