CN109360657B - Time period reasoning method for selecting samples of hospital infection data - Google Patents

Publication number
CN109360657B
Authority
CN
China
Prior art keywords
sample
data
time period
infection
patient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811129775.3A
Other languages
Chinese (zh)
Other versions
CN109360657A (en
Inventor
李栋栋
胡必杰
高晓东
牛耀军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Lilian Information Technology Co ltd
Zhongshan Hospital Fudan University
Original Assignee
Shanghai Lilian Information Technology Co ltd
Application filed by Shanghai Lilian Information Technology Co ltd
Priority to CN201811129775.3A
Publication of CN109360657A
Application granted
Publication of CN109360657B

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention discloses a time period inference method for selecting samples of hospital infection data. During data sampling, the recorded diagnosis date is taken as the reference, and samples are selected from the true infection date or from the days closest to it. The samples are extracted over a "previous time period" inferred backward from the reference and a "subsequent time period" inferred forward from it, and the lengths of the two periods are estimated by averaging. The beneficial effects are as follows: the time period inference method solves the sample selection problem and effectively obtains infection samples from the period in which the patient is in the infected state; the method is generalizable and can be tried on other time series data, provided the basic conditions are similar to those described here.

Description

Time period reasoning method for selecting samples of hospital infection data
Technical Field
The invention relates to hospital infection data mining, and in particular to a time period inference method for selecting hospital infection data samples during the analysis and modeling of hospital infection big data.
Background
Nosocomial infection causes substantial economic loss and casualties every year. Analyzing and modeling nosocomial infection data is among the more troublesome problems in medical data analysis: the data quality is poor, samples are difficult to construct, and no good precedent exists to guide the work. Nevertheless, as nosocomial infection events receive more attention, building a monitoring and early warning model that monitors and warns of nosocomial infection cases in real time, and thereby helps clinicians intervene and treat in time, is of great value. In recent years hospitals have begun to establish their own hospital infection monitoring information systems, but these monitoring and early warning systems perform poorly. The main cause is the difficulty of analyzing and modeling hospital infection big data: there are no successful cases to serve as guidance and reference, and each existing case solves only a small part of the problem and does not comprehensively describe or analyze the difficulties of hospital infection modeling. Solutions for modeling the data have also been proposed in the literature, but various problems remain.
For example, the literature (forest pine, queen culture, liuwei, etc.; medical examination data preprocessing method research [ J ]. computer application research, 2017,34(4): 1089-.
For another example, in the literature (Kotsiantis S B, Kanellopoulos D, Pintelas P E. Data preprocessing for supervised learning [J]. International Journal of Computer Science, 2006, 1(2): 111-), missing values may be handled with means, special values, and the like. However, these methods are not well suited to the present modeling task. The ultimate goal of modeling is to provide early warning or real-time monitoring of hospital-infected patients and, most importantly, to supply the basis for each warning. That basis should show the patient's true values rather than processed values, so that the physician can make a reasonable diagnosis. Directly modifying values or substituting special values is therefore less suitable in this case.
For another example, the literature (plum red, Liangpei Feng, Pandong peak, et al. Application research of an autoregressive moving average mixed model in hospital infection incidence prediction [J]. China Hospital Infection journal, 2013, 23(11): 2693) provides a time series model that monitors the development trend of hospital infection, with the aim of early warning and reducing the risk of hospital infection. However, this early warning model has two obvious disadvantages. First, it monitors the incidence of nosocomial infection only indirectly; such work is generally retrospective, is difficult to apply in advance or in real time, and does not allow timely intervention and treatment. Second, it is a formula-type calculation model that lacks interpretability and makes causes hard to analyze. In addition, the model was built on data from Ningxia People's Hospital and has not been tested extensively in other hospitals, so its generalizability remains to be verified.
The main difficulties encountered in the analysis and modeling of hospital infection big data are the following:
(1) The problem of missing hospital infection data. Hospital infection data are time-sensitive, so the time range of a patient's test data must be considered whenever the data are used. Because the data also contain missing values, analyzing and modeling hospital infection big data becomes more difficult;
(2) The problem of dividing hospital infection data into positive and negative samples. Hospital infection data samples fall mainly into two types, infection samples and non-infection samples, and how to divide them into positive and negative examples is an important problem. In practice the problem is complex. Non-infection samples are easy to obtain: several days of data are randomly extracted from patients without nosocomial infection. Selecting infection samples is harder, because most patients with nosocomial infection stay in hospital for a long time but are in the infected state for only part of that stay and are normal the rest of the time, so it is difficult to obtain data from the infected state. A patient who has been diagnosed or reported as having a nosocomial infection generally has an "infection date" diagnosed by a doctor, referred to herein as the "diagnosis date", which marks the day on which the patient is determined to have been infected. The simplest approach is to take the data of the diagnosis date as the infection sample. However, actual investigation shows that this date is the doctor's inferred date and is inaccurate in most cases: the date on which the patient actually became infected may fall before or after it, because the date is not recorded strictly. Similar problems are described in the literature (Zhang Wei, Meng Hui, Zheng Jia, et al. Study of different statistical methods of hospital infection miss-reporting rates [J]. China Hospital infectivity, 2006, 1.).
Therefore, there is a need in the art for improvements in the analysis and modeling of hospital infection big data for the above-mentioned deficiencies.
Disclosure of Invention
In view of the above defects in the prior art, the technical problem to be solved by the present invention is to provide a time period inference method for selecting hospital infection data samples, so as to solve the problem of selecting hospital infection data samples in the analysis and modeling processes of hospital infection big data.
Before proceeding with the summary of the invention, it is necessary to explain and define terms appearing in the document.
Effective time range: the time range over which a patient's test data remain valid. For example, data such as body temperature, stool frequency, heart rate and respiratory rate are highly time-sensitive and differ from day to day, so they may be used for 24 hours, and data older than 24 hours need not be considered. Data such as microbiological examinations and laboratory examinations are less time-sensitive, and data from within three to five days can be considered valid, so they may be used for 72 or 120 hours. This range is referred to as the "effective time range". It is generally determined from experience or from data in reference documents, and can also be set according to the actual modeling purpose; a recommended reference is the action time of some features in the hospital infection diagnosis standard (trial).
The diagnosis date: a patient who has been diagnosed or reported as having a nosocomial infection typically has an "infection date" diagnosed by a physician, referred to herein as the "diagnosis date".
The infection date: the date on which the patient actually developed the infection.
The previous time period: when infection samples are selected with the diagnosis date as the reference date, the length of time, in time units, inferred backward from the reference date.
The subsequent time period: when infection samples are selected with the diagnosis date as the reference date, the length of time, in time units, inferred forward from the reference date.
In order to solve the problems, the invention provides a time period reasoning method for selecting hospital infection data samples, which comprises the following steps:
step 1, determining the features of the hospital infection data and classifying them according to the effective time range; the feature set is denoted F, and k denotes the kth feature in F;
step 2, denoting the set of all patients as S, obtaining a patient m in S, and generating a positive and negative sample set N for patient m;
step 3, after the positive and negative sample set N is generated in step 2, denoting the set of hospital-infected patients as C and the set of their diagnosis dates as Cd;
step 4, randomly extracting n patients from the set C and obtaining the diagnosis dates corresponding to the n patients;
step 5, re-diagnosing the n patients from step 4 and obtaining the arrays A_pre and A_end, which consist of the "previous time period" and "subsequent time period" of each of the n patients;
step 6, summing each of the two arrays from step 5 and averaging to obtain avg_pre = sum(A_pre)/n and avg_end = sum(A_end)/n; these two averages serve as the two parameters of time period inference for all patients in set C, approximating the "previous time period" and "subsequent time period" of every patient in C;
step 7, updating the data to generate a sample set D and carrying out a modeling test on D;
step 8, repeatedly fine-tuning avg_pre and avg_end according to the test results to obtain the final required values.
It should be noted that the positive samples in step 2 are samples of a patient m with a nosocomial infection, and the negative samples are samples of a patient m without a nosocomial infection.
Further, if m is a patient in the positive samples, m denotes the mth patient in S; if m is a patient in the negative samples, m is a randomly drawn patient.
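Steps 4 to 6 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that re-diagnosis yields, for each sampled patient, the first and last day judged to be in the infected state, and that the previous and subsequent time periods are measured in whole days from the diagnosis date. The function name `estimate_periods` and the example dates are hypothetical.

```python
from datetime import date

def estimate_periods(reviewed):
    """Average the per-patient previous/subsequent periods (steps 5 and 6).

    reviewed: list of (diagnosis_date, first_infected_day, last_infected_day),
    one tuple per re-diagnosed patient (assumed input format).
    """
    # A_pre: days from the first infected day up to the diagnosis date.
    a_pre = [(diag - first).days for diag, first, _last in reviewed]
    # A_end: days from the diagnosis date to the last infected day.
    a_end = [(last - diag).days for diag, _first, last in reviewed]
    n = len(reviewed)
    return sum(a_pre) / n, sum(a_end) / n

# Three hypothetical re-diagnosed patients drawn from set C.
reviewed = [
    (date(2018, 3, 10), date(2018, 3, 8), date(2018, 3, 12)),
    (date(2018, 4, 2), date(2018, 4, 1), date(2018, 4, 6)),
    (date(2018, 5, 20), date(2018, 5, 17), date(2018, 5, 21)),
]
avg_pre, avg_end = estimate_periods(reviewed)
```

The two averages are only starting values; step 8 adjusts them according to the modeling test.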
Further, the data updating in step 7 uses an incremental updating method, which includes the following steps:
step 7a, sorting the positive and negative sample set N from step 2 in ascending order of time, so that time runs from earliest to latest during incremental updating and a new value always overwrites an old value;
step 7b, storing the earliest sample i of the sample set N in the sample set D, classifying its features into the set T according to the features of the hospital infection data determined in step 1, and recording Tk_v and Tk_date, which represent the value of the kth feature in T for sample i and the date of that value;
step 7c, for the second and all subsequent samples i in the sample set N, judging whether each value is missing, updating the missing values and retaining the non-missing values. If the value Tk_v of feature Tk of sample i is missing, the sample set D is searched in reverse order for the value Tk_v and date Tk_date of feature Tk. If the value found in D is not empty and the difference between its Tk_date and the Tk_date of sample i does not exceed the effective time range, the value is taken out and written into Tk_v of sample i to replace the missing value. If the value in D is not empty but exceeds the effective time range, the traversal is exited and the kth feature of sample i remains missing. If the value in D is empty, the traversal continues with the next value. Reverse-order traversal is required so that the sample traversed in D is always the one closest in time to the current sample; the same applies below;
step 7d, storing the updated or retained sample in the sample set D, then reading subsequent samples according to the sequence of step 5 and storing the sample data;
step 7e, repeating step 7c and step 7d until i equals the number of samples in N, at which point reading is complete and the sample set D is constructed.
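The incremental update of steps 7a to 7e can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: samples are daily dictionaries, missing values are `None`, the effective time ranges in `EFFECTIVE_HOURS` are hypothetical, and a carried-forward value keeps the date of its original measurement so that it cannot be propagated beyond its effective time range.

```python
from datetime import date

# Hypothetical effective time ranges in hours, keyed by feature name.
EFFECTIVE_HOURS = {"temperature": 24, "crp": 72}

def incremental_update(samples, effective_hours):
    """samples: list of {"date": date, "values": {feature: value or None}}."""
    ordered = sorted(samples, key=lambda s: s["date"])  # step 7a
    built = []  # sample set D; stores (value, date of value) per feature
    for sample in ordered:
        row = {}
        for feature, value in sample["values"].items():
            origin = sample["date"]
            if value is None:  # step 7c: try to fill from D, newest first
                for prev in reversed(built):
                    prev_value, prev_date = prev["values"].get(feature, (None, None))
                    if prev_value is None:
                        continue  # empty in D: keep traversing
                    age_hours = (sample["date"] - prev_date).days * 24
                    if age_hours <= effective_hours[feature]:
                        value, origin = prev_value, prev_date
                    break  # nearest non-empty value decides; exit traversal
            row[feature] = (value, origin)
        built.append({"date": sample["date"], "values": row})  # step 7d
    return built

samples = [
    {"date": date(2018, 3, 4), "values": {"temperature": None, "crp": None}},
    {"date": date(2018, 3, 1), "values": {"temperature": 38.5, "crp": 40}},
    {"date": date(2018, 3, 6), "values": {"temperature": None, "crp": None}},
    {"date": date(2018, 3, 2), "values": {"temperature": None, "crp": None}},
]
result = incremental_update(samples, EFFECTIVE_HOURS)
```

In this example the temperature measured on 1 March fills the 2 March sample but not the 4 March sample, while the C-reactive protein value, with its longer effective time range, also reaches 4 March.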
The invention also provides an analysis modeling method for sampling hospital infection data samples by a time period reasoning method, which comprises the following steps:
step A1, determining the characteristics of hospital infection data, and classifying the characteristics according to an effective time range;
step A2, determining patients generating positive and negative samples, wherein the positive sample is a patient sample with nosocomial infection, and the negative sample is a patient sample without nosocomial infection;
step A3, dividing positive and negative examples by adopting a time period reasoning mode, wherein the specific implementation mode is as described in the step 1 to the step 8;
step A4, generating a sample set by using an incremental update method, the specific implementation manner is as described in the foregoing steps 7a-7 e;
step A5, performing analysis and modeling on the final sample set.
The invention also provides an analysis modeling system for sampling hospital infection data samples by the time period inference method. The system comprises at least: a database, which stores the set S of all patients and the case data of the patients in S; a sample generating module, which generates sample sets according to the sample generating conditions, for example an infected patient set and a non-infected patient set according to the infection status of the patients; a sample dividing module, which divides the sample set generated by the sample generating module into the sample set required for analysis and modeling; and a data updating module, which updates missing data values through steps 1 to 7.
The invention also provides a realization method of the analysis modeling system for sampling the hospital infection data samples by a time period reasoning method, which comprises the following steps:
step B1, according to the information in the database, organizing and defining the patient data items required from the hospital infection data and designing a corresponding XML storage structure;
step B2, the sample generating module arranges the patient data into the required sample format according to the set sampling period, with the data items as features, and generates the required sample set;
in step B2, the nosocomial infection data are arranged into samples, each of which is the data of one patient in one set sampling period; the features in the samples are incrementally updated by the incremental updating method described above, finally generating a sample set consisting of multiple samples of patients over the set sampling periods;
step B3, the sample dividing module divides the sample set according to the final class labels to generate a sample set in which infection samples and non-infection samples are distinguished;
step B4, the divided sample set is incrementally updated by the data updating module;
step B5, after the sample set is updated, a model is built according to a general modeling method.
Further, in step B1, each file is stored as XML and includes the patient's basic information, such as case number, sex, age and infection date; the basic admission information, such as admission diagnosis, admission department and admission date; and the patient's information for each set sampling period during the stay, such as body temperature, medical orders, laboratory examinations, microbiological examinations, imaging examinations and disease course records. The storage scheme serves to hold the patient's information and mainly makes the data easier to organize and apply: each item in the XML can be taken out independently and combined with other items, each item carries an accurate time and can therefore be organized chronologically, and how the items are used depends on the developer's requirements.
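One possible shape for such a per-patient XML file is sketched below with Python's standard `xml.etree.ElementTree` module. The source does not specify a schema, so every element and attribute name here (`patient`, `basicInfo`, `admission`, `observations`, `item`) is a hypothetical choice made for illustration.

```python
import xml.etree.ElementTree as ET

def build_patient_record(case_no, sex, age, admission, observations):
    """Build one patient's XML record; the layout is an assumed example."""
    patient = ET.Element("patient", {"caseNumber": case_no})
    basic = ET.SubElement(patient, "basicInfo")
    ET.SubElement(basic, "sex").text = sex
    ET.SubElement(basic, "age").text = str(age)
    adm = ET.SubElement(patient, "admission", {"date": admission["date"]})
    ET.SubElement(adm, "department").text = admission["department"]
    ET.SubElement(adm, "diagnosis").text = admission["diagnosis"]
    items = ET.SubElement(patient, "observations")
    # Every item carries its own timestamp; ISO 8601 strings sort chronologically.
    for obs in sorted(observations, key=lambda o: o["time"]):
        item = ET.SubElement(items, "item", {"name": obs["name"], "time": obs["time"]})
        item.text = str(obs["value"])
    return patient

record = build_patient_record(
    "A001", "F", 63,
    {"date": "2018-03-01", "department": "ICU", "diagnosis": "pneumonia"},
    [
        {"name": "temperature", "time": "2018-03-02T08:00", "value": 38.6},
        {"name": "temperature", "time": "2018-03-01T20:00", "value": 37.2},
        {"name": "crp", "time": "2018-03-02T09:30", "value": 40},
    ],
)
xml_text = ET.tostring(record, encoding="unicode")
# Any single item type can be pulled out on its own, in time order.
temps = [float(e.text) for e in record.iter("item") if e.get("name") == "temperature"]
```

Keeping a timestamp on every item is what allows one item type to be extracted alone or merged with others in chronological order, as the paragraph above describes.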
The present invention also provides a computer readable medium for solving, over a computer network, the problems of hospital infection data sample selection and of hospital infection data analysis and modeling. The medium comprises a set of instructions that, when executed, cause at least one computer to perform the sample selection during hospital infection data analysis and modeling, and the analysis and modeling of the sampled data.
Implementing the method provided by the invention for sampling hospital infection data samples by time period inference has the following technical effects:
(1) Time period inference solves the problem that samples are difficult to divide because infection dates are inaccurate. The method divides infected and non-infected samples in units of days and distinguishes the two types by time period, which resolves the difficulty of selecting and dividing samples and effectively obtains infection samples from the period in which the patient is in the infected state.
(2) Incremental updating addresses missing data and the use of real-time data. Existing methods for processing missing and real-time hospital infection data mostly estimate the missing values and directly delete samples with many missing values. This is not reasonable, because even when many values are missing, the few values that are present still have reference value, especially in real-time data. The incremental updating adopted here resolves most of the missing data.
(3) Classifying features according to their effective time range solves the problem that different features remain valid for different lengths of time.
(4) Storing the data as XML solves the problem that hospital data are complex and difficult to use. Existing methods for processing hospital infection data mostly process and analyze data exported through a database and related programs, without designing a reasonably general data structure for storing and processing the data. The present scheme is convenient to store and process, and manages the data per patient: all the specific information of each patient is integrated into one file, which benefits data management, makes retrospective study of the data easier for research and development staff, and greatly facilitates data application.
(5) The basic flow and several difficulties of analyzing and modeling hospital infection big data are described more clearly, which clarifies the basic approach to analyzing and modeling hospital infection data.
Drawings
The conception, the specific structure and the technical effects of the present invention will be further described with reference to the accompanying drawings to fully understand the objects, the features and the effects of the present invention.
FIG. 1 is a flow chart of the analytical modeling of hospital infection data in an embodiment of the present invention;
FIG. 2 is a flow diagram of time period inference in an embodiment of the invention;
FIG. 3 is a flow chart of incremental update in an embodiment of the present invention;
FIG. 4 is a flow chart of a method for implementing the analytical modeling system in an embodiment of the present invention;
FIG. 5 is a table of some of the characteristic classifications in criteria (trial) for diagnosing hospital infections, in accordance with an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the following embodiments, "effective time range" means the time range over which a patient's test data remain valid. For example, data such as body temperature, stool frequency, heart rate and respiratory rate are highly time-sensitive and differ from day to day, so they may be used for 24 hours, and data older than 24 hours need not be considered. Data such as microbiological examinations and laboratory examinations are less time-sensitive, and data from within three to five days can be considered valid, so they may be used for 72 or 120 hours. These ranges are collectively referred to as the "effective time range". The effective time range is generally determined from experience or from data in reference documents, and can also be set according to the actual modeling purpose; a recommended reference is the action time of some features in the hospital infection diagnosis standard (trial).
"Diagnosis date" means the "infection date" diagnosed by a doctor, which a patient who has been diagnosed or reported as having a nosocomial infection will generally have.
"Infection date" means the date on which the patient actually developed the infection.
"Previous time period" means the length of time, in time units, inferred backward from the diagnosis date when infection samples are selected with the diagnosis date as the reference date.
"Subsequent time period" means the length of time, in time units, inferred forward from the diagnosis date when infection samples are selected with the diagnosis date as the reference date.
Fig. 1 shows a modeling process for analyzing hospital infection data, which comprises the following steps:
step A1, determining the features of the hospital infection data, such as body temperature, pulse and C-reactive protein, to form the hospital infection feature set, denoted F, where k denotes the kth feature in F; classifying the feature set F according to the effective time range to generate a set T, where Tk denotes the category to which the kth feature belongs;
the "effective time range" reflects the fact that different features affect the human body for different lengths of time. It is generally determined from experience or from data in reference documents, or may be set according to the user's actual modeling purpose; a recommended reference is the action time of some features in the "hospital infection diagnosis standard (trial)". As shown in fig. 5, this embodiment gives a partial classification that may be used for reference.
The feature set is determined mainly from features summarized in the "hospital infection diagnosis criteria (trial)" and from features obtained from papers or doctors; this work is mainly completed during requirements investigation and analysis.
Step A2, determining the patients from whom positive and negative samples are generated, wherein positive samples are samples of patients with hospital infection and negative samples are samples of patients without hospital infection. Patients with hospital infection are obtained first. They are easy to obtain, because they have a hospital diagnosis or have been reported, so these patients and their corresponding diagnosis dates can be retrieved directly. Patients without hospital infection are then obtained as the inpatients who were not diagnosed with hospital infection. Because there are many such patients, stratified sampling is combined with random sampling: the inpatients are stratified by department, a portion of the patients in each stratum is drawn by random sampling, and the final number of patients drawn is generally no more than 10 times the number of patients with hospital infection;
it should be noted that this step only determines which patients have nosocomial infections and which do not; these patients are not themselves the samples used for modeling. A patient as a whole is not suitable as a sample, because each infected patient is in the infected state for only part of the stay and is normal the rest of the time, and only the time spent in the infected state can be used as an infection sample. The samples are therefore time series in nature.
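The department-stratified random sampling of non-infected patients in step A2 can be sketched as follows. This is an assumed illustration: the per-stratum sampling `fraction`, the fixed `seed`, and the way the 10-times cap is enforced (a second random draw over the pooled selection, which no longer keeps department proportions exact) are choices made here, not details given in the source.

```python
import random
from collections import defaultdict

def stratified_sample(patients, n_infected, fraction, max_ratio=10, seed=0):
    """patients: list of (patient_id, department) for non-infected patients."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for patient_id, department in patients:
        strata[department].append(patient_id)  # one stratum per department
    chosen = []
    for department in sorted(strata):
        members = strata[department]
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        chosen.extend(rng.sample(members, k))
    cap = max_ratio * n_infected  # no more than 10x the infected patients
    if len(chosen) > cap:
        chosen = rng.sample(chosen, cap)
    return chosen

patients = [(f"icu-{i}", "ICU") for i in range(100)]
patients += [(f"sur-{i}", "Surgery") for i in range(50)]
capped = stratified_sample(patients, n_infected=2, fraction=0.2)
uncapped = stratified_sample(patients, n_infected=5, fraction=0.2)
```

With a 20% fraction the two departments contribute 20 and 10 patients; the cap only takes effect when that total exceeds ten times the number of infected patients.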
Step A3, dividing positive and negative samples by time period inference. After the hospital-infected and non-hospital-infected patients are determined, positive and negative samples can be generated as time series. In this case samples are generated in units of days, so each day of a patient's hospital stay can serve as one sample. Samples are not, however, generated from every day of the stay. For a non-hospital-infected patient, the data of some days during the stay are extracted by random sampling; for a hospital-infected patient, the data of the corresponding time period are extracted by time period inference. The previous and subsequent time periods used in the inference must be tried several times during positive and negative sample division to find reasonable values, and it is generally suggested that neither exceed 5 days. The time period inference process is shown in fig. 2 and includes:
step A3a, denoting the set of hospital-infected patients as C and the set of their diagnosis dates as Cd;
step A3b, randomly extracting n patients from the set C and obtaining the diagnosis dates corresponding to the n patients;
step A3c, re-diagnosing the n patients according to the hospital infection diagnosis criteria (trial) to obtain the arrays A_pre and A_end, which consist of the "previous time period" and "subsequent time period" of each of the n patients; summing and averaging each array to obtain avg_pre = sum(A_pre)/n and avg_end = sum(A_end)/n, and using these averages as the two parameters of time period inference for all patients in C;
step A3d, generating a sample set by the incremental updating method and carrying out a modeling test;
step A3e, repeatedly fine-tuning avg_pre and avg_end according to the test results, for example by adding 1 to both or subtracting 1 from both, to finally obtain values that give a better result.
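Once avg_pre and avg_end are available, the infection days of any patient in C and the fine-tuning candidates of step A3e can be sketched as follows. Rounding the averages to whole days and bounding each period at 5 days (the limit suggested in step A3) are assumptions of this sketch, and the function names are hypothetical.

```python
from datetime import date, timedelta

def infection_days(diagnosis_date, pre_days, end_days):
    """Days treated as infection samples around a recorded diagnosis date."""
    start = diagnosis_date - timedelta(days=pre_days)
    return [start + timedelta(days=i) for i in range(pre_days + end_days + 1)]

def candidate_periods(avg_pre, avg_end, max_days=5):
    """Candidate (pre, end) pairs: the rounded averages shifted by -1, 0, +1 together."""
    base_pre, base_end = round(avg_pre), round(avg_end)
    candidates = []
    for delta in (-1, 0, 1):
        pre, end = base_pre + delta, base_end + delta
        if 0 <= pre <= max_days and 0 <= end <= max_days:
            candidates.append((pre, end))
    return candidates

days = infection_days(date(2018, 3, 10), 2, 2)
candidates = candidate_periods(2.0, 7 / 3)
```

Each candidate pair would be used to regenerate the sample set and rerun the modeling test; the pair with the best test result is kept.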
Step A4, after the positive and negative samples are divided, generating a sample set by the incremental updating method. This step is the same as step A3d: each feature is incrementally updated according to the "effective time range" to which it was assigned in step 1. Note that the positive samples obtained by applying time period inference to hospital-infected patients are continuous in time, so the method resolves most of their missing data. The samples of non-hospital-infected patients are drawn by random sampling, so continuity in time is hard to guarantee and incremental updating does not necessarily resolve their missing data. This situation must be handled case by case; if too many values are missing, consider drawing several consecutive days at random when selecting samples from non-hospital-infected patients. Fig. 3 shows the method for processing missing sample values by incremental updating, with the following specific steps:
step A4a, denoting the set of all patients in the aforementioned step A3 as S, with m being the m-th patient in S;
step A4a, traversing the set S to obtain a patient m; if m is a hospital-infected patient, performing time period inference on m to generate a positive and negative sample set N and sorting N in ascending order by date, so that time runs from earliest to latest during incremental updating and a new value always overwrites an old value; if m is a non-infected patient, generating the sample set N by random sampling;
step A4b, beginning to traverse the sample set N, wherein the first sample i, being the earliest in time, is stored directly in a sample set D, its features are placed into a set T, and Tk_v and Tk_date are recorded, representing the value of the k-th feature of sample i and the date of that value;
step A4c, traversing the second and all subsequent samples i and checking the value Tk_v of each feature Tk in i; if the value is missing, performing step A4d, otherwise retaining the value without any processing;
step A4d, if the value Tk_v of the feature Tk of sample i is missing, searching the sample set D in reverse order for Tk_v and Tk_date of the feature Tk; if the value found in D is not empty and the difference between its Tk_date and the Tk_date in i does not exceed the "valid time range", taking that value and writing it into Tk_v of sample i to replace the missing value; if the value in D is not empty but exceeds the "valid time range", exiting the traversal and keeping the k-th feature of sample i missing; and if the value in D is also empty, continuing to the next value. The reverse-order traversal is required so that the samples visited in set D are always those closest in time to the current sample;
step A4e, after updating or retaining, storing the sample in the sample set D and reading the next sample, that is, i = i + 1;
and step A4f, judging whether i = N is satisfied, that is, whether every sample in N has been read; if so, the traversal and the construction of the sample set D are complete, and if not, continuing with the next sample.
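The incremental update of steps A4a to A4f can be sketched as below. This is a hedged illustration under stated assumptions: the dictionary layout of a sample, the names `incremental_update`, `valid_days` and `value_dates`, the use of `None` for a missing value, and measuring the "valid time range" in days are all choices made for the example rather than details given in the text.

```python
from datetime import date


def incremental_update(samples, valid_days):
    """Fill missing feature values from the nearest earlier sample.

    samples: list of {"date": date, "features": {name: value or None}}
    valid_days: {name: number of days a value of that feature stays valid}
    Returns the completed sample set D, sorted by date.
    """
    ordered = sorted(samples, key=lambda s: s["date"])  # ascending by date
    done = []  # sample set D
    for sample in ordered:
        values = dict(sample["features"])
        # Date on which each value was actually observed (Tk_date).
        value_dates = {k: (sample["date"] if v is not None else None)
                       for k, v in values.items()}
        for name in list(values):
            if values[name] is not None:
                continue  # observed values are kept untouched
            for prev in reversed(done):  # nearest earlier sample first
                prev_value = prev["features"].get(name)
                if prev_value is None:
                    continue  # empty in D as well, look further back
                age = (sample["date"] - prev["value_dates"][name]).days
                if age <= valid_days[name]:
                    values[name] = prev_value
                    value_dates[name] = prev["value_dates"][name]
                break  # first non-empty value decides; too old means still missing
        done.append({"date": sample["date"], "features": values,
                     "value_dates": value_dates})
    return done


# Hypothetical data: temperature is valid for 1 day, a blood count for 3 days.
raw = [
    {"date": date(2018, 9, 4), "features": {"temperature": None, "wbc": None}},
    {"date": date(2018, 9, 1), "features": {"temperature": 38.5, "wbc": 12.0}},
    {"date": date(2018, 9, 2), "features": {"temperature": None, "wbc": None}},
]
D = incremental_update(raw, {"temperature": 1, "wbc": 3})
```

Keeping the original observation date alongside each carried-over value prevents a value from being propagated indefinitely through a chain of filled samples, which is one way to read the requirement that the difference in Tk_date must stay within the valid time range.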
Step A5, analyzing, modeling, testing and optimizing the final sample set; this step is the same as the aforementioned step A3e. Once the sample set is generated, the main difficulties of hospital infection data are basically solved, and the follow-up analysis and modeling can largely follow the standard processes of data analysis and machine learning. It should be noted, however, that the machine learning algorithm cannot be chosen arbitrarily: the early warning results of a hospital infection monitoring and early warning model generally need to be interpretable, that is, supported by evidence, so an interpretable algorithm such as a decision tree, random forest or logistic regression must be selected, and algorithms such as deep learning and support vector machines are not recommended. The modeling and testing process is shown in fig. 3 and still uses conventional algorithms and steps, as follows:
step A5a, modeling the sample set D, preferably with an interpretable algorithm such as a decision tree, random forest or logistic regression, and recording the sensitivity and specificity of the algorithm on a test set;
step A5b, after recording the sensitivity and specificity, fine-tuning avg_pre and avg_end, modeling and testing again, and recording the two indexes;
step A5c, performing the modeling test multiple times and finding the run with the best two indexes; the avg_pre and avg_end of that run are essentially the optimal values;
After the final model is constructed, it can be integrated and deployed online; the integration differs considerably between systems, but the model itself is basically universal.
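The tuning loop of steps A5a to A5c can be sketched as follows. The text does not say how the two indexes are combined into a single ranking, so the sum of sensitivity and specificity used here is an assumption, as are the names `sensitivity_specificity`, `tune_window` and the `evaluate` callback, which stands in for rebuilding the sample set, training an interpretable model and testing it.

```python
def sensitivity_specificity(y_true, y_pred):
    """Compute sensitivity and specificity from binary labels (1 = infected)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec


def tune_window(avg_pre, avg_end, evaluate, deltas=(-1, 0, 1)):
    """Shift both parameters by the same delta and keep the best pair.

    evaluate(pre, end) -> (sensitivity, specificity) is a hypothetical
    callback. Ranking by sensitivity + specificity is an assumption.
    """
    best = None
    for delta in deltas:
        pre, end = avg_pre + delta, avg_end + delta
        if pre < 0 or end < 0:
            continue  # a negative period length is not meaningful
        sens, spec = evaluate(pre, end)
        score = sens + spec
        if best is None or score > best[0]:
            best = (score, pre, end, sens, spec)
    if best is None:
        raise ValueError("no valid parameter pair to evaluate")
    return best[1:]


# Stand-in evaluation for demonstration only; real values come from a test set.
def fake_evaluate(pre, end):
    return (0.9, 0.8) if pre == 4 else (0.7, 0.7)


best_pre, best_end, best_sens, best_spec = tune_window(3, 2, fake_evaluate)
```

In practice the loop would be repeated around the new best pair until the indexes stop improving.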
The invention also provides an analysis modeling system for solving the problem of hospital infection data loss based on the incremental updating method, which comprises at least: a database storing the set S of all patients and the case data of the patients in S; a sample generating module, which generates sample sets according to sample generation conditions, for example an infected patient set and a non-infected patient set according to each patient's infection status; a sample dividing module, which divides the sample set generated by the sample generating module into the sample sets required for analysis and modeling; and a data updating module, which updates missing data values through steps A4a to A4f.
An implementation method of an analysis modeling system for solving hospital infection data loss based on an incremental updating method is shown in fig. 4, and includes the following steps:
step B1, according to the information of the database, the patient data items needed in the hospital infection data are sorted and defined and a corresponding XML storage structure is designed;
step B2, the sample generating module arranges the patient data into the required sample format according to the set sampling period, using the data items as features, and generates the required sample set;
in step B2, the nosocomial infection data is arranged into samples, each of which is the data of one patient in a set sampling period, and the features in the samples are incrementally updated according to the incremental updating method described above, so as to finally generate a sample set consisting of a plurality of samples of patients in the set sampling period.
Step B3, the sample dividing module divides the sample set according to the finally classified labels to generate a sample set after the final infection sample and the non-infection sample are distinguished;
step B4, the divided sample set is updated incrementally through a data updating module;
and step B5, after the sample set is updated, establishing a model according to a general modeling method.
Further, in step B1, the data are stored as XML files. Each file contains the patient's basic information, such as case number, sex, age and infection date; basic admission information, such as admission diagnosis, admission department and admission date; and information recorded for each set sampling period during the hospital stay, such as body temperature, medical orders, laboratory examinations, microbiological examinations, imaging examinations and disease course records. Besides storing patient information, this scheme mainly makes the data easy to organize and apply: each item in the XML can be taken out independently and combined with other items, each item carries an accurate time and can therefore be organized in time order, and how the items are used depends on the developer's requirements.
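One possible shape for such an XML record is sketched below using Python's standard `xml.etree.ElementTree` module. The text does not specify a schema, so every element and attribute name here (`patient`, `basic`, `admission`, `items`, `item`, `case_no`, `name`, `time`) is hypothetical, as are the helper names `build_patient_record` and `items_in_time_order`.

```python
import xml.etree.ElementTree as ET


def build_patient_record(case_no, sex, age, admission_date, observations):
    """Build a patient record; observations are (name, ISO timestamp, value)."""
    patient = ET.Element("patient", {"case_no": case_no})
    basic = ET.SubElement(patient, "basic")
    ET.SubElement(basic, "sex").text = sex
    ET.SubElement(basic, "age").text = str(age)
    admission = ET.SubElement(patient, "admission")
    ET.SubElement(admission, "admission_date").text = admission_date
    items = ET.SubElement(patient, "items")
    for name, timestamp, value in observations:
        item = ET.SubElement(items, "item", {"name": name, "time": timestamp})
        item.text = str(value)
    return patient


def items_in_time_order(patient, name):
    """Extract one data item on its own, sorted by its timestamp."""
    found = [(item.get("time"), item.text)
             for item in patient.iter("item") if item.get("name") == name]
    return sorted(found)  # ISO 8601 timestamps sort chronologically as text


record = build_patient_record(
    "C0001", "F", 63, "2018-09-01",
    [("body_temperature", "2018-09-03T08:00", 38.5),
     ("body_temperature", "2018-09-02T08:00", 37.1),
     ("wbc", "2018-09-02T09:30", 12.0)],
)
# Round-trip through text to show the record survives storage and reloading.
reloaded = ET.fromstring(ET.tostring(record, encoding="unicode"))
temperatures = items_in_time_order(reloaded, "body_temperature")
```

Because each item carries its own timestamp, a single feature can be pulled out and ordered in time without touching the rest of the record, which is the property the storage scheme relies on.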
A computer-readable medium for selecting a sample set and analyzing and modeling nosocomial infection data over a computer network, comprising a set of instructions that, when executed, cause at least one computer to perform a process for solving the problem of sample set selection during the analysis and modeling of nosocomial infection data and analyzing and modeling data after selecting the sample set.
The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.

Claims (10)

1. A time period reasoning method for selecting a sample of hospital infection data is characterized by comprising the following steps:
step 1, determining the characteristics of hospital infection data and classifying the characteristics according to an effective time range, wherein the characteristic set is marked as F, k represents the kth characteristic in the set F, and the effective time range is a general term for the time range over which patient data or patient detection data remain valid;
step 2, recording a set composed of all patients as S, obtaining a patient m in the set S, and generating a positive and negative sample set N for the patient m;
step 3, after the positive and negative sample set N is generated in the step 2, recording a hospital infected patient set as C, and recording a set formed by diagnosis dates of infected patients as Cd;
step 4, randomly extracting n patients from the set C, and obtaining diagnosis dates corresponding to the n patients;
step 5, diagnosing the n patients in the step 4, and obtaining arrays A_pre and A_end formed by data of the "previous time period" and "next time period" of the n patients, wherein the "previous time period" is the length, in time units, obtained by inferring backward in time from the diagnosis date, taken as the reference date, when selecting infection samples, and the "next time period" is the length, in time units, obtained by inferring forward in time from the diagnosis date, taken as the reference date, when selecting infection samples;
step 6, summing the two arrays in the step 5, and then averaging to obtain two average values avg_pre = sum(A_pre)/n and avg_end = sum(A_end)/n; these two averages serve as two parameters for time period inference for all patients in set C, approximating the "previous time period" and "next time period" for all patients in set C;
step 7, updating the data to generate a sample set D and carrying out modeling test according to the sample set D;
step 8, continuously fine-tuning avg_pre and avg_end according to the test result to obtain the final required values;
wherein the positive sample in step 2 is a sample of a patient m with hospital infection, and the negative sample is a sample of a patient m without hospital infection.
2. A time period inference method according to claim 1, wherein if m is a patient in the positive sample, then m is denoted as the mth patient in S; if m is a patient in the negative sample, then m is a randomly drawn patient.
3. The time period inference method of claim 1, wherein the method of updating data in step 7 is an incremental update method, comprising the steps of:
step 7a, sequencing the positive and negative sample sets N in the step 2 in an ascending order according to a sequence of time from front to back so as to ensure that the time is arranged from front to back in the incremental updating process, thereby ensuring that a new value always covers an old value during updating;
step 7b, storing the sample i with the earliest time in the sample set N into a sample set D, correspondingly storing the sample i into a set T according to the characteristics of the hospital infection data determined in the step 1, and respectively recording Tk_v and Tk_date, which represent the value of the kth characteristic in the set T corresponding to the sample i in the set N and the date of the value;
step 7c, carrying out missing value judgment on the second and all the subsequent samples i in the sample set N, updating the missing values, and reserving the un-missing values;
step 7d, storing the updated or reserved samples into the sample set D, reading subsequent samples according to the sequence of the step 5 and storing sample data;
and step 7e, repeating the step 7c and the step 7d until i is equal to N, at which point the reading is completed and the construction of the sample set D is completed.
4. A time period inference method according to claim 3, wherein, in step 7c, if the value Tk_v of the feature Tk of the sample i is a missing value, the values Tk_v and Tk_date of the feature Tk are found in reverse order in the sample set D, and if the value in the sample set D is not empty and the difference between its Tk_date and the Tk_date in i does not exceed the "valid time range", the value is written into Tk_v of the sample i in place of the missing value.
5. A time period inference method according to claim 3, characterised in that in step 7c, if the value Tk_v of the feature Tk of the sample i is a missing value, the values Tk_v and Tk_date of the feature Tk are found in reverse order in the sample set D, and if the value in the sample set D is not empty but exceeds the "valid time range", the traversal is exited and the missing state of the kth feature of the sample i is kept.
6. A time period inference method according to claim 3, characterised in that in step 7c, if the value Tk_v of the feature Tk of the sample i is a missing value, the values Tk_v and Tk_date of the feature Tk are found in reverse order in the sample set D, and if the value in the sample set D is empty, the traversal continues to the next value.
7. An analytical modelling approach to address hospital infection data sample selection by the time period inference method of any of claims 1-6, comprising the steps of:
step A1, determining the characteristics of hospital infection data, and classifying the characteristics according to an effective time range;
step A2, determining patients generating positive and negative samples, wherein the positive sample is a patient sample with nosocomial infection, and the negative sample is a patient sample without nosocomial infection;
step A3, dividing positive and negative examples samples by adopting a time period reasoning mode, wherein the specific implementation mode is as described in the step 1 to the step 8;
step A4, generating a sample set by using an incremental update method, wherein the specific implementation manner is as described in steps 7a-7 e;
step A5, analytically modeling the final sample set.
8. An analytical modelling system for addressing hospital infection data sample selection by the time period inference method of any one of claims 1-6, comprising at least a database in which the case data of all patients in set S and in set S are stored; the sample generation module generates a sample set according to the sample generation condition; the sample dividing module is used for dividing the sample set generated by the sample generating module into a sample set required by analysis and modeling; and the data updating module realizes the updating of the missing data value through the steps 1 to 8.
9. An implementation method of an analytical modeling system for resolving hospital infection data sample selection by the time period inference method of claim 8, comprising the steps of:
step B1, according to the information of the database, the patient data items needed in the hospital infection data are sorted and defined and a corresponding XML storage structure is designed;
step B2, the sample generating module arranges the patient data into the required sample format according to the set sampling period, using the data items as features, and generates the required sample set;
in the step B2, the data of nosocomial infection is arranged into samples, each of which is the data of a patient in a set sampling period, and the incremental updating method according to claim 3 is used to incrementally update the features in the samples, so as to finally generate a sample set consisting of a plurality of samples of patients in the set sampling period;
step B3, the sample dividing module divides the sample set according to the finally classified labels to generate a sample set after the final infection sample and the non-infection sample are distinguished;
step B4, the divided sample set is updated incrementally through a data updating module;
and step B5, after the sample set is updated, establishing a model according to a general modeling method.
10. A computer readable medium for selecting a sample set and hospital infection data analysis and modeling over a computer network, comprising a set of instructions which, when executed, cause at least one computer to perform the steps of solving the problem of sample selection during the hospital infection data analysis and modeling process and analyzing and modeling the data after sample selection according to any one of claims 1-6.
CN201811129775.3A 2018-09-27 2018-09-27 Time period reasoning method for selecting samples of hospital infection data Active CN109360657B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811129775.3A CN109360657B (en) 2018-09-27 2018-09-27 Time period reasoning method for selecting samples of hospital infection data


Publications (2)

Publication Number Publication Date
CN109360657A CN109360657A (en) 2019-02-19
CN109360657B true CN109360657B (en) 2022-06-03

Family

ID=65347853

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811129775.3A Active CN109360657B (en) 2018-09-27 2018-09-27 Time period reasoning method for selecting samples of hospital infection data

Country Status (1)

Country Link
CN (1) CN109360657B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111312404B (en) * 2020-01-21 2023-04-18 杭州杏林信息科技有限公司 Method, equipment and storage medium for counting number of blood stream infected persons related to new central vascular catheter
CN111312346B (en) * 2020-01-21 2023-04-18 杭州杏林信息科技有限公司 Statistical method, equipment and storage medium for newly infected number of inpatients
CN112002383B (en) * 2020-06-30 2024-03-08 杭州杏林信息科技有限公司 Automatic management method and system for number of people in hospital infection state in specific period
CN112037893A (en) * 2020-07-08 2020-12-04 杭州杏林信息科技有限公司 Automatic management method and system for number of people in hospital infection state at specified time point

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002065135A3 (en) * 2001-02-15 2003-05-30 Affitech As Determination of level of immunoglobulin modification
CN1598858A (en) * 2004-05-13 2005-03-23 郑州市疾病预防控制中心 Integral management system for digital information of hospital
CN105893725A (en) * 2014-11-13 2016-08-24 北京众智汇医科技有限公司 Management system for an entire process of hospital infection prevention and control, and method thereof
CN106390117A (en) * 2009-10-16 2017-02-15 奥默罗斯公司 Methods for treating conditions associated with masp-2 dependent complement activation
CN107658023A (en) * 2017-09-25 2018-02-02 泰康保险集团股份有限公司 Disease forecasting method, apparatus, medium and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
医院感染监测基本数据集的建立及作用;付强等;《中华医院感染学杂志》;20161231;第26卷(第11期);全文 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230824

Address after: 200032 No. 136, Xuhui District Medical College, Shanghai

Patentee after: ZHONGSHAN HOSPITAL, FUDAN University

Patentee after: SHANGHAI LILIAN INFORMATION TECHNOLOGY CO.,LTD.

Address before: 200444 room 1536, building 1, No. 668, SHANGDA Road, Baoshan District, Shanghai

Patentee before: SHANGHAI LILIAN INFORMATION TECHNOLOGY CO.,LTD.

TR01 Transfer of patent right