WO2020258483A1 - Clinical medication behavior analysis system based on highly effective negative sequential mining pattern, and working method therefor - Google Patents

Clinical medication behavior analysis system based on highly effective negative sequential mining pattern, and working method therefor

Info

Publication number
WO2020258483A1
WO2020258483A1 PCT/CN2019/102473 CN2019102473W WO2020258483A1 WO 2020258483 A1 WO2020258483 A1 WO 2020258483A1 CN 2019102473 W CN2019102473 W CN 2019102473W WO 2020258483 A1 WO2020258483 A1 WO 2020258483A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
sequence
negative
patient
bitmap
Prior art date
Application number
PCT/CN2019/102473
Other languages
French (fr)
Chinese (zh)
Inventor
董祥军
高欣明
Original Assignee
齐鲁工业大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 齐鲁工业大学
Publication of WO2020258483A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/10 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients
    • G16H40/00 ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
    • G16H40/60 ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the operation of medical equipment or devices
    • G16H40/67 ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the operation of medical equipment or devices for remote operation
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G16H70/00 ICT specially adapted for the handling or processing of medical references
    • G16H70/40 ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage

Definitions

  • The invention relates to a clinical medication behavior analysis system based on efficient negative sequential pattern mining and to a working method thereof, and belongs to the technical field of negative sequential pattern applications.
  • Data mining is the process of discovering hidden knowledge in a large information repository. Data mining techniques developed for retail or other industries can also be applied to medical care. Data mining is a multidisciplinary research field that draws on theories and methods from database technology, pattern recognition, machine learning, fuzzy logic, artificial intelligence, information retrieval, statistics, high-performance computing, and neural networks.
  • Sequential pattern mining refers to mining patterns that occur with high frequency relative to time or another ordering. It can discover potentially useful information and knowledge about the relationships between transactions that is not known in advance.
  • One of the problems addressed by sequential pattern analysis is determining, after a doctor prescribes a medicine to a patient, which medicines will be used within a specific future period, and what relationships hold between one medicine and another and between medicines and diseases.
  • Discovering these interrelationship rules allows doctors to refer to past prescriptions when diagnosing and treating patients, so as to determine the patient's next medication more accurately.
  • For example, the order of medication may be to prescribe glucose injection, then vitamin B6, then cephalosporin injection, and finally sodium chloride injection. Sequential pattern mining can therefore discover the sequences that occur frequently in the database within a certain period, that is, which drugs doctors use during this period; whether a sequence counts as frequent is determined by the minimum support.
  • Each sequence is a group of combinations arranged according to the time of medication, and the minimum support can be set to mine sequences that meet different levels of frequency.
  • Traditional sequential patterns, which consider only events that have occurred, are called positive sequential patterns (PSP).
  • A negative sequential pattern (NSP) involves not only events that have occurred but also events that have not occurred. It allows the data to be analyzed and understood more deeply, so as to uncover valuable information that is easily overlooked.
  • For example, <a b ¬c d> represents a medication sequence pattern indicating that, within a certain period of time, the patient used medication d after taking medications a and b without using medication c.
  • The value of negative sequential patterns is increasingly recognized. They play an irreplaceable role in understanding and handling many medical applications, such as the analysis of patient medication behavior.
  • The patient medication records of a hospital are the data source for mining. Take the diagnosis and treatment records of 5 patients within 2 months as an example: Table 1 is a transaction database sorted with patient ID and drug dispensing time as keys.
  • In the transaction database, a transaction represents one treatment, a single item represents a medicine used, and the letter in each item records the medicine ID. Data preprocessing organizes the transaction database of Table 1 into the sequence database of Table 2.
  • Table 2. Sequence database:
      Patient ID | Sequence of drugs used by the patient
      1 | <{c}{i}>
      2 | <{a,b}{c}{a,d,f,g}>
      3 | <{c,e,g,h}>
      4 | <{c}{c,d,g,h}{i}>
      5 | <{i}>
  • All medication records of a patient in a certain period of time constitute an ordered sequence, which is written between < and >.
  • The items and itemsets within a sequence are ordered. Each item represents one medicine, and an element is the set of all medicines used by the patient at a specific point in time, written with {} or ().
  • The patient may use the same medicine in different time periods, that is, an item may occur in different elements of a sequence.
  • For example, the drug sequence with ID 2 in Table 2 is <{a,b}{c}{a,d,f,g}>.
  • The patient used drug a during the first and third treatments. The three itemsets {a,b}, {c} and {a,d,f,g} are called the elements of the sequence, and a, b, c, d, f, g are called items. If an element contains only one item, the braces can be omitted.
  • The element {c} in the sequence can therefore be written directly as c.
  • The present invention provides a clinical medication behavior analysis system based on efficient negative sequential pattern mining.
  • The invention also provides a working method of the above clinical medication behavior analysis system.
  • The present invention proposes an efficient negative sequential pattern mining algorithm named eNSP-IT.
  • Applying the eNSP-IT algorithm to the analysis of clinical medication behavior can quickly find the negative sequential relationships between medications, thereby better predicting the patient's next medication and supporting clinical decision-making based on changes in medication regimens.
  • PrefixSpan algorithm: a classic positive sequential pattern mining algorithm based on depth-first search. Its basic idea is to use frequent prefixes to partition the search space and project the sequence database, and then to search the projected databases for the related frequent sequences.
  • Data set: the collection containing all data sequences.
  • Support: the frequency with which a candidate sequence occurs in the database.
  • Minimum support (min_sup for short): the minimum frequency a pattern must reach in the database to be frequent, which is set by the user. When the support of a candidate sequence is greater than or equal to the minimum support, the candidate sequence is a frequent pattern.
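The definitions of support and minimum support can be illustrated on the Table 2 sequence database. The following is a minimal sketch, not the patented implementation: the names `TABLE_2`, `contains`, `support` and `is_frequent` are hypothetical, support is counted as the number of data sequences that contain the pattern, and a pattern is treated as frequent when its support is at least `min_sup`.

```python
from typing import Dict, FrozenSet, List

Element = FrozenSet[str]

# Sequence database of Table 2: patient ID -> ordered list of elements (itemsets).
TABLE_2: Dict[int, List[Element]] = {
    1: [frozenset("c"), frozenset("i")],
    2: [frozenset("ab"), frozenset("c"), frozenset("adfg")],
    3: [frozenset("cegh")],
    4: [frozenset("c"), frozenset("cdgh"), frozenset("i")],
    5: [frozenset("i")],
}


def contains(data_sequence: List[Element], pattern: List[Element]) -> bool:
    """True if pattern is a subsequence of data_sequence (greedy leftmost match)."""
    pos = 0
    for element in data_sequence:
        if pos < len(pattern) and pattern[pos] <= element:
            pos += 1
    return pos == len(pattern)


def support(db: Dict[int, List[Element]], pattern: List[Element]) -> int:
    """Number of data sequences in db that contain the pattern."""
    return sum(contains(ds, pattern) for ds in db.values())


def is_frequent(db: Dict[int, List[Element]], pattern: List[Element], min_sup: int) -> bool:
    """A pattern is frequent when its support reaches the user-set min_sup."""
    return support(db, pattern) >= min_sup


if __name__ == "__main__":
    c_then_i = [frozenset("c"), frozenset("i")]
    print(support(TABLE_2, c_then_i), is_frequent(TABLE_2, c_then_i, min_sup=2))
```

Under these assumptions the pattern <c i> is contained in the sequences of patients 1 and 4 only.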
  • A clinical medication behavior analysis system based on efficient negative sequential pattern mining includes a data acquisition system and a behavior analysis system that communicate through a transmission network;
  • the data acquisition system includes a data collection module and a data transmission module connected in sequence;
  • the data collection module is used to collect and save the patient's clinical medication behavior data in real time.
  • the clinical medication behavior data includes the patient's ID number, timestamp (i.e., the time of diagnosis and treatment), prescribed drugs, symptoms, and the department where the patient is located;
  • the data transmission module is used to transmit the patient's clinical medication behavior data to the behavior analysis system through a transmission network;
  • the behavior analysis system includes a data processing module, a data analysis module, and a data management module connected in sequence, and is deployed on a cloud server.
  • the data transmission module is connected to the data processing module;
  • the data processing module is used to perform data cleaning on the collected clinical medication behavior data of the patient, and to classify the data according to the department and symptoms of the patient;
  • the data analysis module is used to analyze and predict the clinical medication behavior of the patient according to the processing result of the data processing module; the steps are as follows:
  • the data analysis module establishes the medication behavior sequence corresponding to each patient's ID number from the clinical medication behavior data processed by the data processing module, and applies the clinical medication behavior analysis method based on efficient negative sequential pattern mining to analyze and predict clinical medication behavior.
  • the clinical medication behavior data of patients in the same department with the same symptoms constitute a sequence database.
  • Each patient's ID number corresponds to that patient's medication records in a certain period of time, which form an ordered sequence;
  • The sequence database is mined with the clinical medication behavior analysis method based on efficient negative sequential pattern mining to obtain the negative sequential patterns that meet the minimum support requirement, that is, the drugs commonly used to treat the disease, the order of medication, and the relationships between drugs.
  • the negative sequential patterns that can be used for decision-making are screened out, and the patient's medication behavior is analyzed using these patterns.
  • the data management module is used to store and display the processing results of the data processing module and the clinical medication behavior results analyzed by the data analysis module, and when the doctor prescribes the medication, the next medication is recommended.
  • the data management module is used to view all clinical medication behavior records and all frequent clinical medication behaviors. When the doctor treats the patient, the system will provide the commonly used treatment plan for this disease, and when the first treatment plan is not satisfactory, it will provide an alternative treatment plan.
  • the transmission network is a wired public network, a local area network or a 3G/4G network.
  • the invention adopts a cloud management platform design (such as Facebook Cloud Server, Huawei Cloud, JD Cloud, etc.), and each hospital does not need to configure a server.
  • the hospital rents the cloud management platform server of this system, which connects to the various system interfaces in the hospital and imports data. Users with the corresponding authority can log in to the system over the Internet from any location without installing a client, which makes security management flexible.
  • This system can also be deployed in the hospital's local private cloud and accessed by logging in through the hospital's local area network.
  • The working method of the above clinical medication behavior analysis system based on efficient negative sequential pattern mining includes the following steps:
  • the data collection module collects and saves the patient's clinical medication behavior data in real time.
  • the clinical medication behavior data includes the patient's ID number, timestamp (that is, the time of diagnosis and treatment), prescribed drugs, symptoms, and the patient's department;
  • Let ns denote a negative candidate sequence; for example, in a negative candidate sequence such as ns = <a ¬b c ¬d>, ¬b and ¬d indicate that drugs b and d are not used, while a and c indicate that drugs a and c are used;
  • m-size: a negative candidate sequence ns that contains m elements is an m-size sequence; for example, <a ¬b c ¬d> is a 4-size sequence;
  • MPS(ns) refers to the maximum positive subsequence of the negative candidate sequence ns, which is composed of all the positive elements contained in ns in their original order; for example, in ns = <a ¬b c ¬d>, ¬b and ¬d represent drugs not used and a and c represent drugs used, so the maximum positive subsequence is MPS(ns) = <a c>;
  • 1-negMS_ns refers to a subsequence of the negative candidate sequence ns that is composed of MPS(ns) and one negative element of ns;
  • 1-negMSS_ns refers to the set of all such subsequences 1-negMS_ns of the negative candidate sequence ns;
  • p(1-negMS_ns) means that the positive elements in the sequence 1-negMS_ns remain unchanged and the negative element is converted to the corresponding positive element; for example, p(<a ¬b c>) = <a b c>;
  • ds refers to a data sequence in the database; ds contains the drugs used by a patient during the treatment, arranged in the order of medication;
  • Element constraint: no negative items are allowed inside an element; only whole elements of the sequence can be negative. For example, a sequence such as <¬(a b) c d> meets the constraint, whereas <(¬a b) c d> does not, because ¬a is a negative item inside the element (¬a b);
  • Format constraint: two or more consecutive negative elements are not allowed. For example, a sequence such as <a ¬b ¬c d> does not satisfy the constraint because ¬b and ¬c are two consecutive negative elements;
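The definitions above (MPS, 1-negMS, 1-negMSS, the conversion p, and the format constraint) can be sketched in a few lines. This is an illustrative sketch only: the text representation of sequences ("¬" marks a negative element, elements are separated by spaces) and all function names are assumptions made for the example, not notation taken from the patent.

```python
from typing import FrozenSet, List, Tuple

# One element of a sequence: (set of items, is_negative flag).
Element = Tuple[FrozenSet[str], bool]
Seq = List[Element]


def parse(text: str) -> Seq:
    """Parse e.g. 'a ¬b c ¬d'; a leading '¬' marks the whole element as negative."""
    seq = []
    for token in text.split():
        negative = token.startswith("¬")
        seq.append((frozenset(token.lstrip("¬")), negative))
    return seq


def mps(ns: Seq) -> Seq:
    """Maximum positive subsequence: all positive elements in original order."""
    return [e for e in ns if not e[1]]


def one_neg_mss(ns: Seq) -> List[Seq]:
    """All 1-negMS of ns: MPS(ns) plus exactly one negative element each."""
    result = []
    for i, (_, negative) in enumerate(ns):
        if negative:
            result.append([e for j, e in enumerate(ns) if not e[1] or j == i])
    return result


def to_positive(seq: Seq) -> Seq:
    """p(.): keep positive elements, convert negative elements to positive."""
    return [(items, False) for items, _ in seq]


def satisfies_format_constraint(ns: Seq) -> bool:
    """No two consecutive negative elements."""
    return not any(a[1] and b[1] for a, b in zip(ns, ns[1:]))


if __name__ == "__main__":
    ns = parse("a ¬b c ¬d")
    print(len(ns), mps(ns) == parse("a c"), len(one_neg_mss(ns)))
```

For the 4-size candidate <a ¬b c ¬d>, this sketch yields MPS = <a c> and the two 1-negMS sequences <a ¬b c> and <a c ¬d>.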
  • the data transmission module transmits the patient's clinical medication behavior data to the behavior analysis system through the transmission network, and the behavior analysis system uses the eNSP-IT algorithm to analyze the clinical medication behavior data, including the following steps:
  • the data processing module performs data cleaning on the collected clinical medication behavior data of the patient, and classifies the data according to the department and disease of the patient;
  • the data analysis module analyzes and predicts the clinical medication behavior of the patient according to the processing result of the data processing module
  • the data management module stores and displays the processing results of the data processing module and the clinical medication behavior results analyzed by the data analysis module, and when the doctor prescribes drugs, the next medication is recommended.
  • the data processing module performs data cleaning on the collected clinical medication behavior data of the patient, and classifies the data according to the patient's department and disease, including the following steps:
  • The standardized processing refers to data integration: the weekly medication records that share the same patient ID number are sorted into one complete sequence;
  • the items and itemsets are ordered, each item represents a drug, and an element refers to all medicines used by the patient at a specific point in time; the patient may use the same medicine in different time periods, that is, an item may occur in different elements of a sequence.
  • the patient's clinical medication behavior data is classified and stored in the data management module by patient ID number, timestamp (that is, the time of diagnosis and treatment), prescribed drugs, symptoms, and department.
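The cleaning, classification and sequence-building steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: the record type `MedicationRecord`, its field names, the use of a calendar date as the timestamp, and the cleaning rule (dropping rows with a missing patient ID or drug) are all hypothetical and are not specified by the source.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class MedicationRecord:
    patient_id: str
    day: date          # time of diagnosis and treatment
    drug: str
    symptom: str
    department: str


def build_sequence_databases(
    records: List[MedicationRecord],
) -> Dict[Tuple[str, str], Dict[str, List[FrozenSet[str]]]]:
    """Classify records by (department, symptom), then build one ordered
    sequence per patient; drugs given at the same time form one element."""
    grouped = defaultdict(lambda: defaultdict(lambda: defaultdict(set)))
    for r in records:
        if not r.patient_id or not r.drug:  # assumed cleaning rule: skip incomplete rows
            continue
        grouped[(r.department, r.symptom)][r.patient_id][r.day].add(r.drug)

    databases = {}
    for key, patients in grouped.items():
        databases[key] = {
            pid: [frozenset(visits[d]) for d in sorted(visits)]
            for pid, visits in patients.items()
        }
    return databases


if __name__ == "__main__":
    rows = [
        MedicationRecord("2", date(2019, 1, 9), "c", "gastritis", "internal"),
        MedicationRecord("2", date(2019, 1, 2), "a", "gastritis", "internal"),
        MedicationRecord("2", date(2019, 1, 2), "b", "gastritis", "internal"),
    ]
    print(build_sequence_databases(rows))
```

Sorting by timestamp inside each patient is what turns the unordered transaction rows into the ordered sequence used by the mining steps.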
  • the data analysis module analyzes and predicts the patient's clinical medication behavior according to the processing result of the data processing module, including the following steps:
  • Use the modified positive sequential pattern mining algorithm PrefixSpan to mine all positive sequential patterns, that is, the medication orders most frequently used in the patient population within a certain period of time.
  • For every frequent positive sequence, a bitmap stores the ID numbers of the data sequences that contain it;
  • the negative candidate generation method of PNSP is used to generate negative sequential candidates (NSC), which are used to determine which drugs are used more frequently and which drugs are not used within a certain period of time;
  • to improve the time efficiency of negative sequential pattern mining, the PrefixSpan algorithm is used to mine the positive sequential patterns; at the same time, a bitmap strategy is used to enhance PrefixSpan and improve space efficiency.
  • the modified PrefixSpan algorithm uses simple bitmap structures and operations to obtain sequential patterns, including the following steps:
  • Step m: calculate the support of each item from its bitmap, that is, the number of 1s in the bitmap, and determine whether the support reaches the minimum support min_sup, the user-set minimum frequency of frequent patterns. If the item's support is greater than or equal to min_sup, the item is a PSP of length 1 and is regarded as a prefix of length 1; otherwise, it is not a PSP of length 1 and the item is deleted;
  • the projected database of the prefix <a> contains the projections of the first, second, third and fourth data sequences relative to the prefix <a>, together with the IDs of those data sequences;
  • the new prefix is a PSP of length i. If the PSP is a 1-size PSP, its support is stored directly; otherwise, the bitmap continues to be used to store its information;
  • each new prefix obtained after merging an item is then treated as the prefix, and steps o to q are executed recursively.
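The idea of growing frequent prefixes over projected databases while recording, for each pattern, a bitmap of the data sequences that contain it can be sketched as below. This is a deliberately simplified illustration and not the modified PrefixSpan of the invention: it assumes every element holds a single item, it stores a bitmap for every pattern regardless of size, and the function name `prefixspan_with_bitmaps` and the use of Python integers as bitmaps are hypothetical choices.

```python
from typing import Dict, List, Tuple


def prefixspan_with_bitmaps(db: List[List[str]], min_sup: int) -> Dict[Tuple[str, ...], int]:
    """Return {frequent pattern: bitmap}, where bit i of the bitmap is 1
    when data sequence i contains the pattern."""
    patterns: Dict[Tuple[str, ...], int] = {}

    def grow(prefix: Tuple[str, ...], projected: List[Tuple[int, int]]) -> None:
        # projected holds (sequence ID, start index of the remaining suffix) pairs.
        extensions: Dict[str, List[Tuple[int, int]]] = {}
        for sid, start in projected:
            seen = set()
            for pos in range(start, len(db[sid])):
                item = db[sid][pos]
                if item not in seen:  # keep only the leftmost occurrence per sequence
                    seen.add(item)
                    extensions.setdefault(item, []).append((sid, pos + 1))
        for item in sorted(extensions):
            new_projected = extensions[item]
            if len(new_projected) < min_sup:
                continue  # infrequent extension: prune
            bitmap = 0
            for sid, _ in new_projected:
                bitmap |= 1 << sid
            new_prefix = prefix + (item,)
            patterns[new_prefix] = bitmap
            grow(new_prefix, new_projected)  # recurse on the new prefix

    grow((), [(sid, 0) for sid in range(len(db))])
    return patterns


if __name__ == "__main__":
    demo = [list("acd"), list("abcd"), list("cd"), list("ad")]
    for pattern, bitmap in prefixspan_with_bitmaps(demo, min_sup=3).items():
        print(pattern, format(bitmap, "04b"))
```

The support of a pattern is then simply the number of 1 bits in its bitmap, which is the property the later negative-support calculation relies on.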
  • In step h, to increase the number of NSPs mined, eNSP-IT relaxes the frequency constraint and adopts the PNSP negative candidate generation method.
  • The steps are as follows:
  • the format constraint is: consecutive negative elements in an NSP are not allowed. A 2-size NSC is generated by arranging 1-size PSPs and 1-size NSPs, for instance a candidate such as <a ¬b>. When a candidate ns is extended, if the last element of ns is a positive element, a 1-size PSP or a 1-size NSP is appended; otherwise, a 1-size PSP is appended;
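The append rule above can be sketched as follows. This is an illustrative sketch under assumptions, not the PNSP procedure itself: candidates are tuples of strings, a leading "¬" marks a negative element, every frequent item is allowed both as a positive and as a negated extension, and the names `extend_candidate` and `two_size_candidates` are hypothetical.

```python
from typing import List, Tuple

Candidate = Tuple[str, ...]


def is_negative(element: str) -> bool:
    return element.startswith("¬")


def extend_candidate(ns: Candidate, frequent_items: List[str]) -> List[Candidate]:
    """Append one element to ns. A negative element may only follow a
    positive one, so consecutive negative elements never arise."""
    extended = []
    for item in frequent_items:
        extended.append(ns + (item,))            # a positive element is always allowed
        if not is_negative(ns[-1]):
            extended.append(ns + ("¬" + item,))  # negative only after a positive
    return extended


def two_size_candidates(frequent_items: List[str]) -> List[Candidate]:
    """2-size negative candidates built from 1-size positive and negated items."""
    seeds = [(i,) for i in frequent_items] + [("¬" + i,) for i in frequent_items]
    candidates = []
    for seed in seeds:
        for cand in extend_candidate(seed, frequent_items):
            if any(is_negative(e) for e in cand):  # keep only negative candidates
                candidates.append(cand)
    return candidates


if __name__ == "__main__":
    for cand in two_size_candidates(["a", "b"]):
        print("<" + " ".join(cand) + ">")
```

Because the rule is applied at append time, no separate check for consecutive negative elements is needed on the generated candidates.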
  • The k-size NSC is pruned before its support is calculated.
  • The pruning method is:
  • In step i, calculating the support of the negative candidate sequence refers to:
  • If the size of ns is 1 and ns has only one negative element, the support of ns is:
  • If ns contains only one negative element, the support of the sequence ns is:
  • OR refers to the bitwise OR operation: the bitmaps corresponding to p(1-negMS_i) are ORed one by one, which merges the multiple bitmaps into a new bitmap. If the same position is 0 in all of the bitmaps, the corresponding position of the new bitmap is 0; otherwise, it is 1.
  • B(<ace>), B(<cef>)
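How bitmaps can turn the support of a negative candidate into a few bit operations is sketched below. The formula used, sup(ns) = sup(MPS(ns)) − |∪ {p(1-negMS_i)}|, follows the set-based idea of the e-NSP family of algorithms and is an assumption about the calculation intended here, since the formulas themselves are not reproduced in this text. Bitmaps are modelled as Python integers, the union of sequence-ID sets is taken with bitwise OR, and the function names and example bitmaps are hypothetical.

```python
from typing import Iterable


def popcount(bitmap: int) -> int:
    """Number of 1 bits, i.e. the number of data sequences marked in the bitmap."""
    return bin(bitmap).count("1")


def union_bitmap(bitmaps: Iterable[int]) -> int:
    """Merge bitmaps with bitwise OR: a position is 1 if it is 1 in any input."""
    merged = 0
    for b in bitmaps:
        merged |= b
    return merged


def negative_support(b_mps: int, b_converted: Iterable[int]) -> int:
    """Assumed formula: sequences containing MPS(ns) minus those that also
    contain at least one converted sequence p(1-negMS_i)."""
    covered = union_bitmap(b_converted) & b_mps
    return popcount(b_mps) - popcount(covered)


if __name__ == "__main__":
    b_mps = 0b01111                  # MPS(ns) occurs in data sequences 0..3
    b_converted = [0b00011, 0b00110]  # bitmaps of two converted sequences
    print(negative_support(b_mps, b_converted))
```

In this hypothetical example three of the four sequences containing MPS(ns) also contain a converted sequence, so only one sequence supports the negative candidate.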
  • The negative constraints of the eNSP-IT algorithm are more relaxed, so it can mine more sequential patterns and provide users with more decision information.
  • Applying the present invention, positive and negative sequential patterns can be fully combined as a reference in clinical medication analysis, so as to discover the drug treatment plans most commonly used for a given disease. When a doctor treats a patient, the invention can provide previous treatment plans, thereby better predicting the patient's next medication and supporting clinical decision-making based on changes in the medication plan.
  • Fig. 1 is a structural block diagram of the clinical medication behavior analysis system based on efficient negative sequential pattern mining of the present invention.
  • A clinical medication behavior analysis system based on efficient negative sequential pattern mining includes a data acquisition system and a behavior analysis system that communicate through a transmission network;
  • the data acquisition system includes a data collection module and a data transmission module connected in sequence;
  • the data collection module is used to collect and save the patient's clinical medication behavior data in real time.
  • the clinical medication behavior data includes the patient's ID number, timestamp (that is, the time of diagnosis and treatment), prescribed drugs, symptoms, and the patient's department;
  • the data transmission module is used to transmit the patient's clinical medication behavior data to the behavior analysis system through the transmission network;
  • the behavior analysis system includes a data processing module, a data analysis module, and a data management module connected in sequence, and is deployed on a cloud server.
  • the data transmission module is connected to the data processing module;
  • the data processing module is used to clean the collected clinical medication behavior data of the patient and classify the data according to the department and disease of the patient;
  • the data analysis module is used to analyze and predict the clinical medication behavior of patients according to the processing results of the data processing module; the steps are as follows:
  • the data analysis module establishes the medication behavior sequence corresponding to each patient's ID number from the clinical medication behavior data processed by the data processing module, and applies the clinical medication behavior analysis method based on efficient negative sequential pattern mining to analyze and predict clinical medication behavior.
  • the clinical medication behavior data of patients in the same department with the same symptoms constitute a sequence database.
  • Each patient's ID number corresponds to that patient's medication records in a certain period of time, which form an ordered sequence;
  • The sequence database is mined with the clinical medication behavior analysis method based on efficient negative sequential pattern mining to obtain the negative sequential patterns that meet the minimum support requirement, that is, the drugs commonly used to treat the disease, the order of medication, and the relationships between drugs.
  • the negative sequential patterns that can be used for decision-making are screened out, and the patient's medication behavior is analyzed using these patterns.
  • the data management module is used to store and display the processing results of the data processing module and the clinical medication behavior results analyzed by the data analysis module. When the doctor prescribes drugs, the next medication is recommended.
  • the data management module is used to view all clinical medication behavior records and all frequent clinical medication behaviors. When the doctor treats the patient, the system will provide the commonly used treatment plan for this disease, and when the first treatment plan is not satisfactory, it will provide an alternative treatment plan.
  • the transmission network is a wired public network, a local area network or a 3G/4G network.
  • the invention adopts a cloud management platform design (such as Facebook Cloud Server, Huawei Cloud, JD Cloud, etc.), and each hospital does not need to configure a server.
  • the hospital rents the cloud management platform server of this system, which connects to the various system interfaces in the hospital and imports data. Users with the corresponding authority can log in to the system over the Internet from any location without installing a client, which makes security management flexible.
  • the system can also be deployed in the hospital's local private cloud and accessed by logging in through the hospital's local area network.
  • The working method of the clinical medication behavior analysis system based on efficient negative sequential pattern mining described in Embodiment 1 includes the following steps:
  • the data collection module collects and saves the patient's clinical medication behavior data in real time.
  • the clinical medication behavior data includes the patient's ID number, timestamp (that is, the time of diagnosis and treatment), prescribed drugs, symptoms, and the patient's department;
  • Let ns denote a negative candidate sequence composed of drugs used by the patient; for example, in a negative candidate sequence such as ns = <a ¬b c ¬d>, ¬b and ¬d indicate that drugs b and d are not used, while a and c indicate that drugs a and c are used;
  • m-size: a negative candidate sequence ns that contains m elements is an m-size sequence; for example, <a ¬b c ¬d> is a 4-size sequence;
  • 1-negMS_ns refers to a subsequence of the negative candidate sequence ns that is composed of MPS(ns) and one negative element of ns;
  • 1-negMSS_ns refers to the set of all such subsequences 1-negMS_ns of the negative candidate sequence ns;
  • p(1-negMS_ns) means that the positive elements in the sequence 1-negMS_ns remain unchanged and the negative element is converted to the corresponding positive element; for example, p(<a ¬b c>) = <a b c>;
  • ds refers to a data sequence in the database; ds contains the drugs used by a patient during the treatment, arranged in the order of medication;
  • Element constraint: no negative items are allowed inside an element; only whole elements of the sequence can be negative. For example, a sequence such as <¬(a b) c d> meets the constraint, whereas <(¬a b) c d> does not, because ¬a is a negative item inside the element (¬a b);
  • Format constraint: two or more consecutive negative elements are not allowed. For example, a sequence such as <a ¬b ¬c d> does not satisfy the constraint because ¬b and ¬c are two consecutive negative elements;
  • the gastritis outpatient data in the medical insurance data is used as the experimental data.
  • Table 3 is a partial result of preprocessing the medical insurance data into a sequence database.
  • the behavior analysis system uses the eNSP-IT algorithm to analyze the clinical medication behavior data, including the following steps:
  • the data processing module cleans the collected clinical medication behavior data of the patient, and classifies the data according to the department and disease of the patient; the steps are as follows:
  • The standardized processing refers to data integration: the weekly medication records that share the same patient ID number are sorted into one complete sequence;
  • the items and itemsets are ordered, each item represents a drug, and an element refers to all medicines used by the patient at a specific point in time; the patient may use the same medicine in different time periods, that is, an item may occur in different elements of a sequence.
  • the patient's clinical medication behavior data is classified and stored in the data management module by patient ID number, timestamp (that is, the time of diagnosis and treatment), prescribed drugs, symptoms, and department.
  • the data analysis module analyzes and predicts the clinical medication behavior of patients according to the processing results of the data processing module
  • the data management module stores and displays the processing results of the data processing module and the clinical medication behavior results analyzed by the data analysis module. When the doctor prescribes drugs, the next medication is recommended.
  • Step b: the data analysis module analyzes and predicts the patient's clinical medication behavior according to the processing result of the data processing module, including the following steps:
  • Use the modified positive sequential pattern mining algorithm PrefixSpan to mine all positive sequential patterns, that is, the medication orders most frequently used in the patient population within a certain period of time.
  • For every frequent positive sequence, a bitmap stores the ID numbers of the data sequences that contain it. Table 4 shows some positive sequential patterns and their bitmaps;
  • the negative candidate generation method of PNSP is adopted to generate negative sequential candidates (NSC), which are used to determine which drugs are used more frequently and which drugs are not used within a certain period of time. According to the experimental data, the following negative candidate sequences are generated:
  • P1 and P2 indicate that, when treating gastritis, doctors often choose the prescriptions in these two sequences, and the potential relationships between the drugs in each prescription can be discovered through these two negative sequential patterns.
  • P1 means that the doctor does not use vitamin C after using glucose, ceftriaxone, vitamin B6 and sodium chloride solution.
  • P2 means that after the doctor prescribed ceftriaxone and vitamin C, he did not use vitamin C, and then used cimetidine instead of omeprazole. NSP mining can therefore effectively help doctors predict the patient's next medication more accurately.
  • the PrefixSpan algorithm is used to mine the positive sequence pattern.
  • the Bitmap strategy is used to further enhance the PrefixSpan algorithm to improve the space efficiency.
  • the modified PrefixSpan algorithm uses simple bitmap structures and operations to obtain sequential patterns, including the following steps:
  • Scan the database (the collection of all data sequences ds) to find all items, where an item is a single drug, and create a bitmap for each item. The length of each bitmap equals the number of data sequences in the database. If an item appears in data sequence i, its bitmap is set to 1 at position i; otherwise, its bitmap is set to 0 at position i. The bitmap is denoted by B;
  • Calculate the support of each item from its bitmap, that is, the number of 1s in the bitmap, and determine whether the support meets the minimum support min_sup, which is set by the user and is the minimum frequency of frequent patterns. If the item's support is greater than or equal to min_sup, the item is a PSP of length 1 and is regarded as a prefix of length 1; otherwise, it is not a PSP of length 1 and the item is deleted;
  • the new prefix is For a PSP with length i, if the PSP is a 1-size PSP, store its support directly, otherwise, continue to use the bitmap to store information;
  • the prefix is each new prefix after the merged item, and steps o to q are executed recursively.
  • In step h, in order to increase the number of NSPs mined, eNSP-IT relaxes the frequency constraint and adopts the PNSP negative candidate sequence generation method.
  • the steps are as follows:
  • The constraint is defined as follows: consecutive negative elements are not allowed in an NSP. A 2-size NSC is generated by arranging 1-size PSPs and 1-size NSPs. If the last element of ns is a positive element, append a 1-size PSP or a 1-size NSP; otherwise, append a 1-size PSP;
  • the k-size NSC is trimmed before calculating its support.
  • the trimming method is:
  • calculating the support degree of the negative candidate sequence refers to:
  • If the size of ns is 1 and ns has only one negative element, the support of ns is:
  • If ns contains only one negative element, the support of sequence ns is:
  • OR refers to the AND operation in bit operations: the bitmaps corresponding to p(1-negMS i ) are ANDed one by one. The AND operation merges multiple bitmaps into a new bitmap: if the bits at the same position are all 1, the corresponding position in the new bitmap is 1; otherwise, it is 0. N refers to the number of 1s in the bitmap.
  • the working method of the clinical medication behavior analysis system based on the efficient negative sequence mining mode described in embodiment 1, includes the following steps:
  • the data collection module collects and saves the patient's clinical medication behavior data in real time.
  • the clinical medication behavior data includes the patient's ID number, timestamp (that is, the time of diagnosis and treatment), prescribed drugs, symptoms, disease, and the patient's department;
  • Set a negative candidate sequence ns composed of drugs used by the patient; for example, in a negative candidate sequence over drugs a, b, c, and d, the negative elements mean that drugs b and d are not used, and a and c are the drugs used;
  • m-size refers to the m elements contained in the negative candidate sequence ns; for example, Is a 4-size sequence;
  • Set 1-negMS ns to refer to the subsequence of the negative candidate sequence ns, and the subsequence is composed of MPS(ns) and a negative element;
  • Setting 1-negMSS ns refers to the set of subsequences of all negative sequences including the negative candidate sequence ns;
  • Setting p(1-negMS ns ) means that the positive element in the sequence 1-negMS ns remains unchanged, and the negative element is converted to the corresponding positive element; for example:
  • Setting ds refers to a data sequence in the database, ds contains the drugs used by a patient during this treatment, and the drugs are arranged in the order of medication;
  • Element constraint means: no negative items are allowed inside elements; only elements in the sequence can become negative; for example: Meet the constraints; and Does not meet the constraints because Is the element Internal negative
  • the format constraint means that there are no consecutive 2 or more negative elements; for example: The constraint is not satisfied because the negative element Are two consecutive negative elements;
  • the data of diabetic patients in the medical insurance data is used as the experimental data.
  • Table 6 below is the partial result of preprocessing the medical insurance data into a sequence database.
  • the eNSP-IT algorithm is used to analyze the clinical medication behavior.
  • Support min_sup 30%, including the following steps:
  • Patient ID: 1; sequence of drugs used by the patient: <(Metformin, Simvastatin, Venlafaxine) (Aspirin, Glipizide) (Hydrochlorothiazide, Insulin)>
  • the data processing module cleans the collected clinical medication behavior data of the patient, and classifies the data according to the department and disease of the patient; the steps are as follows:
  • the standardized processing refers to the integration of data, that is, the weekly medication records of patients with the same patient ID number are sorted into a sequence to form a complete sequence
  • the items/itemsets are ordered, each item represents a drug, and an element refers to all drugs used by the patient at a specific point in time; the patient may use the same drug in different time periods, that is, an item may occur in different elements of a sequence.
  • the patient's clinical medication behavior data is classified and stored in the data management module according to the patient's ID number, timestamp (that is, the time of diagnosis and treatment), prescribed drugs, symptoms, disease, and the patient's department.
  • the data analysis module analyzes and predicts the clinical medication behavior of patients according to the processing results of the data processing module
  • the data management module stores and displays the processing results of the data processing module and the clinical medication behavior results analyzed by the data analysis module. When the doctor prescribes drugs, the next medication is recommended.
  • Step b The data analysis module analyzes and predicts the patient's clinical medication behavior according to the processing result of the data processing module, including the following steps:
  • Use the modified positive sequential pattern mining algorithm PrefixSpan to mine all positive sequential patterns, that is, the orderings of the drugs most frequently used in the patient population within a certain period of time.
  • Every frequent positive sequence uses a bitmap to store the ID numbers of the data sequences that contain it. Table 7 shows some positive sequential patterns and their bitmaps;
  • The negative candidate sequence generation method of PNSP is adopted to generate negative sequential candidates (NSC), which are used to determine which drugs are used more frequently and which drugs are not used within a certain period of time. According to the experimental data, the following negative candidate sequences are generated
  • P 1 and P 2 show that when treating diabetes, doctors often choose prescriptions in these two sequences, and the potential relationship between the drugs in each prescription can be discovered through these two negative sequence patterns.
  • P 1 indicates that the doctor used metformin and did not use alogliptin after not using acetohexamide.
  • P 2 means that after the doctor prescribed metformin, he did not use acetohexamide and then used rosiglitazone instead of saxagliptin. Therefore, using NSP mining methods can effectively help doctors accurately predict the patient's next medication.
  • the PrefixSpan algorithm is used to mine the positive sequence pattern.
  • the Bitmap strategy is used to further enhance the PrefixSpan algorithm to improve the space efficiency.
  • the modified PrefixSpan algorithm uses simple bitmap structures and operations to obtain sequential patterns, including the following steps:
  • Create a bitmap for each item. The length of each bitmap equals the number of data sequences in the database. If an item appears in data sequence i, its bitmap is set to 1 at position i; otherwise, its bitmap is set to 0 at position i. The bitmap is denoted by B; for example, the item of sodium chloride solution
  • Calculate the support of each item from its bitmap, that is, the number of 1s in the bitmap, and determine whether the support meets the minimum support min_sup, which is set by the user and is the minimum frequency of frequent patterns. If the item's support is greater than or equal to min_sup, the item is a PSP of length 1 and is regarded as a prefix of length 1; otherwise, it is not a PSP of length 1 and the item is deleted;
  • the projection database of the prefix <a> contains the projections of the first, second, third and fourth data sequences relative to the prefix <a> and the IDs of those data sequences;
  • the new prefix is For a PSP with length i, if the PSP is a 1-size PSP, store its support directly, otherwise, continue to use the bitmap to store information;
  • the prefix is each new prefix after the merged item, and steps o to q are executed recursively.
  • In step h, in order to increase the number of NSPs mined, eNSP-IT relaxes the frequency constraint and adopts the PNSP negative candidate sequence generation method.
  • the steps are as follows:
  • The constraint is defined as follows: consecutive negative elements are not allowed in an NSP. A 2-size NSC is generated by arranging 1-size PSPs and 1-size NSPs. If the last element of ns is a positive element, append a 1-size PSP or a 1-size NSP; otherwise, append a 1-size PSP;
  • the k-size NSC is trimmed before calculating its support.
  • the trimming method is:
  • calculating the support degree of the negative candidate sequence refers to:
  • If the size of ns is 1 and ns has only one negative element, the support of ns is:
  • If ns contains only one negative element, the support of sequence ns is:
  • OR refers to the AND operation in bit operations: the bitmaps corresponding to p(1-negMS i ) are ANDed one by one. The AND operation merges multiple bitmaps into a new bitmap: if the bits at the same position are all 1, the corresponding position in the new bitmap is 1; otherwise, it is 0.
  • B(<ace>)
  • B(<cef>)
  • Sequence pattern set used to analyze clinical medication behavior
  • Step (1) is to use the modified PrefixSpan algorithm to dig out all positive sequence patterns from the sequence database, and the support of all positive candidate sequences are stored using bitmaps;
  • Steps (2)-(19) refer to generating negative candidates using a negative candidate sequence generation method, where steps (10) and (16) represent pruning the negative candidate sequences that meet the pruning conditions;
  • Steps (21)-(26) use formulas (I)-(III) to calculate the support of negative candidate sequences, where steps (21)-(24) calculate the support of negative candidates containing only one negative element.
  • Step (26) calculates the support of negative candidates containing multiple negative elements;
  • Steps (27)-(28) mean that if the support of a negative candidate is greater than the minimum support, the negative candidate sequence is a negative sequential pattern and is added to the set of negative sequential patterns
  • Step (30) refers to returning the results, and then using appropriate methods to screen out the sequence patterns that can be used for decision-making, and use these screened sequence patterns to analyze the clinical medication behavior.
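
The bitmap steps listed above (one bitmap per item, support as the number of 1s, filtering by min_sup) can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, the database is the five-patient example from the description's Table 2, and min_sup is expressed as an absolute count, which is an assumption.

```python
from typing import Dict, List, Set

# Illustrative sequence database (the five-patient example of Table 2).
# Each data sequence is a list of elements; each element is a set of drug IDs.
DATABASE: List[List[Set[str]]] = [
    [{"c"}, {"i"}],
    [{"a", "b"}, {"c"}, {"a", "d", "f", "g"}],
    [{"c", "e", "g", "h"}],
    [{"c"}, {"c", "d", "g", "h"}, {"i"}],
    [{"i"}],
]


def build_item_bitmaps(database: List[List[Set[str]]]) -> Dict[str, List[int]]:
    """Return {item: bitmap}; bit i is 1 if the item occurs in data sequence i."""
    bitmaps: Dict[str, List[int]] = {}
    for i, sequence in enumerate(database):
        for element in sequence:
            for item in element:
                bitmap = bitmaps.setdefault(item, [0] * len(database))
                bitmap[i] = 1
    return bitmaps


def support(bitmap: List[int]) -> int:
    """Support of an item = number of 1s in its bitmap."""
    return sum(bitmap)


def frequent_items(bitmaps: Dict[str, List[int]], min_sup_count: int) -> Dict[str, List[int]]:
    """Keep items whose support is >= min_sup_count (hypothetical absolute threshold)."""
    return {item: bm for item, bm in bitmaps.items() if support(bm) >= min_sup_count}


if __name__ == "__main__":
    bitmaps = build_item_bitmaps(DATABASE)
    for item in sorted(frequent_items(bitmaps, min_sup_count=2)):
        print(item, bitmaps[item], support(bitmaps[item]))
```

In this example, drug c occurs in the first four data sequences, so its bitmap is 1 1 1 1 0 and its support is 4; items that fall below the threshold are dropped before longer prefixes are grown.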

Abstract

A clinical medication behavior analysis system based on a highly effective negative sequential mining pattern, and a working method therefor, the system comprising a data acquisition system and a behavior analysis system; the data acquisition system comprises a data acquisition module and a data transfer module; the data acquisition module, in real time, acquires and saves clinical medication behavior data; the data transfer module transfers the clinical medication behavior data to the behavior analysis system; the behavior analysis system comprises a data processing module, a data analysis module, and a data management module; the data processing module performs data cleaning and data classification on the clinical medication behavior data; the data analysis module performs analysis and predictions; and the data management module stores and displays analysis results and gives recommendations for the next step of medication. The present method applies an eNSP-IT algorithm to clinical medication behavior analysis, to rapidly identify the negative sequential relationship between medicines, better predict the next step of medication for the patient, and support clinical decisions based on a medication regimen change.

Description

A clinical medication behavior analysis system based on an efficient negative sequence mining model and its working method
Technical field
The invention relates to a clinical medication behavior analysis system based on an efficient negative sequence mining model and a working method thereof, and belongs to the technical field of negative sequential pattern applications.
Background art
In recent years, with the rapid development of China's economy, people have paid increasing attention to physical health, and medical care has received more and more attention as well. With the continuous development of informatization, medical information systems have made considerable progress in the transition from paper charts to electronic health records. At present, medical information systems have largely become electronic, digital, and media-based. This transformation has led to the accumulation of large amounts of data in clinical data warehouses, giving the medical industry massive data stores. These medical and health data include clinical diagnosis data, patient medication data, patient medical insurance data, and patients' natural attribute information. How to discover the valuable information, rules, or knowledge in these data, so as to help doctors increase clinical knowledge, assist medical staff in diagnosis and treatment, and provide decision-making information for hospital managers, has become a socially valuable problem in urgent need of a solution.
Data mining is the process of discovering hidden knowledge in large information repositories, and data mining techniques developed for retail or other industries can be applied to medical care. Data mining is a multidisciplinary research field that draws on the latest theories and methods of database technology, pattern recognition, machine learning, fuzzy logic, artificial intelligence, information retrieval, statistics, high-performance computing, and neural networks.
Sequential pattern mining, a frontier field of data mining, has attracted growing attention from scholars. It refers to mining patterns that occur frequently with respect to time or other orderings, and it can discover potentially useful information and knowledge about relationships between transactions that people did not know in advance. In the field of health data analysis, one of the problems sequential pattern analysis aims to solve is which drugs will be used within a specific later period after a doctor has prescribed a drug to a patient. It is a process of discovering regularities between drugs, and between drugs and diseases, so that when diagnosing and prescribing, doctors can refer to past prescriptions and accurately determine the patient's next medication. Its main purpose is to study the order relationships among clinical medications and find the rules in them: it is necessary to know not only whether a drug is used, but also the order in which it is used relative to other drugs. For example, a common medication sequence for treating gastritis is to prescribe glucose injection solution, then vitamin B6, then cephalosporin injection, and finally sodium chloride injection. Sequential patterns can therefore discover a frequent sequence within a certain period of time in the database, that is, which drugs doctors use more often in that period, where the threshold for "more" or "less" is determined by the minimum support. Each sequence is a set of combinations arranged by the time of medication, and the minimum support can be set to mine sequences of different degrees of frequency. However, when sequential patterns are applied to analyze clinical medication behavior and predict a patient's next medication, only events that have occurred are considered; this is also called positive sequential pattern (PSP) mining.
As research deepened, researchers found that a large amount of useful information is hidden in events that do not occur, and that this information cannot be obtained by positive sequential pattern mining alone, so researchers began to mine negative sequential patterns (NSP). A negative sequential pattern involves not only events that have occurred but also events that have not occurred. It allows deeper analysis and understanding of the potential meaning in the data, uncovering information that is easily overlooked but very valuable. For example, for drugs a, b, c, and d,
Figure PCTCN2019102473-appb-000001
represents a medication sequential pattern indicating that, within a certain period of time, the patient used drug d after using drugs a and b, without using drug c. The value of negative sequential patterns is now increasingly recognized, and they play an irreplaceable role in understanding and handling many medical applications, such as the analysis of patients' medication behavior.
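
To make the meaning of such a pattern concrete, the following sketch checks whether one patient's data sequence supports a negative sequential pattern. It assumes the reading used by e-NSP-style algorithms and by the terms MPS(ns), 1-negMS, and p(1-negMS) in this document: the data sequence must contain all positive elements in order, and must not contain any version of the pattern in which one negative element is made positive. The pattern `<a b ¬c d>` and the function names are illustrative assumptions, not taken from the patent's figures.

```python
from typing import FrozenSet, List, Tuple

Element = FrozenSet[str]
NegElement = Tuple[Element, bool]  # (itemset, is_negative)


def contains(data_sequence: List[Element], pattern: List[Element]) -> bool:
    """True if the positive pattern is a subsequence of data_sequence.

    Each pattern element must be a subset of a distinct data element,
    in order; greedy left-to-right matching is sufficient for this test.
    """
    pos = 0
    for element in pattern:
        while pos < len(data_sequence) and not element <= data_sequence[pos]:
            pos += 1
        if pos == len(data_sequence):
            return False
        pos += 1
    return True


def supports_negative(data_sequence: List[Element], ns: List[NegElement]) -> bool:
    """Assumed e-NSP-style containment test for a negative sequence ns."""
    # MPS(ns): the positive elements of ns, in order.
    mps = [e for e, negative in ns if not negative]
    if not contains(data_sequence, mps):
        return False
    for k, (_, negative) in enumerate(ns):
        if negative:
            # p(1-negMS): MPS(ns) plus this one negative element, made positive.
            converted = [e for j, (e, n) in enumerate(ns) if not n or j == k]
            if contains(data_sequence, converted):
                return False
    return True


if __name__ == "__main__":
    # Hypothetical pattern <a b ¬c d>: a and b used, then d, with no c in between.
    ns = [(frozenset("a"), False), (frozenset("b"), False),
          (frozenset("c"), True), (frozenset("d"), False)]
    patient = [frozenset("a"), frozenset("b"), frozenset("d")]
    print(supports_negative(patient, ns))
```

A patient whose record is a, b, d supports the pattern, while a patient whose record is a, b, c, d does not, because drug c was actually used between b and d.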
Patient medication records in a hospital are the data source for mining. Take the diagnosis and treatment records of 5 patients over 2 months as an example: Table 1 is a transaction database sorted by patient ID and drug prescription time. In the transaction database, a transaction represents one treatment, an item represents a drug used, and the letters in the item attribute are drug IDs. Data preprocessing organizes the transaction database of Table 1 into the sequence database of Table 2.
Table 1
Figure PCTCN2019102473-appb-000002
Table 2
Patient ID    Sequence of drugs used by the patient
1             {c}{i}
2             {a,b}{c}{a,d,f,g}
3             {c,e,g,h}
4             {c}{c,d,g,h}{i}
5             {i}
All of a patient's medication records within a certain period of time form an ordered sequence, written with <>. Within a sequence the items/itemsets are ordered. Each item represents a drug, and an element is the set of all drugs the patient uses at the same specific point in time, written with {} or (). The patient may use the same drug in different time periods, that is, an item may occur in different elements of a sequence. For example, the drug sequence with ID 2 in Table 2 is {a,b}{c}{a,d,f,g}: the patient used drug a in both the first and the third treatment. The three itemsets {a,b}, {c}, and {a,d,f,g} are called the elements of the sequence, and a, b, c, d, f, g are called items. If an element contains only one item, the brackets can be omitted; for example, the element {c} in this sequence can be written simply as c.
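
The preprocessing from a transaction database to a sequence database can be sketched as below. Because Table 1 is only available as an image, the transactions here are hypothetical: the drug sets are chosen to reproduce the sequences of Table 2, and the visit dates are invented purely to give an ordering. The function names are also illustrative.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical transactions: (patient_id, visit_date, drugs used at that visit).
# The dates are invented; only their order per patient matters.
TRANSACTIONS: List[Tuple[int, date, Set[str]]] = [
    (1, date(2019, 1, 5), {"c"}),
    (1, date(2019, 1, 20), {"i"}),
    (2, date(2019, 1, 7), {"a", "b"}),
    (2, date(2019, 2, 3), {"a", "d", "f", "g"}),
    (2, date(2019, 1, 18), {"c"}),
    (3, date(2019, 1, 9), {"c", "e", "g", "h"}),
    (4, date(2019, 1, 11), {"c"}),
    (4, date(2019, 1, 25), {"c", "d", "g", "h"}),
    (4, date(2019, 2, 8), {"i"}),
    (5, date(2019, 2, 1), {"i"}),
]


def to_sequence_database(
    transactions: List[Tuple[int, date, Set[str]]]
) -> Dict[int, List[FrozenSet[str]]]:
    """Group transactions by patient and order each patient's visits by date.

    Every visit becomes one element (an itemset) of that patient's sequence.
    """
    by_patient = defaultdict(list)
    for patient_id, visit_date, drugs in transactions:
        by_patient[patient_id].append((visit_date, frozenset(drugs)))
    return {
        pid: [drugs for _, drugs in sorted(visits, key=lambda v: v[0])]
        for pid, visits in sorted(by_patient.items())
    }


def format_sequence(sequence: List[FrozenSet[str]]) -> str:
    """Render a sequence in the {a,b}{c} notation used in Table 2."""
    return "".join("{" + ",".join(sorted(e)) + "}" for e in sequence)


if __name__ == "__main__":
    for pid, seq in to_sequence_database(TRANSACTIONS).items():
        print(pid, format_sequence(seq))
```

For patient 2, the item a appears in the first and third elements of the resulting sequence, which is the situation the paragraph above describes.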
At present, there are few research results on negative sequential pattern mining algorithms, such as NSPM, PNSP, Neg-GSP, e-NSP, and f-NSP. However, most methods, even the most advanced algorithm f-NSP, are not efficient enough, and the number of negative sequential patterns they mine is small. In practice, many factors affect the efficiency of negative sequential pattern mining and the number of patterns found, the most important being the positive sequential pattern mining process and the negative constraints. Because users are mainly interested in negative sequential patterns in which some frequent elements are absent, existing negative sequential pattern mining algorithms all depend first on identifying positive sequential patterns. In mining negative sequential patterns, however, most algorithms ignore the time consumed in discovering the positive sequential patterns, which makes the time cost of the whole mining process high. At the same time, all negative sequential pattern algorithms constrain the format, frequency, and negative elements in various ways to reduce the number of negative candidate sequences and to find specific negative sequential patterns of interest. To some extent, strict negative constraints reduce the number of redundant negative candidate sequences and preserve computational efficiency, but they cause many interesting negative sequential patterns to be lost, especially long ones, which carry a lot of information. In addition, the negative constraints also affect the choice of negative candidate sequence generation method to a certain extent: when the constraints change, the generation method should change accordingly.
Summary of the invention
In view of the shortcomings of the prior art, and in order to improve the efficiency of mining negative sequences and discover more interesting negative sequential patterns, the present invention provides a clinical medication behavior analysis system based on an efficient negative sequence mining model;
The present invention also provides a working method of the above clinical medication behavior analysis system based on the efficient negative sequence mining model.
The present invention proposes an efficient negative pattern mining algorithm named eNSP-IT. Applying the eNSP-IT algorithm to clinical medication behavior analysis makes it possible to find negative sequential relationships between drugs more quickly, and thereby to better predict a patient's next medication and support clinical decision-making based on changes in medication regimens.
Explanation of terms:
1. PrefixSpan algorithm: a classic positive sequential pattern mining algorithm based on depth-first search. Its basic idea is to use frequent prefixes to partition the search space and project the sequence database, and then to search for the related frequent sequences.
2. Database: data set, abbreviated DS, the collection containing all data sequences.
3. Support: abbreviated sup, the frequency with which a candidate sequence appears in the database.
4. Minimum support: abbreviated min_sup, the lowest frequency with which a frequent pattern appears in the database; it is set by the user. When the support of a candidate sequence is greater than the minimum support, the candidate sequence is a frequent pattern.
5. Prefix: given two sequences $\alpha = \langle e_1 e_2 \dots e_n \rangle$ and $\beta = \langle e'_1 e'_2 \dots e'_m \rangle$ ($m \le n$), $\beta$ is a prefix of $\alpha$ if and only if $e'_i = e_i$ for $i \le m-1$, $e'_m \subseteq e_m$, and all items in $(e_m - e'_m)$ come alphabetically after the items in $e'_m$. Informally, a prefix is a subsequence at the front of the sequence. For example, for the sequences B = <a(abc)(ac)d(cf)> and A = <a(abc)a>, A is a prefix of B. Correspondingly, the projection of $\alpha$ with respect to the prefix $\beta$ is $\alpha' = \langle e''_m e_{m+1} \dots e_n \rangle$, where $e''_m = (e_m - e'_m)$. Informally, the projection is the largest subsequence of the sequence that does not contain the prefix. For example, the projection of sequence B with respect to prefix A is B' = <cd(cf)>.
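
The prefix and projection definitions in term 5 can be checked with a short sketch. It follows the definition literally (equal leading elements, last element a subset, remaining items alphabetically later) and uses the example sequences A and B from the text. The function names, the tuple representation, and the choice to drop an empty leftover element are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# A sequence is a list of elements; each element is a tuple of items
# kept in alphabetical order, e.g. <a(abc)> -> [("a",), ("a", "b", "c")].
Sequence = List[Tuple[str, ...]]


def is_prefix(beta: Sequence, alpha: Sequence) -> bool:
    """True if beta is a prefix of alpha under the definition in term 5."""
    m = len(beta)
    if m == 0 or m > len(alpha):
        return False
    # The first m-1 elements must be identical.
    if beta[: m - 1] != alpha[: m - 1]:
        return False
    last_beta, last_alpha = set(beta[-1]), set(alpha[m - 1])
    # The last element of beta must be a subset of the m-th element of alpha.
    if not last_beta <= last_alpha:
        return False
    # Items left over in alpha's m-th element must come alphabetically later.
    leftover = last_alpha - last_beta
    return all(x > y for x in leftover for y in last_beta)


def project(alpha: Sequence, beta: Sequence) -> Optional[Sequence]:
    """Projection of alpha with respect to prefix beta, or None if not a prefix."""
    if not is_prefix(beta, alpha):
        return None
    m = len(beta)
    leftover = tuple(sorted(set(alpha[m - 1]) - set(beta[-1])))
    # Assumption: an empty leftover element is omitted from the projection.
    head = [leftover] if leftover else []
    return head + alpha[m:]


if __name__ == "__main__":
    B = [("a",), ("a", "b", "c"), ("a", "c"), ("d",), ("c", "f")]
    A = [("a",), ("a", "b", "c"), ("a",)]
    print(is_prefix(A, B), project(B, A))
```

For the sequences in the text, A is accepted as a prefix of B and the projection consists of the elements c, d, and (cf), matching B' = <cd(cf)>. A sequence such as <a b> is rejected as a prefix of B, because item a in the second element of B comes alphabetically before b.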
The technical scheme of the present invention is:
A clinical medication behavior analysis system based on an efficient negative sequence mining model, comprising a data acquisition system and a behavior analysis system connected for communication through a transmission network;
The data acquisition system comprises a data acquisition module and a data transmission module connected in sequence;
The data acquisition module is used to collect and save patients' clinical medication behavior data in real time. The clinical medication behavior data include the patient's ID number, the timestamp (that is, the time of diagnosis and treatment), the prescribed drugs, the symptoms, the disease, and the patient's department;
The data transmission module is used to transmit the patients' clinical medication behavior data to the behavior analysis system through the transmission network;
The behavior analysis system comprises a data processing module, a data analysis module, and a data management module connected in sequence, and is deployed in a cloud server. The data transmission module is connected to the data processing module;
The data processing module is used to clean the collected clinical medication behavior data and to classify the data by the patient's department and disease;
The data analysis module is used to analyze and predict patients' clinical medication behavior according to the processing results of the data processing module, with the following steps:
Based on the clinical medication behavior data processed by the data processing module, the data analysis module establishes a medication behavior sequence for each patient ID number and applies the clinical medication behavior analysis method based on the efficient negative sequence mining model to analyze and predict clinical medication behavior. The clinical medication behavior data of patients with the same department and disease form one sequence database, in which each patient ID number corresponds to one ordered sequence made up of all of that patient's medication records within a certain period of time. The analysis method mines the sequence database to obtain the negative sequential patterns that meet the minimum support requirement, that is, the commonly used drugs for the disease, the order of medication, and the relationships between drugs. The negative sequential patterns that can be used for decision-making are then selected, and these patterns are used to analyze patients' medication behavior.
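
As a rough illustration of how the data processing and data analysis modules might hand data to each other, the sketch below partitions cleaned records by department and disease, so that each partition is one sequence database keyed by patient ID. The record fields mirror the data items listed above; the class name, function names, timestamp format, and the cleaning rule are all assumptions rather than details given in the patent.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class MedicationRecord:
    patient_id: str
    timestamp: str            # assumed ISO 8601 date string, e.g. "2019-03-01"
    drugs: FrozenSet[str]
    symptoms: str
    disease: str
    department: str


def clean(records: List[MedicationRecord]) -> List[MedicationRecord]:
    """Assumed cleaning rule: drop records without a patient ID or without drugs."""
    return [r for r in records if r.patient_id and r.drugs]


def build_sequence_databases(
    records: List[MedicationRecord],
) -> Dict[Tuple[str, str], Dict[str, List[FrozenSet[str]]]]:
    """Return {(department, disease): {patient_id: ordered list of elements}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for record in clean(records):
        grouped[(record.department, record.disease)][record.patient_id].append(record)
    return {
        key: {
            pid: [rec.drugs for rec in sorted(recs, key=lambda rec: rec.timestamp)]
            for pid, recs in patients.items()
        }
        for key, patients in grouped.items()
    }


if __name__ == "__main__":
    records = [
        MedicationRecord("p1", "2019-03-02", frozenset({"ceftriaxone"}),
                         "stomach pain", "gastritis", "gastroenterology"),
        MedicationRecord("p1", "2019-03-01", frozenset({"glucose"}),
                         "stomach pain", "gastritis", "gastroenterology"),
        MedicationRecord("p2", "2019-03-01", frozenset({"metformin"}),
                         "thirst", "diabetes", "endocrinology"),
    ]
    for key, database in build_sequence_databases(records).items():
        print(key, database)
```

Each resulting database can then be mined separately, so that patterns found for one disease are not mixed with prescriptions for another.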
所述数据管理模块,用于对所述数据处理模块的处理结果及数据分析模块分析的临床用药行为结果进行存储和显示,当医生开具药品时,推荐下一步的用药。数据管理模块用于查看所有的临床 用药行为记录和所有频繁的临床用药行为。当医生给患者进行治疗时,系统会提供此病症常用的治疗方案,当首选治疗方案效果不理想时,提供备选治疗方案。The data management module is used to store and display the processing results of the data processing module and the clinical medication behavior results analyzed by the data analysis module, and when the doctor prescribes the medication, the next medication is recommended. The data management module is used to view all clinical medication behavior records and all frequent clinical medication behaviors. When the doctor treats the patient, the system will provide the commonly used treatment plan for this disease, and when the first treatment plan is not satisfactory, it will provide an alternative treatment plan.
Preferably according to the present invention, the transmission network is a wired public network, a local area network, or a 3G/4G network.
The present invention adopts a cloud management platform design (in the manner of, for example, Alibaba Cloud servers, Huawei Cloud, or JD Cloud), so individual hospitals do not need to set up their own servers. A hospital rents the cloud management platform server of this system, which helps the hospital connect to the interfaces of its in-hospital systems, import data, and so on. Users can log in to the system from anywhere over the Internet with the appropriate permissions, without installing a client, which makes security management flexible. The system can also be deployed on a private cloud on the hospital premises and accessed by logging in over the hospital's local area network.
The working method of the above clinical medication behavior analysis system based on efficient negative sequential pattern mining comprises the following steps:
(1) The data collection module collects and saves the patient's clinical medication behavior data in real time. The clinical medication behavior data include the patient's ID number, the timestamp (that is, the time of diagnosis and treatment), the drugs prescribed, the symptoms, the disease, and the patient's department;
Let ns denote a negative candidate sequence; for example, let a negative candidate sequence be
Figure PCTCN2019102473-appb-000003
which means that drugs b and d are not used, while a and c denote the drugs a and c that are used;
An m-size sequence is a negative candidate sequence ns that contains m elements; for example,
Figure PCTCN2019102473-appb-000004
is a 4-size sequence;
MPS(ns) denotes the maximum positive subsequence of the negative candidate sequence ns, which consists of all the positive elements of ns in their original order; for example, in ns,
Figure PCTCN2019102473-appb-000005
represent drugs that are not used, while a and c represent drugs that are used, so the maximum positive subsequence is
Figure PCTCN2019102473-appb-000006
The positive partner p(ns) is the sequence obtained by converting all negative elements of a negative candidate sequence ns, composed of the drugs used by a patient, into their corresponding positive elements; for example,
Figure PCTCN2019102473-appb-000007
1-negMS_ns denotes a subsequence of the negative candidate sequence ns that consists of MPS(ns) and exactly one negative element of ns;
1-negMSS_ns denotes the set of all 1-negMS_ns subsequences of the negative candidate sequence ns;
p(1-negMS_ns) denotes the sequence obtained from 1-negMS_ns by keeping its positive elements unchanged and converting its negative element into the corresponding positive element; for example:
Figure PCTCN2019102473-appb-000008
ds denotes a data sequence in the database; ds contains the drugs used by one patient during the current course of treatment, arranged in the order in which they were administered;
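The definitions above can be made concrete with a small Python sketch. It is illustrative only: the `Element` type, the helper names, and the example sequence <a ¬b c ¬d> (inferred from the surrounding wording, since the original example is given only as a figure) are assumptions and not part of the disclosed system.

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class Element(NamedTuple):
    """One element of a sequence: the drugs given at one time point."""
    items: FrozenSet[str]
    negative: bool = False  # True means "this element did NOT occur"


Sequence = Tuple[Element, ...]


def pos(*items: str) -> Element:
    """Positive element (drugs that were used)."""
    return Element(frozenset(items), False)


def neg(*items: str) -> Element:
    """Negative element (drugs that were not used)."""
    return Element(frozenset(items), True)


def mps(ns: Sequence) -> Sequence:
    """Maximum positive subsequence: positive elements in original order."""
    return tuple(e for e in ns if not e.negative)


def positive_partner(ns: Sequence) -> Sequence:
    """p(ns): every negative element replaced by its positive counterpart."""
    return tuple(Element(e.items, False) for e in ns)


def one_neg_mss(ns: Sequence) -> List[Sequence]:
    """1-negMSS: each member is MPS(ns) plus exactly one negative element."""
    return [
        tuple(e for j, e in enumerate(ns) if not e.negative or j == i)
        for i, target in enumerate(ns)
        if target.negative
    ]


# Assumed example: a and c are used, b and d are not used (a 4-size sequence).
ns = (pos("a"), neg("b"), pos("c"), neg("d"))
```

Under these assumptions, `mps(ns)` keeps only a and c, `positive_partner(ns)` turns the sequence into four positive elements, and `one_neg_mss(ns)` returns the two subsequences that each keep a single negative element.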
In summary, given a data sequence ds and a sequence ns that has m elements in total, n of which are negative, and that satisfies the element constraint, the format constraint, and the frequency constraint, if the condition
Figure PCTCN2019102473-appb-000009
holds and every 1-negMS_ns satisfies
Figure PCTCN2019102473-appb-000010
then ds contains ns, where:
The element constraint means that negative items are not allowed inside an element; only a whole element of a sequence can be negative. For example,
Figure PCTCN2019102473-appb-000011
satisfies the constraint, whereas
Figure PCTCN2019102473-appb-000012
does not, because
Figure PCTCN2019102473-appb-000013
is a negative item inside the element
Figure PCTCN2019102473-appb-000014
The format constraint means that a sequence must not contain two or more consecutive negative elements. For example,
Figure PCTCN2019102473-appb-000015
does not satisfy the constraint, because the negative elements
Figure PCTCN2019102473-appb-000016
are two consecutive negative elements;
The frequency constraint means that a negative sequence satisfies 1-negMS_ns ∈ 1-negMSS_ns and p(1-negMS_ns) ∈ PSP, where PSP denotes the positive sequential patterns;
The frequency constraint is motivated by the following considerations: (1) Users are interested in the absence of certain frequent elements in an NSP (negative sequential pattern), so the elements considered in an NSP should occur frequently enough. eNSP-IT requires every p(1-negMS_ns) to belong to PSP, which ensures that each element in an NSP is frequent. (2) Users want NSPs to carry more useful information, which helps them make better decisions. (3) Without this constraint, the number of negative candidate sequences could be huge, or even unbounded, which would make NSP mining very inefficient.
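The element and format constraints described above lend themselves to a simple check. The following Python sketch is one possible way to express them; the data layout (an item-level negation flag plus an element-level negation flag), the function names, and the three example sequences are hypothetical and chosen only for illustration. The frequency constraint is not covered, because it depends on the mined positive patterns.

```python
from typing import List, Tuple

# An item is (drug, item_negated); an element is (items, element_negated).
Item = Tuple[str, bool]
Element = Tuple[Tuple[Item, ...], bool]


def satisfies_element_constraint(seq: List[Element]) -> bool:
    """No item inside an element may be negated on its own."""
    return all(not item_neg for items, _ in seq for _, item_neg in items)


def satisfies_format_constraint(seq: List[Element]) -> bool:
    """No two adjacent elements may both be negative."""
    return not any(a[1] and b[1] for a, b in zip(seq, seq[1:]))


def el(*drugs: str, negative: bool = False) -> Element:
    """Build an element whose items are all positive."""
    return (tuple((d, False) for d in drugs), negative)


# Hypothetical examples:
ok = [el("a"), el("b", "c", negative=True), el("d")]                    # <a ¬(bc) d>
bad_element = [el("a"), ((("b", True), ("c", False)), False)]           # <a (¬b c)>
bad_format = [el("a", negative=True), el("b", negative=True), el("c")]  # <¬a ¬b c>
```

Here `ok` passes both checks, `bad_element` fails only the element constraint, and `bad_format` fails only the format constraint.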
(2) The data transmission module transmits the patient's clinical medication behavior data to the behavior analysis system over the transmission network, and the behavior analysis system analyzes the clinical medication behavior data with the eNSP-IT algorithm, comprising the following steps:
a. The data processing module cleans the collected clinical medication behavior data and classifies the data by the patient's department and disease;
b. The data analysis module analyzes and predicts the patient's clinical medication behavior according to the processing results of the data processing module;
c. The data management module stores and displays the processing results of the data processing module and the clinical medication behavior results produced by the data analysis module, and recommends the next medication when a doctor prescribes drugs.
Preferably according to the present invention, in step a, the data processing module cleans the collected clinical medication behavior data and classifies the data by the patient's department and disease, comprising the following steps:
Collecting patients' clinical medication behavior data through the data collection system produces a large volume of data, and the data may contain duplicates or incomplete records. The following steps are therefore needed:
d. Optimize the collected clinical medication behavior data so that they are suitable for later analysis. Optimizing the data includes filling in missing data and filtering out abnormal data;
e. Standardize the optimized clinical medication behavior data. Standardization means integrating the data, that is, arranging the weekly medication records of patients with the same patient ID number into one ordered sequence to form complete clinical medication behavior data for the patient. All of a patient's medication records within a given time period form an ordered sequence. In the sequence, the items/itemsets are ordered and each item represents one drug, while an element consists of all the drugs the patient used simultaneously at one specific time point. A patient may use the same drug in different time periods, that is, one item may occur in different elements of a sequence.
f. Classify the clinical medication behavior data by two classification features, the patient's department and the disease, and store them in the data management module organized by the patient's ID number, timestamp (that is, the time of diagnosis and treatment), drugs prescribed, symptoms, disease, and department.
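As a rough illustration of step e, the sketch below groups raw medication records into one time-ordered sequence per patient. The record layout (dictionaries with `patient_id`, `timestamp`, and `drug` keys), the ISO-style timestamp strings, and the function name are assumptions made for this example. For brevity it drops incomplete records instead of filling them in as step d describes, and it does not split sequences by week.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Optional


def build_sequences(
    records: Iterable[Dict[str, Optional[str]]],
) -> Dict[str, List[FrozenSet[str]]]:
    """Group medication records into one time-ordered sequence per patient."""
    by_patient: Dict[str, Dict[str, set]] = defaultdict(lambda: defaultdict(set))
    for rec in records:
        pid, ts, drug = rec.get("patient_id"), rec.get("timestamp"), rec.get("drug")
        if not pid or not ts or not drug:
            continue  # incomplete record: dropped here rather than filled in
        by_patient[pid][ts].add(drug)  # a set absorbs duplicate records
    # Drugs sharing a timestamp form one element; elements are ordered by time.
    return {
        pid: [frozenset(by_time[ts]) for ts in sorted(by_time)]
        for pid, by_time in by_patient.items()
    }


# Hypothetical records: one duplicate and one incomplete entry.
records = [
    {"patient_id": "p1", "timestamp": "2019-01-02", "drug": "b"},
    {"patient_id": "p1", "timestamp": "2019-01-01", "drug": "a"},
    {"patient_id": "p1", "timestamp": "2019-01-02", "drug": "c"},
    {"patient_id": "p1", "timestamp": "2019-01-02", "drug": "c"},
    {"patient_id": "p2", "timestamp": None, "drug": "a"},
]
sequences = build_sequences(records)
```

Each resulting sequence is a list of elements, where an element is the set of drugs recorded at the same time point, matching the item/element terminology used above.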
Preferably according to the present invention, in step b, the data analysis module analyzes and predicts the patient's clinical medication behavior according to the processing results of the data processing module, comprising the following steps:
g. Mine all positive sequential patterns with a modified version of the positive sequential pattern mining algorithm PrefixSpan, that is, find the drug orders most frequently used in the patient population within a given time period. In the modified PrefixSpan algorithm, a bitmap is used for every frequent positive sequence to store the ID numbers of the data sequences that contain it;
h. Generate negative sequential candidates (NSC) with the negative candidate generation method of PNSP. The negative candidates are used to determine which drugs are used frequently and which drugs are not used within a given time period;
i. Calculate the support of the negative candidate sequences with bitmap operations;
j. From the negative candidate sequences, select the negative sequential patterns that meet the minimum support requirement, and use a suitable screening method to pick out those that can be used for decision-making. These patterns are then used to analyze the patient's medication behavior; based on the analysis results, the doctor predicts the patient's next treatment plan, which supports clinical decisions driven by changes in the drug regimen. For example, consider the two negative sequential patterns
Figure PCTCN2019102473-appb-000017
and
Figure PCTCN2019102473-appb-000018
Figure PCTCN2019102473-appb-000019
P1 and P2 indicate that, when treating gastritis, doctors often choose the prescriptions in these two sequences, and the two negative sequential patterns reveal the potential relationships between the drugs in each prescription. P1 indicates that the doctor does not use vitamin C after using glucose, ceftriaxone, vitamin B6, and sodium chloride solution. P2 indicates that, after prescribing ceftriaxone and vitamin C, the doctor does not use vitamin C and then uses cimetidine rather than omeprazole. NSP mining can therefore effectively help doctors accurately predict a patient's next medication.
Preferably according to the present invention, in step g, the PrefixSpan algorithm is used to mine positive sequential patterns in order to improve the time efficiency of negative sequential pattern mining, and a bitmap strategy is used to further enhance PrefixSpan in order to improve space efficiency. Unlike other mining methods that use bitmap structures, the modified PrefixSpan algorithm obtains sequential patterns with a simple bitmap structure and simple operations, comprising the following steps:
k. Add an ID to each data sequence ds;
l. Scan the database (the set of all data sequences ds) to find all items, where an item is a single drug, and create a bitmap for each item. The length of each bitmap equals the number of data sequences in the database. If an item occurs in data sequence i, position i of its bitmap is set to 1; otherwise position i is set to 0. A bitmap is denoted by B. For example, if the bitmap of item b is B(b)=|1|1|1|0|0|, then item b is contained in the first, second, and third data sequences.
m. From the bitmap of each item, calculate the support of the item, that is, the number of 1s in the bitmap, and determine whether the support meets the minimum support min_sup, which is the user-specified minimum frequency of a frequent pattern. If the support of the item is greater than or equal to min_sup, the item is a PSP of length 1, and each PSP of length 1 is treated as a prefix of length 1; otherwise the item is not a PSP of length 1 and is deleted;
n. Recursively mine every prefix of length i (i ≥ 1) that meets the support requirement. Based on the bitmap of the prefix, find the data sequences that contain the prefix, and store the projection of each such data sequence with respect to the prefix in the projected database. For example, if the bitmap of the prefix <a> is B(<a>)=|1|1|1|1|0|, the prefix is contained in the first, second, third, and fourth data sequences, and the projected database of <a> contains the projections of those four data sequences with respect to <a> together with their data sequence IDs;
o. Scan the projected database to find all items, create a bitmap for each item according to the IDs of the corresponding data sequences, and calculate the support of each item, that is, the number of 1s in its bitmap. If the supports of all items are below min_sup, return from the recursion; otherwise go to step p;
p. Merge each item that meets the support requirement with the current prefix, and combine their bitmaps with a bitwise AND to obtain the new prefix and its bitmap. The new prefix is a PSP of length i+1. If a PSP is a 1-size PSP, its support is stored directly; otherwise the bitmap continues to be used to store its information;
q. Increase i by 1, take each new prefix obtained after merging as the prefix, and recursively execute steps o to q.
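The bitmap bookkeeping in steps l, m, and p can be sketched in a few lines of Python. This is not the patented implementation: the list-of-0/1 bitmap representation, the function names, the toy database, and the use of an absolute count for the minimum support are assumptions for illustration, and the projection logic of steps n and o is omitted.

```python
from typing import Dict, List, Set

Bitmap = List[int]  # position i is 1 if data sequence i contains the pattern


def item_bitmaps(db: List[List[Set[str]]]) -> Dict[str, Bitmap]:
    """Step l: one bitmap per item, as long as the number of data sequences."""
    bitmaps: Dict[str, Bitmap] = {}
    for i, ds in enumerate(db):
        for element in ds:
            for item in element:
                bitmaps.setdefault(item, [0] * len(db))[i] = 1
    return bitmaps


def support(bitmap: Bitmap) -> int:
    """Step m: the support is the number of 1s in the bitmap."""
    return sum(bitmap)


def frequent_items(bitmaps: Dict[str, Bitmap], min_count: int) -> Dict[str, Bitmap]:
    """Step m: keep only items whose support reaches the minimum support."""
    return {item: bm for item, bm in bitmaps.items() if support(bm) >= min_count}


def bitmap_and(a: Bitmap, b: Bitmap) -> Bitmap:
    """Step p: bitwise AND of a prefix bitmap and an item bitmap."""
    return [x & y for x, y in zip(a, b)]


# Hypothetical database of five data sequences (each a list of elements).
db = [
    [{"a"}, {"b"}],
    [{"a", "b"}],
    [{"b"}, {"a"}],
    [{"a"}, {"c"}],
    [{"c"}],
]
bitmaps = item_bitmaps(db)
frequent = frequent_items(bitmaps, min_count=3)
```

In the full method the item bitmap used in step p is built from the projected database, so the AND result identifies exactly the data sequences that contain the extended prefix; the sketch only demonstrates the operation itself.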
Preferably according to the present invention, in step h, eNSP-IT relaxes the frequency constraint in order to increase the number of NSPs mined, and adopts the negative candidate generation method of PNSP, comprising the following steps:
r. Generate 1-size NSCs from the 1-size PSPs; for example, the 1-size PSP <a> generates the 1-size
Figure PCTCN2019102473-appb-000020
s. Define the constraint that consecutive negative elements are not allowed in an NSP. 2-size NSCs are generated from the permutations of 1-size PSPs and 1-size NSPs, for example
Figure PCTCN2019102473-appb-000021
If the last element of ns is positive, append a 1-size PSP or a 1-size NSP; otherwise, append a 1-size PSP;
t. Append a 1-size PSP or a 1-size NSP to each (k-1)-size candidate sequence (NSC or PSP) to produce the k-size NSCs;
u. Repeat steps r to t until no NSC is generated or the number of elements in an NSC exceeds 2l+1, where l is the number of elements in the longest sequence in PSP; that is, if the longest sequence in PSP has m elements, a generated NSP has at most 2m+1 elements;
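Steps r to u can be illustrated with the following simplified Python sketch. It assumes single-item elements, treats every frequent item as available in both positive and negative form, and skips the support check on 1-size NSPs and the pruning step, so it generates a superset of what the described method would keep. All names are hypothetical.

```python
from typing import List, Tuple

Elem = Tuple[str, bool]   # (drug, is_negative); single-item elements only
Seq = Tuple[Elem, ...]


def has_negative(seq: Seq) -> bool:
    """A candidate counts as an NSC only if it has at least one negative element."""
    return any(is_neg for _, is_neg in seq)


def extend(seqs: List[Seq], items: List[str]) -> List[Seq]:
    """Steps s and t: append one element, never two negatives in a row."""
    out: List[Seq] = []
    for seq in seqs:
        last_is_negative = seq[-1][1]
        for item in items:
            out.append(seq + ((item, False),))     # a positive element always fits
            if not last_is_negative:
                out.append(seq + ((item, True),))  # negative only after a positive
    return out


def generate_nsc(items: List[str], max_size: int) -> List[Seq]:
    """Steps r to u: grow candidates size by size up to max_size elements."""
    level: List[Seq] = [((i, False),) for i in items] + [((i, True),) for i in items]
    candidates = [s for s in level if has_negative(s)]
    for _ in range(2, max_size + 1):
        level = extend(level, items)
        candidates.extend(s for s in level if has_negative(s))
    return candidates


nsc = generate_nsc(["a", "b"], max_size=2)
```

A caller would pass `max_size = 2 * l + 1`, where `l` is the number of elements in the longest positive pattern, to mirror the stopping rule in step u.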
Further preferably, the k-size NSCs are pruned before their support is calculated. The pruning method is as follows:
If
Figure PCTCN2019102473-appb-000022
and
Figure PCTCN2019102473-appb-000023
then the negative candidate sequence ns is pruned.
Further preferably, the k-size NSCs are pruned before their support is calculated. The pruning method is as follows:
If
Figure PCTCN2019102473-appb-000024
and
Figure PCTCN2019102473-appb-000025
then the negative candidate sequence ns is pruned.
Preferably according to the present invention, in step i, the support of a negative candidate sequence is calculated as follows:
For a sequence ns of size m that contains n negative elements, and for
Figure PCTCN2019102473-appb-000026
(a sequence containing exactly one negative element) ∈ 1-negMSS_ns (the set of such sequences), 1 ≤ i ≤ n, the support sup(ns) of ns in the database is given by formulas (I), (II), and (III):
If the size of ns is 1 and ns has exactly one negative element, the support of ns is:
Figure PCTCN2019102473-appb-000027
If ns contains exactly one negative element, the support of ns is:
sup(ns) = sup(MPS(ns)) − sup(p(ns))    (II)
Otherwise, the support of ns is:
Figure PCTCN2019102473-appb-000028
In formulas (I), (II), and (III), OR denotes the bitwise OR operation applied across the bitmaps of the p(1-negMS_i): the bitmaps are merged into one new bitmap in which a position is 1 if any of the input bitmaps has a 1 at that position and 0 otherwise. N denotes the number of 1s in a bitmap. For example, for a negative candidate sequence
Figure PCTCN2019102473-appb-000029
sup(<ce>) = 5, the corresponding MPS(ns) = <ce>, p(1-negMS_1) = <ace>, and p(1-negMS_2) = <cef>. Suppose B(<ace>) = |0|0|1|1|0| and B(<cef>) = |0|1|1|1|0|; then
Figure PCTCN2019102473-appb-000030
Figure PCTCN2019102473-appb-000031
therefore
Figure PCTCN2019102473-appb-000032
and
Figure PCTCN2019102473-appb-000033
Figure PCTCN2019102473-appb-000034
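The bitmap-based support calculation for a candidate with several negative elements can be sketched as follows. The sketch reads the OR operator as a bitwise OR, that is, the union of the data sequences containing any p(1-negMS_i), and subtracts the number of 1s in the merged bitmap from sup(MPS(ns)); this reading is an assumption, since formula (III) is available only as a figure. Function names are hypothetical; the input values are those given in the example above.

```python
from typing import List

Bitmap = List[int]


def bitmap_or(bitmaps: List[Bitmap]) -> Bitmap:
    """Merge bitmaps: a position is 1 if any input bitmap has 1 there."""
    result = [0] * len(bitmaps[0])
    for bm in bitmaps:
        result = [x | y for x, y in zip(result, bm)]
    return result


def count_ones(bitmap: Bitmap) -> int:
    """N(...): the number of 1s in a bitmap."""
    return sum(bitmap)


def negative_support(sup_mps: int, partner_bitmaps: List[Bitmap]) -> int:
    """Assumed form: sup(MPS(ns)) minus N(OR of the B(p(1-negMS_i)))."""
    return sup_mps - count_ones(bitmap_or(partner_bitmaps))


# Values taken from the example: sup(<ce>) = 5, B(<ace>) and B(<cef>).
b_ace = [0, 0, 1, 1, 0]
b_cef = [0, 1, 1, 1, 0]
merged = bitmap_or([b_ace, b_cef])
sup_ns = negative_support(5, [b_ace, b_cef])
```

Under this reading the merged bitmap is |0|1|1|1|0|, which has three 1s, so the sketch returns 5 − 3 = 2. The calculation needs no additional database scan, because it reuses the bitmaps stored during positive pattern mining.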
The beneficial effects of the present invention are as follows:
1. Few algorithms are currently available for negative sequential pattern mining, and those that exist are often very inefficient. We propose an efficient negative sequential pattern mining algorithm, eNSP-IT, which can mine the sequential patterns that interest users in less time. It gives good experimental results on dense data such as clinical medication behavior data, which contain many items and long sequences, and produces results relatively quickly.
2. Compared with other negative sequential pattern mining algorithms, eNSP-IT has looser negative constraints, so it can mine more sequential patterns and provide users with more information for decision-making.
3. When the present invention is applied to clinical medication analysis, positive and negative sequential patterns can be combined as a reference to discover the drug treatment plans most commonly used in treating a given disease. When treating a patient, the doctor can thus be given previous treatment plans, which helps to better predict the patient's next medication and supports clinical decisions driven by changes in the drug regimen.
Description of the Drawings
Fig. 1 is a structural block diagram of the clinical medication behavior analysis system based on efficient negative sequential pattern mining according to the present invention.
Detailed Description
The present invention is further defined below with reference to the drawings and embodiments, but is not limited thereto.
Embodiment 1
A clinical medication behavior analysis system based on efficient negative sequential pattern mining, as shown in Fig. 1, comprises a data collection system and a behavior analysis system that communicate over a transmission network;
The data collection system comprises a data collection module and a data transmission module connected in sequence;
The data collection module is used to collect and save the patient's clinical medication behavior data in real time. The clinical medication behavior data include the patient's ID number, the timestamp (that is, the time of diagnosis and treatment), the drugs prescribed, the symptoms, the disease, and the patient's department;
The data transmission module is used to transmit the patient's clinical medication behavior data to the behavior analysis system over the transmission network;
The behavior analysis system comprises a data processing module, a data analysis module, and a data management module connected in sequence, and is deployed on a cloud server. The data transmission module is connected to the data processing module;
The data processing module is used to clean the collected clinical medication behavior data and classify the data by the patient's department and disease;
The data analysis module is used to analyze and predict the patient's clinical medication behavior according to the processing results of the data processing module, as follows:
Based on the clinical medication behavior data processed by the data processing module, the data analysis module builds a medication behavior sequence for each patient ID number and applies the efficient negative sequential pattern mining method to analyze and predict clinical medication behavior. The clinical medication behavior data of patients in the same department with the same disease constitute one sequence database, in which each patient ID number corresponds to one ordered sequence formed by all of that patient's medication records within a given time period. Mining the sequence database with the efficient negative sequential pattern mining method yields the negative sequential patterns that meet the minimum support requirement, that is, the drugs commonly used to treat the disease, the order of medication, and the relationships between drugs. The negative sequential patterns that can be used for decision-making are then screened out and used to analyze the patient's medication behavior.
The data management module is used to store and display the processing results of the data processing module and the clinical medication behavior results produced by the data analysis module, and to recommend the next medication when a doctor prescribes drugs. The data management module is also used to view all clinical medication behavior records and all frequent clinical medication behaviors. When a doctor treats a patient, the system provides the treatment plans commonly used for the disease, and when the preferred treatment plan is not effective enough, it provides alternative treatment plans.
The transmission network is a wired public network, a local area network, or a 3G/4G network.
The present invention adopts a cloud management platform design (in the manner of, for example, Alibaba Cloud servers, Huawei Cloud, or JD Cloud), so individual hospitals do not need to set up their own servers. A hospital rents the cloud management platform server of this system, which helps the hospital connect to the interfaces of its in-hospital systems, import data, and so on. Users can log in to the system from anywhere over the Internet with the appropriate permissions, without installing a client, which makes security management flexible. The system can also be deployed on a private cloud on the hospital premises and accessed by logging in over the hospital's local area network.
Embodiment 2
The working method of the clinical medication behavior analysis system based on efficient negative sequential pattern mining described in Embodiment 1 comprises the following steps:
(1) The data collection module collects and saves the patient's clinical medication behavior data in real time. The clinical medication behavior data include the patient's ID number, the timestamp (that is, the time of diagnosis and treatment), the drugs prescribed, the symptoms, the disease, and the patient's department;
Let ns denote a negative candidate sequence composed of the drugs used by a patient; for example, let a negative candidate sequence be
Figure PCTCN2019102473-appb-000035
where
Figure PCTCN2019102473-appb-000036
means that drugs b and d are not used, while a and c denote the drugs a and c that are used;
An m-size sequence is a negative candidate sequence ns that contains m elements; for example,
Figure PCTCN2019102473-appb-000037
is a 4-size sequence;
MPS(ns) denotes the maximum positive subsequence of a negative candidate sequence ns composed of the drugs used by a patient. It consists of all the positive elements of ns in their original order, that is, of all the drugs in the negative candidate sequence that the patient actually used; for example, in ns,
Figure PCTCN2019102473-appb-000038
represent drugs that are not used, while a and c represent drugs that are used, so the maximum positive subsequence is
Figure PCTCN2019102473-appb-000039
Figure PCTCN2019102473-appb-000040
The positive partner p(ns) is the sequence obtained by converting all negative elements of a negative candidate sequence ns, composed of the drugs used by a patient, into their corresponding positive elements; for example,
Figure PCTCN2019102473-appb-000041
1-negMS_ns denotes a subsequence of the negative candidate sequence ns that consists of MPS(ns) and exactly one negative element of ns;
1-negMSS_ns denotes the set of all 1-negMS_ns subsequences of the negative candidate sequence ns;
p(1-negMS_ns) denotes the sequence obtained from 1-negMS_ns by keeping its positive elements unchanged and converting its negative element into the corresponding positive element; for example:
Figure PCTCN2019102473-appb-000042
ds denotes a data sequence in the database; ds contains the drugs used by one patient during the current course of treatment, arranged in the order in which they were administered;
In summary, given a data sequence ds and a sequence ns that has m elements in total, n of which are negative, and that satisfies the element constraint, the format constraint, and the frequency constraint, if the condition
Figure PCTCN2019102473-appb-000043
holds and every 1-negMS_ns satisfies
Figure PCTCN2019102473-appb-000044
then ds contains ns, where:
Element constraint: no negative item is allowed inside an element; only whole elements of a sequence may be negative. For example,
Figure PCTCN2019102473-appb-000045
satisfies the constraint, whereas
Figure PCTCN2019102473-appb-000046
does not, because
Figure PCTCN2019102473-appb-000047
is a negative item inside the element
Figure PCTCN2019102473-appb-000048
Format constraint: a sequence must not contain two or more consecutive negative elements. For example,
Figure PCTCN2019102473-appb-000049
does not satisfy the constraint, because the negative elements
Figure PCTCN2019102473-appb-000050
are two consecutive negative elements;
Frequency constraint: for a negative sequence, every 1-negMS_ns ∈ 1-negMSS_ns must satisfy p(1-negMS_ns) ∈ PSP, where PSP denotes the set of positive sequential patterns;
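The containment test described above can be sketched in Python. The formulas themselves survive only as figure references, so this sketch assumes the usual eNSP-style reading: ds contains ns when ds contains MPS(ns) and contains none of the positive partners p(1-negMS_ns). The representation (a list of `(set of drugs, is_negative)` pairs) and all function names are illustrative, not taken from the patent.

```python
def is_subsequence(sub, seq):
    """True if each element of `sub` is a subset of a distinct element of `seq`, in order."""
    pos = 0
    for element in sub:
        while pos < len(seq) and not element <= seq[pos]:
            pos += 1
        if pos == len(seq):
            return False
        pos += 1  # the next element of `sub` must match strictly later
    return True


def mps(ns):
    """Maximum positive subsequence: the positive elements of ns in original order."""
    return [items for items, negative in ns if not negative]


def one_neg_ms_set(ns):
    """1-negMSS: every subsequence made of MPS(ns) plus exactly one negative element."""
    return [
        [(items, j == k) for j, (items, neg) in enumerate(ns) if not neg or j == k]
        for k, (_, negative) in enumerate(ns)
        if negative
    ]


def positive_partner(seq):
    """Drop the negative marks, turning every element into a positive one."""
    return [items for items, _ in seq]


def contains(ds, ns):
    """Assumed rule: ds holds MPS(ns) and no positive partner of any 1-negMS."""
    if not is_subsequence(mps(ns), ds):
        return False
    return not any(is_subsequence(positive_partner(s), ds) for s in one_neg_ms_set(ns))


def elem(*drugs):
    return frozenset(drugs)


# ns = <a, not b, c, not d>; each element is (set of drugs, is_negative)
ns = [(elem("a"), False), (elem("b"), True), (elem("c"), False), (elem("d"), True)]
print(contains([elem("a"), elem("c")], ns))             # a then c, with no b between and no d after
print(contains([elem("a"), elem("b"), elem("c")], ns))  # b was used between a and c
```

The greedy left-to-right match in `is_subsequence` is sufficient because matching each element at its earliest possible position never rules out a later match.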
(2) This embodiment uses gastritis outpatient data from medical insurance records as the experimental data. Table 3 shows part of the sequence database obtained by preprocessing the medical insurance data. Clinical medication behavior is analyzed with the eNSP-IT algorithm at a minimum support of min_sup = 30%. The data transmission module sends the patients' clinical medication behavior data over the transmission network to the behavior analysis system, which analyzes the data with the eNSP-IT algorithm in the following steps:
Table 3
Patient ID    Sequence of drugs used by the patient
1    <(Glucose)(Sodium chloride solution)(Ceftriaxone)(Vitamin B6)(Cimetidine)(Domperidone)>
2    <(Omeprazole)(Amoxicillin)>
3    <(Sodium chloride solution)(Ceftriaxone)(Glucose)(Omeprazole)>
4    <(Sodium chloride solution)(Xiangdan injection)(Astragalus injection)>
5    <(Sodium chloride solution)(Ceftriaxone)(Di'ao Xinxuekang capsule)(Sanjiu Weitai granules)(Domperidone)>
a. The data processing module cleans the collected clinical medication behavior data and classifies it by the patient's department and disease, in the following steps:
Collecting patients' clinical medication behavior data through the data collection system produces a large volume of data, which may contain duplicates or incomplete records. Therefore:
d. The collected clinical medication behavior data are optimized so that they are suitable for later analysis. Optimization includes filling in missing data and filtering out abnormal data;
e. The optimized data are standardized. Standardization means consolidating the data: the weekly medication records of patients with the same patient ID are arranged into one ordered sequence, forming complete clinical medication behavior data for that patient. All medication records of a patient within a given period constitute an ordered sequence. In the sequence, the items/itemsets are ordered; each item represents one drug, and an element is the set of all drugs the patient used at one specific point in time. A patient may use the same drug in different periods, so an item may occur in several elements of a sequence.
f. The data are classified by two features, the patient's department and disease, and stored in the data management module by patient ID, timestamp (the time of diagnosis and treatment), prescribed drugs, symptoms, disease, and department.
b. The data analysis module analyzes and predicts patients' clinical medication behavior from the results of the data processing module;
c. The data management module stores and displays the results of the data processing module and the medication behavior results produced by the data analysis module; when a doctor prescribes drugs, it recommends the next medication.
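The consolidation in step e can be sketched as follows. The record layout `(patient_id, timestamp, drug)` and the function name are assumptions made for illustration; the sketch also simply drops incomplete records, which is only one possible cleaning policy (the text above also mentions filling in missing data).

```python
from collections import defaultdict


def build_sequences(records):
    """Group (patient_id, timestamp, drug) records into one ordered sequence per patient.

    Each element of a sequence is the set of drugs used at the same timestamp.
    """
    by_patient = defaultdict(lambda: defaultdict(set))
    for patient_id, timestamp, drug in records:
        if patient_id is None or timestamp is None or not drug:
            continue  # assumed policy: skip incomplete records
        by_patient[patient_id][timestamp].add(drug)  # sets absorb duplicate rows
    return {
        pid: [frozenset(visits[t]) for t in sorted(visits)]
        for pid, visits in by_patient.items()
    }


records = [
    (1, "2019-01-02", "omeprazole"),
    (1, "2019-01-01", "glucose"),
    (1, "2019-01-01", "ceftriaxone"),
    (1, "2019-01-01", "glucose"),      # duplicate row
    (2, "2019-01-03", "metformin"),
    (2, None, "insulin"),              # incomplete row
]
sequences = build_sequences(records)
print(sequences[1])
```

ISO-formatted date strings sort chronologically, which is why plain `sorted` is enough here; other timestamp formats would need parsing first.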
In step b, the data analysis module analyzes and predicts patients' clinical medication behavior from the results of the data processing module, as follows:
g. All positive sequential patterns, that is, the drug orderings used most frequently in the patient population within a given period, are mined with a modified version of the positive sequential pattern mining algorithm PrefixSpan. In the modified PrefixSpan, a bitmap is kept for each frequent positive sequence to record the IDs of the data sequences that contain it. Table 4 shows some positive sequential patterns and their bitmaps;
Table 4
Positive sequential pattern    Bitmap
<(Vitamin B6)(Vitamin C)>    |0|0|0|0|0|1|0|0|0|1|……|0|0|1|
<(Sodium chloride solution)(Ceftriaxone)(Omeprazole)>    |0|0|1|0|0|0|0|1|0|0|……|0|0|0|
<(Omeprazole)(Compound Daqingye tablets)>    |0|0|0|0|0|0|0|1|0|0|……|0|1|0|
<(Sanjiu Weitai granules)(Domperidone)>    |0|0|0|0|1|0|0|0|1|0|……|1|0|0|
h. Negative sequential candidates (NSC) are generated with the negative candidate generation method of PNSP. The negative candidates are used to determine which drugs are used frequently and which drugs are not used within a given period. From the experimental data, the following negative candidate sequences are generated:
Figure PCTCN2019102473-appb-000051
Figure PCTCN2019102473-appb-000052
i. The support of each negative candidate sequence is calculated using bitmap operations;
j. Negative sequential patterns that meet the minimum support are selected from the negative candidates, and an appropriate screening method is used to select those that can be used for decision-making. The selected patterns are used to analyze patients' medication behavior; based on the analysis, doctors predict the patient's next treatment plan, which supports clinical decisions based on changes in the drug regimen. Table 5 shows some of the negative sequential patterns mined at a minimum support of min_sup = 30%.
Table 5
Figure PCTCN2019102473-appb-000053
For example, consider the two negative sequential patterns
Figure PCTCN2019102473-appb-000054
and
Figure PCTCN2019102473-appb-000055
P1 and P2 indicate that, when treating gastritis, doctors often choose the prescriptions in these two sequences; the two negative sequential patterns reveal potential relationships among the drugs in each prescription. P1 indicates that doctors do not use vitamin C after using glucose, ceftriaxone, vitamin B6, and sodium chloride solution. P2 indicates that, after prescribing ceftriaxone and vitamin C, doctors do not use vitamin C and then use cimetidine rather than omeprazole. NSP mining can therefore help doctors predict a patient's next medication.
Regarding step g: to improve the time efficiency of negative sequential pattern mining, positive sequential patterns are mined with the PrefixSpan algorithm, and a bitmap strategy is added to PrefixSpan to improve space efficiency. Unlike other mining methods that use bitmap structures, the modified PrefixSpan uses a simple bitmap structure and simple operations to obtain sequential patterns, in the following steps:
k. An ID is added to each data sequence ds;
l. The database (the set of all data sequences ds) is scanned to find all items, where an item is a drug, and a bitmap is created for each item. The length of each bitmap equals the number of data sequences in the database. If an item occurs in data sequence i, position i of its bitmap is set to 1; otherwise it is set to 0. A bitmap is denoted B;
m. The support of each item, that is, the number of 1s in its bitmap, is computed and compared with the minimum support min_sup, the user-defined minimum frequency at which a pattern counts as frequent. If the support is greater than or equal to min_sup, the item is a PSP of length 1 and is treated as a prefix of length 1; otherwise the item is not a PSP of length 1 and is deleted;
n. Each prefix of length i (i ≥ 1) that meets the support requirement is mined recursively: using the prefix's bitmap, the data sequences containing the prefix are located, and the projection of each such data sequence with respect to the prefix is stored in the projected database;
o. The projected database is scanned to find all items; a bitmap is created for each item from the IDs of the data sequences in which it occurs, and its support (the number of 1s in the bitmap) is computed. If the support of every item is below min_sup, the recursion returns; otherwise, go to step p;
p. Each item that meets the support threshold is merged with the current prefix, and the two bitmaps are combined with a bitwise AND to obtain the new prefix and its bitmap. The new prefix is a PSP of length i. If the PSP is a 1-size PSP, its support is stored directly; otherwise, the bitmap continues to be used to store the information;
q. i is incremented by 1, the new prefixes obtained by merging become the current prefixes, and steps o to q are executed recursively for each of them.
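Steps k to q can be sketched as below. This is a simplified illustration under stated assumptions, not the patent's implementation: every element holds a single drug (as in Table 3), a bitmap is a Python integer whose bit i marks data sequence i, and min_sup is converted to an absolute count. The function names and the shortened drug labels are hypothetical.

```python
import math


def popcount(bitmap):
    """Support of a pattern: the number of 1 bits in its bitmap."""
    return bin(bitmap).count("1")


def bitmap_prefixspan(db, min_count):
    """Return {pattern tuple: bitmap}; bit i is set when db[i] contains the pattern."""
    patterns = {}
    all_sequences = (1 << len(db)) - 1

    def mine(prefix, prefix_bitmap, projected):
        # projected holds (sequence ID, offset where the suffix starts)
        item_bitmaps = {}
        for sid, start in projected:
            for item in set(db[sid][start:]):
                item_bitmaps[item] = item_bitmaps.get(item, 0) | (1 << sid)
        for item, bitmap in sorted(item_bitmaps.items()):
            if popcount(bitmap) < min_count:
                continue  # infrequent item, not extended
            new_bitmap = prefix_bitmap & bitmap  # step p: AND of the two bitmaps
            new_prefix = prefix + (item,)
            patterns[new_prefix] = new_bitmap
            new_projected = [
                (sid, db[sid].index(item, start) + 1)
                for sid, start in projected
                if (new_bitmap >> sid) & 1
            ]
            mine(new_prefix, new_bitmap, new_projected)

    mine((), all_sequences, [(sid, 0) for sid in range(len(db))])
    return patterns


# The five rows of Table 3, with shortened drug names.
db = [
    ["glucose", "sodium chloride", "ceftriaxone", "vitamin B6", "cimetidine", "domperidone"],
    ["omeprazole", "amoxicillin"],
    ["sodium chloride", "ceftriaxone", "glucose", "omeprazole"],
    ["sodium chloride", "xiangdan", "astragalus"],
    ["sodium chloride", "ceftriaxone", "xinxuekang", "weitai", "domperidone"],
]
min_count = math.ceil(0.3 * len(db))  # min_sup = 30% of five sequences
patterns = bitmap_prefixspan(db, min_count)
print(popcount(patterns[("sodium chloride", "ceftriaxone")]))
```

Because item bitmaps are built from the projected database, they are already restricted to sequences containing the prefix; the AND in the sketch mirrors step p and keeps that invariant explicit.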
Regarding step h: to increase the number of NSPs that can be mined, eNSP-IT relaxes the frequency constraint and adopts the negative candidate generation method of PNSP, in the following steps:
r. 1-size NSCs are generated from 1-size PSPs; for example, the 1-size PSP <a> generates the 1-size
Figure PCTCN2019102473-appb-000056
s. The constraint is defined as: consecutive negative elements are not allowed in an NSP. 2-size NSCs are generated from arrangements of 1-size PSPs and 1-size NSPs, for example
Figure PCTCN2019102473-appb-000057
If the last element of ns is positive, a 1-size PSP or a 1-size NSP is appended; otherwise, a 1-size PSP is appended;
t. A k-size NSC is produced by appending a 1-size PSP or a 1-size NSP to a (k-1)-size candidate sequence (NSC or PSP);
u. Steps r to t are repeated until no NSC is generated or the number of elements in an NSC exceeds 2l+1, where l is the number of elements in the longest sequence in the PSP set; that is, if the longest PSP has m elements, a generated NSP has at most 2m+1 elements;
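The candidate generation in steps r to u can be sketched as follows. This is a deliberately simplified illustration: each element is a single drug, every positive sequence over the frequent items is used as a seed (rather than only the mined PSPs), and no pruning is applied. The tuple representation `(item, is_negative)` and the function names are assumptions.

```python
def extend_candidates(candidates, frequent_items):
    """Append one 1-size element to each candidate (step t)."""
    extended = []
    for cand in candidates:
        last_is_negative = cand[-1][1]
        for item in frequent_items:
            extended.append(cand + ((item, False),))     # a positive element is always allowed
            if not last_is_negative:
                extended.append(cand + ((item, True),))  # no two consecutive negatives
    return extended


def generate_nsc(frequent_items, longest_psp):
    """Generate negative candidates of up to 2 * longest_psp + 1 elements (step u)."""
    level = [((item, neg),) for item in frequent_items for neg in (False, True)]
    nsc = [cand for cand in level if cand[0][1]]  # 1-size NSCs (step r)
    for _size in range(2, 2 * longest_psp + 2):
        level = extend_candidates(level, frequent_items)
        nsc.extend(cand for cand in level if any(neg for _, neg in cand))
    return nsc


candidates = generate_nsc(["a", "b"], longest_psp=1)
two_size = [cand for cand in candidates if len(cand) == 2]
print(len(two_size))
```

Even for two items the candidate set grows quickly with length, which is why the text prunes candidates before computing their support.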
A k-size NSC is pruned before its support is calculated. The pruning rule is: if
Figure PCTCN2019102473-appb-000058
and
Figure PCTCN2019102473-appb-000059
then the negative candidate sequence ns is pruned.
A k-size NSC is also pruned, before its support is calculated, if
Figure PCTCN2019102473-appb-000060
and
Figure PCTCN2019102473-appb-000061
in which case the negative candidate sequence ns is pruned.
Regarding step i, the support of a negative candidate sequence is calculated as follows:
For a sequence ns of size m with n negative elements, and for
Figure PCTCN2019102473-appb-000062
(a sequence containing only one negative element) ∈ 1-negMSS_ns (the set of sequences containing one negative element), 1 ≤ i ≤ n, the support sup(ns) of ns in the database is given by formulas (I), (II), and (III):
If the size of ns is 1 and ns has only one negative element, the support of ns is:
Figure PCTCN2019102473-appb-000063
If ns contains only one negative element, the support of ns is:
sup(ns) = sup(MPS(ns)) − sup(p(ns))   (II)
Otherwise, the support of ns is:
Figure PCTCN2019102473-appb-000064
In formulas (I), (II), and (III), OR denotes the bitwise OR operation applied across the bitmaps corresponding to each p(1-negMS_i): the bitmaps are merged into one new bitmap whose bit at a given position is 1 if any of the bitmaps has a 1 at that position, and 0 otherwise. N denotes the number of 1s in a bitmap.
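The support calculation can be sketched with integer bitmaps. Formulas (I) and (III) survive only as figure references, so the sketch rests on two labeled assumptions: formula (I) is taken to be the database size minus the support of the corresponding positive element, and formula (III) is taken to be sup(MPS(ns)) minus the number of 1s in the bitwise union of the bitmaps of all p(1-negMS_i). Function and parameter names are hypothetical.

```python
def popcount(bitmap):
    """N: the number of 1 bits in a bitmap."""
    return bin(bitmap).count("1")


def nsc_support(n_sequences, mps_bitmap, partner_bitmaps):
    """Support of a negative candidate ns computed from bitmaps.

    n_sequences     -- number of data sequences in the database
    mps_bitmap      -- bitmap of MPS(ns), or None when ns is a single negative element
    partner_bitmaps -- bitmaps of p(1-negMS_i), one per negative element of ns
    """
    union = 0
    for bitmap in partner_bitmaps:
        union |= bitmap  # a sequence is excluded if it contains any positive partner
    if mps_bitmap is None:
        return n_sequences - popcount(union)        # assumed form of formula (I)
    return popcount(mps_bitmap) - popcount(union)   # formula (II); assumed form of (III)


# Five data sequences; MPS(ns) occurs in sequences 0, 2 and 4.
print(nsc_support(5, 0b10101, [0b00001, 0b10001]))
```

With a single negative element the union reduces to the bitmap of p(ns), so the same expression covers formula (II).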
Embodiment 2
The working method of the clinical medication behavior analysis system based on efficient negative sequential pattern mining described in Embodiment 1 comprises the following steps:
(1) The data collection module collects and stores patients' clinical medication behavior data in real time. The data include the patient ID, timestamp (the time of diagnosis and treatment), prescribed drugs, symptoms, disease, and the patient's department;
ns denotes a negative candidate sequence of patient medications; for example, let a negative candidate sequence be
Figure PCTCN2019102473-appb-000065
where
Figure PCTCN2019102473-appb-000066
indicates that drugs b and d were not used, while a and c denote drugs that were used;
m-size means that the negative candidate sequence ns contains m elements; for example,
Figure PCTCN2019102473-appb-000067
is a 4-size sequence;
MPS(ns) denotes the maximum positive subsequence of a negative candidate sequence ns of patient medications. It consists of all positive elements of ns in their original order, that is, of all the drugs in this negative candidate sequence that the patient actually used. For example, in ns,
Figure PCTCN2019102473-appb-000068
denotes drugs that were not used, while a and c denote drugs that were used; the maximum positive subsequence is then
Figure PCTCN2019102473-appb-000069
Figure PCTCN2019102473-appb-000070
The positive partner p(ns) is the sequence obtained by converting every negative element of a negative candidate sequence ns of patient medications into its corresponding positive element; for example,
Figure PCTCN2019102473-appb-000071
1-negMS_ns denotes a subsequence of the negative candidate sequence ns that consists of MPS(ns) and exactly one negative element;
1-negMSS_ns denotes the set of all such 1-negMS_ns subsequences of the negative candidate sequence ns;
p(1-negMS_ns) denotes the sequence obtained from 1-negMS_ns by keeping its positive elements unchanged and converting its negative element into the corresponding positive element; for example:
Figure PCTCN2019102473-appb-000072
ds denotes a data sequence in the database; it contains the drugs used by one patient during the current course of treatment, arranged in the order in which they were administered;
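As a worked illustration of the definitions above, consider a 4-size negative candidate sequence with two negative elements. The ¬ notation for an unused drug and the concrete sequence are assumptions chosen to match the letters a, b, c, d used in the text, since the original formulas survive only as figure references.

```latex
\begin{align*}
ns &= \langle a\ \neg b\ c\ \neg d \rangle && \text{(4-size, two negative elements)}\\
MPS(ns) &= \langle a\ c \rangle\\
p(ns) &= \langle a\ b\ c\ d \rangle\\
1\text{-}negMSS_{ns} &= \{\, \langle a\ \neg b\ c \rangle,\ \langle a\ c\ \neg d \rangle \,\}\\
p(\langle a\ \neg b\ c \rangle) &= \langle a\ b\ c \rangle, \qquad
p(\langle a\ c\ \neg d \rangle) = \langle a\ c\ d \rangle
\end{align*}
```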
In summary, consider a data sequence ds and a sequence ns that has m elements in total, n of them negative, and that satisfies the element constraint, the format constraint, and the frequency constraint. If
Figure PCTCN2019102473-appb-000073
and every 1-negMS_ns satisfies
Figure PCTCN2019102473-appb-000074
then ds contains ns. The three constraints are defined as follows:
Element constraint: no negative item is allowed inside an element; only whole elements of a sequence may be negative. For example,
Figure PCTCN2019102473-appb-000075
satisfies the constraint, whereas
Figure PCTCN2019102473-appb-000076
does not, because
Figure PCTCN2019102473-appb-000077
is a negative item inside the element
Figure PCTCN2019102473-appb-000078
Format constraint: a sequence must not contain two or more consecutive negative elements. For example,
Figure PCTCN2019102473-appb-000079
does not satisfy the constraint, because the negative elements
Figure PCTCN2019102473-appb-000080
are two consecutive negative elements;
Frequency constraint: for a negative sequence, every 1-negMS_ns ∈ 1-negMSS_ns must satisfy p(1-negMS_ns) ∈ PSP, where PSP denotes the set of positive sequential patterns;
(2) This embodiment uses diabetic patient data from medical insurance records as the experimental data. Table 6 below shows part of the sequence database obtained by preprocessing the medical insurance data. Clinical medication behavior is analyzed with the eNSP-IT algorithm at a minimum support of min_sup = 30%, in the following steps:
Table 6
Patient ID    Sequence of drugs used by the patient
1    <(Metformin, Simvastatin, Venlafaxine)(Aspirin, Glipizide)(Hydrochlorothiazide, Insulin)>
2    <(Metformin)(Glipizide)(Insulin)>
3    <(Aspirin, Azithromycin, Metformin)(Insulin)>
4    <(Metformin)(Acetohexamide)(Rosiglitazone)>
5    <(Carbutamide)(Metformin)(Alogliptin)(Pioglitazone)(Exenatide)>
a. The data processing module cleans the collected clinical medication behavior data and classifies it by the patient's department and disease, in the following steps:
Collecting patients' clinical medication behavior data through the data collection system produces a large volume of data, which may contain duplicates or incomplete records. Therefore:
d. The collected clinical medication behavior data are optimized so that they are suitable for later analysis. Optimization includes filling in missing data and filtering out abnormal data;
e. The optimized data are standardized. Standardization means consolidating the data: the weekly medication records of patients with the same patient ID are arranged into one ordered sequence, forming complete clinical medication behavior data for that patient. All medication records of a patient within a given period constitute an ordered sequence. In the sequence, the items/itemsets are ordered; each item represents one drug, and an element is the set of all drugs the patient used at one specific point in time. A patient may use the same drug in different periods, so an item may occur in several elements of a sequence.
f. The data are classified by two features, the patient's department and disease, and stored in the data management module by patient ID, timestamp (the time of diagnosis and treatment), prescribed drugs, symptoms, disease, and department.
b. The data analysis module analyzes and predicts patients' clinical medication behavior from the results of the data processing module;
c. The data management module stores and displays the results of the data processing module and the medication behavior results produced by the data analysis module; when a doctor prescribes drugs, it recommends the next medication.
In step b, the data analysis module analyzes and predicts patients' clinical medication behavior from the results of the data processing module, as follows:
g. All positive sequential patterns, that is, the drug orderings used most frequently in the patient population within a given period, are mined with a modified version of the positive sequential pattern mining algorithm PrefixSpan. In the modified PrefixSpan, a bitmap is kept for each frequent positive sequence to record the IDs of the data sequences that contain it. Table 7 shows some positive sequential patterns and their bitmaps;
Table 7
Positive sequential pattern    Bitmap
<(Metformin)(Glipizide)>    |1|1|0|0|0|0|1|0|1|1|……|0|0|1|
<(Metformin)(Insulin)>    |1|1|1|0|0|0|1|0|0|0|……|1|0|0|
<(Glipizide)(Hydrochlorothiazide, Insulin)>    |1|0|0|0|0|1|0|0|0|0|……|0|0|0|
<(Aspirin)(Insulin)>    |1|0|1|0|0|0|0|0|1|0|……|1|0|0|
h. Negative sequential candidates (NSC) are generated with the negative candidate generation method of PNSP. The negative candidates are used to determine which drugs are used frequently and which drugs are not used within a given period. From the experimental data, the following negative candidate sequences are generated:
Figure PCTCN2019102473-appb-000081
Figure PCTCN2019102473-appb-000082
Figure PCTCN2019102473-appb-000083
i. The support of each negative candidate sequence is calculated using bitmap operations;
j. Negative sequential patterns that meet the minimum support are selected from the negative candidates, and an appropriate screening method is used to select those that can be used for decision-making. The selected patterns are used to analyze patients' medication behavior; based on the analysis, doctors predict the patient's next treatment plan, which supports clinical decisions based on changes in the drug regimen. Table 8 shows some of the negative sequential patterns mined at a minimum support of min_sup = 30%.
Table 8
Figure PCTCN2019102473-appb-000084
For example, consider the two negative sequential patterns
Figure PCTCN2019102473-appb-000085
and
Figure PCTCN2019102473-appb-000086
Figure PCTCN2019102473-appb-000087
P1 and P2 indicate that, when treating diabetes, doctors often choose the prescriptions in these two sequences; the two negative sequential patterns reveal potential relationships among the drugs in each prescription. P1 indicates that doctors, after not using acetohexamide, use metformin and do not use alogliptin. P2 indicates that, after prescribing metformin, doctors do not use acetohexamide and then use rosiglitazone rather than saxagliptin. NSP mining can therefore help doctors predict a patient's next medication.
按照步骤g所述方法,为了提高负序列模式挖掘的时间效率,使用PrefixSpan算法挖掘正序列模式,同时,利用位图策略进一步增强PrefixSpan算法,以提高空间效率。与使用位图结构的其他挖掘方法不同,修改后的PrefixSpan算法使用简单的位图结构和操作来获得顺序模式,包括步骤如下:According to the method described in step g, in order to improve the time efficiency of negative sequence pattern mining, the PrefixSpan algorithm is used to mine the positive sequence pattern. At the same time, the Bitmap strategy is used to further enhance the PrefixSpan algorithm to improve the space efficiency. Unlike other mining methods that use bitmap structures, the modified PrefixSpan algorithm uses simple bitmap structures and operations to obtain sequential patterns, including the following steps:
k、在每个数据序列ds上添加ID;k. Add ID to each data sequence ds;
l、扫描数据库(包含所有数据序列ds的集合)查找所有项,项指的是每种药品,为每个项创建位图,每个位图的长度等于数据库中的数据序列数,如果一个项出现在数据序列i中,则该项的位图在位置i设置为1;否则,则该项的位图在位置i设置为0,位图用B表示;例如,氯化钠溶液这项的位图为B(b)=|1|1|1|0|0|,则包含在第一、第二和第三个数据序列中。1. Scan the database (contains the collection of all data sequences ds) to find all items, the item refers to each medicine, create a bitmap for each item, the length of each bitmap is equal to the number of data sequences in the database, if one item If it appears in the data sequence i, the bitmap of the item is set to 1 at position i; otherwise, the bitmap of the item is set to 0 at position i, and the bitmap is represented by B; for example, the item of sodium chloride solution The bitmap is B(b)=|1|1|1|0|0|, which is included in the first, second and third data sequences.
m、根据每个项的位图,计算每个项的支持度,即位图中1的个数;判断项的支持度是否满足最小支持度min_sup,最小支持度min_sup指的是由用户设定的,频繁模式出现的最小频率;如果项的支持度大于或等于最小支持度min_sup,则该项是长度为1的PSP,将长度为1的PSP看作长度为1的前缀;否则,不是长度为1的PSP,删除此项;m. Calculate the support of each item according to the bitmap of each item, that is, the number of 1 in the bitmap; determine whether the support of the item meets the minimum support min_sup, which is set by the user , The minimum frequency of frequent patterns; if the item's support is greater than or equal to the minimum support min_sup, then the item is a PSP of length 1, and the PSP of length 1 is regarded as a prefix of length 1; otherwise, it is not a length of 1 1 PSP, delete this item;
n、对于每个长度为i满足支持度要求的前缀进行递归挖掘,i≥1,基于前缀的位图,找到包含此前缀的数据序列,同时将数据序列对应此前缀的投影存入投影数据库中;例如,前缀<a>的位图是B(<a>)=|1|1|1|1|0|,这意味着它包含于第一、第二、第三和第四个数据序列,前缀<a>的投影数据库中包含了第一、第二、第三和第四个数据序列相对于前缀<a>的投影和数据序列的ID;n. Perform recursive mining for each prefix of length i that meets the support requirements, i≥1, based on the bitmap of the prefix, find the data sequence containing the prefix, and store the projection of the data sequence corresponding to the prefix in the projection database ; For example, the bitmap of the prefix <a> is B(<a>)=|1|1|1|1|0|, which means it is contained in the first, second, third, and fourth data sequence , The projection database of the prefix <a> contains the projections of the first, second, third and fourth data series relative to the prefix <a> and the ID of the data series;
o. Scan the projection database to find all items, create a bitmap for each item from the IDs of the data sequences in which it occurs, and compute the support of each item, that is, the number of 1s in its bitmap. If the support of every item is below min_sup, return from the recursion; otherwise, go to step p;
p. Merge each item that meets the support threshold with the current prefix and combine the two bitmaps with a bitwise AND to obtain the new prefix and its bitmap. The new prefix is a PSP of length i. If the PSP is a 1-size PSP, store its support directly; otherwise, continue to store its information as a bitmap;
q. Increase i by 1, take each new prefix produced by the merge as the current prefix, and recursively execute steps o to q for each of them.
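Steps l to p can be illustrated with a minimal Python sketch. It is not the patented implementation: the five-sequence database, the one-drug-per-element simplification, and the helper names `item_bitmaps`, `support`, `project` and `extend` are assumptions made for illustration. The database is chosen so that B(a) and B(b) match the bitmaps quoted in the text.

```python
from typing import Dict, List

Bitmap = List[int]


def item_bitmaps(db: List[List[str]]) -> Dict[str, Bitmap]:
    """Step l: one bitmap per item; position i is 1 if sequence i contains the item."""
    items = sorted({item for seq in db for item in seq})
    return {item: [1 if item in seq else 0 for seq in db] for item in items}


def support(bitmap: Bitmap) -> int:
    """Step m: the support is the number of 1s in the bitmap."""
    return sum(bitmap)


def project(seqs: Dict[int, List[str]], item: str) -> Dict[int, List[str]]:
    """Step n: per sequence ID, keep the suffix after the first occurrence of item."""
    return {sid: seq[seq.index(item) + 1:] for sid, seq in seqs.items() if item in seq}


def extend(prefix_bitmap: Bitmap, projected: Dict[int, List[str]]) -> Dict[str, Bitmap]:
    """Steps o-p: bitmap of each item in the projected database, ANDed with the prefix bitmap."""
    items = sorted({item for seq in projected.values() for item in seq})
    result = {}
    for item in items:
        item_bitmap = [1 if item in projected.get(sid, []) else 0
                       for sid in range(len(prefix_bitmap))]
        result[item] = [a & b for a, b in zip(prefix_bitmap, item_bitmap)]
    return result


# Hypothetical database of five medication sequences (one drug per element).
db = [["a", "b", "c"], ["a", "b"], ["b", "a", "c"], ["a", "c"], ["c"]]
min_sup = 3

bitmaps = item_bitmaps(db)
prefixes = {item: bm for item, bm in bitmaps.items() if support(bm) >= min_sup}

# Extend the prefix <a> by one item and keep the extensions that are still frequent.
projected_a = project(dict(enumerate(db)), "a")
extensions_a = extend(prefixes["a"], projected_a)
frequent_a = {item: bm for item, bm in extensions_a.items() if support(bm) >= min_sup}
print(bitmaps["a"], bitmaps["b"], frequent_a)
```

In this toy database only <a c> survives as an extension of <a>; a full miner would repeat the projection and extension recursively for every surviving prefix, as step q describes.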
Following the method of step h, ENSP-IT relaxes the frequent constraint in order to increase the number of NSPs mined, and adopts the negative candidate sequence generation method of PNSP. The steps are as follows:
r. Generate 1-size NSCs from the 1-size PSPs; for example, the 1-size PSP <a> generates the 1-size NSC
Figure PCTCN2019102473-appb-000088
s. Define the constraint that consecutive negative elements are not allowed in an NSP. A 2-size NSC is generated by arranging 1-size PSPs and 1-size NSPs, for example
Figure PCTCN2019102473-appb-000089
If the last element of ns is a positive element, append a 1-size PSP or a 1-size NSP; otherwise, append a 1-size PSP;
t. Append a 1-size PSP or a 1-size NSP to a (k-1)-size candidate sequence (NSC or PSP) to produce a k-size NSC;
u. Repeat steps r to t until no NSC is generated or the number of elements in an NSC exceeds 2l+1, where l is the number of elements in the longest sequence among the PSPs; that is, if the longest PSP has m elements, a generated NSP has at most 2m+1 elements;
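The appending rule of steps r to t can be sketched as follows. This is a simplified illustration, not the PNSP procedure itself: it assumes one item per element, writes a negative element as `~a`, treats every negated 1-size PSP as an available 1-size NSP without checking its support, and applies no pruning. The helper names `is_negative` and `append_one` are hypothetical.

```python
from typing import List, Tuple

Sequence = Tuple[str, ...]  # elements as strings; "~a" stands for the negative element of a


def is_negative(element: str) -> bool:
    return element.startswith("~")


def append_one(bases: List[Sequence], positives: List[str],
               negatives: List[str]) -> List[Sequence]:
    """Steps s-t: append one element to each base without creating consecutive negatives."""
    candidates = []
    for base in bases:
        # A positive element may follow anything; a negative one only a positive element.
        allowed = positives + ([] if is_negative(base[-1]) else negatives)
        for element in allowed:
            candidate = base + (element,)
            if any(is_negative(e) for e in candidate):  # keep negative candidates only
                candidates.append(candidate)
    return candidates


positives = ["a", "b"]                      # hypothetical 1-size PSPs
negatives = ["~" + p for p in positives]   # step r: 1-size NSCs derived from them
one_size = [(e,) for e in positives + negatives]
two_size = append_one(one_size, positives, negatives)
print(two_size)
```

Calling `append_one` again on the 2-size candidates (together with the 2-size PSPs) would give the 3-size candidates, and so on up to the 2l+1 limit of step u.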
Each k-size NSC is pruned before its support is computed. The pruning rule is:
If the condition given by
Figure PCTCN2019102473-appb-000090
Figure PCTCN2019102473-appb-000091
holds, the negative candidate sequence ns is pruned.
Each k-size NSC is pruned before its support is computed. The pruning rule is:
If the condition given by
Figure PCTCN2019102473-appb-000092
Figure PCTCN2019102473-appb-000093
holds, the negative candidate sequence ns is pruned.
Following the method of step i, the support of a negative candidate sequence is computed as follows:
For a sequence ns of size m that contains n negative elements, and for every
Figure PCTCN2019102473-appb-000094
(a sequence containing only one negative element) ∈ 1-negMSS_ns (the set of sequences containing one negative element), 1≤i≤n, the support sup(ns) of ns in the database is given by formulas (Ⅰ), (Ⅱ) and (Ⅲ):
If the size of ns is 1 and ns has exactly one negative element, the support of ns is:
Figure PCTCN2019102473-appb-000095
If ns contains only one negative element, the support of ns is:
sup(ns)=sup(MPS(ns))-sup(p(ns))    (Ⅱ)
Otherwise, the support of ns is:
Figure PCTCN2019102473-appb-000096
In formulas (Ⅰ), (Ⅱ) and (Ⅲ), OR denotes the bitwise operation, described here as an AND operation, that is applied one by one to the bitmaps corresponding to p(1-negMS_i): the bitmaps are merged into a new bitmap whose value at a position is 1 if every bitmap has a 1 at that position, and 0 otherwise. N denotes the number of 1s in a bitmap. For example, consider a negative candidate sequence
Figure PCTCN2019102473-appb-000097
with sup(<ce>)=5, for which MPS(ns)=<ce>, p(1-negMS_1)=<ace> and p(1-negMS_2)=<cef>. Suppose B(<ace>)=|0|0|1|1|0| and B(<cef>)=|0|1|1|1|0|; then
Figure PCTCN2019102473-appb-000098
Figure PCTCN2019102473-appb-000099
Therefore
Figure PCTCN2019102473-appb-000100
Figure PCTCN2019102473-appb-000101
Figure PCTCN2019102473-appb-000102
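The bitmap arithmetic of this example can be checked with a short Python sketch. Two points are assumptions rather than statements of the source: formula (Ⅲ) is available only as a figure, so the sketch assumes it has the form sup(MPS(ns)) minus the number of 1s in the combined bitmap, by analogy with formula (Ⅱ); and because the passage names the operator OR while describing an AND, the sketch evaluates both readings. The helper names `combine` and `count_ones` are hypothetical.

```python
from functools import reduce
from operator import and_, or_
from typing import Callable, List

Bitmap = List[int]


def combine(bitmaps: List[Bitmap], op: Callable[[int, int], int]) -> Bitmap:
    """Combine bitmaps position by position with the given binary operator."""
    return [reduce(op, column) for column in zip(*bitmaps)]


def count_ones(bitmap: Bitmap) -> int:
    """N(...): the number of 1s in a bitmap."""
    return sum(bitmap)


sup_mps = 5                  # sup(<ce>) from the example
b_ace = [0, 0, 1, 1, 0]      # B(<ace>), the bitmap of p(1-negMS_1)
b_cef = [0, 1, 1, 1, 0]      # B(<cef>), the bitmap of p(1-negMS_2)

and_bitmap = combine([b_ace, b_cef], and_)
or_bitmap = combine([b_ace, b_cef], or_)

# Assumed form of formula (III): sup(MPS(ns)) minus the count of 1s in the combined bitmap.
sup_with_and = sup_mps - count_ones(and_bitmap)
sup_with_or = sup_mps - count_ones(or_bitmap)
print(and_bitmap, sup_with_and)
print(or_bitmap, sup_with_or)
```

The two readings give different supports for the same candidate, so the choice of operator matters when the result is compared against min_sup.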
Algorithm pseudocode
Input: clinical medication record sequence database (D); minimum support (min_sup);
Output: set of sequential patterns (NSP) used to analyze clinical medication behavior;
Figure PCTCN2019102473-appb-000103
Step (1) uses the modified PrefixSpan algorithm to mine all positive sequential patterns from the sequence database; the support information of every positive candidate sequence is stored as a bitmap;
Steps (2)-(19) generate negative candidates with the negative candidate sequence generation method; steps (10) and (16) prune the negative candidate sequences that meet the pruning conditions;
Steps (21)-(26) compute the support of the negative candidate sequences with formulas (Ⅰ)-(Ⅲ): steps (21)-(24) compute the support of negative candidates that contain only one negative element, and step (26) computes the support of negative candidates that contain multiple negative elements;
Steps (27)-(28): if the support of a negative candidate is greater than the minimum support, the candidate is a negative sequential pattern and is added to the set of negative sequential patterns;
Step (30) returns the result; an appropriate method is then used to select the sequential patterns that can support decision-making, and the selected patterns are used to analyze clinical medication behavior.

Claims (10)

  1. A clinical medication behavior analysis system based on an efficient negative sequence mining model, characterized in that it comprises a data acquisition system and a behavior analysis system communicatively connected through a transmission network;
    the data acquisition system comprises a data acquisition module and a data transmission module connected in sequence; the data acquisition module is configured to collect and store patients' clinical medication behavior data in real time, the clinical medication behavior data comprising the patient's ID number, timestamp, prescribed drugs, symptoms, disease and department; the data transmission module is configured to transmit the patients' clinical medication behavior data to the behavior analysis system through the transmission network;
    the behavior analysis system comprises a data processing module, a data analysis module and a data management module connected in sequence; the data processing module is configured to clean the collected clinical medication behavior data and to classify the data by the patient's department and disease; the data analysis module is configured to analyze and predict patients' clinical medication behavior according to the processing results of the data processing module; the data management module is configured to store and display the processing results of the data processing module and the clinical medication behavior results produced by the data analysis module, and to recommend the next medication when a doctor prescribes drugs.
  2. The clinical medication behavior analysis system based on an efficient negative sequence mining model according to claim 1, characterized in that the transmission network is a wired public network, a local area network, or a 3G/4G network.
  3. A working method of the clinical medication behavior analysis system based on an efficient negative sequence mining model according to claim 1 or 2, characterized in that it comprises the following steps:
    (1) the data acquisition module collects and stores patients' clinical medication behavior data in real time, the clinical medication behavior data comprising the patient's ID number, timestamp, prescribed drugs, symptoms, disease and department;
    ns denotes a negative candidate sequence;
    m-size means that the negative candidate sequence ns contains m elements;
    MPS(ns) denotes the maximum positive subsequence of the negative candidate sequence ns, consisting of all positive elements of ns in their original order;
    the positive partner p(ns) is the sequence obtained by converting all negative elements of a negative candidate sequence ns, composed of the drugs used by a patient, into the corresponding positive elements;
    1-negMS_ns denotes a subsequence of the negative candidate sequence ns that consists of MPS(ns) and one negative element;
    1-negMSS_ns denotes the set of all 1-negMS_ns subsequences of the negative candidate sequence ns;
    p(1-negMS_ns) denotes the sequence obtained from 1-negMS_ns by keeping the positive elements unchanged and converting the negative element into the corresponding positive element;
    ds denotes a data sequence in the database; ds contains the drugs used by one patient during the current course of treatment, arranged in the order in which they were administered;
    in summary, for a data sequence ds and a sequence ns that has m elements in total, n of which are negative, and that satisfies the element constraint, the format constraint and the frequent constraint, if the condition
    Figure PCTCN2019102473-appb-100001
    holds and every 1-negMS_ns satisfies
    Figure PCTCN2019102473-appb-100002
    then ds contains ns, where:
    the element constraint means that no negative item is allowed inside an element; only a whole element of a sequence can be negative;
    the format constraint means that there are no two or more consecutive negative elements;
    the frequent constraint means that a negative sequence satisfies 1-negMS_ns ∈ 1-negMSS_ns and p(1-negMS_ns) ∈ PSP, where PSP denotes the positive sequential patterns;
    (2) the data transmission module transmits the patients' clinical medication behavior data to the behavior analysis system through the transmission network, and the behavior analysis system analyzes the clinical medication behavior data with the eNSP-IT algorithm, comprising the following steps:
    a. the data processing module cleans the collected clinical medication behavior data and classifies the data by the patient's department and disease;
    b. the data analysis module analyzes and predicts patients' clinical medication behavior according to the processing results of the data processing module;
    c. the data management module stores and displays the processing results of the data processing module and the clinical medication behavior results produced by the data analysis module, and recommends the next medication when a doctor prescribes drugs.
  4. The working method of the clinical medication behavior analysis system based on an efficient negative sequence mining model according to claim 3, characterized in that in step a, the data processing module cleaning the collected clinical medication behavior data and classifying the data by the patient's department and disease comprises the following steps:
    d. optimize the collected clinical medication behavior data, the optimization comprising filling in missing data and filtering out abnormal data;
    e. standardize the optimized clinical medication behavior data, where standardization means integrating the data, that is, arranging each week's medication records of patients with the same patient ID number into one ordered sequence, to form complete clinical medication behavior data;
    f. classify the clinical medication behavior data by the two classification features of the patient's department and disease, and store the data in the data management module by the patient's ID number, timestamp, prescribed drugs, symptoms, disease and department.
  5. The working method of the clinical medication behavior analysis system based on an efficient negative sequence mining model according to claim 3, characterized in that in step b, the data analysis module analyzing and predicting patients' clinical medication behavior according to the processing results of the data processing module comprises the following steps:
    g. mine all positive sequential patterns, that is, the drug orders used most frequently in the patient population within a given period, with the modified positive sequential pattern mining algorithm PrefixSpan; in the modified PrefixSpan, a bitmap is used for every frequent positive sequence to store the ID numbers of the data sequences that contain it;
    h. generate negative candidate sequences (Negative Sequential Candidates, NSC) with the negative candidate sequence generation method of PNSP; the negative candidate sequences are used to determine which drugs were used frequently and which drugs were not used within a given period;
    i. compute the support of the negative candidate sequences with bitmap operations;
    j. select from the negative candidate sequences the negative sequential patterns that meet the minimum support requirement, use an appropriate selection method to pick out the negative sequential patterns that can support decision-making, and analyze patients' medication behavior with those patterns; based on the analysis results, the doctor predicts the patient's next treatment plan, which supports clinical decisions based on changes in the medication regimen.
  6. The working method of the clinical medication behavior analysis system based on an efficient negative sequence mining model according to claim 5, characterized in that step g comprises the following steps:
    k. add an ID to each data sequence ds;
    l. scan the database to find all items, where an item is a single drug, and create a bitmap for each item; the length of each bitmap equals the number of data sequences in the database; if an item appears in data sequence i, position i of its bitmap is set to 1; otherwise, position i is set to 0; a bitmap is denoted B;
    m. from each item's bitmap, compute the item's support, that is, the number of 1s in the bitmap, and check whether it meets the minimum support min_sup, the user-defined minimum frequency at which a pattern counts as frequent; if the support of the item is greater than or equal to min_sup, the item is a PSP of length 1 and is treated as a prefix of length 1; otherwise, it is not a PSP of length 1 and is deleted;
    n. recursively mine each prefix of length i (i≥1) that meets the support requirement: use the bitmap of the prefix to find the data sequences that contain it, and store the projection of each such data sequence with respect to the prefix in the projection database;
    o. scan the projection database to find all items, create a bitmap for each item from the IDs of the data sequences in which it occurs, and compute the support of each item, that is, the number of 1s in its bitmap; if the support of every item is below min_sup, return from the recursion; otherwise, go to step p;
    p. merge each item that meets the support threshold with the current prefix and combine the two bitmaps with a bitwise AND to obtain the new prefix and its bitmap; the new prefix is a PSP of length i; if the PSP is a 1-size PSP, store its support directly; otherwise, continue to store its information as a bitmap;
    q. increase i by 1, take each new prefix produced by the merge as the current prefix, and recursively execute steps o to q for each of them.
  7. The working method of the clinical medication behavior analysis system based on an efficient negative sequence mining model according to claim 5, characterized in that step h comprises the following steps:
    r. generate 1-size NSCs from the 1-size PSPs;
    s. define the constraint that consecutive negative elements are not allowed in an NSP; a 2-size NSC is generated by arranging 1-size PSPs and 1-size NSPs; if the last element of ns is a positive element, append a 1-size PSP or a 1-size NSP; otherwise, append a 1-size PSP;
    t. append a 1-size PSP or a 1-size NSP to a (k-1)-size candidate sequence to produce a k-size NSC;
    u. repeat steps r to t until no NSC is generated or the number of elements in an NSC exceeds 2l+1, where l is the number of elements in the longest sequence among the PSPs; that is, if the longest PSP has m elements, a generated NSP has at most 2m+1 elements.
  8. The working method of the clinical medication behavior analysis system based on an efficient negative sequence mining model according to claim 5, characterized in that each k-size NSC is pruned before its support is computed, the pruning rule being:
    if the condition given by
    Figure PCTCN2019102473-appb-100003
    Figure PCTCN2019102473-appb-100004
    holds, the negative candidate sequence ns is pruned.
  9. The working method of the clinical medication behavior analysis system based on an efficient negative sequence mining model according to claim 5, characterized in that each k-size NSC is pruned before its support is computed, the pruning rule being:
    if the condition given by
    Figure PCTCN2019102473-appb-100005
    Figure PCTCN2019102473-appb-100006
    holds, the negative candidate sequence ns is pruned.
  10. The working method of the clinical medication behavior analysis system based on an efficient negative sequence mining model according to any one of claims 5-9, characterized in that in step i, the support of a negative candidate sequence is computed as follows:
    for a sequence ns of size m that contains n negative elements, and for every
    Figure PCTCN2019102473-appb-100007
    (a sequence containing only one negative element) ∈ 1-negMSS_ns (the set of sequences containing one negative element), 1≤i≤n, the support sup(ns) of ns in the database is given by formulas (Ⅰ), (Ⅱ) and (Ⅲ):
    if the size of ns is 1 and ns has exactly one negative element, the support of ns is:
    Figure PCTCN2019102473-appb-100008
    if ns contains only one negative element, the support of ns is:
    sup(ns)=sup(MPS(ns))-sup(p(ns))  (Ⅱ)
    otherwise, the support of ns is:
    Figure PCTCN2019102473-appb-100009
    in formulas (Ⅰ), (Ⅱ) and (Ⅲ), OR denotes the bitwise operation, described here as an AND operation, that is applied one by one to the bitmaps corresponding to p(1-negMS_i): the bitmaps are merged into a new bitmap whose value at a position is 1 if every bitmap has a 1 at that position, and 0 otherwise; N denotes the number of 1s in a bitmap.
PCT/CN2019/102473 2019-06-27 2019-08-26 Clinical medication behavior analysis system based on highly effective negative sequential mining pattern, and working method therefor WO2020258483A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910565947.X 2019-06-27
CN201910565947.XA CN110277172A (en) 2019-06-27 2019-06-27 A kind of clinical application behavior analysis system and its working method based on efficient negative sequence mining mode

Publications (1)

Publication Number Publication Date
WO2020258483A1 true WO2020258483A1 (en) 2020-12-30

Family

ID=67963607

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/102473 WO2020258483A1 (en) 2019-06-27 2019-08-26 Clinical medication behavior analysis system based on highly effective negative sequential mining pattern, and working method therefor

Country Status (3)

Country Link
CN (1) CN110277172A (en)
LU (1) LU102313B1 (en)
WO (1) WO2020258483A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110825786A (en) * 2019-11-06 2020-02-21 哈尔滨理工大学 Spark-based big data association rule mining method
CN111785370B (en) * 2020-07-01 2024-05-17 医渡云(北京)技术有限公司 Medical record data processing method and device, computer storage medium and electronic equipment
CN111883247B (en) * 2020-07-29 2022-03-15 复旦大学 Analysis system for correlation between behavior data and medical outcome
JP7498503B2 (en) * 2020-08-18 2024-06-12 斉魯工業大学 A product recommendation system based on practical high utility negative sequence rule mining and its working method
CN111949711B (en) * 2020-08-18 2021-06-01 齐鲁工业大学 Commodity recommendation system based on decision-making high-utility negative sequence rule mining and working method thereof

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104504159A (en) * 2015-01-19 2015-04-08 齐鲁工业大学 Application of multi-supporting-degree positive and negative sequence modes in clients' purchasing behavior analysis
CN104537553A (en) * 2015-01-19 2015-04-22 齐鲁工业大学 Application of repeated negative sequence pattern in customer purchase behavior analysis
CN104574153A (en) * 2015-01-19 2015-04-29 齐鲁工业大学 Method for quickly applying negative sequence mining patterns to customer purchasing behavior analysis
CN104732419A (en) * 2015-01-19 2015-06-24 齐鲁工业大学 Application of positive and negative sequence mode screening method in customer purchasing behavior analysis
CN105095653A (en) * 2015-07-13 2015-11-25 湖南互动传媒有限公司 Basic service system for medical large data application
CN107436997A (en) * 2017-07-03 2017-12-05 上海百纬健康科技有限公司 The analysis system and method for a kind of physiological data
CN108804419A (en) * 2018-05-22 2018-11-13 湖南大学 Medicine is sold accurate recommended technology under a kind of line of knowledge based collection of illustrative plates
CN109636688A (en) * 2018-12-11 2019-04-16 武汉文都创新教育研究院(有限合伙) A kind of students ' behavior analysis system based on big data

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106910132A (en) * 2017-01-11 2017-06-30 齐鲁工业大学 Top k can decision-making application of the negative sequence pattern in client insures behavioural analysis
CN109542944B (en) * 2018-09-29 2023-07-25 广东工业大学 Intelligent home user control behavior recommendation method based on time sequence causality analysis
CN109830303A (en) * 2019-02-01 2019-05-31 上海众恒信息产业股份有限公司 Clinical data mining analysis and aid decision-making method based on internet integration medical platform

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LU-LIN ZHAO, DONG XIANG-JUN: "Comparisons of several definitions about negative containment in negative sequential pattern mining", JOURNAL OF SHANDONG INSTITUTE OF LIGHT INDUSTRY(NATURAL SCIENCE EDITION), vol. 25, no. 4, 1 November 2011 (2011-11-01), pages 41 - 43, XP055769474 *
PING QIU, DONG XIANG-JUN: "Research on Constraints of Positive and Negative Sequential Patterns", JOURNAL OF QILU UNIVERSITY OF TECHNOLOGY(NATURAL SCIENCE EDITION), vol. 30, no. 05, 1 October 2016 (2016-10-01), pages 39 - 45, XP055769471, ISSN: 1004-4280 *

Also Published As

Publication number Publication date
CN110277172A (en) 2019-09-24
LU102313A1 (en) 2021-01-18
LU102313B1 (en) 2021-04-27

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19935017

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19935017

Country of ref document: EP

Kind code of ref document: A1