CN113707281A - Method for realizing dynamic disease grouping based on multi-dimensional disease characteristics

Info

Publication number: CN113707281A
Application number: CN202010442799.5A
Authority: CN
Inventors: 张伟; 孙麟; 冯海欢; 李春漾; 辜永红; 李天俊
Original assignee: West China Hospital of Sichuan University
Current assignee: West China Hospital of Sichuan University
Priority date: 2020-05-22
Filing date: 2020-05-22
Publication date: 2021-11-26

Abstract

The invention discloses a method for realizing dynamic disease grouping based on multi-dimensional disease characteristics, which comprises the following steps: step 1: determining the disease category of each case among the cases to be classified, and obtaining one or more disease category sets; step 2: establishing a classification model, and using the classification model to classify the cases in each disease category set according to treatment cost. Step 1 comprises the following steps: step 101: acquiring the disease name and the operation mode of the case to be classified; step 102: determining a diagnosis and treatment mode according to the operation mode; step 103: determining the disease species of the case to be classified according to the disease name and the diagnosis and treatment mode. By means of a classification model and the CART algorithm, the invention classifies cases according to patients' clinical symptoms and treatment cost, so as to control medical insurance costs reasonably and improve the utilization efficiency of hospital medical resources.

Description

Method for realizing dynamic disease grouping based on multi-dimensional disease characteristics

Technical Field

The invention relates to the technical field of computer information management, and in particular to a method for realizing dynamic disease grouping based on multi-dimensional disease characteristics.

Background

Diagnosis-related grouping (DRG) is a system that divides cases into groups according to factors such as the patient's age, sex, length of hospital stay, clinical diagnosis, symptoms, operations, disease severity, complications and outcome. Cases in the same group have similar clinical symptoms and similar resource consumption. As a medical payment tool, diagnosis-related grouping is recognized worldwide as one of the more advanced payment methods. Establishing a unified fixed-amount payment standard for each diagnosis group standardizes the use of medical resources. It encourages hospitals to strengthen medical quality management, pushes them to actively reduce costs and shorten hospital stays, curbs induced medical spending undertaken for profit, and helps control expenses.

The payment and settlement of medical insurance costs is an important link in the medical insurance system and affects the interests of all parties involved. At present, medical expenses are rising quickly, and medical insurance costs must be controlled to prevent the medical insurance fund from running a deficit. Traditional medical insurance payment methods include pay-per-item and pay-per-head (capitation). Pay-per-item lacks a constraint mechanism, so excessive medical practices such as repeated examinations and oversized prescriptions occur, which wastes medical resources and drives medical expenses up continuously and rapidly. Under pay-per-head, the medical insurance agency prepays the service provider a fixed amount, monthly or annually, based on the number of people served by the hospital or doctor and a specified rate per person. Pay-per-head induces hospitals to admit patients selectively, for example accepting patients with mild symptoms and relatively short hospital stays while turning away severely ill patients, and to split one patient's hospitalization into several admissions in order to count more heads. Hospitals also lack competitive awareness, medical staff are not motivated to improve their skills, and medical quality may even decline.

The technical problem to be solved at present is therefore how to group cases by disease diagnosis from known basic case data (including the personal information, disease information, treatment information, length of hospital stay and cost of each case), using a suitable algorithm and taking the various factors of the hospital into account, so as to lay a foundation for later applications such as medical insurance payment pricing.

Disclosure of Invention

The invention aims to provide a method for classifying cases based on a classification model and a CART algorithm.

In order to achieve the purpose, the invention is realized by adopting the following technical scheme:

a method for realizing dynamic disease grouping based on multi-dimensional disease characteristics, comprising the following steps:

step 1: determining the disease category of each case among the cases to be classified, and obtaining one or more disease category sets;

step 2: establishing a classification model, and using the classification model to classify the cases in each disease category set according to treatment cost.

Preferably, the step 1 comprises the following steps:

step 101: acquiring the disease name and the operation mode of the case to be classified;

step 102: determining a diagnosis and treatment mode according to the operation mode;

step 103: determining the disease species of the case to be classified according to the disease name and the diagnosis and treatment mode.

Further, in step 102, the diagnosis and treatment mode corresponding to the operation mode is queried through a pre-established diagnosis and treatment code library.

Further, in step 103, the disease species corresponding to the disease name and the diagnosis and treatment mode is queried through a pre-established disease species code library.

Preferably, in step 2, the establishing of the classification model comprises the following steps:

step 201: taking all factors in a pre-established influencing factor library as independent variables, and the treatment cost of the cases in the disease category set as the dependent variable;

step 202: using the CART algorithm to select an optimal segmentation point from all factors in the influencing factor library as a splitting node, and repeating the selection of the optimal segmentation point under each splitting node until a preset stop condition of the CART algorithm is reached, thereby generating a classification and regression tree;

step 203: pruning the classification and regression tree to generate the classification model.

Further, in step 202, the selection process of the optimal segmentation point includes the following steps:

step 202a: for each factor in the influencing factor library, calculating the Gini coefficient before segmentation and the Gini coefficient after segmentation;

step 202b: calculating the difference between the Gini coefficient before segmentation and the Gini coefficient after segmentation;

step 202c: selecting the factor corresponding to the maximum difference as the optimal segmentation point.
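Steps 202a to 202c can be illustrated with a minimal Python sketch. It assumes that each factor is a yes/no attribute and that treatment cost has already been binned into classes (the `cost_class` key); the function names, field names and sample cases are hypothetical and are not taken from the patent.

```python
from collections import Counter

def gini(labels):
    """Gini(D) = 1 - sum(p_i^2) over the classes present in labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_after_split(cases, factor, label_key="cost_class"):
    """Size-weighted Gini of the two subsets produced by a yes/no factor."""
    yes = [c[label_key] for c in cases if c[factor]]
    no = [c[label_key] for c in cases if not c[factor]]
    n = len(cases)
    return len(yes) / n * gini(yes) + len(no) / n * gini(no)

def best_factor(cases, factors, label_key="cost_class"):
    # Step 202a: Gini before segmentation (Gini after is computed per factor).
    before = gini([c[label_key] for c in cases])
    # Step 202b: difference between the Gini before and after segmentation.
    gains = {f: before - gini_after_split(cases, f, label_key) for f in factors}
    # Step 202c: the factor with the largest difference is the split.
    return max(gains, key=gains.get), gains

# Hypothetical cases: cost class depends only on the complication flag.
cases = [
    {"complication": True, "smoker": True, "cost_class": "high"},
    {"complication": True, "smoker": False, "cost_class": "high"},
    {"complication": False, "smoker": True, "cost_class": "low"},
    {"complication": False, "smoker": False, "cost_class": "low"},
]
factor, gains = best_factor(cases, ["complication", "smoker"])
print(factor, gains)
```

In this toy data the complication flag separates the cost classes completely, so it yields the largest Gini decrease and is selected as the splitting node.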

Further, in step 102, the diagnosis and treatment code library is built by:

step 102 a: classifying and coding the existing diagnosis and treatment modes;

step 102b: assigning the operation and procedure codes (ICD-9-CM-3) and their names to the corresponding diagnosis and treatment modes, the diagnosis and treatment code library being formed by the ICD-9-CM-3 operation and procedure codes, their names and the diagnosis and treatment modes to which they belong.

Further, in step 103, the disease species code library is established by the following steps:

step 103a: based on the International Classification of Diseases ICD-10 codes established by the WHO, taking the first three characters of the code as the preset disease major class;

step 103b: subdividing the disease major class by taking the first four characters of the ICD-10 code as the preset disease subclass;

step 103c: matching the disease subclasses with the diagnosis and treatment modes to form the disease species code library.
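The truncation in steps 103a and 103b and the pairing in step 103c can be sketched as follows. This is only an illustration: the helper names are hypothetical, and the normalization of dots, spaces and letter case is an assumption about how raw codes might arrive.

```python
def _compact(code):
    """Drop dots and spaces and upper-case the code, e.g. 'a 16.2' -> 'A162'."""
    return code.replace(".", "").replace(" ", "").upper()

def major_class(code):
    """Step 103a: first three characters of the ICD-10 code."""
    compact = _compact(code)
    if len(compact) < 3:
        raise ValueError(f"ICD-10 code too short: {code!r}")
    return compact[:3]

def subclass(code):
    """Step 103b: first four characters, written back in dotted form."""
    compact = _compact(code)
    if len(compact) < 4:
        raise ValueError(f"ICD-10 code has no fourth character: {code!r}")
    return f"{compact[:3]}.{compact[3]}"

def species_key(code, mode_code):
    """Step 103c: a disease species is keyed by (subclass, treatment mode)."""
    return (subclass(code), mode_code)

print(major_class("A04.8"))     # A04
print(subclass("A04.8"))        # A04.8
print(species_key("A04.8", 0))  # ('A04.8', 0)
```

Keying the library by the pair (subclass, mode code) keeps the lookup in step 103 a single dictionary access.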

Further, the factors in the influencing factor library include the patient's age, sex, smoking history, physical condition, whether rescue was performed, complication status, and whether ventilator assistance was used.

Further, in step 202, the Gini coefficient before segmentation, Gini(D), is calculated as:

$$\mathrm{Gini}(D) = 1 - \sum_{i=1}^{m} p_i^{2}$$

The Gini coefficient after segmentation, Gini_A(D), is calculated as:

$$\mathrm{Gini}_A(D) = \frac{|D_1|}{|D|}\,\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\,\mathrm{Gini}(D_2)$$

wherein D is the clinical case and resource consumption dataset, m is the number of classes into which the dataset is classified, p_i is the probability that a case belongs to the i-th class, and D_1 and D_2 are the two subsets into which D is divided by factor A.

The invention has the following beneficial effects:

the invention groups cases with approximately the same clinical symptoms and treatment cost into one class. Compared with the existing manual classification, it is faster and more accurate, and calculating medical insurance costs according to this classification enables scientific and reasonable control of medical insurance costs.

Drawings

FIG. 1 is a schematic representation of the classification of chronic obstructive pulmonary disease using the method of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings.

step 1: determining the disease category of each case in the cases to be classified, and acquiring one or more disease category sets.

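Specifically, step 1 includes the following steps:

step 101: acquiring the disease name and the operation mode of the case to be classified;

step 102: determining a diagnosis and treatment mode according to the operation mode;

step 103: determining the disease species of the case to be classified according to the disease name and the diagnosis and treatment mode.

In step 102, the diagnosis and treatment mode corresponding to the operation mode is queried through a pre-established diagnosis and treatment code library. In step 103, the disease species corresponding to the disease name and the diagnosis and treatment mode is queried through a pre-established disease species code library.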

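Specifically, in step 102, the diagnosis and treatment code library is established by the following steps:

step 102a: classifying and coding the existing diagnosis and treatment modes;

step 102b: assigning the operation and procedure codes (ICD-9-CM-3) and their names to the corresponding diagnosis and treatment modes, the diagnosis and treatment code library being formed by the ICD-9-CM-3 operation and procedure codes, their names and the diagnosis and treatment modes to which they belong.

In step 103, the disease species code library is established by the following steps:

step 103a: based on the International Classification of Diseases ICD-10 codes established by the WHO, taking the first three characters of the code as the preset disease major class;

step 103b: subdividing the disease major class by taking the first four characters of the ICD-10 code as the preset disease subclass;

step 103c: matching the disease subclasses with the diagnosis and treatment modes to form the disease species code library.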

Here, a disease species is a disease with a relatively definite diagnosis and treatment plan and definite admission and discharge criteria, relatively mature diagnosis and treatment technology, a stable clinical pathway, and little variation in overall service cost. In a specific implementation, the diagnosis and treatment code library is established as follows:

First, the commonly used diagnosis and treatment modes are classified and coded. There are 20 main modes, shown in the following table:

Code	Diagnosis and treatment mode	Code	Diagnosis and treatment mode
0	Conservative treatment	10	Conformal radiotherapy
1	Traditional surgery	11	Conformal intensity modulated radiation therapy
2	Minimally invasive surgery	12	Ascites reinfusion
3	Interventional therapy	13	Repair surgery
4	Artificial liver therapy	14	Extracorporeal lithotripsy
5	Dialysis treatment	15	Volume intensity modulation
6	Stem cell transplantation	16	Blood purification
7	Mechanical ventilation	17	Mechanical continuous cooling
8	Ultrasonic emulsification	18	Kidney transplantation
9	Cutting vitreous body	19	Coronary angiography

Then, the commonly used operation and procedure codes (ICD-9-CM-3) and their names are assigned to the corresponding diagnosis and treatment modes, so that the codes, their names and the modes to which they belong together form the diagnosis and treatment code library. The library contains 5398 commonly used ICD-9-CM-3 codes in total, of which the following table shows 5 examples:

In a specific implementation, the disease species code library is established as follows. First, based on the International Classification of Diseases ICD-10 codes established by the WHO, the first three characters of the code are taken as the preset disease major classes. There are 20 major classes in total, including: certain infectious and parasitic diseases (A00-B99), tumors (C00-D48), diseases of the blood and hematopoietic organs and certain diseases involving the immune mechanism (D50-D89), endocrine, nutritional and metabolic diseases (E00-E90), and so on. Next, the major classes are subdivided by taking the first four characters of the ICD-10 code as the preset disease subclasses. There are 187 subclasses in total, including: typhoid fever (A01.0), unspecified paratyphoid fever (A01.4), Salmonella enteritis (A02.0), unspecified bacillary dysentery (A03.9), other intestinal Escherichia coli infections (A04.4), and so on. Then, the 187 disease subclasses are matched with the main diagnosis and treatment modes identified in this example to form a disease species code library of 4654 disease species in total. For example, if the disease subclass is "A04.8, other specified bacterial intestinal infections" and the diagnosis and treatment mode is "0, conservative treatment", the case is assigned to disease species "BZ1". The following table gives 5 examples of disease species in the disease species code library:

When a case falls under one ICD-10 subclass and several different diagnosis and treatment modes are present at the same time, the disease species is decided by priority value: the higher the value, the higher the priority. For example, for a case with ICD-10 subclass "A16.2" and three diagnosis and treatment modes, namely "conservative treatment", "minimally invasive surgery" and "mechanical ventilation", the case is assigned to "BZ5", the disease species with the highest priority. The priority is an evaluation made by clinical specialists according to the resource consumption, technical difficulty and risk level of the diagnosis and treatment modes of each disease.
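The priority rule can be sketched in a few lines of Python. The species codes `X1` to `X3` and the priority values below are placeholders invented for illustration; the patent does not disclose the actual library rows or the priorities assigned by the clinical specialists.

```python
# Hypothetical library rows:
# (ICD-10 subclass, mode code) -> (species code, priority value).
# Mode codes follow the table above: 0 conservative treatment,
# 2 minimally invasive surgery, 7 mechanical ventilation.
SPECIES_LIBRARY = {
    ("A16.2", 0): ("X1", 1),
    ("A16.2", 2): ("X2", 2),
    ("A16.2", 7): ("X3", 3),
}

def resolve_species(subclass, mode_codes, library=SPECIES_LIBRARY):
    """Return the species with the highest priority among the matching modes."""
    candidates = [
        library[(subclass, mode)]
        for mode in mode_codes
        if (subclass, mode) in library
    ]
    if not candidates:
        return None  # no library entry for this subclass and these modes
    species, _priority = max(candidates, key=lambda entry: entry[1])
    return species

print(resolve_species("A16.2", [0, 2, 7]))  # X3 (highest placeholder priority)
```

Storing the priority next to the species code means the tie-break needs no second lookup table.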

After the diagnosis and treatment code library and the disease species code library are established, the ICD-9-CM-3 operation and procedure code of a case is matched against the pre-established diagnosis and treatment code library to obtain the corresponding diagnosis and treatment mode. Then, the disease name and the diagnosis and treatment mode of the case are matched against the pre-constructed disease species code library to obtain the specific disease species of the case.
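The two-stage lookup can be sketched as two dictionary accesses. Everything except the pair ("A04.8", 0) mapping to "BZ1", which is the example given in the text, is hypothetical: `OP-EXAMPLE-1` is a placeholder rather than a real ICD-9-CM-3 code, and treating a case without a matching operation code as conservative treatment is an assumption of this sketch.

```python
# Hypothetical diagnosis and treatment code library: operation code -> mode code.
# "OP-EXAMPLE-1" is a placeholder, not a real ICD-9-CM-3 code.
TREATMENT_LIBRARY = {"OP-EXAMPLE-1": 2}  # 2 = minimally invasive surgery

# Disease species code library: (ICD-10 subclass, mode code) -> species code.
SPECIES_LIBRARY = {
    ("A04.8", 0): "BZ1",             # example given in the text
    ("A04.8", 2): "HYPOTHETICAL-1",  # placeholder entry for illustration
}

CONSERVATIVE_TREATMENT = 0  # mode code 0 in the mode table

def treatment_mode(operation_codes):
    """Step 102: map the operation codes of a case to a treatment mode.

    Assumption: a case with no matching operation code counts as
    conservative treatment; the first matching code wins otherwise.
    """
    modes = [TREATMENT_LIBRARY[c] for c in operation_codes if c in TREATMENT_LIBRARY]
    return modes[0] if modes else CONSERVATIVE_TREATMENT

def disease_species(icd10_subclass, operation_codes):
    """Step 103: look up the species for (disease subclass, treatment mode)."""
    mode = treatment_mode(operation_codes)
    return SPECIES_LIBRARY.get((icd10_subclass, mode))

print(disease_species("A04.8", []))                # BZ1
print(disease_species("A04.8", ["OP-EXAMPLE-1"]))  # HYPOTHETICAL-1
```

A case whose pair is missing from the library returns `None`, which a real system would presumably route to manual review.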

Specifically, in step 2, establishing the classification model includes the following steps:

step 201: taking all factors in a pre-established influencing factor library as independent variables, and the treatment cost of the cases in the disease category set as the dependent variable. The factors in the influencing factor library include the patient's age, sex, smoking history, physical condition, whether rescue was performed, complication status, and whether ventilator assistance was used;

step 202: using the CART algorithm to select an optimal segmentation point from all factors in the influencing factor library as a splitting node, and repeating the selection of the optimal segmentation point under each splitting node until a preset stop condition of the CART algorithm is reached, thereby generating a classification and regression tree;

step 203: pruning the classification and regression tree to generate the classification model.

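Specifically, in step 202, the selection of the optimal segmentation point includes the following steps:

step 202a: for each factor in the influencing factor library, calculating the Gini coefficient before segmentation and the Gini coefficient after segmentation;

step 202b: calculating the difference between the Gini coefficient before segmentation and the Gini coefficient after segmentation;

step 202c: selecting the factor corresponding to the maximum difference as the optimal segmentation point.

The Gini coefficient before segmentation, Gini(D), is calculated as:

$$\mathrm{Gini}(D) = 1 - \sum_{i=1}^{m} p_i^{2}$$

The Gini coefficient after segmentation, Gini_A(D), is calculated as:

$$\mathrm{Gini}_A(D) = \frac{|D_1|}{|D|}\,\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\,\mathrm{Gini}(D_2)$$

wherein D is the clinical case and resource consumption dataset, m is the number of classes into which the dataset is classified, p_i is the probability that a case belongs to the i-th class, and D_1 and D_2 are the two subsets into which D is divided by factor A. When the Gini coefficient is calculated for a continuous-valued factor, the midpoints of adjacent sorted values are used as candidate splitting points, and the calculation uses the same formulas.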
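For a continuous factor such as age, the midpoint rule can be sketched as below. The function names and the sample ages and cost classes are hypothetical; the sketch assumes cost has been binned into classes so that the Gini coefficient applies.

```python
from collections import Counter

def gini(labels):
    """Gini(D) = 1 - sum(p_i^2); an empty set is treated as pure."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_threshold(values, labels):
    """Try the midpoint of each pair of adjacent distinct sorted values.

    Returns (threshold, weighted Gini after the split), or None when
    all values are identical and no split is possible.
    """
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    n = len(values)
    best = None
    for threshold in candidates:
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical ages and cost classes.
ages = [60, 62, 65, 66, 70, 75]
cost_classes = ["low", "low", "low", "high", "high", "high"]
print(best_threshold(ages, cost_classes))  # (65.5, 0.0)
```

Minimizing the Gini coefficient after the split is equivalent to maximizing the difference in step 202b, because the coefficient before the split is the same for every candidate threshold.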

The following parameters are also determined when the classification model is established:

1. complexity of the decision tree: control decision tree growth (i.e., control number of classes);

2. the minimum number of cases each category should contain, as a condition for stopping the algorithm.

In addition, when the CART algorithm selects complications as the segmentation variable, complications whose node after segmentation has a mean cost higher than a preset threshold are determined to be important complications, and complications whose node after segmentation has a mean cost lower than the preset threshold are determined to be general complications.
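The threshold rule for complications can be sketched as follows. The complication names, costs and threshold are invented for illustration, and treating a mean cost exactly equal to the threshold as "general" is an assumption, since the text does not say how ties are handled.

```python
from statistics import mean

def label_complications(cases, threshold):
    """Label each complication by the mean cost of the cases that have it.

    cases: iterable of (complication name, treatment cost) pairs.
    A mean above the threshold gives "important", otherwise "general"
    (equality counts as "general", which is an assumption of this sketch).
    """
    costs_by_complication = {}
    for complication, cost in cases:
        costs_by_complication.setdefault(complication, []).append(cost)
    return {
        complication: "important" if mean(costs) > threshold else "general"
        for complication, costs in costs_by_complication.items()
    }

# Hypothetical complications and costs in yuan.
sample = [
    ("complication_a", 14000.0),
    ("complication_a", 13000.0),
    ("complication_b", 9000.0),
    ("complication_b", 11000.0),
]
print(label_complications(sample, threshold=12000.0))
# {'complication_a': 'important', 'complication_b': 'general'}
```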

The classification process is described below, taking cases of chronic obstructive pulmonary disease as an example. First, the disease species of a case is determined to be chronic obstructive pulmonary disease through the pre-constructed diagnosis and treatment code library and disease species code library. For chronic obstructive pulmonary disease, the treatment cost of a case is set as the dependent variable, and the independent variables are all the multi-dimensional characteristics, such as age, sex, smoking history, physical condition, rescue, complications and ventilator assistance. The CART algorithm selects the appropriate influencing factors and quickly and adaptively finds the most effective grouping factors and grouping conditions. When the classification model is established, the complexity of the decision tree is set to 0.01 to control the growth of the decision tree (that is, the number of classes), and the minimum number of cases that each class should contain is set to 50 as a condition for stopping the algorithm.
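The effect of the two parameters can be sketched with a small recursive tree builder. This is not the patent's implementation: it assumes yes/no factors and binned cost classes, and it interprets the complexity value 0.01 as a minimum required Gini decrease per split, which is only one possible reading. All names and the synthetic cases are hypothetical.

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def grow(cases, factors, min_cases=50, min_gain=0.01, label="cost_class"):
    """Grow a tree; stop when no split keeps min_cases per side or gains enough."""
    labels = [c[label] for c in cases]
    before = gini(labels)
    best = None
    for factor in factors:
        yes = [c for c in cases if c[factor]]
        no = [c for c in cases if not c[factor]]
        # Stop condition: every class must keep at least min_cases cases.
        if not yes or not no or len(yes) < min_cases or len(no) < min_cases:
            continue
        after = (len(yes) * gini([c[label] for c in yes])
                 + len(no) * gini([c[label] for c in no])) / len(cases)
        gain = before - after
        # Assumed reading of "complexity": a minimum Gini decrease.
        if gain >= min_gain and (best is None or gain > best[0]):
            best = (gain, factor, yes, no)
    if best is None:
        majority = Counter(labels).most_common(1)[0][0]
        return {"leaf": True, "n": len(cases), "majority": majority}
    _, factor, yes, no = best
    remaining = [f for f in factors if f != factor]
    return {
        "leaf": False,
        "factor": factor,
        "yes": grow(yes, remaining, min_cases, min_gain, label),
        "no": grow(no, remaining, min_cases, min_gain, label),
    }

# Synthetic cases: cost class is decided by the complication flag alone.
cases = [
    {"complication": i < 100, "ventilator": i % 2 == 0,
     "cost_class": "high" if i < 100 else "low"}
    for i in range(200)
]
tree = grow(cases, ["complication", "ventilator"])
print(tree["factor"], tree["yes"]["n"], tree["no"]["n"])
```

With 200 synthetic cases the root splits on the complication flag and both children become leaves of 100 cases, because a further split on the ventilator flag gives no Gini decrease.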

According to the CART algorithm, complications are selected first as the segmentation variable, dividing the cases into those without complications and those with complications. Cases with complications are further subdivided by the following rule: complications whose node after segmentation has a mean cost higher than the preset threshold are determined to be important complications, and those with a mean cost lower than the preset threshold are determined to be general complications. Cases without complications are further divided by ventilator assistance time into no ventilator support and ventilator support, and ventilator support is further divided into more than 96 hours and 96 hours or less. Cases with ventilator support of 96 hours or less are further subdivided by age into 66 years and older and 65 years and younger. Patients aged 66 and older are further subdivided by hospitalization in the last year into those with no hospitalization record in the last year and those with a hospitalization record in the last year, and the latter are further subdivided into 1-30 days and 30 days or more of hospitalization in the last year. Patients with no hospitalization record in the last year are divided according to whether they have a smoking history, which yields the final classification model.

As shown in FIG. 1, the CART algorithm divides the case data set into 9 groups:

1) chronic obstructive pulmonary disease with important complications (average cost 13555.9 yuan);

2) chronic obstructive pulmonary disease with general complications (average cost 10039.7 yuan);

3) chronic obstructive pulmonary disease without complications and without ventilator support (average cost 6110.3 yuan);

4) chronic obstructive pulmonary disease without complications, with ventilator support for more than 96 hours (average cost 11636.3 yuan);

5) chronic obstructive pulmonary disease without complications, with ventilator support for 96 hours or less, aged 65 or younger (average cost 8684.9 yuan);

6) chronic obstructive pulmonary disease without complications, with ventilator support for 96 hours or less, aged 66 or older, with 30 days or more of hospitalization in the last year (average cost 12436.3 yuan);

7) chronic obstructive pulmonary disease without complications, with ventilator support for 96 hours or less, aged 66 or older, with 1-30 days of hospitalization in the last year (average cost 11636.3 yuan);

8) chronic obstructive pulmonary disease without complications, with ventilator support for 96 hours or less, aged 66 or older, with no hospitalization record in the last year and a history of smoking (average cost 9345.7 yuan);

9) chronic obstructive pulmonary disease without complications, with ventilator support for 96 hours or less, aged 66 or older, with no hospitalization record in the last year and no history of smoking (average cost 8543.7 yuan).

The average cost of each group can be used as a reference value for predicting medical service costs.
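The nine groups can be read as a chain of conditions, sketched below. The function and parameter names are hypothetical. Because "1-30 days" and "30 days or more" overlap at exactly 30 days, the sketch assigns 30 days to group 6, which is an assumption; the average costs are the values listed above.

```python
# Average cost per group in yuan, as listed in the text.
AVERAGE_COST_YUAN = {
    1: 13555.9, 2: 10039.7, 3: 6110.3, 4: 11636.3, 5: 8684.9,
    6: 12436.3, 7: 11636.3, 8: 9345.7, 9: 8543.7,
}

def copd_group(complication, ventilator_hours, age,
               days_hospitalized_last_year, smoker):
    """Assign a chronic obstructive pulmonary disease case to groups 1-9.

    complication: None, "general" or "important".
    ventilator_hours: 0 means no ventilator support.
    """
    if complication == "important":
        return 1
    if complication == "general":
        return 2
    if ventilator_hours == 0:
        return 3
    if ventilator_hours > 96:
        return 4
    if age <= 65:
        return 5
    if days_hospitalized_last_year >= 30:  # assumption: exactly 30 goes here
        return 6
    if days_hospitalized_last_year >= 1:
        return 7
    return 8 if smoker else 9

group = copd_group(None, ventilator_hours=48, age=70,
                   days_hospitalized_last_year=0, smoker=True)
print(group, AVERAGE_COST_YUAN[group])  # 8 9345.7
```

Looking up the group's average cost gives the reference value for predicting the medical service cost of a new case.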

The present invention is capable of other embodiments, and various changes and modifications may be made by one skilled in the art without departing from the spirit and scope of the invention.

Claims

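1. A method for realizing dynamic disease grouping based on multi-dimensional disease characteristics, characterized by comprising the following steps:

step 1: determining the disease category of each case among the cases to be classified, and obtaining one or more disease category sets;

step 2: establishing a classification model, and using the classification model to classify the cases in each disease category set according to treatment cost.

2. The method for realizing dynamic disease grouping based on multi-dimensional disease characteristics as claimed in claim 1, wherein step 1 comprises the following steps:

step 101: acquiring the disease name and the operation mode of the case to be classified;

step 102: determining a diagnosis and treatment mode according to the operation mode;

step 103: determining the disease species of the case to be classified according to the disease name and the diagnosis and treatment mode.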

3. The method for realizing dynamic disease grouping based on multi-dimensional disease characteristics as claimed in claim 2, wherein: in step 102, the diagnosis and treatment mode corresponding to the operation mode is queried through a pre-established diagnosis and treatment code library.

4. The method for realizing dynamic disease grouping based on multi-dimensional disease characteristics as claimed in claim 2, wherein: in step 103, the disease species corresponding to the disease name and the diagnosis and treatment mode is queried through a pre-established disease species code library.

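5. The method for realizing dynamic disease grouping based on multi-dimensional disease characteristics as claimed in claim 1, wherein: in step 2, establishing the classification model comprises the following steps:

step 201: taking all factors in a pre-established influencing factor library as independent variables, and the treatment cost of the cases in the disease category set as the dependent variable;

step 202: using the CART algorithm to select an optimal segmentation point from all factors in the influencing factor library as a splitting node, and repeating the selection of the optimal segmentation point under each splitting node until a preset stop condition of the CART algorithm is reached, thereby generating a classification and regression tree;

step 203: pruning the classification and regression tree to generate the classification model.

6. The method for realizing dynamic disease grouping based on multi-dimensional disease characteristics as claimed in claim 5, wherein: in step 202, the selection of the optimal segmentation point comprises the following steps:

step 202a: for each factor in the influencing factor library, calculating the Gini coefficient before segmentation and the Gini coefficient after segmentation;

step 202b: calculating the difference between the Gini coefficient before segmentation and the Gini coefficient after segmentation;

step 202c: selecting the factor corresponding to the maximum difference as the optimal segmentation point.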

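7. The method for realizing dynamic disease grouping based on multi-dimensional disease characteristics as claimed in claim 3, wherein: in step 102, the diagnosis and treatment code library is established by the following steps:

step 102a: classifying and coding the existing diagnosis and treatment modes;

step 102b: assigning the operation and procedure codes (ICD-9-CM-3) and their names to the corresponding diagnosis and treatment modes, the diagnosis and treatment code library being formed by the ICD-9-CM-3 operation and procedure codes, their names and the diagnosis and treatment modes to which they belong.

8. The method for realizing dynamic disease grouping based on multi-dimensional disease characteristics as claimed in claim 4, wherein: in step 103, the disease species code library is established by the following steps:

step 103a: based on the International Classification of Diseases ICD-10 codes established by the WHO, taking the first three characters of the code as the preset disease major class;

step 103b: subdividing the disease major class by taking the first four characters of the ICD-10 code as the preset disease subclass;

step 103c: matching the disease subclasses with the diagnosis and treatment modes to form the disease species code library.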

9. The method for realizing dynamic disease grouping based on multi-dimensional disease characteristics as claimed in claim 5 or 6, wherein: the factors in the influencing factor library include the patient's age, sex, smoking history, physical condition, whether rescue was performed, complication status, and whether ventilator assistance was used.

10. The method for realizing dynamic disease grouping based on multi-dimensional disease characteristics as claimed in claim 6, wherein: in step 202, the Gini coefficient before segmentation, Gini(D), is calculated as:

$$\mathrm{Gini}(D) = 1 - \sum_{i=1}^{m} p_i^{2}$$

The Gini coefficient after segmentation, Gini_A(D), is calculated as:

$$\mathrm{Gini}_A(D) = \frac{|D_1|}{|D|}\,\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\,\mathrm{Gini}(D_2)$$