
CN113658681A - A decision tree-based method for evaluating the abstinence effect of drug rehabilitation personnel

Info

Publication number: CN113658681A
Application number: CN202110864809.9A
Authority: CN
Inventors: 陆宇升; 李家深; 朱晓东; 许金礼; 陶炜; 廖淑珍
Original assignee: Guangxi Youdi Information Technology Co ltd
Current assignee: Guangxi Youdi Information Technology Co ltd
Priority date: 2021-07-29
Filing date: 2021-07-29
Publication date: 2021-11-16

Abstract

The invention provides a decision tree-based method for evaluating the abstinence effect of drug rehabilitation personnel, which belongs to the technical field of machine learning. The method includes the following technical steps: S1: objective function selection; S2: feature selection; S3: training process; S4: evaluation process. The abstinence effect evaluation model obtained by the method can be used to evaluate drug rehabilitation personnel at regular intervals. The evaluation only needs data extracted from the information system database, so the process is simple and low-cost and requires no additional subjective human judgment. The accuracy is high and the output indicators are easy to understand and grasp. The method is also flexible and adaptable: it accommodates the large differences caused by different institutional systems and technical equipment in different regions, and when institutional change or technological progress causes large changes in the data, it adapts quickly by retraining the model.

Description

Decision tree-based method for evaluating the abstinence effect of drug rehabilitation personnel

Technical Field

The invention belongs to the technical field of machine learning, and particularly relates to a decision tree-based method for evaluating the abstinence effect of drug addicts.

Background

Although many methods for evaluating the abstinence effect of persons in compulsory drug rehabilitation have been proposed, in practice they are generally difficult to operate and their evaluations have low reliability and validity. In addition, existing evaluation methods are designed from experience; their parameters cannot be changed quickly and flexibly, so they adapt poorly to environmental changes brought about by new technology, new information systems, and related institutional changes.

Existing drug rehabilitation information systems hold a large amount of data directly related to the abstinence effect, such as scoring and assessment data, examination results, medical examination results, and rehabilitation training data. However, these data lack unified standards and differ greatly between regions, and every institutional change or technological advance can change them substantially. Evaluating the abstinence effect directly from these data by manual analysis is difficult and unintuitive, and the accuracy of the result depends heavily on the experience of the evaluator.

Disclosure of Invention

In view of the above problems, the invention provides a decision tree-based method for evaluating the abstinence effect of drug rehabilitation personnel. Through objective function selection, feature selection, a training process, and an evaluation process, the method obtains an abstinence effect evaluation model that can be used to evaluate drug rehabilitation personnel at regular intervals. The evaluation only needs data extracted from the information system database, so the process is simple and low-cost and requires no additional subjective human judgment. The accuracy is high and the output indicators are easy to understand and grasp. The method is flexible and adaptable: it accommodates the large differences caused by different institutional systems and technical equipment in different regions, and when institutional change or technological progress causes large changes in the data, it adapts quickly by retraining the model.

The invention is realized by the following technical scheme:

A decision tree-based method for evaluating the abstinence effect of drug rehabilitation personnel comprises the following steps:

S1: Objective function selection: select one dimension YD from the multidimensional data of the drug rehabilitation personnel as the objective function;

S2: Feature selection: select a set of features FD from the multidimensional data of the drug rehabilitation personnel;

S3: Training process: establish a training data set TrainSet from the objective function YD and the features FD, train a decision tree regression model DTM, calculate the parameters LNSTD and LNMEAN of each leaf node in the model DTM, and save the decision tree regression model DTM, the overall mean GMEAN, the overall standard deviation GSTD, the sample standard deviation LNSTD, and the sample mean LNMEAN, which completes the training process;

S4: Evaluation process: load the decision tree regression model DTM, the overall mean GMEAN, the overall standard deviation GSTD, the sample standard deviation LNSTD, and the sample mean LNMEAN saved in the training process; use the decision tree regression algorithm with the model DTM to predict the objective function YD value of the evaluated person and obtain the leaf node of DTM that is hit, from which LSS is calculated; calculate GSS from the objective function YD value of the evaluated person, the overall mean GMEAN, and the overall standard deviation GSTD; and output LSS and GSS as the evaluation result.

Further, in step S3, the decision tree regression model DTM is trained as follows: samples whose month equals mi are extracted from the data set TrainSet and placed in the subset ModelTrainSet, which is used to train the model; that is, only the data for month mi are used to train the decision tree regression model DTM. mi is either 12 or the median value of month.
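As a minimal sketch of this selection step, the following Python fragment filters a synthetic TrainSet by month. The tuple layout `(month, label, features)`, the function name `select_model_train_set`, and the reading of "median value of month" as the lower median of the distinct month values are assumptions made for illustration; the source does not specify them.

```python
from statistics import median_low


def select_model_train_set(train_set, mi=None):
    """Return (mi, ModelTrainSet): the samples of TrainSet whose month equals mi.

    train_set is a list of (month, label, features) tuples.
    """
    if mi is None:
        # Assumption: "median of month" is taken over the distinct month values;
        # median_low always returns a month that actually occurs in the data.
        mi = median_low({month for month, _, _ in train_set})
    model_train_set = [s for s in train_set if s[0] == mi]
    return mi, model_train_set


# Synthetic TrainSet: 24 months, three people per month (illustrative values only).
train_set = [(m, 50.0 + m + i, (30 + i, i % 2)) for m in range(1, 25) for i in range(3)]
mi, model_train_set = select_model_train_set(train_set, mi=12)
```

Training on a single month keeps the tree from splitting on elapsed time, so the leaves group people by their static features only.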

Further, in step S3, while the decision tree regression model DTM is trained on the subset ModelTrainSet, the minimum number of samples per leaf node is kept greater than MNS, where 10 ≤ MNS < the total number of samples in the subset ModelTrainSet or the total number of leaf nodes.
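The leaf-size constraint can be illustrated with a toy regression tree written in plain Python. This is a sketch under stated assumptions, not the training procedure of the invention: the source does not name a splitting criterion, so a CART-style squared-error split is assumed here, and the names `build_tree`, `leaves`, and the synthetic data are hypothetical.

```python
from statistics import fmean


def sse(values):
    """Sum of squared errors of the values around their mean."""
    mu = fmean(values)
    return sum((v - mu) ** 2 for v in values)


def build_tree(rows, min_leaf):
    """Grow a regression tree on rows of (features, label) with numeric features.

    A split is accepted only if both children keep at least min_leaf samples.
    """
    labels = [y for _, y in rows]
    best = None  # (error after split, feature index, threshold)
    for f in range(len(rows[0][0])):
        # Candidate thresholds: every distinct value except the largest.
        for t in sorted({x[f] for x, _ in rows})[:-1]:
            left = [y for x, y in rows if x[f] <= t]
            right = [y for x, y in rows if x[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, f, t)
    # Stop when no admissible split exists or no split reduces the error.
    if best is None or best[0] >= sse(labels):
        return {"leaf": True, "value": fmean(labels), "n": len(rows)}
    _, f, t = best
    return {
        "leaf": False, "feature": f, "threshold": t,
        "left": build_tree([r for r in rows if r[0][f] <= t], min_leaf),
        "right": build_tree([r for r in rows if r[0][f] > t], min_leaf),
    }


def leaves(node):
    """Collect the leaf nodes from left to right (the array lnodes)."""
    if node["leaf"]:
        return [node]
    return leaves(node["left"]) + leaves(node["right"])


MNS = 10
# Synthetic ModelTrainSet: features are (age, gender code); labels are made up.
rows = [((age, g), 50.0 + (10.0 if age >= 40 else 0.0) + g)
        for age in range(20, 60) for g in (0, 1)]
tree = build_tree(rows, min_leaf=MNS + 1)  # "greater than MNS" means at least MNS + 1
lnodes = leaves(tree)
```

Keeping every leaf above MNS samples means the per-leaf mean and standard deviation computed later rest on a reasonable number of people.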

Further, in step S3, the parameters LNSTD and LNMEAN of each leaf node in the decision tree regression model DTM are calculated as follows: all leaf nodes of the trained model DTM are placed in a single leaf node array lnodes; the number of leaf nodes is lnsize, which equals the length of lnodes; and the standard deviation array LNSTD and the mean array LNMEAN are calculated over all samples of the data set TrainSet that hit each leaf node. Both LNSTD and LNMEAN are two-dimensional arrays: the first dimension represents the month and has length 36, and the second dimension represents the node and has length lnsize. The value of LNSTD[m][i] is the standard deviation of the label of the month-m samples that hit the i-th leaf node, and the value of LNMEAN[m][i] is the mean of the label of the month-m samples that hit the i-th leaf node.

Further, in step S3, the specific calculation method of the sample standard deviation LNSTD and the sample mean LNMEAN is as follows:

S301: Establish a set array TSS. TSS is a two-dimensional array: the first dimension represents the month and has length 36, and the second dimension represents the node and has length lnsize, the number of leaf nodes. All elements of TSS are initialized to the empty set;

S302: Enumerate each sample x in the data set TrainSet. Use the decision tree regression algorithm to predict the value py for x.features, ignore the predicted value py, take the subscript lni, in the leaf node array lnodes, of the decision tree leaf node hit during prediction, and add the sample x to the subset TSS[x.month][lni];

S303: Enumerate each element TSS[m][j] of the set array TSS, where TSS[m][j] is a subset of samples. Calculate the mean and standard deviation of the label of the elements of the subset, and store the standard deviation in LNSTD[m][j] and the mean in LNMEAN[m][j];

S304: Establish a one-dimensional set array GTSS of length 36 with all elements initialized to the empty set. Enumerate each sample x in the data set TrainSet and add x to the subset GTSS[x.month];

S305: Enumerate each element GTSS[m] of the one-dimensional set array GTSS, where GTSS[m] is a subset of samples. Calculate the mean and standard deviation of the label of all samples in the subset and store them in GMEAN[m] and GSTD[m]. GMEAN and GSTD are one-dimensional arrays representing the overall mean and overall standard deviation, and the subscript m represents the month.
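Steps S301 to S305 amount to grouping labels by month and by leaf and then summarizing each group. The sketch below shows one way to do this in Python. Several details are assumptions not stated in the source: months are treated as 0-based indices 0..35, the population standard deviation (`statistics.pstdev`) is used, empty subsets are recorded as `None`, and `leaf_index` is a hypothetical stand-in for "predict with DTM and return the subscript of the hit leaf".

```python
from statistics import fmean, pstdev

MONTHS = 36  # length of the month dimension


def leaf_statistics(train_set, leaf_index, lnsize):
    """Compute LNMEAN, LNSTD, GMEAN and GSTD from (month, label, features) samples."""
    tss = [[[] for _ in range(lnsize)] for _ in range(MONTHS)]  # S301
    gtss = [[] for _ in range(MONTHS)]                           # S304
    for month, label, features in train_set:                     # S302 and S304
        tss[month][leaf_index(features)].append(label)
        gtss[month].append(label)
    # S303: per-month, per-leaf statistics (None where no sample hit the cell).
    lnmean = [[fmean(c) if c else None for c in row] for row in tss]
    lnstd = [[pstdev(c) if c else None for c in row] for row in tss]
    # S305: per-month statistics over all samples.
    gmean = [fmean(c) if c else None for c in gtss]
    gstd = [pstdev(c) if c else None for c in gtss]
    return lnmean, lnstd, gmean, gstd


def leaf_index(features):
    """Hypothetical two-leaf tree that splits on age (features[0])."""
    return 0 if features[0] < 40 else 1


train_set = [(0, 10.0, (30,)), (0, 14.0, (35,)), (0, 20.0, (50,)), (1, 8.0, (30,))]
lnmean, lnstd, gmean, gstd = leaf_statistics(train_set, leaf_index, lnsize=2)
```

Although the tree itself is trained on one month only, every TrainSet sample from every month is routed through it here, which is what gives each leaf a month-by-month reference distribution.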

Further, in step S3, the data set TrainSet is a set of samples, each corresponding to the data of one person in the multidimensional drug rehabilitation data. Each sample has three columns: month, label, and features. The value of the objective function YD is used as label; the data of the selected features FD are extracted from the multidimensional drug rehabilitation data to construct the feature vector features; and the time in drug rehabilitation, in months, is extracted from the multidimensional drug rehabilitation data as month.
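The three-column sample layout can be sketched as a small Python data class. The record keys used below (`months_in_rehabilitation`, `cumulative_points`, and so on) are hypothetical; the source does not describe the database schema.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Sample:
    month: int        # time in drug rehabilitation, in months
    label: float      # value of the objective function YD
    features: Tuple   # feature vector built from the selected features FD


def make_sample(record, yd_key, fd_keys, month_key="months_in_rehabilitation"):
    """Build one TrainSet sample from a raw record (a dict with hypothetical keys)."""
    return Sample(
        month=int(record[month_key]),
        label=float(record[yd_key]),
        features=tuple(record[k] for k in fd_keys),
    )


record = {
    "months_in_rehabilitation": 12,
    "cumulative_points": 86.5,
    "gender": "M",
    "age": 34,
    "drug_type": "opioid",
    "education": "secondary",
}
sample = make_sample(record, "cumulative_points",
                     ["gender", "age", "drug_type", "education"])
```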

Further, in step S4, the specific evaluation process is:

S401: Load from the storage medium the decision tree regression model DTM, the overall mean GMEAN, the overall standard deviation GSTD, the sample standard deviation LNSTD, and the sample mean LNMEAN obtained in the training process;

S402: Extract the feature vector features of the evaluated person in the same way as the features column of the TrainSet samples; use the decision tree regression algorithm with the model DTM to predict the objective function YD value for features; ignore the predicted value and take the subscript lni of the leaf node of DTM hit by features; calculate the evaluated person's time in drug rehabilitation, month (denoted m); and calculate the parameter LSS = (YD - LNMEAN[m][lni]) / LNSTD[m][lni];

S403: Calculate GSS = (YD - GMEAN[m]) / GSTD[m];

S404: Output the evaluation results LSS and GSS, together with the trends of the LSS and GSS indicators over time, as an intuitive explanation of the evaluated person's abstinence effect indicator YD;

GSS > 0 indicates that the abstinence effect of the evaluated person is better than the overall average level, and GSS < 0 indicates that it is worse than the overall average level;

LSS > 0 indicates that the current abstinence effect of the evaluated person is better than the average of similar drug rehabilitation personnel;

-1 < LSS < 1 indicates that the evaluated person deviates from the mean of similar drug rehabilitation personnel by less than 1 standard deviation, and the abstinence effect is labeled "normal";

LSS < -1 indicates that the abstinence effect of the evaluated person is below the average of similar drug rehabilitation personnel by more than 1 standard deviation, and the abstinence effect is labeled "poor";

LSS > 1 indicates that the abstinence effect of the evaluated person is above the average of similar drug rehabilitation personnel by more than 1 standard deviation, and the abstinence effect is labeled "excellent";

When the evaluation results of LSS and GSS differ, the LSS result prevails.
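The scoring in S402 to S404 is a pair of standard scores followed by a threshold rule, which the following Python sketch illustrates. The handling of two cases is an assumption, since the source leaves them open: a score of exactly 1 or -1 is labeled "normal", and a missing or zero standard deviation yields `None` with the label "undetermined". The function names and the numbers in the example are hypothetical.

```python
def standard_score(value, mean, std):
    """Return (value - mean) / std, or None when the score is undefined."""
    if mean is None or not std:
        return None
    return (value - mean) / std


def label_from_lss(lss):
    """Map LSS to the labels used in the evaluation rules."""
    if lss is None:
        return "undetermined"   # assumption: not defined in the source
    if lss > 1:
        return "excellent"
    if lss < -1:
        return "poor"
    return "normal"             # assumption: includes LSS equal to 1 or -1


def evaluate(yd, m, lni, lnmean, lnstd, gmean, gstd):
    """Compute LSS and GSS for one person; the label follows LSS, which prevails."""
    lss = standard_score(yd, lnmean[m][lni], lnstd[m][lni])
    gss = standard_score(yd, gmean[m], gstd[m])
    return {"LSS": lss, "GSS": gss, "label": label_from_lss(lss)}


# One month, two leaves (made-up statistics).
lnmean, lnstd = [[60.0, 80.0]], [[5.0, 10.0]]
gmean, gstd = [75.0], [10.0]
result = evaluate(yd=70.0, m=0, lni=0, lnmean=lnmean, lnstd=lnstd,
                  gmean=gmean, gstd=gstd)
```

In this example the person scores below the overall mean (GSS = -0.5) but two standard deviations above the mean of similar people (LSS = 2.0), so the label is "excellent". This is the kind of disagreement in which the LSS result prevails.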

Further, in step S1, the objective function YD is any one of cumulative reward-and-penalty points, monthly reward-and-penalty points, examination scores, medical examination results, and rehabilitation training scores.

Further, in step S2, the feature FD is any one of gender, age, drug type, and education level.

Compared with the prior art, the invention has the advantages and beneficial effects that:

1. The method overcomes the shortcomings of existing methods for evaluating the abstinence effect of persons in compulsory drug rehabilitation. It uses the data in the drug rehabilitation information system that are directly related to the abstinence effect, automatically extracts them from the database to construct a training set, and uses a decision tree regression algorithm to train an abstinence effect evaluation model on historical drug rehabilitation data. The resulting model can be used to evaluate drug rehabilitation personnel at regular intervals. The evaluation only needs data extracted from the information system database and requires no additional subjective human judgment, so the method is simple and easy to operate, its accuracy is high, and its output indicators are easy to understand and grasp.

2. The method of the invention establishes a withdrawal effect evaluation model completely based on data, eliminates human subjective factors, realizes that the model is updated at any time by constructing a data set and retraining, can quickly and flexibly adapt to the change of environment, and can also adapt to the huge difference of the technical and institutional environments of different regions.

3. The evaluation process is simple, low-cost, and easy to operate, and the evaluation results are easy to understand. The LSS indicator uses the mean and standard deviation of similar drug rehabilitation personnel as the reference, taking into account differences such as the gender and education level of the evaluated person, so the evaluation result is more reasonable. The method is flexible and accommodates the large differences caused by different institutional systems and technical equipment in each region: although the raw data differ greatly, the evaluation results LSS and GSS have the same value range and similar numerical meaning, which makes the method easy to popularize. The method is also adaptable: when institutional change or technological progress causes large changes in the data, it adapts quickly by retraining the model.

Drawings

FIG. 1 is a flowchart of a training process in embodiment 1 of the present invention.

FIG. 2 is a flowchart of an evaluation process in embodiment 1 of the present invention.

Detailed Description

The present invention is further illustrated by the following examples, which are provided only for illustrating the present invention and are not intended to limit the scope of the present invention.

Example 1

S1: Objective function selection: select one dimension YD from the multidimensional data of the drug rehabilitation personnel as the objective function. YD is a continuous real-valued quantitative indicator directly related to the abstinence effect, chosen as any one of cumulative reward-and-penalty points, monthly reward-and-penalty points, examination scores, medical examination results, and rehabilitation training scores;

S2: Feature selection: select a set of features FD from the multidimensional data of the drug rehabilitation personnel. Static attributes of the personnel are selected as features, that is, attributes whose values do not change during the whole rehabilitation process, chosen as any one of gender, age, drug type, and education level;

S3: Training process: establish a training data set TrainSet from the objective function YD and the features FD, train a decision tree regression model DTM, calculate the parameters LNSTD and LNMEAN of each leaf node in the model DTM, and save the decision tree regression model DTM, the overall mean GMEAN, the overall standard deviation GSTD, the sample standard deviation LNSTD, and the sample mean LNMEAN, which completes the training process, as shown in the training process flowchart of FIG. 1;

The data set TrainSet is a set of samples, each corresponding to the data of one person in the multidimensional drug rehabilitation data. Each sample has three columns: month, label, and features. The value of the objective function YD is used as label; the data of the selected features FD are extracted from the multidimensional drug rehabilitation data to construct the feature vector features; and the time in drug rehabilitation, in months, is extracted from the multidimensional drug rehabilitation data as month;

The decision tree regression model DTM is trained as follows: samples whose month equals mi are extracted from the data set TrainSet and placed in the subset ModelTrainSet, which is used to train the model; that is, only the data for month mi are used to train the decision tree regression model DTM, where mi is the median value of month or mi is 12. While the model is trained on the subset ModelTrainSet, the minimum number of samples per leaf node is kept greater than MNS, where 10 ≤ MNS < the total number of samples in the subset ModelTrainSet or the total number of leaf nodes;

The parameters LNSTD and LNMEAN of each leaf node in the decision tree regression model DTM are calculated as follows: all leaf nodes of the trained model DTM are placed in a single leaf node array lnodes; the number of leaf nodes is lnsize, which equals the length of lnodes; and the standard deviation array LNSTD and the mean array LNMEAN are calculated over all samples of the data set TrainSet that hit each leaf node. Both LNSTD and LNMEAN are two-dimensional arrays: the first dimension represents the month and has length 36, and the second dimension represents the node and has length lnsize. The value of LNSTD[m][i] is the standard deviation of the label of the month-m samples that hit the i-th leaf node, and the value of LNMEAN[m][i] is the mean of the label of the month-m samples that hit the i-th leaf node;

the specific calculation method of the sample standard deviation LNSTD and the sample mean LNMEAN comprises the following steps:

S305: Enumerate each element GTSS[m] of the one-dimensional set array GTSS, where GTSS[m] is a subset of samples. Calculate the mean and standard deviation of the label of all samples in the subset and store them in GMEAN[m] and GSTD[m]. GMEAN and GSTD are one-dimensional arrays representing the overall mean and overall standard deviation, and the subscript m represents the month;

S4: Evaluation process: load the decision tree regression model DTM, the overall mean GMEAN, the overall standard deviation GSTD, the sample standard deviation LNSTD, and the sample mean LNMEAN saved in the training process; use the decision tree regression algorithm with the model DTM to predict the objective function YD value of the evaluated person and obtain the leaf node of DTM that is hit, from which LSS is calculated; calculate GSS from the objective function YD value of the evaluated person, the overall mean GMEAN, and the overall standard deviation GSTD; and output LSS and GSS as the evaluation result. The evaluation process flowchart is shown in FIG. 2;

the specific evaluation process is as follows:

S403: Calculate GSS = (YD - GMEAN[m]) / GSTD[m];

Example 2

The method of Example 1 was tested at a drug rehabilitation administration bureau. Data in 762 dimensions, including basic information, SCL90 scale test results, and score assessments, were extracted from the drug rehabilitation law enforcement platform database for 13126 drug rehabilitation personnel who had been discharged since 2016-09-01. After data cleaning and removal of erroneous and low-quality data, the training data set TrainSet was constructed. With the cumulative reward-and-penalty points selected as YD and mi = 12, an abstinence effect evaluation model was trained and then used to evaluate the abstinence effect of 6971 drug rehabilitation personnel currently on the register.

In total, 64836 evaluation results were obtained (one per month for each person), of which 92.7% showed consistent LSS and GSS evaluations.

Of the results, 7.3% (involving 528 people) had scores below the overall mean but LSS > 1, i.e. an abstinence effect of "excellent". To verify the accuracy of these results, 20 of them were randomly selected and manually evaluated by experts: 18 were rated excellent and 2 normal, i.e. the LSS evaluation accuracy for the data in this interval was 90%.

Therefore, by considering and analyzing LSS and GSS in combination, the invention performs a comprehensive evaluation. This not only improves evaluation efficiency and confirms the accuracy of most of the data, but also gives higher accuracy, so that about 470 drug rehabilitation personnel whose scores are not outstanding but who actually perform well receive a fairer evaluation.
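The figures quoted in this example can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers reported above; reading "about 470" as 90% of the 528 people involved is an assumption about how that figure was obtained.

```python
total_results = 64836       # monthly evaluation results
discordant_share = 0.073    # share with a below-average score but LSS > 1
people_involved = 528       # people behind those results
sampled, confirmed_excellent = 20, 18   # expert review of a random sample

discordant_results = round(total_results * discordant_share)
accuracy = confirmed_excellent / sampled
# Assumption: the expert-confirmed rate is extrapolated to everyone involved.
estimated_people = round(people_involved * accuracy)

print(discordant_results, accuracy, estimated_people)
```

The extrapolated head count comes out slightly above the rounded "about 470" stated in the text, which is consistent with that reading.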

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and the like that are within the spirit and principle of the present invention are included in the present invention.

Claims

1. A decision tree-based method for evaluating the abstinence effect of drug rehabilitation personnel, characterized in that it comprises the following steps:

S1: Objective function selection: select one dimension YD from the multidimensional data of the drug rehabilitation personnel as the objective function;

S2: Feature selection: select a set of features FD from the multidimensional data of the drug rehabilitation personnel;

S3: Training process: establish a training data set TrainSet from the objective function YD and the features FD, train a decision tree regression model DTM, calculate the parameters LNSTD and LNMEAN of each leaf node in the model DTM, and save the decision tree regression model DTM, the overall mean GMEAN, the overall standard deviation GSTD, the sample standard deviation LNSTD, and the sample mean LNMEAN, which completes the training process;

S4: Evaluation process: load the decision tree regression model DTM, the overall mean GMEAN, the overall standard deviation GSTD, the sample standard deviation LNSTD, and the sample mean LNMEAN saved in the training process; use the decision tree regression algorithm with the model DTM to predict the objective function YD value of the evaluated person and obtain the leaf node of DTM that is hit, from which LSS is calculated; calculate GSS from the objective function YD value of the evaluated person, the overall mean GMEAN, and the overall standard deviation GSTD; and output LSS and GSS as the evaluation result.

2. The decision tree-based method for evaluating the abstinence effect of drug rehabilitation personnel according to claim 1, characterized in that, in step S3, the decision tree regression model DTM is trained by extracting from the data set TrainSet the samples whose month equals mi and placing them in the subset ModelTrainSet, which is used to train the decision tree regression model DTM, that is, the data for month mi are extracted to train the decision tree regression model DTM, where mi is the median value of month or mi = 12.

3. The decision tree-based method for evaluating the abstinence effect of drug rehabilitation personnel according to claim 2, characterized in that, in step S3, while the decision tree regression model DTM is trained on the subset ModelTrainSet, the minimum number of samples per leaf node is kept greater than MNS, where 10 ≤ MNS < the total number of samples in the subset ModelTrainSet or the total number of leaf nodes.

4. The decision tree-based method for evaluating the abstinence effect of drug rehabilitation personnel according to claim 1, characterized in that, in step S3, the parameters LNSTD and LNMEAN of each leaf node in the decision tree regression model DTM are calculated as follows: all leaf nodes of the trained decision tree regression model DTM are placed in a single leaf node array lnodes; the number of leaf nodes is lnsize, which equals the length of lnodes; the standard deviation array LNSTD and the mean array LNMEAN are calculated over all samples of the data set TrainSet that hit each leaf node; the sample standard deviation LNSTD and the sample mean LNMEAN are both two-dimensional arrays, the first dimension representing the month with length 36 and the second dimension representing the node with length lnsize; the value of LNSTD[m][i] is the standard deviation of the label of the month-m samples that hit the i-th leaf node, and the value of LNMEAN[m][i] is the mean of the label of the month-m samples that hit the i-th leaf node.

5. The decision tree-based method for evaluating the abstinence effect of drug rehabilitation personnel according to claim 4, characterized in that, in step S3, the sample standard deviation LNSTD and the sample mean LNMEAN are calculated as follows:

S301: Establish a set array TSS, where TSS is a two-dimensional array, the first dimension represents the month and has length 36, the second dimension represents the node and has length lnsize, the number of leaf nodes, and all elements of TSS are initialized to the empty set;

S302: Enumerate each sample x in the data set TrainSet, use the decision tree regression algorithm to predict the value py for x.features, ignore the predicted value py, take the subscript lni, in the leaf node array lnodes, of the decision tree leaf node hit during prediction, and add the sample x to the subset TSS[x.month][lni];

S303: Enumerate each element TSS[m][j] of the set array TSS, where TSS[m][j] is a subset of samples, calculate the mean and standard deviation of the label of the elements of the subset, and store the standard deviation in LNSTD[m][j] and the mean in LNMEAN[m][j];

S304: Establish a one-dimensional set array GTSS of length 36 with all elements initialized to the empty set, enumerate each sample x in the data set TrainSet, and add x to the subset GTSS[x.month];

S305: Enumerate each element GTSS[m] of the one-dimensional set array GTSS, where GTSS[m] is a subset of samples, calculate the mean and standard deviation of the label of all samples in the subset, and store them in GMEAN[m] and GSTD[m], where GMEAN and GSTD are one-dimensional arrays representing the overall mean and overall standard deviation, and the subscript m represents the month.

6. The decision tree-based method for evaluating the abstinence effect of drug rehabilitation personnel according to claim 1, characterized in that, in step S3, the data set TrainSet is a set of samples, each corresponding to the data of one person in the multidimensional drug rehabilitation data; each sample has three columns: month, label, and features; the value of the objective function YD is used as label; the data of the selected features FD are extracted from the multidimensional drug rehabilitation data to construct the feature vector features; and the time in drug rehabilitation, in months, is extracted from the multidimensional drug rehabilitation data as month.

7. The decision tree-based method for evaluating the abstinence effect of drug rehabilitation personnel according to claim 1, characterized in that, in step S4, the specific evaluation process is:

S401: Load from the storage medium the decision tree regression model DTM, the overall mean GMEAN, the overall standard deviation GSTD, the sample standard deviation LNSTD, and the sample mean LNMEAN obtained in the training process;

S402: Extract the feature vector features of the evaluated person in the same way as the features column of the TrainSet samples; use the decision tree regression algorithm with the model DTM to predict the objective function YD value for features; ignore the predicted value and take the subscript lni of the leaf node of DTM hit by features; calculate the evaluated person's time in drug rehabilitation, month (denoted m); and calculate the parameter LSS = (YD - LNMEAN[m][lni]) / LNSTD[m][lni];

S403: Calculate GSS = (YD - GMEAN[m]) / GSTD[m];

S404: Output the evaluation results LSS and GSS, together with the trends of the LSS and GSS indicators over time, as an intuitive explanation of the evaluated person's abstinence effect indicator YD;

GSS > 0 indicates that the abstinence effect of the evaluated person is better than the overall average level, and GSS < 0 indicates that it is worse than the overall average level;

LSS > 0 indicates that the current abstinence effect of the evaluated person is better than the average of similar drug rehabilitation personnel;

-1 < LSS < 1 indicates that the evaluated person deviates from the mean of similar drug rehabilitation personnel by less than 1 standard deviation, and the abstinence effect is labeled "normal";

LSS < -1 indicates that the abstinence effect of the evaluated person is below the average of similar drug rehabilitation personnel by more than 1 standard deviation, and the abstinence effect is labeled "poor";

LSS > 1 indicates that the abstinence effect of the evaluated person is above the average of similar drug rehabilitation personnel by more than 1 standard deviation, and the abstinence effect is labeled "excellent";

When the evaluation results of LSS and GSS differ, the LSS result prevails.

8. The decision tree-based method for evaluating the abstinence effect of drug rehabilitation personnel according to claim 1, characterized in that, in step S1, the objective function YD is any one of cumulative reward-and-penalty points, monthly reward-and-penalty points, examination scores, medical examination results, and rehabilitation training scores.

9. The method for assessing the effect of drug rehabilitation for drug addicts based on a decision tree according to claim 1, characterized in that, in step S2, the feature FD is any one of gender, age, drug use type and educational level.