CN114649075A - Depression rating system and method based on machine learning - Google Patents


Publication number
CN114649075A
Authority
CN
China
Prior art keywords
depression
scale
data
model
machine learning
Prior art date
Legal status
Pending
Application number
CN202210334156.8A
Other languages
Chinese (zh)
Inventor
李晓虹
孙源鸿
刘启健
李毅
李康
Current Assignee
Beijing Anding Hospital
Original Assignee
Beijing Anding Hospital
Priority date
Filing date
Publication date
Application filed by Beijing Anding Hospital
Priority claimed from CN202210334156.8A
Publication of CN114649075A
Legal status: Pending

Classifications

    • G16H 10/60 — ICT specially adapted for the handling or processing of patient-specific data, e.g. for electronic patient records
    • G06F 18/214 — Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/2411 — Classification based on the proximity to a decision surface, e.g. support vector machines
    • G06F 18/24155 — Bayesian classification
    • G06F 18/24323 — Tree-organised classifiers
    • G06N 20/00 — Machine learning
    • G06N 3/04 — Neural networks; architecture, e.g. interconnection topology
    • G06N 3/088 — Non-supervised learning, e.g. competitive learning
    • G16H 20/70 — ICT specially adapted for therapies or health-improving plans relating to mental therapies, e.g. psychological therapy or autogenous training
    • G16H 50/30 — ICT specially adapted for medical diagnosis or data mining, for calculating health indices or individual health risk assessment


Abstract

The invention relates to a depression rating system and method based on machine learning, and belongs to the technical field of machine learning. The system comprises a data acquisition module, a data preprocessing module, a feature selection module, a machine learning module and an execution module. The data acquisition module obtains data from a globally accessible online survey, including answers to all items of a depression scale, demographic variables of the respondent, and the time spent filling out the questionnaire. The data preprocessing module screens the age and gender data, divides respondents into two grades, high depression or low depression, according to their scores, organizes the features of the raw data, converting all categorical columns into one-hot codes representing each class, then separates features from labels and divides the data set into a training set, a test set and a holdout set. The invention enables the development of a simpler, more effective method for depression risk assessment by shortening existing scales.

Description

Depression rating system and method based on machine learning
Technical Field
The invention belongs to the technical field of machine learning, and relates to a depression rating system and method based on machine learning.
Background
Features of depression include "depressed mood (e.g., sadness, irritability, emptiness) or loss of pleasure, accompanied by other cognitive, behavioral, or autonomic symptoms that significantly affect individual functioning" (ICD-11, 2021, code 6A71.1), possibly resulting in decreased performance at work or in learning, and in severe cases even death. According to World Health Organization (WHO) data, about 800,000 deaths per year are attributed to suicide associated with depression, which is the second leading cause of death among teenagers and young adults (WHO, 2020). Clinical psychologists have developed a variety of depression scales for assessing and monitoring depression (Beck et al., 1961; Lovibond & Lovibond, 1995), including the Beck Depression Inventory (BDI, 21 items), the Depression Anxiety Stress Scales (DASS, 21 or 42 items), the Hamilton Depression Rating Scale (HDRS, 17 items), and the Minnesota Multiphasic Personality Inventory (MMPI, >200 items). However, no depression assessment tool is currently available that is convenient and effective, suitable for patient self-assessment, and usable frequently.
Existing problems:
Traditional depression scales have high reliability and efficiency and can assess an individual's existing symptoms and their severity, but suffer from the following problems:
1. these scales are usually very long (>20 items) and are inconvenient for frequent use;
2. excessively lengthy questionnaires can produce memory effects and fatigue effects during repeated measurements, reducing the accuracy of the results obtained;
3. many scales require administration by professional clinical psychologists or psychiatrists and are not suitable for individual self-assessment;
4. individuals have low willingness to fill out the existing scales and poor compliance;
5. even for self-rating scales that do not require professional administration, individuals still have difficulty calculating and reasonably interpreting the results;
6. the existing scales are not conducive to real-time evaluation and monitoring of depressive symptoms, nor to reminding individuals with high-risk assessment results to seek help promptly.
Disclosure of Invention
In view of the above, the present invention provides a depression rating system and method based on machine learning, in which the generated shortened scale enables individuals to promptly and accurately assess their own depression level, improves patients' awareness of their disease symptoms and compliance with treatment, and assesses public mental health conditions more quickly and effectively.
In order to achieve the purpose, the invention provides the following technical scheme:
a depression rating system based on machine learning comprises a data acquisition module, a data preprocessing module, a feature selection module, a machine learning module and an execution module;
the data acquisition module obtains data from a globally accessible online survey, including answers to all items of the DASS-42 questionnaire, demographic variables of the respondent, and the time spent filling out the questionnaire;
the data preprocessing module screens out entries whose age or gender data do not meet requirements, then divides respondents into two grades, high depression or low depression, according to their scores; it organizes the features of the data screened by the data acquisition module, converting all categorical columns into one-hot codes representing each class; after checking the number of rows per label, the data set is balanced by upsampling; features and labels are separated, and the data set is divided into a training set, a test set and a holdout set;
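The preprocessing steps above can be sketched in Python with pandas and Scikit-learn; the column names and the tiny example frame below are hypothetical illustrations, not data from the patent:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

def preprocess(df: pd.DataFrame):
    """One-hot encode, balance by upsampling, and split into train/test/holdout."""
    # Convert every categorical (object-dtype) column into one-hot codes.
    df = pd.get_dummies(df, columns=list(df.select_dtypes(include="object").columns))

    # Check the number of rows per label, then upsample the minority class.
    counts = df["label"].value_counts()
    minority = df[df["label"] == counts.idxmin()]
    majority = df[df["label"] == counts.idxmax()]
    minority_up = resample(minority, replace=True,
                           n_samples=len(majority), random_state=0)
    df = pd.concat([majority, minority_up])

    # Separate features and labels; split into training, test and holdout sets.
    X, y = df.drop(columns="label"), df["label"]
    X_trn, X_hold, y_trn, y_hold = train_test_split(
        X, y, test_size=0.2, random_state=0, stratify=y)
    X_train, X_test, y_train, y_test = train_test_split(
        X_trn, y_trn, test_size=0.25, random_state=0, stratify=y_trn)
    return (X_train, y_train), (X_test, y_test), (X_hold, y_hold)

# Hypothetical toy frame: one categorical column, one item score, a binary label.
df = pd.DataFrame({"gender": ["f", "m"] * 10,
                   "q1": range(20),
                   "label": [1] * 14 + [0] * 6})
train, test, hold = preprocess(df)
```

The split proportions are placeholders; the patent does not specify the exact ratios of the three sets.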
the feature selection module applies the Max-Relevance and Min-Redundancy (MRMR) method to the preprocessed data: it selects the most relevant features according to the pairwise correlation or mutual information of each pair of variables in the data set while minimizing the redundancy among variables, thereby judging which items in the examined scale carry greater weight for predicting depression risk;
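One common greedy approximation of MRMR (not necessarily the exact variant used by the inventors) scores each candidate item as its mutual information with the label minus its mean absolute correlation with the items already selected; a sketch under those assumptions, on placeholder data standing in for the DASS item matrix:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

# Synthetic stand-in for the scale-item matrix and binary depression labels.
X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=4, random_state=0)

relevance = mutual_info_classif(X, y, random_state=0)  # relevance to the label
corr = np.abs(np.corrcoef(X, rowvar=False))            # pairwise redundancy proxy

selected = [int(np.argmax(relevance))]                 # start from the most relevant item
while len(selected) < 5:
    candidates = [j for j in range(X.shape[1]) if j not in selected]
    # Max-relevance minus mean redundancy with the already-selected items.
    scores = [relevance[j] - corr[j, selected].mean() for j in candidates]
    selected.append(candidates[int(np.argmax(scores))])
```

Using absolute correlation as the redundancy term is one of several MRMR criteria; mutual information between feature pairs is an equally valid choice.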
the machine learning module predicts low and high depression levels from the different combinations of items determined in feature selection; model performance is evaluated for models based on increasing numbers of questions, in order to determine how many questions are needed to achieve an area under the receiver operating characteristic curve (AUC ROC) of 90% on the holdout data set; machine learning training is completed only on the training set, with internal validation performed on the test set; after one model is obtained, the training and test sets are recombined and split again in the same manner, and another model is trained and tested; a total of 100 data recombinations, partitions, trainings and tests are performed; hyper-parameters are optimized to obtain the best model: the best-performing model is selected according to the AUC ROC score and F1 score evaluated on the original validation data set, i.e., the model with the highest AUC and F1 values; the optimal hyper-parameters of each model, i.e., the parameters yielding the highest AUC and F1 values, are obtained through grid search, training each iteration with a different combination of hyper-parameter values to determine which combination is optimal; finally, the best model is selected to evaluate the depression grade;
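The 100-round recombine/split/train/test loop and the grid search can be sketched as follows; the logistic-regression model and synthetic data are stand-ins for the actual candidate models and survey data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the pooled training + test data.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

aucs = []
for i in range(100):                           # 100 recombinations and partitions
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=i)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    aucs.append(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
best_iter = int(np.argmax(aucs))               # best-scoring split/model pair

# Grid search over hyper-parameter combinations, scored by AUC ROC.
grid = GridSearchCV(LogisticRegression(max_iter=1000),
                    {"C": [0.1, 1.0, 10.0]}, scoring="roc_auc")
grid.fit(X, y)
```

The hyper-parameter grid shown here is illustrative; the patent only states that each grid-search iteration trains with a different combination of hyper-parameter values.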
and the execution module retains the obtained best model, establishes a website application prototype, and executes the model; the accuracy of the model is checked when it classifies individual data not used for training and testing, i.e., model performance is measured when real users are assessed.
Optionally, the feature selection includes an unsupervised technique MRMR and a supervised feature selection method;
the unsupervised MRMR is as follows: selecting the most relevant features based on the pairwise correlations or mutual information for each pair of variables in the dataset while minimizing redundancy between the variables;
the supervised feature selection method is as follows: an Extra Trees classifier (ETC) is fitted to all features and labels before ranking the most important features; the Extra Trees classifier fits several extremely randomized trees on sub-samples of the data set and averages the results; feature importance is obtained by calculating the normalized total reduction of the split criterion brought by the feature, known as the Gini importance (GI), where the importance of each feature is summed over the splits of all decision trees that use the feature, weighted in proportion to the number of samples each split partitions;
the most important features are obtained by ranking feature importances; finally, the most important items of the original scale, i.e., the items with greater weight for predicting the depression level, are selected by combining the results of the MRMR and Extra Trees classifiers.
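The Extra Trees half of this procedure is available directly in Scikit-learn; a minimal sketch on placeholder data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Placeholder feature matrix standing in for the encoded scale items.
X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=4, random_state=0)

etc = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)

# Gini importances are normalized impurity reductions, summing to 1.
ranking = np.argsort(etc.feature_importances_)[::-1]   # most important item first
top_items = ranking[:3]                                # e.g. the three strongest items
```

In the invention these importances would then be combined with the MRMR ranking to pick the scale items retained in the shortened questionnaire.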
Optionally, the tested methods include logistic regression (LR), Gaussian naive Bayes (GNB), support vector machine (SVM), random forest (RF), multilayer perceptron neural network (MLP), extreme gradient boosting decision tree (XGBoost), and a stacked generalization ensemble.
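The listed candidate models map onto Scikit-learn estimators roughly as below; XGBoost lives in the separate `xgboost` package, so a gradient-boosting stand-in is used here to keep the sketch self-contained:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

base_learners = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("gnb", GaussianNB()),
    ("svm", SVC(probability=True)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("mlp", MLPClassifier(max_iter=500, random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),  # stand-in for XGBoost
]

# Stacked generalization: a logistic-regression meta-learner on top.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(max_iter=1000))

# Synthetic data standing in for the survey features and binary labels.
X, y = make_classification(n_samples=150, n_features=8, random_state=0)
stack.fit(X, y)
```

All hyper-parameters here are Scikit-learn defaults or small illustrative values, consistent with the patent's statement that defaults were retained for fair comparison.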
Optionally, the loss function used in machine learning training is the binary cross-entropy (BCE) loss;
the machine learning training is implemented in Python 3 using the Scikit-learn library, with default hyper-parameters and stopping criteria retained for fair comparison; the relative importance of the items is confirmed again with the SHAP method, checking which items of the depression-related scale contribute most to the overall prediction; the contribution, a value between 0 and 1, is calculated with the Shapley value formula; SHAP is a method of explaining individual predictions, with the goal of explaining the prediction for an instance by computing the contribution of each feature to that prediction; the Shapley value was developed in cooperative game theory, in which each cooperative game assigns a unique distribution, among the players, of the total surplus generated by the coalition of all players;
the Shapley value is formulated as:
Figure BDA0003573955690000031
wherein
Figure BDA0003573955690000032
Is the weight, (v (S @) v (S) is a marginal contribution.
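The formula can be checked with a brute-force implementation over all coalitions; this is a generic exact Shapley computation on a toy two-player game, not the SHAP library's approximation:

```python
from itertools import combinations
from math import factorial

def shapley(players, v):
    """Exact Shapley values: weighted average of marginal contributions."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for r in range(len(others) + 1):
            for S in combinations(others, r):
                # Weight |S|! (n - |S| - 1)! / n! from the formula above.
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                total += w * (v(set(S) | {i}) - v(set(S)))  # marginal contribution
        phi[i] = total
    return phi

# Toy characteristic function: cooperating yields a surplus of 1 over 1 + 2.
value = {frozenset(): 0.0, frozenset({"a"}): 1.0,
         frozenset({"b"}): 2.0, frozenset({"a", "b"}): 4.0}
phi = shapley(["a", "b"], lambda S: value[frozenset(S)])  # {'a': 1.5, 'b': 2.5}
```

The values sum to v({a, b}) = 4, illustrating the efficiency property of the Shapley distribution.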
Optionally, the depression-related scales include the following scales:
Beck Depression Inventory (BDI);
Self-Rating Depression Scale (SDS);
Depression Status Inventory (DSI);
Carroll Rating Scale for Depression (CRS);
Center for Epidemiologic Studies Depression Scale (CES-D);
Depression Adjective Check List (DACL);
Depressive Experiences Questionnaire (DEQ);
Cognitive Bias Questionnaire (CBQ);
Automatic Thoughts Questionnaire (ATQ);
Geriatric Depression Scale (GDS);
Hamilton Depression Rating Scale (HAMD);
Hospital Anxiety and Depression Scale (HAD);
Edinburgh Postnatal Depression Scale (EPDS);
Montgomery-Åsberg Depression Rating Scale (MADRS);
Children's Depression Inventory (CDI);
Bech-Rafaelsen Melancholia Scale (MRS);
Depression Symptom Self-Rating Scale (DSRS);
Simpson Depression Rating Scale;
Kellner Depression Rating Scale;
Depression in Old Age Scale (DIA-S);
and scales that include a depression dimension:
Cornell Medical Index (CMI);
Symptom Checklist 90 (SCL-90);
Minnesota Multiphasic Personality Inventory (MMPI);
Depression Anxiety Stress Scales (DASS);
Quality of Life Scale (QOL);
Irritability, Depression and Anxiety Scale (IDAS);
Profile of Mood States (POMS);
BFS Mood Scale (BFS).
A method for machine learning-based depression rating, the method comprising the steps of:
S1: data collection; data is obtained from a globally accessible online survey, including answers to all items of the DASS-42 questionnaire, demographic variables of the respondent, and the time spent filling out the questionnaire;
S2: data preprocessing; entries whose age or gender data do not meet requirements are screened out, and respondents are divided into two grades, high depression or low depression, according to their scores; the features of the data obtained by the data acquisition module are organized, converting all categorical columns into one-hot codes representing each class; after checking the number of rows per label, the data set is balanced by upsampling; features and labels are separated, and the data set is divided into a training set, a test set and a holdout set;
S3: feature selection; the minimum redundancy maximum relevance (MRMR) method is applied to the preprocessed data, selecting the most relevant features according to the pairwise correlation or mutual information of each pair of variables in the data set while minimizing the redundancy among variables, and judging which items in the examined scale carry greater weight for predicting depression risk;
S4: machine learning; low and high depression levels are predicted from the different combinations of items determined in feature selection; model performance is evaluated for models based on increasing numbers of questions, in order to determine how many questions are needed to achieve an area under the receiver operating characteristic curve (AUC ROC) of 90% on the holdout data set; machine learning training is completed only on the training set, with internal validation performed on the test set; after one model is obtained, the training and test sets are recombined and split again in the same manner, and another model is trained and tested; a total of 100 data recombinations, partitions, trainings and tests are performed; hyper-parameters are optimized to obtain the best model: the best-performing model is selected according to the AUC ROC score and F1 score evaluated on the original validation data set, i.e., the model with the highest AUC and F1 values; the optimal hyper-parameters of each model, i.e., the parameters yielding the highest AUC and F1 values, are obtained through grid search, training each iteration with a different combination of hyper-parameter values to determine which combination is optimal; finally, the best model is selected to evaluate the depression grade;
S5: execution; the obtained best model is retained, a website application prototype is established, and the model executed; the accuracy of the model is checked when it classifies individual data not used for training and testing, i.e., model performance is measured when real users are assessed.
Optionally, the feature selection specifically includes: the feature selection comprises an unsupervised technology MRMR and a supervised feature selection method;
the unsupervised MRMR is as follows: selecting the most relevant features based on the pairwise correlations or mutual information for each pair of variables in the dataset while minimizing redundancy between the variables;
the supervised feature selection method is as follows: an Extra Trees classifier (ETC) is fitted to all features and labels before ranking the most important features; the Extra Trees classifier fits several extremely randomized trees on sub-samples of the data set and averages the results; feature importance is obtained by calculating the normalized total reduction of the split criterion brought by the feature, known as the Gini importance (GI), where the importance of each feature is summed over the splits of all decision trees that use the feature, weighted in proportion to the number of samples each split partitions;
the most important features are obtained by ranking feature importances; finally, the most important items of the original scale, i.e., the items with greater weight for predicting the depression level, are selected by combining the results of the MRMR and Extra Trees classifiers.
Optionally, the tested methods include logistic regression (LR), Gaussian naive Bayes (GNB), support vector machine (SVM), random forest (RF), multilayer perceptron neural network (MLP), extreme gradient boosting decision tree (XGBoost), and a stacked generalization ensemble.
Optionally, the loss function used in machine learning training is the binary cross-entropy (BCE) loss;
the machine learning training is implemented in Python 3 using the Scikit-learn library, with default hyper-parameters and stopping criteria retained for fair comparison; the relative importance of the items is confirmed again with the SHAP method, checking which items of the depression-related scale contribute most to the overall prediction; the contribution, a value between 0 and 1, is calculated with the Shapley value formula; SHAP is a method of explaining individual predictions, with the goal of explaining the prediction for an instance by computing the contribution of each feature to that prediction; the Shapley value was developed in cooperative game theory, in which each cooperative game assigns a unique distribution, among the players, of the total surplus generated by the coalition of all players;
the Shapley value is formulated as:
Figure BDA0003573955690000061
wherein
Figure BDA0003573955690000062
Is the weight, (v (S @) v (S) is a marginal contribution.
Optionally, the depression-related scales include the following scales:
Beck Depression Inventory (BDI);
Self-Rating Depression Scale (SDS);
Depression Status Inventory (DSI);
Carroll Rating Scale for Depression (CRS);
Center for Epidemiologic Studies Depression Scale (CES-D);
Depression Adjective Check List (DACL);
Depressive Experiences Questionnaire (DEQ);
Cognitive Bias Questionnaire (CBQ);
Automatic Thoughts Questionnaire (ATQ);
Geriatric Depression Scale (GDS);
Hamilton Depression Rating Scale (HAMD);
Hospital Anxiety and Depression Scale (HAD);
Edinburgh Postnatal Depression Scale (EPDS);
Montgomery-Åsberg Depression Rating Scale (MADRS);
Children's Depression Inventory (CDI);
Bech-Rafaelsen Melancholia Scale (MRS);
Depression Symptom Self-Rating Scale (DSRS);
Simpson Depression Rating Scale;
Kellner Depression Rating Scale;
Depression in Old Age Scale (DIA-S);
and scales that include a depression dimension:
Cornell Medical Index (CMI);
Symptom Checklist 90 (SCL-90);
Minnesota Multiphasic Personality Inventory (MMPI);
Depression Anxiety Stress Scales (DASS);
Quality of Life Scale (QOL);
Irritability, Depression and Anxiety Scale (IDAS);
Profile of Mood States (POMS);
BFS Mood Scale (BFS).
The invention has the following beneficial effects: it enables the development of a simpler, more effective method for depression risk assessment by shortening existing scales. The novel, simplified, web-based and intelligent scales generated by the machine learning technique can not only predict the depression risk level but also guide the user toward a comprehensive evaluation. In addition, the technique can generate scales with different combinations of items, ensuring that users do not develop memory effects from repeated measurement while regularly monitoring their own depression risk. The tool can provide regular monitoring of depressive symptoms for users worldwide, improve the public's scientific understanding of depression risk, raise the help-seeking awareness of individuals whose assessment results indicate high risk, and reduce the negative impact of depression on individuals, families and society.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a schematic block diagram of the present invention;
FIG. 2 is a flow chart of the present invention;
FIG. 3 shows AUC ROC scores as a function of the number of DASS-42 items.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.
The drawings are provided for the purpose of illustrating the invention only and are not intended to limit it; to better illustrate the embodiments, some parts of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures and their descriptions may be omitted from the drawings.
The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components. In the description of the present invention, it should be understood that terms indicating orientation or positional relationship, such as "upper", "lower", "left", "right", "front" and "rear", are based on the orientations shown in the drawings, are used only for convenience and simplification of description, and do not indicate or imply that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation; such terms are therefore illustrative only, are not to be construed as limiting the invention, and their specific meaning may be understood by those skilled in the art according to the circumstances.
Please refer to FIG. 1 and FIG. 2, which show a depression rating system and method based on machine learning, namely a machine learning technique for simplifying depression assessment scales. The system comprises a data acquisition module, a data preprocessing module, a feature selection module, a machine learning module and an execution module.
A data collection module that obtains data from a globally accessible online survey, including answers to all items of the DASS-42 questionnaire, demographic variables of the filling person, time taken to fill out the questionnaire, and the like.
The data preprocessing module removes records that do not meet the age and gender requirements, divides the data into two levels (high depression or low depression) according to the respondents' scores, organizes the features of the raw data, and converts all categorical columns into one-hot codes representing each category. After checking the number of rows per label, the data set is balanced by upsampling. Features and labels are then separated, and the data set is partitioned into training, test, and hold-out sets.
The system also includes a feature selection module that applies the minimum Redundancy Maximum Relevance (mRMR) technique to the fully preprocessed and balanced data samples: it selects the most relevant features based on the pairwise correlation or mutual information of each pair of variables in the data set while minimizing redundancy between variables, in order to check which questions in the scale carry the most weight in predicting depression risk. A second feature selection method fits an Extra Trees classifier to all features and labels before ranking the most important features. The Extra Trees classifier fits several randomized decision trees on subsamples of the dataset and averages the results. Feature importance is obtained by calculating the normalized total reduction of the split criterion brought about by each feature. The features are ranked by importance, and the most important questions of the original scale are finally selected by combining the results of mRMR and the Extra Trees classifier.
The system also includes a machine learning module that predicts low and high depression levels for different combinations of questions determined by feature selection. Model performance is evaluated for models based on one question, two questions, and so on, to determine how many questions are needed to achieve a 90% AUC on the hold-out set. Machine learning training is done only on the training set, and internal validation is performed on the test set. After one model is obtained, the training and test sets are recombined and re-partitioned in the same manner, and another model is trained and tested. A total of 100 data reorganizations, partitions, and training/testing runs are performed.
The tested techniques include Logistic Regression (LR), Gaussian Naive Bayes (GNB), Support Vector Machines (SVM), Random Forests (RF), multi-layer perceptron neural networks (MLP), extreme gradient boosting decision trees (XGBoost), and stacked generalization ensembles (Ensemble). The primary loss function used in machine learning is the binary cross-entropy (BCE) loss. All machine learning techniques were implemented in Python 3 using the Scikit-learn library and kept at the default hyper-parameters and stopping criteria for fair comparison. The relative importance of the items is confirmed again using SHAP (SHapley Additive exPlanations), checking which values of certain items contribute most to the overall prediction.
Finally, the hyper-parameters are optimized to obtain the best-performing model, which is selected based on the AUC ROC score and F1 score evaluated on the hold-out validation set. The best hyper-parameters for each model are obtained by grid search, in which each iteration is trained with a different combination of hyper-parameter values to determine which combination is best. The best model is then selected to perform a quick assessment of the depression rating.
The system also comprises an execution module, which retains the best model obtained in the previous step, builds a prototype web application to run the model, and checks the model's classification accuracy on individual data not used for training or testing, i.e. measures the model's performance when evaluating real users.
For this study, the DASS-42 dataset was obtained from 31,715 participants worldwide. The degree of depression of each participant was scored according to the DASS manual. Since this is a proof-of-concept study, the model was not trained to classify individuals into 5 classes; instead, participants in the "none", "mild", and "moderate" classes were grouped into a "low depression" class, while those classified as "severe" and "extremely severe" were grouped into a "high depression" class. This was done as a first proof of concept, since identifying patients with severe and extremely severe depression is clinically most meaningful. The success of this concept validation may form the basis for further exploration of computational models that classify participants into two classes with different cutoffs (e.g., "none" versus all other levels) or into all five levels.
To build a computational model that classifies individuals into low- and high-depression categories, first, feature selection was performed to identify the items from the entire DASS-42 that best separate participants into the two categories. All 42 items of DASS-42 were used for feature selection, since it is assumed that items designed for anxiety and stress may contain important information for predicting depressive state. Second, the data were divided into a training set, a test set, and a hold-out set. Models were trained on the training set using various machine learning algorithms to predict depression category from different combinations of the important items identified in feature selection, and were then tested on the test set. The following machine learning algorithms were used: Logistic Regression (LR), Random Forest (RF), Gaussian Naive Bayes (GNB), Support Vector Machines (SVM), extreme gradient boosting decision trees (XGB), and multi-layer perceptron (MLP). Three accuracy indicators were used to evaluate model performance: accuracy score (ACC), area under the receiver operating characteristic curve (AUC ROC) score, and F1 score. The performance of all algorithms is compared to determine the best algorithm.
Third, an ensemble approach is used to integrate the predictions of the models based on these individual algorithms. Ensembles are used because each machine learning algorithm identifies an individual's depressive state in a unique way and has its own advantages and disadvantages. By ensembling the predictions of all algorithms, it is hypothesized that a shortened scale can achieve the highest accuracy with the fewest items. Fourth, to evaluate the generalizability of the computational models based on these algorithms and their ensemble, the models are applied to the hold-out set, a portion of the original data set that was never used for feature selection, model training, or model testing.
Finally, SHAP (SHapley Additive exPlanations) analysis was performed on the best models to determine the relative importance of the items that contributed most to the prediction. This tests the hypothesis that different items contribute differently to accurately predicting the depressive state, in contrast to the basic assumption of the original DASS-42 that all items of the scale are equally important, so that the score can be computed by simply summing the answers to all items. Details are as follows:
Participants
Data were obtained from 31,715 participants who filled out the DASS questionnaire online: 7,217 men and 24,498 women. The mean age of the participants was 25.4 years. Of these, 24,673 participants were in the 18-27 age range, the largest group; the number of participants gradually decreased with age. Most participants resided in Asia, and most of the remainder resided in North America and Europe, as shown in Table 1.
TABLE 1 Participant demographic breakdown
(Table 1 is provided as an image in the original publication and is not reproduced here.)
Material
The Depression Anxiety Stress Scale 42 (DASS-42) was used to assess depression levels in participants. The scale was developed at the University of New South Wales in 1995. It contains 42 items measuring the severity of the user's depression, anxiety, and stress; each sub-scale contains 14 items intended to evaluate one condition. Each item is a statement describing an experience, and the user answers how well it has described them over the past week. The options for each item are "never", "sometimes", "often", and "almost always", corresponding to scores 0, 1, 2, and 3, respectively (Lovibond & Lovibond, 1995). The total score for each condition is calculated by summing the scores of all items associated with that condition. Based on the score, the user's condition falls into one of the following categories: "none", "mild", "moderate", "severe", and "very severe" (for more information, see the DASS handbook at http://www2.psy.unsw.edu.au/groups/dass/) (Lovibond & Lovibond, 1995). In this study, for simplicity, "none", "mild", and "moderate" were combined into one class (low depression) and the rest into another class (high depression).
The DASS scale was developed using answers from a sample of 504 students. Norms were then established on a standard sample of 1,044 males and 1,870 females from different backgrounds, aged between 17 and 69 years. The scores were subsequently validated against multiple outpatient populations, including patients with anxiety, stress, depression, and other psychiatric disorders (Lovibond & Lovibond, 1995). In the normative sample, reliability (Cronbach's alpha, a tau-equivalent reliability coefficient) was 0.91 for the depression scale, 0.84 for the anxiety scale, and 0.90 for the stress scale (Lovibond & Lovibond, 1995).
Data acquisition
The data for this study were collected by publishing a globally accessible online survey. The survey was anonymous, and users consented to providing their personal information and gave informed consent. The survey included all 42 items of DASS-42 and a series of questions asking the participants for demographic data. The survey entries were then downloaded as a comma-separated values (CSV) file, which served as the main dataset for this study.
The data set includes the answers to all 42 items of the DASS-42 questionnaire and demographic information about the participants, including gender (not biological sex) and age. It also includes the anxiety, depression, and stress scores calculated according to the DASS-42 scoring method, and the time spent on each question, used to check whether the survey was answered attentively. The answer value for each option of each DASS item is encoded with the integers 0, 1, 2, and 3, representing the options "never", "sometimes", "often", and "almost always" in the questionnaire, respectively. Other fields include a severity classification based on the score for each of the three conditions. Since this study focused only on depression, the anxiety and stress scores were discarded.
The DASS-42 questionnaire defines threshold scores for each severity level of depression ("none", "mild", "moderate", "severe", and "very severe"), which are used as the ground truth in this study. However, for simplicity, participants were divided into two depression levels according to their scores: a participant is assigned the high depression level (1) if the DASS depression score belongs to the "severe" or "very severe" categories, and the low depression level (0) if it belongs to the "none", "mild", or "moderate" categories. Using this criterion, 15,044 high depression samples and 16,671 low depression samples were obtained.
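The two-level labeling rule above can be sketched as a small function. The numeric cutoffs below are an assumption taken from the published DASS manual (severe from 21, extremely severe from 28); the source text names only the categories, not the thresholds.

```python
# Sketch of the binary labeling rule described above. The numeric
# cutoffs (>= 21 severe, >= 28 extremely severe) are assumed from the
# published DASS depression sub-scale and are not stated in the source.
def severity_category(depression_score: int) -> str:
    if depression_score >= 28:
        return "very severe"
    if depression_score >= 21:
        return "severe"
    if depression_score >= 14:
        return "moderate"
    if depression_score >= 10:
        return "mild"
    return "none"

def binary_label(depression_score: int) -> int:
    """1 = high depression ('severe'/'very severe'), 0 = low depression."""
    return 1 if depression_score >= 21 else 0
```

Applying `binary_label` to every row's depression score yields the 0/1 target column used in the rest of the pipeline.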
In the DASS-42 questionnaire, each of the depression, anxiety, and stress sub-scales contains 14 questions. However, the questions associated with stress and anxiety were not excluded from this study. All 42 DASS questions were treated as features for feature selection and model training, so that machine learning techniques could assess which questions best predict the level of depression.
Demographic data is not included in model training and testing.
Data analysis
Data preprocessing. In this study, the raw data set was filtered according to specific conditions. First, since the study focuses only on adults, rows from participants younger than 18 years were deleted. Second, rows with an empty country or region of residence were deleted. Third, rows with an unusual time per question (too short or too long: more than 2 standard deviations from the mean) were also deleted.
After the data are filtered, the raw data features are organized. All categorical columns in the dataset are converted into one-hot codes representing each category. Next, after checking the number of rows per label, the data set is balanced by upsampling the depression class with the lower count. Prior to upsampling, there were 15,044 high depression samples and 16,671 low depression samples. The upsampling algorithm randomly selects rows from the smaller class and copies them into the dataset so that its row count matches the class with more samples (sklearn.utils.resample, 2020). This is crucial to ensure that the data set is not biased towards one depression category, which would affect the model outcome. After balancing, there were 16,671 × 2 = 33,342 samples in the dataset.
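A minimal sketch of the upsampling step using `sklearn.utils.resample`, assuming the data sit in a pandas DataFrame with a binary `label` column (the column name and toy data are illustrative, not from the source):

```python
import pandas as pd
from sklearn.utils import resample

def balance_by_upsampling(df: pd.DataFrame, label_col: str = "label",
                          seed: int = 0) -> pd.DataFrame:
    """Duplicate random rows of the minority class until both classes match."""
    counts = df[label_col].value_counts()
    minority, majority = counts.idxmin(), counts.idxmax()
    upsampled = resample(df[df[label_col] == minority],
                         replace=True,                # sample with replacement
                         n_samples=int(counts[majority]),  # match majority count
                         random_state=seed)
    return pd.concat([df[df[label_col] == majority], upsampled])

# Toy demonstration: 7 low-depression rows vs 3 high-depression rows.
demo = pd.DataFrame({"q1": range(10), "label": [0] * 7 + [1] * 3})
balanced = balance_by_upsampling(demo)
```

After balancing, both classes contain the same number of rows, mirroring the 16,671 × 2 figure reported above.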
Finally, the features and labels are separated and the data set is partitioned into training, test, and hold-out sets. In this study, 80% of the data set (26,674 samples) was used for model training, 10% (3,334 samples) for internal model testing during training, and 10% (3,334 samples) was kept as the hold-out set for external model validation after training was completed. The training set and the internal test set are reshuffled and re-partitioned throughout the training process to ensure the consistency of the models, while the external validation set remains unchanged.
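The 80/10/10 split can be obtained with two calls to `train_test_split`: first peel off the hold-out set, then split the remainder. The array contents below are illustrative placeholders for the DASS feature matrix and labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative stand-ins for the feature matrix and binary labels.
X = np.arange(1000).reshape(-1, 1)
y = np.tile([0, 1], 500)

# Step 1: remove the 10% external hold-out set (kept fixed thereafter).
X_rest, X_hold, y_rest, y_hold = train_test_split(
    X, y, test_size=100, random_state=42, stratify=y)

# Step 2: split the remaining 90% into train and internal test sets
# so that overall train:test:holdout = 80:10:10.
X_train, X_test, y_train, y_test = train_test_split(
    X_rest, y_rest, test_size=100, random_state=42, stratify=y_rest)
```

Only the first split stays fixed; the second is re-drawn with a new seed each time the training/test sets are reshuffled.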
Herein, for simplicity, only the performance metrics of each model on the external validation set are reported.
Feature selection. Since the goal is to reduce the number of questions needed to accurately predict depression levels, the first step was to use feature selection techniques to check which questions in the DASS-42 questionnaire carry the most weight in predicting low and high depression levels. This was done on the fully processed and balanced data set (33,342 samples).
The most important questions for predicting depression were identified from the DASS-42 questionnaire. As previously mentioned, all 42 DASS-42 questions were included in the pool, since some questions aimed at assessing stress and anxiety may contain important information about a participant's final depression score. mRMR is an unsupervised technique that selects the most relevant features based on the pairwise correlation or mutual information of each pair of variables in the dataset, while minimizing redundancy between variables (Peng, Long, & Ding, 2005).
For reference, a second, supervised feature selection approach was used: an Extra Trees classifier was fitted to all features and labels before ranking the most important features. This step ensures the robustness of the mRMR feature selection and validates its results. The Extra Tree classifier fits several randomized decision trees (also known as extra trees) on subsamples of the dataset and averages the results. Feature importance is obtained by calculating the normalized total reduction of the split criterion brought about by each feature, called the Gini importance (sklearn.ensemble.ExtraTreesClassifier, 2020). Gini importance, also known as Mean Decrease in Impurity (MDI), computes the importance of each feature as the sum, over all decision trees, of the impurity decreases at the splits that use that feature, weighted by the proportion of samples reaching those splits (Menze et al., 2009). The ranking of the most important questions in DASS-42 is obtained by sorting the feature importances.
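A minimal sketch of the Extra Trees importance step on synthetic data; the real features are the 42 DASS item answers, and here a toy label is driven by two of the 42 columns so the ranking is easy to check:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 4, size=(500, 42))           # 42 items, answers coded 0-3
# Toy label driven only by items 13 and 16 (0-based columns 12 and 15).
y = (X[:, 12] + X[:, 15] >= 4).astype(int)

etc = ExtraTreesClassifier(n_estimators=200, random_state=0).fit(X, y)
importances = etc.feature_importances_           # Gini importance (MDI), sums to 1
ranking = np.argsort(importances)[::-1]          # most important item first
```

The `feature_importances_` ranking is what gets merged with the mRMR ranking to pick the top-10 item pool.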
Finally, the most important questions in DASS-42 identified by mRMR and the Extra Trees classifier were combined, and the top 10 most important DASS-42 questions were selected to form the feature pool used for machine learning model training. Although more DASS-42 questions could have been selected, only the top 10 were used because this saves overall computation time. The goal is to select a subset of questions from these 10 and use it to train a computational model that predicts a participant's depression level. A subset may contain anywhere from 1 to 10 questions, so there are 1,023 possible combinations to test, which already takes substantial computation time. Further increasing the number of candidate questions would increase the computation time exponentially, as shown by the following equation:
$$\sum_{k=1}^{10}\binom{10}{k} = 2^{10} - 1 = 1{,}023$$
Since this study is a proof-of-concept study, the maximum number of DASS-42 questions was limited to 10.
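The 1,023 non-empty subsets of the 10-item pool can be enumerated directly with `itertools`; a minimal sketch:

```python
from itertools import combinations
from math import comb

pool = list(range(10))  # indices of the 10 selected DASS-42 items
# Every non-empty subset of the pool, from single items up to all 10.
subsets = [c for k in range(1, 11) for c in combinations(pool, k)]
n_subsets = len(subsets)
n_expected = sum(comb(10, k) for k in range(1, 11))  # = 2**10 - 1
```

Each tuple in `subsets` is one candidate question combination to train and evaluate.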
Model training. The second step is to train machine learning models to predict low and high depression levels based on combinations of the top 10 DASS-42 questions determined by feature selection. The main objective is to find the minimum number of DASS-42 questions needed to constitute a sufficiently accurate assessment of depression level. To do this, one question is first randomly selected from the pool of 10 to train a computational model that classifies participants into the low- and high-depression classes. Model performance is evaluated to determine whether 1 question is sufficient to classify participants accurately. After training with 1 of the 10 questions, the system moves on to training models on combinations of 2 of the 10 questions, again randomly selected from the pool. The number of questions is then increased to 3, 4, and so on up to 9, and machine learning is performed to generate computational models that classify participants based on 3, 4, ... 9 DASS-42 questions. In this way, models based on one question, two questions, and so on are obtained, to determine how many questions are needed to achieve a 90% AUC on the hold-out set.
More than ten questions were not attempted because the total computation time would be too long and the final depression scale would have too many items. For each question count, ten different question combinations are drawn from the ten-question pool. This provides a sufficiently large sample of question sets to represent the accuracy distribution. No more than 10 combinations were run because only the general trend was of interest and the computation time grows with each added combination. For each question count, the same combinations are used across all models to ensure a fair comparison between them.
For each model training, the data set was divided into training (80%), test (10%), and hold-out validation (10%) sets as previously described. Machine learning training is done only on the training set, and internal validation is performed on the test set. The label, or target, is a binary column indicating the degree of depression (0 for low depression, 1 for high depression). After one model is obtained, the training and test sets are recombined and re-partitioned in the same way (80%/10%), and another model is trained and tested. Data reorganization, partitioning, and training/testing were performed 50 times, since preliminary training indicated that 50 sub-models were sufficient to produce a stable Gaussian distribution of the model performance metrics.
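The reshuffle-and-retrain loop can be sketched as follows on synthetic data, with logistic regression standing in for the various algorithms (the toy feature matrix and label rule are illustrative, not from the source):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.integers(0, 4, size=(1000, 7)).astype(float)  # 7 selected items, answers 0-3
y = (X.sum(axis=1) >= 11).astype(int)                 # toy depression label

aucs = []
for seed in range(50):                                # 50 sub-models
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=100, random_state=seed)       # re-split each run
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    aucs.append(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))

mean_auc = float(np.mean(aucs))
```

The spread of `aucs` over the 50 runs is what yields the mean and 95% confidence interval reported for each question combination.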
Then, the 50 sub-models were tested by classifying participants in the hold-out validation set, which contains 10% of the complete data set and remained untouched throughout the process. This makes the comparison between models fair, as it represents each model's accuracy in classifying data from individuals never used for training or testing. In other words, the results on the hold-out data represent the model's performance with real-world users.
Techniques tested in this study include Logistic Regression (LR), Gaussian Naive Bayes (GNB), Support Vector Machines (SVM), Random Forests (RF), multi-layer perceptron neural networks (MLP), extreme gradient boosting decision trees (XGBoost), and stacked generalization ensembles (Ensemble). The primary loss function used in machine learning is the binary cross-entropy (BCE) loss. All machine learning techniques were implemented in Python 3 using the Scikit-learn library and kept at the default hyper-parameters and stopping criteria for fair comparison. The hyper-parameters and stopping criteria are listed in Appendix A.
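A minimal sketch of the stacked generalization ensemble using scikit-learn's `StackingClassifier` with the base estimators named above at their defaults. XGBoost lives in the separate `xgboost` package and is omitted here so the sketch needs only scikit-learn; the smoke-test data are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

ensemble = StackingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("gnb", GaussianNB()),
        ("rf", RandomForestClassifier(random_state=0)),
        ("svm", SVC(probability=True, random_state=0)),
        ("mlp", MLPClassifier(max_iter=500, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # meta-learner
)

# Smoke test on synthetic, roughly linearly separable data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 7))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
ensemble.fit(X, y)
train_acc = float(ensemble.score(X, y))
```

The meta-learner is trained on out-of-fold predictions of the base estimators, which is what "stacked generalization" refers to.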
The primary loss function used to evaluate the models during training is the binary cross-entropy (BCE) loss, which measures the average logarithmic difference between the predicted probabilities p(y_i) and the actual labels y_i in a binary classifier (BCELoss, 2020):

$$\mathrm{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\Bigl[y_i \log p(y_i) + (1 - y_i)\log\bigl(1 - p(y_i)\bigr)\Bigr]$$
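The BCE loss can be computed numerically as follows; a minimal NumPy sketch (function name and sample values are illustrative):

```python
import numpy as np

def bce_loss(y_true: np.ndarray, p_pred: np.ndarray, eps: float = 1e-12) -> float:
    """Mean binary cross-entropy between labels y_true and probabilities p_pred."""
    p = np.clip(p_pred, eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

loss = bce_loss(np.array([1, 0, 1]), np.array([0.9, 0.1, 0.8]))
```

Confident, correct predictions push the loss toward 0, while confident wrong ones are penalized heavily.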
For each model, the area under the receiver operating characteristic curve (AUC ROC) score and the F1 score on the hold-out validation set were evaluated. These metrics were computed for all 50 sub-models of each combination of DASS questions, with 10 combinations per question count, from 1 question up to the count at which accuracy was sufficient. The mean and 95% confidence interval were computed over the 10 combinations.
Finally, SHAP (SHapley Additive exPlanations) values were computed for the best-performing model. This step confirms the relative importance of the items once more and checks which values of certain items contribute most to the overall prediction. The SHAP value is the Shapley value of the conditional expectation function of the model. The Shapley value comes from cooperative game theory, where each cooperative game is assigned a unique distribution, among the players, of the total surplus generated by the coalition of all players (Lundberg & Lee, 2017). This helps determine each player's relative contribution to the total. The Shapley value is given by the following equation:
$$\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n - |S| - 1)!}{n!}\,\bigl(v(S \cup \{i\}) - v(S)\bigr)$$

where $\frac{|S|!\,(n-|S|-1)!}{n!}$ is the weight and $v(S \cup \{i\}) - v(S)$ is the marginal contribution of player $i$ to coalition $S$.
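The Shapley formula can be illustrated with a brute-force computation on a toy 3-player cooperative game (the characteristic function values below are invented for illustration):

```python
from itertools import combinations
from math import factorial

players = ("a", "b", "c")
# Characteristic function v(S): value created by each coalition S (toy numbers).
v = {frozenset(): 0, frozenset("a"): 10, frozenset("b"): 20, frozenset("c"): 30,
     frozenset("ab"): 40, frozenset("ac"): 50, frozenset("bc"): 60,
     frozenset("abc"): 90}

def shapley(i: str) -> float:
    """Exact Shapley value of player i, summing weighted marginal contributions."""
    n = len(players)
    others = [p for p in players if p != i]
    total = 0.0
    for k in range(n):
        for S in combinations(others, k):
            S = frozenset(S)
            weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            total += weight * (v[S | {i}] - v[S])
    return total

values = {p: shapley(p) for p in players}
```

Note the efficiency property: the three Shapley values sum to v({a, b, c}) = 90, i.e. the full surplus is distributed. SHAP applies the same attribution idea to a model's features instead of players.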
Model optimization. The next machine learning step is to optimize the hyper-parameters to obtain the best-performing model, further refining the optimal model among the techniques described above. First, the best-performing model was selected based on the AUC ROC score and F1 score evaluated on the hold-out validation set. The best model is retrained using the same procedure as described above, and the best hyper-parameters for each model are obtained by grid search, in which each iteration is trained with a different combination of hyper-parameter values to determine which combination is best. Finally, the best model is selected to perform a quick assessment of the depression rating.
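The grid-search step can be sketched with scikit-learn's `GridSearchCV`; the parameter grid and synthetic data below are illustrative, not the grid used in the study:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(2)
X = rng.integers(0, 4, size=(300, 7)).astype(float)  # 7 selected items
y = (X.sum(axis=1) >= 11).astype(int)                # toy depression label

grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100],           # illustrative grid
                "max_depth": [4, None]},
    scoring="roc_auc",                               # match the study's AUC criterion
    cv=3,
)
grid.fit(X, y)                                       # one fit per grid cell per fold
best_params = grid.best_params_
```

`grid.best_estimator_` is then the tuned model carried forward into the deployed assessment tool.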
Model implementation. The best machine learning model from the previous step was implemented in a prototype rapid depression assessment tool so that it could be validated. Validation ensures that the model is implemented correctly and comprises two steps. The first step is to enter fixed sets of answers, sweeping different values of the different DASS-42 items across all question groups while checking that the predicted depression level is plausible; this simulates the different combinations of responses a user might enter into the tool. The second step is to enter several responses taken from the original dataset and verify that the average binary prediction metrics (AUC ROC score and F1 score) exactly match those obtained when the original model was tested on the original dataset.
Results
Feature selection
All 42 items (numbered 1 to 42) of the DASS-42 questionnaire were included in the feature selection process. By combining the results of the mRMR method and the Gini importance of the Extra Trees classifier (ETC), the top 10 items in DASS-42 for predicting depression levels are those numbered {13, 16, 3, 34, 24, 22, 27, 36, 40, 26}.
Model training
After all machine learning models were trained, the test results were plotted. Fig. 3 shows the AUC ROC score for combinations of 1 to 10 DASS questions. Each point averages all combinations for that question count, where each combination consists of 50 cross-validation runs. For fair comparison, AUC scores were computed on the external validation set. Error bars represent the 95% confidence interval over the 50 cross-validation runs.
As the number of questions in the training features increases, the models perform better, but the performance gain per added question diminishes. Fig. 3 shows that with 7 DASS questions, the AUC ROC score of the best model exceeds 90%, which was set as the required accuracy threshold. Therefore, it is suggested that at least 7 items be extracted from DASS-42 for a quick depression assessment.
In Fig. 3, the validation area-under-the-curve (AUC) scores of the receiver operating characteristic (ROC) curves for all models are averaged over 10 combinations for each number of DASS-42 questions (n). The horizontal line shows the 90% threshold.
Table 2 presents more detailed results for all models run on combinations of 7 DASS items. Based on Table 2, judging by the AUC and F1 scores, the Ensemble and Random Forest are the best-performing models. However, it is not clear which technique performs better, because their 95% confidence intervals overlap. Since the Ensemble model has more tunable parameters, it is more likely to exceed the performance of the Random Forest once the hyper-parameters of both models are tuned.
TABLE 2 Comparison of validation accuracy of the best models trained on combinations of 7 DASS items using default hyper-parameters
(Table 2 is provided as an image in the original publication and is not reproduced here.)
The Stacked Generalization Ensemble model was found to outperform all other algorithms. This is an expected result, since ensembles should improve the prediction accuracy of their base estimators (Wolpert, 1992).
After finding that the Stacked Generalization Ensemble is the best algorithm, an attempt was made to further improve accuracy by tuning the hyper-parameters of its base estimators (LR, GNB, RF, XGB, SVM, and MLP) using a grid-search process.
After hyper-parameter tuning, 30 combinations were generated instead of the original 10, and the 10 combinations of DASS-42 items with the highest accuracy were selected for implementation. Overall, this step improved the accuracy results by at least 1.5%, yielding better performance than even a model with one additional DASS-42 item under default hyper-parameters. Table 3 compares AUC scores between the default Ensemble model and the final implemented model. See Appendix B for the final hyper-parameters used in this study and the 10 best combinations of DASS-42 items per threshold.
TABLE 3 AUC score comparison between Default model and Final model
Figure BDA0003573955690000162
The item-combination selection also revealed that some item combinations perform better than others. The results are detailed in Appendix C.
Table 4 shows the results of the SHAP (SHapley Additive exPlanations) analysis, listing the average SHAP value of each feature among the top 10 most important DASS items in the stacked generalization ensemble model. The SHAP analysis indicates that items 34, 24, 13, and 26 carry significant weight in predicting depression levels, much higher than the other items. Checking against the original DASS-42 scale, these items come from the original depression sub-scale, while the other items come from the anxiety and stress sub-scales, which explains why their weights are significantly higher (see Appendix A for details).
Table 4 lists the average SHAP (SHapley Additive exPlanations) value, standard deviation, and 95% confidence interval of each feature among the top 10 most important DASS-42 items in the stacked generalization ensemble model, ordered from highest to lowest average SHAP value.
TABLE 4 confidence intervals for each feature in the top 10 most important DASS-42 entries
(Table 4 is provided as an image in the original publication and is not reproduced here.)
Implementation
A prototype of the rapid depression assessment tool was built as a web application to demonstrate its functionality. The demonstration application lets the user answer a short survey. It shows a set of seven DASS-42 items randomly chosen from a pool of X questions, allowing 10 question sets and XXX different question combinations. Using 10 question sets instead of 1 fixed set allows users to reuse the application without encountering the same questions too often. The question set is randomized each time according to the questions the model requires. The implemented model uses the optimized Stacked Generalization Ensemble technique and comprises a set of 10 different sub-models, one for each question set. A total of 7 questions from the original DASS-42 were used to build each question set.
Conclusion
Using a long-to-short approach and machine learning, this work demonstrates the possibility of developing a simpler, more effective method to assess depression risk by shortening an existing scale. As a proof of concept, the high depression state defined by DASS-42 was predicted using the Depression Anxiety Stress Scale 42 (DASS-42) with an accuracy of over 90%. The shortened assessment used only 7 items from the DASS-42 scale, instead of the 14 items of the original DASS depression sub-scale or the 42 items of the full DASS-42 scale. Although the developed shortened assessments do not completely replace the original scale, they are much shorter and more convenient to use. Given the online nature of the assessment, the user can also be directed to take the full assessment once the model predicts that their depression risk exceeds a certain level (e.g., 50%). Furthermore, this approach creates multiple non-identical question sets, allowing users to monitor their depression risk regularly (e.g., daily) without undue repetition. The developed tools will thus enable users worldwide to regularly monitor their depressive symptoms and improve their awareness of depression risk, making them more likely to seek timely help when risk is high. This, in turn, will help reduce the negative effects of depression at the personal, family, and social levels.
Appendix A: feature selection results
Table A1 DASS-42 item feature-selection results, including unprocessed and processed items (one-hot encoding)
(Table A1 is provided as an image in the original publication and is not reproduced here.)
The items in DASS-42 used to calculate the depression score are the items numbered {3,5,10,13,16,17,21,24,26,31,34,37,38,42 }. In the shorter DASS-21 scale, there were 7 items used to calculate depression scores; they are the items 3,13,16,17,24,26, 34.
TABLE A2 DASS-42 sub-scales and item numbers
Sub-scale    Item numbers
Depression   3, 5, 10, 13, 16, 17, 21, 24, 26, 31, 34, 37, 38, 42
Anxiety      2, 4, 7, 9, 15, 19, 20, 23, 25, 28, 30, 36, 40, 41
Stress       1, 6, 8, 11, 12, 14, 18, 22, 27, 29, 32, 33, 37, 39
TABLE A3 DASS-21 sub-scales and item numbers (using DASS-42 numbering)
(Table A3 is provided as an image in the original publication and is not reproduced here.)
Appendix B: hyper-parametric and optimal project combinations
TABLE B1 Final hyper-parameters of the stacking classifier
TABLE B2 Optimal item combinations
Appendix C: accuracy comparison of different item combinations
Table C1 Accuracy results for individual models, adjusted for severity threshold and demographic data
Table C2 Accuracy results for the 10 best individual models, adjusted for severity threshold and demographic data
Item combination	ACC score	AUC score	F1 score
[3, 24, 26, 30, 34, 40]	0.91988	0.919838	0.919878
[3, 16, 17, 30, 34, 36]	0.918883	0.91882	0.918878
[3, 13, 17, 20, 27, 40]	0.909907	0.909947	0.90991
[11, 13, 24, 26, 30, 34]	0.919548	0.919493	0.919544
[13, 17, 20, 22, 24, 27]	0.920545	0.92057	0.920548
[3, 11, 17, 20, 26, 36]	0.913896	0.913904	0.913898
[16, 20, 24, 26, 30, 34]	0.925864	0.925865	0.925865
[3, 11, 13, 17, 24, 26]	0.922872	0.922852	0.922872
[16, 18, 24, 30, 34, 36]	0.920213	0.920175	0.92021
[3, 13, 24, 27, 30, 34]	0.917886	0.917835	0.917882
Finally, the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit it. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art will understand that modifications or equivalent substitutions may be made to the technical solutions without departing from their spirit and scope, and all such modifications should be covered by the claims of the present invention.

Claims (10)

1. A machine learning based depression rating system characterized by: the system comprises a data acquisition module, a data preprocessing module, a feature selection module, a machine learning module and an execution module;
a data acquisition module for obtaining data from a globally accessible online survey, including answers to all items of the DASS-42 questionnaire, demographic variables of the respondent, and the time taken to complete the questionnaire;
the data preprocessing module divides the respondents' data, apart from the age and gender data, into high-depression and low-depression grades according to their scores; it organizes the features of the screened data obtained by the data acquisition module, including converting all categorical columns into one-hot codes representing each class; after checking the number of rows per label, the data set is balanced by upsampling; the features and labels are separated, and the data set is divided into a training set, a test set and a hold-out set;
the feature selection module applies the Max-Relevance and Min-Redundancy (MRMR) method to the preprocessed data, selecting the most relevant features according to the pairwise correlation or mutual information of each pair of variables in the data set while minimizing the redundancy among the variables, and determining which items in the examined scale carry more weight for predicting depression risk;
a machine learning module that predicts low- and high-depression levels from the different combinations of items determined in feature selection; model performance is evaluated to obtain a model based on a few questions, determining how many questions are needed to achieve an area under the receiver operating characteristic curve (AUC-ROC) of 90% on the hold-out data set; machine learning training is completed only on the training set, with internal validation on the test set; after one model is obtained, the training and test sets are recombined and split again in the same way to train and test another model; in total, 100 rounds of data recombination, splitting, training and testing are performed; the hyper-parameters are optimized to obtain the optimal model: the best-performing model, i.e., the one with the highest AUC and F1 values, is selected according to the AUC-ROC score and F1 score evaluated on the original validation set; the best hyper-parameters of each model, i.e., those yielding the highest AUC and F1 values, are obtained by grid search, with each iteration trained on a different combination of hyper-parameter values to determine which combination is optimal; finally, the best model is selected to evaluate the depression grade;
and the execution module retains the obtained optimal model, builds a website application prototype and runs the model, checking the model's classification accuracy on individual data never used for training or testing, i.e., measuring model performance when evaluating real users.
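The preprocessing and splitting steps of claim 1 can be sketched with pandas and scikit-learn; the column names, the 60/20/20 split proportions, and the random seeds are illustrative assumptions:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

def preprocess(df, label_col="high_depression"):
    # One-hot encode all non-numeric (categorical) columns, as in the claim.
    cat_cols = df.select_dtypes(exclude="number").columns.tolist()
    df = pd.get_dummies(df, columns=cat_cols)

    # Balance the label distribution by upsampling the minority class
    # (assumes the labels are imbalanced, as after screening).
    counts = df[label_col].value_counts()
    minority = df[df[label_col] == counts.idxmin()]
    majority = df[df[label_col] == counts.idxmax()]
    minority_up = resample(minority, replace=True,
                           n_samples=len(majority), random_state=0)
    df = pd.concat([majority, minority_up])

    # Separate features/labels and split into training / test / hold-out sets.
    X, y = df.drop(columns=[label_col]), df[label_col]
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, test_size=0.4, stratify=y, random_state=0)
    X_test, X_hold, y_test, y_hold = train_test_split(
        X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0)
    return X_train, X_test, X_hold, y_train, y_test, y_hold

# Toy survey data: one item column, one categorical column, a binary label.
df = pd.DataFrame({
    "item_3": [0, 1, 2, 3] * 5,
    "gender": ["m", "f"] * 10,
    "high_depression": [0] * 15 + [1] * 5,
})
X_train, X_test, X_hold, y_train, y_test, y_hold = preprocess(df)
print(len(X_train), len(X_test), len(X_hold))  # 18 6 6 (balanced to 30 rows)
```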
2. A machine learning based depression rating system according to claim 1, wherein: the feature selection comprises the unsupervised MRMR technique and a supervised feature selection method;
the unsupervised MRMR technique selects the most relevant features based on the pairwise correlation or mutual information of each pair of variables in the data set while minimizing redundancy between the variables;
the supervised feature selection method fits an extremely randomized trees classifier (ETC) to all features and labels before ranking the most important features; the classifier fits several extremely randomized trees on sub-samples of the data set and averages the results; feature importance is obtained by computing the normalized total reduction of the split criterion brought by the feature, called the Gini Importance (GI); the importance of each feature is calculated as the sum, over all decision trees containing the feature, of the number of splits involving it, weighted in proportion to the number of samples it splits;
the most important features are obtained by ranking feature importance; finally, the most important items of the original scale, i.e., the items with higher weight for predicting whether the depression level is high or low, are selected by combining the results of the MRMR and the randomized tree classifier.
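The two feature-selection routes of claim 2 can be sketched as follows: a greedy mutual-information MRMR (one common formulation; the claim does not fix the exact scoring rule) and a Gini-importance ranking from scikit-learn's ExtraTreesClassifier. The synthetic data and parameter values are illustrative:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr_select(X, y, k):
    """Greedy MRMR: at each step pick the feature maximizing relevance
    (MI with the label) minus redundancy (mean MI with the features
    already selected)."""
    relevance = mutual_info_classif(X, y, random_state=0)
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        scores = {}
        for j in range(X.shape[1]):
            if j in selected:
                continue
            redundancy = np.mean([
                mutual_info_regression(X[:, [j]], X[:, s], random_state=0)[0]
                for s in selected])
            scores[j] = relevance[j] - redundancy
        selected.append(max(scores, key=scores.get))
    return selected

def etc_importance(X, y):
    """Supervised route: rank features by the Gini importance of an
    extremely randomized trees classifier (ETC)."""
    etc = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)
    return np.argsort(etc.feature_importances_)[::-1]

# Synthetic check: the label depends only on feature 0, so both methods
# should rank feature 0 first.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)
print(mrmr_select(X, y, 2)[0], etc_importance(X, y)[0])  # 0 0
```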
3. A machine learning based depression rating system according to claim 1, wherein: the tested models comprise logistic regression (LR), Gaussian naive Bayes (GNB), a support vector machine (SVM), random forest (RF), a multilayer perceptron neural network (MLP), the extreme gradient boosting decision tree XGBoost, and a stacked-generalization ensemble (Ensemble).
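A stacked-generalization ensemble over several of the listed base learners can be sketched with scikit-learn's StackingClassifier. XGBoost is omitted here to keep the sketch dependency-free, and the synthetic dataset and settings are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Base learners (a subset of the claim's list), combined by a logistic
# regression meta-learner trained on their cross-validated predictions.
base_learners = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("gnb", GaussianNB()),
    ("svm", SVC(probability=True, random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
]
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(max_iter=1000),
                           cv=5)

# Synthetic stand-in for the questionnaire features and high/low labels.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
stack.fit(X, y)
acc = stack.score(X, y)
print(f"training accuracy: {acc:.2f}")
```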
4. A machine learning based depression rating system according to claim 1, wherein: the loss function used for the machine learning training is a Binary Cross Entropy (BCE) loss;
the machine learning training is implemented in Python 3 using the Scikit-learn library, with default hyper-parameters and stopping criteria retained for fair comparison; the relative importance of the items is confirmed again using the SHAP method, checking which items of the depression-related scales contribute most to the overall prediction, computed with the Shapley value formula to yield values between 0 and 1; SHAP is a method for interpreting individual predictions, with the goal of explaining the prediction for an instance by computing the contribution of each feature to the prediction; the Shapley value was developed in cooperative game theory, where each cooperative game is assigned a unique distribution, among the players, of the total surplus generated by the coalition of all players;
the Shapley value is formulated as:
Figure FDA0003573955680000021
wherein
Figure FDA0003573955680000022
Is the weight, (v (S @) v (S) is a marginal contribution.
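The Shapley value formula can be evaluated exactly for a small cooperative game by enumerating all subsets. The toy characteristic function below (an additive game, whose Shapley values equal each player's own contribution) is an illustrative check, not part of the disclosure:

```python
from itertools import combinations
from math import factorial

def shapley_value(n, v, i):
    """Exact Shapley value of player i (players 0..n-1) for the
    characteristic function v, following the formula above."""
    others = [p for p in range(n) if p != i]
    total = 0.0
    for r in range(len(others) + 1):
        for S in combinations(others, r):
            weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            total += weight * (v(set(S) | {i}) - v(set(S)))  # marginal contribution
    return total

# Additive toy game: a coalition's value is the sum of fixed contributions,
# so each player's Shapley value equals its own contribution.
contrib = {0: 5.0, 1: 3.0, 2: 2.0}
v = lambda S: sum(contrib[p] for p in S)
print([shapley_value(3, v, i) for i in range(3)])  # ≈ [5.0, 3.0, 2.0]
```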
5. The machine learning-based depression rating system of claim 4, wherein: the depression-related scales include the following scales:
Beck Depression Inventory (BDI);
Self-Rating Depression Scale (SDS);
Depression Status Inventory (DSI);
Carroll Rating Scale for Depression (CRS);
Center for Epidemiologic Studies Depression Scale (CES-D);
Depression Adjective Check List (DACL);
Depression Experiences Questionnaire (DEQ);
The Cognitive Bias Questionnaire (CBQ);
The Automatic Thoughts Questionnaire (ATQ);
The Geriatric Depression Scale (GDS);
Hamilton Rating Scale for Depression (HAMD);
Hospital Anxiety and Depression Scale (HAD);
Edinburgh Postnatal Depression Scale (EPDS);
Montgomery-Åsberg Depression Rating Scale (MADRS);
Children's Depression Inventory (CDI);
The Bech-Rafaelsen Melancholia Scale (MRS);
Depression Self-Rating Scale (DSRS);
Simpson Depression Rating Scale;
Kellner Depression Rating Scale;
The Depression in Old Age Scale (DIA-S);
and scales comprising depression dimensions:
Cornell Medical Index (CMI)
Symptom Checklist 90 (SCL-90)
Minnesota Multiphasic Personality Inventory (MMPI)
Depression Anxiety Stress Scale (DASS)
Quality of Life Scale (QOLS)
Irritability, Depression and Anxiety Scale (IDAS)
Profile of Mood States (POMS)
BFS Mood Scale (Befindlichkeits-Skala, BFS).
6. A method for machine learning-based depression rating, characterized by: the method comprises the following steps:
s1: collecting data; obtaining data from a globally accessible online survey, including answers to all items of the DASS-42 questionnaire, demographic variables of the respondent, and the time taken to complete the questionnaire;
s2: preprocessing data; apart from the age and gender data, dividing the respondents' data into high-depression and low-depression grades according to their scores; organizing the data features obtained in the data collection step, including converting all categorical columns into one-hot codes representing each class; after checking the number of rows per label, balancing the data set by upsampling; separating the features and labels, and dividing the data set into a training set, a test set and a hold-out set;
s3: feature selection; applying the Max-Relevance and Min-Redundancy (MRMR) method to the preprocessed data, selecting the most relevant features according to the pairwise correlation or mutual information of each pair of variables in the data set while minimizing the redundancy among the variables, and determining which items in the examined scale carry more weight for predicting depression risk;
s4: machine learning; predicting low- and high-depression levels from the different combinations of items determined in feature selection; evaluating model performance to obtain a model based on a few questions, determining how many questions are needed to achieve an area under the receiver operating characteristic curve (AUC-ROC) of 90% on the hold-out data set; machine learning training is completed only on the training set, with internal validation on the test set; after one model is obtained, the training and test sets are recombined and split again in the same way to train and test another model; in total, 100 rounds of data recombination, splitting, training and testing are performed; the hyper-parameters are optimized to obtain the optimal model: the best-performing model, i.e., the one with the highest AUC and F1 values, is selected according to the AUC-ROC score and F1 score evaluated on the original validation set; the best hyper-parameters of each model, i.e., those yielding the highest AUC and F1 values, are obtained by grid search, with each iteration trained on a different combination of hyper-parameter values to determine which combination is optimal; finally, the best model is selected to evaluate the depression grade;
s5: execution; retaining the obtained optimal model, building a website application prototype and running the model, and checking the model's classification accuracy on individual data never used for training or testing, i.e., measuring model performance when evaluating real users.
7. The machine learning-based depression rating method of claim 6, wherein: the feature selection comprises the unsupervised MRMR technique and a supervised feature selection method;
the unsupervised MRMR technique selects the most relevant features based on the pairwise correlation or mutual information of each pair of variables in the data set while minimizing redundancy between the variables;
the supervised feature selection method fits an extremely randomized trees classifier (ETC) to all features and labels before ranking the most important features; the classifier fits several extremely randomized trees on sub-samples of the data set and averages the results; feature importance is obtained by computing the normalized total reduction of the split criterion brought by the feature, called the Gini Importance (GI); the importance of each feature is calculated as the sum, over all decision trees containing the feature, of the number of splits involving it, weighted in proportion to the number of samples it splits;
the most important features are obtained by ranking feature importance; finally, the most important items of the original scale, i.e., the items with higher weight for predicting whether the depression level is high or low, are selected by combining the results of the MRMR and the randomized tree classifier.
8. The machine learning-based depression rating method of claim 7, wherein: the tested models comprise logistic regression (LR), Gaussian naive Bayes (GNB), a support vector machine (SVM), random forest (RF), a multilayer perceptron neural network (MLP), the extreme gradient boosting decision tree XGBoost, and a stacked-generalization ensemble (Ensemble).
9. The machine learning-based depression rating method of claim 8, wherein: the loss function used for the machine learning training is a Binary Cross Entropy (BCE) loss;
the machine learning training is implemented in Python 3 using the Scikit-learn library, with default hyper-parameters and stopping criteria retained for fair comparison; the relative importance of the items is confirmed again using the SHAP method, checking which items of the depression-related scales contribute most to the overall prediction, computed with the Shapley value formula to yield values between 0 and 1; SHAP is a method for interpreting individual predictions, with the goal of explaining the prediction for an instance by computing the contribution of each feature to the prediction; the Shapley value was developed in cooperative game theory, where each cooperative game is assigned a unique distribution, among the players, of the total surplus generated by the coalition of all players;
the Shapley value is formulated as:
Figure FDA0003573955680000051
wherein
Figure FDA0003573955680000052
Is the weight, (v (S {) -v (S)), (S)) is the marginal contribution.
10. A machine learning based depression rating method as claimed in claim 9, wherein: the depression-related scales include the following scales:
Beck Depression Inventory (BDI);
Self-Rating Depression Scale (SDS);
Depression Status Inventory (DSI);
Carroll Rating Scale for Depression (CRS);
Center for Epidemiologic Studies Depression Scale (CES-D);
Depression Adjective Check List (DACL);
Depression Experiences Questionnaire (DEQ);
The Cognitive Bias Questionnaire (CBQ);
The Automatic Thoughts Questionnaire (ATQ);
The Geriatric Depression Scale (GDS);
Hamilton Rating Scale for Depression (HAMD);
Hospital Anxiety and Depression Scale (HAD);
Edinburgh Postnatal Depression Scale (EPDS);
Montgomery-Åsberg Depression Rating Scale (MADRS);
Children's Depression Inventory (CDI);
The Bech-Rafaelsen Melancholia Scale (MRS);
Depression Self-Rating Scale (DSRS);
Simpson Depression Rating Scale;
Kellner Depression Rating Scale;
The Depression in Old Age Scale (DIA-S);
and scales comprising depression dimensions:
Cornell Medical Index (CMI)
Symptom Checklist 90 (SCL-90)
Minnesota Multiphasic Personality Inventory (MMPI)
Depression Anxiety Stress Scale (DASS)
Quality of Life Scale (QOLS)
Irritability, Depression and Anxiety Scale (IDAS)
Profile of Mood States (POMS)
BFS Mood Scale (Befindlichkeits-Skala, BFS).
CN202210334156.8A 2022-03-30 2022-03-30 Depression rating system and method based on machine learning Pending CN114649075A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210334156.8A CN114649075A (en) 2022-03-30 2022-03-30 Depression rating system and method based on machine learning


Publications (1)

Publication Number Publication Date
CN114649075A (en) 2022-06-21

Family

ID=81995278

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210334156.8A Pending CN114649075A (en) 2022-03-30 2022-03-30 Depression rating system and method based on machine learning


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115910329A (en) * 2023-01-06 2023-04-04 江苏瑞康成医疗科技有限公司 Intelligent depression identification method and device
TWI818741B (en) * 2022-09-23 2023-10-11 國立彰化師範大學 Artificial intelligence anxiety level detection analysis system
CN117727456A (en) * 2023-12-28 2024-03-19 江苏智慧智能软件科技有限公司 Obstetrical psychological assessment model modeling system and method based on artificial intelligence
CN117727456B (en) * 2023-12-28 2024-05-24 江苏智慧智能软件科技有限公司 Obstetrical psychological assessment model modeling system and method based on artificial intelligence



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination