CN112037911B - Screening system for mental assessment based on machine learning and training method thereof - Google Patents


Info

Publication number
CN112037911B
CN112037911B (application number CN202010908710.XA)
Authority
CN
China
Prior art keywords
feature
training
classification model
items
sample data
Prior art date
Legal status
Active
Application number
CN202010908710.XA
Other languages
Chinese (zh)
Other versions
CN112037911A (en)
Inventor
纪俊
于滨
杜晓宁
于淏岿
冯超南
Current Assignee
Beijing Welline Pangu Technology Co ltd
Original Assignee
Beijing Welline Pangu Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Welline Pangu Technology Co ltd
Publication of CN112037911A
Application granted
Publication of CN112037911B


Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning


Abstract

The present disclosure describes a machine-learning-based mental assessment screening system and a training method thereof. The training method comprises two steps: training a first classification model and training a second classification model. To train the first classification model, first valid sample data are selected from first sample data, a plurality of feature items are selected as a first feature item list, a first feature data set is constructed, the first classification model is trained and tested using the first feature data set, and the M feature items of the best-performing first classification model are taken as a second feature item list. To train the second classification model, second valid sample data are selected from second sample data, a second feature data set is constructed, and the second classification model is trained and tested using the second feature data set; the best-performing second classification model and its L corresponding feature items are then selected. According to the present disclosure, the efficiency and accuracy of depression recognition can be improved.

Description

Screening system for mental assessment based on machine learning and training method thereof
Technical Field
The present disclosure relates generally to the field of medical prediction technology, and more particularly to a machine learning based mental assessment screening system and training method thereof.
Background
As social development and productivity levels rise, more and more people experience psychological or mental problems under heavy pressure from work, study, and daily life, and the incidence of mental illnesses such as depression remains high.
In the clinic, medical practitioners often use the affective disorder evaluation (ADE) scale to identify depression in patients. The ADE scale is a standardized interview-based tool revised according to the consensus of relevant specialists. It correlates strongly with affective disorders and can be used to identify depression. With its aid, medical workers can learn a patient's condition in detail, including the symptoms of each episode, past medical history, age of onset, medication and treatment efficacy, comorbidity, personality traits, personal history, and family history. By quantifying the score, the scale classifies depressive and non-depressive affective disorders with high accuracy, and its sensitivity and specificity can reach 0.9 or above.
However, because the original ADE scale contains a large amount of information, a doctor needs a relatively long time to complete an affective-disorder evaluation based on it (for example, the process of identifying depression), which limits its clinical application. In particular, the scale requires a relatively large amount of information to be gathered during use; its accuracy depends closely on the reliability of the sources of patient-related information, and the current severity of a patient's disease may distort his or her awareness of certain symptoms, biasing the collected information. Moreover, requiring a doctor to read a large amount of scale information in a short time may also reduce recognition accuracy.
Disclosure of Invention
The present disclosure has been made in view of the above state of the art, and its object is to provide a screening system for machine-learning-based mental assessment, and a training method thereof, capable of improving the efficiency and accuracy of depression recognition.
To this end, a first aspect of the present disclosure provides a training method of a screening system for machine-learning-based mental assessment, comprising two steps: training a first classification model and training a second classification model.
The training of the first classification model comprises the following steps: obtaining first sample data from the ADE scale samples of a plurality of evaluation objects, the first sample data including the evaluation objects together with the feature items and feature data corresponding to each evaluation object; selecting, from the first sample data, the samples of evaluation objects that have an explicit depression classification label, have signed informed consent, and are between 14 and 70 years of age, as first valid sample data for modeling; selecting feature items from the current condition part, the depression lifetime part, the mania lifetime part, and the psychotic part of the ADE scale, using a correlation detection algorithm to retain the feature items whose correlation is not smaller than a first preset value, excluding demographic feature items, then computing with the correlation detection algorithm the correlation between each disease-course-related feature item and the depression classification label and retaining only those whose correlation does not exceed a second preset value, so as to form a first feature item list; selecting one or more feature items from the first feature item list as different subsets and constructing different first feature data sets based on the respective subsets and the first valid sample data; dividing each first feature data set into a training set and a test set, training a machine-learning-based first classification model with the training set, testing it with the test set to obtain the depression classification results of the test set, and evaluating the performance of the first classification model corresponding to each first feature data set according to those results; and selecting the best-performing first classification model and taking its M corresponding feature items as a second feature item list, where M >= 47.
The training of the second classification model comprises the following steps: obtaining second sample data, different from the first sample data, from the ADE scale samples of a plurality of evaluation objects, the second sample data including the evaluation objects together with the second feature item list and the feature data corresponding to each evaluation object; selecting, from the second sample data, the samples of evaluation objects that have an explicit depression classification label, have signed informed consent, and are between 14 and 70 years of age, as second valid sample data for modeling; selecting one or more feature items from the second feature item list as different subsets and constructing different second feature data sets based on the respective subsets and the second valid sample data; dividing each second feature data set into a training set and a test set, training a machine-learning-based second classification model with the training set, testing it with the test set to obtain the depression classification results of the test set, and evaluating the performance of the second classification model corresponding to each second feature data set according to those results; and selecting the best-performing second classification model as the classification model for identifying depression, together with its L corresponding feature items, where 40 < L < 50 and L < M.
In the present disclosure, machine-learning-based classification models are trained multiple times with ADE scale samples of different origins to obtain a screening system for identifying depression. In this case, the resulting screening system can contain the key question items required to fulfil the function of the ADE scale, and its ability to identify depression can be comparable to that of the original ADE scale, so that depression can be identified and the accuracy and efficiency of depression identification can be improved.
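As a rough illustration of the two-threshold correlation filtering described above, the following sketch retains only feature items whose correlation with the depression classification label falls between the two preset values. The function name `filter_features` is hypothetical; the patent does not specify the correlation detection algorithm, so absolute Pearson correlation is assumed, with mid-range preset values of 35% and 85%.

```python
import numpy as np

def filter_features(X, y, names, low=0.35, high=0.85):
    """Two-threshold correlation filter (illustrative sketch).

    Retains the feature items whose absolute Pearson correlation with
    the depression classification label y is at least `low` (the first
    preset value) and does not exceed `high` (the second preset value);
    35% and 85% are assumed mid-range choices of those preset values.
    """
    kept = []
    for j, name in enumerate(names):
        r = abs(np.corrcoef(X[:, j], y)[0, 1])  # correlation of column j with the label
        if low <= r <= high:
            kept.append(name)
    return kept
```

A feature identical to the label (correlation 1) is dropped as excessively correlated, and a feature uncorrelated with the label is dropped as uninformative; only moderately correlated items survive.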
In the training method of the screening system according to the first aspect of the present disclosure, optionally, the first preset value is 30%-40%. Thus, feature items having little correlation with the depression classification label can be eliminated, and the quality of the training data can be improved.
In the training method of the screening system according to the first aspect of the present disclosure, optionally, the demographic feature items are name, age, and marital status; the disease-course-related feature items are the number of lifetime manic episodes, the number of manic episodes in the most recent year, the age at the first manic episode, the number of lifetime depressive episodes, the number of depressive episodes in the most recent year, the age at the first depressive episode, the total number of lifetime depressive episodes, the number of depressive episodes in the last 12 months, and the maximum number of depressive episodes in any 12-month period; and the second preset value is 80%-90%. Thus, feature items having an excessively high correlation with the depression classification label can be further excluded.
In the training method of the screening system according to the first aspect of the present disclosure, optionally, based on the feature data set corresponding to the first feature item list, the weight of each feature item in the first feature item list is calculated and a feature sequence is formed according to the weights; subsets of one or more feature items are then selected from the feature sequence in descending order of weight, growing in size one item at a time and always beginning with the feature item having the largest weight. The same procedure is applied to the second feature item list based on its corresponding feature data set. In this case, the number of candidate feature subsets can be reduced, so that the speed of computation can be increased.
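The subset-construction scheme above, nested subsets that grow in descending weight order and always start from the largest-weight feature, can be sketched as follows (an illustrative helper, not taken from the patent):

```python
def nested_subsets(weights):
    """Build candidate feature subsets from a {feature_item: weight} map.

    Feature items are sorted by weight in descending order, and the k-th
    subset consists of the k highest-weighted items, so every subset
    begins with the single largest-weight feature item.
    """
    ordered = sorted(weights, key=weights.get, reverse=True)
    return [ordered[:k] for k in range(1, len(ordered) + 1)]
```

For N feature items this produces only N candidate subsets instead of the 2^N - 1 possible ones, which is the computational saving the paragraph refers to.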
In the training method of the screening system according to the first aspect of the present disclosure, optionally, based on the feature data set corresponding to the first feature item list, the weight of each feature item of the first feature item list is calculated by a minimum-redundancy maximum-relevance (mRMR) algorithm, and the feature items are sorted by weight from largest to smallest to form the feature sequence; the second feature item list is weighted and sorted in the same way based on its corresponding feature data set. In this case, ordering the feature item list into a feature sequence facilitates the subsequent training of the classification model on the weighted feature items.
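A minimal sketch of a greedy minimum-redundancy maximum-relevance ranking is shown below. Absolute Pearson correlation stands in for the mutual-information terms of the textbook mRMR criterion, so this is an assumed simplification of the algorithm the patent names, not its exact form.

```python
import numpy as np

def mrmr_order(X, y):
    """Greedy mRMR-style feature ranking (illustrative sketch).

    At each step the feature maximizing
        relevance(f, y) - mean(redundancy(f, already selected))
    is appended, with |Pearson r| used for both terms.
    Returns column indices from highest to lowest weight.
    """
    n = X.shape[1]
    rel = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n)])
    order = [int(np.argmax(rel))]  # start with the most relevant feature
    while len(order) < n:
        best, best_score = None, -np.inf
        for j in range(n):
            if j in order:
                continue
            red = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1]) for s in order])
            score = rel[j] - red
            if score > best_score:
                best, best_score = j, score
        order.append(best)
    return order
```

An exact duplicate of an already selected feature is maximally redundant, so a fresh, moderately relevant feature is ranked ahead of it.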
In the training method of the screening system according to the first aspect of the present disclosure, optionally, each first feature data set is divided into K groups by the K-fold cross-validation method, and each of the K groups serves in turn as the test set while the remaining K-1 groups serve as the training set, where K is greater than or equal to 2; each second feature data set is divided into K groups and trained in the same way. In this case, all data in a feature data set can participate in training the classification model, whereby bias can be reduced.
In the training method of the screening system according to the present disclosure, optionally, K is 10. In this case, the proportion of the training set within the whole feature data set is moderate, which effectively suppresses both overfitting from an overly complex classification model and underfitting from an overly simple one.
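The K-fold scheme above (K=10 in the preferred case) can be sketched without any ML library as a plain index-splitting helper; the function name is hypothetical:

```python
def kfold_indices(n_samples, k=10):
    """Split indices 0..n_samples-1 into k folds; each fold serves once
    as the test set while the remaining k-1 folds form the training set
    (the K-fold cross-validation scheme described above, K=10 default)."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for i in range(k):
        # every fold except fold i goes into the training set
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, folds[i]))
    return splits
```

Every sample appears in exactly one test fold and in k-1 training sets, which is why all data participate in training, as the paragraph notes.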
In the training method of the screening system according to the first aspect of the present disclosure, optionally, the machine learning algorithm applied to the training set includes at least one of a random forest algorithm, a support vector machine algorithm, a least absolute shrinkage and selection operator (LASSO) algorithm, a linear discriminant analysis algorithm, and a logistic regression algorithm. In this case, the performance of classification models obtained from different machine learning algorithms can be compared, which facilitates selecting the classification model with the best performance.
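Since fig. 9 compares the candidate models by AUC value, a self-contained way to compute that metric for any of the classifiers is the Mann-Whitney formulation sketched below (an assumed evaluation utility, not part of the patent):

```python
def auc_score(labels, scores):
    """ROC AUC via the Mann-Whitney formulation: the probability that a
    randomly chosen positive case is scored higher than a randomly
    chosen negative one, counting ties as one half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    # count positive-negative pairs ranked correctly (ties worth 0.5)
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Applying this to the test-set scores of each trained model gives directly comparable numbers, mirroring the comparison in fig. 9.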
In the training method of the screening system according to the first aspect of the present disclosure, optionally, the depressive disorder classification labels are depressive and non-depressive. Thus, both depressive and non-depressive affective disorders can be identified by the screening system.
Additionally, a second aspect of the present disclosure provides a screening system for machine learning based mental assessment characterized by being trained using the training method provided by the first aspect of the present disclosure. In this case, depression can be accurately identified by the screening system and the identification result is output.
According to the present disclosure, a screening system for machine-learning-based mental assessment, and a training method thereof, capable of improving the efficiency and accuracy of depression recognition can be provided.
Drawings
The present disclosure will now be explained in further detail by way of example only with reference to the accompanying drawings, in which:
fig. 1 is a schematic view showing an application scenario of a screening system for machine learning-based mental assessment according to an example of the present disclosure.
Fig. 2 is a block diagram illustrating a screening system for machine learning based mental assessment in accordance with examples of the present disclosure.
Fig. 3 is a flowchart illustrating a training method of a screening system for machine-learning-based mental assessment in accordance with examples of the present disclosure.
Fig. 4 is a flow chart illustrating training of a first classification model in accordance with examples of the present disclosure.
Fig. 5 is a schematic diagram showing an ADE scale to which examples of the present disclosure relate.
Fig. 6 (a) is a schematic diagram showing first sample data table generation related to an example of the present disclosure.
Fig. 6 (b) is a schematic diagram showing first valid sample data table generation related to an example of the present disclosure.
Fig. 6 (c) is a schematic diagram showing first feature data set table generation related to an example of the present disclosure.
Fig. 7 is a schematic diagram showing numbered feature items in the ADE scale to which the examples of the present disclosure relate.
Fig. 8 is a schematic diagram showing unnumbered feature items in the ADE scale to which the examples of the present disclosure relate.
Fig. 9 is a graph showing AUC value comparisons of a first classification model corresponding to five machine learning algorithms according to examples of the present disclosure.
Fig. 10 is a flow chart illustrating training of a second classification model in accordance with examples of the present disclosure.
Detailed Description
Hereinafter, preferred embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. In the following description, the same members are denoted by the same reference numerals, and overlapping description thereof is omitted. In addition, the drawings are schematic, and the ratio of the sizes of the components to each other, the shapes of the components, and the like may be different from actual ones.
It should be noted that, in this disclosure, the terms "comprises" and "comprising", and any variations thereof, are intended to be non-exclusive: a process, method, system, article, or apparatus that comprises or has a list of steps or elements is not necessarily limited to the steps or elements expressly listed, but may include or have other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The present disclosure relates to a screening system and a training method thereof capable of improving recognition efficiency and accuracy of depression based on mental assessment of machine learning. The training method of the screening system based on the mental assessment of machine learning may be sometimes simply referred to as a training method, and the screening system based on the mental assessment of machine learning may be sometimes simply referred to as a screening system.
Fig. 1 is a schematic view showing an application scenario of a screening system for machine learning-based mental assessment according to an example of the present disclosure.
In some examples, the screening system of the present disclosure may be applied in mental assessment scenarios such as the assessment or screening of depression shown in fig. 1. Specifically, after acquiring the information input by the user 13, such as answers to question items concerning depression (including, for example, the feature items corresponding to the classification model 20 in the screening system 1 described later), the user terminal 11 submits the question items and their answers to the server 12 via a computer network. The server 12 may perform depression recognition on the answers using the screening system 1 and return the recognition result to the user terminal 11. In some examples, the user terminal 11 may display the depression recognition result. In some examples, the recognition result may be an intermediate result used to assist in identifying depression. Here, the question items may be those of an ADE scale commonly employed in clinical settings.
In some examples, the user terminal 11 may include, but is not limited to, a notebook computer, tablet computer, cell phone, desktop computer, or the like.
In some examples, server 12 may include one or more processors and one or more memories. The processor may include a central processing unit, a graphics processing unit, and any other electronic component capable of processing data and executing computer program instructions. The memory may be used to store computer program instructions. In some examples, the screening system 1 to which the present disclosure relates may be stored in the memory in the form of computer program instructions. In other examples, the screening system 1 may be stored in the user terminal 11 in the form of offline computer program instructions and executed by the user terminal 11. In some examples, the server 12 may also be a cloud server.
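The submit, classify, and return flow between user terminal 11 and server 12 can be sketched as a plain function; the handler name and label convention are assumptions, the `classify` callable stands in for the trained classification model 20, and no real network code is shown:

```python
def handle_screening_request(answers, classify):
    """Minimal sketch of the terminal-to-server flow described above:
    the terminal submits question items with answers, the server runs
    the classification model, and the recognition result is returned.

    `answers` maps question items to the user's answers; `classify`
    is assumed to return 1 for depressive and 0 for non-depressive.
    """
    features = [answers[item] for item in sorted(answers)]  # fixed item order
    label = classify(features)
    return {"result": "depressive" if label == 1 else "non-depressive"}
```

In a deployment the function body would sit behind the server's request endpoint, with `classify` loaded from the stored model instructions.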
Fig. 2 is a block diagram illustrating a screening system for machine learning based mental assessment in accordance with examples of the present disclosure.
In some examples, as shown in fig. 2, the screening system 1 may include an input unit 10, a classification model 20, and a display unit 30.
In some examples, the screening system 1 may include an input unit 10. The input unit 10 is used to acquire a question item including a basic question item (a feature item corresponding to the classification model 20 in the screening system 1) and an answer to the question item. In some examples, the screening system 1 may include a classification model 20. The classification model 20 may perform depression recognition on the question item acquired by the input unit 10 and the answer to the question item to obtain a depression classification result. In some examples, the screening system 1 may include a display unit 30. The display unit 30 is used to display the depression classification result. In some examples, the screening system 1 may be utilized to evaluate mental disorders.
In some examples, the classification model 20 of the screening system 1 may be used as an evaluation module of a mental assessment screening tool. The feature items corresponding to the classification model 20 of the screening system 1 may be used as input modules for a mental assessment screening tool.
In the training method of the screening system 1 according to the present disclosure, depressive and non-depressive affective-disorder outcomes derived from ADE-scale mental assessments are learned by machine learning, so that the screening system 1 can assist a doctor in the mental assessment of a patient. In addition, symptoms may be graded by the screening system 1. Furthermore, the training method of the screening system 1 according to the present disclosure can be readily generalized to the training of affective-disorder scales other than the ADE scale.
Additionally, in some examples, the screening system 1 may be trained multiple times with ADE scale samples of different sources according to different needs to continuously optimize the screening system 1. For example, feature items corresponding to the classification model 20 in the screening system 1 may be simplified by multiple training.
Hereinafter, the screening system 1 and the training method of the screening system 1 will be described in detail with reference to the drawings. Fig. 3 is a flowchart illustrating a training method of a screening system based on machine learning mental assessment in accordance with examples of the present disclosure.
In the disclosed example, the first classification model and the second classification model may be trained to obtain a second classification model that performs optimally, which may be used as classification model 20 to identify depression. As described above, the screening system 1 to which the present disclosure relates may comprise the classification model 20, and the classification model 20 may be obtained by training the first classification model and the second classification model, and thus the training of the screening system 1 to which the present disclosure relates may be regarded as training of the first classification model and the second classification model.
Specifically, as shown in fig. 3, the training method of the screening system 1 may include training of a first classification model (step S10) and training of a second classification model (step S20). A first classification model with optimal performance may be obtained via training of the first classification model, and training the second classification model based on the first classification model with optimal performance may obtain classification model 20 for identifying depression. In this case, the classification model in the screening system may contain key problem items required to satisfy the ADE scale function, and the ability to identify depression may be comparable to the original ADE scale, thereby enabling depression to be identified, and further enabling the accuracy and efficiency of depression identification to be improved.
Fig. 4 is a flow chart illustrating training of a first classification model in accordance with examples of the present disclosure.
In some examples, as described above, the training method of the screening system 1 may include training of the first classification model (step S10). In some examples, as shown in fig. 4, the training of the first classification model (step S10) may include the steps of: obtaining first sample data (step S11); selecting first valid sample data (step S12); selecting a plurality of characteristic items (step S13); constructing a first feature data set (step S14); training a first classification model (step S15); a first classification model with optimal performance is selected (step S16). Hereinafter, steps S11 to S16 will be described in detail with reference to the drawings.
In step S11, first sample data may be obtained. In some examples, the first sample data may be obtained from the ADE scale samples of a plurality of evaluation objects. As an example, the evaluation objects involved are participants examined at the multiple centers taking part in the study, including patients with depression and participants without depression. In this example, ADE scales were collected from 255 depressive patients and 588 non-depressive participants at collaborating medical facilities.
Fig. 5 is a schematic diagram showing an ADE scale to which examples of the present disclosure relate; only the beginning and end portions of the ADE scale P are presented in fig. 5. In addition, in some examples, the ADE scale (hereinafter sometimes simply "scale") of the evaluation object may be in paper form, electronic form, or both. The ADE scale is a standardized interview-based tool revised according to the consensus of relevant specialists; it correlates strongly with affective disorders and may be used for their evaluation, such as the identification of depression.
In some examples, the ADE scale may include assessment items covering the current condition, lifetime mania, lifetime depression, cyclothymic disorder, dysthymic disorder and subsyndromal mood elevation, psychosis, childhood, psychoactive substance use, treatment, mental state, medical history, laboratory assessment, family, social functioning, and a bipolar index.
In some examples, the ADE scale may be a Chinese or an English version. As an example, fig. 5 shows a schematic diagram of a Chinese-version ADE scale P, with only the beginning and end parts shown for convenience of presentation.
In addition, in some examples, the first sample data may include a plurality of evaluation objects and feature items and feature data corresponding to the respective evaluation objects. In some examples, the ADE scale of the evaluation object may be collated by manual collation in combination with a computer method to generate the machine-identifiable first sample data. In this case, the question item and answer in the ADE scale may correspond to the feature item and feature data of the sample data, respectively.
Fig. 6 (a) is a schematic diagram showing the generation of the first sample data table according to an example of the present disclosure, in which the ADE scale samples T11 of multiple evaluation objects are processed in step S11 to form the first sample data table T12. Fig. 6 (b) is a schematic diagram showing the generation of the first valid sample data table, in which the first sample data table T12 is filtered in step S12 to form the first valid sample data table T13. Fig. 6 (c) is a schematic diagram showing the generation of the first feature data set table, in which the first valid sample data table T13 and the subset L1 of the first feature item list form the first feature data set table T14 via step S14.
For example, in some examples, as shown in fig. 6 (a), the ADE scale samples T11 of a plurality of evaluation objects may be collated to form the first sample data table T12. In some examples, a row in the first sample data table T12 may represent the feature data of the ADE scale of one examined evaluation object, a column may represent the feature data corresponding to a certain feature item across the plurality of evaluation objects, and the ID column may hold the number of each evaluation object.
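The collation into a rectangular table T12 (rows for evaluation objects, columns for feature items, plus an ID column) might be sketched as follows; the input format, one answer dictionary per subject ID, is an assumption for illustration:

```python
def build_sample_table(scale_samples):
    """Collate ADE scale samples into a rectangular table (sketch).

    `scale_samples` maps a subject ID to a {feature_item: answer} dict;
    the result is a header row of feature items plus one row per
    subject, with None for unanswered items, mirroring how the rows
    and columns of the first sample data table are organised above.
    """
    # union of all feature items seen in any subject's scale
    columns = sorted({item for answers in scale_samples.values() for item in answers})
    header = ["ID"] + columns
    rows = [[sid] + [answers.get(c) for c in columns]
            for sid, answers in sorted(scale_samples.items())]
    return header, rows
```

Missing answers become explicit None cells, so the output is directly usable as a machine-readable feature matrix.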
Fig. 7 is a schematic diagram showing a numbered feature item in the ADE scale to which the examples of the present disclosure relate. Fig. 8 is a schematic diagram showing an unnumbered feature item in the ADE scale to which the examples of the present disclosure relate.
For convenience of presentation, the naming rule of the feature items according to the present embodiment is set as follows: the name of a feature item that is numbered in the ADE scale is generated by joining, with underscores and in order, the capital letter N, the number of the feature item in the scale, the name of the part of the ADE scale to which the feature item belongs, and the specific question item name; the name of a feature item that is not numbered in the ADE scale is generated in the same way, except that the lower-case letter n is used in place of the capital letter N and the sequence number of the feature item in the scale is used in place of its number. For example, fig. 7 shows a numbered feature item TZ1 in the ADE scale P, and fig. 8 shows an unnumbered feature item TZ2 in the ADE scale P. In this case, the feature items and feature data of the first sample data correspond unambiguously to the question items and answers in the ADE scale, whereby feature items can be conveniently searched for and the generated sample data can be used directly for machine learning and analysis. Although this embodiment specifically designates the above naming rule for feature items, the rule is merely for convenience of presentation and does not affect the final classification or evaluation result.
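A minimal sketch of the naming rule described above, assuming a simple Python helper that is not part of the disclosed system:

```python
def feature_item_name(numbered, seq, part, question):
    # Join, in order and with underscores: "N" (numbered item) or "n"
    # (unnumbered item) fused with the item's number or sequence number,
    # then the name of the part of the ADE scale, then the question item name.
    letter = "N" if numbered else "n"
    return "_".join([f"{letter}{seq}", part, question])
```

For example, `feature_item_name(True, 13, "present", "past two weeks low")` yields the name used in table 1.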
In some examples, the feature items may include the age item of the current condition portion of the ADE scale, the past-two-weeks item of the current condition portion, the severe mood item of the current condition portion, the major depressive episode item of the depression lifelong portion, the age of first episode item of the depression lifelong portion, the whether-mania item of the mania lifelong portion, and the overhead question item of the mania lifelong portion. As shown in table 1, a row in the first sample data may represent the feature data of one evaluation object, and a column in the first sample data may represent the feature data corresponding to a certain feature item across the plurality of evaluation objects. In some examples, the first sample data may include a depression classification label (described later) corresponding to the ADE scale of the evaluation subject.
Table 1 partial example of first sample data
In other examples, because of the large amount of information in the ADE scale, question items from one or more portions of the ADE scale may be selected and collated into sample data before the first sample data is generated.
In some examples, the first sample data is obtained by step S11, followed by step S12. In step S12, first valid sample data may be selected.
In some examples, valid sample data may be selected from the first sample data as the first valid sample data available for modeling. For example, as shown in fig. 6 (b), in some examples, the first sample data table T12 is subjected to sample selection to form a first valid sample data table T13, wherein the first valid sample data table T13 is a first valid sample data schematic table formed assuming that IDs 19, 35, and 51 are selected as valid sample data from the first sample data table T12. In some examples, the number of evaluation objects of the first valid sample data table T13 may be equal to or less than the number of evaluation objects of the first sample data table T12. Therefore, the quality of training data can be improved, and the accuracy of identifying depression by the screening system can be improved.
In some examples, the valid sample data selected from the first sample data may be sample data with an explicit depression classification label. The depression classification label may take the two classes depression and non-depression. Thus, both depression and non-depression can be identified by the first classification model. However, examples of the present disclosure are not limited thereto; for example, the depression classification label may be related to the severity of depression. In some examples, the depression classification label may be the result of consultation on the ADE scale of the evaluation subject by a plurality of clinical professionals. In some examples, the depression classification label may be obtained based on clinical diagnostic results.
In some examples, the valid sample data selected from the first sample data may be sample data for which an informed consent form has been signed. Evaluation subjects taking part in depression identification need to sign an informed consent form, which documents that the subject participates in the identification of depression voluntarily.
In some examples, the valid sample data selected from the first sample data may be sample data of evaluation subjects between 14 and 70 years of age. Owing to characteristics of depression itself, early adolescence is a predisposing stage of depression, and the age of first onset generally ranges from 15 to 25 years. Sample data of evaluation subjects aged 14 to 70 years can thus be selected as the first valid sample data. In this case, the age span of the first valid sample data is large, so more balanced first valid sample data can be obtained, and the generalization ability of the first classification model can thereby be improved.
In some examples, as described above, sample data with an explicit depression classification label, a signed informed consent form, and an evaluation subject between 14 and 70 years of age may be selected from the first sample data as the first valid sample data available for modeling. For example, in the first sample data shown in table 1 of step S11, the evaluation object with ID 2 has no clear depression classification label and the evaluation object with ID 3 is aged 11; neither satisfies the conditions for selecting sample data in step S12. In this case, the first valid sample data formed after excluding the sample data with ID 2 and ID 3 from table 1 is shown in table 2.
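The three selection conditions of step S12 can be sketched as a simple filter; the field names (`label`, `consent`, `age`) are hypothetical:

```python
def select_valid_samples(samples):
    # Keep only samples satisfying the three conditions of step S12:
    # an explicit depression classification label, a signed informed
    # consent form, and an evaluation subject aged 14 to 70.
    return [
        s for s in samples
        if s.get("label") in ("depression", "non-depression")
        and s.get("consent") is True
        and 14 <= s.get("age", -1) <= 70
    ]

samples = [
    {"ID": 1, "age": 25, "consent": True, "label": "non-depression"},
    {"ID": 2, "age": 30, "consent": True, "label": None},          # no clear label
    {"ID": 3, "age": 11, "consent": True, "label": "depression"},  # outside age range
    {"ID": 4, "age": 40, "consent": True, "label": "depression"},
]
valid = select_valid_samples(samples)  # keeps IDs 1 and 4
```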
Table 2 partial example of first valid sample data
Additionally, in step S12, in some examples, the first valid sample data available for modeling may be valid and structured sample data screened from the first sample data, which may be used as the training dataset of the screening system 1.
In other examples, prior to selecting the first valid sample data, sample data corresponding to evaluation subjects whose depression is due to other diseases (e.g., severe somatic disease) or who have a seizure history may be excluded from the first sample data.
In some examples, step S13 is performed next after the first valid sample data is obtained through step S12. In step S13, a plurality of feature items may be selected; that is, a number of feature items may be selected from the first valid sample data.
In some examples, feature items whose correlation with the depression classification label is too low or too high may be excluded. Thereby, the data quality of the feature data set constructed based on the selected feature items can be improved, and over-interpretation of the classification result can be suppressed.
In some examples, a plurality of feature items from the current condition portion, the depression lifelong portion, the mania lifelong portion, and the psychotic portion of the ADE scale may be selected from the first valid sample data. In some examples, the feature items of the current condition portion may be question items evaluating the subject's recent condition, e.g. the last two weeks, and may include the current state of taking medication, the duration of taking medication, etc. In some examples, the feature items of the depression lifelong portion may be question items related to the subject's depression, e.g. when the last depressive episode was, what the appetite was, etc. In some examples, the feature items of the mania lifelong portion may be question items related to assessing the subject for mood abnormalities, e.g. whether you feel so good or so excited that other people think you are not your normal self, whether being too excited has gotten you into trouble, etc. In some examples, the feature items of the psychotic portion may be question items associated with mental disorders, e.g. whether hallucinations have occurred, etc.
As described above, some examples of the first valid sample data are shown, but the present embodiment is not limited thereto. In some examples, a plurality of feature items corresponding to other portions of the ADE scale may be selected from the first valid sample data. For example, a plurality of feature items from the current condition portion, the depression lifelong portion, the medical history portion, etc. of the ADE scale may be selected from the first valid sample data.
Additionally, in some examples, a plurality of feature items whose correlation is not less than a first preset value may be selected from the first valid sample data. In some examples, the correlation detection algorithm between feature items may be implemented using the R language. Correlation analysis examines two or more correlated variables in order to measure the degree of correlation between them. In some examples, the first preset value may be 30%-40%. For example, the first preset value may be 30%, 32%, 33%, 35%, 38%, 40%, or the like. Thus, feature items having a small correlation with the classification label can be eliminated, and the quality of the training data can be improved. However, the present embodiment is not limited thereto, and the first preset value may take any value according to specific needs.
In addition, in some examples, several feature items may be excluded from the plurality of feature items to form the first feature item list. For example, the excluded feature items may include demographic feature items, feature items related to the course of the disease, feature items whose correlation with the depression classification label exceeds a second preset value, and so forth.
In some examples, as described above, the excluded feature items may include demographic feature items. In some examples, the demographic feature items may include name, age, marital status, and the like.
In some examples, as described above, the excluded feature items may also include feature items related to the course of the disease. In some examples, the feature items related to the course of the disease may include mania lifelong_number of lifelong episodes, mania lifelong_number of episodes in the past year, mania lifelong_age of first episode, depression lifelong_number of lifelong episodes, depression lifelong_number of episodes in the past year, depression lifelong_age of first episode, depression lifelong_total number of lifelong episodes, depression lifelong_number of episodes in the past 12 months, depression lifelong_episodes lasting up to 12 months, etc.
In some examples, "manic lifelong number of episodes" may refer to the feature term originating from the manic lifelong portion of the ADE table and the name of the feature term is called "lifelong number of episodes". The feature items such as the number of times of manic lifelong-near one year attacks, the age of manic lifelong-first attacks, the number of times of depression lifelong-lifelong attacks, the number of times of depression lifelong-near one year attacks, the age of depression lifelong-first attacks, the total number of times of depression lifelong-lifelong attacks, the number of times of depression lifelong-near 12 months attacks, the number of times of depression lifelong-up to 12 months attacks, etc. may be similar to the feature item "the number of times of manic lifelong-lifelong attacks" and are not repeated here.
In some examples, as described above, the excluded feature items may also include feature items whose correlation with the depression classification label exceeds a second preset value. In some examples, the second preset value may be 80%-90%. For example, the second preset value may be 80%, 82%, 83%, 85%, 88%, 90%, or the like. Thus, feature items having an excessive correlation with the depression classification label can be further excluded. However, the present embodiment is not limited thereto, and the second preset value may take any value according to specific needs.
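The patent implements correlation detection in the R language; the following Python sketch merely illustrates the two thresholds, using Pearson correlation as an assumed correlation measure: a feature item is kept only if its absolute correlation with the label lies between the first preset value (e.g. 35%) and the second preset value (e.g. 85%).

```python
from math import sqrt

def pearson(x, y):
    # Pearson correlation coefficient; assumes neither column is constant.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sum((v - mx) ** 2 for v in x)
    sy = sum((v - my) ** 2 for v in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sx * sy)

def keep_features(columns, label, low=0.35, high=0.85):
    # columns: {feature item name: feature data}; label: numerically
    # encoded depression classification labels. Keep a feature item only
    # if its absolute correlation with the label is not less than the
    # first preset value and does not exceed the second preset value.
    return {name: vals for name, vals in columns.items()
            if low <= abs(pearson(vals, label)) <= high}
```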
In some examples, as described above, excluding several feature items from the plurality of feature items forms the first feature item list. For example, 141 feature items whose correlation is not less than 35% may be selected from the current condition portion, the depression lifelong portion, the mania lifelong portion, and the psychotic portion of the ADE scale in the first valid sample data; after removing 28 feature items that are demographic feature items, feature items related to the course of the disease, or feature items whose correlation with the depression classification label exceeds 85%, the remaining 113 feature items are used as the first feature item list.
In some examples, step S14 is performed next after the first feature item list is obtained through step S13. In step S14, a first feature data set may be constructed.
In some examples, one or more feature items may be selected from the first feature item list as a subset. In some examples, a plurality of different subsets may be generated based on the first feature item list. In some examples, the union of the multiple different subsets may contain all of the feature items contained in the first feature item list. For example, assume that the first feature item list formed by selecting a plurality of feature items from the first valid sample data shown in table 2 is: n13_present_past two weeks low, n65_present_severe mood, and n30_manic lifelong_overhead; then all subsets of the first feature item list are as shown in table 3.
Table 3 subset of the first list of feature items
In some examples, the subset of the first list of feature items may be one or more of all subsets of the first list of feature items.
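Enumerating every subset of a feature item list, as in table 3, can be sketched with the standard library:

```python
from itertools import combinations

def all_subsets(feature_items):
    # All 2**n - 1 non-empty subsets of a feature item list
    # (a 3-item list yields 7 subsets, as in table 3).
    return [list(c) for r in range(1, len(feature_items) + 1)
            for c in combinations(feature_items, r)]
```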
Additionally, in some examples, a different first feature data set may be constructed based on each subset and the first valid sample data. For example, in some examples, as shown in fig. 6 (c), the first feature data set table T14 is constructed from the first valid sample data table T13 and the subset L1 of the first feature item list. In this case, assuming that the feature items in the subset L1 are feature item 21, feature item 33, and feature item 69, the sample data corresponding to these feature items may be selected from the first valid sample data table T13 to generate the first feature data set table T14. In some examples, a plurality of different first feature data sets may be constructed based on the different subsets of the first feature item list respectively; fig. 6 (c) merely illustrates how one first feature data set is constructed based on a subset of the first feature item list and the first valid sample data.
For example, the first feature data set constructed based on the first valid sample data table shown in table 2 and the subset 4 of the first feature item list shown in table 3 is shown in table 4, and may include the feature data of the two feature items n13_present_past two weeks low and n65_present_bad mood in subset 4, together with the depression classification label.
Table 4 partial examples of the first feature dataset constructed by subset 4
ID | N13_present_past two weeks low | N65_present_bad mood | Depression classification label
1 | 3 | Yes | Normal
4 | 1 | Yes | Depression
... | ... | ... | ...
In some examples, the number of feature items of the first feature data set table may be no greater than that of the first valid sample data, and the number of evaluation objects may be no greater than that of the first valid sample data.
As described above, in some examples, the subset of the first feature item list may be one or more of all subsets of the first feature item list. In some examples, the first feature item list may first be sorted before the subsets are generated. This can reduce the number of subsets.
In some examples, based on the feature data set corresponding to the first feature item list, the weights of the feature items in the first feature item list may be calculated and a feature sequence may be formed according to the weights, wherein the feature items in the feature sequence are ordered, either from large to small or from small to large. In some examples, the feature data set may be a data set generated based on the first valid sample data and the first feature item list. Specifically, the feature data corresponding to the feature items of the first feature item list may be obtained from the first valid sample data to generate the feature data set. In some examples, the weight of each feature item may be calculated from the feature data corresponding to that feature item in the feature data set.
In some examples, the respective weights of the feature items in the first feature item list may be calculated using a minimum-redundancy maximum-relevance algorithm (mRMR, Min-Redundancy and Max-Relevance) based on the feature data set corresponding to the first feature item list, and sorted from large to small by weight to form the feature sequence. In this case, the feature item list is ordered into a feature sequence, which facilitates the subsequent training of the first classification model based on the weighted feature items. In some examples, the feature sequence is as shown in table 5.
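A simplified greedy mRMR sketch, using discrete mutual information as the relevance and redundancy measure; production mRMR implementations (and the one used in this disclosure) may differ in the exact criterion:

```python
from collections import Counter
from math import log2

def mutual_info(x, y):
    # Discrete mutual information (in bits) between two equal-length columns.
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(c / n * log2((c / n) / (px[a] / n * py[b] / n))
               for (a, b), c in pxy.items())

def mrmr_rank(columns, label):
    # Greedy ordering: at each step pick the feature item maximizing
    # relevance (MI with the label) minus redundancy (mean MI with the
    # feature items already selected).
    remaining, order = dict(columns), []
    while remaining:
        def score(name):
            rel = mutual_info(remaining[name], label)
            red = (sum(mutual_info(remaining[name], columns[s]) for s in order)
                   / len(order)) if order else 0.0
            return rel - red
        best = max(remaining, key=score)
        order.append(best)
        del remaining[best]
    return order
```

Sorting feature items by this greedy order plays the role of the weight-ordered feature sequence described above.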
TABLE 5 partial examples of feature sequences
In some examples, subsets of successively increasing size may be selected from the feature sequence, each starting with the feature item with the greatest weight and adding feature items in order of decreasing weight. In some examples, the sequence numbers in the feature sequence of the feature items contained in each subset may be consecutive. In this case, the number of subsets obtained from the first feature item list equals the number of feature items in the first feature item list, which reduces the number of subsets generated based on the first feature item list and hence the number of training runs of the first classification model, thereby increasing the speed of the computer operation.
In some examples, the subsets of the first feature item list may be generated using a sequential forward selection algorithm. For example, taking the first three feature items in the feature sequence shown in table 5, n10_mania lifelong_euphoria, n16_present_past two weeks of interest decline, and n12_mania lifelong_irritability, as an example, the three subsets generated using the sequential forward selection algorithm are as follows:
subset G1: n10_mania lifelong_euphoria;
subset G2: n10_mania lifelong_euphoria, n16_present_past two weeks of reduced interest;
subset G3: n10_manic lifelong_euphoria, n16_present_past two weeks of reduced interest, n12_manic lifelong_irritability.
The feature item with the largest weight may be used as the initial feature item, and feature items may be added in order of decreasing weight to generate the subsets (e.g., the subsets G1 to G3), so that the number of feature items contained in the subsets G1 to G3 increases in turn. The subset G1 consists of the one feature item with the largest weight; the subset G2 consists of the feature items with the largest and second-largest weights; and the subset G3 consists of the feature items with the largest, second-largest, and third-largest weights. Similarly, assuming that there are w feature items in the first feature item list, the last subset Gw of the first feature item list equals the feature sequence. In some examples, the number of subsets of the first feature item list may be no greater than the number of items in the first feature item list. In addition, the present embodiment is not limited thereto, and the subsets of the first feature item list may also be generated using a sequential backward selection algorithm, a bidirectional search algorithm, or the like, according to different requirements.
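The sequential-forward generation of the nested subsets G1 to Gw can be sketched as follows (the feature item names are taken from the example above):

```python
def forward_subsets(feature_sequence):
    # Sequential forward selection: subset Gi consists of the first i
    # feature items of the weight-ordered feature sequence, so the number
    # of subsets equals the number w of feature items.
    return [feature_sequence[:i] for i in range(1, len(feature_sequence) + 1)]

seq = ["n10_mania lifelong_euphoria",
       "n16_present_past two weeks of interest decline",
       "n12_mania lifelong_irritability"]
g1, g2, g3 = forward_subsets(seq)  # the subsets G1 to G3 above
```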
In some examples, step S15 may be performed next after building the plurality of first feature data sets through step S14. In step S15, the first classification model may be trained.
In some examples, each of the first feature data sets in step S14 may be divided into a training set and a test set. In some examples, the first feature data set may be partitioned into K groups using a K-fold cross-validation method, and training may be performed with each of the K groups in turn as the test set and the remaining K-1 groups as the training set, where K is greater than or equal to 2. In some examples, the average of the results of the K tests may be used as a performance indicator of the first classification model. In this case, all data in the feature data set can participate in the training of the first classification model, whereby bias can be reduced.
In some examples, K may be 10. For example, the first feature data set may be randomly shuffled and divided into 10 data subsets using a 10-fold cross-validation method, with 1 of the data subsets taken in turn as the test set and the remaining 9 data subsets as the training set. In this case, the proportion of the training set in the whole feature data set is moderate, which effectively suppresses both over-fitting from an overly complex first classification model and under-fitting from an overly simple one.
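A minimal sketch of the 10-fold split described above, using only the standard library:

```python
import random

def k_fold_indices(n, k=10, seed=0):
    # Shuffle the sample indices and split them into k near-equal folds;
    # each fold serves once as the test set while the remaining k-1 folds
    # form the training set. Returns a list of (train, test) index pairs.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [(sorted(set(idx) - set(fold)), sorted(fold)) for fold in folds]
```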
In some examples, the first classification model based on machine learning may be trained using a training set. In some examples, the machine learning algorithm trained with the training set may include at least one of a Random Forest algorithm (Random Forest), a support vector machine algorithm (Support Vector Machine), a minimum absolute shrinkage and selection algorithm (Least absolute shrinkage and selection operator), a linear discriminant analysis algorithm (Linear Discriminant Analysis), and a logistic regression algorithm (Logistic Regression). In this case, the performance of the first classification model obtained based on different machine learning algorithms can be compared, and thus, it can be convenient to select a first classification model having better performance. In some examples, the machine learning algorithm employed by the first classification model is a random forest algorithm.
In some examples, the first classification model may be tested with the test set to obtain depression classification results for the test set. In some examples, the depression classification result may be one of the two classes depression and non-depression. In some examples, the depression classification results may correspond one-to-one with the depression classification labels. In some examples, the performance of the first classification model corresponding to each first feature data set may be evaluated based on the classification results. The performance of the first classification model according to the present embodiment can be evaluated by sensitivity, specificity, and the AUC value obtained from the sensitivity and specificity. Here, the AUC (Area Under Curve) is the area under the ROC curve; the ROC curve (Receiver Operating Characteristic curve) is obtained by plotting sensitivity on the ordinate against 1-specificity on the abscissa, calculating sensitivity and 1-specificity for successive groupings of the measured data, and connecting the resulting points into a line. In general, the greater the AUC value of the first classification model, the better its performance.
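Sensitivity, specificity, and a rank-based AUC can be computed as follows; this is an illustrative sketch, not the disclosed implementation:

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, scores, positive=1):
    # Rank-based AUC: the probability that a randomly chosen positive
    # sample is scored higher than a randomly chosen negative one,
    # equivalent to the area under the ROC curve (ties count half).
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```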
In some examples, the performance of the first classification model corresponding to each first feature data set may be obtained by performing steps S10 through S15, after which step S16 is performed. In step S16, the first classification model with optimal performance may be selected. In some examples, the M feature items corresponding to the first classification model with optimal performance may be used as the second feature item list. Thus, the second feature item list can be obtained by training the first classification model. In some examples, M is greater than or equal to 47.
Specifically, the first classification model with the optimal performance can be selected by comparing one or more of sensitivity, specificity and AUC values of the first classification model corresponding to each first characteristic data set. In addition, the feature item corresponding to the first classification model with the optimal performance can be used as the second feature item list. In general, the larger the AUC value, the better the classification performance of the first classification model.
Fig. 9 is a graph showing AUC value comparisons of a first classification model corresponding to five machine learning algorithms according to examples of the present disclosure. For example, as shown in fig. 9, line a is the AUC distribution of the first classification model trained based on the random forest algorithm. Line B is the AUC distribution trained based on the first classification model of the support vector machine algorithm. Line C is the AUC distribution trained based on the first classification model of the minimum absolute shrinkage and selection algorithm. Line D is the AUC distribution trained based on the first classification model of the linear discriminant analysis algorithm. Line E is the AUC distribution trained based on the first classification model of the logistic regression algorithm.
The training data for the AUC distributions shown in fig. 9 is the ADE scales of 255 depression patients and 588 non-depression subjects from a cooperating medical institution. With the first preset value set to 35% and the second preset value set to 85%, the remaining 113 feature items after the feature item selection and exclusion processing of step S13 are used as the first feature item list. As can be seen from fig. 9, the AUC values of the first classification model based on the random forest algorithm are distributed at the top of fig. 9, so the first classification model trained by the random forest algorithm is superior to the other models.
In some examples, the performance of the first classification model of the random forest algorithm reaches its optimum when 47 items are included in the subset of the first feature item list. In some examples, when 47 items are included in the subset of the first feature item list, the sensitivity is 0.8587, the specificity is 0.9558, and the AUC value is 0.9676. In this case, after training the first classification model, all the feature items of the ADE scale can be reduced to 47 feature items without significantly reducing the performance of the resulting first classification model with optimal performance. In some examples, the AUC value of the first classification model that performs optimally is as high as 90% or more.
In other examples, step S14 and step S15 may also be performed alternately in the training of the first classification model described above. Specifically, step S14 is performed to select a first subset and construct a first feature data set, and then step S15 is performed to train the first classification model and obtain its performance on that feature data set; step S14 is then performed again to select a second subset and construct a second feature data set, after which step S15 is performed again to train the first classification model and obtain its performance on the second feature data set; steps S14 and S15 are executed alternately in this manner until all feature data sets have been traversed.
In some examples, a performance threshold may be set during training of the first classification model, and generating the subset of the first list of feature items is stopped when the performance of the first classification model is greater than or equal to the performance threshold. This can further reduce the number of subsets in the first feature item list, and can increase the speed of computer operation.
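The early-stopping search described above can be sketched as follows, where `evaluate` stands for the whole construct-train-test cycle of steps S14 and S15 and the threshold value is illustrative:

```python
def search_with_threshold(subsets, evaluate, threshold):
    # Evaluate the candidate subsets in order (steps S14/S15 executed
    # alternately) and stop generating further subsets as soon as the model
    # performance (e.g. AUC) reaches the threshold.
    # Returns (best_subset, best_performance).
    best = None
    for sub in subsets:
        perf = evaluate(sub)
        if best is None or perf > best[1]:
            best = (sub, perf)
        if perf >= threshold:
            break
    return best
```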
Fig. 10 is a flow chart illustrating training of a second classification model in accordance with examples of the present disclosure.
In some examples, as described above, the training method of the screening system 1 may include training of the second classification model (step S20) (see fig. 3). As shown in fig. 10, the training of the second classification model (step S20) may include the steps of: obtaining second sample data (step S21); selecting second valid sample data (step S22); constructing a second feature data set (step S23); training a second classification model (step S24); the second classification model with the best performance is selected (step S25). Hereinafter, steps S21 to S25 will be described in detail with reference to the drawings.
In step S21, second sample data may be obtained. In some examples, the second sample data may be obtained from ADE scale samples of a plurality of evaluation objects. In some examples, the second sample data may include a plurality of evaluation objects and a second list of feature items and feature data corresponding to each evaluation object. In some examples, the second sample data may be different from the first sample data.
In some examples, the ADE scale samples in step S21 may be from a different source than the ADE scale samples in step S11. For example, the ADE scale samples in step S21 may be ADE scale samples from a different region than those of step S11. In this case, the second classification model is trained based on different ADE scale samples, which can alleviate the problem that, owing to insufficient sample data in the training of the first classification model, the predicted classification result cannot generalize to more data to be classified, and can thereby improve the generalization ability of the screening system. For the acquisition of the second sample data in step S21, reference may be made to step S11 described above.
In step S22, second valid sample data may be selected. In some examples, sample data may be selected from the second sample data as second valid sample data available for modeling. In some examples, the sample data selected may be sample data with an explicit depression classification label, signed informed consent, and evaluated for subjects between 14 and 70 years of age. The selection of the second valid sample data in step S22 may be specifically referred to above in step S12.
In step S23, a second feature data set may be constructed. In some examples, one or more feature items may be selected from the second feature item list to form different subsets. In some examples, the second feature item list may be obtained based on the first classification model; for example, the feature items corresponding to the first classification model with optimal performance may be used as the second feature item list. In some examples, weights of the feature items in the second feature item list may be calculated based on the feature data set corresponding to the second feature item list, and a feature sequence may be formed from the respective weights. In some examples, the weights may be calculated using a minimum redundancy maximum relevance (mRMR) algorithm, and the feature items may be sorted by weight from largest to smallest to form the feature sequence. In some examples, subsets of successively increasing size may then be taken from the feature sequence, each subset starting from the feature item with the greatest weight and adding feature items in order of decreasing weight. In some examples, a different second feature data set may be constructed based on each subset and the second valid sample data. For the construction of the second feature data set in step S23, reference may be made to step S14 described above.
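The weight ranking and nested-subset construction described above can be sketched as follows. This is a minimal, dependency-free illustration, not the patent's implementation: the mRMR score is approximated with absolute Pearson correlation for both relevance and redundancy (a practical implementation would typically use mutual information via a library such as scikit-learn), and the helper names `mrmr_rank` and `nested_subsets` are assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    vy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (vx * vy) if vx and vy else 0.0

def mrmr_rank(columns, labels):
    """Greedy mRMR-style ranking: at each step pick the feature whose
    relevance to the label minus its mean redundancy to the already
    selected features is largest."""
    relevance = [abs(pearson(col, labels)) for col in columns]
    selected, remaining = [], list(range(len(columns)))
    while remaining:
        def score(j):
            red = (sum(abs(pearson(columns[j], columns[k])) for k in selected)
                   / len(selected)) if selected else 0.0
            return relevance[j] - red
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected  # feature indices ordered from greatest weight

def nested_subsets(ranked):
    """Subsets of increasing size, each starting from the top-weighted item."""
    return [ranked[:k] for k in range(1, len(ranked) + 1)]
```

Each subset returned by `nested_subsets`, combined with the second valid sample data, would then yield one second feature data set.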
In step S24, a second classification model may be trained. In some examples, each of the second feature data sets constructed in step S23 may be divided into a training set and a test set. In some examples, each second feature data set may be partitioned into K groups using a K-fold cross-validation method, with each of the K groups serving in turn as the test set and the remaining K-1 groups as the training set, where K is greater than or equal to 2. In some examples, K may be 10. In some examples, the machine learning-based second classification model may be trained using the training set. In some examples, the second classification model may be tested using the test set to obtain a depression classification result for the test set, and the performance of the second classification model corresponding to each second feature data set may be assessed based on the classification result. In some examples, the depression classification result may be depressive or non-depressive.
In some examples, the machine learning algorithm may include at least one of a random forest algorithm (Random Forest), a support vector machine algorithm (Support Vector Machine), a least absolute shrinkage and selection operator (LASSO) algorithm, a linear discriminant analysis algorithm (Linear Discriminant Analysis), and a logistic regression algorithm (Logistic Regression). For the training of the second classification model in step S24, reference may be made to step S15 described above, and it is not repeated here.
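The K-fold training and evaluation loop of step S24 can be sketched roughly as follows, using only the standard library. Here `fit` and `predict` stand in for whichever estimator is chosen (random forest, SVM, etc., e.g. from scikit-learn), and pooling the confusion counts across folds is one common way to obtain overall sensitivity and specificity; the patent does not prescribe this exact aggregation, so treat it as an assumption.

```python
import random

def kfold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k roughly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec

def cross_validate(X, y, fit, predict, k=10):
    """Each fold serves once as the test set while the remaining k-1
    folds form the training set; confusion counts are pooled."""
    folds = kfold_indices(len(y), k)
    y_true, y_pred = [], []
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        model = fit([X[j] for j in train], [y[j] for j in train])
        y_true += [y[j] for j in test]
        y_pred += [predict(model, X[j]) for j in test]
    return sensitivity_specificity(y_true, y_pred)
```

Running `cross_validate` once per candidate algorithm and per feature data set yields the per-model performance used to select the best second classification model.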
In step S25, the second classification model with optimal performance may be selected. In some examples, the second classification model with optimal performance may be used as the classification model 20 to identify depression. In some examples, in step S25, the L feature items corresponding to the second classification model with optimal performance may also be obtained. In this case, the feature items corresponding to the classification model 20 are the L feature items corresponding to the second classification model with optimal performance.
In some examples, the screening system 1 including the classification model 20 may be obtained via the training method of the present disclosure. In the screening system 1, the L feature items corresponding to the classification model 20 may serve as basic question items, and a questionnaire containing these basic question items, together with the answers to them, may be taken as the input to the classification model 20. In some examples, the question items corresponding to the L feature items of the classification model 20 may be regarded as the key question items the ADE scale requires to identify depression. That is, the ADE scale may be simplified down to the L feature items while remaining comparable in identification capability to the original ADE scale.
In some examples, the number of feature items corresponding to the second classification model with optimal performance may differ depending on the ADE scale from which it was obtained. In some examples, the number of feature items corresponding to the second classification model with optimal performance may be less than the number of feature items corresponding to the first classification model with optimal performance. In some examples, 40 < L < 50 and L < M, where M is the number of feature items corresponding to the first classification model with optimal performance.
In some examples, the 47 feature items obtained by the training of the first classification model described above were used as the second feature item list, and second sample data were drawn from ADE scales of 201 depressed patients and 119 non-depressed patients at a collaborating medical institution different from the one involved in the training of the first classification model. When the subset of the second feature item list contains 43 feature items, the performance of the second classification model reaches the first extreme point, and its sensitivity, specificity, and AUC values all approach the corresponding values of the second classification model based on all 47 feature items. It follows that the performance of the second classification model is optimal when the subset of the second feature item list contains 43 items. In this case, the number of feature items corresponding to the second classification model with optimal performance is 4 fewer than that of the first classification model with optimal performance, yet the ability to identify depression is not significantly reduced.
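The "first extreme point" criterion described above, i.e. picking the smallest subset whose performance already approaches that of the full feature list, might be formalized as in the sketch below. The tolerance value and the function name are assumptions for illustration; the patent does not state a numeric threshold.

```python
def smallest_sufficient_subset(auc_by_size, tolerance=0.01):
    """Given a mapping {subset size -> AUC}, return the smallest subset
    size whose AUC comes within `tolerance` of the AUC obtained with
    the largest (full) feature list."""
    full_auc = auc_by_size[max(auc_by_size)]
    for size in sorted(auc_by_size):
        if auc_by_size[size] >= full_auc - tolerance:
            return size
    return max(auc_by_size)
```

With AUC values resembling those reported (a subset of 43 items approaching the 47-item model), this rule would pick 43 as the simplified scale size.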
In other examples, step S23 and step S24 may also be performed in an interleaved manner in the training of the second classification model described above. Specifically, step S23 is performed to select the first subset and construct its feature data set; step S24 is then performed to train a second classification model on it and obtain its performance; step S23 is performed again to select the second subset and construct the corresponding feature data set; step S24 is then performed again to obtain the performance of the second classification model trained on that data set. Step S23 and step S24 are alternated in this way until all feature data sets have been traversed.
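The interleaved execution of steps S23 and S24 amounts to a simple loop over the nested subsets, building each feature data set immediately before training on it. The helper names below are illustrative, not from the patent; `build_dataset` and `train_and_score` stand in for the actual data-set construction and model-training routines.

```python
def alternate_select_and_train(ranked, build_dataset, train_and_score):
    """Interleave step S23 (construct the next feature data set) with
    step S24 (train and score a model on it) until every nested
    subset of the weight-ordered feature list has been traversed."""
    results = {}
    for size in range(1, len(ranked) + 1):
        subset = ranked[:size]                    # step S23: next subset
        dataset = build_dataset(subset)           # step S23: its data set
        results[size] = train_and_score(dataset)  # step S24: performance
    return results
```

The returned per-size performance map is what step S25 would scan to find the best-performing second classification model.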
In other examples, the screening system 1 may be trained multiple times to continuously optimize it, whereby a screening system with better performance can be obtained. For example, the L feature items corresponding to the classification model 20 of the screening system 1 may be used as a third feature item list, and the screening system 1 may be trained again based on the third feature item list.
In the present disclosure, a machine learning-based classification model is trained multiple times with ADE scale samples of different origins to obtain a screening system for identifying depression. The screening system so obtained may contain the key question items required to fulfill the function of the ADE scale, and its ability to identify depression may be comparable to that of the original ADE scale; depression can thus be identified, and the accuracy and efficiency of depression identification can be improved.
While the disclosure has been described in detail in connection with the drawings and examples, it is to be understood that the foregoing description is not intended to limit the disclosure in any way. Modifications and variations of the present disclosure may be made as desired by those skilled in the art without departing from the true spirit and scope of the disclosure, and such modifications and variations fall within the scope of the disclosure.

Claims (10)

1. A training method of a machine learning-based mental assessment screening system, characterized by comprising two steps, training a first classification model and training a second classification model,
the training of the first classification model comprises the steps of:
obtaining first sample data from ADE scale samples of a plurality of evaluation objects, the first sample data including the plurality of evaluation objects and feature items and feature data corresponding to the respective evaluation objects;
selecting, from the first sample data, sample data having an explicit depression classification label, signed informed consent, and an evaluation subject aged between 14 and 70 years, as first valid sample data for modeling;
selecting, from the first valid sample data, feature items from the current-condition part, the depression-lifetime part, the mania-lifetime part, and the psychotic part of the ADE scale; using a correlation detection algorithm, retaining a plurality of feature items whose correlation is not smaller than a first preset value; excluding demographic feature items from the plurality of feature items; calculating, with the correlation detection algorithm, correlation values between disease-course-related feature items and the depression classification label, and retaining the feature items whose correlation does not exceed a second preset value, so as to form a first feature item list;
selecting one or more feature items from the first feature item list as different subsets and constructing different first feature data sets based on the respective subsets and the first valid sample data;
dividing each first feature data set into a training set and a test set, training a machine learning-based first classification model using the training set, testing the first classification model using the test set to obtain a depression classification result for the test set, and evaluating the performance of the first classification model corresponding to each first feature data set according to the classification result; and
selecting the first classification model with optimal performance, and taking the M feature items corresponding to the first classification model with optimal performance as a second feature item list, wherein M >= 47;
the training of the second classification model comprises the following steps:
obtaining second sample data from ADE scale samples of a plurality of evaluation objects, the second sample data including a plurality of evaluation objects and the second feature item list and feature data corresponding to each evaluation object, the second sample data being different from the first sample data;
selecting, from the second sample data, sample data having an explicit depression classification label, signed informed consent, and an evaluation subject aged between 14 and 70 years, as second valid sample data for modeling;
selecting one or more feature items from the second feature item list as different subsets and constructing different second feature data sets based on the respective subsets and the second valid sample data;
dividing each second feature data set into a training set and a test set, training a machine learning-based second classification model using the training set, testing the second classification model using the test set to obtain a depression classification result for the test set, and evaluating the performance of the second classification model corresponding to each second feature data set according to the classification result; and
selecting the second classification model with optimal performance as the classification model for identifying depression, and obtaining L feature items corresponding to the second classification model with optimal performance, wherein 40 < L < 50 and L < M.
2. Training method according to claim 1, characterized in that:
the first preset value is 30% -40%.
3. Training method according to claim 1 or 2, characterized in that:
the demographic feature items are name, age, and marital status; the disease-course-related feature items are: mania lifetime - number of lifetime episodes, mania lifetime - number of episodes in the past year, mania lifetime - age at first episode, depression lifetime - number of lifetime episodes, depression lifetime - number of episodes in the past year, depression lifetime - age at first episode, depression lifetime - total number of lifetime episodes, depression lifetime - number of episodes in the past 12 months, depression lifetime - maximum number of episodes in 12 months; and the second preset value is 80%-90%.
4. The training method of claim 1, wherein,
calculating weights of the feature items in the first feature item list based on the feature data set corresponding to the first feature item list, forming a feature sequence from the respective weights, and selecting from the feature sequence, as different subsets, groups of one or more feature items of successively increasing size, each starting from the feature item with the greatest weight and adding feature items in order of decreasing weight;
calculating weights of the feature items in the second feature item list based on the feature data set corresponding to the second feature item list, forming a feature sequence from the respective weights, and selecting from the feature sequence, as different subsets, groups of one or more feature items of successively increasing size, each starting from the feature item with the greatest weight and adding feature items in order of decreasing weight.
5. The training method of claim 4, wherein:
calculating the weights of the feature items of the first feature item list using a minimum redundancy maximum relevance algorithm based on the feature data set corresponding to the first feature item list, and sorting the feature items by weight from largest to smallest to form the feature sequence;
and calculating the weights of the feature items of the second feature item list using a minimum redundancy maximum relevance algorithm based on the feature data set corresponding to the second feature item list, and sorting the feature items by weight from largest to smallest to form the feature sequence.
6. Training method according to claim 1, characterized in that:
dividing each first feature data set into K groups using a K-fold cross-validation method, and training with each of the K groups in turn as the test set and the remaining K-1 groups as the training set, wherein K is greater than or equal to 2;
and dividing each second feature data set into K groups using a K-fold cross-validation method, and training with each of the K groups in turn as the test set and the remaining K-1 groups as the training set, wherein K is greater than or equal to 2.
7. The training method of claim 6, wherein:
k is 10.
8. Training method according to claim 1, characterized in that:
the machine learning algorithm used to train on the training set comprises at least one of a random forest algorithm, a support vector machine algorithm, a least absolute shrinkage and selection operator (LASSO) algorithm, a linear discriminant analysis algorithm, and a logistic regression algorithm.
9. Training method according to claim 1, characterized in that:
the depression classification labels are depression and non-depression.
10. A machine learning-based mental assessment screening system, characterized in that the screening system is trained by the training method of any one of claims 1 to 9.
CN202010908710.XA 2020-08-28 2020-09-02 Screening system for mental assessment based on machine learning and training method thereof Active CN112037911B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010883921 2020-08-28
CN2020108839212 2020-08-28

Publications (2)

Publication Number Publication Date
CN112037911A CN112037911A (en) 2020-12-04
CN112037911B true CN112037911B (en) 2024-03-05

Family

ID=73591107

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010908710.XA Active CN112037911B (en) 2020-08-28 2020-09-02 Screening system for mental assessment based on machine learning and training method thereof

Country Status (1)

Country Link
CN (1) CN112037911B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106407733A (en) * 2016-12-12 2017-02-15 兰州大学 Depression risk screening system and method based on virtual reality scene electroencephalogram signal
CN107346433A (en) * 2016-05-06 2017-11-14 华为技术有限公司 A kind of text data sorting technique and server
CN108335749A (en) * 2018-01-26 2018-07-27 首都师范大学 Depression data analysing method and device
CN109350032A (en) * 2018-10-16 2019-02-19 武汉中旗生物医疗电子有限公司 A kind of classification method, system, electronic equipment and storage medium
CN109378065A (en) * 2018-10-30 2019-02-22 医渡云(北京)技术有限公司 Medical data processing method and processing device, storage medium, electronic equipment
KR20190125153A (en) * 2018-04-27 2019-11-06 아토머스 주식회사 An apparatus for predicting the status of user's psychology and a method thereof
WO2020037244A1 (en) * 2018-08-17 2020-02-20 Henry M. Jackson Foundation For The Advancement Of Military Medicine Use of machine learning models for prediction of clinical outcomes
CN111462841A (en) * 2020-03-12 2020-07-28 华南理工大学 Depression intelligent diagnosis device and system based on knowledge graph

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018524137A (en) * 2015-06-15 2018-08-30 メディバイオ リミテッドMedibio Limited Method and system for assessing psychological state
US11023824B2 (en) * 2017-08-30 2021-06-01 Intel Corporation Constrained sample selection for training models
US20190117143A1 (en) * 2017-10-23 2019-04-25 Massachusetts Institute Of Technology Methods and Apparatus for Assessing Depression
US11545173B2 (en) * 2018-08-31 2023-01-03 The Regents Of The University Of Michigan Automatic speech-based longitudinal emotion and mood recognition for mental health treatment

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107346433A (en) * 2016-05-06 2017-11-14 华为技术有限公司 A kind of text data sorting technique and server
CN106407733A (en) * 2016-12-12 2017-02-15 兰州大学 Depression risk screening system and method based on virtual reality scene electroencephalogram signal
CN108335749A (en) * 2018-01-26 2018-07-27 首都师范大学 Depression data analysing method and device
KR20190125153A (en) * 2018-04-27 2019-11-06 아토머스 주식회사 An apparatus for predicting the status of user's psychology and a method thereof
WO2020037244A1 (en) * 2018-08-17 2020-02-20 Henry M. Jackson Foundation For The Advancement Of Military Medicine Use of machine learning models for prediction of clinical outcomes
CN109350032A (en) * 2018-10-16 2019-02-19 武汉中旗生物医疗电子有限公司 A kind of classification method, system, electronic equipment and storage medium
CN109378065A (en) * 2018-10-30 2019-02-22 医渡云(北京)技术有限公司 Medical data processing method and processing device, storage medium, electronic equipment
CN111462841A (en) * 2020-03-12 2020-07-28 华南理工大学 Depression intelligent diagnosis device and system based on knowledge graph

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Implementing machine learning in bipolar diagnosis in China; Ma, YT et al.; TRANSLATIONAL PSYCHIATRY (No. 9); pp. 1-7 *
Research progress on the application of artificial intelligence in the field of depression; Han Jiali; Feng Lei; Beijing Medicine (No. 04); pp. 50-52, 55 *
Optimization and application of psychiatric scale tools based on machine learning; Feng Chaonan; China Master's Theses Full-text Database, Medicine and Health Sciences (No. 02); pp. E071-93 *
Simplification of the Symptom Checklist-90 (SCL-90) based on gradient boosting regression trees; Liu Jinming; Yu Haojun; Feng Chaonan; Li Yuming; Xiao Xun; Du Xiaoning; Yu Bin; Ji Jun; Journal of Qingdao University (Natural Science Edition) (No. 02); pp. 35-40 *

Also Published As

Publication number Publication date
CN112037911A (en) 2020-12-04

Similar Documents

Publication Publication Date Title
Yadav et al. Heart Diseases Prediction using Machine Learning
Altmejd et al. Predicting the replicability of social science lab experiments
US11232344B2 (en) Multi-task feature selection neural networks
KR102351306B1 (en) risk SNPs information generating apparatus for each disease based on disease-related SNPs analysis and method therefor
CN112635011A (en) Disease diagnosis method, disease diagnosis system, and readable storage medium
WO2022204803A1 (en) System and method for concise assessment generation using machine learning
CN113807612A (en) Prediction method and device based on mental scale data
Shrestha et al. Supervised machine learning for early predicting the sepsis patient: modified mean imputation and modified chi-square feature selection
Molla et al. A predictive analysis framework of heart disease using machine learning approaches
CN117391258B (en) Method, device, equipment and storage medium for predicting negative carbon emission
Makri et al. Towards a more accurate and fair SVM-based record linkage
Ma et al. Predicting coronary heart disease in Chinese diabetics using machine learning
Babu et al. Implementation of partitional clustering on ILPD dataset to predict liver disorders
CN112037911B (en) Screening system for mental assessment based on machine learning and training method thereof
Langfelder et al. Package ‘WGCNA’
WO2023061174A1 (en) Method and apparatus for constructing risk prediction model for autism spectrum disorder
Kantayeva et al. Application of machine learning in dementia diagnosis: A systematic literature review
US20210358317A1 (en) System and method to generate sets of similar assessment papers
Wijiasih et al. The Classification of Anxiety, Depression, and Stress on Facebook Users Using the Support Vector Machine
Menger et al. Using cluster ensembles to identify psychiatric patient subgroups
JP2011007686A (en) Device and method for analyzing clinical inspection result
Pikoula et al. Evaluation of data processing pipelines on real-world electronic health records data for the purpose of measuring patient similarity
Brasier et al. Analysis and predictive modeling of asthma phenotypes
Chung et al. Clinical knowledge graph embedding representation bridging the gap between electronic health records and prediction models
Albi et al. A Topological Data Analysis Framework for Computational Phenotyping

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant