CN112599218A

CN112599218A - Training method and prediction method of drug sensitivity prediction model and related device

Info

Publication number: CN112599218A
Application number: CN202011492075.8A
Authority: CN
Inventors: 王爱兰; 倪海洪; 翟晓庆
Original assignee: Beijing Deep Intelligent Pharma Technology Co ltd
Current assignee: Beijing Deep Intelligent Pharma Technology Co ltd
Priority date: 2020-12-16
Filing date: 2020-12-16
Publication date: 2021-04-02

Abstract

The application provides a training method and a prediction method for a drug sensitivity prediction model, and related devices. A first set number of cell lines is sampled from a plurality of cell lines as training cell lines, and multiple training data sets are generated from each training cell line set using an improved bootstrap sampling method. An important feature screening process is performed on each training data set to obtain an important feature set. The number of times each metabolite feature is selected across all rounds of feature screening is counted, the metabolite features are ranked by selection count from high to low, and a set number of the highest-ranked features is selected. This ensures that the metabolite features used are of higher importance and greater robustness, which improves effectiveness. On this basis, the selected metabolite features are used to build a prediction model on the training cell lines with an ensemble method, and the model is then used to predict new test cell lines, which can improve prediction accuracy.

Description

Training method and prediction method of drug sensitivity prediction model and related device

Technical Field

The present application relates to the field of data processing, and in particular, to a training method, a prediction method, and a related apparatus for a drug sensitivity prediction model.

Background

Tumors are a complex, heterogeneous group of diseases; for example, patients with tumors of the same pathological type can respond differently to anti-tumor drugs. Oncology has therefore become one of the important fields of precision medicine, in which precise medication can achieve better therapeutic effects and reduce side effects. One approach to precise tumor therapy is to transplant tumors into animals, administer the drug to the animals, and observe the drug's effect on tumor growth to determine its efficacy. This approach is costly and time-consuming, and has a low success rate. Facing these challenges, human cancer cell lines provide a new vehicle for screening candidate drugs for cancer treatment. Cancer cell lines grown with cell culture techniques can approximately simulate the growth environment of cancer cells in cancer patients, and they closely resemble patients' cancer cells at multiple omics levels. Therefore, by analyzing the molecular data of cancer cell lines to predict drug response, the response of patients to the drug can be predicted.

However, how to predict drug response from cancer cell line molecular data remains a problem.

Disclosure of Invention

To solve the above technical problem, embodiments of the present application provide a training method, a prediction method, and related devices for a drug sensitivity prediction model, so as to improve the effectiveness of the metabolite features to be used and ensure the effectiveness of training the prediction model. The technical solutions are as follows:

A method for training a drug sensitivity prediction model, comprising:

obtaining metabolite features of each of a plurality of cell lines and the drug response parameter IC50 of each cell line;

for each of the cell lines, determining a drug response class based on the drug response parameter IC50;

sampling, for each constructed cancer cell line drug sensitivity prediction model, a first set number of cell lines from the plurality of cell lines as training cell lines, and performing the important feature screening process multiple times for each training cell line;

each important feature screening process comprising: inputting the metabolite features of the training cell line into the cancer cell line drug sensitivity prediction model to obtain an important feature set output by the cancer cell line drug sensitivity prediction model;

for each metabolite feature of each training cell line, counting the number of times the metabolite feature appears in the important feature sets output by the cancer cell line drug sensitivity prediction model over the multiple rounds, and taking this number as the selection count;

for each metabolite feature of each training cell line, taking the maximum of the multiple selection counts of the metabolite feature as the target count;

sorting the target counts in descending order to obtain a target ranking, and taking the metabolite features corresponding to the 1st through m-th target counts in the target ranking as the metabolite features to be used;

and training a cancer cell line drug sensitivity prediction model to be trained by using the metabolite features to be used and the drug response classes of the cell lines to which those metabolite features belong.

The inputting of the metabolite features of the training cell line into the cancer cell line drug sensitivity prediction model to obtain the important feature set output by the cancer cell line drug sensitivity prediction model comprises:

normalizing each metabolite feature of the training cell line by using the normalization relation (x - min_x)/(max_x - min_x) to obtain normalized metabolite features;

where x represents the content of the metabolite feature in the cell line, min_x is the minimum content of the metabolite feature across the plurality of cell lines, and max_x is the maximum content of the metabolite feature across the plurality of cell lines;

and inputting the normalized metabolite features of the training cell line into the cancer cell line drug sensitivity prediction model to obtain an important feature set output by the cancer cell line drug sensitivity prediction model.

After the cancer cell line drug sensitivity prediction model to be trained is trained by using the metabolite features to be used and their drug response classes, the method further comprises:

sampling a second set number of cell lines from the plurality of cell lines as test cell lines, and predicting each test cell line multiple times by using the cancer cell line drug sensitivity prediction model to be trained, to obtain prediction results;

evaluating each prediction result to obtain an evaluation result;

judging whether the cancer cell line drug sensitivity prediction model to be trained meets set requirements or not based on a plurality of evaluation results;

if yes, ending the training;

if not, returning to the step of obtaining the metabolite features of each cell line in the plurality of cell lines and the drug response parameter IC50 of each cell line.

Before each prediction result is evaluated to obtain an evaluation result, the method further comprises:

determining, based on the prediction results of the metabolite features, whether an abnormal cell line exists among the training cell lines;

if an abnormal cell line exists, removing the abnormal cell lines from the training cell lines, taking the remaining cell lines as the training cell lines, and returning to the step of performing the important feature screening process multiple times for each training cell line;

if not, evaluating each prediction result to obtain an evaluation result;

judging whether the cancer cell line drug sensitivity prediction model to be trained meets set requirements or not based on the evaluation result;

if yes, ending the training;

The determining of the drug response category based on the drug response parameter IC50 comprises:

dividing cell lines with the same drug response parameters IC50 in a plurality of cell lines into a group to obtain cell line groups, and counting the number of the cell lines in each cell line group;

searching the drug response parameters IC50 of the cell line groups for a target drug response parameter IC50 such that the difference between the total number of cell lines in the groups whose IC50 is smaller than the target IC50 and the total number of cell lines in the groups whose IC50 is larger than the target IC50 is within a set threshold range;

taking the target drug response parameter IC50 as a preset drug response parameter IC50 threshold;

judging whether the drug response parameter IC50 is larger than a preset drug response parameter IC50 threshold value;

if yes, determining that the drug response category is insensitive;

if not, determining that the drug response category is sensitive.

A method of drug sensitivity prediction comprising:

obtaining the metabolite features of a cell line to be processed;

calling a cancer cell line drug sensitivity prediction model, and processing the metabolite features of the cell line to be processed to obtain a drug response category;

the cancer cell line drug sensitivity prediction model is obtained by training based on any one of the above training methods of the drug sensitivity prediction model.

A training apparatus for a drug sensitivity prediction model, comprising:

an obtaining module, configured to obtain metabolite features of each of the cell lines in a plurality of cell lines and drug response parameters IC50 of each of the cell lines;

a first determining module, configured to determine, for each of the cell lines, a drug response class based on the drug response parameter IC50;

an important feature screening module, configured to sample, for each constructed cancer cell line drug sensitivity prediction model, a first set number of cell lines from the plurality of cell lines as training cell lines, and to perform the important feature screening process multiple times for each training cell line;

a statistics module, configured to count, for each metabolite feature of each training cell line, the number of times the metabolite feature appears in the important feature sets output by the cancer cell line drug sensitivity prediction model over the multiple rounds, and to take this number as the selection count;

a second determining module, configured to take, for each metabolite feature of each training cell line, the maximum of the multiple selection counts of the metabolite feature as the target count;

a third determining module, configured to sort the target counts in descending order to obtain a target ranking, and to take the metabolite features corresponding to the 1st through m-th target counts in the target ranking as the metabolite features to be used;

and a training module, configured to train the cancer cell line drug sensitivity prediction model to be trained by using the metabolite features to be used and the drug response classes of the cell lines to which those metabolite features belong.

The important feature screening module is specifically configured to: normalize each metabolite feature of the training cell line by using the normalization relation (x - min_x)/(max_x - min_x) to obtain normalized metabolite features;

The device further comprises:

the test module is used for evaluating each prediction result to obtain an evaluation result;

if yes, ending the training;

The test module is further configured to:

before each prediction result is evaluated to obtain an evaluation result, judging whether an abnormal cell line exists in a plurality of training cell lines or not based on a plurality of prediction results of each metabolite feature;

if not, evaluating each prediction result to obtain an evaluation result;

if yes, ending the training;

The first determining module is specifically configured to:

if yes, determining that the drug response category is insensitive;

if not, determining that the drug response category is sensitive.

A drug sensitivity prediction device comprising:

the acquisition module is used for acquiring the metabolite features of the cell line to be processed;

the calling module is used for calling a cancer cell line drug sensitivity prediction model and processing the metabolite features of the cell line to be processed to obtain a drug response category;

the cancer cell line drug sensitivity prediction model is obtained by training based on the training method of the drug sensitivity prediction model of any one of claims 1-5.

Compared with the prior art, the beneficial effect of this application is:

In the present application, a first set number of cell lines is sampled from a plurality of cell lines as training cell lines, and the important feature screening process is performed multiple times for each training cell line to obtain important feature sets. The number of times each metabolite feature appears in the important feature sets output by the cancer cell line drug sensitivity prediction model over the multiple rounds is counted as its selection count, and the metabolite features to be used are selected based on the selection counts. This ensures that the metabolite features to be used are of higher importance and are selected more often, which improves their effectiveness. On this basis, the effectiveness of training the prediction model is ensured, and using the trained cancer cell line drug sensitivity prediction model to make predictions from the metabolite features of cell lines can improve prediction accuracy.

In addition, because the metabolite features of a cell line reflect the end products of various biological processes in the cell and the organism's final response to genetic, pathophysiological, and environmental stimuli, training the cancer cell line drug sensitivity prediction model with these metabolite features can improve its training precision and help ensure the prediction accuracy of the trained model.

Drawings

To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a flow chart of an embodiment 1 of a method for training a drug sensitivity prediction model provided by the present application;

FIG. 2 is a flowchart of embodiment 2 of a method for training a drug sensitivity prediction model provided by the present application;

FIG. 3 is a flowchart of embodiment 3 of a method for training a drug sensitivity prediction model provided by the present application;

FIG. 4 is a flow chart of a method for drug sensitivity prediction provided herein;

FIG. 5 is a schematic diagram of a logic structure of a training apparatus for a drug sensitivity prediction model provided in the present application;

FIG. 6 is a schematic diagram of a logic structure of a drug sensitivity prediction device provided in the present application.

Detailed Description

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments derived by a person skilled in the art from the embodiments herein without creative effort fall within the protection scope of the present application.

Currently, most existing drug sensitivity research focuses on the genome, and clinical biomarkers are mainly single genes or a small number of genes; for example, sensitivity to gefitinib for treating lung cancer is predicted from EGFR mutations. However, some tumors are not caused solely by a single major oncogene; for example, nearly half of patients who are positive for the BRAF (V600E) mutation do not respond to BRAF inhibitors. In addition, many drugs currently still lack clinical biomarkers for personalized medicine. There is therefore an urgent need to develop new methods and techniques that can better predict the response (sensitivity or resistance) of cancer patients to drugs. Against this background, the inventors recognized that metabolites are the end products of various biological processes in cells and the final response of organisms to genetic, pathophysiological, and environmental stimuli, and that they act as collectors and amplifiers of upstream biological information, including that of the genome, transcriptome, and proteome. The metabolome is the omics layer closest to the biological phenotype, and using it as a marker of drug response offers advantages that other omics do not. Therefore, the inventors provide a training method for a drug sensitivity prediction model based on metabolite features.

To make the above objects, features, and advantages of the present application clearer and easier to understand, the present application is described in further detail below with reference to the drawings and specific embodiments.

As shown in FIG. 1, which is a flowchart of embodiment 1 of the method for training a drug sensitivity prediction model provided by the present application, the method includes the following steps:

step S11, obtaining metabolite features of each cell line in a plurality of cell lines and drug response parameters IC50 of each cell line.

In this example, the metabolite features of each of the plurality of cell lines and the drug response parameter IC50 of each cell line may be obtained from the CCLE (Cancer Cell Line Encyclopedia) database. For example, if data for 75 cell lines are needed, the metabolite features and the drug response parameter IC50 of each of the 75 cell lines are obtained from the CCLE database.

Here, a metabolite feature can be understood as the quantified amount of a metabolite. The drug response parameter IC50 can be understood as the drug concentration at which an absolute inhibition of 50% is reached.

Step S12, for each of the cell lines, determining a drug response class based on the drug response parameter IC50.

In this example, for each of the cell lines, the drug response class may be determined based on a drug response parameter threshold specified in the prior art. Specifically, the drug response parameter IC50 is compared with the prior-art threshold: if the IC50 is greater than the threshold, the drug response class is determined to be insensitive; if the IC50 is less than the threshold, the drug response class is determined to be sensitive.

However, determining the drug response class based on a fixed prior-art threshold is not very accurate. Therefore, this embodiment provides another method for determining the drug response class, which specifically includes:

S121, grouping the cell lines that have the same drug response parameter IC50 into cell line groups, and counting the number of cell lines in each cell line group.

For example, if the metabolite features and the drug response parameters IC50 of 75 cell lines are obtained from the CCLE database, and in 75 cell lines, the drug response parameters IC50 of the cell line a1-a20 are all 1, the drug response parameters IC50 of the cell line a21-a36 are all 2, the drug response parameters IC50 of the cell line a37-a50 are all 3, and the drug response parameters IC50 of the cell line a51-a75 are all 4, the cell line a1-a20 is divided into one group as the cell line group 1; dividing the cell line a21-a36 into one group as a cell line group 2; dividing the cell line a37-a50 into one group as a cell line group 3; the cell lines a51-a75 were divided into one group as cell line group 4.

S122, searching the drug response parameters IC50 of the cell line groups for a target drug response parameter IC50 such that the difference between the total number of cell lines in the groups whose IC50 is not greater than the target IC50 and the total number of cell lines in the groups whose IC50 is greater than the target IC50 is within a set threshold range.

Continuing the cell line grouping example of step S121: after cell line groups 1 to 4 are obtained, their drug response parameters IC50 are 1, 2, 3, and 4, respectively. Taking an IC50 of 2 as the candidate, the groups whose IC50 is not greater than 2 are groups 1 and 2, which contain 36 cell lines in total, and the groups whose IC50 is greater than 2 are groups 3 and 4, which contain 39 cell lines in total. The difference between 39 and 36 is 3, which is within the set threshold range of 1-10, so the drug response parameter IC50 of cell line group 2 can be determined as the target drug response parameter IC50.
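The threshold search in steps S121 to S123 can be sketched in Python as follows. This is a minimal sketch, not the applicant's implementation: the function name `find_ic50_threshold` is hypothetical, IC50 values are assumed to be a plain list, the absolute difference of the two totals is used, and, as an added assumption, the candidate with the smallest difference is chosen when several qualify.

```python
from collections import Counter


def find_ic50_threshold(ic50_values, diff_range):
    """Sketch of S121-S123: pick a group IC50 that splits the cell lines evenly.

    diff_range is the (low, high) set threshold range for the difference between
    the number of cell lines with IC50 > candidate and with IC50 <= candidate.
    Returns None when no candidate falls inside the range.
    """
    # S121: group cell lines that share the same IC50 and count each group
    group_sizes = Counter(ic50_values)
    total = len(ic50_values)
    low, high = diff_range

    best_value, best_diff = None, None
    at_or_below = 0
    for value in sorted(group_sizes):
        at_or_below += group_sizes[value]          # cell lines with IC50 <= value
        above = total - at_or_below                # cell lines with IC50 > value
        diff = abs(above - at_or_below)
        # S122: keep a candidate whose difference lies in the set range
        # (assumption: prefer the most balanced one if several qualify)
        if low <= diff <= high and (best_diff is None or diff < best_diff):
            best_value, best_diff = value, diff

    # S123: the target IC50 becomes the preset IC50 threshold
    return best_value
```

With the grouping from the example above, `find_ic50_threshold([1] * 20 + [2] * 16 + [3] * 14 + [4] * 25, (1, 10))` returns 2, matching the target IC50 of cell line group 2.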

S123, taking the target drug response parameter IC50 as a preset drug response parameter IC50 threshold value.

S124, judging whether the drug response parameter IC50 is larger than a preset drug response parameter IC50 threshold value.

A larger drug response parameter IC50 indicates lower sensitivity to the drug; a smaller IC50 indicates higher sensitivity.

If yes, go to step S125; if not, go to step S126.

S125, determining that the drug response category is insensitive.

S126, determining that the drug response category is sensitive.

In this embodiment, the cell lines with the same drug response parameter IC50 are grouped into cell line groups, the number of cell lines in each group is counted, a target drug response parameter IC50 is searched for among the IC50 values of the groups, and the target IC50 is used as the preset IC50 threshold. As a result, the numbers of cell lines on the two sides of the preset threshold are as balanced as possible, so classifying drug response based on this threshold keeps the numbers of drug-sensitive and drug-insensitive cell lines balanced and ensures the reliability of the training data.
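Steps S124 to S126 reduce to one comparison per cell line. The minimal sketch below (the helper name `classify_response` is hypothetical) applies the threshold of 2 found in the example of step S122 and shows the resulting, roughly balanced class sizes.

```python
def classify_response(ic50, threshold):
    """S124-S126: IC50 above the preset threshold means insensitive, else sensitive."""
    return "insensitive" if ic50 > threshold else "sensitive"


# Example from steps S121-S122: groups with IC50 = 1, 2, 3, 4 of sizes 20, 16, 14, 25
ic50_values = [1] * 20 + [2] * 16 + [3] * 14 + [4] * 25
labels = [classify_response(v, threshold=2) for v in ic50_values]
print(labels.count("sensitive"), labels.count("insensitive"))  # 36 39
```

An IC50 exactly equal to the threshold is labeled sensitive, because step S124 only routes values strictly larger than the threshold to "insensitive".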

Step S13, for each constructed cancer cell line drug sensitivity prediction model, sampling a first set number of cell lines from the plurality of cell lines as training cell lines, and performing the important feature screening process multiple times for each training cell line, each important feature screening process including: inputting the metabolite features of the training cell line into the cancer cell line drug sensitivity prediction model to obtain an important feature set output by the cancer cell line drug sensitivity prediction model.

In this example, a plurality of different cancer cell line drug sensitivity prediction models can be constructed.

The plurality of different cancer cell line drug sensitivity prediction models may include at least two of the following: models based on the ExtraTreesClassifier, GaussianProcessClassifier, NuSVC, RidgeClassifierCV, GaussianNB, RandomForestClassifier, and XGBClassifier algorithms.

In this embodiment, the first set number of cell lines may be sampled from the plurality of cell lines based on a modified bootstrap sampling algorithm. Specifically, the first set number of cell lines is sampled from the plurality of cell lines without replacement, so that the sampled cell lines are all different from one another. For example, one cell line is sampled from 75 cell lines, then one from the remaining 74 cell lines, …, then one from the remaining (75 - i) cell lines, until the first set number of cell lines has been sampled.

The first set number is less than the total number of the plurality of cell lines.

Sampling without replacement ensures that the sampled cell lines do not repeat, which improves the diversity of the training data.
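A minimal Python sketch of this sampling without replacement, assuming cell lines are identified by strings; `sample_training_cell_lines` and the IDs `a1` to `a75` are hypothetical. `random.sample` from the standard library draws distinct elements, which is equivalent to drawing one cell line at a time from the remaining pool.

```python
import random


def sample_training_cell_lines(cell_lines, first_set_number, seed=None):
    """Draw first_set_number distinct cell lines without replacement."""
    if not 0 < first_set_number < len(cell_lines):
        # The first set number must be smaller than the total number of cell lines
        raise ValueError("first_set_number must be between 1 and len(cell_lines) - 1")
    rng = random.Random(seed)
    # random.sample never returns the same element twice
    return rng.sample(cell_lines, first_set_number)


cell_lines = [f"a{i}" for i in range(1, 76)]  # 75 hypothetical cell line IDs
training = sample_training_cell_lines(cell_lines, 60, seed=0)
print(len(training), len(set(training)))  # 60 60
```

Passing a seed makes the draw reproducible, which is convenient when the same training cell lines must be reused across several models.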

The cancer cell line drug sensitivity prediction model can be understood as a machine learning model for predicting whether a cell line is sensitive to a drug. The model can evaluate the importance of the metabolite features to obtain an importance index value for each metabolite feature, which characterizes how important that feature is in the prediction process: the higher the importance index value, the greater the feature's influence on the prediction result.

The important feature set is formed as follows: the importance index values of the metabolite features of the training cell line are sorted in descending order, and the features corresponding to the 1st through n-th importance index values in the sorted result form the set, where n is less than the total number of metabolite features of the training cell line.
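The important feature set can be sketched as a top-n selection over importance index values. The sketch assumes the importance values are already available as a dictionary (for example, tree-based scikit-learn classifiers expose them through `feature_importances_`); the metabolite names and values below are hypothetical, and ties are broken by name only to make the output deterministic.

```python
def important_feature_set(importances, n):
    """Return the n features with the largest importance index values."""
    if not 0 < n < len(importances):
        raise ValueError("n must be positive and smaller than the number of features")
    # Sort by importance (descending); ties broken by name for a stable result
    ranked = sorted(importances.items(), key=lambda kv: (-kv[1], kv[0]))
    return {name for name, _ in ranked[:n]}


# Hypothetical importance index values for four metabolite features
importances = {"glutamine": 0.31, "lactate": 0.05, "citrate": 0.22, "alanine": 0.12}
print(sorted(important_feature_set(importances, 2)))  # ['citrate', 'glutamine']
```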

In this embodiment, the inputting the metabolite features of the training cell line into the cancer cell line drug sensitivity prediction model to obtain the important feature set output by the cancer cell line drug sensitivity prediction model may include:

S131, normalizing each metabolite feature of the training cell line by using the normalization relation (x - min_x)/(max_x - min_x) to obtain normalized metabolite features.

Here, x represents the content of the metabolite feature in the cell line, min_x is the minimum content of the metabolite feature across the plurality of cell lines, and max_x is the maximum content of the metabolite feature across the plurality of cell lines.

S132, inputting the normalized metabolite features of the training cell line into the cancer cell line drug sensitivity prediction model to obtain an important feature set output by the cancer cell line drug sensitivity prediction model.

In this embodiment, normalizing each metabolite feature of the training cell line with the relation (x - min_x)/(max_x - min_x) maps every feature onto the range [0, 1], which can increase computation speed and the efficiency with which the cancer cell line drug sensitivity prediction model outputs the important feature set.
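The normalization relation can be sketched as column-wise min-max scaling over a cell-line-by-feature matrix. This is a minimal, dependency-free sketch; the function names are hypothetical, and mapping a constant feature (max_x equal to min_x) to 0.0 is an added assumption, since the relation is undefined in that case.

```python
def min_max_normalize(values):
    """Apply (x - min_x) / (max_x - min_x) to one metabolite feature across cell lines."""
    min_x, max_x = min(values), max(values)
    span = max_x - min_x
    if span == 0:
        # Assumption: a constant feature is mapped to 0.0 to avoid division by zero
        return [0.0 for _ in values]
    return [(x - min_x) / span for x in values]


def normalize_matrix(matrix):
    """matrix[i][j] is the content of metabolite feature j in cell line i."""
    columns = [min_max_normalize(list(col)) for col in zip(*matrix)]
    return [list(row) for row in zip(*columns)]


print(normalize_matrix([[2.0, 10.0], [4.0, 10.0], [6.0, 30.0]]))
# [[0.0, 0.0], [0.5, 0.0], [1.0, 1.0]]
```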

In this embodiment, the inputting the metabolite features of the training cell line into the cancer cell line drug sensitivity prediction model to obtain the important feature set output by the cancer cell line drug sensitivity prediction model may also include:

S133, performing quality control and cleaning on the metabolite features of the training cell line to obtain preprocessed metabolite features, and normalizing each preprocessed metabolite feature of the training cell line by using the normalization relation (x - min_x)/(max_x - min_x) to obtain normalized metabolite features.

In this embodiment, quality control and cleaning make the metabolite features more reliable, which improves normalization efficiency and the reliability of the training data.

Step S14, for each metabolite feature of each training cell line, counting the number of times the metabolite feature appears in the important feature sets output by the cancer cell line drug sensitivity prediction model over the multiple rounds, and taking this number as the selection count.

For example, for each metabolite feature of each training cell line, the number of times it appears in the important feature sets output over multiple rounds by each cancer cell line drug sensitivity prediction model is counted as its selection count. Suppose there are 80 cell lines in total, of which 60 are extracted as the training set. Then 50 cell lines are drawn from the 60-cell-line training set using the bootstrap algorithm without replacement, for a total of 100 runs: (1) in each run, models are trained on the drawn cell lines with RFECV using each of 5 different algorithms, and features are screened; (2) 100 rounds are performed in total; (3) the number of times each metabolite feature is selected in the 100 rounds is counted separately for each of the five algorithms.
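The counting loop in this example can be sketched as below. This is a sketch under stated assumptions: each model's screening step is passed in as a hypothetical function that maps a subset of cell lines to the set of selected feature names (in the example above that role is played by RFECV with each algorithm), all models screen the same subset in a given round, and the toy screeners in the usage lines simply return fixed sets.

```python
import random
from collections import Counter


def count_selections(training_lines, screen_fns, n_rounds=100, subset_size=50, seed=0):
    """Count, per model, how often each metabolite feature is selected.

    training_lines: list of training cell line IDs (60 in the example).
    screen_fns: dict mapping a model name to a hypothetical function that takes a
        list of cell line IDs and returns the set of selected feature names.
    Each round draws subset_size distinct cell lines (sampling without replacement).
    """
    rng = random.Random(seed)
    counts = {name: Counter() for name in screen_fns}
    for _ in range(n_rounds):
        subset = rng.sample(training_lines, subset_size)
        for name, screen in screen_fns.items():
            counts[name].update(screen(subset))
    return counts


# Toy screeners (hypothetical): each "model" always keeps a fixed feature subset
toy_models = {
    "model_A": lambda subset: {"f1", "f2"},
    "model_B": lambda subset: {"f2", "f3"},
}
training = [f"b{i}" for i in range(60)]
counts = count_selections(training, toy_models)
print(counts["model_A"]["f1"], counts["model_B"]["f1"])  # 100 0
```

In practice each screener would fit its estimator on the subset's normalized metabolite features and return the features it keeps; the loop structure stays the same.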

Step S15, for each metabolite feature of each training cell line, taking the maximum of the multiple selection counts of the metabolite feature as the target count.

Continuing the example of step S14, for each metabolite feature of each training cell line, the maximum of its multiple selection counts is taken as the target count. For example, if metabolite feature c1 of training cell line b1 has selection counts y1 and y2 with y1 greater than y2, y1 is taken as the target count of c1 of b1; if metabolite feature c2 of training cell line b2 has selection counts y3 and y4 with y3 less than y4, y4 is taken as the target count of c2 of b2. Likewise, y5 (y5 greater than y6) is the target count of metabolite feature c3 of training cell line b3, y8 (y7 less than y8) is the target count of metabolite feature c4 of training cell line b4, and y10 (y9 less than y10) is the target count of metabolite feature c5 of training cell line b5.
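Reading the multiple selection counts as the per-model counts from step S14, the target count can be sketched as an element-wise maximum. This interpretation, the helper name `target_counts`, and the example numbers are assumptions for illustration.

```python
from collections import Counter


def target_counts(per_model_counts):
    """Target count of each feature = max of its selection counts over the models."""
    targets = Counter()
    for counts in per_model_counts.values():
        for feature, c in counts.items():
            targets[feature] = max(targets[feature], c)
    return targets


# Hypothetical selection counts from two models over 100 rounds
per_model = {
    "model_A": Counter({"c1": 80, "c2": 10}),
    "model_B": Counter({"c1": 65, "c2": 42, "c3": 7}),
}
print(dict(target_counts(per_model)))  # {'c1': 80, 'c2': 42, 'c3': 7}
```

Taking the maximum rather than the sum keeps a feature that one model consistently selects, even if the other models rarely choose it.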

Step S16, sorting the target counts in descending order to obtain a target ranking result, and taking the metabolite features corresponding to the first through m-th target counts in the target ranking result as the metabolite features to be used.

Step S17, training the cancer cell line drug sensitivity prediction model to be trained using the metabolite features to be used and the drug response classes of the cell lines to which those metabolite features belong.
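The ranking in step S16 can be illustrated with a short Python sketch. It is a minimal sketch, not the applicant's code: the function name is hypothetical, and breaking ties by feature name is an added assumption made only so that the result is deterministic; the text does not say how ties are resolved.

```python
from typing import List, Mapping


def top_m_features(target: Mapping[str, int], m: int) -> List[str]:
    """Rank features by target count (largest first) and keep the first m."""
    # Ties are broken by feature name so the output is deterministic (an assumption).
    ranked = sorted(target.items(), key=lambda kv: (-kv[1], kv[0]))
    return [feature for feature, _ in ranked[:m]]


print(top_m_features({"c1": 87, "c2": 55, "c3": 91, "c4": 12}, m=2))  # ['c3', 'c1']
```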

The cancer cell line drug sensitivity prediction model to be trained may be one of the plurality of cancer cell line drug sensitivity prediction models constructed in step S13. Alternatively, it may be a model obtained by combining several of the plurality of cancer cell line drug sensitivity prediction models constructed in step S13.

When the model to be trained is obtained by combining several of the prediction models constructed in step S13, each constituent prediction model predicts the target data during prediction, yielding a plurality of prediction results, and the final prediction result is then determined by voting. For example, if the model to be trained comprises 7 prediction models, each of the 7 models predicts the target data; if more of the 7 resulting predictions are sensitive than insensitive, the final prediction result is determined to be sensitive.
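The voting rule can be sketched in Python as a plain majority vote over the sub-model outputs. This is an illustrative sketch only: the function name is hypothetical, the seven votes below are made-up values rather than results from this application, and the tie-breaking behaviour (first label encountered wins) is an assumption, since the text does not address ties.

```python
from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[str]) -> str:
    """Return the drug response class predicted by the most sub-models."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    # most_common(1) yields [(label, count)]; on a tie the label seen first wins
    # (tie handling is not specified in the text and is an assumption here).
    label, _ = Counter(predictions).most_common(1)[0]
    return label


votes = ["sensitive"] * 4 + ["insensitive"] * 3  # hypothetical outputs of 7 models
print(majority_vote(votes))  # sensitive
```

With an odd number of models and two classes, as in the seven-model example, a tie cannot occur.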

As another optional embodiment of the present application, refer to FIG. 2, which is a flowchart of Embodiment 2 of the method for training a drug sensitivity prediction model provided by the present application. This embodiment mainly extends the training method described in Embodiment 1 above and, as shown in FIG. 2, may include, but is not limited to, the following steps:

Step S21, obtaining the metabolite features of each of a plurality of cell lines and the drug response parameter IC50 of each cell line.

Step S22, for each cell line, determining a drug response class based on the drug response parameter IC50.

Step S23, for each constructed cancer cell line drug sensitivity prediction model, sampling a first set number of cell lines from the plurality of cell lines as training cell lines, and performing the important feature screening process multiple times on each training cell line.

Each important feature screening process comprises: inputting the metabolite features of the training cell line into the cancer cell line drug sensitivity prediction model to obtain the important feature set output by that model.

Step S24, for each metabolite feature of each training cell line, counting the number of times the metabolite feature appears in the important feature sets output over multiple runs by the cancer cell line drug sensitivity prediction model, and taking this number as the selection count.

Step S25, for each metabolite feature of each training cell line, taking the maximum of the plurality of selection counts of the metabolite feature as the target count.

Step S26, sorting the target counts in descending order to obtain a target ranking result, and taking the metabolite features corresponding to the first through m-th target counts in the target ranking result as the metabolite features to be used.

Step S27, training the cancer cell line drug sensitivity prediction model to be trained using the metabolite features to be used and the drug response classes of the cell lines to which those metabolite features belong.

For the detailed procedures of steps S21 to S27, refer to the descriptions of steps S11 to S17; they are not repeated here.

Step S28, sampling a second set number of cell lines from the plurality of cell lines as test cell lines, and using the cancer cell line drug sensitivity prediction model to be trained to predict the metabolite features of each test cell line multiple times, obtaining the prediction results.

For the detailed process of sampling the second set number of cell lines from the plurality of cell lines, refer to the description of sampling the first set number of cell lines in step S23; it is not repeated here.

Step S29, evaluating each prediction result to obtain an evaluation result.

Evaluating each prediction result to obtain an evaluation result may include: comparing whether the prediction result is consistent with the drug response class labeled for the metabolite features of the test cell line, and taking the comparison result as the evaluation result.

Step S210, determining, based on the plurality of evaluation results, whether the cancer cell line drug sensitivity prediction model to be trained meets a set requirement.

Determining, based on the plurality of evaluation results, whether the cancer cell line drug sensitivity prediction model to be trained meets the set requirement may include:

counting whether the number of evaluation results indicating a correct prediction among the plurality of evaluation results reaches a set number;

if the number reaches the set number, the cancer cell line drug sensitivity prediction model to be trained meets the set requirement.

An evaluation result indicates that the cancer cell line drug sensitivity prediction model to be trained predicted correctly when the prediction result is consistent with the drug response class labeled for the metabolite features of the test cell line.

If so, proceed to step S211; if not, return to step S21.

Step S211, ending the training.

In this embodiment, the cancer cell line drug sensitivity prediction model to be trained is evaluated on the test cell lines and is trained further whenever the set requirement is not met. This improves training precision and helps ensure the prediction accuracy of the trained cancer cell line drug sensitivity prediction model.
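The evaluation and requirement check of steps S29 and S210 can be sketched in Python. This is a minimal sketch, assuming the prediction results and the labeled drug response classes are aligned lists of class names; the function name and the example values are hypothetical.

```python
from typing import Sequence


def meets_requirement(predicted: Sequence[str], labeled: Sequence[str], set_number: int) -> bool:
    """Check whether enough predictions match the labeled drug response classes."""
    if len(predicted) != len(labeled):
        raise ValueError("each prediction needs a labeled class")
    # Step S29: one evaluation result per prediction (True = consistent with the label).
    evaluations = [p == y for p, y in zip(predicted, labeled)]
    # Step S210: the requirement is met once the number of correct results reaches set_number.
    return sum(evaluations) >= set_number


pred = ["sensitive", "insensitive", "sensitive", "sensitive"]
truth = ["sensitive", "insensitive", "insensitive", "sensitive"]
print(meets_requirement(pred, truth, set_number=3))  # True
```

If the check returns False, the flow of Embodiment 2 returns to step S21 and training continues.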

As another optional embodiment of the present application, refer to FIG. 3, which is a flowchart of Embodiment 3 of the method for training a drug sensitivity prediction model provided by the present application. This embodiment mainly extends the training method described in Embodiment 2 above and, as shown in FIG. 3, may include, but is not limited to, the following steps:

Step S31, obtaining the metabolite features of each of a plurality of cell lines and the drug response parameter IC50 of each cell line.

Step S32, for each cell line, determining a drug response class based on the drug response parameter IC50.

Step S33, for each constructed cancer cell line drug sensitivity prediction model, sampling a first set number of cell lines from the plurality of cell lines as training cell lines, and performing the important feature screening process multiple times on each training cell line.

Step S34, for each metabolite feature of each training cell line, counting the number of times the metabolite feature appears in the important feature sets output over multiple runs by the cancer cell line drug sensitivity prediction model, and taking this number as the selection count.

Step S35, for each metabolite feature of each training cell line, taking the maximum of the plurality of selection counts of the metabolite feature as the target count.

Step S36, sorting the target counts in descending order to obtain a target ranking result, and taking the metabolite features corresponding to the first through m-th target counts in the target ranking result as the metabolite features to be used.

Step S37, training the cancer cell line drug sensitivity prediction model to be trained using the metabolite features to be used and the drug response classes of the cell lines to which those metabolite features belong.

Step S38, sampling a second set number of cell lines from the plurality of cell lines as test cell lines, and using the cancer cell line drug sensitivity prediction model to be trained to predict the metabolite features of each test cell line multiple times, obtaining the prediction results.

For the detailed procedures of steps S31 to S38, refer to the descriptions of steps S21 to S28 in Embodiment 2; they are not repeated here.

Step S39, determining, based on the plurality of prediction results of each metabolite feature, whether an abnormal cell line exists among the training cell lines.

Determining, based on the plurality of prediction results of each metabolite feature, whether an abnormal cell line exists among the plurality of training cell lines may be understood as:

determining whether, among the plurality of prediction results of each metabolite feature, there are a preset number of erroneous prediction results;

if so, the cell line is an abnormal cell line.

A prediction is erroneous if the prediction result is inconsistent with the drug response class labeled for the metabolite features.

If so, proceed to step S310; if not, proceed to step S311.

Step S310, removing the abnormal cell lines from the plurality of training cell lines, taking the remaining cell lines as the training cell lines, and returning to the step of performing the important feature screening process multiple times on each training cell line.

Step S311, evaluating each prediction result to obtain an evaluation result.

Step S312, determining, based on the evaluation results, whether the cancer cell line drug sensitivity prediction model to be trained meets the set requirement.

If so, proceed to step S313; if not, return to step S31.

Step S313, ending the training.

In this embodiment, whether an abnormal cell line exists among the training cell lines is determined based on the plurality of prediction results of each metabolite feature, and any abnormal cell lines are removed from the training cell lines. This makes the training data more accurate and improves the accuracy with which the cancer cell line drug sensitivity prediction model to be trained is trained.
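The abnormal cell line check of step S39 and the removal of step S310 can be sketched in Python. This is a minimal sketch under two labeled assumptions: the repeated prediction results are grouped by cell-line ID, and "a preset number of erroneous prediction results" is read as "at least that many". All function names and example values are hypothetical.

```python
from typing import List, Mapping, Sequence


def find_abnormal_cell_lines(
    predictions: Mapping[str, Sequence[str]],
    labels: Mapping[str, str],
    preset_number: int,
) -> List[str]:
    """Flag cell lines whose repeated predictions are wrong at least preset_number times."""
    abnormal = []
    for line_id, results in predictions.items():
        # A prediction is wrong when it differs from the labeled drug response class.
        errors = sum(1 for r in results if r != labels[line_id])
        if errors >= preset_number:
            abnormal.append(line_id)
    return abnormal


def remove_abnormal(training_lines: Sequence[str], abnormal: Sequence[str]) -> List[str]:
    """Step S310: keep only the training cell lines that were not flagged."""
    flagged = set(abnormal)
    return [line for line in training_lines if line not in flagged]


preds = {
    "b1": ["sensitive"] * 5,
    "b2": ["insensitive", "sensitive", "insensitive", "insensitive", "sensitive"],
}
labels = {"b1": "sensitive", "b2": "sensitive"}
bad = find_abnormal_cell_lines(preds, labels, preset_number=3)
print(bad)  # ['b2']
print(remove_abnormal(["b1", "b2", "b3"], bad))  # ['b1', 'b3']
```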

Another embodiment of the present application provides a drug sensitivity prediction method. Referring to FIG. 4, the method includes:

Step S41, acquiring the metabolite features of a cell line to be processed.

Step S42, invoking a cancer cell line drug sensitivity prediction model to process the metabolite features of the cell line to be processed and obtain the drug response class.

The cancer cell line drug sensitivity prediction model is trained using the training method for a drug sensitivity prediction model described in any one of Embodiments 1 to 3.

In this embodiment, prediction uses a model trained by the training method described in the foregoing embodiments, which can improve the accuracy of the prediction result.

Next, the training device for the drug sensitivity prediction model provided by the present application is described. The training device described below and the training method described above may be cross-referenced.

Referring to FIG. 5, the training device for the drug sensitivity prediction model comprises: an acquisition module 100, a first determination module 200, an important feature screening module 300, a statistics module 400, a second determination module 500, a third determination module 600, and a training module 700.

The acquisition module 100 is configured to obtain the metabolite features of each of a plurality of cell lines and the drug response parameter IC50 of each cell line;

the first determination module 200 is configured to determine, for each cell line, a drug response class based on the drug response parameter IC50;

the important feature screening module 300 is configured to, for each constructed cancer cell line drug sensitivity prediction model, sample a first set number of cell lines from the plurality of cell lines as training cell lines, and perform the important feature screening process multiple times on each training cell line;

the statistics module 400 is configured to count, for each metabolite feature of each training cell line, the number of times the metabolite feature appears in the important feature sets output over multiple runs by each cancer cell line drug sensitivity prediction model, as the selection count;

the second determination module 500 is configured to take, for each metabolite feature of each training cell line, the maximum of the plurality of selection counts of the metabolite feature as the target count;

the third determination module 600 is configured to sort the target counts in descending order to obtain a target ranking result, and take the metabolite features corresponding to the first through m-th target counts in the target ranking result as the metabolite features to be used;

the training module 700 is configured to train the cancer cell line drug sensitivity prediction model to be trained using the metabolite features to be used and the drug response classes of the cell lines to which those metabolite features belong.

In this embodiment, the important feature screening module 300 may be specifically configured to: normalize each metabolite feature of the training cell line using the normalization relation (x - min_x)/(max_x - min_x) to obtain normalized metabolite features;
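The normalization relation (x - min_x)/(max_x - min_x) is standard min-max scaling. A minimal Python sketch, assuming the relation is applied to one metabolite feature across the training cell lines; mapping a constant feature to 0.0 (where the relation divides by zero) is an assumption, not something the text specifies.

```python
from typing import List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Apply (x - min_x) / (max_x - min_x) to one metabolite feature across cell lines."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Constant feature: the relation is undefined; returning 0.0 is an assumption.
        return [0.0 for _ in values]
    return [(x - lo) / span for x in values]


print(min_max_normalize([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```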

In this embodiment, the training device of the drug sensitivity prediction model may further include:

if yes, ending the training;

In this embodiment, the test module may be further configured to:

before evaluating each prediction result to obtain an evaluation result, determine, based on the plurality of prediction results of each metabolite feature in each test cell line, whether an abnormal cell line exists among the plurality of training cell lines;

if not, evaluate each prediction result to obtain an evaluation result;

if so, end the training;

In this embodiment, the first determining module 200 may be specifically configured to:

if so, determine that the drug response class is insensitive;

if not, determine that the drug response class is sensitive.

Another embodiment of the present application provides a drug sensitivity prediction device. Referring to FIG. 6, the drug sensitivity prediction device comprises an acquisition module 800 and a calling module 900.

The cancer cell line drug sensitivity prediction model is trained using the training method for a drug sensitivity prediction model described in any one of Embodiments 1 to 3.

It should be noted that each embodiment focuses on its differences from the other embodiments, and identical or similar parts of the embodiments may be cross-referenced. Because the device embodiments are substantially similar to the method embodiments, they are described briefly; for relevant details, refer to the corresponding description of the method embodiments.

Finally, it should also be noted that, herein, relational terms such as first and second are used only to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," and any other variants thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

For ease of description, the above devices are described as being divided into various units by function. Of course, when the present application is implemented, the functions of the units may be implemented in one or more pieces of software and/or hardware.

From the description of the above embodiments, those skilled in the art can clearly understand that the present application may be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the present application may be embodied, in essence or in part, in the form of a software product. The software product may be stored in a storage medium such as a ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments, or in some parts of the embodiments, of the present application.

The training method, prediction method, and related devices of the drug sensitivity prediction model provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above examples is only intended to help understand the method and its core idea. Meanwhile, those skilled in the art may make changes to the specific implementations and the scope of application in accordance with the idea of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims

1. A training method for a drug sensitivity prediction model, characterized by comprising the following steps:

2. The method according to claim 1, wherein inputting the metabolite features of the training cell line into the cancer cell line drug sensitivity prediction model to obtain the important feature set output by the cancer cell line drug sensitivity prediction model comprises:

3. The method according to claim 1, further comprising, after training the cancer cell line drug sensitivity prediction model to be trained using the metabolite features to be used and the drug response classes of the metabolite features to be used:

sampling a second set number of cell lines from the plurality of cell lines as test cell lines, and using the cancer cell line drug sensitivity prediction model to be trained to predict the metabolite features of each test cell line multiple times, obtaining the prediction results;

evaluating each prediction result to obtain an evaluation result;

if yes, ending the training;

4. The method according to claim 3, further comprising, before evaluating each prediction result to obtain an evaluation result:

determining, based on the plurality of prediction results of each metabolite feature in each test cell line, whether an abnormal cell line exists among the plurality of training cell lines;

if not, evaluating each prediction result to obtain an evaluation result;

if yes, ending the training;

5. The method according to claim 1, wherein determining a drug response class based on the drug response parameter IC50 comprises:

if so, determining that the drug response class is insensitive;

if not, determining that the drug response class is sensitive.

6. A method for predicting drug sensitivity, comprising:

obtaining the metabolite features of a cell line to be processed;

7. A training device for a drug sensitivity prediction model, characterized by comprising:

8. The device according to claim 7, wherein the important feature screening module is specifically configured to: normalize each metabolite feature of the training cell line using the normalization relation (x - min_x)/(max_x - min_x) to obtain normalized metabolite features;

9. The device according to claim 7, further comprising:

if yes, ending the training;

10. The device according to claim 9, wherein the test module is further configured to:

if not, evaluating each prediction result to obtain an evaluation result;

if yes, ending the training;

11. The device according to claim 7, wherein the first determination module is specifically configured to:

if so, determine that the drug response class is insensitive;

if not, determine that the drug response class is sensitive.

12. A drug sensitivity prediction device, comprising: