CN117275578B - Method for constructing multi-mode prediction model of lung cancer lymph node metastasis - Google Patents

Info

Publication number
CN117275578B
Authority
CN
China
Prior art keywords
prediction model
macromage
distmin
lung cancer
lymph node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311524786.2A
Other languages
Chinese (zh)
Other versions
CN117275578A (en)
Inventor
李�浩
王俊
杨帆
李运
盛剑鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Yinfei Duozuo Biotechnology Co.,Ltd.
Original Assignee
Peking University Peoples Hospital
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University Peoples Hospital
Priority to CN202311524786.2A
Publication of CN117275578A
Application granted
Publication of CN117275578B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/50Mutagenesis
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/10Signal processing, e.g. from mass spectrometry [MS] or from PCR

Landscapes

  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Analytical Chemistry (AREA)
  • Chemical & Material Sciences (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Bioethics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention belongs to the field of disease diagnosis and risk prediction, and in particular relates to a method for constructing a multi-modal prediction model of lung cancer lymph node metastasis that combines gene mutation features with multiplex immunofluorescence staining (mIF) image features, and to the resulting prediction model system. Compared with a single-modal prediction model based on gene mutation features alone, the multi-modal prediction model system built from the integrated gene mutation features and mIF image features has higher accuracy, specificity, sensitivity, and robustness, and addresses the inaccuracy and insufficient sensitivity of current clinical predictions of lung cancer lymph node metastasis.

Description

Method for constructing multi-mode prediction model of lung cancer lymph node metastasis
Technical Field
The invention belongs to the field of disease diagnosis and risk prediction, and in particular relates to a method for constructing a multi-modal prediction model of lung cancer lymph node metastasis that combines gene mutation features with mIF image features, and to the resulting prediction model system.
Background
Accurate staging of lung cancer informs the choice of treatment and the assessment of prognosis. The most widely used staging method at present is the TNM tumor staging system issued by the WHO, which divides lung cancer into four stages according to three dimensions: the extent of primary tumor invasion (T), lymph node metastasis (N), and distant metastasis (M). Stages I, II, and IIIa are regarded as early lung cancer. The N stage is closely tied to the final stage of lung cancer and strongly affects postoperative adjuvant treatment, recurrence, and overall survival, so determining whether lymph node metastasis is present has important clinical significance.
Although non-invasive imaging examinations such as computed tomography (CT) are commonly used in the clinic to detect lymph node metastasis, they still cannot deliver an accurate lymph node stage. Invasive examinations such as mediastinoscopy and endobronchial ultrasound-guided needle biopsy are more accurate, but they are usually performed only on lymph nodes with positive indications and cannot cover all lymph node stations. Prediction models based on patient clinical information have also been reported, but they are too coarse to be sufficiently accurate. An accurate and comprehensive means of predicting lymph node metastasis is therefore needed.
The mainstream view holds that gene mutation is a main cause of tumorigenesis, so detecting gene mutations in lung cancer helps indicate lymph node metastasis in patients. In addition, the tumor immune microenvironment (TIME) is an important factor in tumor growth: whether tumor cells can metastasize to lymph nodes is closely related to TIME, so evaluating TIME also helps predict lymph node metastasis. If the detection of gene mutations in lung cancer could be combined with the assessment of the tumor immune microenvironment, the prediction of lung cancer lymph node metastasis would be greatly improved.
Prediction models of lymph node metastasis based on non-invasive imaging and clinical information lack specificity and sensitivity, and invasive examination based on needle biopsy cannot cover all lymph nodes. To address these problems, the inventors combine two technologies, next generation sequencing (NGS) and multiplex immunofluorescence (mIF), to detect gene mutations and the tumor immune microenvironment respectively, and thereby construct a multi-modal prediction model of lung cancer lymph node metastasis.
No related research results have been reported in the prior art.
Disclosure of Invention
The invention aims to fill this gap in the prior art, overcome the shortcomings of the lung cancer lymph node metastasis prediction models currently in clinical use or under development, and provide a method for constructing a multi-modal prediction model of lung cancer lymph node metastasis that integrates gene mutation features and mIF image features, together with the resulting prediction model system.
In order to achieve the purpose of the invention, the invention adopts the following technical scheme:
In a first aspect, the present invention provides a method for constructing a multi-modal prediction model of lung cancer lymph node metastasis that combines gene mutation features and multiplex immunofluorescence staining (mIF) image features, the prediction model construction method comprising the following steps:
step 1: obtaining gene mutation features: obtaining one or more gene mutation features whose coefficient Coef in the classifier has a larger absolute value;
step 2: obtaining mIF image features:
step 2.1: performing mIF on a lung cancer tissue section of a patient using antibodies against one or more lung cancer cell surface antigens;
step 2.2: capturing detailed images of the stained tissue section, analyzing them with software, and obtaining the distribution of multiple single-cell types on the section through cluster analysis;
step 2.3: assessing the effect of the tumor immune microenvironment (TIME) on lymph node metastasis in lung cancer patients;
step 2.4: obtaining one or more mIF image features whose coefficient Coef in the classifier has a larger absolute value;
step 3: generating the prediction model: constructing a lymph node metastasis prediction model from the gene mutation features obtained in step 1 and the mIF image features obtained in step 2;
step 4: verifying the prediction model: verifying the accuracy, specificity, sensitivity, and robustness of the prediction model.
Alternatively, in the above prediction model construction method,
in step 1, the gene mutation profile of lung cancer patients is systematically analyzed, and a support vector machine (SVM) is used to compute, from the various gene mutation features, a maximum-margin linear classifier for the presence or absence of lymph node metastasis, thereby obtaining the coefficient Coef of each gene mutation feature in the classifier and selecting one or more gene mutation features with larger absolute Coef values;
in step 2.2, detailed images of the stained tissue section are captured, the multispectral images are unmixed using a spectral library built from tissue images stained with a single antibody for each antigen, the images of the individual fields of view are stitched into a complete view of the section, the images are segmented with software to generate mask files, the mask files are analyzed with software, and the distribution of multiple single-cell types on the section is obtained through cluster analysis;
in step 2.3, the tumor immune microenvironment features of the patient are defined along four dimensions: the nearest inter-nuclear distance (distMin), the farthest inter-nuclear distance (distMax), the mean inter-nuclear distance (distMean), and the cell attributes;
in step 2.4, an SVM is used to compute a maximum-margin linear classifier for the presence or absence of lymph node metastasis, thereby obtaining the coefficient Coef of each image feature in the classifier and selecting one or more mIF image features with larger absolute Coef values;
in step 3, using the gene mutation features obtained in step 1 and the mIF image features obtained in step 2, the SVM parameters of GeneFeatures and ImFeatures are integrated, the optimal parameters are screened with the Leave-One-Out Cross-Validation method, and a lymph node metastasis prediction model is then constructed with logistic regression.
In a more preferred embodiment, the prediction model construction method includes the following steps:
step 1: obtaining gene mutation features:
systematically analyzing the gene mutation profile of lung cancer patients, and using a support vector machine (SVM) to compute, from the various gene mutation features, a maximum-margin linear classifier for the presence or absence of lymph node metastasis, thereby obtaining the coefficient Coef of each gene mutation feature in the classifier and selecting one or more gene mutation features with larger absolute Coef values;
step 2: obtaining mIF image features:
step 2.1: performing mIF on a lung cancer tissue section of a patient using antibodies against one or more lung cancer cell surface antigens;
step 2.2: capturing detailed images of the stained tissue section, unmixing the multispectral images using a spectral library built from tissue images stained with a single antibody for each antigen, stitching the images of the individual fields of view into a complete view of the section, segmenting the images with software to generate mask files, analyzing the mask files with software, and obtaining the distribution of multiple single-cell types on the section through cluster analysis;
step 2.3: defining the tumor immune microenvironment features of the patient along four dimensions: the nearest inter-nuclear distance (distMin), the farthest inter-nuclear distance (distMax), the mean inter-nuclear distance (distMean), and the cell attributes; for each feature dimension, computing the 9 deciles of that feature across all patients, the 9 deciles dividing the feature into 10 intervals, that is, feature A is divided into 10 intervals according to its overall distribution across all patients; for a given patient, counting the proportion of that patient's feature A values falling in each of the 10 intervals, so that the 1-dimensional feature is expanded into 10 dimensions, denoted A_0, A_1, …, A_9;
step 2.4: using an SVM to compute a maximum-margin linear classifier for the presence or absence of lymph node metastasis, thereby obtaining the coefficient Coef of each image feature in the classifier and selecting one or more mIF image features with larger absolute Coef values;
step 3: generating the prediction model:
using the gene mutation features obtained in step 1 and the mIF image features obtained in step 2, integrating the SVM parameters of GeneFeatures and ImFeatures, screening the optimal parameters with the Leave-One-Out Cross-Validation method, and constructing a lymph node metastasis prediction model with logistic regression;
step 4: verifying the prediction model: verifying the accuracy, specificity, sensitivity, and robustness of the prediction model.
Alternatively, in the above prediction model construction method, the lung cancer is stage I-IV lung cancer.
Alternatively, in the above prediction model construction method, in step 2.1, the lung cancer cell surface antigen used is selected from one or more of the following: PANCK, CD4, CD8, FOXP3, CD68, CD86, MPO, CD11c, CD11b, CD206, PD-1, CD19, CD3, αSMA, CD31, Collagen I, CD94, or PD-L1.
Preferably, in step 2.1, the lung cancer cell surface antigens used are PANCK, CD4, CD8, FOXP3, CD68, and PD-L1.
Alternatively, in the above prediction model construction method, in step 2.2, the image resolution is 4028×3012 pixels, the image analysis software used is inForm advanced image analysis software, the images are segmented with CellProfiler software, the mask files are analyzed with histoCAT software, and the clustering method used is the supervised lineage assignment method.
Alternatively, in the above prediction model construction method, in step 2.2, the distribution of multiple different single-cell types on the section is obtained by clustering with the supervised lineage assignment method, the single-cell types being selected from two or more of: macrophage & CD8T; macrophage & T; CD4T: CD4+FOXP3-PDL1-; CD8T: CD8+PDL1-; macrophage: CD68+PDL1-; other; Treg: CD4+FOXP3+; PD-L1+ epithelial cell: PANCK+PDL1+; epithelial cell: PANCK+PDL1-.
Preferably, the single-cell types are macrophage & CD8T; macrophage & T; CD4T: CD4+FOXP3-PDL1-; CD8T: CD8+PDL1-; macrophage: CD68+PDL1-; other; Treg: CD4+FOXP3+; PD-L1+ epithelial cell: PANCK+PDL1+; and epithelial cell: PANCK+PDL1-.
Alternatively, in the above prediction model construction method, in step 2.3, the cell attributes comprise 4 dimensions: nuclear area, major diameter, minor diameter, and the ratio of major to minor diameter; the intercellular distance parameters in each section total $(C_9^2 + 9) \times 3 = 135$ dimensions, giving 139 feature dimensions per section; and the tumor immune microenvironment features derived from each patient's mIF section images total 139 × 10 = 1390 dimensions.
Alternatively, in the above prediction model construction method, in step 1, the gene mutation feature includes one or more of the following: KIT, MSH3, APC, HGF, GRIN2A, PARP3, PIK3C2G, KMT2C, TLR4, NOTCH1.
Preferably, the gene mutation is characterized by KIT, MSH3, APC, HGF, GRIN2A, PARP3, PIK3C2G, KMT2C, TLR4, NOTCH1.
Alternatively, in the above prediction model construction method, in step 2.4, the mIF image features include one or more of the following: distMin_Treg_Treg_4, distMin_Epithelial_Macrophage&CD8T_1, distMin_Epithelial_Treg_5, distMin_Macrophage&T_Treg_1, distMin_CD8T_Macrophage_9, distMin_Macrophage&T_Macrophage&T_0, distMin_Macrophage_Macrophage_3, distMin_Macrophage&T_Macrophage&T_2, distMin_CD8T_Epithelial_5, distMin_Macrophage_Macrophage_4.
Preferably, the mIF image features are distMin_Treg_Treg_4, distMin_Epithelial_Macrophage&CD8T_1, distMin_Epithelial_Treg_5, distMin_Macrophage&T_Treg_1, distMin_CD8T_Macrophage_9, distMin_Macrophage&T_Macrophage&T_0, distMin_Macrophage_Macrophage_3, distMin_Macrophage&T_Macrophage&T_2, distMin_CD8T_Epithelial_5, distMin_Macrophage_Macrophage_4.
In a second aspect, the present invention provides a predictive model system for lung cancer lymph node metastasis constructed by the predictive model construction method described in the first aspect above.
Alternatively, in the above prediction model system, the prediction model system is given by the calculation formula shown in Fig. 6, wherein $\beta_{Im}$ is the SVM parameter of ImFeatures, $\beta_{Gene}$ is the parameter of GeneFeatures, and Pr(ImGene) is the predicted probability of lymph node metastasis given by the integrated feature ImGene obtained by integrating the two.
Alternatively, in the predictive model system described above, the genetic mutation signature includes one or more of the following: KIT, MSH3, APC, HGF, GRIN2A, PARP3, PIK3C2G, KMT2C, TLR4, NOTCH1.
Preferably, the gene mutation is characterized by KIT, MSH3, APC, HGF, GRIN2A, PARP3, PIK3C2G, KMT2C, TLR4, NOTCH1.
Alternatively, in the above prediction model system, the mIF image features include one or more of: distMin_Treg_Treg_4, distMin_Epithelial_Macrophage&CD8T_1, distMin_Epithelial_Treg_5, distMin_Macrophage&T_Treg_1, distMin_CD8T_Macrophage_9, distMin_Macrophage&T_Macrophage&T_0, distMin_Macrophage_Macrophage_3, distMin_Macrophage&T_Macrophage&T_2, distMin_CD8T_Epithelial_5, distMin_Macrophage_Macrophage_4.
Preferably, the mIF image features are distMin_Treg_Treg_4, distMin_Epithelial_Macrophage&CD8T_1, distMin_Epithelial_Treg_5, distMin_Macrophage&T_Treg_1, distMin_CD8T_Macrophage_9, distMin_Macrophage&T_Macrophage&T_0, distMin_Macrophage_Macrophage_3, distMin_Macrophage&T_Macrophage&T_2, distMin_CD8T_Epithelial_5, distMin_Macrophage_Macrophage_4.
Compared with the prior art, the invention has the following beneficial effects:
Compared with a single-modal prediction model based on gene mutation features alone, the multi-modal prediction model system for lung cancer lymph node metastasis built from the integrated gene mutation features and mIF image features has higher accuracy, specificity, sensitivity, and robustness, and addresses the inaccuracy and insufficient sensitivity of current clinical predictions of lung cancer lymph node metastasis.
Drawings
Other features, objects and advantages of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments, taken in conjunction with the accompanying drawings. In the drawings:
Fig. 1: Schematic diagram of the multi-modal prediction model of lung cancer lymph node metastasis of the invention, which integrates gene mutation features and mIF image features.
Fig. 2: The 10 genes with the largest absolute Coef values in the correlation analysis between gene mutation features and metastasis.
Fig. 3: Schematic of the distribution of 9 different single-cell types on sections, clustered by the supervised lineage assignment method.
Fig. 4: Representative photographs of the distribution of 9 different single-cell types on sections, clustered by the supervised lineage assignment method. Magnification: 20×.
Fig. 5: The 10 image features with the largest absolute Coef values in the correlation analysis between distMin image features and metastasis.
Fig. 6: a calculation formula schematic diagram of a multi-mode prediction model of lung cancer lymph node metastasis.
Fig. 7: the multimodal predictive model of the present invention predicts ROC curves for lymph node metastasis.
Fig. 8: the multimodal predictive model (validation set) of the present invention predicts ROC curves for lymph node metastasis.
Fig. 9: AUC results for single mode predictive models reported in the literature heretofore.
Detailed Description
The invention will be further illustrated with reference to specific examples. It should be understood that the detailed description and specific examples are intended for purposes of illustration only and are not intended to limit the scope of the invention.
In this disclosure, it is to be understood that terms such as "comprises" or "comprising" are intended to indicate the presence of the features, numbers, steps, actions, components, parts, or combinations thereof disclosed in this specification, and are not intended to exclude the possibility that one or more other features, numbers, steps, actions, components, parts, or combinations thereof are present or added.
Where specific techniques or conditions are not indicated in the examples, they follow the techniques or conditions described in the literature in this field or the product specifications. Reagents or equipment for which no manufacturer is noted are conventional products available for purchase through regular channels.
The experimental methods in the following examples are conventional methods unless otherwise specified. The test materials used in the examples described below, unless otherwise specified, are all commercially available products.
Examples
To detect the gene mutation features (GeneFeatures), the gene mutation profiles of tumor tissues were systematically analyzed based on NGS data from 257 lung adenocarcinoma patients (clinical trial data from Peking University People's Hospital, IRB No. 2021PHB182-001). A support vector machine (SVM) was used to compute, from the various gene mutation features, a maximum-margin linear classifier for the presence or absence of lymph node metastasis, thereby obtaining the coefficient Coef of each gene mutation feature in the classifier (Table 1). Coef > 0 indicates that the feature is positively correlated with metastasis, Coef < 0 indicates that the feature is negatively correlated with metastasis, and the absolute value indicates the strength of the correlation. The 10 genes with the largest absolute Coef values were KIT, MSH3, APC, HGF, GRIN2A, PARP3, PIK3C2G, KMT2C, TLR4, and NOTCH1 (Fig. 2).
Table 1: Coefficients of gene mutation features
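The coefficient ranking described above can be sketched in Python with scikit-learn, the library named later in this description. This is a minimal illustration only: the function name `rank_features_by_coef`, the gene labels, and the toy data are hypothetical, and the choice of `SVC(kernel="linear")` with `C=1.0` is an assumption rather than the configuration used in the invention.

```python
import numpy as np
from sklearn.svm import SVC

def rank_features_by_coef(X, y, names, top_k=10):
    """Fit a linear SVM and return the top_k (name, coef) pairs ordered by |coef|."""
    clf = SVC(kernel="linear", C=1.0)  # linear kernel exposes one weight per feature
    clf.fit(X, y)
    coef = clf.coef_.ravel()           # shape (n_features,) for a binary problem
    order = np.argsort(-np.abs(coef))[:top_k]
    return [(names[i], float(coef[i])) for i in order]

# Toy data (hypothetical): rows are patients, columns are 0/1 mutation flags.
rng = np.random.default_rng(0)
names = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
X = rng.integers(0, 2, size=(60, 4)).astype(float)
y = X[:, 0].astype(int)  # in this toy set the label is driven entirely by GENE_A

ranked = rank_features_by_coef(X, y, names, top_k=2)
```

In this toy set the feature that determines the label receives the largest positive weight, which mirrors the reading of Coef given above: the sign gives the direction of the association and the magnitude gives its strength.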
For the tumor immune microenvironment, the present invention performs multiplex immunofluorescence staining (mIF) on lung cancer tissue sections using antibodies against 6 cell surface antigens: PANCK, CD4, CD8, FOXP3, CD68, and PD-L1.
Tissue sections were analyzed with PerkinElmer Vectra (Vectra 3.0.5): the whole section was previewed with a 4× microscope objective, and detailed images were captured with a 20× microscope objective. The resulting images have a resolution of up to 4028 × 3012 pixels, which supports accurate detection. The multispectral images were then unmixed with inForm advanced image analysis software (version 2.3.0), using a spectral library built from tissue images stained with a single antibody for each antigen. The images of the individual fields of view were then stitched to obtain a complete view of the section. Finally, the images were segmented with CellProfiler software to generate mask files, which were analyzed with histoCAT software and clustered by the supervised lineage assignment method to obtain the distribution of 9 different single-cell types on the sections (see Fig. 3 and Fig. 4). As shown in Fig. 3, the 9 single-cell types were macrophage & CD8T; macrophage & T; CD4T: CD4+FOXP3-PDL1-; CD8T: CD8+PDL1-; macrophage: CD68+PDL1-; other; Treg: CD4+FOXP3+; PD-L1+ epithelial cell: PANCK+PDL1+; and epithelial cell: PANCK+PDL1-.
To further evaluate TIME based on the image features (ImFeatures) of the mIF sections, the tumor immune microenvironment features of each patient were defined along four dimensions: the nearest inter-nuclear distance (distMin), the farthest inter-nuclear distance (distMax), the mean inter-nuclear distance (distMean), and the cell attributes. distMin comprises the nearest distances from the nuclei of each cell type to the nuclei of the same type and of each of the other 8 types. For example, the nearest-distance parameter between macrophages (Macrophage) and other macrophages is denoted distMin_Macrophage_Macrophage, and the nearest distance between CD4+ T cells (CD4T) and macrophages is denoted distMin_CD4T_Macrophage; distMax and distMean are defined analogously. The intercellular distance parameters in one section therefore total $(C_9^2 + 9) \times 3 = 135$ dimensions, while the cell attributes comprise 4 dimensions: nuclear area, major diameter, minor diameter, and the ratio of major to minor diameter. Each section thus has 135 + 4 = 139 feature dimensions.
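As a rough illustration of these distance features and of the dimension count, the following Python sketch computes, for each nucleus of one cell type, its nearest, farthest, and mean distance to the nuclei of another type. The per-cell interpretation, the function name `per_cell_distances`, and the coordinates are assumptions made for illustration; the description does not specify the exact computation.

```python
import math

def per_cell_distances(cells_a, cells_b, same_type=False):
    """Return (distMin, distMax, distMean) for every nucleus in cells_a against cells_b.

    cells_a, cells_b: lists of (x, y) nucleus centroids (hypothetical input format).
    same_type: set True when both lists are the same population, so that a
    nucleus is not compared with itself.
    """
    out = []
    for i, (xa, ya) in enumerate(cells_a):
        ds = [math.hypot(xa - xb, ya - yb)
              for j, (xb, yb) in enumerate(cells_b)
              if not (same_type and i == j)]
        if ds:  # skip nuclei that have no partner to measure against
            out.append((min(ds), max(ds), sum(ds) / len(ds)))
    return out

# Hypothetical centroids for two cell types on one section.
macrophages = [(0.0, 0.0), (3.0, 4.0)]
cd4t = [(0.0, 1.0), (6.0, 8.0)]
stats = per_cell_distances(cd4t, macrophages)

# Dimension count: unordered pairs of 9 cell types (including same-type pairs),
# times 3 distance statistics, plus 4 nuclear attributes.
n_types = 9
pair_dims = (math.comb(n_types, 2) + n_types) * 3
slice_dims = pair_dims + 4
```

With 9 cell types there are 36 mixed pairs plus 9 same-type pairs, so `pair_dims` evaluates to 135 and `slice_dims` to 139, matching the counts stated above.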
To fully characterize the heterogeneity between the tumors of different patients, for each dimension of the image features, the 9 deciles of that feature across all patients were computed. These 9 deciles divide the feature into 10 intervals; that is, image feature A is divided into 10 intervals according to its overall distribution across all patients. For a given patient, the proportion of that patient's image feature A values falling in each of the 10 intervals is counted, so that the 1-dimensional image feature is expanded into 10 dimensions, denoted A_0, A_1, …, A_9. The image features used to evaluate TIME from each patient's mIF section are thereby expanded to 139 × 10 = 1390 dimensions.
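The decile expansion can be sketched with the Python standard library. The helper names `decile_edges` and `expand_feature` and the sample values are hypothetical, and the handling of values that fall exactly on a decile boundary is an assumption, since the description does not state it.

```python
from bisect import bisect_right
from statistics import quantiles

def decile_edges(pooled_values):
    """Nine cut points that split the pooled cohort values into 10 intervals."""
    return quantiles(pooled_values, n=10)

def expand_feature(patient_values, edges):
    """Fraction of one patient's values in each of the 10 intervals (A_0 .. A_9)."""
    counts = [0] * (len(edges) + 1)
    for v in patient_values:
        counts[bisect_right(edges, v)] += 1  # index of the interval containing v
    total = len(patient_values)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]

# Hypothetical pooled values of feature A over all patients, and one patient's values.
pooled = list(range(1, 101))
edges = decile_edges(pooled)
a_expanded = expand_feature([5, 15, 15, 95], edges)
```

Each patient is thus described by a 10-bin histogram of where their values sit relative to the whole cohort, rather than by a single summary number.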
To evaluate the effect of TIME on lymph node metastasis in lung cancer patients, a support vector machine (SVM) was used to compute a maximum-margin linear classifier for the presence or absence of lymph node metastasis from the 1390-dimensional image features derived from the mIF sections, thereby obtaining the coefficient Coef of each image feature in the classifier (Table 2), which represents the predictive strength of the various immune cell-cell interactions for lymph node metastasis. The 10 mIF image features with the largest absolute Coef values were distMin_Treg_Treg_4, distMin_Epithelial_Macrophage&CD8T_1, distMin_Epithelial_Treg_5, distMin_Macrophage&T_Treg_1, distMin_CD8T_Macrophage_9, distMin_Macrophage&T_Macrophage&T_0, distMin_Macrophage_Macrophage_3, distMin_Macrophage&T_Macrophage&T_2, distMin_CD8T_Epithelial_5, and distMin_Macrophage_Macrophage_4 (Fig. 5).
Table 2: Coefficients of mIF image features
The gene mutation features and the immune microenvironment mIF image features were combined: the SVM parameters of GeneFeatures and ImFeatures were integrated, the optimal parameters were screened with the Leave-One-Out Cross-Validation method, and a lymph node metastasis prediction model was then constructed with logistic regression (Fig. 6). Here $\beta_{Im}$ is the SVM parameter of ImFeatures, $\beta_{Gene}$ is the parameter of GeneFeatures, and Pr(ImGene) is the predicted probability of lymph node metastasis given by the integrated feature ImGene obtained by integrating the two.
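One plausible reading of this integration step is sketched below with scikit-learn: the selected gene and image features are concatenated into a single matrix, the regularization strength of a logistic regression is chosen by leave-one-out cross-validation, and the fitted model outputs a metastasis probability. The concatenation strategy, the candidate `C` values, the function name `select_c_by_loocv`, and the toy data are all assumptions; the exact way the two sets of SVM parameters are combined is given only by the formula in Fig. 6.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

def select_c_by_loocv(X, y, candidates=(0.01, 0.1, 1.0, 10.0)):
    """Return (best_C, best_accuracy) under leave-one-out cross-validation."""
    best_c, best_acc = None, -1.0
    for c in candidates:
        model = LogisticRegression(C=c, max_iter=1000)
        # Each patient is predicted by a model trained on all other patients.
        pred = cross_val_predict(model, X, y, cv=LeaveOneOut())
        acc = float(np.mean(pred == y))
        if acc > best_acc:
            best_c, best_acc = c, acc
    return best_c, best_acc

# Toy data (hypothetical): 40 patients, 3 gene features and 3 image features.
rng = np.random.default_rng(0)
gene = rng.integers(0, 2, size=(40, 3)).astype(float)  # stand-in for GeneFeatures
im = rng.random((40, 3))                               # stand-in for ImFeatures
y = ((gene[:, 0] + im[:, 0]) > 1.0).astype(int)        # toy metastasis label
X = np.hstack([gene, im])                              # integrated "ImGene" matrix

best_c, best_acc = select_c_by_loocv(X, y)
final = LogisticRegression(C=best_c, max_iter=1000).fit(X, y)
prob = final.predict_proba(X)[:, 1]                    # analogue of Pr(ImGene)
```

Leave-one-out is a natural fit for cohorts of this size (tens of patients), because every sample is used for training in all folds but one.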
Specifically, to predict lymph node (LN) stage, the Python library scikit-learn (1.0.1, https://scikit-learn.org/stable/) was used. The top 10 coefficients were selected from the mIHC image features (ImFeatures) and from the genomic features (GeneFeatures) to construct the model (as shown in Tables 1 and 2). After removing samples without matched ImFeatures, a dataset of 83 patients with both mIHC and NGS data was obtained. Among the classifiers, the ImGene support vector machine (SVM) showed the highest performance, with an AUC of 0.86, an F1 score of 0.83, and an accuracy of 0.82, exceeding Random Forest, the ImFeatures SVM, the GeneFeatures SVM, and Gradient Boosting.
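For reference, the accuracy, sensitivity, specificity, and F1 score used to compare classifiers can be computed from a confusion matrix as in the following sketch. The function name `binary_metrics` and the labels are hypothetical and are not the study data.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and F1 for 0/1 labels (1 = metastasis)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # true positive rate
        "specificity": tn / (tn + fp) if tn + fp else 0.0,  # true negative rate
        "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
    }

# Hypothetical labels and predictions for 8 patients.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```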
The ROC curve of the integrated features for lymph node metastasis prediction was plotted. The area under the curve (AUC) reached 0.86, indicating that the lymph node metastasis prediction model built on the integrated features predicts well (Fig. 7).
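The AUC can be read as the probability that a randomly chosen metastasis-positive patient receives a higher predicted score than a randomly chosen negative one. A minimal pure-Python sketch of that pairwise definition follows; the function name `roc_auc` and the scores are illustrative values, not the study data.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and predicted probabilities for 4 patients.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

This quadratic-time form is adequate for cohorts of tens of patients; a rank-based implementation would be preferable for large datasets.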
The prediction model was also validated. An independent external dataset was collected, comprising 61 patients who underwent complete resection (ethics approval No. 2021PHB182-001). All patients provided informed consent for sample collection, and all participants agreed to publication of the study results. These patients were selected based on the availability of NGS and mIHC data. The AUC on the independent external dataset was 0.81, and the 95% confidence intervals reported with the training cohort were AUC [0.624, 1.036] and [0.457, 0.963] (Fig. 8).
The multi-modal prediction model of lung cancer lymph node metastasis therefore predicts lymph node metastasis better than the single-modal prediction models reported in the earlier literature. Existing lymph node metastasis prediction models are mainly single-modal models based solely on the clinical characteristics of the patient, such as the PKU model (Chen K, Yang F, Jiang G, Li J, Wang J. Development and validation of a clinical prediction model for N2 lymph node metastasis in non-small cell lung cancer. Ann Thorac Surg. 2013 Nov;96(5):1761-8) and the FUDAN model (Zhang Y, Sun Y, Xiang J, Zhang Y, Hu H, Chen H. A prediction model for N2 disease in T1 non-small cell lung cancer. J Thorac Cardiovasc Surg. 2012 Dec;144(6):1360-4), with AUCs of 0.767 and 0.806 respectively and low sensitivity and specificity. The multi-modal lung cancer lymph node metastasis prediction model constructed by combining gene mutation features and mIF image features reaches an AUC of 0.86, with higher specificity and sensitivity than the existing prediction models (Fig. 9).
Compared with a single-modal prediction model based on gene mutation features alone, the multi-modal prediction model system for lung cancer lymph node metastasis built from the integrated gene mutation features and mIF image features has higher accuracy, specificity, sensitivity, and robustness, and addresses the inaccuracy and insufficient sensitivity of current clinical predictions of lung cancer lymph node metastasis.
The foregoing description covers only the preferred embodiments of the present disclosure and the technical principles employed. Those skilled in the art will appreciate that the scope of the invention referred to in this disclosure is not limited to technical solutions formed by the specific combination of the features described above, but also encompasses other technical solutions formed by any combination of the above features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) features having similar functions disclosed in the present disclosure.

Claims (8)

1. A method for constructing a multi-modal prediction model of lung cancer lymph node metastasis by combining gene mutation characteristics and multiplex immunofluorescence staining (mIF) image characteristics, characterized in that the prediction model construction method comprises the following steps:
step 1: obtaining the gene mutation characteristics: systematically analyzing the gene mutation spectrum of lung cancer patients, computing a maximum-margin linear classifier for the presence or absence of lymph node metastasis using a support vector machine (SVM) based on the various gene mutation characteristics, thereby obtaining the coefficient Coef of each gene mutation characteristic in the classifier, and obtaining one or more gene mutation characteristics with larger absolute values of Coef;
step 2: acquiring mIF image characteristics:
step 2.1: performing mIF staining on a lung cancer tissue section of the patient using antibodies against one or more lung cancer cell surface antigens;
step 2.2: capturing detailed images of the stained tissue sections, unmixing the multispectral images using a spectral library built from single-stained tissue images of each antigen, stitching the images of the individual fields of view into a complete view of the section, segmenting the images with software to generate mask files, analyzing them with software, and obtaining the distribution of the various single cells on the section by cluster analysis;
step 2.3: evaluating the influence of the tumor immune microenvironment on lymph node metastasis in lung cancer patients, and defining the tumor immune microenvironment characteristics of the patient along four dimensions, namely the closest distance between cell nuclei, the farthest distance between cell nuclei, the average distance between cell nuclei, and cell attributes;
step 2.4: selecting the mIF image features: computing a maximum-margin linear classifier for the presence or absence of lymph node metastasis using an SVM, thereby obtaining the coefficient Coef of each image feature in the classifier, and obtaining one or more mIF image features with larger absolute values of Coef;
step 3: generating the prediction model: using the gene mutation characteristics obtained in step 1 and the mIF image characteristics obtained in step 2, integrating the SVM parameters of GeneFeaturs and ImFeaturs, screening the optimal parameters by leave-one-out cross-validation, and constructing the lymph node metastasis prediction model by logistic regression;
step 4: verifying the prediction model: verifying the accuracy, specificity, sensitivity and robustness of the prediction model,
the prediction model is [formula image not reproduced], wherein β_Im is the SVM parameter of ImFeaturs, β_Gene is the SVM parameter of GeneFeaturs, and Pr(ImGene) is the predicted value of the integrated feature ImGene obtained by integrating the two parameters, i.e. the probability of lymph node metastasis.
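Step 3 of claim 1 can be illustrated with a minimal sketch, assuming that each patient is summarized by two numbers, a gene-modality score and an image-modality score (for example the decision values of the two linear SVMs), and that they are combined by a standard logistic model Pr = 1 / (1 + exp(-(b0 + b_gene * g + b_im * m))). The claimed formula itself is not reproduced in this text, so this functional form, the gradient-descent fit, the L2 penalty, and all names and toy values below are assumptions for illustration only.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w, x):
    """Probability of lymph node metastasis for one patient."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(X, y, lr=0.1, epochs=2000, l2=0.01):
    """Fit [b0, b_gene, b_im] by batch gradient descent with an L2 penalty."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        for j in range(len(w)):
            penalty = l2 * w[j] if j > 0 else 0.0  # intercept is not penalized
            w[j] -= lr * (grad[j] / n + penalty)
    return w

def loocv_probabilities(X, y):
    """Leave-one-out cross-validation: predict each patient with a model
    fitted on all other patients."""
    probs = []
    for i in range(len(X)):
        w = fit_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        probs.append(predict(w, X[i]))
    return probs

# Made-up scores: each row is [gene_score, im_score]; 1 = metastasis.
X = [[1.2, 0.8], [0.9, 1.1], [1.5, 0.4], [0.7, 0.9],
     [-1.0, -0.9], [-0.8, -1.2], [-1.3, -0.5], [-0.6, -1.0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

held_out = loocv_probabilities(X, y)
print([round(p, 3) for p in held_out])
```

In practice the held-out probabilities from `loocv_probabilities` would be compared across candidate parameter settings (here only `l2`, `lr` and `epochs` exist) to choose the setting that generalizes best before fitting the final model on all patients.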
2. The prediction model construction method according to claim 1, characterized in that: the lung cancer is stage I-IV lung cancer.
3. The prediction model construction method according to claim 1, characterized in that: in step 2.1, the lung cancer cell surface antigen used is selected from one or more of the following: PANCK, CD4, CD8, FOXP3, CD68, CD86, MPO, CD11c, CD11b, CD206, PD-1, CD19, CD3, αSMA, CD31, Collagen I, CD94, or PD-L1.
4. The prediction model construction method according to claim 1, characterized in that: in step 2.2, the distribution of a plurality of different single cells on the sections is obtained by clustering with a supervised lineage assignment method, the single cells being selected from two or more of the following: Macrophage&CD8T; Macrophage&T; CD4T: CD4+FOXP3-PDL1-; CD8T: CD8+PDL1-; Macrophage: CD68+PDL1-; Other; Treg: CD4+FOXP3+; PD-L1+ epithelial cells: PANCK+PDL1+; epithelial cells: PANCK+PDL1-.
5. The prediction model construction method according to claim 1, characterized in that: in step 2.3, the cell attributes are the nuclear area, the major axis, the minor axis, and the ratio of the major axis to the minor axis, 4 dimensions in total; the intercellular distance parameters in each section total $(C_9^2 + 9) \times 3 = 135$ dimensions, giving 139 feature dimensions per section and $139 \times 10 = 1390$ feature dimensions of the tumor immune microenvironment based on the mIF section images for each patient.
6. The prediction model construction method according to claim 1, characterized in that: in step 1, the genetic mutation signature comprises one or more of the following: KIT, MSH3, APC, HGF, GRIN2A, PARP3, PIK3C2G, KMT2C, TLR4, NOTCH1.
7. The prediction model construction method according to claim 1, characterized in that: in step 2.4, the mIF image features include one or more of: distmin_treg_treg_4, distmin_epithelial_macrophage&cd8t_1, distmin_epithelial_treg_5, distmin_macrophage&t_treg_1, distmin_cd8t_macrophage_9, distmin_macrophage&t_macrophage&t_0, distmin_macrophage_macrophage_3, distmin_macrophage&t_macrophage&t_2, distmin_cd8t_epithelial_5, distmin_macrophage_macrophage_4.
8. A predictive model system for lung cancer lymph node metastasis constructed by the predictive model construction method according to any one of claims 1 to 7.
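The feature count stated in claim 5 can be checked with a few lines of arithmetic. The sketch below assumes the 9 single-cell classes listed in claim 4 and reads the "+9" term as the 9 same-type pairs (for example Treg to Treg, as in distmin_treg_treg_4). That reading is an interpretation, and the constant names are hypothetical. The factor of 10 is taken from the claim as stated; the claims do not define what it counts.

```python
from math import comb

N_CELL_TYPES = 9         # single-cell classes listed in claim 4
N_DISTANCE_METRICS = 3   # closest, farthest and average inter-nuclear distance
N_CELL_ATTRIBUTES = 4    # nuclear area, major axis, minor axis, axis ratio
FACTOR_PER_PATIENT = 10  # multiplier stated in claim 5 (meaning not defined there)

# Unordered pairs of distinct cell types, plus the same-type pairs.
n_type_pairs = comb(N_CELL_TYPES, 2) + N_CELL_TYPES
n_distance_features = n_type_pairs * N_DISTANCE_METRICS
n_features_per_section = n_distance_features + N_CELL_ATTRIBUTES
n_features_per_patient = n_features_per_section * FACTOR_PER_PATIENT

print(n_type_pairs, n_distance_features,
      n_features_per_section, n_features_per_patient)
# prints: 45 135 139 1390
```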
CN202311524786.2A 2023-11-16 2023-11-16 Method for constructing multi-mode prediction model of lung cancer lymph node metastasis Active CN117275578B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311524786.2A CN117275578B (en) 2023-11-16 2023-11-16 Method for constructing multi-mode prediction model of lung cancer lymph node metastasis

Publications (2)

Publication Number Publication Date
CN117275578A CN117275578A (en) 2023-12-22
CN117275578B true CN117275578B (en) 2024-02-27

Family

ID=89214581

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311524786.2A Active CN117275578B (en) 2023-11-16 2023-11-16 Method for constructing multi-mode prediction model of lung cancer lymph node metastasis

Country Status (1)

Country Link
CN (1) CN117275578B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111521794A (en) * 2020-04-21 2020-08-11 山东第一医科大学(山东省医学科学院) Immunofluorescence kit and detection method for detecting NSE gene mutation of peripheral blood circulating tumor cells of small cell lung cancer patients
WO2021174052A1 (en) * 2020-02-27 2021-09-02 Foundation Medicine, Inc. Mitigation of statistical bias in genetic sampling
CN115424740A (en) * 2022-09-30 2022-12-02 四川大学华西医院 Tumor immunotherapy effect prediction system based on NGS and deep learning
CN116047074A (en) * 2022-11-08 2023-05-02 兰州大学第一医院 Marker for diagnosing and/or predicting lung cancer, diagnostic model and construction method thereof
CN116516004A (en) * 2023-03-15 2023-08-01 中南大学湘雅二医院 Predictive model for driving malignant lung nodule lymph node metastasis by tumor genes through changing immune microenvironment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200199555A1 (en) * 2018-12-05 2020-06-25 The Broad Institute, Inc. Cas proteins with reduced immunogenicity and methods of screening thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wei W, Li X, Song M, et al. Molecular Analysis of Oncogenic Mutations in Resected Margins by Next-Generation Sequencing Predicts Relapse in Non-Small Cell Lung Cancer Patients. OncoTargets and Therapy. 2020, (No. 2020, 13), full text. *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CB03 Change of inventor or designer information
Inventor after: Li Hao, Chen Kezhong, Wang Jun, Yang Fan, Li Yun, Sheng Jianpeng
Inventor before: Li Hao, Wang Jun, Yang Fan, Li Yun, Sheng Jianpeng
TR01 Transfer of patent right
Effective date of registration: 20240724
Address after: Room 301, 401, Building 11, Qianwan Bioport Phase I, Life Science and Technology Innovation Center, No. 3300 Benjing Avenue, Ningwei Street, Xiaoshan District, Hangzhou City, Zhejiang Province 311200
Patentee after: Hangzhou Yinfei Duozuo Biotechnology Co.,Ltd.
Country or region after: China
Address before: 100044 No. 11 South Main Street, Xicheng District, Beijing, Xizhimen
Patentee before: Peking University People's Hospital
Country or region before: China