CN113571193A

CN113571193A - Method and device for constructing lymph node metastasis prediction model based on multi-view learning imaging omics fusion

Info

Publication number: CN113571193A
Application number: CN202110702168.7A
Authority: CN
Inventors: 牛田野; 杨婧; 罗辰
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2021-06-24
Filing date: 2021-06-24
Publication date: 2021-10-29
Anticipated expiration: 2041-06-24
Also published as: CN113571193B

Abstract

The invention discloses a method and a device for constructing a lymph node metastasis prediction model based on multi-view learning imaging omics fusion. The method comprises the following steps: acquiring image data and preprocessing the image data to obtain tumor imaging omics features and lymph node imaging omics features; performing, in sequence, correlation analysis and feature screening based on a supervised feature selection strategy with a test-time budget on the tumor imaging omics features and the lymph node imaging omics features to obtain a tumor imaging omics feature sample and a lymph node imaging omics feature sample; performing unsupervised learning from the multi-view subspaces to a common space on the two samples using an unsupervised multi-view partial least squares method, so as to map the lymph node imaging omics features and the tumor imaging omics features from their respective view subspaces to the common space; and taking the lymph node imaging omics features and the tumor imaging omics features in the common space as input and the label indicating whether the lymph node has metastasized as output, training a logistic regression classifier to obtain the lymph node metastasis prediction model.

Description

Method and device for constructing lymph node metastasis prediction model based on multi-view learning imaging omics fusion

Technical Field

The invention belongs to the technical field of multi-view learning, and particularly relates to a method and a device for constructing a lymph node metastasis prediction model based on multi-view learning imaging omics fusion.

Background

For cancer patients, lymph node metastasis determines the extent of lymph node dissection and is also one of the major independent prognostic factors. Accurately predicting the lymph node status of a cancer patient before surgery is important for avoiding over-treatment and reducing postoperative complications. Multiple studies have shown that preoperative CT imaging helps to achieve individualized prediction of lymph node status in cancer patients, but these studies often use only tumor imaging features, or combine them with a small number of clinicopathological features (e.g., lymph node status in CT reports, serum biomarkers, TNM staging, etc.). Beyond these features, previous studies have found that lymph node imaging omics also has predictive power for distinguishing whether lymph nodes have metastasized in cancer patients. By fusing primary tumor and lymph node imaging omics features, the lymph node status of a patient can be predicted better. The two types of imaging omics features can be regarded as two views, which may complement each other and may also contain redundancy. Simply concatenating the two does not fully describe the lymph node status information of cancer patients, which limits the ability to accurately predict lymph node metastasis preoperatively.

Multi-view learning based on artificial intelligence overcomes the limitation of single-view analysis by using data collected from multiple views. It has attracted much attention in recent years and is increasingly applied to medical image processing and analysis, for example the medical image classification method and device based on multi-view learning and a deeply supervised auto-encoder disclosed in patent application publication No. CN112488102A, and the multi-modal parameter model optimization fusion method based on image features disclosed in patent application publication No. CN111462116A.

However, multi-view learning approaches to imaging omics fusion for cancer patients have not been well studied. An effective multi-view learning method for fusing tumor imaging omics and lymph node imaging omics is needed for accurate preoperative prediction of lymph node metastasis in cancer patients.

Disclosure of Invention

In view of the above, an object of the present invention is to provide a method and an apparatus for constructing a lymph node metastasis prediction model based on multi-view learning imaging omics fusion, which map multi-view features to a common space through the acquisition of a tumor imaging omic sample and a lymph node imaging omic sample and the design of a multi-view learning fusion algorithm, and construct a lymph node metastasis prediction model by using the features of the common space, so as to improve the prediction performance of the lymph node metastasis prediction model.

In a first aspect, an embodiment provides a method for constructing a lymph node metastasis prediction model based on multi-view learning imaging omics fusion, which includes the following steps:

acquiring image data, and performing region-of-interest delineation, region-of-interest interpolation and imaging omics feature extraction on the image data to obtain tumor imaging omics features and lymph node imaging omics features;

performing, in sequence, correlation analysis and feature screening based on a supervised feature selection strategy with a test-time budget on the tumor imaging omics features to obtain the smallest set of tumor imaging omics features that retains discriminative ability as a tumor imaging omics feature sample;

performing, in sequence, correlation analysis and feature screening based on a supervised feature selection strategy with a test-time budget on the lymph node imaging omics features to obtain the smallest set of lymph node imaging omics features that retains discriminative ability as a lymph node imaging omics feature sample;

performing unsupervised learning from the multi-view subspaces to a common space on the lymph node imaging omics feature sample and the tumor imaging omics feature sample using an unsupervised multi-view partial least squares method, so as to map the lymph node imaging omics features and the tumor imaging omics features from their respective view subspaces to the common space;

and taking the lymph node imaging omics features and the tumor imaging omics features in the common space as input and the label indicating whether the lymph node has metastasized as output, training a logistic regression classifier to obtain the lymph node metastasis prediction model.

In a second aspect, the embodiment provides a lymph node metastasis prediction model based on multi-view learning imaging omics fusion, and the lymph node metastasis prediction model is constructed by the construction method of the first aspect.

In a third aspect, an embodiment provides an apparatus for constructing a lymph node metastasis prediction model based on multi-view learning imaging omics fusion, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the method for constructing a lymph node metastasis prediction model based on multi-view learning imaging omics fusion according to the first aspect is implemented.

The method and the device for constructing the lymph node metastasis prediction model based on multi-view learning imaging omics fusion have at least the following advantages:

Primary tumor imaging omics and lymph node imaging omics are fused using multi-view learning, which provides the ability to distinguish patients with lymph node metastasis from those without. In the multi-step feature selection method, the supervised feature selection algorithm based on test-time budget is a supervised feature selection method with a global objective; different numbers of features can be selected with an appropriate budget, meeting the requirement that the imaging omics samples serve as predictors. The unsupervised multi-view partial least squares method maximizes the cross-covariance and, while learning the orthogonal projection matrices used to train the prediction model, also achieves a better data visualization effect.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a flowchart illustrating a method for constructing a lymph node metastasis prediction model based on multi-view learning imaging omics fusion according to an embodiment;

FIG. 2 is a graph of the numbers of tumor and lymph node imaging omics features retained after Pearson correlation coefficient analysis, according to an embodiment;

FIG. 3 is a graph of the numbers of features in the tumor imaging omics signature and the lymph node imaging omics signature after multi-step feature selection, as provided by an embodiment;

FIG. 4 is a diagram illustrating the visualization effect of a lymph node metastasis prediction model provided by an embodiment;

FIG. 5 is a comparison graph of the performance of the lymph node metastasis prediction model in the training set and the validation set according to an embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the detailed description and specific examples, while indicating the scope of the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.

Lymph node metastasis is a complex, continuous process that occurs with the development of a primary tumor. Relying only on primary tumor imaging omics, without lymph node imaging omics, cannot fully differentiate patients with lymph node metastasis from those without. On this basis, the embodiment of the invention provides a lymph node metastasis prediction model that jointly considers lymph node imaging omics features and tumor imaging omics features to predict lymph node metastasis.

Fig. 1 is a flowchart of a method for constructing a lymph node metastasis prediction model based on multi-view learning imaging omics fusion according to an embodiment. As shown in fig. 1, the method for constructing a lymph node metastasis prediction model according to the embodiment includes the following steps:

Step 1: collecting image data, and dividing the image data into a training set and a validation set.

In an embodiment, patient inclusion criteria are formulated, and patient image data and clinical data are collected retrospectively. The included image data and clinical data are randomly divided into a training set and a validation set, ensuring that the ratio of positive to negative samples is consistent between the training set and the validation set.
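A minimal sketch of a random split that keeps the positive-to-negative ratio consistent between the two sets is shown below. The function name `stratified_split`, the 30% validation fraction, and the toy labels are illustrative assumptions; the patent does not state the split ratio.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.3, seed=0):
    """Return (train_idx, val_idx) with the same class ratio in both sets.

    val_fraction and seed are assumed values chosen for illustration only.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, val_idx = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)  # random assignment within each class
        n_val = round(len(members) * val_fraction)
        val_idx.extend(members[:n_val])
        train_idx.extend(members[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Toy labels: 1 = lymph node metastasis, 0 = no metastasis.
labels = [1] * 40 + [0] * 60
train_idx, val_idx = stratified_split(labels, val_fraction=0.3, seed=42)
```

Splitting each class separately is what keeps the class ratio aligned; a plain random shuffle of all patients would only match the ratio approximately.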

Step 2: preprocessing the image data, including region-of-interest delineation, region-of-interest interpolation and imaging omics feature extraction.

In an embodiment, when performing region-of-interest delineation on the image data, the primary tumor contour and the lymph node contour of each patient are manually delineated on the image by a plurality of imaging physicians as the tumor region of interest and the lymph node region of interest, respectively. If the imaging physicians disagree, another imaging physician joins the discussion until a group consensus is reached. After the tumor region of interest and the lymph node region of interest are obtained, the region-of-interest data are interpolated, and features are then extracted from the tumor region of interest and the lymph node region of interest, respectively, using an in-house developed open-source Python package, so as to obtain the tumor imaging omics features and the lymph node imaging omics features.

Step 3: performing feature selection on the tumor imaging omics features and the lymph node imaging omics features extracted in step 2, wherein the feature selection comprises correlation analysis and feature screening based on a supervised feature selection strategy with a test-time budget.

In the embodiment, correlation analysis is performed on the tumor imaging omics features and the lymph node imaging omics features using Pearson correlation coefficient analysis, so as to screen out redundant features and reduce the feature dimension. The specific process is as follows: for the tumor imaging omics features, the Pearson correlation coefficient of each pair of tumor imaging omics features is calculated; the larger the Pearson correlation coefficient, the higher the correlation of the feature pair, and a feature whose Pearson correlation coefficient with another feature exceeds a certain threshold is considered redundant and is removed. The same operation is performed on the lymph node imaging omics features: the Pearson correlation coefficient of each pair of lymph node imaging omics features is calculated, and a feature whose Pearson correlation coefficient with another feature exceeds a certain threshold is considered redundant and is removed. The numbers of tumor and lymph node imaging omics features retained after Pearson correlation coefficient analysis are shown in FIG. 2.
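The redundancy screening described above can be sketched as a greedy filter on the absolute Pearson correlation matrix. The threshold of 0.9, the keep-the-earlier-feature rule, and the name `drop_correlated_features` are assumptions made for illustration; the patent only specifies "a certain threshold" and does not say which feature of a correlated pair is removed.

```python
import numpy as np


def drop_correlated_features(X, threshold=0.9):
    """Return indices of features kept after removing redundant ones.

    X has shape (n_patients, n_features). A feature is kept only if its
    absolute Pearson correlation with every already-kept feature is at
    most `threshold` (an assumed value).
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, i] <= threshold for i in kept):
            kept.append(j)
    return kept


# Toy data: feature 1 is almost a scaled copy of feature 0.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, 2.0 * a + 0.01 * rng.normal(size=200), b])
kept = drop_correlated_features(X, threshold=0.9)
```

In this sketch the filter would be run once on the tumor feature matrix and once on the lymph node feature matrix, since the two views are screened separately.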

After the redundant features have been removed by the correlation analysis, feature screening is performed on the remaining tumor imaging omics features and lymph node imaging omics features using the supervised feature selection strategy based on test-time budget, so as to obtain the smallest sets of tumor imaging omics features and lymph node imaging omics features that retain discriminative ability, which serve as the tumor imaging omics feature sample and the lymph node imaging omics feature sample, respectively.

The supervised feature selection strategy based on test-time budget learns a linear predictor and introduces binary indicator variables to select groups of features under an explicit budget constraint on the total cost, when the cost of each group is available. The supervised feature selection strategy based on test-time budget is described as:

wherein the training set consists of n patients and d features; x_i is the feature vector of the i-th patient, whose s-th feature is denoted x_{i,s}, where the feature vectors include tumor imaging omics feature vectors and lymph node imaging omics feature vectors; y_i takes the value 1 or 0 to indicate lymph node metastasis or no lymph node metastasis, respectively; θ is an indicator vector whose entries take the value 0 or 1 to indicate that a feature is not selected or is selected, respectively; w and b are the coefficients and the offset of the linear predictor; f_i is the output of the linear predictor for x_i; a loss function is defined on f_i and y_i; and each feature is treated as a group with a cost of 1, i.e., c_s = 1, so that the total budget B is the number of features expected to be selected.

In the training set, repeated random experiments are carried out for several preset values of the hyper-parameter C to find the optimal hyper-parameter C, from which the expected number of features B is determined; the screened tumor imaging omics features and lymph node imaging omics features, which are minimal in number yet retain discriminative ability, are then taken as the tumor imaging omics feature sample and the lymph node imaging omics feature sample, respectively. The numbers of features in the tumor imaging omics signature and the lymph node imaging omics signature after multi-step feature selection are shown in FIG. 3. In an embodiment, the preset values of the hyper-parameter C include 0.01, 0.1, 1, 10, and 100.
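As a hedged illustration only, a budgeted selection problem consistent with the symbols defined above might take the following form. The generic loss ℓ, the squared-norm regularizer, and the role of C as a trade-off weight are assumptions introduced here, not a formula quoted from the patent.

```latex
\begin{aligned}
\min_{\theta,\,w,\,b}\quad & \tfrac{1}{2}\lVert w\rVert_2^2 + C\sum_{i=1}^{n}\ell(f_i, y_i) \\
\text{s.t.}\quad & f_i = \sum_{s=1}^{d}\theta_s\,w_s\,x_{i,s} + b,\qquad i = 1,\dots,n, \\
& \sum_{s=1}^{d} c_s\,\theta_s \le B,\qquad \theta_s \in \{0,1\},\qquad c_s = 1 .
\end{aligned}
```

Under this assumed form, setting every cost c_s to 1 turns the budget constraint into a cap on the number of selected features, which matches the description of B as the number of features expected to be selected.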

Step 4: performing space mapping on the feature samples obtained by feature screening using the unsupervised multi-view partial least squares method, and establishing the lymph node metastasis prediction model.

Tumor imaging omics and lymph node imaging omics are two different views of the lymph node classification prediction problem; they may complement each other and may also contain redundancy. Simply concatenating the two fails to fully describe lymph node metastasis information, which limits the ability to accurately predict lymph node status in gastric cancer. Multi-view learning overcomes the limitation of single-view analysis by using data collected from different views, provides imaging physicians with an effective noninvasive auxiliary diagnosis strategy, and has the potential to be applied to other clinical tasks and extended to different patients.

In an embodiment, the unsupervised multi-view partial least squares method is used to perform unsupervised learning from the multi-view subspaces to a common space on the lymph node imaging omics feature sample and the tumor imaging omics feature sample, so as to map the lymph node imaging omics features and the tumor imaging omics features from their respective view subspaces to the common space.

The unsupervised multi-view partial least squares method presented in the embodiment learns a function that models the tumor and lymph node imaging omics samples. Without loss of generality, let X_LN be the lymph node view, containing the lymph node imaging omics features, and X_TU be the tumor view, containing the tumor imaging omics features, where each column represents one patient of the training set. The unsupervised multi-view partial least squares method is described as follows:

wherein the n columns of X_LN and X_TU have zero mean; P_LN and P_TU denote the projection matrices from the lymph node and tumor feature dimensions, respectively, to a latent common space of dimension k; and I_k ∈ R^{k×k} is the identity matrix. The optimal projection matrices P_LN and P_TU are obtained by solving this problem.

The unsupervised multi-view partial least squares method enforces the orthogonality constraint while maximizing the covariance in the common space by means of mature numerical linear algebra techniques. Existing methods are often numerically unstable and cannot guarantee the orthogonality of the view-specific projection matrices. An orthogonal projection not only preserves metrics well, but also provides a natural representation, similar to principal component analysis, for data visualization.
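As a hedged illustration, an objective consistent with this description (zero-mean views, orthogonal projection matrices, maximized cross-covariance) can be written as follows. This is an assumed standard form rather than a formula quoted from the patent, and the feature counts d_LN and d_TU are symbols introduced here for illustration.

```latex
\begin{aligned}
\max_{P_{LN},\,P_{TU}}\quad & \operatorname{tr}\!\left(P_{LN}^{\top} X_{LN} X_{TU}^{\top} P_{TU}\right) \\
\text{s.t.}\quad & P_{LN}^{\top} P_{LN} = I_k,\qquad P_{TU}^{\top} P_{TU} = I_k, \\
& X_{LN}\in\mathbb{R}^{d_{LN}\times n},\quad X_{TU}\in\mathbb{R}^{d_{TU}\times n},\quad
  P_{LN}\in\mathbb{R}^{d_{LN}\times k},\quad P_{TU}\in\mathbb{R}^{d_{TU}\times k}.
\end{aligned}
```

Under this assumed form, a maximizer is given by the leading k left and right singular vectors of the cross-covariance matrix X_LN X_TU^T, which is one way the orthogonality constraints can be satisfied exactly by a standard singular value decomposition.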

In an embodiment, the lymph node view X_LN and the tumor view X_TU are mapped from their respective view subspaces to the common space by the optimal projection matrices P_LN and P_TU:

wherein z_LN and z_TU denote the lymph node imaging omics features and the tumor imaging omics features in the common space, respectively.

Although the two samples lie in different feature spaces, their two projection points lie in the same space.
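The fitting and mapping steps can be sketched with NumPy under one explicit assumption: that the objective is to maximize tr(P_LN^T X_LN X_TU^T P_TU) subject to P_LN and P_TU having orthonormal columns, in which case the leading singular vectors of the cross-covariance matrix are a solution. The function names, the mapping z = P^T x, and the toy data are illustrative and are not taken from the patent.

```python
import numpy as np


def fit_multiview_pls(X_ln, X_tu, k):
    """Fit orthonormal projections for two views whose columns are patients.

    X_ln: (d_ln, n) lymph node view; X_tu: (d_tu, n) tumor view.
    Returns the per-view feature means and the projections
    P_ln (d_ln, k) and P_tu (d_tu, k). Requires k <= min(d_ln, d_tu).
    """
    mu_ln = X_ln.mean(axis=1, keepdims=True)
    mu_tu = X_tu.mean(axis=1, keepdims=True)
    # Cross-covariance (up to a 1/n factor) between the centered views.
    cross_cov = (X_ln - mu_ln) @ (X_tu - mu_tu).T
    U, _, Vt = np.linalg.svd(cross_cov, full_matrices=False)
    return mu_ln, mu_tu, U[:, :k], Vt[:k].T


def project(X, mu, P):
    """Map a view of shape (d, n) into the common space, giving (k, n)."""
    return P.T @ (X - mu)


# Toy views driven by two shared latent factors plus noise.
rng = np.random.default_rng(0)
shared = rng.normal(size=(2, 50))
X_ln = rng.normal(size=(6, 2)) @ shared + 0.1 * rng.normal(size=(6, 50))
X_tu = rng.normal(size=(4, 2)) @ shared + 0.1 * rng.normal(size=(4, 50))

mu_ln, mu_tu, P_ln, P_tu = fit_multiview_pls(X_ln, X_tu, k=2)
z_ln = project(X_ln, mu_ln, P_ln)
z_tu = project(X_tu, mu_tu, P_tu)
```

The training-set means are returned so that validation patients can be centered with the same statistics before projection.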

When the lymph node metastasis prediction model is constructed, the lymph node imaging omics features and the tumor imaging omics features in the common space are taken as input, the label indicating whether the lymph node has metastasized is taken as output, and a logistic regression classifier is trained to obtain the lymph node metastasis prediction model. In terms of classification performance, the fusion of the two projection points (i.e., z = [z_LN; z_TU]) generally performs better; that is, the concatenation of the lymph node imaging omics features and the tumor imaging omics features in the common space is used as input, and whether the lymph node has metastasized is predicted from the concatenated matrix. A schematic diagram of the visualization effect is shown in FIG. 4.
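A minimal sketch of this classification step follows: the two common-space representations are concatenated per patient and a logistic regression classifier is fitted by plain gradient descent. The learning rate, iteration count, 0.5 decision threshold, and synthetic features are assumptions for illustration; the patent does not specify the solver or any regularization.

```python
import numpy as np


def train_logistic_regression(Z, y, lr=0.1, n_iter=2000):
    """Fit weights w and bias b by gradient descent on the mean log loss.

    Z: (n_patients, n_features); y: (n_patients,) with values 0 or 1.
    lr and n_iter are assumed values chosen for the toy example.
    """
    w = np.zeros(Z.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
        grad = p - y  # derivative of the log loss w.r.t. the logit
        w -= lr * (Z.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def predict_proba(Z, w, b):
    """Predicted probability of lymph node metastasis for each patient."""
    return 1.0 / (1.0 + np.exp(-(Z @ w + b)))


# Synthetic common-space features of shape (k, n); labels shift the means.
rng = np.random.default_rng(0)
n, k = 200, 2
y = (rng.random(n) < 0.4).astype(float)
z_ln = rng.normal(size=(k, n)) + 1.5 * y
z_tu = rng.normal(size=(k, n)) + 1.0 * y

# Fusion z = [z_LN; z_TU], then one row per patient for the classifier.
Z = np.vstack([z_ln, z_tu]).T
w, b = train_logistic_regression(Z, y)
accuracy = float(np.mean((predict_proba(Z, w, b) >= 0.5) == (y == 1)))
```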

When visualizing the features projected into the common space, the average of the two projection points is calculated (i.e., z = (z_LN + z_TU)/2) so that the features can be visualized.

Step 5: evaluating the performance of the final prediction model on the validation set.

In the embodiment, the lymph node metastasis prediction model is validated using the lymph node imaging omics feature samples and the tumor imaging omics feature samples in the validation set, and performance is quantified with five performance indexes, including the area under the receiver operating characteristic curve (AUC), accuracy, recall, and F1 score. The performance results for the training set and the validation set are shown in FIG. 5.
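The named indexes can be computed as in the sketch below. The AUC is calculated as the probability that a randomly chosen positive patient receives a higher score than a randomly chosen negative one; the 0.5 decision threshold and the toy scores are assumptions, and precision is computed only as an intermediate quantity for the F1 score.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(y_true, y_pred):
    """Return accuracy, recall and F1 score for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "recall": recall, "f1": f1}


# Toy validation labels and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [int(s >= 0.5) for s in scores]  # assumed threshold of 0.5

auc = roc_auc(y_true, scores)
metrics = classification_metrics(y_true, y_pred)
```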

A lymph node metastasis prediction model that does not meet the requirements is retrained until the requirements are met.

In this technical solution, the performance of the lymph node metastasis prediction model on the validation set (area under the receiver operating characteristic curve: 0.8660) is superior to that of a prediction model using only tumor imaging omics (area under the receiver operating characteristic curve: 0.7582) and that of a prediction model using only lymph node imaging omics (area under the receiver operating characteristic curve: 0.8431), which indicates that the multi-view fusion method achieves better quantitative indexes than the single-view methods.

The method for constructing the lymph node metastasis prediction model based on multi-view learning imaging omics fusion uses a two-step feature selection method to obtain the predictors from primary tumor imaging omics and lymph node imaging omics, and then uses multi-view learning to fuse the tumor and lymph node imaging information. It overcomes the limitation of single-view analysis, provides imaging physicians with a noninvasive auxiliary diagnosis strategy for predicting lymph node metastasis in gastric cancer, and has the potential to be applied to other clinical tasks and to meet the clinical needs of different diseases.

The embodiment also provides a device for constructing the lymph node metastasis prediction model based on multi-view learning imaging omics fusion, which comprises a memory, a processor and a computer program stored in the memory and capable of being executed on the processor, wherein when the processor executes the computer program, the method for constructing the lymph node metastasis prediction model based on multi-view learning imaging omics fusion is realized.

In practical applications, the computer memory may be local volatile memory, such as RAM; non-volatile memory, such as ROM, FLASH, a floppy disk, or a mechanical hard disk; or remote cloud storage. The computer processor may be a Central Processing Unit (CPU), a microprocessor unit (MPU), a Digital Signal Processor (DSP), or a Field Programmable Gate Array (FPGA); that is, the steps of the method for constructing the lymph node metastasis prediction model based on multi-view learning imaging omics fusion can be implemented by these processors.

The above embodiments describe the technical solutions and advantages of the present invention in detail. It should be understood that they are only preferred embodiments of the present invention and are not intended to limit it; any modifications, additions, and equivalent substitutions made within the scope of the principles of the present invention shall fall within the protection scope of the present invention.

Claims

1. A method for constructing a lymph node metastasis prediction model based on multi-view learning imaging omics fusion is characterized by comprising the following steps of:

2. The method for constructing a lymph node metastasis prediction model based on multi-view learning imaging omics fusion as claimed in claim 1, wherein Pearson correlation coefficient analysis is used to perform correlation analysis on the tumor imaging omics features and the lymph node imaging omics features, respectively, so as to screen out redundant features and reduce the feature dimension.

3. The method for constructing a lymph node metastasis prediction model based on multi-view learning imaging omics fusion as claimed in claim 1, wherein the supervised feature selection strategy based on test-time budget is described as:

wherein the training set consists of n patients and d features; x_i is the feature vector of the i-th patient, whose s-th feature is denoted x_{i,s}, where the feature vectors include tumor imaging omics feature vectors and lymph node imaging omics feature vectors; y_i takes the value 1 or 0 to indicate lymph node metastasis or no lymph node metastasis, respectively; θ is an indicator vector whose entries take the value 0 or 1 to indicate that a feature is not selected or is selected, respectively; w and b are the coefficients and the offset of the linear predictor; f_i is the output of the linear predictor for x_i; a loss function is defined on f_i and y_i; and each feature is treated as a group with a cost of 1, i.e., c_s = 1, so that the total budget B is the number of features expected to be selected;

in the training set, repeated random experiments are carried out for several preset values of the hyper-parameter C to find the optimal hyper-parameter C, from which the expected number of features B is determined; the screened tumor imaging omics features and lymph node imaging omics features, which are minimal in number yet retain discriminative ability, are then taken as the tumor imaging omics feature sample and the lymph node imaging omics feature sample, respectively.

4. The method for constructing a lymph node metastasis prediction model based on multi-view learning imaging omics fusion as claimed in claim 1, wherein the preset values of the hyper-parameter C comprise 0.01, 0.1, 1, 10, and 100.

5. The method for constructing a lymph node metastasis prediction model based on multi-view learning imaging omics fusion as claimed in claim 1, wherein the unsupervised multi-view partial least squares method is described as follows:

wherein X_LN and X_TU respectively denote the lymph node view of the lymph node imaging omics samples and the tumor view of the tumor imaging omics samples; the n columns of X_LN and X_TU have zero mean; P_LN and P_TU denote the projection matrices from the lymph node and tumor feature dimensions, respectively, to a latent common space of dimension k; I_k ∈ R^{k×k} is the identity matrix; and the optimal projection matrices P_LN and P_TU are obtained by solving this problem;

then, the lymph node view X_LN and the tumor view X_TU are mapped from their respective view subspaces to the common space by the optimal projection matrices P_LN and P_TU:

6. The method for constructing a lymph node metastasis prediction model based on multi-view learning imaging omics fusion as claimed in claim 1, further comprising:

verifying the predictive performance of the lymph node metastasis prediction model using the lymph node imaging omics feature samples and the tumor imaging omics feature samples in the validation set, and quantifying the performance during validation with five performance indexes, including the area under the receiver operating characteristic curve, accuracy, recall, and F1 score;

7. The lymph node metastasis prediction model based on multi-view learning imaging omics fusion is constructed by the construction method of the lymph node metastasis prediction model based on multi-view learning imaging omics fusion of any one of claims 1 to 6.

8. An apparatus for constructing a lymph node metastasis prediction model based on multi-view learning imaging omics fusion, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the method for constructing the lymph node metastasis prediction model based on multi-view learning imaging omics fusion according to any one of claims 1 to 6 when executing the computer program.