WO2024066722A1 - Method for acquiring a target model, method for determining a prognosis evaluation value, apparatus, device, and medium - Google Patents

Publication number
WO2024066722A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN2023/110353
Inventor
张振中
Original Assignee
京东方科技集团股份有限公司
Application filed by 京东方科技集团股份有限公司
Publication of WO2024066722A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/50: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders

Definitions

  • the present disclosure relates to the field of information processing technology, and in particular to a method for acquiring a target model, and to a method, an apparatus, a device and a medium for determining a prognosis evaluation value.
  • Gliomas originate from glial cells and are the most common central nervous system tumors, accounting for about 50% to 60% of brain tumors. The incidence rate has been increasing year by year. Patients with gliomas generally need prognosis assessment after surgery.
  • traditional prognostic assessment uses the size of the lesion, the extent of involvement, etc. as prognostic predictors, which has certain limitations.
  • the present disclosure provides a method for acquiring a target model, the method comprising:
  • obtaining a sample group corresponding to each of a plurality of sample users, wherein the sample group includes sample information of multiple modalities, and the sample information of the multiple modalities includes at least two of nuclear magnetic resonance (MR) sample images, clinical sample information, and gene sample information;
  • using the preset model, feature extraction is performed on the sample information of multiple modalities in the current sample group, and a predicted prognosis evaluation value and a consistency expression value are determined based on the extracted sample features; wherein the consistency expression value is used to characterize the degree to which the sample features consistently correspond to the same target disease;
  • the parameters of the preset model are updated based on the predicted prognostic evaluation value, the prognostic evaluation label corresponding to the current sample group, and the consistent expression value.
  • the updating of the parameters of the preset model based on the predicted prognostic evaluation value, the prognostic evaluation label corresponding to the current sample group, and the consistent expression value includes:
  • the parameters of the preset model are updated with the goal of minimizing the difference and maximizing the consistent expression value.
  • the updating of the parameters of the preset model with the goal of minimizing the difference and maximizing the consistent expression value includes:
  • $\mathrm{loss} = \sum_i (y'_i - y_i)^2 - \mathrm{consistency}$
  • loss represents the loss value
  • $y'_i$ represents the predicted prognosis evaluation value
  • $y_i$ represents the prognosis evaluation label
  • consistency represents the consistent expression value
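The loss described here (summed squared difference between predicted values and labels, minus the consistent expression value) can be sketched as follows. This is a minimal illustration assuming NumPy arrays for a batch of predictions and labels and a scalar consistency value; the function name `prognosis_loss` is hypothetical and does not come from the disclosure.

```python
import numpy as np


def prognosis_loss(y_pred, y_true, consistency):
    """Squared prediction error summed over samples, minus the consistency value.

    Minimizing this value reduces the prediction error and, because of the
    minus sign, increases the consistency term at the same time.
    """
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.sum((y_pred - y_true) ** 2) - consistency)


# Two sample groups: errors are 1.0 and 0.0, consistency is 0.25.
example = prognosis_loss([2.0, 5.0], [1.0, 5.0], consistency=0.25)
```

With the values above, the squared errors sum to 1.0, so the loss is 1.0 minus 0.25.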
  • determining the consistent expression value based on the extracted sample features includes:
  • the consistent expression value is determined.
  • the sample information of multiple modalities includes multiple sub-sample information
  • the using the preset model to extract features of the sample information of multiple modalities in the current sample group respectively includes:
  • feature extraction is performed on a plurality of sub-sample information in the modality of sample information to obtain a plurality of corresponding sub-feature vectors;
  • Feature fusion is performed on the multiple sub-feature vectors corresponding to the sample information of each modality to obtain sample features corresponding to the sample information of this modality.
  • the fusing of the plurality of sub-feature vectors corresponding to the sample information of each modality to obtain the sample features corresponding to the sample information of this modality includes:
  • multiple sub-feature vectors are fused to obtain sample features corresponding to the sample information of the modality.
  • the fusing of the plurality of sub-feature vectors based on the attention value to obtain sample features corresponding to the sample information of the modality includes:
  • all other sub-feature vectors are merged into the sub-feature vector to obtain a fused sub-vector of the sub-feature vector;
  • the plurality of fused sub-vectors are re-fused to obtain sample features corresponding to the sample information of the modality.
  • the parameters of the preset model include a first parameter matrix
  • the MR sample image includes a plurality of slice sample images
  • determining the attention value between every two sub-feature vectors includes:
  • an attention value between sub-feature vectors corresponding to every two slice sample images is determined.
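The attention-based fusion outlined above (an attention value for every pair of sub-feature vectors, merging the other vectors into each one, then re-fusing) might look like the sketch below. The disclosure does not give the formulas in this section, so the bilinear score `x_i^T W x_j`, the softmax normalization, the residual addition and the final mean are all assumptions; `W` stands in for the first parameter matrix, and `attention_fuse` is a hypothetical name.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention_fuse(sub_vectors, W):
    """Fuse n >= 2 sub-feature vectors of size d into one sample feature.

    sub_vectors: array of shape (n, d), e.g. one row per MR slice image.
    W: array of shape (d, d), an assumed stand-in for the first parameter matrix.
    """
    X = np.asarray(sub_vectors, dtype=float)
    scores = X @ W @ X.T                  # (n, n) score for every pair of vectors
    np.fill_diagonal(scores, -np.inf)     # each vector attends only to the others
    attn = softmax(scores, axis=1)        # attention values, each row sums to 1
    fused = X + attn @ X                  # merge the other vectors into each vector
    return fused.mean(axis=0)             # re-fuse the fused sub-vectors (assumed: mean)


rng = np.random.default_rng(0)
slices = rng.normal(size=(4, 3))          # four slice-level sub-feature vectors
sample_feature = attention_fuse(slices, np.eye(3))
```

A learned `W` would be updated together with the other model parameters; the identity matrix is used here only to keep the example self-contained.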
  • the parameters of the preset model include a second parameter matrix and a third parameter matrix
  • the feature fusion of the multiple sub-feature vectors corresponding to the sample information of each modality to obtain the sample features corresponding to the sample information of the modality includes:
  • feature fusion is performed on the sub-feature vectors corresponding to each clinical sub-sample information to obtain the sample features corresponding to the clinical sample information;
  • feature fusion is performed on the sub-feature vectors corresponding to each gene sub-sample information to obtain the sample feature corresponding to the gene sample information.
  • the parameters of the preset model include a parameter set corresponding to the clinical sample information
  • the clinical sample information includes numerical sub-sample information and non-numerical sub-sample information
  • using the preset model to perform feature extraction on the sample information of multiple modalities in the current sample group respectively includes:
  • the numerical sub-sample information is mapped to the target space to obtain a corresponding second sub-feature vector; the parameters in the parameter set are used to determine the dimension of the target space and the value at each spatial point;
  • the first sub-feature vector and the second sub-feature vector are fused to obtain a sample feature corresponding to the clinical sample information.
  • the parameter set includes a first parameter vector, a second parameter vector and a fourth parameter matrix, and based on the current value of each parameter in the parameter set, the numerical sub-sample information is mapped to the target space to obtain the corresponding second sub-feature vector, including:
  • mapping the numerical subsample information to a first dimension in the target space to obtain a mapping value of the first dimension; wherein the first parameter vector is used to determine the value of a spatial point in the target space on the first dimension;
  • the second sub-feature vector is determined based on the mapping value of the first dimension, the current value of the second parameter vector and the current value of the fourth parameter matrix; wherein the second parameter vector is used to determine the value of a spatial point in the target space on the second dimension, and the fourth parameter matrix is used to assign parameters to each spatial position in the first dimension and the second dimension.
  • the parameter set further includes a plurality of third parameter vectors, and after determining the second sub-feature vector based on the mapping value of the first dimension, the current value of the second parameter vector and the current value of the fourth parameter matrix, the method further includes:
  • the second sub-feature vector is corrected according to the following formula:
  • va is the corrected second sub-feature vector
  • sa is the second sub-feature vector
  • a1, a2 and a3 are the third parameter vectors;
  • fusing the first sub-feature vector and the second sub-feature vector to obtain the sample feature corresponding to the clinical sample information includes:
  • the first sub-feature vector and the corrected second sub-feature vector are fused to obtain the sample feature corresponding to the clinical sample information.
  • the parameters of the preset model include a dimensional parameter matrix corresponding to the sample information of each modality, and the prediction prognosis evaluation value and the consistency expression value are determined based on the extracted sample features, including:
  • the sample features corresponding to the sample information of the modality are dimensional transformed to obtain transformed sample features
  • the predicted prognostic evaluation value and the consistent expression value are determined based on the converted sample features corresponding to each of the sample information of multiple modalities.
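As a rough illustration of the dimension transformation, each modality's sample feature can be multiplied by its own matrix so that all features share one size before they are compared. The disclosure does not state in this section how the consistent expression value is computed; the mean pairwise cosine similarity used below is purely an assumed stand-in, and the names `project` and `consistency_value` are hypothetical.

```python
import numpy as np


def project(feature, M):
    """Map a modality feature of size d_in to a shared size d_out using M of shape (d_out, d_in)."""
    return np.asarray(M, dtype=float) @ np.asarray(feature, dtype=float)


def consistency_value(features):
    """Mean pairwise cosine similarity of same-size, non-zero feature vectors (assumed measure)."""
    unit = [f / np.linalg.norm(f) for f in features]
    sims = [unit[i] @ unit[j]
            for i in range(len(unit))
            for j in range(i + 1, len(unit))]
    return float(np.mean(sims))


# Hypothetical dimensional parameter matrices for an MR feature (size 3) and a clinical feature (size 2).
mr_feature = project([1.0, 0.0, 0.0], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
clinical_feature = project([2.0, 0.0], np.eye(2))
aligned = consistency_value([mr_feature, clinical_feature])
orthogonal = consistency_value([np.array([1.0, 0.0]), np.array([0.0, 1.0])])
```

Features that point in the same direction after projection give a value of 1, and orthogonal features give 0, which matches the intent that a higher value means the modalities agree more.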
  • the preset model includes a fusion module, a prediction branch, a consistency expression branch, and a data processing module corresponding to each modality of sample information; using the preset model, extracting features of sample information of multiple modalities in the current sample group, and determining a predicted prognosis evaluation value and a consistency expression value based on the extracted sample features, including:
  • the sample features output by each of the data processing modules are fused;
  • the consistent expression branch is used to determine the consistent expression value corresponding to the sample feature output by each of the data processing modules.
  • the gene sample information includes information of at least one gene selected from the group consisting of isocitrate dehydrogenase, chromosome 1p/19q joint deletion status, telomerase reverse transcriptase gene promoter, and O6-methylguanine-DNA methyltransferase promoter region methylation;
  • the clinical sample information includes at least one of gender, age, histological diagnosis, tumor grade, medication information, and history of malignant tumor.
  • the present disclosure also provides a method for determining a prognostic evaluation value, the method comprising:
  • Acquiring information of multiple modalities of the subject to be tested, wherein the information of multiple modalities includes nuclear magnetic resonance (MR) images, clinical information, and genetic information;
  • the information of the multiple modalities is input into the target model to obtain the prognostic evaluation value of the object to be tested; wherein the target model is obtained according to the target model acquisition method.
  • the preset model can be used to extract features of sample information of multiple modalities in the current sample group, and determine the predicted prognosis evaluation value and the consistency expression value based on the extracted sample features; then, based on the predicted prognosis evaluation value, the prognosis evaluation label corresponding to the current sample group, and the consistency expression value, the parameters of the preset model are updated.
  • since the sample group of the present disclosure includes sample information of multiple modalities, which may specifically include at least two of MR sample images, clinical sample information and genetic sample information, the information of sample users in different dimensions can be used as reference factors for predicting the prognostic evaluation value, thereby achieving rapid multi-factor prognostic analysis.
  • the consistent expression value is used as the basis for updating the parameters of the preset model, and the consistent expression value is used to characterize the consistency of each sample feature corresponding to the same target disease, wherein the extracted sample feature can be understood as a prognostic factor for prognosis prediction.
  • the preset model can extract the prognostic factors (sample features) related to the target disease from the information of each modality, and gradually abandon the prognostic factors that are not related to the target disease in the information of various modalities, so that the prognostic factors selected by the model have clinical importance, which can help improve the interpretability of the model, so that the results output by the target model have a higher prognostic reference value.
  • FIG1 schematically shows a flow chart of steps of a method for acquiring a target model
  • FIG2 schematically shows a schematic diagram of the structure of a preset model
  • FIG3 schematically shows a structural diagram of an image data processing module of the present disclosure
  • FIG4 schematically shows a schematic diagram of the structure of the ResNet network in FIG3 ;
  • FIG5 schematically shows a flowchart of the steps of a method for determining a prognostic evaluation value
  • FIG6 schematically shows a schematic structural diagram of a device for acquiring a target model
  • FIG7 schematically shows a schematic structural diagram of a device for determining a prognosis evaluation value
  • FIG8 schematically shows a structural block diagram of an electronic device of the present disclosure.
  • the prognosis of the disease is generally evaluated based on the size of the lesion and the presence or absence of enhancement, which has certain limitations.
  • brain glioma originates from glial cells and is the most common central nervous system tumor, accounting for about 50% to 60% of brain tumors, and the incidence has a tendency to increase year by year.
  • the World Health Organization divides gliomas into low-grade (I and II) and high-grade (III and IV). High-grade glioma (HGG) has a poor prognosis.
  • Grade IV glioblastoma is the most malignant, with a 10-year survival rate of less than 3% and a median survival of about 12 to 14 months. Previous studies have used tumor location, size, resection range and traditional imaging methods as prognostic predictors to evaluate the prognosis of gliomas, which has certain limitations.
  • the present disclosure proposes a method for prognosis evaluation.
  • the specific concept is to use sample information of multiple modalities as training samples, train a target model, and use the target model to perform prognosis evaluation, wherein the sample information of multiple modalities includes at least two of nuclear magnetic resonance MR sample images, clinical sample information, and gene sample information.
  • the information source can be enriched, so that prognostic factors are screened in multiple dimensions.
  • the parameters of the preset model are updated with consistent expression values as factors, so that prognostic factors closely related to the target disease can be screened out from information in multiple modalities to improve the clinical importance of prognostic factors, so that the results output by the target model have a higher prognostic reference value.
  • Referring to FIG. 1, a flowchart of the method for acquiring a target model of the present disclosure is shown. As shown in FIG. 1, the method may specifically include the following steps:
  • Step S101: obtaining sample groups corresponding to multiple sample users, wherein the sample groups include sample information of multiple modalities, and the sample information of multiple modalities includes at least two of MR sample images, clinical sample information, and gene sample information;
  • Step S102: based on the multiple sample groups, iteratively training the preset model to obtain a target model, which is used to predict the prognosis evaluation value of the target object;
  • in each iteration of the training in step S102, the following steps are performed:
  • Step S1021: using the preset model, extract features of the sample information of multiple modalities in the current sample group, and determine the predicted prognosis evaluation value and the consistency expression value based on the extracted sample features.
  • the consistency expression value is used to characterize the consistency of the sample information of multiple modalities corresponding to the same target disease.
  • Step S1022: based on the predicted prognostic evaluation value, the prognostic evaluation label corresponding to the current sample group, and the consistent expression value, the parameters of the preset model are updated.
  • the sample user may refer to a user whose survival prognosis and quality-of-life prognosis for the target disease are clearly known.
  • the target disease may be any clinically known disease, such as the brain tumors described above or common liver, gallbladder and lung tumors, and is not particularly limited here.
  • the age and gender of the sample users can be as diverse as possible.
  • the sample information of multiple modalities can include at least two of the nuclear magnetic resonance MR sample images, clinical sample information and gene sample information.
  • Sample information of different modalities can be used to reflect the representation of the sample users in the corresponding dimensions after suffering from the target disease.
  • MR sample information can reflect the morphological representation of the sample users in the organ tissue after suffering from the target disease
  • clinical sample information can reflect the representation of the sample users in diagnosis and treatment after suffering from the target disease
  • gene sample information can reflect the representation of the expression state of certain genes after the sample users suffer from the target disease.
  • the MR sample image can be used to reflect the morphological characteristics of the corresponding organ tissue after the sample user suffers from the target disease.
  • the MR sample image can be obtained directly from the MR imaging device, or from a memory or any other suitable source.
  • the sample MR image of the sample user can be retrieved from the medical database.
  • the MR sample image is an image related to the target disease suffered by the sample user.
  • the MR sample image is an MR image of the sample user's brain, which can reflect the morphological characteristics of the brain tissue.
  • the MR sample image can be an MR image of the user's abdomen, which can reflect the morphological characteristics of the tissue in the abdominal cavity.
  • clinical sample information may include information about the sample user receiving diagnosis and treatment, such as drug information, hospitalization information, treatment plan information, attending physician information, hospital information, etc.
  • the gene sample information may include information about genes related to the occurrence and prognosis of the target disease.
  • the information about each gene in the gene sample information includes the name of the gene and the expression status of the gene. This is because the onset and prognosis of the disease can be reflected in the expression of some genes.
  • the TERT gene has no transcriptional activity in the vast majority of non-tumor cells, but TERT gene mutations exist in 73% of tumors, such as promoter mutations, gene translocations, and DNA amplifications. That is to say, the expression categories of the above genes have a certain correlation with tumors.
  • the nuclear magnetic resonance MR sample image, clinical sample information, and gene sample information may be used as a sample group of the sample user.
  • the MR sample image may be required in the sample group, that is, either or both of the clinical sample information and the gene sample information may be combined with the MR sample image to obtain the sample group.
  • the sample group may include the nuclear magnetic resonance MR sample image and the clinical sample information, or may include the nuclear magnetic resonance MR sample image and the gene sample information, or may include the nuclear magnetic resonance MR sample image, the clinical sample information, and the gene sample information.
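One way to picture a sample group is as a small record holding the MR sample image plus whichever other modalities are available. The sketch below assumes the embodiment in which the MR sample image is required and at least one of the other two modalities accompanies it; the class name `SampleGroup` and its field names are illustrative and not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass
class SampleGroup:
    """One sample user's multi-modal sample information (illustrative layout)."""
    mr_slices: Sequence                              # slice images of the MR sample image
    clinical: Optional[Mapping[str, object]] = None  # e.g. {"age": 52, "gender": "F"}
    gene: Optional[Mapping[str, str]] = None         # e.g. {"IDH": "mutant"}
    label: Optional[float] = None                    # prognosis evaluation label

    def __post_init__(self):
        # Assumed rule: MR is required and is paired with at least one other modality.
        if len(self.mr_slices) == 0:
            raise ValueError("an MR sample image is required")
        if self.clinical is None and self.gene is None:
            raise ValueError("clinical and/or gene sample information is required")

    def modalities(self):
        """Names of the modalities present in this sample group."""
        names = ["mr"]
        if self.clinical is not None:
            names.append("clinical")
        if self.gene is not None:
            names.append("gene")
        return names


group = SampleGroup(mr_slices=[[0.0]], gene={"IDH": "mutant"}, label=1.5)
```

Validating at construction time keeps incomplete sample groups out of the training set, which is a design choice of this sketch rather than something the disclosure prescribes.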
  • the preset model may be iteratively trained with multiple sample groups as training samples. Specifically, during each training, multiple sample groups may be input into the preset model in batches, or one sample group may be input at a time.
  • the preset model in each iterative training, can be used to extract features from sample information of multiple modalities in the current sample group, wherein the current sample group refers to the sample group input into the preset model at that time.
  • the current sample group refers to any sample group among the multiple sample groups input at that time.
  • the preset model can be used to extract features from the sample information of each modality in the current sample group, so as to obtain the sample features corresponding to each sample information.
  • for the MR sample image, the extracted features are radiomics feature vectors
  • for the clinical sample information, the extracted features are the feature vectors obtained by converting the clinical information
  • for the gene sample information, the extracted features are the feature vectors obtained by converting the genetic information.
  • the predicted prognostic evaluation value of the sample object corresponding to the current sample group can be determined based on the sample features extracted from each modal information.
  • the sample features corresponding to multiple modal information can be fused, and the predicted prognostic evaluation value can be determined based on the fused features.
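A minimal sketch of fusing the per-modality sample features and predicting from the fused feature is given below. Concatenation as the fusion step and a single linear layer as the prediction branch are assumptions chosen for brevity; the weights `w` and bias `b` are hypothetical parameters, and the disclosure does not specify either choice here.

```python
import numpy as np


def predict_prognosis(modality_features, w, b=0.0):
    """Concatenate per-modality sample features and apply a linear prediction head."""
    fused = np.concatenate([np.asarray(f, dtype=float) for f in modality_features])
    return float(fused @ np.asarray(w, dtype=float) + b)


# MR feature of size 2 and clinical feature of size 1, with hypothetical weights.
predicted = predict_prognosis([[1.0, 2.0], [3.0]], w=[0.5, 0.5, 1.0], b=1.0)
```

Here the fused feature is [1, 2, 3], so the prediction is 0.5 + 1.0 + 3.0 plus the bias of 1.0.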
  • sample information of different modalities should all contain descriptions of the target disease, and each should contain a large amount of consistent information about the target disease.
  • sample information of different modalities also contains information that is irrelevant to the target disease.
  • MR sample images contain images of other normal tissue sites in addition to the lesion site, and the images of normal tissue sites may not have relevant information in sample information of other modalities. For example, this information is not included in clinical sample information. Therefore, the preset model should learn to discard this useless information during the training process.
  • the preset model can determine the consistent expression value based on the sample features corresponding to multiple modal information, and incorporate the consistent expression value into the construction of the loss function to update the parameters of the preset model, so that the preset model can continuously enhance the expression level of the extracted sample features for the target disease based on the consistent expression value.
  • the consistent expression value can be used to characterize the consistency of the sample features corresponding to multiple modal information in reflecting the target disease, that is, whether the extracted sample features consistently express the target disease.
  • the sample features extracted from the MR sample image, clinical sample information and gene sample information are all used to express the features of the target disease.
  • for the MR sample image, the extracted sample features express the features at the lesion site of the target disease.
  • for the clinical sample information, the extracted sample features express the diagnosis and treatment plan of the target disease, as well as how the target disease relates to the patient's age, occupation, etc.
  • for the gene sample information, the extracted sample features characterize the expression of the target disease in related genes.
  • the prognostic evaluation label of the present disclosure is the real prognosis of the sample object. If the prognosis survival period needs to be estimated, the prognosis evaluation label is the real prognosis survival period of the sample object, and the predicted prognosis evaluation value can be expressed as the number of prognosis years; if the prognosis quality of life needs to be estimated, the prognosis evaluation label is the actual prognosis quality-of-life level of the sample object (high, medium or low), and the predicted prognosis evaluation value can be expressed as the prognosis quality-of-life level.
  • the consistent expression value, the predicted prognostic evaluation value, and the prognostic evaluation label are included in the construction of the loss function.
  • the expression degree of the extracted sample features on the target disease can be enhanced based on the consistent expression value, so that the prognostic factors used to determine the prognostic evaluation value are strongly correlated with the target disease, thereby improving the clinical importance of the prognostic factors, so that the prognostic evaluation value can be more medically valuable.
  • the parameters of the preset model can be continuously updated based on the difference between the predicted prognostic evaluation value and the prognostic evaluation label, so that as training progresses, the value predicted by the model approaches the prognostic evaluation label ever more closely, bringing the prognostic evaluation value closer to the actual situation and further improving its clinical reference value.
  • the present invention utilizes sample information of multiple modalities to predict prognosis, thereby utilizing the complementarity between sample information of different modalities as a prognostic factor for the target disease, thereby enriching the data source of prognostic factors and improving the medical reference value of prognostic evaluation values.
  • the genetic sample information may include information on at least one gene of isocitrate dehydrogenase IDH, chromosome 1p/19q joint deletion status, telomerase reverse transcriptase gene promoter, and O6-methylguanine-DNA methyltransferase promoter region methylation;
  • the clinical sample information includes: at least one clinical information of gender, age, histological diagnosis, tumor grade, medication information, and history of malignant tumors.
  • the information of each gene in the gene sample information may include the name of the gene and the expression category of the gene, and the expression category may be different for different genes.
  • the expression categories in the information of the IDH gene are mutant and wild type
  • the expression state of the 1p/19q gene includes deletion and non-deletion
  • the expression category of the O6-methylguanine-DNA methyltransferase promoter region MGMT is methylated and unmethylated
  • the expression category of the telomerase reverse transcriptase gene TERT promoter includes mutant and wild type.
  • the specific information of each type of gene is shown in Table 1 below:
  • the clinical sample information may include one or more of the sample subject's gender, age, histological diagnosis, tumor grade, medication information, and history of malignant tumors.
  • the clinical sample information is not limited to the items mentioned above and may also include further clinical information, such as the sample subject's occupation, location, etc. Any information that is related to the occurrence and prognosis of the target disease can be used as clinical information.
  • the clinical sample information may be as described in Table 2 below:
  • sample information of multiple modalities may include MR sample images, clinical sample information, and genetic sample information, that is, each sample group includes MR sample images, clinical sample information, and genetic sample information. In this way, feature extraction of sample information of three modalities is required in each iterative training.
  • each iterative training includes the stages of feature extraction, building a loss function based on the model output, and updating parameters.
  • the sample information of each modality can include multiple sub-sample information.
  • feature extraction can be performed on each sub-sample information in the sample information of each modality.
  • feature fusion is performed on the features of multiple sub-sample information in the sample information of one modality to obtain the sample features corresponding to the sample information of this modality.
  • Stage 1: feature extraction stage.
  • the sample information of each modality includes multiple sub-sample information.
  • feature extraction is performed on the multiple sub-sample information in the sample information of the modality to obtain the corresponding multiple sub-feature vectors; then, feature fusion is performed on the multiple sub-feature vectors corresponding to the sample information of each modality to obtain the sample features corresponding to the sample information of the modality.
  • each sub-sample information may be as described in the above embodiment.
  • the feature fusion may be performed according to the following process to obtain the sample feature:
  • the subsample information can be a slice image of the MR sample image.
  • feature extraction can be performed on each slice image to obtain a sub-feature vector corresponding to each slice image.
  • the sub-feature vectors corresponding to each slice image are fused to obtain the sample feature of the MR sample image.
  • sub-sample information can be a type of clinical information in the clinical sample information.
  • age, gender, and tumor grade in the clinical information can all be used as sub-sample information.
  • feature extraction is performed on each type of information separately. Specifically, each type of information is converted into a feature vector to obtain a sub-feature vector corresponding to each type of information. Then, the sub-feature vectors corresponding to various clinical information in the clinical sample information are fused to obtain the sample features of the clinical sample information.
  • sub-sample information can be information of a gene in the gene sample information, such as information of IDH and information of TERT gene promoter. Then, the information of each gene can be converted into a sub-feature vector, and then the sub-feature vectors corresponding to the information of various genes in the gene sample information are fused to obtain the sample characteristics of the gene sample information.
  • when fusing the sub-feature vectors corresponding to the sample information of each modality, the fusion can be performed directly according to the preset weights corresponding to the sub-feature vectors.
  • a preset weight can be manually set in advance for each slice image, for each type of clinical information, and for each gene's information.
  • the preset weight can characterize the importance of that sub-sample information to the prognosis.
  • feature fusion can be performed based on the importance of each sub-feature vector, thereby fusing into features that are more important for the prognosis evaluation, thereby improving the medical value of the prognosis evaluation.
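  • the preset-weight fusion described above can be sketched as a weighted sum. This is only a minimal illustration: the function name, the two-dimensional vectors and the weight values are hypothetical, and the source does not state the exact combination rule.

```python
def fuse_with_preset_weights(sub_vectors, preset_weights):
    """Weighted sum of sub-feature vectors using fixed, manually chosen weights."""
    if len(sub_vectors) != len(preset_weights):
        raise ValueError("one preset weight is required per sub-feature vector")
    dim = len(sub_vectors[0])
    fused = [0.0] * dim
    for vec, weight in zip(sub_vectors, preset_weights):
        for d in range(dim):
            fused[d] += weight * vec[d]
    return fused


# Hypothetical 2-dimensional sub-feature vectors for three slice images.
slices = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Illustrative importance values (assumption, not taken from the source).
weights = [0.5, 0.25, 0.25]
sample_feature = fuse_with_preset_weights(slices, weights)
```

  • a larger preset weight makes the corresponding sub-sample information contribute more to the fused sample feature.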
  • an attention mechanism can be integrated into a preset model.
  • the degree of correlation between each sub-feature vector can be determined through the attention mechanism, and then the sub-feature vectors can be fused based on the degree of correlation. This can fuse the features with a high degree of correlation in the sample information of the modality, thereby improving the correlation between prognostic factors.
  • the prognostic factors used by the target model for prognostic evaluation are closely related factors, further improving the clinical importance and making the target model clinically interpretable.
  • the attention value between every two sub-feature vectors can be determined, and based on the attention value, multiple sub-feature vectors are fused to obtain sample features corresponding to the sample information of this modality.
  • the attention value is used to characterize the closeness between two sub-feature vectors.
  • the two sub-feature vectors can be fused based on the attention value between each two sub-feature vectors to obtain a fused vector, thereby obtaining multiple fused vectors, and then the multiple fused vectors are fused to obtain sample features.
  • the sub-feature vectors include vector i, vector j and vector k.
  • vector i and vector j can be fused to obtain fused vector ij; similarly, fused vector ik and fused vector jk are obtained, and then fused vector ij, fused vector ik and fused vector jk are fused to obtain sample features.
  • for each sub-feature vector, all other sub-feature vectors can be fused into that sub-feature vector to obtain its fused sub-vector; the multiple fused sub-vectors can then be fused again to obtain the sample features corresponding to the sample information of this modality.
  • the sub-feature vector includes vector i, vector j and vector k.
  • vector j and vector k can be fused to vector i according to the attention value between vector i and vector j, and the attention value between vector i and vector k, to obtain the fused sub-vector i'; similarly, the fused sub-vector j' and the fused sub-vector k' are obtained, and then the fused sub-vector i', the fused sub-vector j' and the fused sub-vector k' are fused to obtain the sample feature.
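  • the second approach (fusing all other vectors into each vector, then fusing the results) can be sketched as follows. The attention value is assumed here to be a softmax over dot products, and the final fusion is assumed to be a plain average; both are illustrative choices, since the source formulas are not reproduced in this passage.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_others_into_each(sub_vectors):
    """For each vector i, return i' = v_i + sum over j != i of att(i, j) * v_j."""
    fused = []
    for i, vi in enumerate(sub_vectors):
        others = [v for j, v in enumerate(sub_vectors) if j != i]
        # Assumed attention value: softmax of dot products with the other vectors.
        att = softmax([dot(vi, vj) for vj in others])
        out = list(vi)
        for a, vj in zip(att, others):
            out = [o + a * x for o, x in zip(out, vj)]
        fused.append(out)
    return fused


def average(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


# Hypothetical vectors i, j and k (requires at least two vectors).
vi, vj, vk = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]
fused_sub_vectors = fuse_others_into_each([vi, vj, vk])  # i', j', k'
sample_feature = average(fused_sub_vectors)
```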
  • the attention value can be determined according to the following process, and feature fusion can be performed based on the attention value:
  • the attention value between every two sub-feature vectors can be determined based on the two sub-feature vectors and the average attention value among the multiple types of clinical information, where the average attention value can be obtained by referring to the following formula (I):
  • va represents the sub-feature vector corresponding to age
  • vg represents the sub-feature vector corresponding to gender
  • vh represents the sub-feature vector corresponding to histological diagnosis
  • vhom represents the sub-feature vector corresponding to the history of malignant tumor
  • vd represents the sub-feature vector corresponding to medication information
  • vgr represents the sub-feature vector corresponding to tumor grade information
  • S represents the average attention value.
  • Si represents the attention value between sub-feature vector vi and sub-feature vector vj.
  • when performing feature fusion, each sub-feature vector can be fused with all other sub-feature vectors, and the resulting fused sub-vectors can then be fused; specifically, taking the sub-feature vector corresponding to age information in the clinical information as an example, the fused sub-vector corresponding to age information can be determined according to the following formula (III):
  • va_att is the fused sub-vector corresponding to the age information.
  • the process of determining the attention value and performing feature fusion based on the attention value can refer to the above-mentioned clinical sample information, which will not be repeated here.
  • the attention value can be determined according to the following process, and feature fusion can be performed based on the attention value:
  • parameter matrices can be set for both the feature extraction and feature fusion stages, and the parameter matrices can be continuously updated as the model is trained, so as to extract prognostic factors with higher importance.
  • the MR sample image includes multiple slice sample images, and each slice sample image is a sub-sample information.
  • the attention value between the sub-feature vectors corresponding to each two slice sample images can be determined based on the current value of the first parameter matrix; then, the sub-feature vectors can be fused based on the attention value.
  • the attention value between the sub-feature vectors corresponding to every two slice sample images can be determined according to the following formula (IV):
  • Q and K are first parameter matrices, where the values of Q and K can be different.
  • Q and K can be 512×512 parameter matrices
  • vi is the sub-feature vector corresponding to the i-th slice sample image
  • vj is the sub-feature vector corresponding to the j-th slice sample image
  • αi,j is the attention value between the i-th slice sample image and the j-th slice sample image
  • each fused sub-vector is fused according to the following formula (VI) to obtain the sample features of the MR sample image:
  • SV represents the sample feature of the MR sample image
  • SV i represents the fused sub-vector corresponding to the i-th slice sample image.
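  • the slice-level attention and fusion can be sketched as below. The exact forms of formulas (IV) to (VI) are not reproduced in this passage, so the sketch assumes a common self-attention form: the attention value is a softmax over (Q·vi)·(K·vj), each fused sub-vector SV_i is the attention-weighted sum of the sub-feature vectors, and SV is the average of the SV_i. Tiny 2×2 matrices stand in for the 512×512 parameter matrices.

```python
import math


def matvec(matrix, vector):
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def slice_attention(sub_vectors, Q, K):
    """Attention values between every pair of slice sub-feature vectors (assumed form)."""
    queries = [matvec(Q, v) for v in sub_vectors]
    keys = [matvec(K, v) for v in sub_vectors]
    return [softmax([sum(a * b for a, b in zip(q, k)) for k in keys]) for q in queries]


def mr_sample_feature(sub_vectors, Q, K):
    """Fuse slice sub-feature vectors into one sample feature SV (assumed form)."""
    alpha = slice_attention(sub_vectors, Q, K)
    n, dim = len(sub_vectors), len(sub_vectors[0])
    # SV_i: attention-weighted sum over all slices for slice i.
    fused = [
        [sum(alpha[i][j] * sub_vectors[j][d] for j in range(n)) for d in range(dim)]
        for i in range(n)
    ]
    # SV: average of the fused sub-vectors.
    return [sum(f[d] for f in fused) / n for d in range(dim)]


identity = [[1.0, 0.0], [0.0, 1.0]]  # stands in for the trainable Q and K
slice_vectors = [[1.0, 0.0], [0.0, 1.0]]
alpha = slice_attention(slice_vectors, identity, identity)
SV = mr_sample_feature(slice_vectors, identity, identity)
```

  • in training, Q and K would be updated together with the other model parameters rather than fixed as in this example.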
  • parameter matrices are set for both feature extraction and feature fusion stages, the parameter matrices can be continuously updated as the model is trained, thereby extracting prognostic factors of higher importance.
  • fusion can be further performed based on the set parameter matrix.
  • feature fusion is performed on the sub-feature vectors corresponding to each clinical sub-sample information to obtain the sample features corresponding to the clinical sample information;
  • feature fusion is performed on the sub-feature vectors corresponding to each gene sub-sample information to obtain the sample features corresponding to the gene sample information.
  • feature fusion can be performed on the fused sub-vectors obtained by the above formula (III) for the sub-feature vectors of each clinical sub-sample information, to obtain the sample features corresponding to the clinical sample information.
  • CV represents the sample characteristics corresponding to the clinical sample information
  • vp is the second parameter matrix, which can be a 128×1 parameter vector.
  • the fusion of the fused sub-vectors corresponding to each gene sub-sample information can also be performed by referring to the above formula (VII) and formula (VIII), wherein the second parameter matrix and the third parameter matrix can both be 128×1 parameter vectors, and the parameters in the second parameter matrix and the third parameter matrix are updated as the preset model is updated, that is, they are updated according to the loss value of the loss function.
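  • one plausible reading of fusing with a trainable parameter vector is attention pooling: each fused sub-vector is scored by its dot product with vp, the scores are normalized with a softmax, and the sample feature is the weighted sum. This form is an assumption; formulas (VII) and (VIII) are not reproduced in this passage, and the 2-dimensional vectors below stand in for 128-dimensional ones.

```python
import math


def attention_pool(sub_vectors, vp):
    """Weighted sum of sub-vectors, weights = softmax(vp . v_i) (assumed form)."""
    scores = [sum(p * x for p, x in zip(vp, v)) for v in sub_vectors]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(sub_vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, sub_vectors)) for d in range(dim)]


clinical_vectors = [[2.0, 0.0], [0.0, 2.0]]
# With a zero parameter vector every sub-vector gets the same weight.
CV_uniform = attention_pool(clinical_vectors, [0.0, 0.0])
# A parameter vector aligned with the first sub-vector emphasizes it.
CV_focused = attention_pool(clinical_vectors, [10.0, 0.0])
```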
  • the clinical sample information includes numerical sub-sample information and non-numerical sub-sample information.
  • the numerical sub-sample information can be mapped to a vector space according to the parameter set of the model, and the parameters in the parameter set are continuously updated during the training process, so that the numerical clinical information of different sample objects can be mapped within a spatial range, and then the comprehensive prognostic factors of the numerical clinical information are extracted. For example, for different patients of the same age, after the sample group of patient A is input into the preset model training, the parameters in the parameter set are updated.
  • the age is vector-mapped based on the parameters of the updated parameter set.
  • the corresponding feature vectors of different patients of the same age can be different during the training process, but vary within a certain spatial range. Therefore, the influence of age group on prognosis can be obtained in the prognostic evaluation.
  • the non-numerical sub-sample information of the clinical sample information in the current sample group can be converted into a first sub-feature vector; based on the current values of each parameter in the parameter set, the numerical sub-sample information is mapped to the target space to obtain the corresponding second sub-feature vector; the first sub-feature vector and the second sub-feature vector are then fused to obtain the sample features corresponding to the clinical sample information.
  • the parameters in the parameter set are used to determine the dimension of the target space and the value at each spatial point.
  • non-numeric sub-sample information can refer to sub-sample information in string format or text type sub-sample information.
  • gender "male" is text type sub-sample information
  • tumor grade is string type sub-sample information.
  • the feature vectors corresponding to these sub-sample information can be preset in advance. Then, for the clinical sample information in the current sample group, the first sub-feature vector corresponding to the non-numeric sub-sample information in the current sample group can be obtained by looking up the table.
  • the table includes feature vectors corresponding to various non-numeric sub-sample information, which can be understood as a fixed feature vector.
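  • the table lookup for non-numeric sub-sample information can be sketched with a dictionary. The field names, grade labels and vector values below are hypothetical placeholders; the source only states that fixed feature vectors are preset and retrieved by lookup.

```python
# Hypothetical fixed lookup table: (field, value) -> preset feature vector.
LOOKUP_TABLE = {
    ("gender", "male"): [1.0, 0.0],
    ("gender", "female"): [0.0, 1.0],
    ("tumor_grade", "II"): [0.2, 0.8],
    ("tumor_grade", "III"): [0.6, 0.4],
}


def first_sub_feature_vectors(non_numeric_info):
    """Return one preset vector per non-numeric field, in sorted field order."""
    vectors = []
    for field in sorted(non_numeric_info):
        key = (field, non_numeric_info[field])
        if key not in LOOKUP_TABLE:
            raise KeyError(f"no preset feature vector for {key!r}")
        vectors.append(LOOKUP_TABLE[key])
    return vectors


patient = {"gender": "male", "tumor_grade": "II"}
vectors = first_sub_feature_vectors(patient)
```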
  • the numerical sub-sample information can be sub-sample information of the numerical type.
  • the age "62" is the sub-sample information of the numerical type.
  • the numerical sub-sample information can be mapped to the target space based on the current values of each parameter in the parameter set.
  • the target space can be a vector space
  • the vector space can be a two-dimensional space, including multiple values on the first dimension of the two-dimensional space, and multiple values on the second dimension. That is to say, a numerical value can be dispersed to various positions in the target space to obtain a second sub-feature vector corresponding to the numerical value.
  • each parameter in the parameter set can be understood as a weight value that disperses the numerical value to each position in the target space.
  • the process of fusing the first sub-feature vector and the second sub-feature vector to obtain the sample features corresponding to the clinical sample information can refer to the above example.
  • the first sub-feature vector and the second sub-feature vector together constitute multiple sub-feature vectors, which can be fused according to the above-mentioned fusion method of multiple sub-feature vectors.
  • the parameter set includes a first parameter vector, a second parameter vector and a fourth parameter matrix.
  • the numerical sub-sample information can be mapped to the first dimension in the target space based on the current value of the first parameter vector to obtain the mapping value of the first dimension;
  • the second sub-feature vector is determined based on the first dimension mapping value, the current value of the second parameter vector and the current value of the fourth parameter matrix;
  • the first parameter vector is used to determine the value of the spatial point of the target space in the first dimension
  • the second parameter vector is used to determine the value of the spatial point of the target space in the second dimension
  • the fourth parameter matrix is used to assign parameters to each spatial position in the first dimension and the second dimension.
  • the first parameter vector can be understood as the weight value for dispersing the numerical value to each position on the first dimension in the target space.
  • the second parameter vector can be understood as the weight value for dispersing the numerical value to each position on the second dimension in the target space
  • the fourth parameter matrix can be understood as the weight value for dispersing the numerical value to each position in the target space.
  • w is the first parameter vector, which in practice can be a 128×1 parameter vector, and a is the numerical sub-sample information; for example, a is the age value "62".
  • b is the second parameter vector, which in practice can be a 128×1 parameter vector
  • W is the fourth parameter matrix, which can actually be a parameter matrix of 128, and sa is the second sub-feature vector.
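  • the mapping of a numerical value such as age into the target space can be sketched as follows. The combination used here, sa = W·(a·w) + b, is an assumption made for illustration, because the defining formulas are not reproduced in this passage; 3-dimensional parameters stand in for the 128-dimensional ones, and all parameter values are made up.

```python
def matvec(matrix, vector):
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def map_numeric(a, w, b, W):
    """Map the scalar a to a vector using parameters w, b and W (assumed form)."""
    # First-dimension mapping value: the scalar dispersed by the first parameter vector.
    first_dim = [a * wi for wi in w]
    # Assumed combination with the fourth parameter matrix and second parameter vector.
    projected = matvec(W, first_dim)
    return [p + bi for p, bi in zip(projected, b)]


w = [0.01, 0.02, 0.0]  # first parameter vector (illustrative values)
b = [0.1, 0.1, 0.1]    # second parameter vector (illustrative values)
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # fourth parameter matrix
s_a = map_numeric(62, w, b, W)  # age "62" mapped to a vector
```

  • because w, b and W change after every update, the same age can map to slightly different vectors at different points in training, which matches the behavior described above.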
  • the parameter set further includes a plurality of third parameter vectors.
  • the second sub-feature vector may be corrected, specifically, according to the following formula (XI):
  • va is the corrected second sub-feature vector
  • sa is the second sub-feature vector
  • a1, a2 and a3 are the third parameter vectors; specifically, the three third parameter vectors may all be 128×1 parameter vectors, and the three third parameter vectors are allowed to differ from one another.
  • the first sub-feature vector and the corrected second sub-feature vector may be fused to obtain the sample feature corresponding to the clinical sample information.
  • when the sample features corresponding to the sample information of multiple modalities are fused, the sample features can first be mapped to the same space and then fused.
  • the parameters of the preset model include a dimensional parameter matrix corresponding to the sample information of each modality, so that the sample features corresponding to the sample information of each modality can be dimensionally transformed based on the sample features corresponding to the sample information of each modality and the corresponding dimensional parameter matrix to obtain the converted sample features; based on the converted sample features corresponding to the sample information of multiple modalities, the predicted prognostic evaluation value and the consistency expression value are determined.
  • each dimensional parameter matrix is used to adjust the dimension of the sample features corresponding to the sample information of that modality.
  • PV 1 is the converted sample feature corresponding to the MR sample image
  • PV 2 is the converted sample feature corresponding to the clinical sample information
  • PV 3 is the converted sample feature corresponding to the genetic sample information
  • SV is the sample feature corresponding to the MR sample image
  • CV is the sample feature corresponding to the clinical sample information
  • GV is the sample feature corresponding to the genetic sample information.
  • M1 is the dimensional parameter matrix corresponding to the MR sample image, which can be a 64×512 parameter matrix
  • M2 is the dimensional parameter matrix corresponding to the clinical sample information, which can be a 64×128 parameter matrix
  • M3 is the dimensional parameter matrix corresponding to the gene sample information, which can be a 64×128 parameter matrix.
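  • given the stated shapes (a 64×512 matrix applied to a 512×1 feature, and 64×128 matrices applied to 128×1 features), the dimension conversion is consistent with a matrix-vector product that brings every modality to the same length. The sketch below assumes that form and uses small matrices in place of the real sizes.

```python
def convert(M, feature):
    """Project a sample feature into the shared space: PV = M @ feature (assumed form)."""
    if any(len(row) != len(feature) for row in M):
        raise ValueError("matrix column count must equal the feature length")
    return [sum(m * x for m, x in zip(row, feature)) for row in M]


# A 2x4 matrix stands in for the 64x512 matrix M1; a 2x3 matrix for the 64x128 matrix M2.
M1 = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
SV = [1.0, 2.0, 3.0, 4.0]
M2 = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
CV = [1.0, 2.0, 3.0]

PV1 = convert(M1, SV)
PV2 = convert(M2, CV)
```

  • after conversion both features have the same length, so they can be compared and fused directly.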
  • Stage 2: Parameter update stage.
  • the parameters of the preset model are updated based on the predicted prognostic evaluation value, the prognostic evaluation label corresponding to the current sample group, and the consistent expression value.
  • the difference between the predicted prognostic evaluation value and the prognostic evaluation label can be obtained; the parameters of the preset model are updated with the goal of minimizing the difference and maximizing the consistent expression value.
  • the difference between the predicted prognostic evaluation value and the prognostic evaluation label can reflect the distance between the prognostic evaluation value predicted by the preset model and the actual prognostic evaluation value.
  • the consistency expression value can characterize the consistency of each sample feature corresponding to the same target disease.
  • the training goal can be to minimize the difference between the predicted prognostic evaluation value and the prognostic evaluation label, and to maximize the consistency.
  • the parameters of the preset model can be updated with the goal of minimizing differences and maximizing consistent expression values.
  • the parameters of the preset model are updated according to the loss function $\text{loss} = \sum_i (y'_i - y_i)^2 - \text{consistency}$, with the goal of minimizing the difference and maximizing the consistent expression value; wherein loss represents the loss value, $y'_i$ represents the predicted prognostic evaluation value, $y_i$ represents the prognostic evaluation label, and consistency represents the consistent expression value.
  • the training goal is to minimize the loss value.
  • to minimize the loss value, it is necessary to minimize the difference and maximize the consistency expression value.
  • the consistency expression value of the present disclosure can be a value between 0 and 1.
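  • the loss described above, the sum of squared differences between predicted values and labels minus the consistency expression value, can be written as a small function. The function name and the example numbers are illustrative only.

```python
def training_loss(predicted, labels, consistency):
    """loss = sum_i (y'_i - y_i)^2 - consistency."""
    if len(predicted) != len(labels):
        raise ValueError("predicted values and labels must have the same length")
    squared_error = sum((p - y) ** 2 for p, y in zip(predicted, labels))
    return squared_error - consistency


# Illustrative values: one prediction is off by 0.5, consistency is 0.25.
loss_low_consistency = training_loss([0.5, 1.0], [1.0, 1.0], 0.25)
loss_high_consistency = training_loss([0.5, 1.0], [1.0, 1.0], 0.75)
```

  • with the prediction error held fixed, a higher consistency expression value gives a lower loss, so minimizing the loss pushes both objectives at once.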
  • the process of determining the consistent expression value may be as follows:
  • each sample feature is transposed to obtain the transposed feature corresponding to each sample feature; then, for two different sample features, the transposed features corresponding to one sample feature and the other sample feature are fused to obtain the corresponding fused feature value; based on each fused feature value, the consistent expression value is determined.
  • PV is the sample feature
  • PVT is the transposed feature
  • the result of the dot product between the two is 1; that is, by transposing, the sample feature can be normalized.
  • transposed features corresponding to one sample feature and another sample feature can be fused according to the following formula (XVII) to obtain the corresponding fused feature value:
  • the consistent expression value can be determined according to formula (XVIII):
  • PV 1 is the sample feature of MR sample image
  • PV 2 is the sample feature of clinical sample information
  • PV 3 is the sample feature of gene sample information.
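  • a sketch of one way to compute a consistency value from the three converted sample features: each feature is normalized so that its dot product with itself is 1, every pair of different features is combined by a dot product, and the pairwise values are averaged. The averaging step is an assumption, since formulas (XVII) and (XVIII) are not reproduced in this passage. In general this average lies in [-1, 1]; it falls in the 0 to 1 range mentioned above when no pair of features points in opposing directions.

```python
import math
from itertools import combinations


def normalize(v):
    """Scale v to unit length so that its dot product with itself is 1."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def consistency_value(features):
    """Average pairwise dot product of unit-normalized features (assumed form)."""
    unit = [normalize(f) for f in features]
    pair_values = [
        sum(a * b for a, b in zip(u, v)) for u, v in combinations(unit, 2)
    ]
    return sum(pair_values) / len(pair_values)


# Hypothetical converted features PV1, PV2 and PV3.
aligned = consistency_value([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
mixed = consistency_value([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
```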
  • referring to FIG. 2, a schematic diagram of the structure of a preset model of the present disclosure is shown, which may include a fusion module, a prediction branch, a consistency expression branch, and a data processing module corresponding to the sample information of each modality.
  • the sample information of each modality in the current sample group can be subjected to feature extraction by using the data processing module corresponding to the sample information of the modality; and the sample features output by each data processing module can be fused by using the fusion module; then, the predicted prognosis evaluation value corresponding to the feature output by the fusion module can be determined by using the prediction branch; and then, the consistency expression value corresponding to the sample features output by each data processing module can be determined by using the consistency expression branch.
  • the sample information of each modality disclosed in the present invention is input into the corresponding data processing module, which performs feature extraction on the sample information of the modality.
  • the extracted sample features are input into the fusion module, which fuses the sample features.
  • the fused sample features are input into the prediction module, which determines the predicted prognostic evaluation value based on the fused sample features.
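  • the data flow through the modules can be sketched as a single forward pass. Every callable below is a toy placeholder chosen for illustration (the real modules are neural network components), and concatenation as the fusion step is an assumption.

```python
def forward(sample_group, processors, fuse, predict, consistency):
    """Run one forward pass and return (predicted value, consistency value)."""
    # One data processing module per modality extracts that modality's sample feature.
    features = {name: processors[name](info) for name, info in sample_group.items()}
    ordered = [features[name] for name in sorted(features)]
    fused = fuse(ordered)             # fusion module
    predicted = predict(fused)        # prediction branch
    agreement = consistency(ordered)  # consistency expression branch
    return predicted, agreement


# Toy stand-ins for the three data processing modules (hypothetical behavior).
processors = {
    "mr": lambda slices: [float(len(slices))],
    "clinical": lambda info: [info["age"] / 100.0],
    "gene": lambda genes: [1.0 if genes["IDH"] == "mutant" else 0.0],
}
sample_group = {
    "mr": ["slice1", "slice2"],
    "clinical": {"age": 50},
    "gene": {"IDH": "mutant"},
}
predicted, agreement = forward(
    sample_group,
    processors,
    fuse=lambda feats: [x for f in feats for x in f],  # concatenation (assumption)
    predict=lambda fused: sum(fused) / len(fused),     # placeholder prediction head
    consistency=lambda feats: 1.0,                     # placeholder consistency branch
)
```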
  • the training samples include sample groups corresponding to multiple sample users, each sample group includes MR sample images, clinical sample information and gene sample information of the sample user.
  • the image data processing module is used to extract features from multiple slice subsample images in the MR sample image, and after extraction, perform feature fusion on the sub-feature vectors corresponding to each slice subsample image to obtain sample features corresponding to the MR sample image.
  • FIG. 3 shows a schematic diagram of the structure of the image data processing module
  • FIG. 4 shows a schematic diagram of the structure of the ResNet network in FIG. 3 .
  • a plurality of slice sample images are included, such as Slice1-Slice n.
  • Each slice sample image is input into a corresponding ResNet network, and the ResNet network extracts features from the slice sample image.
  • the size of each slice sample image input in this example is 256×256. It first passes through a 7×7 convolution with a stride of 2 and a 3×3 max pooling step with a stride of 2; in this way, the 256×256 input slice image becomes a 64×64 feature map, which effectively reduces the storage required. It then enters multiple ResNet_Block and downsampling modules in sequence.
  • the network layer 1 composed of 3 ResNet_Blocks and downsampling modules
  • the network layer 2 composed of 3 ResNet_Blocks and 1 downsampling module
  • the network layer 3 composed of 5 ResNet_Blocks and 1 downsampling module
  • the network layer 4 composed of 2 ResNet_Blocks; followed by the average pooling layer, and finally outputs the sub-feature vector of each slice sample image, which can be a 512×1 vector.
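  • the reduction from 256×256 to 64×64 in the stem can be checked with the usual output-size formula for convolution and pooling layers. The padding values (3 for the 7×7 convolution, 1 for the 3×3 max pooling) are assumptions taken from the standard ResNet stem; the source states only the kernel sizes and strides.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


# 7x7 convolution, stride 2 (padding 3 assumed).
after_conv = conv_output_size(256, kernel=7, stride=2, padding=3)
# 3x3 max pooling, stride 2 (padding 1 assumed).
after_pool = conv_output_size(after_conv, kernel=3, stride=2, padding=1)
```

  • each of the two stride-2 steps halves the side length, giving 256 to 128 to 64.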
  • each sub-feature vector enters the self-attention layer, and the attention value between every two sub-feature vectors is determined by the self-attention layer. Then, the attention value between every two sub-feature vectors is input into the feature representation layer.
  • the feature representation layer obtains the sample features corresponding to the MR sample image based on the attention value according to the above formulas (IV) and (V).
  • each ResNet_Block includes a 3×3 convolution with a stride of 1, Batch Norm normalization, ReLU activation function, a 3×3 convolution with a stride of 1, and Batch Norm normalization, as shown on the right.
  • the downsampling module structure is similar to ResNet_Block, but uses a 3×3 convolution with a stride of 2, Batch Norm normalization, ReLU activation function, a 3×3 convolution with a stride of 1, Batch Norm normalization, and downsampling (1×1 convolution with a stride of 2 and Batch Norm normalization).
  • the clinical data processing module is used to perform feature conversion on multiple types of clinical information in the clinical sample information to obtain sample features corresponding to the clinical sample information.
  • this example uses gender, age, histological diagnosis, tumor grade, medication information, and malignant tumor history as clinical sample information, and then maps the information into sub-feature vectors.
  • sub-feature vectors can be represented by vg, vh, vgr, vd, and vhom; specifically, the lookup table method can be used to complete the vector mapping of character-type sub-sample information.
  • each sub-feature vector is fused using formula (VII) to obtain the sample features corresponding to the clinical sample information.
  • the sub-feature vector corresponding to the numerical sub-sample information can be obtained according to formula (IX), formula (X) and formula (XI).
  • the gene data processing module is used to perform feature conversion on information of multiple genes in the gene sample information to obtain sample features corresponding to the gene sample information.
  • the lookup table method can be used to complete the vector mapping of each gene information to obtain the sub-feature vector corresponding to each gene information. After that, the sub-feature vectors are fused using formula (VII) to obtain the sample features corresponding to the gene sample information.
  • the first parameter matrix, the second parameter matrix, the third parameter matrix, the fourth parameter matrix and the first parameter vector, the second parameter vector and the three third parameter vectors in the parameter set mentioned above can be updated synchronously. In this way, based on the two optimization objectives, the accuracy of the data processing module, the fusion module and the prediction module of the three modalities can be affected simultaneously in one training.
  • S6 stop training the preset model after multiple updates, or when the difference between the predicted prognostic evaluation value and the prognostic evaluation label is less than the preset difference, and the consistency expression value is higher than or equal to the preset expression value, and use the preset model when stopping training as the target model, and use the target model to predict the patient's prognostic evaluation value.
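  • the stopping condition in this step can be expressed as a small predicate. The parameter names and the threshold values in the example are hypothetical; "multiple updates" is modeled here as a maximum update count.

```python
def should_stop(update_count, max_updates, difference, preset_difference,
                consistency, preset_expression_value):
    """Stop after enough updates, or once both training targets are met."""
    reached_update_budget = update_count >= max_updates
    converged = (
        difference < preset_difference
        and consistency >= preset_expression_value
    )
    return reached_update_budget or converged


# Illustrative thresholds: difference below 0.05 and consistency of at least 0.9.
stop_converged = should_stop(120, 1000, 0.01, 0.05, 0.95, 0.9)
stop_budget = should_stop(1000, 1000, 0.30, 0.05, 0.40, 0.9)
keep_training = should_stop(120, 1000, 0.01, 0.05, 0.60, 0.9)
```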
  • referring to FIG. 5, a flowchart of a method for determining a prognosis evaluation value of the present disclosure is shown. As shown in FIG. 5, the method may specifically include the following steps:
  • Step S501 obtaining information of multiple modalities of the object to be tested, wherein the information of multiple modalities includes nuclear magnetic resonance MR images, clinical information and gene information;
  • Step S502 input the information of the multiple modalities into a target model to obtain a prognostic evaluation value of the object to be tested; wherein the target model is obtained by the target model acquisition method described in the above embodiment.
  • after the target model is obtained, it can be used to predict the patient's prognostic evaluation value.
  • information of multiple modalities of the object to be tested can be obtained.
  • the specific information of multiple modalities of the object to be tested can be consistent with the modality used for training the preset model. For example, if the sample group used for training the preset model includes MR sample images, clinical sample information, and genetic sample information, then the information of multiple modalities of the object to be tested can also include MR images, clinical information, and genetic information of the object to be tested.
  • the target model uses the consistent expression value as the basis for updating the parameters of the preset model, and the consistent expression value is used to characterize the consistency of each sample feature corresponding to the same target disease.
  • the preset model can extract the prognostic factors (sample features) related to the target disease, and gradually discard the prognostic factors that are not related to the target disease in the information of various modalities, so that the prognostic factors selected by the model have clinical importance, which can help improve the interpretability of the model, so that the results output by the target model have a higher prognostic reference value.
  • the target model includes a data processing module corresponding to the sample information of each modality and a fusion module connected to the multiple data processing modules, and these modules extract sample features that are strongly correlated with the expression of the target disease from the sample information of multiple modalities. Therefore, in one application, after the target model is obtained, the data processing modules and the fusion module can be extracted from it as a separate feature extraction model, which can be used to extract features strongly correlated with the target disease from information of multiple modalities and can thus be used independently for screening prognostic factors in the prognosis process.
  • the information of different modalities can be mapped to the space of important expression information of the target disease, so that the features extracted from the information of different modalities are closer in this space, thereby improving the complementarity of information of different modalities and reducing the impact of noise (unimportant information) and the clinical importance of the target disease, so that the target model has medical interpretability and its predicted prognostic evaluation value has a high medical reference value.
  • information from different modalities can complement each other, thereby enriching the data sources of prognostic factors and interpreting the expression of target diseases from multiple dimensions, thereby further improving the medical reference value of prognostic assessment.
  • the information within the same modality is combined through the self-attention mechanism to enhance the nonlinear representation ability of the target model.
  • the present disclosure also provides a device for acquiring a target model; referring to FIG. 6, a schematic diagram of the structure of the device for acquiring a target model is shown.
  • the device may specifically include the following modules:
  • the sample acquisition module 601 is used to acquire sample groups corresponding to multiple sample users, wherein the sample groups include sample information of multiple modalities, and the sample information of multiple modalities includes at least two of MR sample images, clinical sample information, and gene sample information;
  • the training module 602 is used to iteratively train the preset model based on the plurality of sample groups to obtain the target model, wherein the target model is used to predict the prognosis evaluation value of the target object; wherein the following steps are performed in each iterative training:
  • using the preset model, feature extraction is performed on sample information of multiple modalities in the current sample group, and based on the extracted sample features, a predicted prognosis evaluation value and a consistency expression value are determined; wherein the consistency expression value is used to characterize the consistency degree of each sample feature corresponding to the same target disease;
  • the parameters of the preset model are updated based on the predicted prognostic evaluation value, the prognostic evaluation label corresponding to the current sample group, and the consistent expression value.
  • the training module 602 includes a parameter updating unit, and the parameter updating unit includes:
  • a difference determination subunit, used to obtain the difference between the predicted prognostic evaluation value and the prognostic evaluation label;
  • a parameter updating subunit, used to update the parameters of the preset model with the goal of minimizing the difference and maximizing the consistency expression value.
  • the parameter updating subunit is specifically used to:
  • construct the following loss function based on the difference and the consistency expression value: $\mathrm{loss} = \sum_i (y'_i - y_i)^2 - \mathrm{consistency}$
  • the parameters of the preset model are updated based on the loss value of the loss function, with the goal of minimizing the difference and maximizing the consistency expression value
  • loss represents the loss value
  • $y'_i$ represents the predicted prognostic evaluation value, and $y_i$ represents the prognostic evaluation label
  • consistency represents the consistency expression value
  • the step of determining the consistency expression value based on the extracted sample features includes:
  • the consistent expression value is determined.
  • the sample information of each modality includes a plurality of sub-sample information
  • the step of using the preset model to extract features from the sample information of the plurality of modalities in the current sample group comprises the following steps:
  • feature extraction is performed on a plurality of sub-sample information in the sample information of the modality to obtain a plurality of corresponding sub-feature vectors;
  • Feature fusion is performed on the multiple sub-feature vectors corresponding to the sample information of each modality to obtain sample features corresponding to the sample information of this modality.
  • the step of performing feature fusion on the multiple sub-feature vectors corresponding to the sample information of each modality to obtain sample features corresponding to the sample information of the modality includes:
  • multiple sub-feature vectors are fused to obtain sample features corresponding to the sample information of the modality.
  • the step of fusing the plurality of sub-feature vectors based on the attention value to obtain sample features corresponding to the sample information of the modality includes:
  • all other sub-feature vectors are merged into the sub-feature vector to obtain a fused sub-vector of the sub-feature vector;
  • the plurality of fused sub-vectors are re-fused to obtain sample features corresponding to the sample information of the modality.
  • the parameters of the preset model include a first parameter matrix
  • the MR sample image includes a plurality of slice sample images
  • the step of determining the attention value between every two sub-feature vectors includes:
  • the parameters of the preset model include a second parameter matrix and a third parameter matrix
  • the step of performing feature fusion on the multiple sub-feature vectors corresponding to the sample information of each modality to obtain sample features corresponding to the sample information of the modality includes:
  • feature fusion is performed on the sub-feature vectors corresponding to each clinical sub-sample information to obtain the sample features corresponding to the clinical sample information;
  • feature fusion is performed on the sub-feature vectors corresponding to each gene sub-sample information to obtain the sample feature corresponding to the gene sample information.
  • the parameters of the preset model include a parameter set corresponding to the clinical sample information
  • the clinical sample information includes numerical sub-sample information and non-numerical sub-sample information
  • the step of using the preset model to extract features of sample information of multiple modalities in the current sample group includes:
  • the numerical sub-sample information is mapped to the target space to obtain a corresponding second sub-feature vector; the parameters in the parameter set are used to determine the dimension of the target space and the value at each spatial point;
  • the first sub-feature vector and the second sub-vector are fused to obtain a sample feature corresponding to the clinical sample information.
  • the parameter set includes a first parameter vector, a second parameter vector and a fourth parameter matrix
  • the step of mapping the numerical subsample information to a target space based on the current value of each parameter in the parameter set to obtain a corresponding second sub-feature vector includes:
  • mapping the numerical subsample information to a first dimension in the target space to obtain a mapping value of the first dimension; wherein the first parameter vector is used to determine the value of a spatial point in the target space on the first dimension;
  • the second sub-feature vector is determined; wherein the second parameter vector is used to determine the value of the spatial point of the target space in the second dimension, and the fourth parameter matrix is used to assign a parameter to each spatial position in the first dimension and the second dimension.
  • the parameter set further includes a plurality of third parameter vectors
  • the device further includes:
  • a correction module is used to correct the second sub-feature vector according to the following formula based on the second sub-feature vector and the plurality of third parameter vectors:
  • $v_a$ is the corrected second sub-feature vector
  • $s_a$ is the second sub-feature vector
  • a 1 , a 2 and a 3 are the third parameter vectors respectively;
  • the step of fusing the first sub-feature vector and the second sub-vector to obtain the sample feature corresponding to the clinical sample information includes:
  • the first sub-feature vector and the corrected second sub-vector are fused to obtain the sample feature corresponding to the clinical sample information.
  • the parameters of the preset model include a dimensional parameter matrix corresponding to the sample information of each modality
  • the step of determining the predicted prognostic evaluation value and the consistent expression value based on the extracted sample features includes:
  • the sample features corresponding to the sample information of the modality are dimensional transformed to obtain transformed sample features
  • the predicted prognostic evaluation value and the consistent expression value are determined based on the converted sample features corresponding to each of the sample information of multiple modalities.
  • the preset model includes a fusion module, a prediction branch, a consistent expression branch, and a data processing module corresponding to each modality of sample information; the step of using the preset model to extract features of sample information of multiple modalities in the current sample group, and determining a predicted prognostic evaluation value and a consistent expression value based on the extracted sample features, includes:
  • the sample features output by each of the data processing modules are fused;
  • the consistent expression branch is used to determine the consistent expression value corresponding to the sample feature output by each of the data processing modules.
  • the gene sample information includes information of at least one gene selected from the group consisting of isocitrate dehydrogenase, chromosome 1p/19q joint deletion status, telomerase reverse transcriptase gene promoter, and O6-methylguanine-DNA methyltransferase promoter region methylation;
  • the clinical sample information includes at least one of gender, age, histological diagnosis, tumor grade, medication information, and history of malignant tumor.
  • FIG. 7 shows a schematic diagram of the structure of a device for determining a prognostic evaluation value. As shown in FIG. 7, the device includes:
  • the information acquisition module 701 is used to acquire information of multiple modalities of the object to be tested, wherein the information of multiple modalities includes nuclear magnetic resonance MR images, clinical information and gene information;
  • the input module 702 is used to input the information of the multiple modalities into the target model to obtain the prognostic evaluation value of the object to be tested; wherein the target model is obtained according to the target model acquisition method.
  • the present disclosure also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the target model acquisition method or the prognostic evaluation value determination method.
  • FIG. 8 shows a structural block diagram of an electronic device 800 according to an embodiment of the present disclosure.
  • an electronic device provided by an embodiment of the present invention can be used to execute a classification model acquisition method or a mutation category determination method for a TERT gene promoter.
  • the electronic device 800 may include a memory 801, a processor 802, and a computer program stored in the memory and executable on the processor, wherein the processor 802 is configured to execute the image processing method.
  • the electronic device 800 may further include an input device 803, an output device 804, and a data acquisition device 805. When the image processing method of the embodiment of the present disclosure is executed, the data acquisition device 805 obtains the information of multiple modalities, the input device 803 receives that information from the data acquisition device 805, and the processor 802 processes it.
  • the processing can specifically execute the above-mentioned target model acquisition method and the above-mentioned prognosis evaluation value determination method.
  • the output device 804 can output the target model, or can output the prognosis evaluation value result output by the target model.
  • the memory 801 may include a volatile memory and a non-volatile memory, wherein the volatile memory may be understood as a random access memory used to store and save data.
  • Non-volatile memory refers to computer memory in which the stored data does not disappear when the current is turned off.
  • the computer program of the target model acquisition method or the prognostic evaluation value determination method disclosed in the present invention can be stored in a volatile memory and a non-volatile memory, or in either one of the two.
  • references to "one embodiment," "an embodiment," or "one or more embodiments" herein mean that a particular feature, structure, or characteristic described in conjunction with the embodiment is included in at least one embodiment of the present disclosure.
  • instances of the phrase "in one embodiment" herein do not necessarily all refer to the same embodiment.
  • any reference signs placed between brackets shall not be construed as limiting the claims.
  • the word “comprising” does not exclude the presence of elements or steps not listed in the claims.
  • the word “a” or “an” preceding an element does not exclude the presence of a plurality of such elements.
  • the present disclosure may be implemented by means of hardware comprising a number of different elements and by means of a suitably programmed computer. In a unit claim enumerating a number of means, several of these means may be embodied by the same item of hardware.
  • the use of the words first, second, and third, etc. does not indicate any order. These words may be interpreted as names.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Computational Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Evolutionary Computation (AREA)
  • Pure & Applied Mathematics (AREA)
  • Public Health (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Pathology (AREA)
  • Algebra (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

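A method for acquiring a target model, a method for determining a prognostic evaluation value, a device, equipment, and a medium. The method includes: acquiring a sample group corresponding to each of multiple sample users, each sample group including sample information of multiple modalities; and iteratively training a preset model based on the multiple sample groups to obtain a target model, the target model being used to predict a prognostic evaluation value of a target object. In each training iteration, the preset model is used to perform feature extraction on the sample information of each modality in the current sample group, and a predicted prognostic evaluation value and a consistency expression value are determined based on the extracted sample features, where the consistency expression value characterizes the degree to which the sample features consistently correspond to the same target disease. The parameters of the preset model are updated based on the predicted prognostic evaluation value, the prognostic evaluation label corresponding to the current sample group, and the consistency expression value.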

Description

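Method for acquiring a target model, method for determining a prognostic evaluation value, device, equipment, and medium
This application claims priority to Chinese patent application No. 202211186768.3, filed with the China Patent Office on September 27, 2022 and entitled "Method for acquiring a target model, method for determining a prognostic evaluation value, device, equipment, and medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of information processing technology, and in particular to a method for acquiring a target model, a method for determining a prognostic evaluation value, a device, equipment, and a medium.
Background
Gliomas originate from glial cells and are the most common tumors of the central nervous system, accounting for about 50% to 60% of craniocerebral tumors, with an incidence that is rising year by year. Glioma patients generally require a prognostic evaluation after surgery.
Traditionally, prognosis is evaluated using predictors such as the size of the lesion and the extent of its spread, which has certain limitations.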
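Summary
The present disclosure provides a method for acquiring a target model, the method including:
acquiring a sample group corresponding to each of multiple sample users, the sample group including sample information of multiple modalities, the sample information of multiple modalities including at least two of a magnetic resonance (MR) sample image, clinical sample information, and gene sample information;
iteratively training a preset model based on the multiple sample groups to obtain the target model, the target model being used to predict a prognostic evaluation value of a target object;
wherein the following steps are performed in each training iteration:
using the preset model to perform feature extraction on the sample information of each modality in the current sample group, and determining a predicted prognostic evaluation value and a consistency expression value based on the extracted sample features; wherein the consistency expression value characterizes the degree to which the sample features consistently correspond to the same target disease;
updating the parameters of the preset model based on the predicted prognostic evaluation value, the prognostic evaluation label corresponding to the current sample group, and the consistency expression value.
In an optional example, updating the parameters of the preset model based on the predicted prognostic evaluation value, the prognostic evaluation label corresponding to the current sample group, and the consistency expression value includes:
obtaining the difference between the predicted prognostic evaluation value and the prognostic evaluation label;
updating the parameters of the preset model with the goal of minimizing the difference and maximizing the consistency expression value.
In an optional example, updating the parameters of the preset model with the goal of minimizing the difference and maximizing the consistency expression value includes:
constructing the following loss function based on the difference and the consistency expression value:
$\mathrm{loss} = \sum_i (y'_i - y_i)^2 - \mathrm{consistency}$;
updating the parameters of the preset model based on the loss value of the loss function, with the goal of minimizing the difference and maximizing the consistency expression value;
where loss denotes the loss value, $y'_i$ denotes the predicted prognostic evaluation value, $y_i$ denotes the prognostic evaluation label, and consistency denotes the consistency expression value.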
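In an optional example, determining the consistency expression value based on the extracted sample features includes:
transposing each sample feature to obtain the transposed feature corresponding to that sample feature;
for two different sample features, fusing one of the sample features with the transposed feature corresponding to the other sample feature to obtain a corresponding fused feature value;
determining the consistency expression value based on the fused feature values.
In an optional example, the sample information of the multiple modalities includes multiple pieces of sub-sample information, and using the preset model to perform feature extraction on the sample information of each modality in the current sample group includes:
for the sample information of each modality, performing feature extraction on the multiple pieces of sub-sample information in the sample information of that modality to obtain multiple corresponding sub-feature vectors;
performing feature fusion on the multiple sub-feature vectors corresponding to the sample information of each modality to obtain the sample feature corresponding to the sample information of that modality.
In an optional example, performing feature fusion on the multiple sub-feature vectors corresponding to the sample information of each modality to obtain the sample feature corresponding to the sample information of that modality includes:
determining an attention value between every two sub-feature vectors, the attention value characterizing how closely the two sub-feature vectors are related;
fusing the multiple sub-feature vectors based on the attention values to obtain the sample feature corresponding to the sample information of that modality.
In an optional example, fusing the multiple sub-feature vectors based on the attention values to obtain the sample feature corresponding to the sample information of that modality includes:
for each sub-feature vector, fusing all other sub-feature vectors into that sub-feature vector based on the attention values between it and all other sub-feature vectors, to obtain the fused sub-vector of that sub-feature vector;
fusing the multiple fused sub-vectors again to obtain the sample feature corresponding to the sample information of that modality.
In an optional example, the parameters of the preset model include a first parameter matrix, the MR sample image includes multiple slice sample images, and determining the attention value between every two sub-feature vectors includes:
for each slice sample image included in the MR sample image, determining, based on the current value of the first parameter matrix, the attention value between the sub-feature vectors corresponding to every two slice sample images.
In an optional example, the parameters of the preset model include a second parameter matrix and a third parameter matrix, and performing feature fusion on the multiple sub-feature vectors corresponding to the sample information of each modality to obtain the sample feature corresponding to the sample information of that modality includes:
for each piece of clinical sub-sample information in the clinical sample information, performing feature fusion on the sub-feature vectors corresponding to the pieces of clinical sub-sample information based on the current value of the second parameter matrix, to obtain the sample feature corresponding to the clinical sample information;
for each piece of gene sub-sample information in the gene sample information, performing feature fusion on the sub-feature vectors corresponding to the pieces of gene sub-sample information based on the current value of the third parameter matrix, to obtain the sample feature corresponding to the gene sample information.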
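In an optional example, the parameters of the preset model include a parameter set corresponding to the clinical sample information, the clinical sample information includes numerical sub-sample information and non-numerical sub-sample information, and using the preset model to perform feature extraction on the sample information of each modality in the current sample group includes:
for the clinical sample information in the current sample group, converting the non-numerical sub-sample information into a first sub-feature vector;
mapping the numerical sub-sample information to a target space based on the current value of each parameter in the parameter set, to obtain a corresponding second sub-feature vector; the parameters in the parameter set being used to determine the dimensions of the target space and the value at each spatial point;
fusing the first sub-feature vector and the second sub-feature vector to obtain the sample feature corresponding to the clinical sample information.
In an optional example, the parameter set includes a first parameter vector, a second parameter vector, and a fourth parameter matrix, and mapping the numerical sub-sample information to the target space based on the current value of each parameter in the parameter set to obtain the corresponding second sub-feature vector includes:
mapping the numerical sub-sample information to a first dimension of the target space based on the current value of the first parameter vector, to obtain a mapping value of the first dimension; wherein the first parameter vector is used to determine the values of the spatial points of the target space in the first dimension;
determining the second sub-feature vector based on the first-dimension mapping value, the current value of the second parameter vector, and the current value of the fourth parameter matrix; wherein the second parameter vector is used to determine the values of the spatial points of the target space in a second dimension, and the fourth parameter matrix is used to assign a parameter to each spatial position in the first dimension and the second dimension.
In an optional example, the parameter set further includes multiple third parameter vectors, and after determining the second sub-feature vector based on the first-dimension mapping value, the current value of the second parameter vector, and the current value of the fourth parameter matrix, the method further includes:
correcting the second sub-feature vector, based on the second sub-feature vector and the multiple third parameter vectors, according to the following formula:
where $v_a$ is the corrected second sub-feature vector, $s_a$ is the second sub-feature vector, and $a_1$, $a_2$, and $a_3$ are the third parameter vectors;
fusing the first sub-feature vector and the second sub-feature vector to obtain the sample feature corresponding to the clinical sample information then includes:
fusing the first sub-feature vector and the corrected second sub-feature vector to obtain the sample feature corresponding to the clinical sample information.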
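In an optional example, the parameters of the preset model include a dimension parameter matrix corresponding to the sample information of each modality, and determining the predicted prognostic evaluation value and the consistency expression value based on the extracted sample features includes:
performing a dimension transformation on the sample feature corresponding to the sample information of each modality, based on that sample feature and the dimension parameter matrix, to obtain a converted sample feature;
determining the predicted prognostic evaluation value and the consistency expression value based on the converted sample features corresponding to the sample information of the multiple modalities.
In an optional example, the preset model includes a fusion module, a prediction branch, a consistency expression branch, and a data processing module corresponding to the sample information of each modality; using the preset model to perform feature extraction on the sample information of each modality in the current sample group, and determining the predicted prognostic evaluation value and the consistency expression value based on the extracted sample features, includes:
for the sample information of each modality in the current sample group, performing feature extraction on that sample information using the data processing module corresponding to that modality;
fusing, using the fusion module, the sample features output by the data processing modules;
determining, using the prediction branch, the predicted prognostic evaluation value corresponding to the feature output by the fusion module;
determining, using the consistency expression branch, the consistency expression value corresponding to the sample features output by the data processing modules.
In an optional example, the gene sample information includes information on at least one of: isocitrate dehydrogenase, chromosome 1p/19q co-deletion status, the telomerase reverse transcriptase gene promoter, and O6-methylguanine-DNA methyltransferase promoter methylation;
the clinical sample information includes at least one of: gender, age, histological diagnosis, tumor grade, medication information, and history of malignant tumor.
An embodiment of the present disclosure also provides a method for determining a prognostic evaluation value, the method including:
acquiring information of multiple modalities of an object to be tested, the information of multiple modalities including a magnetic resonance (MR) image, clinical information, and gene information;
inputting the information of the multiple modalities into a target model to obtain the prognostic evaluation value of the object to be tested; wherein the target model is obtained by the method for acquiring a target model described above.
With the technical solution of the embodiments of the present disclosure, a sample group corresponding to each of multiple sample users can be acquired, and a preset model can be iteratively trained based on the multiple sample groups to obtain a target model used to predict the prognostic evaluation value of a target object. In each training iteration, the preset model is used to perform feature extraction on the sample information of each modality in the current sample group, and a predicted prognostic evaluation value and a consistency expression value are determined based on the extracted sample features; the parameters of the preset model are then updated based on the predicted prognostic evaluation value, the prognostic evaluation label corresponding to the current sample group, and the consistency expression value.
On the one hand, because a sample group of the present disclosure includes sample information of multiple modalities, specifically at least two of an MR sample image, clinical sample information, and gene sample information, information about a sample user from different dimensions can serve as reference factors for predicting the prognostic evaluation value, enabling fast multi-factor prognostic analysis.
On the other hand, during training the consistency expression value serves as a basis for updating the parameters of the preset model, and it characterizes the degree to which the sample features consistently correspond to the same target disease. The extracted sample features can be understood as prognostic factors used for prognosis prediction. As training progresses, the preset model therefore learns to extract from each modality the prognostic factors (sample features) related to the target disease and gradually discards those unrelated to it. The prognostic factors selected by the model thus have clinical importance, which helps improve the interpretability of the model, so that the output of the target model has high prognostic reference value.
The above is only an overview of the technical solution of the present disclosure. So that the technical means of the present disclosure can be understood more clearly and implemented in accordance with the specification, and so that the above and other objects, features, and advantages of the present disclosure are more apparent, specific embodiments of the present disclosure are set out below.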
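Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present disclosure or in the related art more clearly, the drawings needed in the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings described below are some embodiments of the present disclosure, and a person of ordinary skill in the art can obtain other drawings from them without creative effort. It should be noted that the proportions in the drawings are schematic only and do not represent actual proportions.
FIG. 1 schematically shows a flowchart of the steps of the method for acquiring a target model;
FIG. 2 schematically shows the structure of a preset model;
FIG. 3 schematically shows the structure of the image data processing module of the present disclosure;
FIG. 4 schematically shows the structure of the ResNet network in FIG. 3;
FIG. 5 schematically shows a flowchart of the steps of a method for determining a prognostic evaluation value;
FIG. 6 schematically shows the structure of a device for acquiring a target model;
FIG. 7 schematically shows the structure of a device for determining a prognostic evaluation value;
FIG. 8 schematically shows a structural block diagram of the electronic device of the present disclosure.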
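Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the scope of protection of the present disclosure.
In the related art, the prognosis of a disease is generally evaluated based on lesion size and the presence or absence of enhancement, which has certain limitations. Take glioma as an example. Gliomas originate from glial cells and are the most common tumors of the central nervous system, accounting for about 50% to 60% of craniocerebral tumors, with an incidence that is rising year by year. The World Health Organization classifies gliomas as low grade (I and II) or high grade (III and IV). High-grade gliomas (HGG) have a poor prognosis. Grade IV glioblastoma is the most malignant, with a 10-year survival rate below 3% and a median survival of about 12 to 14 months. Previous studies have used tumor location, size, extent of resection, and conventional imaging methods as prognostic predictors for glioma, which has certain limitations.
However, when a multi-factor analysis is used for the evaluation, researchers must follow up many patients throughout diagnosis, treatment, and the postoperative period, collect treatment information and the patients' physiological indicators, and then analyze and screen out the factors that affect prognosis. This work consumes substantial manpower and resources; the manual analysis and screening of prognostic factors alone takes a long time, so efficiency is low.
In view of this, the present disclosure proposes an approach to prognostic evaluation. The core idea is to train a target model using sample information of multiple modalities as training samples and to use that target model for prognostic evaluation. The sample information of multiple modalities includes at least two of an MR sample image, clinical sample information, and gene sample information, which enriches the information sources so that prognostic factors can be screened from multiple dimensions. During model training, the consistency expression value is used as a factor in updating the parameters of the preset model, so that the prognostic factors closely related to the target disease are selected from the information of the multiple modalities. This raises the clinical importance of the prognostic factors, so that the output of the target model has high prognostic reference value.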
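FIG. 1 shows a flowchart of the steps of the method for acquiring a target model of the present disclosure. As shown in FIG. 1, the method may specifically include the following steps:
Step S101: acquire a sample group corresponding to each of multiple sample users, the sample group including sample information of multiple modalities, the sample information of multiple modalities including at least two of a magnetic resonance (MR) sample image, clinical sample information, and gene sample information;
Step S102: iteratively train a preset model based on the multiple sample groups to obtain a target model, the target model being used to predict a prognostic evaluation value of a target object;
wherein the following steps are performed in each training iteration of step S102:
Step S1021: use the preset model to perform feature extraction on the sample information of each modality in the current sample group, and determine a predicted prognostic evaluation value and a consistency expression value based on the extracted sample features. The consistency expression value characterizes the degree to which the sample information of the multiple modalities consistently corresponds to the same target disease.
Step S1022: update the parameters of the preset model based on the predicted prognostic evaluation value, the prognostic evaluation label corresponding to the current sample group, and the consistency expression value.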
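In this embodiment, a sample user may be a user for whom the survival time and quality of life after contracting the target disease are definitively known. The target disease may be any clinically known disease, such as the craniocerebral tumors described above or common liver, gallbladder, and lung tumors; no particular limitation is imposed here.
The ages and genders of the sample users may be as diverse as possible. The sample information of multiple modalities may include at least two of an MR sample image, clinical sample information, and gene sample information. Sample information of different modalities reflects how the target disease manifests in the corresponding dimension: for example, the MR sample image reflects the morphological manifestation in organ tissue, the clinical sample information reflects the manifestation in diagnosis and treatment, and the gene sample information reflects the manifestation in the expression status of certain genes.
The MR sample image reflects the morphological characteristics of the organ tissue affected by the target disease. Specifically, the MR sample image may be obtained directly from an MR imaging device, a memory, or any other suitable source; for example, after the corresponding permission has been obtained, the MR sample image of a sample user may be retrieved from a medical database. It should be noted that the MR sample image is an image related to the target disease of the sample user. If the sample user has a glioma, the MR sample image is a brain MR image of the sample user and reflects the morphological characteristics of the brain tissue. If the sample user has a liver tumor, the MR sample image may be an abdominal MR image and reflects the morphological characteristics of the tissue in the abdominal cavity.
The clinical sample information may include information from the sample user's diagnosis and treatment, such as medication information, hospitalization information, treatment plan information, attending physician information, and hospital information.
The gene sample information may include information on genes related to the occurrence and prognosis of the target disease. Specifically, the information on each gene in the gene sample information includes the name of the gene and its expression status, because the onset and prognosis of a disease can be reflected in the expression of certain genes. For glioma, take the TERT (telomerase reverse transcriptase) gene as an example: it is one of the important genes encoding the telomerase complex. The TERT gene has no transcriptional activity in the vast majority of non-tumor cells, but TERT gene alterations, such as promoter mutations, gene translocations, and DNA amplification, are present in 73% of tumors. In other words, the expression category of such genes is associated with tumors to a certain extent.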
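In this embodiment, at least two of the MR sample image, the clinical sample information, and the gene sample information may form one sample group of a sample user. In one specific example, the MR sample image is mandatory in the sample group; that is, either or both of the clinical sample information and the gene sample information are combined with the MR sample image to form the sample group. Specifically, a sample group may include the MR sample image and the clinical sample information; or the MR sample image and the gene sample information; or the MR sample image, the clinical sample information, and the gene sample information.
The multiple sample groups may be used as training samples to iteratively train the preset model. Specifically, in each training pass, multiple sample groups may be input to the preset model as a batch, or one sample group may be input at a time.
In this embodiment, in each training iteration the preset model is used to perform feature extraction on the sample information of each modality in the current sample group. The current sample group is the sample group input to the preset model in that iteration; when multiple sample groups are input at a time, the current sample group is any one of the sample groups input in that iteration.
The preset model may be used to perform feature extraction on the sample information of each modality in the current sample group to obtain the sample feature corresponding to each kind of sample information. Specifically, for the MR sample image, the extracted feature is a radiomics feature vector; for the clinical sample information, the extracted feature is the feature vector obtained by converting the clinical information into a feature vector; and for the gene sample information, the extracted feature is the feature vector obtained by converting the gene information into a feature vector.
The predicted prognostic evaluation value of the sample object corresponding to the current sample group may be determined from the sample features extracted from each modality. In a specific implementation, the sample features corresponding to the multiple modalities may first be fused, and the predicted prognostic evaluation value is then determined based on the fused feature.
In practice, sample information of different modalities should all contain descriptions of the target disease, and each should contain a large amount of consistent information about it. However, each modality also contains information unrelated to the target disease. For example, an MR sample image contains not only the lesion but also other, normal tissue, and the images of normal tissue may have no counterpart in the other modalities (the clinical sample information, for instance, does not contain it). The preset model should therefore learn during training to discard this useless information.
Accordingly, in this embodiment the preset model may determine a consistency expression value based on the sample features corresponding to the multiple modalities, and the consistency expression value is included in the construction of the loss function used to update the parameters of the preset model. The preset model thus continually strengthens, based on the consistency expression value, the degree to which the extracted sample features express the target disease.
In this way, the consistency expression value characterizes the degree to which the sample features corresponding to the multiple modalities consistently reflect the target disease, that is, whether the extracted sample features all consistently express the target disease.
For example, when the consistency expression value is high, the sample features extracted from the MR sample image, the clinical sample information, and the gene sample information are all features that express the target disease. For the MR sample image, the extracted sample feature expresses the lesion site of the target disease; for the clinical sample information, the extracted sample feature expresses the diagnosis and treatment plan of the target disease and how the disease manifests in the patient's age, occupation, and so on; for the gene sample information, the extracted sample feature characterizes the expression of the target disease in the relevant genes.
The prognostic evaluation label of the present disclosure is the true prognosis of the sample object. If survival time is to be estimated, the prognostic evaluation label is the true survival time of the sample object, and the predicted prognostic evaluation value can be expressed as a number of years. If quality of life is to be estimated, the prognostic evaluation label is the true quality-of-life grade of the sample object (high, medium, or low), and the predicted prognostic evaluation value can be expressed as a quality-of-life grade.
With the technical solution of this embodiment, the consistency expression value, the predicted prognostic evaluation value, and the prognostic evaluation label are all included in the construction of the loss function when the parameters of the preset model are updated. As training proceeds, on the one hand, the consistency expression value strengthens the degree to which the extracted sample features express the target disease, so that the prognostic factors on which the prognostic evaluation value is based are strongly related to the target disease. This raises the clinical importance of the prognostic factors and gives the prognostic evaluation value greater medical reference value. On the other hand, the parameters of the preset model are updated continually based on the difference between the predicted prognostic evaluation value and the prognostic evaluation label, so that as training deepens the predicted value approaches the label ever more closely and the prognostic evaluation value comes ever closer to the true value, which further improves its clinical reference value.
In addition, the present disclosure predicts prognosis using sample information of multiple modalities, so the complementarity between modalities can be exploited to supply prognostic factors for the target disease. This enriches the data sources of the prognostic factors and improves the medical reference value of the prognostic evaluation value.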
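In an optional implementation, when the target disease is glioma, the gene sample information may include information on at least one of: isocitrate dehydrogenase (IDH), chromosome 1p/19q co-deletion status, the telomerase reverse transcriptase gene promoter, and O6-methylguanine-DNA methyltransferase promoter methylation; the clinical sample information includes at least one of: gender, age, histological diagnosis, tumor grade, medication information, and history of malignant tumor.
The information on each gene in the gene sample information may include the name of the gene and its expression category, and the expression categories differ from gene to gene. Specifically, the expression categories of the IDH gene are mutant and wild type; the expression states of 1p/19q are deleted and not deleted; the expression categories of the O6-methylguanine-DNA methyltransferase (MGMT) promoter region are methylated and unmethylated; and the expression categories of the telomerase reverse transcriptase (TERT) gene promoter are mutant and wild type. The information on each gene is shown in Table 1 below:
Table 1 - Gene sample information
The clinical sample information may include one or more of the sample object's gender, age, histological diagnosis, tumor grade, medication information, and history of malignant tumor. In practice it is not limited to the clinical information listed above and may include additional clinical information, such as the sample object's occupation or region; any information that has some association with the occurrence and prognosis of the target disease can serve as clinical information.
In one example, the clinical sample information may be as shown in Table 2 below:
Table 2 - Clinical sample information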
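In one embodiment, the sample information of multiple modalities may include the MR sample image, the clinical sample information, and the gene sample information; that is, every sample group includes all three, so that feature extraction must be performed on the sample information of all three modalities in every training iteration.
During training, every iteration includes a feature extraction stage and a stage in which a loss function is constructed from the model output and the parameters are updated.
The sample information of each modality may include multiple kinds of sub-sample information. In the feature extraction stage, feature extraction may be performed on each piece of sub-sample information in the sample information of each modality, after which the features of the multiple pieces of sub-sample information of one modality are fused to obtain the sample feature corresponding to the sample information of that modality.
The two stages are described separately below:
Stage 1: the feature extraction stage.
As described above, in one implementation the sample information of each modality includes multiple pieces of sub-sample information. For the sample information of each modality, feature extraction is performed on the multiple pieces of sub-sample information in it to obtain multiple corresponding sub-feature vectors; feature fusion is then performed on the multiple sub-feature vectors corresponding to the sample information of each modality to obtain the sample feature corresponding to the sample information of that modality.
Feature extraction for each piece of sub-sample information may be performed as described in the above embodiments. After the sub-feature vectors corresponding to the sub-sample information are obtained, feature fusion may be performed as follows to obtain the sample feature:
For the MR sample image, a piece of sub-sample information may be one slice image of the MR sample image. During feature extraction, features are extracted from each slice image to obtain the sub-feature vector corresponding to that slice image, and the sub-feature vectors of the slice images are then fused to obtain the sample feature of the MR sample image.
For the clinical sample information, a piece of sub-sample information may be one kind of clinical information; for example, age, gender, and tumor grade may each serve as a piece of sub-sample information. During feature extraction, features are extracted from each kind of information separately; specifically, each kind of information is converted into a feature vector to obtain its sub-feature vector, and the sub-feature vectors corresponding to the various kinds of clinical information are then fused to obtain the sample feature of the clinical sample information.
For the gene sample information, a piece of sub-sample information may be the information on one gene, such as the IDH information or the TERT gene promoter information. The information on each gene may be converted into a sub-feature vector, and the sub-feature vectors corresponding to the various genes are then fused to obtain the sample feature of the gene sample information.
In an optional example, when the sub-feature vectors corresponding to the sample information of each modality are fused, they may be fused directly according to preset weights corresponding to the individual sub-feature vectors. For example, a preset weight may be set manually in advance for each slice image, for each kind of clinical information, and for the information on each gene. The preset weight characterizes the importance of that kind of sub-sample information to the prognosis, so that feature fusion can be based on the importance of each sub-feature vector and the fused result emphasizes features that matter for the prognostic evaluation, improving its medical value.
In another example, an attention mechanism may be incorporated into the preset model. For the sample information of each modality, the attention mechanism determines the degree of association between the sub-feature vectors, and the sub-feature vectors are then fused based on that degree of association. Features within the modality that are highly associated with one another are thereby fused, which increases the association between prognostic factors. The prognostic factors used by the target model for prognostic evaluation are then closely connected factors, which further raises their clinical importance and makes the target model clinically interpretable.
In a specific implementation, the attention value between every two sub-feature vectors may be determined, and the multiple sub-feature vectors are fused based on the attention values to obtain the sample feature corresponding to the sample information of that modality.
As described above, the attention value characterizes how closely two sub-feature vectors are related.
When the sub-feature vectors are fused based on the attention value between every two of them, in one example every two sub-feature vectors may be fused, based on the attention value between them, into one fused vector, yielding multiple fused vectors, which are then fused to obtain the sample feature. For example, if the sub-feature vectors are vector i, vector j, and vector k, vector i and vector j may be fused to obtain fused vector ij; fused vector ik and fused vector jk are obtained in the same way; fused vectors ij, ik, and jk are then fused to obtain the sample feature.
In another example, for each sub-feature vector, all other sub-feature vectors may be fused into it based on the attention values between it and all other sub-feature vectors, to obtain the fused sub-vector of that sub-feature vector; the multiple fused sub-vectors are then fused again to obtain the sample feature corresponding to the sample information of that modality.
For example, if the sub-feature vectors are vector i, vector j, and vector k, then vector j and vector k may be fused into vector i according to the attention value between vector i and vector j and the attention value between vector i and vector k, to obtain fused sub-vector i'; fused sub-vectors j' and k' are obtained in the same way; fused sub-vectors i', j', and k' are then fused to obtain the sample feature.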
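For the clinical sample information, the attention values may be determined, and feature fusion performed based on them, as follows:
The attention value between every two sub-feature vectors may be determined based on the average attention value across the multiple kinds of clinical information and on the two sub-feature vectors themselves. The average attention value across the multiple kinds of clinical information may be obtained according to formula (1) below;
where $v_a$ denotes the sub-feature vector corresponding to age, $v_g$ the sub-feature vector corresponding to gender, $v_h$ the sub-feature vector corresponding to histological diagnosis, $v_{hom}$ the sub-feature vector corresponding to history of malignant tumor, $v_d$ the sub-feature vector corresponding to medication information, and $v_{gr}$ the sub-feature vector corresponding to tumor grade information; $S$ denotes the average attention value.
Next, the attention value between every two sub-feature vectors may be determined according to formula (2) below:
where $S_i$ denotes the attention value between sub-feature vector $v_i$ and sub-feature vector $v_j$.
Correspondingly, during feature fusion each sub-feature vector may first be fused with all the other sub-feature vectors, and the resulting fused sub-vectors are then fused. Taking the sub-feature vector corresponding to the age information as an example, the fused sub-vector corresponding to the age information may be determined according to formula (3) below:
In formula (3), $v_{a\_att}$ is the fused sub-vector corresponding to the age information.
For the gene sample information, the process of determining the attention values and performing feature fusion based on them may follow that described above for the clinical sample information and is not repeated here.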
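For the MR sample image, the attention values may be determined, and feature fusion performed based on them, as follows:
In one example, because prognostic factors of high clinical importance, that is, feature vectors of high clinical importance, need to be extracted in the feature extraction and feature fusion stages, parameter matrices may be set for both stages. The parameter matrices are updated continually as the model is trained, so that prognostic factors of high importance are extracted.
Specifically, as described above, the MR sample image includes multiple slice sample images, and each slice sample image is one piece of sub-sample information. For each slice sample image included in the MR sample image, the attention value between the sub-feature vectors corresponding to every two slice sample images may be determined based on the current value of the first parameter matrix; the sub-feature vectors may then be fused based on the attention values.
In a specific implementation, the attention value between the sub-feature vectors corresponding to every two slice sample images may be determined according to formula (4) below:
In formula (4), $Q$ and $K$ are the first parameter matrices, whose values may differ; in practice, $Q$ and $K$ may each be a 512×512 parameter matrix. $v_i$ is the sub-feature vector corresponding to the i-th slice sample image, $v_j$ is the sub-feature vector corresponding to the j-th slice sample image, and $\alpha_{i,j}$ is the attention value between the i-th and j-th slice sample images;
Next, according to formula (5) below, for each sub-feature vector, all other sub-feature vectors may be fused into it based on the attention values between it and all other sub-feature vectors, to obtain the fused sub-vector of that sub-feature vector:
In formula (5), $SV_i$ is the fused sub-vector corresponding to the i-th slice sample image, and $n$ denotes the total number of slice sample images.
The fused sub-vectors are then fused according to formula (6) below to obtain the sample feature of the MR sample image:
$SV$ denotes the sample feature of the MR sample image, and $SV_i$ denotes the fused sub-vector corresponding to the i-th slice sample image.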
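The slice-level fusion described for formulas (4) to (6) can be sketched as follows. The formulas themselves are not reproduced in the text, so every concrete choice here is an assumption made only for illustration: the attention value is taken as a softmax over the dot products of $Q v_i$ and $K v_j$, the fused sub-vector $SV_i$ as the attention-weighted sum of all sub-feature vectors (including $v_i$ itself), and $SV$ as the mean of the fused sub-vectors. The function name `fuse_slices` is hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_slices(vectors, Q, K):
    """Return (alpha, fused sub-vectors SV_i, sample feature SV)."""
    queries = [matvec(Q, v) for v in vectors]
    keys = [matvec(K, v) for v in vectors]
    # ASSUMPTION: alpha[i][j] = softmax over j of (Q v_i) . (K v_j).
    alpha = [softmax([dot(q, k) for k in keys]) for q in queries]
    n, dim = len(vectors), len(vectors[0])
    # ASSUMPTION: SV_i is the attention-weighted sum of the sub-feature vectors.
    fused = [
        [sum(alpha[i][j] * vectors[j][d] for j in range(n)) for d in range(dim)]
        for i in range(n)
    ]
    # ASSUMPTION: SV is the mean of the fused sub-vectors.
    sv = [sum(f[d] for f in fused) / n for d in range(dim)]
    return alpha, fused, sv
```

In the document's setting, each slice vector would have 512 entries and `Q` and `K` would be 512×512; because both matrices are trainable, the attention pattern changes as the loss updates them.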
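In a further embodiment, parameter matrices are set for both the feature extraction and feature fusion stages and are updated continually as the model is trained, so that prognostic factors of high importance are extracted. For the clinical sample information and the gene sample information, the fusion of the sub-feature vectors may additionally be based on the configured parameter matrices.
For each piece of clinical sub-sample information in the clinical sample information, feature fusion is performed on the sub-feature vectors corresponding to the pieces of clinical sub-sample information based on the current value of the second parameter matrix, to obtain the sample feature corresponding to the clinical sample information;
For each piece of gene sub-sample information in the gene sample information, feature fusion is performed on the sub-feature vectors corresponding to the pieces of gene sub-sample information based on the current value of the third parameter matrix, to obtain the sample feature corresponding to the gene sample information.
Specifically, after the fused sub-vector corresponding to each sub-feature vector has been obtained from formula (3) above, the fused sub-vectors corresponding to the pieces of clinical sub-sample information may be fused based on the current value of the second parameter matrix to obtain the sample feature corresponding to the clinical sample information. Specifically, the fusion may follow formula (7) and formula (8) below:

$S_1 = v_p^{T} \cdot (v_{a\_att} + v_{g\_att} + v_{h\_att} + v_{gr\_att} + v_{d\_att} + v_{hom\_att})$    formula (8);
where $CV$ denotes the sample feature corresponding to the clinical sample information, and $v_p$ is the second parameter matrix, which may be a 128×1 parameter vector.
The fused sub-vectors corresponding to the pieces of gene sub-sample information may likewise be fused according to formula (7) and formula (8) above. The second parameter matrix and the third parameter matrix may both be 128×1 parameter vectors, and their parameters are updated as the preset model is updated, that is, according to the loss value of the loss function.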
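In a still further example, the clinical sample information includes numerical sub-sample information and non-numerical sub-sample information. In the feature extraction stage, the numerical sub-sample information may be mapped into a vector space according to the parameter set configured for the preset model, and the parameters in the parameter set are updated continually during training, so that the numerical clinical information of different sample objects is mapped within one spatial range and a comprehensive prognostic factor is extracted from it. For example, consider different patients of the same age: after patient A's sample group has been input to the preset model for training, the parameters in the parameter set are updated; when patient B's sample group is then input for training, the age is mapped to a vector using the updated parameters. Patients of the same age can thus have different feature vectors during training, varying within a certain spatial range, which allows the influence of age range on prognosis to be captured in the prognostic evaluation.
In a specific implementation, for the clinical sample information in the current sample group in each training iteration, the non-numerical sub-sample information may be converted into a first sub-feature vector; the numerical sub-sample information is mapped to the target space based on the current value of each parameter in the parameter set to obtain the corresponding second sub-feature vector; and the first sub-feature vector and the second sub-feature vector are fused to obtain the sample feature corresponding to the clinical sample information.
In this embodiment, the parameters in the parameter set are used to determine the dimensions of the target space and the value at each spatial point.
Non-numerical sub-sample information may be sub-sample information in string format or of text type. For example, the gender "male" is text-type sub-sample information, and the tumor grade is string-type sub-sample information. For non-numerical sub-sample information, the corresponding feature vectors may be preset in advance; then, for the clinical sample information in the current sample group, the first sub-feature vector corresponding to the non-numerical sub-sample information is obtained by table lookup. The table contains the feature vector corresponding to each kind of non-numerical sub-sample information, which can be understood as a fixed feature vector.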
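The table lookup for non-numerical sub-sample information can be illustrated with a plain dictionary. This is a minimal sketch: the helper names, the random initialization of the preset vectors, and the 4-dimensional vectors are assumptions for illustration (the 128×1 parameter vectors used elsewhere in the text suggest 128-dimensional sub-feature vectors in practice).

```python
import random


def build_lookup_table(categories, dim, seed=0):
    """Preset one fixed feature vector per non-numerical category value.

    ASSUMPTION: vectors are initialized randomly; the text only says
    they are preset in advance.
    """
    rng = random.Random(seed)
    return {c: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for c in categories}


def first_sub_feature_vector(table, value):
    """Look up the preset vector for a category value such as "male"."""
    try:
        return table[value]
    except KeyError:
        raise ValueError(f"no preset vector for {value!r}") from None


# Hypothetical table for the gender field.
gender_table = build_lookup_table(["male", "female"], dim=4)
v_g = first_sub_feature_vector(gender_table, "male")
```

Every sample with the same category value receives the same vector, which is what distinguishes this path from the trainable mapping used for numerical values such as age.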
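Numerical sub-sample information is sub-sample information of numeric type; for example, the age "62" is numerical sub-sample information. For this type of sub-sample information, the value may be mapped to the target space based on the current value of each parameter in the parameter set. The target space may be a vector space, which may be two-dimensional, comprising multiple values in a first dimension and multiple values in a second dimension. In other words, a single value can be dispersed over the positions of the target space to obtain the second sub-feature vector corresponding to that value.
The parameters in the parameter set can thus be understood as the weights with which the value is dispersed over the positions of the target space.
The process of fusing the first sub-feature vector and the second sub-feature vector to obtain the sample feature corresponding to the clinical sample information may follow the examples above; for instance, the first sub-feature vector and the second sub-feature vector together form the multiple sub-feature vectors, which are fused in the manner described above for fusing multiple sub-feature vectors.
Specifically, the parameter set includes a first parameter vector, a second parameter vector, and a fourth parameter matrix. When the numerical sub-sample information is mapped to the target space based on the current value of each parameter in the parameter set to obtain the corresponding second sub-feature vector, the numerical sub-sample information may be mapped to the first dimension of the target space based on the current value of the first parameter vector to obtain the mapping value of the first dimension; the second sub-feature vector is then determined based on the first-dimension mapping value, the current value of the second parameter vector, and the current value of the fourth parameter matrix;
The first parameter vector is used to determine the values of the spatial points of the target space in the first dimension; the second parameter vector is used to determine the values of the spatial points of the target space in the second dimension; and the fourth parameter matrix is used to assign a parameter to each spatial position in the first dimension and the second dimension.
The first parameter vector can be understood as the weights with which the value is dispersed over the positions in the first dimension of the target space; the second parameter vector can be understood as the weights with which the value is dispersed over the positions in the second dimension of the target space; and the fourth parameter matrix can be understood as the weights with which the value is dispersed over every position of the target space.
Specifically, the numerical sub-sample information may be mapped to the first dimension of the target space, based on the current value of the first parameter vector, according to formula (9) below, to obtain the mapping value of the first dimension:
$temp\_a = \mathrm{sigmoid}(w \times a)$    formula (9)
In formula (9), $w$ is the first parameter vector, which in practice may be a 128×1 parameter vector, and $a$ is the numerical sub-sample information, for example the age value "62".
Next, the second sub-feature vector may be determined, based on the first-dimension mapping value, the current value of the second parameter vector, and the current value of the fourth parameter matrix, according to formula (10) below:
$s_a = W \times temp\_a + b$    formula (10)
In formula (10), $b$ is the second parameter vector, which in practice may be a 128×1 parameter vector; $W$ is the fourth parameter matrix, which in practice may be a parameter matrix of size 128; and $s_a$ is the second sub-feature vector.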
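Formulas (9) and (10) can be traced with a short pure-Python sketch. The function names are hypothetical, and the sketch assumes that $W$ is square with as many columns as $w$ has entries (with 128-entry $w$ and $b$ this would make $W$ 128×128, which the text does not state explicitly). Small dimensions are used so the arithmetic is easy to follow.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def map_numeric(a, w, W, b):
    """Map a scalar such as age to the second sub-feature vector s_a.

    a: numerical sub-sample value
    w: first parameter vector
    W: fourth parameter matrix (ASSUMED len(b) rows by len(w) columns)
    b: second parameter vector
    """
    # Formula (9): temp_a = sigmoid(w * a), applied element-wise.
    temp_a = [sigmoid(w_k * a) for w_k in w]
    # Formula (10): s_a = W * temp_a + b.
    return [
        sum(W_rk * t_k for W_rk, t_k in zip(row, temp_a)) + b_r
        for row, b_r in zip(W, b)
    ]


# With w = 0 every sigmoid is 0.5, so s_a = 0.5 * row sums of W + b.
s_a = map_numeric(62, w=[0.0, 0.0], W=[[1.0, 0.0], [0.0, 1.0]], b=[1.0, 2.0])
```

Because `w`, `W`, and `b` are trainable, two patients with the same age can be mapped to different vectors at different points in training, as the surrounding text describes.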
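Of course, in yet another embodiment, the parameter set further includes multiple third parameter vectors, and after the second sub-feature vector has been determined it may also be corrected, specifically according to formula (11) below:
where $v_a$ is the corrected second sub-feature vector, $s_a$ is the second sub-feature vector, and $a_1$, $a_2$, and $a_3$ are the third parameter vectors. Specifically, the three third parameter vectors may each be a 128×1 parameter vector, and they are allowed to differ from one another.
Correspondingly, the first sub-feature vector and the corrected second sub-feature vector may be fused to obtain the sample feature corresponding to the clinical sample information.
In an optional example, when the sample features corresponding to the sample information of the multiple modalities are fused, the sample features may first be mapped into the same space. Specifically, the parameters of the preset model include a dimension parameter matrix corresponding to the sample information of each modality. A dimension transformation can then be performed on the sample feature corresponding to the sample information of each modality, based on that sample feature and the corresponding dimension parameter matrix, to obtain a converted sample feature; the predicted prognostic evaluation value and the consistency expression value are determined based on the converted sample features corresponding to the sample information of the multiple modalities.
Each dimension parameter matrix is used to adjust the dimension of the sample feature corresponding to the sample information of its modality. Specifically, the dimensions of the sample features corresponding to the sample information of the respective modalities may be adjusted according to formulas (12) to (14) below:
$PV_1 = M_1 \cdot SV$    formula (12)
$PV_2 = M_2 \cdot CV$    formula (13)
$PV_3 = M_3 \cdot GV$    formula (14)
where $PV_1$ is the converted sample feature corresponding to the MR sample image, $PV_2$ is the converted sample feature corresponding to the clinical sample information, and $PV_3$ is the converted sample feature corresponding to the gene sample information;
$SV$ is the sample feature corresponding to the MR sample image, $CV$ is the sample feature corresponding to the clinical sample information, and $GV$ is the sample feature corresponding to the gene sample information.
$M_1$ is the dimension parameter matrix corresponding to the MR sample image, which may be a 64×512 parameter matrix; $M_2$ is the dimension parameter matrix corresponding to the clinical sample information, which may be a 64×128 parameter matrix; and $M_3$ is the dimension parameter matrix corresponding to the gene sample information, which may be a 64×128 parameter matrix.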
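The dimension transformation of formulas (12) to (14) is a matrix-vector product per modality. The sketch below uses tiny hypothetical matrices in place of the 64×512 and 64×128 matrices mentioned above; the function name `project` is an assumption for illustration.

```python
def project(M, feature):
    """Compute PV = M . feature, mapping a modality feature to the shared space."""
    if any(len(row) != len(feature) for row in M):
        raise ValueError("each row of M must match the feature length")
    return [sum(m * x for m, x in zip(row, feature)) for row in M]


# Toy shapes: a 3-dimensional "image" feature and a 2-dimensional
# "clinical" feature are both projected to a shared 2-dimensional space.
M1 = [[1.0, 0.0, 1.0], [0.0, 2.0, 0.0]]
M2 = [[0.5, 0.5], [1.0, -1.0]]
PV1 = project(M1, [1.0, 2.0, 3.0])  # formula (12) with SV = [1, 2, 3]
PV2 = project(M2, [2.0, 4.0])       # formula (13) with CV = [2, 4]
```

After projection the modality features have the same length, which is what makes the pairwise comparisons behind the consistency expression value possible.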
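Stage 2: the parameter update stage.
In the present disclosure, the parameters of the preset model are updated based on the predicted prognostic evaluation value, the prognostic evaluation label corresponding to the current sample group, and the consistency expression value. In a specific implementation, the difference between the predicted prognostic evaluation value and the prognostic evaluation label may be obtained, and the parameters of the preset model are updated with the goal of minimizing the difference and maximizing the consistency expression value.
The difference between the predicted prognostic evaluation value and the prognostic evaluation label reflects the distance between the value predicted by the preset model and the true prognostic evaluation value, and the consistency expression value characterizes the degree to which the sample features consistently correspond to the same target disease. During training, the objective may be to minimize the difference between the predicted prognostic evaluation value and the prognostic evaluation label and to maximize the degree of consistency.
The parameters of the preset model can thus be updated with the goal of minimizing the difference and maximizing the consistency expression value.
In a specific implementation, when the parameters of the preset model are updated with the goal of minimizing the difference and maximizing the consistency expression value, the loss function shown in formula (15) below may be constructed based on the difference and the consistency expression value:
$\mathrm{loss} = \sum_i (y'_i - y_i)^2 - \mathrm{consistency}$    formula (15)
The parameters of the preset model are then updated based on the loss value of the loss function, with the goal of minimizing the difference and maximizing the consistency expression value; where loss denotes the loss value, $y'_i$ denotes the predicted prognostic evaluation value, $y_i$ denotes the prognostic evaluation label, and consistency denotes the consistency expression value.
As the loss function shows, the training objective is to minimize the loss value, which requires minimizing the difference and maximizing the consistency expression value. It should be noted that the consistency expression value of the present disclosure may be a value between 0 and 1.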
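Formula (15) translates directly into code. The sketch below is a plain-Python illustration with a hypothetical function name; in an actual training setup the same expression would be written with a differentiable tensor library so that gradients can flow back to the model parameters.

```python
def training_loss(predictions, labels, consistency):
    """Formula (15): sum of squared errors minus the consistency expression value."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    squared_error = sum((p - y) ** 2 for p, y in zip(predictions, labels))
    return squared_error - consistency


# Two samples: the squared error is (2 - 1)^2 + (3 - 3)^2 = 1,
# and a consistency expression value of 0.5 lowers the loss to 0.5.
loss = training_loss([2.0, 3.0], [1.0, 3.0], consistency=0.5)
```

Because the consistency term enters with a negative sign, a higher consistency expression value lowers the loss, so a single minimization pursues both training goals at once.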
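In one example, the consistency expression value may be determined as follows:
First, each sample feature is transposed to obtain the transposed feature corresponding to that sample feature; next, for two different sample features, one of the sample features is fused with the transposed feature corresponding to the other to obtain the corresponding fused feature value; the consistency expression value is then determined based on the fused feature values.
In one embodiment, the transposed feature corresponding to each sample feature may be determined based on formula (16) below:
$PV^{T} \cdot PV = 1$    formula (16);
In formula (16), $PV$ is the sample feature and $PV^{T}$ is the transposed feature, and their dot product is 1; that is, the transposition normalizes the sample feature.
Next, one sample feature may be fused with the transposed feature corresponding to another sample feature according to formula (17) below to obtain the corresponding fused feature value:
In formula (17), $PV_1^{T}$ is the transposed feature of sample feature 1 and $PV_2$ is sample feature 2; their dot product is a value between 0 and 1.
Taking sample information that includes the MR sample image, the clinical sample information, and the gene sample information as an example, the consistency expression value may be determined according to formula (18):
where $PV_1$ is the sample feature of the MR sample image, $PV_2$ is the sample feature of the clinical sample information, $PV_3$ is the sample feature of the gene sample information, $PV_1^{T}$ is the transposed feature corresponding to the sample feature of the MR sample image, $PV_2^{T}$ is the transposed feature corresponding to the sample feature of the clinical sample information, and $PV_3^{T}$ is the transposed feature corresponding to the sample feature of the gene sample information.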
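One way to read formulas (16) to (18) is sketched below. Formula (16) is interpreted as scaling each feature to unit length, and the fused feature value as the dot product of two normalized features. Since formulas (17) and (18) are not reproduced in the text, taking the mean of the pairwise dot products as the consistency expression value is an assumption made only for illustration, as are the function names.

```python
import math
from itertools import combinations


def normalize(feature):
    """Scale a feature so that its dot product with itself is 1 (formula (16))."""
    norm = math.sqrt(sum(x * x for x in feature))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in feature]


def consistency_value(features):
    """ASSUMPTION: mean pairwise dot product of the normalized features."""
    unit = [normalize(f) for f in features]
    pairs = list(combinations(unit, 2))
    if not pairs:
        raise ValueError("at least two sample features are required")
    total = sum(sum(a * b for a, b in zip(u, v)) for u, v in pairs)
    return total / len(pairs)
```

For unit vectors each dot product lies in [-1, 1]; it stays within the 0 to 1 range mentioned in the text only when the features have non-negative alignment, for example after a non-negative activation.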
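FIG. 2 shows a schematic diagram of the structure of a preset model of the present disclosure. As shown in FIG. 2, the preset model may include a fusion module, a prediction branch, a consistency expression branch, and a data processing module corresponding to the sample information of each modality.
When the preset model is used to perform feature extraction on the sample information of each modality in the current sample group and to determine the predicted prognostic evaluation value and the consistency expression value based on the extracted sample features, the sample information of each modality in the current sample group may be processed by the data processing module corresponding to that modality to extract its features; the fusion module then fuses the sample features output by the data processing modules; the prediction branch determines the predicted prognostic evaluation value corresponding to the feature output by the fusion module; and the consistency expression branch determines the consistency expression value corresponding to the sample features output by the data processing modules.
In other words, the sample information of each modality is input to its corresponding data processing module, which performs feature extraction on it; the extracted sample features are then input to the fusion module, which fuses them; and the fused sample feature is input to the prediction module, which determines the predicted prognostic evaluation value based on it.
The method for acquiring a target model of the present disclosure is illustrated below with reference to the preset model shown in FIG. 2:
S1: prepare training samples. The training samples include a sample group corresponding to each of multiple sample users, and each sample group includes the sample user's MR sample image, clinical sample information, and gene sample information.
S2: input the multiple sample groups to the preset model. The MR sample image in a sample group is input to the image data processing module, the clinical sample information to the clinical data processing module, and the gene sample information to the gene data processing module, so that feature extraction is performed on the sample information of the multiple modalities and the sample feature corresponding to the sample information of each modality is obtained.
Specifically, the image data processing module performs feature extraction on the multiple slice sub-sample images in the MR sample image and then fuses the sub-feature vectors corresponding to the slice sub-sample images to obtain the sample feature corresponding to the MR sample image.
Referring to FIG. 3 and FIG. 4, FIG. 3 shows a schematic diagram of the structure of the image data processing module, and FIG. 4 shows a schematic diagram of the structure of the ResNet network in FIG. 3.
As shown in FIG. 3, there are multiple slice sample images, for example Slice 1 to Slice n. Each slice sample image is input to a corresponding ResNet network, which performs feature extraction on it.
As shown in FIG. 4, each input slice sample image in this example is 256×256. It first passes through a 7×7 convolution with stride 2 and a 3×3 max pooling step with stride 2, so that the 256×256 input slice image becomes a 64×64 feature map, which effectively reduces the storage required. It then passes through multiple ResNet_Block and downsampling modules in sequence: network layer 1, consisting of 3 ResNet_Blocks and a downsampling module; network layer 2, consisting of 3 ResNet_Blocks and 1 downsampling module; network layer 3, consisting of 5 ResNet_Blocks and 1 downsampling module; and network layer 4, consisting of 2 ResNet_Blocks. An average pooling layer follows, and the final output is the sub-feature vector of each slice sample image, which may be a 512×1 vector.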
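The spatial sizes quoted above can be checked with the standard output-size formula for convolution and pooling layers. The padding values used here (3 for the 7×7 convolution and 1 for the 3×3 max pooling) are assumptions borrowed from the usual ResNet stem; the text does not state them.

```python
def conv_output_size(size, kernel, stride, padding):
    """Output side length of a convolution or pooling layer on a square input."""
    return (size + 2 * padding - kernel) // stride + 1


# 256x256 slice -> 7x7 convolution, stride 2 (ASSUMED padding 3)
after_conv = conv_output_size(256, kernel=7, stride=2, padding=3)
# -> 3x3 max pooling, stride 2 (ASSUMED padding 1)
after_pool = conv_output_size(after_conv, kernel=3, stride=2, padding=1)
print(after_conv, after_pool)  # 128 64
```

Each stride-2 step halves the side length, so the two steps together reduce the number of spatial positions by a factor of 16 before the residual blocks run.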
接着,各个子特征向量进入自注意力层,通过自注意力层确定每两个子特征向量之间的注意力值,接着将每两个子特征向量之间的注意力值输入至特征表示层,特征表示层基于注意力值按照上述公式公式(四)和公式(五)得到MR样本图像对应的样本特征。
其中,如图4所示,每个ResNet_Block如右侧所示,依次包括步长为1的3×3的卷积、Batch Norm正则化、ReLU激活函数、步长为1的3×3的卷积以及Batch Norm正则化。下采样模块结构与ResNet_Block类似,但使用步长为2的3×3的卷积,Batch Norm正则化,ReLU激活函数,步长为1的3×3的卷积,Batch Norm正则化,下采样(步长为2的1×1的卷积和Batch Norm正则化)。
具体地,临床数据处理模块用于对临床样本信息中的多中临床信息进行特征转换,得到临床样本信息对应的样本特征。
其中,本示例使用性别、年龄、组织学诊断、肿瘤分级、用药信息、恶性肿瘤病史作为临床样本信息,然后将信息映射成子特征向量。对于性别、组织学诊断、肿瘤分级、用药信息、恶性肿瘤病史等字符型的子样本信息可以用vg,vh,vgr,vd,vhom来表示子特向量;具体地,可以采用lookup查表方法来完成字符型的子样本信息的向量映射。
最终利用公式(七)对各个子特征向量进行融合,得到临床样本信息对应的样本特征。
而对于年龄这种数值型的子样本信息,可以按照公式(九)和公式(十)以及公式(十一)得到数值型的子样本信息对应的子特征向量。
具体地,基因数据处理模块用于对基因样本信息中的多种基因的信息进行特征转换,得到基因样本信息对应的样本特征。
与临床样本信息中字符型的子样本信息类似,可以采用lookup查表方法来完成每种基因的信息的向量映射,得到每种基因的信息对应的子特征向量,之后,利用公式(七)对各个子特征向量进行融合,得到基因样本信息对应的样本特征。
S3,将多种模态的样本信息各自对应的样本特征输入到融合模块,该融合模块可以利用公式(十二)至公式(十四)对各个样本特征进行融合,以及基于公式(十六)至公式(十八)得到本次输入到预设模型的样本组对应的一致性表达值。
S4,将融合模块输出的融合后的样本特征输入到预测模块,由预测模块基于融合后的样本特征确定预测预后评估值。
S5,基于融合模块输出的一致性表达值、预测模块输出的预测预后评估值,以及本次输入到预设模型的样本组对应的预后评估标签,构建公式(十五)所示的损失函数,以最小化差异、最大化一致性表达值为目标,对预设模型的参数进行更新。
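When updating the parameters of the preset model, the first parameter matrix, second parameter matrix, third parameter matrix, and fourth parameter matrix mentioned above, together with the first parameter vector, the second parameter vector, and the three third parameter vectors in the parameter set, may be updated synchronously. In this way, based on the two optimization objectives, a single training pass simultaneously affects the accuracy of the data processing modules of the three modalities, the fusion module, and the prediction module.
S6: Stop training after the preset model has been updated multiple times, or when the difference between the predicted prognosis evaluation value and the prognosis evaluation label is less than a preset difference and the consistency expression value is greater than or equal to a preset expression value. The preset model at the time training stops is taken as the target model, which can then be used to predict a patient's prognosis evaluation value.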
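The stopping rule in S6 can be written as a small predicate. This is a sketch only: the parameter names and the idea of expressing "updated multiple times" as a fixed step budget are assumptions.

```python
def should_stop(step, max_steps, difference, consistency,
                max_difference, min_consistency):
    """Return True when either stopping condition from S6 holds."""
    reached_budget = step >= max_steps  # assumed reading of "updated multiple times"
    converged = difference < max_difference and consistency >= min_consistency
    return reached_budget or converged


stop_now = should_stop(step=10, max_steps=100, difference=0.01,
                       consistency=0.9, max_difference=0.05, min_consistency=0.8)
```

Note that convergence requires both thresholds to be met at once; a small prediction error alone does not end training.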
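Correspondingly, FIG. 5 shows a flowchart of the steps of a method for determining a prognosis evaluation value according to the present disclosure. As shown in FIG. 5, the method may specifically include the following steps:
Step S501: Acquire information of multiple modalities of an object to be tested, the information of multiple modalities including a magnetic resonance (MR) image, clinical information, and gene information;
Step S502: Input the information of multiple modalities into a target model to obtain the prognosis evaluation value of the object to be tested, wherein the target model is obtained by the method for acquiring a target model described in the above embodiments.
In this embodiment, once the target model has been obtained, it can be used to predict a patient's prognosis evaluation value. In practice, information of multiple modalities of the object to be tested is acquired, and the modalities may be the same as those used to train the preset model. For example, if the sample groups used to train the preset model include MR sample images, clinical sample information, and gene sample information, the information of multiple modalities of the object to be tested may likewise include the MR image, clinical information, and gene information of the object to be tested.
During training, the consistency expression value serves as a basis for updating the parameters of the preset model, and the consistency expression value characterizes the degree to which the sample features consistently correspond to the same target disease. As training proceeds, the preset model therefore learns to extract prognostic factors (sample features) related to the target disease and gradually discards prognostic factors in the information of each modality that are unrelated to the target disease. The prognostic factors selected by the model are thus clinically important, which helps improve the interpretability of the model, so that the results output by the target model have high prognostic reference value.
Of course, in some optional examples, the target model includes a data processing module corresponding to the sample information of each modality and a fusion module connected to the plurality of data processing modules, and the data processing modules and the fusion module extract, from the sample information of multiple modalities, sample features strongly correlated with the expression of the target disease. Therefore, in one application, after the target model is obtained, the data processing modules and the fusion module may be extracted from the target model and used as a separate feature extraction model. This feature extraction model can extract features strongly correlated with the target disease from information of multiple modalities, and can thus be applied independently to screening prognostic factors in the prognosis process.
The technical solutions of the embodiments of the present disclosure have the following advantages:
First, by means of the consistency expression value, information of different modalities can be mapped into a space that reflects the important expression information of the target disease, so that the features extracted from the information of different modalities lie closer together in that space. This improves the complementarity of the information of different modalities, reduces the influence of noise (unimportant information), and increases the clinical importance of the extracted features to the target disease, so that the target model is medically interpretable and the prognosis evaluation values it predicts have high medical reference value.
Second, information of different modalities can complement one another, which enriches the data sources of prognostic factors and interprets the expression of the target disease from multiple dimensions, further improving the medical reference value of the prognosis evaluation value.
Third, a self-attention mechanism combines the information within the same modality, which enhances the nonlinear representation capability of the target model.
Fourth, users do not need to screen prognostic factors manually over multiple rounds, which improves the efficiency of determining prognostic factors and reduces labor costs.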
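Based on the same inventive concept, the present disclosure further provides an apparatus for acquiring a target model. FIG. 6 shows a schematic structural diagram of the apparatus, which, as shown in FIG. 6, may specifically include the following modules:
a sample acquisition module 601, configured to acquire a sample group corresponding to each of a plurality of sample users, the sample group including sample information of multiple modalities, the sample information of multiple modalities including at least two of a magnetic resonance (MR) sample image, clinical sample information, and gene sample information;
a training module 602, configured to iteratively train a preset model based on the plurality of sample groups to obtain the target model, the target model being used to predict a prognosis evaluation value of a target object, wherein the following steps are performed in each training iteration:
using the preset model, performing feature extraction separately on the sample information of the multiple modalities in the current sample group, and determining a predicted prognosis evaluation value and a consistency expression value based on the extracted sample features, wherein the consistency expression value characterizes the degree to which the sample features consistently correspond to the same target disease;
updating the parameters of the preset model based on the predicted prognosis evaluation value, the prognosis evaluation label corresponding to the current sample group, and the consistency expression value.
Optionally, the training module 602 includes a parameter update unit, and the parameter update unit includes:
a difference determination subunit, configured to obtain the difference between the predicted prognosis evaluation value and the prognosis evaluation label;
a parameter update subunit, configured to update the parameters of the preset model with the goals of minimizing the difference and maximizing the consistency expression value.
Optionally, the parameter update subunit is specifically configured to:
construct the following loss function based on the difference and the consistency expression value:
$$\mathrm{loss}=\sum_i (y'_i-y_i)^2-\mathrm{consistency}$$
update the parameters of the preset model based on the loss value of the loss function, with the goals of minimizing the difference and maximizing the consistency expression value;
where loss denotes the loss value, $y'_i$ denotes the predicted prognosis evaluation value, $y_i$ denotes the prognosis evaluation label, and consistency denotes the consistency expression value.
Optionally, the step of determining the consistency expression value based on the extracted sample features includes:
transposing each sample feature to obtain a transposed feature corresponding to each sample feature;
for two different sample features, fusing one of the sample features with the transposed feature corresponding to the other sample feature to obtain a corresponding fused feature value;
determining the consistency expression value based on the fused feature values.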
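One way to read the three steps above is that each pair of modality features is combined through a transpose-and-multiply, that is, an inner product, and the pairwise results are then aggregated. The sketch below follows that reading. It is an assumption rather than a reproduction of formulas (16) to (18), which are not shown here: summation as the aggregation, equal feature dimensions, and the name `consistency_value` are all illustrative choices.

```python
from itertools import combinations


def consistency_value(features):
    """Sum of inner products over all pairs of distinct modality features."""
    pair_values = [
        sum(a * b for a, b in zip(f_a, f_b))  # f_b^T f_a for column vectors
        for f_a, f_b in combinations(features, 2)
    ]
    return sum(pair_values)


# Toy features for three modalities, already in a common 2-dimensional space.
image_f, clinical_f, gene_f = [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]
value = consistency_value([image_f, clinical_f, gene_f])
```

With this reading, features that point in similar directions across modalities contribute larger pairwise values, which is what a maximized consistency term would reward.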
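Optionally, the sample information of each modality includes a plurality of pieces of sub-sample information, and the step of using the preset model to perform feature extraction separately on the sample information of the multiple modalities in the current sample group includes the following steps:
for the sample information of each modality, performing feature extraction separately on the plurality of pieces of sub-sample information in the sample information of that modality to obtain a corresponding plurality of sub-feature vectors;
performing feature fusion on the plurality of sub-feature vectors corresponding to the sample information of each modality to obtain the sample feature corresponding to the sample information of that modality.
Optionally, the step of performing feature fusion on the plurality of sub-feature vectors corresponding to the sample information of each modality to obtain the sample feature corresponding to the sample information of that modality includes:
determining an attention value between every two sub-feature vectors, the attention value characterizing how closely the two sub-feature vectors are related;
fusing the plurality of sub-feature vectors based on the attention values to obtain the sample feature corresponding to the sample information of that modality.
Optionally, the step of fusing the plurality of sub-feature vectors based on the attention values to obtain the sample feature corresponding to the sample information of that modality includes:
for each sub-feature vector, fusing all the other sub-feature vectors into that sub-feature vector based on the attention values between it and all the other sub-feature vectors, to obtain a fused sub-vector of that sub-feature vector;
re-fusing the plurality of fused sub-vectors to obtain the sample feature corresponding to the sample information of that modality.
Optionally, the parameters of the preset model include a first parameter matrix, the MR sample image includes a plurality of slice sample images, and the step of determining an attention value between every two sub-feature vectors includes:
for each slice sample image included in the MR sample image, determining, based on the current value of the first parameter matrix, the attention value between the sub-feature vectors corresponding to every two slice sample images.
Optionally, the parameters of the preset model include a second parameter matrix and a third parameter matrix, and the step of performing feature fusion on the plurality of sub-feature vectors corresponding to the sample information of each modality to obtain the sample feature corresponding to the sample information of that modality includes:
for each piece of clinical sub-sample information in the clinical sample information, performing feature fusion on the sub-feature vectors corresponding to the respective pieces of clinical sub-sample information based on the current value of the second parameter matrix, to obtain the sample feature corresponding to the clinical sample information;
for each piece of gene sub-sample information in the gene sample information, performing feature fusion on the sub-feature vectors corresponding to the respective pieces of gene sub-sample information based on the current value of the third parameter matrix, to obtain the sample feature corresponding to the gene sample information.
Optionally, the parameters of the preset model include a parameter set corresponding to the clinical sample information, the clinical sample information includes numerical sub-sample information and non-numerical sub-sample information, and the step of using the preset model to perform feature extraction separately on the sample information of the multiple modalities in the current sample group includes:
for the clinical sample information in the current sample group, converting the non-numerical sub-sample information into a first sub-feature vector;
mapping the numerical sub-sample information to a target space based on the current values of the parameters in the parameter set, to obtain a corresponding second sub-feature vector, wherein the parameters in the parameter set are used to determine the dimensions of the target space and the value at each spatial point;
fusing the first sub-feature vector and the second sub-feature vector to obtain the sample feature corresponding to the clinical sample information.
Optionally, the parameter set includes a first parameter vector, a second parameter vector, and a fourth parameter matrix, and the step of mapping the numerical sub-sample information to the target space based on the current values of the parameters in the parameter set to obtain the corresponding second sub-feature vector includes:
mapping the numerical sub-sample information to a first dimension of the target space based on the current value of the first parameter vector, to obtain a mapped value in the first dimension, wherein the first parameter vector is used to determine the values of the spatial points of the target space in the first dimension;
determining the second sub-feature vector based on the mapped value in the first dimension, the current value of the second parameter vector, and the current value of the fourth parameter matrix, wherein the second parameter vector is used to determine the values of the spatial points of the target space in a second dimension, and the fourth parameter matrix is used to assign a parameter to each spatial position in the first dimension and the second dimension.
Optionally, the parameter set further includes a plurality of third parameter vectors, and the apparatus further includes:
a correction module, configured to correct the second sub-feature vector based on the second sub-feature vector and the plurality of third parameter vectors according to the following formula:
where v_a is the corrected second sub-feature vector, s_a is the second sub-feature vector, and a_1, a_2, and a_3 are the third parameter vectors;
the step of fusing the first sub-feature vector and the second sub-feature vector to obtain the sample feature corresponding to the clinical sample information includes:
fusing the first sub-feature vector and the corrected second sub-feature vector to obtain the sample feature corresponding to the clinical sample information.
Optionally, the parameters of the preset model include a dimension parameter matrix corresponding to the sample information of each modality, and the step of determining the predicted prognosis evaluation value and the consistency expression value based on the extracted sample features includes:
performing a dimension transformation on the sample feature corresponding to the sample information of each modality based on that sample feature and the dimension parameter matrix, to obtain a transformed sample feature;
determining the predicted prognosis evaluation value and the consistency expression value based on the transformed sample features corresponding to the sample information of the respective modalities.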
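The dimension transformation can be pictured as multiplying each modality's sample feature by its own matrix so that all modalities end up with the same length. The sketch below assumes a plain linear map; the matrices, dimensions, and the name `project` are made up for illustration.

```python
def project(feature, matrix):
    """Map a feature vector to a new dimension with a matrix (list of rows)."""
    return [sum(w * x for w, x in zip(row, feature)) for row in matrix]


# Hypothetical dimension parameter matrices mapping both modalities to 2 dimensions.
image_matrix = [[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0]]      # 3-dimensional image feature -> 2 dimensions
clinical_matrix = [[0.0, 1.0],
                   [1.0, 0.0]]        # 2-dimensional clinical feature -> 2 dimensions

image_t = project([2.0, 3.0, 4.0], image_matrix)
clinical_t = project([5.0, 6.0], clinical_matrix)
```

Once every modality shares one dimension, pairwise operations such as the inner products used for a consistency value become well defined.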
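Optionally, the preset model includes a fusion module, a prediction branch, a consistency expression branch, and a data processing module corresponding to the sample information of each modality; and the step of using the preset model to perform feature extraction separately on the sample information of the multiple modalities in the current sample group and determining the predicted prognosis evaluation value and the consistency expression value based on the extracted sample features includes:
for the sample information of each modality in the current sample group, performing feature extraction on that sample information using the data processing module corresponding to the sample information of that modality;
fusing, using the fusion module, the sample features output by the respective data processing modules;
determining, using the prediction branch, the predicted prognosis evaluation value corresponding to the feature output by the fusion module;
determining, using the consistency expression branch, the consistency expression value corresponding to the sample features output by the respective data processing modules.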
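The data flow through the four kinds of components can be sketched as a single forward function. Everything concrete here is a stand-in chosen for illustration: the identity extractors, concatenation as fusion, a sum as the prediction branch, and an inner product as the consistency branch are assumptions, and the name `forward` is hypothetical.

```python
def forward(sample_group, extractors, fuse, predict_branch, consistency_branch):
    """Run one sample group through per-modality extraction and both branches."""
    features = {name: extractors[name](info) for name, info in sample_group.items()}
    ordered = [features[name] for name in sorted(features)]  # fixed modality order
    fused = fuse(ordered)
    return predict_branch(fused), consistency_branch(ordered)


# Toy stand-ins for the trained modules.
extractors = {"clinical": lambda info: info, "gene": lambda info: info}
concatenate = lambda feats: [x for f in feats for x in f]
inner_product = lambda feats: sum(a * b for a, b in zip(feats[0], feats[1]))

prediction, consistency = forward(
    {"clinical": [1.0, 2.0], "gene": [3.0, 4.0]},
    extractors, concatenate, sum, inner_product,
)
```

The point of the sketch is the wiring: the prediction branch sees only the fused feature, while the consistency branch sees the per-modality features before fusion.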
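Optionally, the gene sample information includes information of at least one of the following genes: isocitrate dehydrogenase (IDH), chromosome 1p/19q co-deletion status, telomerase reverse transcriptase (TERT) gene promoter, and O6-methylguanine-DNA methyltransferase (MGMT) promoter region methylation;
the clinical sample information includes at least one of the following kinds of clinical information: gender, age, histological diagnosis, tumor grade, medication information, and history of malignant tumors.
FIG. 7 shows a schematic structural diagram of an apparatus for determining a prognosis evaluation value. As shown in FIG. 7, the apparatus includes:
an information acquisition module 701, configured to acquire information of multiple modalities of an object to be tested, the information of multiple modalities including a magnetic resonance (MR) image, clinical information, and gene information;
an input module 702, configured to input the information of multiple modalities into a target model to obtain the prognosis evaluation value of the object to be tested, wherein the target model is obtained by the method for acquiring a target model described above.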
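Based on the same inventive concept, the present disclosure further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method for acquiring a target model or the method for determining a prognosis evaluation value described above.
FIG. 8 shows a structural block diagram of an electronic device 800 according to an embodiment of the present disclosure. As shown in FIG. 8, the electronic device 800 provided by the embodiment of the present disclosure may be used to perform the method for acquiring a target model or the method for determining a prognosis evaluation value.
The electronic device 800 may include a memory 801, a processor 802, and a computer program stored in the memory and executable on the processor, the processor 802 being configured to perform the method for acquiring a target model or the method for determining a prognosis evaluation value.
As shown in FIG. 8, in an embodiment, the complete electronic device 800 may include an input device 803, an output device 804, and a data acquisition device 805. When the methods of the embodiments of the present disclosure are performed, the data acquisition device 805 may acquire information of multiple modalities, and the input device 803 may then obtain that information from the data acquisition device 805. The information of multiple modalities may be processed by the processor 802, and the processing may specifically be the method for acquiring a target model or the method for determining a prognosis evaluation value described above. The output device 804 may output the target model, or may output the prognosis evaluation value output by the target model.
Of course, in an embodiment, the memory 801 may include a volatile memory and a non-volatile memory. The volatile memory may be understood as random access memory, used to store and hold data. The non-volatile memory is computer memory in which the stored data does not disappear when the power is turned off. The computer program of the method for acquiring a target model or the method for determining a prognosis evaluation value of the present disclosure may be stored in both the volatile memory and the non-volatile memory, or in either one of them.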
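Finally, it should also be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The method for acquiring a target model, the method for determining a prognosis evaluation value, the apparatus, the device, and the medium provided by the present disclosure have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present disclosure, and the description of the above embodiments is intended only to help in understanding the method of the present disclosure and its core idea. At the same time, a person of ordinary skill in the art may, following the idea of the present disclosure, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present disclosure.
Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. The present disclosure is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common general knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.
References herein to "one embodiment", "an embodiment", or "one or more embodiments" mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. In addition, note that instances of the phrase "in one embodiment" herein do not necessarily all refer to the same embodiment.
Numerous specific details are set forth in the specification provided herein. It is understood, however, that the embodiments of the present disclosure may be practiced without these specific details. In some instances, well-known methods, structures, and techniques are not shown in detail so as not to obscure the understanding of this specification.
In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The present disclosure may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any order; these words may be interpreted as names.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present disclosure and not to limit them. Although the present disclosure has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure.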

Claims (19)

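  1. A method for acquiring a target model, characterized in that the method comprises:
    acquiring a sample group corresponding to each of a plurality of sample users, the sample group including sample information of multiple modalities, the sample information of multiple modalities including at least two of a magnetic resonance (MR) sample image, clinical sample information, and gene sample information;
    iteratively training a preset model based on the plurality of sample groups to obtain the target model, the target model being used to predict a prognosis evaluation value of a target object;
    wherein the following steps are performed in each training iteration:
    using the preset model, performing feature extraction separately on the sample information of the multiple modalities in the current sample group, and determining a predicted prognosis evaluation value and a consistency expression value based on the extracted sample features, wherein the consistency expression value characterizes the degree to which the sample features consistently correspond to the same target disease;
    updating the parameters of the preset model based on the predicted prognosis evaluation value, the prognosis evaluation label corresponding to the current sample group, and the consistency expression value.
  2. The method according to claim 1, characterized in that updating the parameters of the preset model based on the predicted prognosis evaluation value, the prognosis evaluation label corresponding to the current sample group, and the consistency expression value comprises:
    obtaining the difference between the predicted prognosis evaluation value and the prognosis evaluation label;
    updating the parameters of the preset model with the goals of minimizing the difference and maximizing the consistency expression value.
  3. The method according to claim 2, characterized in that updating the parameters of the preset model with the goals of minimizing the difference and maximizing the consistency expression value comprises:
    constructing the following loss function based on the difference and the consistency expression value:
    $$\mathrm{loss}=\sum_i (y'_i-y_i)^2-\mathrm{consistency}$$;
    updating the parameters of the preset model based on the loss value of the loss function, with the goals of minimizing the difference and maximizing the consistency expression value;
    where loss denotes the loss value, $y'_i$ denotes the predicted prognosis evaluation value, $y_i$ denotes the prognosis evaluation label, and consistency denotes the consistency expression value.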
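  4. The method according to claim 1, characterized in that determining the consistency expression value based on the extracted sample features comprises:
    transposing each sample feature to obtain a transposed feature corresponding to each sample feature;
    for two different sample features, fusing one of the sample features with the transposed feature corresponding to the other sample feature to obtain a corresponding fused feature value;
    determining the consistency expression value based on the fused feature values.
  5. The method according to claim 1, characterized in that the sample information of each modality includes a plurality of pieces of sub-sample information, and using the preset model to perform feature extraction separately on the sample information of the multiple modalities in the current sample group comprises:
    for the sample information of each modality, performing feature extraction separately on the plurality of pieces of sub-sample information in the sample information of that modality to obtain a corresponding plurality of sub-feature vectors;
    performing feature fusion on the plurality of sub-feature vectors corresponding to the sample information of each modality to obtain the sample feature corresponding to the sample information of that modality.
  6. The method according to claim 5, characterized in that performing feature fusion on the plurality of sub-feature vectors corresponding to the sample information of each modality to obtain the sample feature corresponding to the sample information of that modality comprises:
    determining an attention value between every two sub-feature vectors, the attention value characterizing how closely the two sub-feature vectors are related;
    fusing the plurality of sub-feature vectors based on the attention values to obtain the sample feature corresponding to the sample information of that modality.
  7. The method according to claim 5, characterized in that fusing the plurality of sub-feature vectors based on the attention values to obtain the sample feature corresponding to the sample information of that modality comprises:
    for each sub-feature vector, fusing all the other sub-feature vectors into that sub-feature vector based on the attention values between it and all the other sub-feature vectors, to obtain a fused sub-vector of that sub-feature vector;
    re-fusing the plurality of fused sub-vectors to obtain the sample feature corresponding to the sample information of that modality.
  8. The method according to claim 6, characterized in that the parameters of the preset model include a first parameter matrix, the MR sample image includes a plurality of slice sample images, and determining an attention value between every two sub-feature vectors comprises:
    for each slice sample image included in the MR sample image, determining, based on the current value of the first parameter matrix, the attention value between the sub-feature vectors corresponding to every two slice sample images.
  9. The method according to any one of claims 5 to 8, characterized in that the parameters of the preset model include a second parameter matrix and a third parameter matrix, and performing feature fusion on the plurality of sub-feature vectors corresponding to the sample information of each modality to obtain the sample feature corresponding to the sample information of that modality comprises:
    for each piece of clinical sub-sample information in the clinical sample information, performing feature fusion on the sub-feature vectors corresponding to the respective pieces of clinical sub-sample information based on the current value of the second parameter matrix, to obtain the sample feature corresponding to the clinical sample information;
    for each piece of gene sub-sample information in the gene sample information, performing feature fusion on the sub-feature vectors corresponding to the respective pieces of gene sub-sample information based on the current value of the third parameter matrix, to obtain the sample feature corresponding to the gene sample information.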
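  10. The method according to any one of claims 1 to 8, characterized in that the parameters of the preset model include a parameter set corresponding to the clinical sample information, the clinical sample information includes numerical sub-sample information and non-numerical sub-sample information, and using the preset model to perform feature extraction separately on the sample information of the multiple modalities in the current sample group comprises:
    for the clinical sample information in the current sample group, converting the non-numerical sub-sample information into a first sub-feature vector;
    mapping the numerical sub-sample information to a target space based on the current values of the parameters in the parameter set, to obtain a corresponding second sub-feature vector, wherein the parameters in the parameter set are used to determine the dimensions of the target space and the value at each spatial point;
    fusing the first sub-feature vector and the second sub-feature vector to obtain the sample feature corresponding to the clinical sample information.
  11. The method according to claim 10, characterized in that the parameter set includes a first parameter vector, a second parameter vector, and a fourth parameter matrix, and mapping the numerical sub-sample information to the target space based on the current values of the parameters in the parameter set to obtain the corresponding second sub-feature vector comprises:
    mapping the numerical sub-sample information to a first dimension of the target space based on the current value of the first parameter vector, to obtain a mapped value in the first dimension, wherein the first parameter vector is used to determine the values of the spatial points of the target space in the first dimension;
    determining the second sub-feature vector based on the mapped value in the first dimension, the current value of the second parameter vector, and the current value of the fourth parameter matrix, wherein the second parameter vector is used to determine the values of the spatial points of the target space in a second dimension, and the fourth parameter matrix is used to assign a parameter to each spatial position in the first dimension and the second dimension.
  12. The method according to claim 11, characterized in that the parameter set further includes a plurality of third parameter vectors, and after determining the second sub-feature vector based on the mapped value in the first dimension, the current value of the second parameter vector, and the current value of the fourth parameter matrix, the method further comprises:
    correcting the second sub-feature vector based on the second sub-feature vector and the plurality of third parameter vectors according to the following formula:
    where v_a is the corrected second sub-feature vector, s_a is the second sub-feature vector, and a_1, a_2, and a_3 are the third parameter vectors;
    fusing the first sub-feature vector and the second sub-feature vector to obtain the sample feature corresponding to the clinical sample information comprises:
    fusing the first sub-feature vector and the corrected second sub-feature vector to obtain the sample feature corresponding to the clinical sample information.
  13. The method according to claim 1, characterized in that the parameters of the preset model include a dimension parameter matrix corresponding to the sample information of each modality, and determining the predicted prognosis evaluation value and the consistency expression value based on the extracted sample features comprises:
    performing a dimension transformation on the sample feature corresponding to the sample information of each modality based on that sample feature and the dimension parameter matrix, to obtain a transformed sample feature;
    determining the predicted prognosis evaluation value and the consistency expression value based on the transformed sample features corresponding to the sample information of the respective modalities.
  14. The method according to claim 1, characterized in that the preset model includes a fusion module, a prediction branch, a consistency expression branch, and a data processing module corresponding to the sample information of each modality; and using the preset model to perform feature extraction separately on the sample information of the multiple modalities in the current sample group and determining the predicted prognosis evaluation value and the consistency expression value based on the extracted sample features comprises:
    for the sample information of each modality in the current sample group, performing feature extraction on that sample information using the data processing module corresponding to the sample information of that modality;
    fusing, using the fusion module, the sample features output by the respective data processing modules;
    determining, using the prediction branch, the predicted prognosis evaluation value corresponding to the feature output by the fusion module;
    determining, using the consistency expression branch, the consistency expression value corresponding to the sample features output by the respective data processing modules.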
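  15. A method for determining a prognosis evaluation value, characterized in that the method comprises:
    acquiring information of multiple modalities of an object to be tested, the information of multiple modalities including a magnetic resonance (MR) image, clinical information, and gene information;
    inputting the information of multiple modalities into a target model to obtain the prognosis evaluation value of the object to be tested;
    wherein the target model is obtained by the method according to any one of claims 1 to 14.
  16. An apparatus for acquiring a target model, characterized in that the apparatus comprises:
    a sample acquisition module, configured to acquire a sample group corresponding to each of a plurality of sample users, the sample group including sample information of multiple modalities, the sample information of multiple modalities including at least two of a magnetic resonance (MR) sample image, clinical sample information, and gene sample information;
    a training module, configured to iteratively train a preset model based on the plurality of sample groups to obtain the target model, the target model being used to predict a prognosis evaluation value of a target object;
    wherein the following steps are performed in each training iteration:
    using the preset model, performing feature extraction separately on the sample information of the multiple modalities in the current sample group, and determining a predicted prognosis evaluation value and a consistency expression value based on the extracted sample features, wherein the consistency expression value characterizes the degree to which the sample features consistently correspond to the same target disease;
    updating the parameters of the preset model based on the predicted prognosis evaluation value, the prognosis evaluation label corresponding to the current sample group, and the consistency expression value.
  17. An apparatus for determining a prognosis evaluation value, characterized in that the apparatus comprises:
    an information acquisition module, configured to acquire information of multiple modalities of an object to be tested, the information of multiple modalities including a magnetic resonance (MR) image, clinical information, and gene information;
    an input module, configured to input the information of multiple modalities into a target model to obtain the prognosis evaluation value of the object to be tested;
    wherein the target model is obtained by the method according to any one of claims 1 to 14.
  18. An electronic device, characterized by comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method for acquiring a target model according to any one of claims 1 to 14, or implements the method for determining a prognosis evaluation value according to claim 15.
  19. A computer-readable storage medium, characterized in that a computer program stored thereon causes a processor to perform the method for acquiring a target model according to any one of claims 1 to 14, or to perform the method for determining a prognosis evaluation value according to claim 15.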
PCT/CN2023/110353 2022-09-27 2023-07-31 目标模型的获取方法、预后评估值确定方法、装置、设备及介质 WO2024066722A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211186768.3A CN115762796A (zh) 2022-09-27 2022-09-27 目标模型的获取方法、预后评估值确定方法、装置、设备及介质
CN202211186768.3 2022-09-27

Publications (1)

Publication Number Publication Date
WO2024066722A1 true WO2024066722A1 (zh) 2024-04-04

Family

ID=85350392


Country Status (2)

Country Link
CN (1) CN115762796A (zh)
WO (1) WO2024066722A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115762796A (zh) * 2022-09-27 2023-03-07 京东方科技集团股份有限公司 目标模型的获取方法、预后评估值确定方法、装置、设备及介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110491479A (zh) * 2019-07-16 2019-11-22 北京邮电大学 一种基于神经网络的骨质状态评估模型的构建方法
US20210174958A1 (en) * 2018-04-13 2021-06-10 Freenome Holdings, Inc. Machine learning implementation for multi-analyte assay development and testing
CN113870259A (zh) * 2021-12-02 2021-12-31 天津御锦人工智能医疗科技有限公司 多模态医学数据融合的评估方法、装置、设备及存储介质
CN114121291A (zh) * 2021-10-26 2022-03-01 泰康保险集团股份有限公司 疾病分级预测方法、装置、电子设备及存储介质
CN114708465A (zh) * 2022-06-06 2022-07-05 中国科学院自动化研究所 图像分类方法、装置、电子设备与存储介质
CN115762796A (zh) * 2022-09-27 2023-03-07 京东方科技集团股份有限公司 目标模型的获取方法、预后评估值确定方法、装置、设备及介质

Also Published As

Publication number Publication date
CN115762796A (zh) 2023-03-07

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23869968

Country of ref document: EP

Kind code of ref document: A1