CN116597985A - Survival rate prediction model training method, survival period prediction method, survival rate prediction device and survival rate prediction equipment - Google Patents

Survival rate prediction model training method, survival period prediction method, survival rate prediction device and survival rate prediction equipment

Info

Publication number
CN116597985A
CN116597985A
Authority
CN
China
Prior art keywords
patient
image
sample data
pathology
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310631754.6A
Other languages
Chinese (zh)
Inventor
杨家亮
纪彬彬
姜彦�
梁乐彬
王丽霞
田埂
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Geneis Beijing Co ltd
Original Assignee
Geneis Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Geneis Beijing Co ltd filed Critical Geneis Beijing Co ltd
Priority to CN202310631754.6A priority Critical patent/CN116597985A/en
Publication of CN116597985A publication Critical patent/CN116597985A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/0002: Inspection of images, e.g. flaw detection
    • G06T7/0012: Biomedical image inspection
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/10: Segmentation; Edge detection
    • G06T7/11: Region-based segmentation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/765: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H30/00: ICT specially adapted for the handling or processing of medical images
    • G16H30/40: ICT specially adapted for the handling or processing of medical images for processing medical images, e.g. editing
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/50: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/30: Subject of image; Context of image processing
    • G06T2207/30004: Biomedical image processing
    • G06T2207/30096: Tumor; Lesion
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Primary Health Care (AREA)
  • Epidemiology (AREA)
  • Biomedical Technology (AREA)
  • Pathology (AREA)
  • Radiology & Medical Imaging (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The application provides a survival rate prediction model training method, a survival period prediction method, and corresponding devices and equipment, relating to the field of cancer pathology image analysis. The survival rate prediction model training method comprises the following steps: obtaining staining pathology images and clinical pathology data of a plurality of patients with a preset cancer; determining survival time labels for the plurality of patients and generating a group of sample data per patient; respectively extracting features from the staining pathology image and the clinical pathology data in each group of sample data to obtain the image features and pathology features of each group; fusing the extracted features to obtain the fusion features of each group of sample data; and training an initial feature fusion module and a preset initial perception module according to the fusion features of the plurality of groups of sample data to obtain a survival rate prediction model for the preset cancer. Because model training is performed on the fused features, the survival rate prediction results of the model are more accurate and its prediction performance is improved.

Description

Survival rate prediction model training method, survival period prediction method, survival rate prediction device and survival rate prediction equipment
Technical Field
The invention relates to the field of cancer pathology image analysis, and in particular to a survival rate prediction model training method, a survival period prediction method, and corresponding devices and equipment.
Background
There are more than one hundred types of cancer worldwide, and they severely affect patients' lives. Cervical cancer, for example, is the fourth most common cancer among women worldwide and one of the three major cancers affecting women under 45 years of age. In recent years, with advances in early cervical cancer screening, the human papillomavirus (HPV) vaccine, and the understanding of cancer mechanisms such as susceptibility genes, the incidence of cervical cancer has fallen significantly. The 5-year overall survival rate of cervical cancer has increased over the years but is still only around 66%, and the disease places a heavy burden worldwide. It is therefore important to diagnose cervical cancer as early as possible and to evaluate prognosis accurately in order to determine appropriate adjuvant therapy: intensive therapy can cause tremendous harm to the patient, while mild therapy may not prevent cancer recurrence and metastasis.
However, it is currently unclear how to select the most appropriate adjuvant therapy and therapy intensity for a particular patient. With the development of medical imaging and machine learning techniques, multi-omics data, including somatic mutation, gene expression, clinical information and methylation data, as well as medical images, have been used to predict the survival of cancer patients after surgical resection, but there are no related reports on survival prediction for cervical cancer patients.
The present application therefore provides a survival rate prediction model training method, a survival period prediction method, and corresponding devices and equipment, which can predict a patient's survival period, reduce the time doctors spend evaluating patient prognosis as well as the patient's treatment cost and medical consumption, and help doctors select the most suitable adjuvant therapy and therapy intensity for the patient.
Disclosure of Invention
The application aims to provide, in view of the above shortcomings in the prior art, a survival rate prediction model training method, a survival period prediction method, and corresponding devices and equipment, so as to predict a patient's survival period, reduce the time doctors spend evaluating patient prognosis as well as the patient's treatment cost and medical consumption, and help doctors select the most suitable adjuvant therapy and therapy intensity for the patient.
In order to achieve the above purpose, the technical scheme adopted by the embodiment of the application is as follows:
in a first aspect, an embodiment of the present application provides a method for training a survival rate prediction model, where the method includes:
acquiring staining pathology images of a plurality of patients with a preset cancer and clinical pathology data of the plurality of patients;
determining survival time labels corresponding to the plurality of patients according to the clinical pathology data of the plurality of patients, wherein the survival time labels are used for indicating the survival states of the corresponding patients in the time period corresponding to a preset time;
generating a group of sample data according to the staining pathology image of each patient, the clinical pathology data of each patient and the survival time label corresponding to each patient;
respectively extracting features from the staining pathology image and the clinical pathology data in each group of sample data to obtain the image features and pathology features of each group of sample data;
fusing the image features and the pathology features of each group of sample data with a preset initial feature fusion module to obtain the fusion features of each group of sample data; and
training the initial feature fusion module and a preset initial perception module according to the fusion features of a plurality of groups of sample data to obtain a survival rate prediction model for the preset cancer.
In an alternative embodiment, the determining survival time labels corresponding to the plurality of patients according to the clinical pathology data of the plurality of patients includes:
determining the total survival time of each of the plurality of patients according to the clinical pathology data of the plurality of patients, wherein the total survival time of each patient indicates the survival time counted from the detection time of that patient's staining pathology image; and
generating the survival time labels corresponding to the plurality of patients according to the total survival time of the plurality of patients and a preset survival time.
In an optional embodiment, the respectively extracting features from the staining pathology image and the clinical pathology data in each group of sample data to obtain the image features and pathology features of each group of sample data includes:
performing feature extraction on the staining pathology image in each group of sample data with a preset feature extraction network to obtain the image features of each group of sample data; and
performing feature extraction on the clinical pathology data in each group of sample data with a preset random forest algorithm to obtain the pathology features of each group of sample data.
In an optional embodiment, the generating a group of sample data according to the staining pathology image of each patient, the clinical pathology data of each patient and the survival time label corresponding to each patient includes:
determining a tumor region in the staining pathology image of each patient;
segmenting the tumor region in the staining pathology image of each patient to obtain a plurality of image blocks for each patient; and
generating a group of sample data according to the plurality of image blocks of each patient, the clinical pathology data of each patient and the survival time label corresponding to each patient.
In an optional embodiment, the respectively extracting features from the staining pathology image and the clinical pathology data in each group of sample data to obtain the image features and pathology features of each group of sample data includes:
determining, from the plurality of image blocks in each group of sample data, the image blocks whose blank rate exceeds a preset threshold;
removing the image blocks whose blank rate exceeds the preset threshold from each group of sample data to obtain the target image blocks in each group of sample data;
performing color normalization processing on the target image blocks in each group of sample data; and
respectively extracting features from the color-normalized target image blocks and the clinical pathology data in each group of sample data to obtain the image features and pathology features of each group of sample data.
In a second aspect, an embodiment of the present application further provides a survival period prediction method, the method including:
obtaining a staining pathology image and clinical pathology data of a patient to be predicted with respect to a preset cancer;
respectively extracting features from the staining pathology image and the clinical pathology data of the patient to be predicted to obtain the image features and pathology features of the patient to be predicted;
processing the image features and pathology features of the patient to be predicted with a survival rate prediction model for the preset cancer to obtain the survival rates of the patient to be predicted in the time periods corresponding to preset times; and
determining, according to the survival rates in the time periods corresponding to the preset times, the survival period of the patient to be predicted counted from the detection time of the staining pathology image.
In a third aspect, an embodiment of the present application further provides a survival rate prediction model training device, the device including:
an acquisition module for acquiring staining pathology images of a plurality of patients with a preset cancer and clinical pathology data of the plurality of patients;
a determining module for determining survival time labels corresponding to the plurality of patients according to the clinical pathology data of the plurality of patients, wherein the survival time labels are used for indicating the survival states of the corresponding patients in the time period corresponding to a preset time;
a generation module for generating a group of sample data according to the staining pathology image of each patient, the clinical pathology data of each patient and the survival time label corresponding to each patient;
an extraction module for respectively extracting features from the staining pathology image and the clinical pathology data in each group of sample data to obtain the image features and pathology features of each group of sample data;
a fusion module for fusing the image features and the pathology features of each group of sample data with a preset initial feature fusion module to obtain the fusion features of each group of sample data; and
a training module for training the initial feature fusion module and a preset initial perception module according to the fusion features of the plurality of groups of sample data to obtain a survival rate prediction model for the preset cancer.
In a fourth aspect, an embodiment of the present application further provides a survival period prediction apparatus, the apparatus including:
an acquisition module for obtaining a staining pathology image and clinical pathology data of a patient to be predicted with respect to a preset cancer;
an extraction module for respectively extracting features from the staining pathology image and the clinical pathology data of the patient to be predicted to obtain the image features and pathology features of the patient to be predicted;
a processing module for processing the image features and pathology features of the patient to be predicted with a survival rate prediction model for the preset cancer to obtain the survival rates of the patient to be predicted in the time periods corresponding to preset times; and
a determining module for determining, according to the survival rates in the time periods corresponding to the preset times, the survival period of the patient to be predicted counted from the detection time of the staining pathology image.
In a fifth aspect, an embodiment of the present application further provides a computer device, including: a processor, a storage medium and a bus, the storage medium storing program instructions executable by the processor, the processor and the storage medium communicating via the bus when the computer device is running, and the processor executing the program instructions to perform the steps of the survival rate prediction model training method according to any one of the first aspects or the steps of the survival period prediction method according to the second aspect.
In a sixth aspect, the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the survival rate prediction model training method according to any one of the first aspects or the steps of the survival period prediction method according to the second aspect.
The beneficial effects of the application are as follows:
The embodiment of the application provides a survival rate prediction model training method, a survival period prediction method, and corresponding devices and equipment. The survival rate prediction model training method comprises: obtaining staining pathology images of a plurality of patients with a preset cancer and clinical pathology data of the plurality of patients; determining survival time labels corresponding to the plurality of patients according to their clinical pathology data; generating a group of sample data according to the staining pathology image, clinical pathology data and corresponding survival time label of each patient; respectively extracting features from the staining pathology image and the clinical pathology data in each group of sample data to obtain the image features and pathology features of each group; fusing the image features and pathology features of each group of sample data with a preset initial feature fusion module to obtain the fusion features of each group; and finally training the initial feature fusion module and a preset initial perception module according to the fusion features of the plurality of groups of sample data to obtain a survival rate prediction model for the preset cancer. Because features are extracted separately from the staining pathology image and the clinical pathology data in each group of sample data, the extracted features are fused, and model training is performed on the fused features, the finally obtained survival rate prediction model for the preset cancer gives more accurate prediction results and improved prediction performance, greatly reduces the time doctors spend evaluating patient prognosis as well as the patient's treatment cost and medical consumption, and helps doctors select the most suitable adjuvant therapy and therapy intensity for each patient.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings only illustrate some embodiments of the present application and should therefore not be considered as limiting the scope; a person skilled in the art may derive other related drawings from them without inventive effort.
FIG. 1 is a flow framework diagram of the survival rate prediction model training method according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of the survival rate prediction model training method according to an embodiment of the present application;
FIG. 3 is a second flowchart of the survival rate prediction model training method according to an embodiment of the present application;
FIG. 4 is a third flowchart of the survival rate prediction model training method according to an embodiment of the present application;
FIG. 5 is a flowchart of the survival rate prediction model training method according to an embodiment of the present application;
FIG. 6 is a flowchart of the survival rate prediction model training method according to an embodiment of the present application;
FIG. 7 is a flowchart of the survival rate prediction model training method according to an embodiment of the present application;
FIG. 8 is a receiver operating characteristic (ROC) curve provided by an embodiment of the present application;
FIG. 9 is a flowchart of the survival period prediction method according to an embodiment of the present application;
FIG. 10 is a schematic diagram of the functional modules of the survival rate prediction model training device according to an embodiment of the present application;
FIG. 11 is a schematic diagram of the functional modules of the survival period prediction device according to an embodiment of the present application;
FIG. 12 is a schematic diagram of a computer device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application.
Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
In the description of the present application, it should be noted that terms such as "upper" and "lower" indicating an orientation or positional relationship are based on the orientation or positional relationship shown in the drawings, or on the orientation or positional relationship in which the product of the application is conventionally used; they are merely for convenience of describing the present application and simplifying the description, and do not indicate or imply that the apparatus or element referred to must have a specific orientation or be constructed and operated in a specific orientation, and thus should not be construed as limiting the present application.
Furthermore, the terms first, second and the like in the description and in the claims and in the above-described figures, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that the features of the embodiments of the present application may be combined with each other without conflict.
In order to accurately and automatically obtain a patient's survival rate within a preset time period, so that a doctor can formulate a suitable treatment scheme for the patient according to the survival rate, the embodiment of the application provides a survival rate prediction model training method. FIG. 1 shows the flow framework of the survival rate prediction model training method provided by the embodiment of the application. As shown in FIG. 1, feature extraction is first performed on the staining pathology images and clinical pathology data in a plurality of groups of sample data, feature fusion is performed on the extracted features, and training finally yields a survival rate prediction model for a preset cancer. The survival rate of a patient within a preset time period can then be accurately predicted by the model, which reduces the time and medical consumption doctors spend evaluating patient prognosis and helps doctors formulate the treatment scheme best suited to the patient.
The following describes the survival rate prediction model training method provided by the embodiment of the application in detail through specific examples with reference to the accompanying drawings. The method may be implemented by a computer device preconfigured with the survival rate prediction model training algorithm or corresponding software, which runs that algorithm or software. The computer device may be, for example, a server, or a terminal such as a user computer. FIG. 2 is a schematic flowchart of the survival rate prediction model training method according to an embodiment of the present application. As shown in FIG. 2, the method includes:
S101, obtaining staining pathology images of a plurality of patients with a preset cancer and clinical pathology data of the plurality of patients.
In this embodiment, the preset cancer may be any of various types of cancer, such as cervical cancer, breast cancer or lung cancer. The staining pathology images and clinical pathology data of the patients with the preset cancer may be downloaded from a preset database, i.e. a database recording sample data of patients with various cancers, for example the gene sequencing TCGA (The Cancer Genome Atlas) database, which records various data of 20,000 samples across 33 types of cancer, including transcriptome expression data, genomic variation data, methylation data, clinical data, etc.
The staining pathology image is a pathology image obtained by staining a pathology section of the patient with hematoxylin-eosin stain: hematoxylin stains the nuclei purplish blue, eosin stains the extracellular matrix and cytoplasm pink, and other structures show different shades, hues and combinations of these colors, so the nuclear and cytoplasmic parts of cells can easily be distinguished.
The clinical pathology data comprise the patient's basic information and clinical treatment information, specifically including: age, tumor stage of the preset cancer, survival time, last follow-up time, survival state, etc. A patient's clinical pathology data and staining pathology image correspond to each other.
S102, determining survival time labels corresponding to a plurality of patients according to clinical pathology data of the plurality of patients.
Based on the clinical pathology data of the plurality of patients obtained in step S101, the total survival time of each patient is counted. The statistical criterion is: if a patient's survival state is deceased, the survival time recorded for that patient is the total survival time; if the survival state is alive, the last follow-up time is used as the total survival time instead. The survival time label of each patient is then determined from the total survival time, thereby determining the survival time labels corresponding to the plurality of patients. A survival time label indicates the survival state of the corresponding patient in the time period corresponding to the preset time, i.e. whether the patient's state in that period is alive or deceased.
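A minimal sketch of this statistical criterion in Python (the record field names "vital_status", "survival_time" and "last_followup" are hypothetical, since the patent does not specify a data format):

```python
def total_survival_time(record: dict) -> float:
    """Total survival time for one patient record.

    For a deceased patient, the recorded survival time is the total survival
    time; for a living patient, the last follow-up time is used instead.
    """
    if record["vital_status"] == "dead":  # survival state: deceased
        return record["survival_time"]
    return record["last_followup"]        # survival state: alive
```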
S103, generating a group of sample data according to the staining pathology image of each patient, the clinical pathology data of each patient and the corresponding survival time label of each patient.
S104, respectively extracting features from the staining pathology image and the clinical pathology data in each group of sample data to obtain the image features and pathology features of each group of sample data.
Specifically, each group of sample data comprises a staining pathology image, clinical pathology data and a survival time label of a patient, and the staining pathology image and the clinical pathology data are respectively subjected to feature extraction to obtain image features and pathology features of each group of sample data.
S105, fusing the image features and the pathological features of each group of sample data by adopting a preset initial feature fusion module to obtain fusion features of each group of sample data.
The preset initial feature fusion module is used to fuse the image features and the pathology features in each group of sample data. Specifically, it applies a preset fusion algorithm to the image features and the pathology features, where the preset fusion algorithm may be a bilinear pooling algorithm, so as to obtain the fusion features of each group of sample data.
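As an illustrative sketch of such a bilinear pooling step (the PyTorch framework, the feature dimensions and the signed-square-root/L2 normalization are assumptions; the patent names only the bilinear pooling algorithm):

```python
import torch
import torch.nn.functional as F

def bilinear_pool(img_feat: torch.Tensor, path_feat: torch.Tensor) -> torch.Tensor:
    """Fuse image features (batch, d1) with pathology features (batch, d2)
    by bilinear pooling: the flattened outer product of the two vectors."""
    outer = torch.einsum("bi,bj->bij", img_feat, path_feat)  # (batch, d1, d2)
    fused = outer.flatten(start_dim=1)                       # (batch, d1 * d2)
    # Signed square root + L2 normalization, a common stabilization for
    # bilinear features (an assumption here, not stated in the patent).
    fused = torch.sign(fused) * torch.sqrt(fused.abs() + 1e-12)
    return F.normalize(fused, dim=1)

# Example: fuse 128-dim image features with 8 selected pathology features.
fused = bilinear_pool(torch.randn(4, 128), torch.randn(4, 8))
print(fused.shape)  # torch.Size([4, 1024])
```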
S106, training the initial feature fusion module and a preset initial perception module according to the fusion features of a plurality of groups of sample data to obtain a survival rate prediction model for the preset cancer.
Specifically, the fusion features of the plurality of groups of sample data produced by the initial feature fusion module are input into the preset initial perception module for perception classification, and the initial feature fusion module and the preset initial perception module are trained to obtain a survival rate prediction model for the preset cancer.
The preset initial perception module may be a three-layer perceptron comprising a batch normalization layer (Batch-Norm, BN layer), an activation function layer (ReLU layer) and a fully connected layer, where the BN layer and the ReLU layer each have 128 nodes. The fusion features of the plurality of groups of sample data produced by the initial feature fusion module are first input to the BN layer, which normalizes the fused feature data (mean normalization), reduces the differences between groups of samples and accelerates convergence of the neural network; the BN layer outputs 128-dimensional vectors. The output vectors are then input to the ReLU layer, which saves model computation, avoids gradient vanishing and alleviates over-fitting; since the ReLU layer has 128 nodes, it also outputs 128-dimensional vectors. The fully connected layer then outputs 2-dimensional vectors, i.e. the probabilities of being classified as 1 and as 0, and classification is finally achieved with a normalized exponential (Softmax) function.
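A minimal sketch of such a perception module, assuming a PyTorch implementation and a 128-dimensional fused feature as input (both assumptions):

```python
import torch.nn as nn

class PerceptionModule(nn.Module):
    """Three-layer perceptron as described: a batch normalization layer and a
    ReLU layer with 128 nodes each, followed by a fully connected layer with
    a 2-dimensional output classified by Softmax."""

    def __init__(self, in_dim: int = 128):
        super().__init__()
        self.bn = nn.BatchNorm1d(in_dim)  # mean normalization; reduces differences
                                          # between sample groups, speeds convergence
        self.relu = nn.ReLU()             # cheap to compute; avoids vanishing
                                          # gradients and curbs over-fitting
        self.fc = nn.Linear(in_dim, 2)    # 2-dim output: scores for labels 1 and 0

    def forward(self, x):
        # Returns logits; apply .softmax(dim=1) for class probabilities.
        # (nn.CrossEntropyLoss applies log-softmax internally during training.)
        return self.fc(self.relu(self.bn(x)))
```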
It should be noted that when training the initial feature fusion module and the preset initial perception module, five-fold cross validation is adopted: all sample data are divided into a training set and a validation set, the model is built from the training-set sample data, and the classification results of the survival rate prediction model are verified with the validation-set sample data. For example, all sample data may be divided into 5 groups, of which 4 groups serve as the training set and 1 group as the validation set. On the 4 training groups, a cross-entropy loss function is used to optimize the end-to-end parameters of the neural network, so that an optimal survival rate prediction model meeting the loss threshold is trained. The cross-entropy loss function can be expressed as:
L = -[y·log(ŷ) + (1 - y)·log(1 - ŷ)]
where y represents the true label of the sample (positive class label 1, negative class label 0) and ŷ represents the predicted probability of the positive class.
For the 1 validation group, the survival rate prediction model is verified on the validation set and the AUC result is stored; this is repeated 5 times until every group of data has served as the validation set, and the AUC results of the 5 validation sets are finally averaged as the final AUC result of the survival rate prediction model. The AUC result measures the classification performance of the survival rate prediction model: the larger the AUC, the more accurate the model's prediction results and the better its classification effect. A survival rate prediction model for the preset cancer is thereby obtained.
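A sketch of this five-fold validation loop (train_fn and predict_fn are placeholders for building the fusion/perception model on the training folds and scoring the validation fold; the scikit-learn utilities are an assumption):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import roc_auc_score

def five_fold_auc(features: np.ndarray, labels: np.ndarray, train_fn, predict_fn) -> float:
    """Train on 4 folds, validate on the 5th, store each validation AUC,
    and average the 5 AUC results as the model's final AUC."""
    aucs = []
    for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(features):
        model = train_fn(features[train_idx], labels[train_idx])
        scores = predict_fn(model, features[val_idx])  # predicted positive-class probability
        aucs.append(roc_auc_score(labels[val_idx], scores))
    return float(np.mean(aucs))
```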
In summary, the survival rate prediction model training method provided by the embodiment of the application includes: obtaining staining pathology images of a plurality of patients with a preset cancer and clinical pathology data of the plurality of patients; determining survival time labels corresponding to the plurality of patients according to their clinical pathology data; generating a group of sample data according to the staining pathology image, clinical pathology data and corresponding survival time label of each patient; respectively extracting features from the staining pathology image and the clinical pathology data in each group of sample data to obtain the image features and pathology features of each group; fusing the image features and pathology features of each group of sample data with a preset initial feature fusion module to obtain the fusion features of each group; and finally training the initial feature fusion module and a preset initial perception module according to the fusion features of the plurality of groups of sample data to obtain a survival rate prediction model for the preset cancer. Because features are extracted separately from the staining pathology image and the clinical pathology data in each group of sample data, the extracted features are fused, and model training is performed on the fused features, the finally obtained survival rate prediction model for the preset cancer gives more accurate prediction results and improved prediction performance, greatly reduces the time doctors spend evaluating patient prognosis as well as the patient's treatment cost and medical consumption, and helps doctors select the most suitable adjuvant therapy and therapy intensity for each patient.
On the basis of the survival rate prediction model training method provided by the above embodiment, the embodiment of the application also provides another possible implementation example of the method. FIG. 3 is a second flowchart of the survival rate prediction model training method according to an embodiment of the present application. As shown in FIG. 3, determining the survival time labels corresponding to the plurality of patients based on their clinical pathology data includes:
S201, respectively determining the total survival time of the plurality of patients according to the clinical pathology data of the plurality of patients.
In the present embodiment, the total survival time of the plurality of patients is statistically determined from the clinical pathology data acquired in step S101, where the total survival time of each patient indicates the survival time counted from the detection time of that patient's staining pathology image.
S202, generating survival time labels corresponding to a plurality of patients according to the total survival time and the preset survival time of the plurality of patients.
The total survival time of the plurality of patients is classified according to the preset survival time. For example, if the preset survival time is 3 years, the patients are divided into two groups: the ST group, i.e. patients whose total survival time is 3 years or less, and the LT group, i.e. patients whose total survival time is more than 3 years. Survival time labels are then generated for both groups of patients: the label of an ST-group patient (total survival time ≤ 3 years) is 1, and the label of an LT-group patient (total survival time > 3 years) is 0.
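A one-line sketch of this labeling rule (the 3-year default mirrors the example above):

```python
def survival_time_label(total_survival_years: float, preset_years: float = 3.0) -> int:
    """ST group (total survival time <= preset survival time) -> label 1;
    LT group (total survival time > preset survival time) -> label 0."""
    return 1 if total_survival_years <= preset_years else 0
```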
In the method provided by the embodiment of the application, the total survival time of each of the plurality of patients is determined from the clinical pathology data, where the total survival time of each patient indicates the survival time counted from the detection time of that patient's staining pathology image, and the survival time labels corresponding to the plurality of patients are generated from the total survival times and the preset survival time. The survival time label of each patient is thus obtained for use in the subsequent training of the survival rate prediction model.
The embodiment of the application also provides another possible implementation example of the survival rate prediction model training method. Fig. 4 is a third flowchart of a training method for a survival rate prediction model according to an embodiment of the present application. As shown in fig. 4, feature extraction is performed on the staining pathology image and the clinical pathology data in each set of sample data, so as to obtain image features and pathology features of each set of sample data, including:
S301, performing feature extraction on the staining pathology images in each group of sample data with a preset feature extraction network to obtain the image features of each group of sample data.
In this embodiment, a preset feature extraction network is used to extract features from the staining pathology images in each group of sample data. The preset feature extraction network may be a residual network (ResNet18) of the convolutional neural network family; the residual blocks inside the residual network use skip connections, which alleviates the vanishing-gradient problem caused by added depth in deep neural networks.
Specifically, the ResNet18 network structure has 18 layers. First, a 512×512-pixel, 3-channel staining pathology image block, i.e. a (512×512×3) block, is resized to (224×224×3) and input into the convolution layer of the ResNet18 structure; since this layer's convolution kernel is 7×7 with stride 2, its output is 112×112 with 64 channels. The output is then fed to a max-pooling layer with a 3×3 kernel and stride 2, which outputs 56×56 with 64 channels. Four residual blocks follow, each consisting of two basic blocks, each of which contains two convolution layers; after the four residual blocks, i.e. sixteen convolution layers, the output is 7×7 with 512 channels. The last two layers of the ResNet18 structure are an average pooling layer and a fully connected layer: after average pooling the image dimension is 1×512, and after the fully connected layer the feature extraction of each staining pathology image block is finished with a final 128-dimensional output vector, giving the image features of each group of sample data.
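A sketch of such a feature extractor, assuming PyTorch/torchvision (the patent does not name a framework, and whether pretrained weights are used is left open):

```python
import torch
import torch.nn as nn
from torchvision import models, transforms

class TileFeatureExtractor(nn.Module):
    """ResNet18 backbone whose final fully connected layer is replaced so that
    each staining pathology image block yields a 128-dimensional feature."""

    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=None)     # use of pretrained weights is an assumption either way
        backbone.fc = nn.Linear(512, 128)            # average-pooled 1x512 vector -> 128-dim feature
        self.backbone = backbone
        self.resize = transforms.Resize((224, 224))  # (512, 512, 3) block -> (224, 224, 3)

    def forward(self, tiles: torch.Tensor) -> torch.Tensor:
        # tiles: (batch, 3, 512, 512) float tensor -> (batch, 128) features
        return self.backbone(self.resize(tiles))
```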
S302, carrying out feature extraction on clinical pathology data in each group of sample data by adopting a preset random forest algorithm to obtain pathology features of each group of sample data.
Specifically, the missing values of the clinical pathology data in each group of sample data are processed first: a discrete variable is completed with the mode of the corresponding feature, and a continuous variable with the mean of the corresponding feature, after which the features are scaled to between 0 and 1. The clinical pathology data in each group of sample data are then ranked by importance with a random forest algorithm, and the best-performing pathology features, such as age and the tumor stage of the preset cancer, are selected from the ranking.
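A sketch of this imputation-plus-ranking step using pandas and scikit-learn (the library choice and the number of features kept, top_k, are assumptions):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def select_pathology_features(df: pd.DataFrame, labels, top_k: int = 8) -> list:
    """Impute missing values (mode for discrete variables, mean for continuous
    ones), scale each feature to [0, 1], then rank the features by random
    forest importance and keep the top_k best-performing ones."""
    df = df.copy()
    for col in df.columns:
        if df[col].dtype == object:  # discrete variable -> fill with the mode, then encode
            df[col] = df[col].fillna(df[col].mode()[0]).astype("category").cat.codes
        else:                        # continuous variable -> fill with the mean
            df[col] = df[col].fillna(df[col].mean())
        span = df[col].max() - df[col].min()
        df[col] = (df[col] - df[col].min()) / (span if span else 1.0)  # scale to [0, 1]
    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(df, labels)
    ranked = sorted(zip(df.columns, rf.feature_importances_), key=lambda t: -t[1])
    return [name for name, _ in ranked[:top_k]]
```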
In the method provided by the embodiment of the application, a preset feature extraction network is used to extract features from the staining pathology images in each group of sample data to obtain the image features of each group, and a preset random forest algorithm is used to extract features from the clinical pathology data in each group of sample data to obtain the pathology features of each group. The extracted image features and pathology features are then used for the subsequent feature fusion, so that the finally obtained survival rate prediction model gives more accurate prediction results.
The embodiment of the application also provides another possible implementation example of the survival rate prediction model training method. FIG. 5 is a flowchart of the survival rate prediction model training method according to an embodiment of the present application. As shown in FIG. 5, generating a group of sample data according to the staining pathology image of each patient, the clinical pathology data of each patient and the survival time label corresponding to each patient includes:
S401, determining a tumor area in the staining pathology image of each patient.
In this embodiment, the tumor region in each patient's staining pathology image is determined by marking it with the Automated Slide Analysis Platform (ASAP), an open-source platform mainly used to visualize and annotate whole staining pathology images.
S402, segmenting the tumor region in the staining pathology image of each patient to obtain a plurality of image blocks for each patient.
After the tumor region in each patient's staining pathology image is determined according to step S401, it is divided into a plurality of 512×512-pixel image blocks, thereby obtaining a plurality of image blocks for each patient.
An image label is generated for each image block according to the total survival time of the patient: if the patient's total survival time is less than or equal to the preset survival time, the image label of the block is 1; if the total survival time is greater than the preset survival time, the label is 0.
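A sketch of this tiling step (a non-overlapping grid that discards partial edge tiles is an assumption; the patent only specifies the 512×512 block size):

```python
import numpy as np

def tile_region(region: np.ndarray, tile_size: int = 512) -> list:
    """Split a tumor region (H, W, 3 array) into non-overlapping
    tile_size x tile_size image blocks."""
    h, w = region.shape[:2]
    return [region[y:y + tile_size, x:x + tile_size]
            for y in range(0, h - tile_size + 1, tile_size)
            for x in range(0, w - tile_size + 1, tile_size)]
```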
S403, generating a group of sample data according to the plurality of image blocks of each patient, the clinical pathology data of each patient and the corresponding survival time label of each patient.
The plurality of image blocks, the clinical pathology data and the survival time label in each group of sample data correspond to one another.
It should be noted that before the tumor region in each patient's staining pathology image is determined, unqualified staining pathology images need to be filtered out to retain only images of adequate quality; the tumor region is then determined.
Specifically, whether a staining pathology image is qualified is identified by using HistoQC, an open-source quality control tool for digital pathology slides, to detect abnormal values of image metrics in the staining pathology image, so as to filter out images with abnormal values. The image metrics may include the color histogram, brightness, contrast, etc. For example, whether the j-th image metric value of the i-th sample in the metric matrix is abnormal may be judged by a preset condition, which can be expressed as:
mean(T_j) - 2×std(T_j) ≤ a_ij ≤ mean(T_j) + 2×std(T_j)
where a_ij represents the j-th image metric value of the i-th sample in metric matrix A, T_j represents the j-th image metric of all samples, mean(T_j) represents its mean over all samples, and std(T_j) its standard deviation over all samples. If every image metric value of the i-th sample meets the preset condition, the sample is regarded as qualified and its staining pathology image is retained; if any image metric value of the i-th sample fails the preset condition, the sample is regarded as unqualified and its staining pathology image is filtered out. This filtering overcomes the artifacts and batch effects unintentionally introduced during conventional slide preparation (e.g. staining).
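A NumPy sketch of this qualification test (it re-implements the stated criterion directly rather than calling HistoQC's own API):

```python
import numpy as np

def qc_pass_mask(metric_matrix: np.ndarray) -> np.ndarray:
    """Given A with a_ij = image metric j of sample i, keep sample i only if
    every metric lies within mean(T_j) +/- 2*std(T_j) over all samples."""
    mean = metric_matrix.mean(axis=0)
    std = metric_matrix.std(axis=0)
    within = (metric_matrix >= mean - 2 * std) & (metric_matrix <= mean + 2 * std)
    return within.all(axis=1)  # boolean mask: True = qualified staining image
```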
In the method provided by the embodiment of the application, the tumor region in each patient's staining pathology image is determined, the tumor region is segmented to obtain a plurality of image blocks per patient, and a group of sample data is then generated from each patient's image blocks, clinical pathology data and corresponding survival time label. This preprocessing of each patient's staining pathology image better supports the subsequent extraction of image features.
The embodiment of the application also provides another possible implementation example of the survival rate prediction model training method. Fig. 6 is a flowchart of a method for training a survival rate prediction model according to an embodiment of the present application. As shown in fig. 6, feature extraction is performed on the staining pathology image and the clinical pathology data in each set of sample data, so as to obtain image features and pathology features of each set of sample data, including:
S501, determining an image block with the blank rate exceeding a preset threshold value from a plurality of image blocks in each group of sample data.
S502, removing image blocks with blank rate exceeding a preset threshold value in each group of sample data to obtain target image blocks in each group of sample data.
In this embodiment, image blocks whose blank rate exceeds a preset threshold are determined from the plurality of image blocks by a preset algorithm, for example an algorithm built with the cross-platform computer vision library OpenCV (version 4.1.1) for the Python programming language, and such blocks are removed to obtain the target image blocks in each group of sample data. The preset threshold may be set to 20%, 30%, 35%, etc., and is not limited here.
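A sketch of one way to compute the blank rate with OpenCV (the near-white grayscale threshold of 220 is an assumption; the patent does not specify how blankness is measured):

```python
import cv2
import numpy as np

def blank_ratio(block_bgr: np.ndarray, white_thresh: int = 220) -> float:
    """Fraction of near-white background pixels in an image block; blocks whose
    ratio exceeds the preset threshold (e.g. 0.3) are removed."""
    gray = cv2.cvtColor(block_bgr, cv2.COLOR_BGR2GRAY)
    return float((gray >= white_thresh).mean())
```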
S503, performing color normalization processing on the target image block in each group of sample data.
Specifically, the target image blocks in each group of sample data are subjected to color normalization processing with the computational pathology toolkit TIAToolbox, so as to obtain the color-normalized target image blocks of each group of sample data.
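TIAToolbox ships ready-made stain normalizers; the following hand-rolled Reinhard-style sketch only illustrates the idea of shifting each block's color statistics toward a reference block (the LAB-space method and parameters are assumptions, not the toolkit's API):

```python
import cv2
import numpy as np

def reinhard_normalize(block_bgr: np.ndarray, ref_mean: np.ndarray, ref_std: np.ndarray) -> np.ndarray:
    """Match a block's per-channel LAB statistics to those of a reference block
    (ref_mean / ref_std are computed the same way from a template image)."""
    lab = cv2.cvtColor(block_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    mean = lab.reshape(-1, 3).mean(axis=0)
    std = lab.reshape(-1, 3).std(axis=0) + 1e-6
    lab = (lab - mean) / std * ref_std + ref_mean
    return cv2.cvtColor(np.clip(lab, 0, 255).astype(np.uint8), cv2.COLOR_LAB2BGR)
```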
S504, respectively extracting features of the target image block and the clinical pathology data after the color normalization processing in each group of sample data to obtain image features and pathology features of each group of sample data.
After step S503, the color-normalized target image blocks of each group of sample data are obtained. Image features are then extracted from the color-normalized target image blocks, and pathology features are extracted from the clinical pathology data.
In the method provided by the embodiment of the application, the image blocks whose blank rate exceeds the preset threshold are determined among the plurality of image blocks in each group of sample data and removed, yielding the target image blocks of each group; the target image blocks are color-normalized; and features are finally extracted from the color-normalized target image blocks and the clinical pathology data of each group to obtain its image features and pathology features. Removing the image blocks with an excessive blank rate eliminates the interference of large blank areas with model training, while color normalization of the target image blocks reduces the influence of color deviation on model training, so that the finally obtained survival rate prediction model predicts survival rates more accurately.
The embodiment of the application also provides a complete example. FIG. 7 is a flowchart of the survival rate prediction model training method according to an embodiment of the present application. As shown in FIG. 7, the survival rate prediction model training method is explained in detail:
s601, obtaining staining pathology images and clinical pathology data of a plurality of patients with preset cancer diseases.
S602, determining survival time labels corresponding to a plurality of patients according to clinical pathology data of the plurality of patients.
For example, staining pathology images and clinical pathology data of 251 patients with the preset cancer are obtained. The data of 123 patients without survival information are removed from the 251 patients; the staining pathology images of the remaining 128 patients are then preprocessed to remove unqualified images, finally yielding the staining pathology images and clinical pathology data of 119 patients, whose clinical pathology data are summarized as follows:
TABLE 1 Summary of clinical pathology data

Group  Total survival time  Number of patients
ST     ≤ 3 years            40
LT     > 3 years            79
As shown in Table 1, the 119 patients are divided into two groups by comparing the total survival time in the clinical pathology data with the preset survival time of 3 years: the ST group, with total survival time less than or equal to 3 years, contains 40 patients, and the LT group, with total survival time greater than 3 years, contains 79 patients. The survival time labels corresponding to the 119 patients are thereby determined.
S603, determining the tumor area in the staining pathology image of each patient, and dividing that tumor area to obtain a plurality of image blocks of each patient.
Specifically, the tumor areas in the staining pathology images of the 119 patients are marked using the Automated Slide Analysis Platform (ASAP), and the tumor area in each patient's image is segmented to obtain a plurality of image blocks per patient.
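As an illustration only, the segmentation of an annotated tumor area into image blocks can be sketched with the OpenSlide library in Python; the tile size, the magnification level, and the tile_tumor_region helper below are assumptions of this sketch, since the embodiment does not specify them.

```python
# A minimal tiling sketch, assuming OpenSlide-readable whole-slide images
# and level-0 bounding boxes exported from the ASAP annotations.
import openslide

TILE = 256  # hypothetical tile edge length in pixels

def tile_tumor_region(slide_path, tumor_box):
    """Split one annotated tumor bounding box into non-overlapping tiles."""
    slide = openslide.OpenSlide(slide_path)
    x0, y0, x1, y1 = tumor_box  # level-0 pixel coordinates
    tiles = []
    for y in range(y0, y1 - TILE + 1, TILE):
        for x in range(x0, x1 - TILE + 1, TILE):
            region = slide.read_region((x, y), 0, (TILE, TILE)).convert("RGB")
            tiles.append(region)
    slide.close()
    return tiles
```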
S604, determining image blocks with blank rate exceeding a preset threshold value from a plurality of image blocks in each group of sample data, and removing the image blocks with blank rate exceeding the preset threshold value in each group of sample data to obtain target image blocks in each group of sample data.
Specifically, image blocks whose blank rate exceeds 30% are identified among the plurality of image blocks of the 119 patients by a preset algorithm implemented with the cross-platform computer vision library OpenCV (version 4.1.1) in Python, and are removed to obtain the target image blocks of the 119 patients.
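The embodiment names OpenCV and the 30% threshold but not the blank-rate computation itself; the sketch below assumes that "blank" means near-white background pixels, and the grayscale cutoff of 230 is an illustrative choice rather than a value from the embodiment.

```python
# A minimal blank-rate filter sketch; tiles are H x W x 3 uint8 BGR arrays
# (e.g., converted from PIL via np.array(tile)[:, :, ::-1]).
import cv2
import numpy as np

BLANK_RATE_THRESHOLD = 0.30  # from the embodiment: remove tiles > 30% blank

def blank_rate(tile_bgr):
    """Fraction of near-white pixels in one image block."""
    gray = cv2.cvtColor(tile_bgr, cv2.COLOR_BGR2GRAY)
    return float(np.mean(gray > 230))  # 230 is an assumed whiteness cutoff

target_tiles = [t for t in tiles if blank_rate(t) <= BLANK_RATE_THRESHOLD]
```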
S605, performing color normalization processing on the target image block in each group of sample data.
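Step S605 does not name a normalization algorithm; as one common choice for hematoxylin-eosin slides, the sketch below applies Reinhard-style mean/standard-deviation matching in LAB color space, which is an assumption of this sketch rather than the patent's stated method.

```python
# A minimal Reinhard-style color normalization sketch using OpenCV.
import cv2
import numpy as np

def lab_stats(tile_bgr):
    """Per-channel LAB mean and standard deviation of a tile."""
    lab = cv2.cvtColor(tile_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    flat = lab.reshape(-1, 3)
    return flat.mean(axis=0), flat.std(axis=0)

def reinhard_normalize(tile_bgr, ref_mean, ref_std):
    """Match a tile's LAB statistics to those of a reference tile."""
    lab = cv2.cvtColor(tile_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    mean, std = lab.reshape(-1, 3).mean(axis=0), lab.reshape(-1, 3).std(axis=0)
    lab = (lab - mean) / (std + 1e-6) * ref_std + ref_mean
    lab = np.clip(lab, 0, 255).astype(np.uint8)
    return cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)

ref_mean, ref_std = lab_stats(reference_tile)  # reference_tile is illustrative
normalized_tiles = [reinhard_normalize(t, ref_mean, ref_std) for t in target_tiles]
```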
S606, performing feature extraction on the target image block in each group of sample data by adopting a preset feature extraction network to obtain the image features of each group of sample data.
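The "preset feature extraction network" is left open by the embodiment; in the sketch below, a truncated ImageNet-pretrained ResNet-18 from torchvision serves purely as a stand-in, and pooling the per-tile features into one vector per patient by averaging is likewise an assumption.

```python
# A minimal per-patient image feature extraction sketch with PyTorch.
import torch
import torchvision.models as models

backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()  # drop the classifier, keep 512-d features
backbone.eval()

with torch.no_grad():
    # tile_batch: float tensor (num_tiles, 3, 224, 224) of one patient's
    # normalized target image blocks (an assumed preprocessing shape)
    tile_features = backbone(tile_batch)        # (num_tiles, 512)
    image_features = tile_features.mean(dim=0)  # one 512-d vector per patient
```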
S607, carrying out feature extraction on clinical pathology data in each group of sample data by adopting a preset random forest algorithm to obtain pathology features of each group of sample data.
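How the "preset random forest algorithm" derives pathology features is not detailed; one plausible reading, sketched with scikit-learn below, is to fit a forest on the clinical variables against the survival time labels and keep the most important variables. The choice of keeping eight variables is hypothetical.

```python
# A minimal random-forest feature selection sketch for the clinical data.
from sklearn.ensemble import RandomForestClassifier
import numpy as np

# clinical_matrix: (num_patients, num_variables) encoded clinical data
# survival_labels: (num_patients,) 0/1 survival time labels (illustrative names)
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(clinical_matrix, survival_labels)

top_k = np.argsort(forest.feature_importances_)[::-1][:8]  # hypothetical k = 8
pathology_features = clinical_matrix[:, top_k]  # selected pathology features
```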
S608, fusing the image features and the pathology features of each group of sample data by adopting a preset initial feature fusion module to obtain fusion features of each group of sample data.
S609, training the initial feature fusion module and a preset initial perception module according to the fusion features of multiple groups of sample data to obtain a survival rate prediction model for the preset cancer.
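Since both the "initial feature fusion module" and the "initial perception module" are described only as preset modules, the following PyTorch sketch assumes the simplest forms: concatenation followed by a linear fusion layer, and a small perceptron with a sigmoid output producing the survival rate within the preset time period.

```python
# A minimal fusion-plus-perception sketch; all dimensions are assumptions.
import torch
import torch.nn as nn

class SurvivalModel(nn.Module):
    def __init__(self, img_dim=512, path_dim=8, hidden=128):
        super().__init__()
        # feature fusion module: concatenate the two feature vectors, project
        self.fusion = nn.Linear(img_dim + path_dim, hidden)
        # perception module: map fused features to a survival rate in [0, 1]
        self.perception = nn.Sequential(nn.ReLU(), nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, img_feat, path_feat):
        fused = self.fusion(torch.cat([img_feat, path_feat], dim=-1))
        return self.perception(fused)

model = SurvivalModel()
criterion = nn.BCELoss()  # trained against the 0/1 survival time labels
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```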
The above steps are explained in detail in the survival rate prediction model training method provided in steps S101 to S504 and are not repeated here.
It should be noted that Fig. 8 is a receiver operating characteristic (ROC) curve according to an embodiment of the present application. As shown in Fig. 8, the survival rate prediction model trained only on image features, without fusing the image features and pathology features, achieves a final AUC of 0.725, whereas the survival rate prediction model trained on the fusion features achieves an AUC of 0.738. The model trained only on image features therefore has a lower AUC and a poorer prediction effect than the model trained on the fusion features.
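For reference, AUC figures of this kind are computed from held-out predictions; the scikit-learn call below shows the calculation, with the variable names being illustrative only.

```python
# AUC comparison of the image-only and fused models (illustrative names).
from sklearn.metrics import roc_auc_score

auc_image_only = roc_auc_score(test_labels, image_only_scores)  # e.g., 0.725
auc_fused = roc_auc_score(test_labels, fused_model_scores)      # e.g., 0.738
```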
The embodiment of the application also provides a lifetime prediction method, which can predict the lifetime of a patient to be predicted. Fig. 9 is a flowchart of the lifetime prediction method according to an embodiment of the present application. As shown in Fig. 9, the method includes:
S701, obtaining a staining pathology image and clinical pathology data of a patient to be predicted for a preset cancer.
In this embodiment, a staining pathology image and clinical pathology data of a patient to be predicted with the preset cancer are obtained. The staining pathology image is a pathology image obtained by staining the patient's pathology section with a hematoxylin-eosin stain, and the clinical pathology data comprise the patient's basic information and clinical treatment information, specifically including age, tumor stage of the preset cancer, survival time, last follow-up time, survival state, and the like.
S702, respectively extracting features of the staining pathology image and the clinical pathology data of the patient to be predicted to obtain image features and pathology features of the patient to be predicted.
Specifically, features may be extracted from the staining pathology image of the patient to be predicted by a preset feature extraction network to obtain the image features of the patient, and from the clinical pathology data of the patient by a preset random forest algorithm to obtain the pathology features of the patient.
S703, processing the image features and pathology features of the patient to be predicted with the survival rate prediction model for the preset cancer to obtain the survival rate of the patient in the time period corresponding to the preset time.
The survival rate prediction model for the preset cancer is a deep learning model based on the fusion of staining pathology images and clinical pathology data, obtained by the survival rate prediction model training method provided in the foregoing steps. After the image features and pathology features of the patient to be predicted are input into the model, the survival rate of the patient in the time period corresponding to the preset time is obtained.
S704, determining the survival time of the patient to be predicted from the detection time of the staining pathology image according to the survival rate in the time period corresponding to the preset time.
For example, suppose the preset survival threshold is 0.5 and the time period corresponding to the preset time is 3 years. If the survival rate of the patient to be predicted within 3 years is smaller than the preset survival threshold 0.5, the output label is 0, indicating that the survival time of the patient from the detection time of the staining pathology image is greater than 3 years; if the survival rate within 3 years is greater than or equal to 0.5, the output label is 1, indicating that the survival time from the detection time of the staining pathology image is less than or equal to 3 years. The survival time of the patient to be predicted from the detection time of the staining pathology image is thereby determined according to the survival rate in the time period corresponding to the preset time.
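The thresholding rule of this example can be transcribed directly; only the function name below is an invention of this sketch.

```python
def lifetime_label(survival_rate_3y, threshold=0.5):
    """Map the predicted 3-year survival rate to the output label.

    Label 1: survival time <= 3 years from the detection time of the image.
    Label 0: survival time > 3 years.
    """
    return 1 if survival_rate_3y >= threshold else 0
```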
In summary, the lifetime prediction method provided by the embodiment of the application comprises: obtaining a staining pathology image and clinical pathology data of a patient to be predicted for a preset cancer; respectively extracting features of the staining pathology image and the clinical pathology data to obtain the image features and pathology features of the patient; processing these features with the survival rate prediction model for the preset cancer to obtain the survival rate of the patient in the time period corresponding to the preset time; and determining, from that survival rate, the survival time of the patient from the detection time of the staining pathology image.
The specific implementation processes and technical effects of the survival rate prediction model training device, the lifetime prediction device, and the computer device provided by any embodiment of the present application are the same as those of the corresponding method embodiments; for brevity, reference may be made to the corresponding contents of the method embodiments, which are not repeated here.
Fig. 10 is a schematic diagram of the functional modules of a survival rate prediction model training device according to an embodiment of the present application. As shown in Fig. 10, the survival rate prediction model training device 100 includes:
an acquisition module 110, configured to acquire staining pathology images of a plurality of patients with a preset cancer and clinical pathology data of the plurality of patients;
a determining module 120, configured to determine survival time labels corresponding to the plurality of patients according to the clinical pathology data of the plurality of patients, where the survival time labels are used to indicate the survival state of the corresponding patient in a time period corresponding to a preset time;
a generating module 130, configured to generate a set of sample data according to the staining pathology image of each patient, the clinical pathology data of each patient, and the survival time label corresponding to each patient;
an extracting module 140, configured to perform feature extraction on the staining pathology image and the clinical pathology data in each set of sample data, respectively, to obtain the image features and pathology features of each set of sample data;
a fusion module 150, configured to fuse the image features and the pathology features of each set of sample data by using a preset initial feature fusion module, so as to obtain the fusion features of each set of sample data;
and a training module 160, configured to train the initial feature fusion module and a preset initial perception module according to the fusion features of the plurality of sets of sample data, so as to obtain a survival rate prediction model for the preset cancer.
In an alternative embodiment, the determining module 120 is further configured to determine, according to the clinical pathology data of the plurality of patients, the total survival time of each of the plurality of patients, where the total survival time of each patient is used to indicate the survival time from the detection time of the staining pathology image of that patient, and to generate the survival time labels corresponding to the patients according to the total survival time of each patient and the preset survival time.
In an alternative embodiment, the extracting module 140 is further configured to perform feature extraction on the staining pathology image in each set of sample data by using a preset feature extraction network to obtain the image features of each set of sample data, and to perform feature extraction on the clinical pathology data in each set of sample data by using a preset random forest algorithm to obtain the pathology features of each set of sample data.
In an alternative embodiment, the generating module 130 is further configured to determine a tumor region in the staining pathology image of each patient; divide the tumor region to obtain a plurality of image blocks of each patient; and generate a set of sample data from the plurality of image blocks of each patient, the clinical pathology data of each patient, and the survival time label corresponding to each patient.
In an alternative embodiment, the extracting module 140 is further configured to determine, from the plurality of image blocks in each set of sample data, the image blocks whose blank rate exceeds a preset threshold; remove those image blocks to obtain the target image blocks in each set of sample data; perform color normalization processing on the target image blocks; and extract features from the color-normalized target image blocks and the clinical pathology data, respectively, to obtain the image features and pathology features of each set of sample data.
Fig. 11 is a schematic functional block diagram of a lifetime prediction device according to an embodiment of the present application. As shown in Fig. 11, the lifetime prediction device 200 includes:
an acquisition module 210, configured to acquire a staining pathology image and clinical pathology data of a patient to be predicted for a preset cancer;
an extracting module 220, configured to perform feature extraction on the staining pathology image and the clinical pathology data of the patient to be predicted, respectively, to obtain the image features and pathology features of the patient to be predicted;
a processing module 230, configured to process the image features and pathology features of the patient to be predicted by using the survival rate prediction model for the preset cancer, so as to obtain the survival rate of the patient to be predicted in the time period corresponding to the preset time;
and a determining module 240, configured to determine the survival time of the patient to be predicted from the detection time of the staining pathology image according to the survival rate in the time period corresponding to the preset time.
The foregoing apparatus is used for executing the method provided in the foregoing embodiment, and its implementation principle and technical effects are similar, and are not described herein again.
The above modules may be one or more integrated circuits configured to implement the above methods, for example one or more application specific integrated circuits (Application Specific Integrated Circuit, ASIC), one or more microprocessors, or one or more field programmable gate arrays (Field Programmable Gate Array, FPGA), etc. For another example, when one of the above modules is implemented in the form of a processing element scheduling program code, the processing element may be a general-purpose processor, such as a central processing unit (Central Processing Unit, CPU) or another processor capable of invoking the program code. For another example, the modules may be integrated together and implemented in the form of a system-on-chip (SOC).
Fig. 12 is a schematic diagram of a computer device that may be used for survival rate prediction model training or lifetime prediction according to an embodiment of the present application. As shown in Fig. 12, the computer device 300 includes: a processor 310, a storage medium 320, and a bus 330.
The storage medium 320 stores machine-readable instructions executable by the processor 310. When the computer device is running, the processor 310 communicates with the storage medium 320 via the bus 330, and the processor 310 executes the machine-readable instructions to perform the steps of the method embodiments described above. The specific implementation manner and the technical effect are similar, and are not repeated here.
Optionally, the present application further provides a storage medium 320, where the storage medium 320 stores a computer program, which when executed by a processor performs the steps of the above-mentioned method embodiments. The specific implementation manner and the technical effect are similar, and are not repeated here.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in hardware plus software functional units.
The integrated units implemented in the form of software functional units may be stored in a computer-readable storage medium. Such a software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to perform some of the steps of the methods according to the embodiments of the invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The foregoing is merely illustrative of embodiments of the present invention and does not limit it; any changes or substitutions readily conceivable by those skilled in the art within the technical scope disclosed by the present invention shall fall within its protection scope. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (9)

1. A survival rate prediction model training method, the method comprising:
acquiring staining pathology images of a plurality of patients with a preset cancer and clinical pathology data of the plurality of patients;
determining survival time labels corresponding to the plurality of patients according to the clinical pathology data of the plurality of patients, wherein the survival time labels are used for indicating the survival states of the corresponding patients in the time periods corresponding to the preset time;
generating a group of sample data according to the staining pathology image of each patient, the clinical pathology data of each patient and the corresponding survival time label of each patient;
respectively extracting features of the staining pathology image and the clinical pathology data in each group of sample data to obtain image features and pathology features of each group of sample data;
fusing the image features and the pathology features of each group of sample data by adopting a preset initial feature fusion module to obtain fusion features of each group of sample data;
and training the initial feature fusion module and a preset initial perception module according to the fusion features of a plurality of groups of sample data to obtain a survival rate prediction model for the preset cancer.
2. The method of claim 1, wherein determining the survival time labels corresponding to the plurality of patients based on the clinical pathology data of the plurality of patients comprises:
determining a total survival time of the plurality of patients according to the clinical pathology data of the plurality of patients, wherein the total survival time of each patient is used for indicating the survival time from the detection time of the staining pathology image of each patient;
generating survival time labels corresponding to the patients according to the total survival time and the preset survival time of the patients.
3. The method according to claim 1, wherein the feature extraction is performed on the staining pathology image and the clinical pathology data in each set of sample data, respectively, to obtain the image feature and the pathology feature of each set of sample data, including:
performing feature extraction on the staining pathology image in each group of sample data by adopting a preset feature extraction network to obtain the image features of each group of sample data;
and performing feature extraction on the clinical pathology data in each group of sample data by adopting a preset random forest algorithm to obtain the pathology features of each group of sample data.
4. The method of claim 1, wherein generating a set of sample data from the staining pathology image of each patient, the clinical pathology data of each patient, and the survival time label of each patient comprises:
determining a tumor region in the stained pathology image of each patient;
dividing the tumor area in the staining pathology image of each patient to obtain a plurality of image blocks of each patient;
and generating a group of sample data according to the plurality of image blocks of each patient, the clinical pathology data of each patient and the corresponding survival time label of each patient.
5. The method according to claim 4, wherein the feature extraction is performed on the staining pathology image and the clinical pathology data in each set of sample data, respectively, to obtain the image feature and the pathology feature of each set of sample data, including:
determining image blocks with blank rate exceeding a preset threshold value from a plurality of image blocks in each group of sample data;
removing the image blocks of which the blank rate exceeds the preset threshold value in each group of sample data to obtain target image blocks in each group of sample data;
performing color normalization processing on the target image blocks in each group of sample data;
and respectively extracting features of the target image block and the clinical pathology data after the color normalization processing in each group of sample data to obtain image features and pathology features of each group of sample data.
6. A lifetime prediction method, the method comprising:
obtaining a staining pathology image and clinical pathology data of a patient to be predicted for a preset cancer;
respectively extracting features of the staining pathology image and the clinical pathology data of the patient to be predicted to obtain image features and pathology features of the patient to be predicted;
processing the image features and pathology features of the patient to be predicted with a survival rate prediction model for the preset cancer to obtain the survival rate of the patient to be predicted in a time period corresponding to the preset time;
and determining the survival time of the patient to be predicted from the detection time of the staining pathology image according to the survival rate in the time period corresponding to the preset time.
7. A survival rate prediction model training apparatus, the apparatus comprising:
an acquisition module for acquiring staining pathology images of a plurality of patients with a preset cancer and clinical pathology data of the plurality of patients;
the determining module is used for determining survival time labels corresponding to the plurality of patients according to the clinical pathology data of the plurality of patients, wherein the survival time labels are used for indicating the survival states of the corresponding patients in the time periods corresponding to the preset time;
the generation module is used for generating a group of sample data according to the staining pathology image of each patient, the clinical pathology data of each patient and the survival time label corresponding to each patient;
the extraction module is used for respectively extracting features of the staining pathology image and the clinical pathology data in each group of sample data to obtain image features and pathology features of each group of sample data;
the fusion module is used for fusing the image features and the pathology features of each group of sample data by adopting a preset initial feature fusion module to obtain fusion features of each group of sample data;
and the training module is used for training the initial feature fusion module and the preset initial perception module according to the fusion features of the plurality of groups of sample data to obtain a survival rate prediction model for the preset cancer.
8. A lifetime prediction device, the device comprising:
the acquisition module is used for acquiring a staining pathology image and clinical pathology data of a patient to be predicted for a preset cancer;
the extraction module is used for respectively extracting features of the staining pathology image and the clinical pathology data of the patient to be predicted to obtain image features and pathology features of the patient to be predicted;
the processing module is used for processing the image features and pathology features of the patient to be predicted by adopting a survival rate prediction model for the preset cancer to obtain the survival rate of the patient to be predicted in a time period corresponding to the preset time;
and the determining module is used for determining the survival time of the patient to be predicted from the detection time of the staining pathology image according to the survival rate in the time period corresponding to the preset time.
9. A computer device, comprising: a processor, a storage medium, and a bus, the storage medium storing program instructions executable by the processor, the processor and the storage medium communicating over the bus when the computer device is running, the processor executing the program instructions to perform the steps of the survival rate prediction model training method of any one of claims 1 to 5, or the steps of the lifetime prediction method of claim 6.
CN202310631754.6A 2023-05-31 2023-05-31 Survival rate prediction model training method, survival period prediction method, survival rate prediction device and survival rate prediction equipment Pending CN116597985A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310631754.6A CN116597985A (en) 2023-05-31 2023-05-31 Survival rate prediction model training method, survival period prediction method, survival rate prediction device and survival rate prediction equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310631754.6A CN116597985A (en) 2023-05-31 2023-05-31 Survival rate prediction model training method, survival period prediction method, survival rate prediction device and survival rate prediction equipment

Publications (1)

Publication Number Publication Date
CN116597985A true CN116597985A (en) 2023-08-15

Family

ID=87611508

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310631754.6A Pending CN116597985A (en) 2023-05-31 2023-05-31 Survival rate prediction model training method, survival period prediction method, survival rate prediction device and survival rate prediction equipment

Country Status (1)

Country Link
CN (1) CN116597985A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116994745A (en) * 2023-09-27 2023-11-03 中山大学附属第六医院 Multi-mode model-based cancer patient prognosis prediction method and device
CN116994745B (en) * 2023-09-27 2024-02-13 中山大学附属第六医院 Multi-mode model-based cancer patient prognosis prediction method and device
CN117423479A (en) * 2023-12-19 2024-01-19 神州医疗科技股份有限公司 Prediction method and system based on pathological image data

Similar Documents

Publication Publication Date Title
EP3916674B1 (en) Brain image segmentation method, apparatus, network device and storage medium
US11908139B1 (en) Systems and methods for training a statistical model to predict tissue characteristics for a pathology image
US20210312627A1 (en) Image analysis method, apparatus, program, and learned deep learning algorithm
CN116597985A (en) Survival rate prediction model training method, survival period prediction method, survival rate prediction device and survival rate prediction equipment
CN108629768B (en) Method for segmenting epithelial tissue in esophageal pathology image
EP2894599B1 (en) Information processing device, information processing method, and program
CN111798425B (en) Intelligent detection method for mitotic image in gastrointestinal stromal tumor based on deep learning
CN115036002B (en) Treatment effect prediction method based on multi-mode fusion model and terminal equipment
Xu et al. Using transfer learning on whole slide images to predict tumor mutational burden in bladder cancer patients
CN112215807A (en) Cell image automatic classification method and system based on deep learning
CN106780522A (en) A kind of bone marrow fluid cell segmentation method based on deep learning
Gundersen et al. End-to-end training of deep probabilistic CCA on paired biomedical observations
CN110246109A (en) Merge analysis system, method, apparatus and the medium of CT images and customized information
CN113077875B (en) CT image processing method and device
Zheng et al. Integrating semi-supervised and supervised learning methods for label fusion in multi-atlas based image segmentation
WO2016146469A1 (en) Tissue sample analysis technique
CN112614573A (en) Deep learning model training method and device based on pathological image labeling tool
CN112950583A (en) Method and device for training cell counting model in pathological image
Chidester et al. Discriminative bag-of-cells for imaging-genomics
CN117036288A (en) Tumor subtype diagnosis method for full-slice pathological image
CN117612711B (en) Multi-mode prediction model construction method and system for analyzing liver cancer recurrence data
CN114283406A (en) Cell image recognition method, device, equipment, medium and computer program product
Wang et al. Integrative Analysis for Lung Adenocarcinoma Predicts Morphological Features Associated with Genetic Variations.
Martin et al. A graph based neural network approach to immune profiling of multiplexed tissue samples
Sarnecki et al. A robust nonlinear tissue-component discrimination method for computational pathology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination