CN109642258A

CN109642258A - A method and system for tumor prognosis prediction

Info

Publication number: CN109642258A
Application number: CN201880002164.4A
Authority: CN
Inventors: 张道允; 巩子英; 孙永华; 叶建伟; 王伟
Original assignee: Shanghai Yunying Medical Technology Co Ltd
Current assignee: Zhejiang Yunying Medical Technology Co ltd
Priority date: 2018-10-17
Filing date: 2018-10-17
Publication date: 2019-04-16
Anticipated expiration: 2038-10-17
Also published as: CN109642258B; WO2020077552A1

Abstract

The embodiments of the present application disclose a method and system for tumor prognosis prediction. The tumor prognosis prediction method includes: acquiring characteristic information of a tumor patient, the characteristic information at least reflecting gene mutation information of the tumor patient at the tumor site; and determining a prognosis prediction result of the tumor patient with a tumor prognosis prediction model based on the characteristic information of the tumor patient. Because the tumor prognosis prediction model is built from tumor patient data, the application can improve the accuracy of tumor prognosis prediction.

Description

Method and system for tumor prognosis prediction

Technical Field

The present application relates to the medical field, and in particular, to a method and system for tumor prognosis prediction.

Background

Tumors (e.g., osteosarcoma) are the second leading cause of death worldwide, and both their incidence and mortality are increasing. Despite advances in tumor diagnosis and treatment, patient mortality is still not effectively controlled, and recurrence and metastasis are the main causes of death in tumor patients. For example, osteosarcoma can metastasize to tissues and organs such as the lung and spinal cord, which seriously threatens patients' lives.

At present, tumors are evaluated clinically mainly through pathological and imaging-based morphological changes, using indicators such as patient age, pathological type of the tumor, surgical stage, and residual tumor. With the development of molecular biology, molecular pathological epidemiology, and related technologies, screening for tumor-related genes and molecular markers at the molecular level has become a focus of current tumor research. At the molecular level of tumor cells, such markers can provide reference indications for surgery in tumor patients, predict postoperative recurrence or metastasis, provide objective indications for radical treatment, and supply targets for anti-metastasis therapy.

Therefore, studying differences in gene expression during tumor formation, progression, and drug resistance, and analyzing gene activation and inhibition in tumors, is important for evaluating patients' disease condition and prognosis more comprehensively and accurately and for enabling individualized treatment of tumor patients. It is also a focus of attention for those skilled in the art.

Disclosure of Invention

One embodiment of the present application provides a method for predicting tumor prognosis, including: acquiring characteristic information of a tumor patient, wherein the characteristic information at least reflects gene mutation information of the tumor patient; and determining the prognosis prediction result of the tumor patient according to a tumor prognosis prediction model based on the characteristic information of the tumor patient.

In some embodiments, the gene mutation information includes genes mutated on the DNA and their mutation abundances, and/or tumor prognosis prediction-related genes on the DNA and their mutation abundances.

In some embodiments, obtaining the characteristic information of the tumor patient further comprises: obtaining a tissue sample from the tumor patient; extracting DNA from the tissue sample; preparing a sequencing library from the DNA; sequencing the library to obtain a sequencing result; and analyzing the sequencing result to determine the gene mutation information of the tumor patient.
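The text does not say how "mutation abundance" is derived from the sequencing result. A common convention, assumed here, is the variant allele frequency: the fraction of reads covering a position that carry the mutant allele. The following minimal sketch computes it; the record layout (`VariantCall` with `gene`, `alt_reads`, `depth`) and the rule of keeping the highest value when a gene carries several mutations are hypothetical choices for illustration only.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    """One mutation called from the sequencing result (hypothetical record layout)."""
    gene: str
    alt_reads: int  # reads supporting the mutant allele
    depth: int      # total reads covering the mutated position


def mutation_abundance(call: VariantCall) -> float:
    """Assumed definition of mutation abundance: variant allele frequency."""
    if call.depth <= 0:
        raise ValueError(f"non-positive depth for {call.gene}")
    if not 0 <= call.alt_reads <= call.depth:
        raise ValueError(f"alt_reads out of range for {call.gene}")
    return call.alt_reads / call.depth


def abundance_by_gene(calls: list[VariantCall]) -> dict[str, float]:
    """One value per gene; if a gene carries several mutations, keep the highest (assumption)."""
    result: dict[str, float] = {}
    for call in calls:
        result[call.gene] = max(result.get(call.gene, 0.0), mutation_abundance(call))
    return result


calls = [
    VariantCall("TP53", alt_reads=120, depth=400),
    VariantCall("RB1", alt_reads=15, depth=500),
    VariantCall("TP53", alt_reads=30, depth=300),
]
print(abundance_by_gene(calls))  # {'TP53': 0.3, 'RB1': 0.03}
```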

In some embodiments, the characteristic information further comprises at least one of the following items of the tumor patient: age, gender, smoking history, years of education, years of employment, treatment regimen, and sample storage time.

In some embodiments, the tumor prognosis prediction model is a support vector machine model or a neural network model.

In some embodiments, the tumor prognosis prediction method further comprises: training an initial model with the characteristic information and prognosis information of a plurality of tumor patients to obtain the tumor prognosis prediction model.

In some embodiments, training the initial model with the characteristic information and prognosis information of a plurality of tumor patients to obtain the tumor prognosis prediction model comprises: removing, from the gene mutation information of the plurality of tumor patients, information on mutated genes whose mutation abundance is below a set threshold.
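The abundance filter can be sketched as below. The text only speaks of "a set threshold"; the default of 0.05 used here is a hypothetical value, and each patient's mutations are assumed to be held as a mapping from gene symbol to abundance (a fraction between 0 and 1).

```python
# Each patient's mutations: gene symbol -> mutation abundance (fraction of reads, 0..1).
PatientMutations = dict[str, float]

# Hypothetical cutoff; the document does not state the threshold value.
DEFAULT_MIN_ABUNDANCE = 0.05


def filter_low_abundance(
    patients: list[PatientMutations],
    min_abundance: float = DEFAULT_MIN_ABUNDANCE,
) -> list[PatientMutations]:
    """Drop every mutation whose abundance is below min_abundance."""
    return [
        {gene: vaf for gene, vaf in muts.items() if vaf >= min_abundance}
        for muts in patients
    ]


cohort = [
    {"TP53": 0.30, "RB1": 0.03},
    {"ATRX": 0.12, "KMT2C": 0.049},
]
print(filter_low_abundance(cohort))  # [{'TP53': 0.3}, {'ATRX': 0.12}]
```

Mutations exactly at the threshold are kept, since the text removes only those whose abundance is less than the threshold.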

In some embodiments, training the initial model with the characteristic information and prognosis information of a plurality of tumor patients to obtain the tumor prognosis prediction model comprises: removing redundant gene mutation information from the gene mutation information of the plurality of tumor patients.
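"Redundant" is not defined in the text. One plausible reading, assumed in this sketch, is that a gene is redundant when its abundance profile across patients is highly correlated with a gene already kept; duplicated profiles are the extreme case. The greedy order and the 0.95 cutoff are hypothetical choices.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def drop_redundant(
    genes: list[str],
    columns: dict[str, list[float]],
    max_abs_corr: float = 0.95,  # hypothetical cutoff
) -> list[str]:
    """Greedily keep a gene only if its abundance column (one value per
    patient) is not highly correlated with any gene already kept."""
    kept: list[str] = []
    for gene in genes:
        if all(abs(pearson(columns[gene], columns[k])) < max_abs_corr for k in kept):
            kept.append(gene)
    return kept


columns = {
    "TP53": [0.3, 0.0, 0.2, 0.1],
    "RB1": [0.6, 0.0, 0.4, 0.2],   # exactly 2x TP53, so perfectly correlated
    "ATRX": [0.0, 0.5, 0.0, 0.1],
}
print(drop_redundant(["TP53", "RB1", "ATRX"], columns))  # ['TP53', 'ATRX']
```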

In some embodiments, the tumor prognosis prediction model is a support vector machine model, and training the initial model with the characteristic information and prognosis information of a plurality of tumor patients to obtain the tumor prognosis prediction model comprises: determining at least some genes as tumor prognosis prediction-related genes according to the contribution value of each item of gene mutation information in the characteristic information of the plurality of tumor patients to the support vector machine model; and training the initial model with the gene mutation information of the tumor prognosis prediction-related genes of the plurality of tumor patients and their prognosis information to obtain the tumor prognosis prediction model.
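The text does not define the "contribution value". For a linear support vector machine, a common choice, assumed here, is the magnitude of each feature's weight (the squared weight is the ranking criterion in SVM-RFE and gives the same order). The sketch below trains a linear SVM with Pegasos-style stochastic subgradient descent in plain Python and ranks genes by |w_j|. The learning settings and the toy cohort are illustrative assumptions, not the document's data or method.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=500, seed=0):
    """Linear SVM (hinge loss + L2 penalty) trained with Pegasos-style
    stochastic subgradient steps. A constant 1.0 feature is appended so the
    last weight acts as a (regularized) bias. Labels must be +1 or -1."""
    rng = random.Random(seed)
    data = [(list(row) + [1.0], label) for row, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink (L2 penalty step)
            if margin < 1.0:                            # hinge loss is active
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def rank_genes_by_contribution(genes, X, y, top_k):
    """Rank genes by |w_j| of the trained linear SVM and keep the top_k."""
    w = train_linear_svm(X, y)[:-1]  # drop the bias weight
    order = sorted(range(len(genes)), key=lambda j: abs(w[j]), reverse=True)
    return [genes[j] for j in order[:top_k]]


# Toy cohort: columns are TP53, RB1, ATRX abundances; -1 = poor, +1 = good.
genes = ["TP53", "RB1", "ATRX"]
X = [
    [0.2, 0.7, 0.2], [0.1, 0.8, 0.3], [0.3, 0.9, 0.1],   # poor outcome
    [0.2, 0.2, 0.2], [0.1, 0.0, 0.3], [0.3, 0.1, 0.1],   # good outcome
]
y = [-1, -1, -1, 1, 1, 1]
print(rank_genes_by_contribution(genes, X, y, top_k=1))
```

On this toy data only RB1 separates the two outcome groups, so it is expected to rank first; the selected genes would then be used to retrain the model as the paragraph describes.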

In some embodiments, the tumor prognosis prediction model is a support vector machine model, and training the initial model to obtain the tumor prognosis prediction model further comprises: optimizing the parameters of the support vector machine model with a particle swarm optimization algorithm or a grid search method.
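Both search strategies named above can be sketched as generic maximizers over SVM hyperparameters. This sketch assumes the tuned parameters are the penalty C and the RBF kernel width gamma, searched on a log2 scale, and that the score would be a cross-validated accuracy; `toy_score` is a hypothetical stand-in so the example runs without training an SVM. The particle swarm shown is the standard global-best variant with inertia 0.7 and acceleration coefficients 1.5, which are common textbook settings rather than values from the document.

```python
import itertools
import random
from collections.abc import Callable, Sequence

Objective = Callable[[list[float]], float]


def pso_maximize(objective: Objective, bounds: Sequence[tuple[float, float]],
                 n_particles: int = 20, iterations: int = 100, inertia: float = 0.7,
                 c_personal: float = 1.5, c_global: float = 1.5,
                 seed: int = 0) -> tuple[list[float], float]:
    """Global-best particle swarm optimization over a box; returns (best_x, best_score)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_val = [objective(p) for p in pos]
    g = max(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]
    for _ in range(iterations):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp to box
            val = objective(pos[i])
            if val > best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val > g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


def grid_search_maximize(objective: Objective,
                         axes: Sequence[Sequence[float]]) -> tuple[list[float], float]:
    """Exhaustive search over the Cartesian product of per-parameter value lists."""
    best = max(itertools.product(*axes), key=lambda p: objective(list(p)))
    return list(best), objective(list(best))


# Hypothetical stand-in for cross-validated accuracy as a function of
# (log2 C, log2 gamma); a real run would train and validate an SVM here.
def toy_score(p: list[float]) -> float:
    return -((p[0] - 3.0) ** 2 + (p[1] + 5.0) ** 2)


bounds = [(-5.0, 15.0), (-15.0, 3.0)]
print(pso_maximize(toy_score, bounds))
print(grid_search_maximize(toy_score, [range(-5, 16, 2), range(-15, 4, 2)]))
```

Grid search evaluates every combination and so scales poorly with finer grids, while the swarm spends a fixed budget of `n_particles * iterations` evaluations; that trade-off is one reason both options are offered.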

In some embodiments, the prognosis prediction result includes: progressive disease, stable disease, partial response, and complete response; alternatively, the prognosis prediction result includes: good therapeutic effect and poor therapeutic effect.
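The two output formats can be related by collapsing the four-level outcome into a binary label. The grouping below is an assumption, not stated in the text: it follows the usual objective-response convention, in which complete and partial responses count as responders (good effect) and stable or progressive disease counts as poor. The ±1 encoding matches the labels a support vector machine expects.

```python
from enum import Enum


class Response(Enum):
    """Four-level outcome named in the text (RECIST-style abbreviations)."""
    PROGRESSIVE_DISEASE = "PD"
    STABLE_DISEASE = "SD"
    PARTIAL_RESPONSE = "PR"
    COMPLETE_RESPONSE = "CR"


# Assumed grouping: objective responders (CR, PR) -> good; SD, PD -> poor.
GOOD_RESPONSES = {Response.COMPLETE_RESPONSE, Response.PARTIAL_RESPONSE}


def to_binary_label(response: Response) -> int:
    """+1 for a good therapeutic effect, -1 for a poor one (SVM-style labels)."""
    return 1 if response in GOOD_RESPONSES else -1


print([to_binary_label(Response(code)) for code in ["CR", "SD", "PR", "PD"]])  # [1, -1, 1, -1]
```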

In some embodiments, the tumor is osteosarcoma.

In some embodiments, the characteristic information reflects at least the mutation information of at least one of the following genes in osteosarcoma patients: KMT2C, SOX9, LRP1B, NF-1, PRKDC, FAT1, STAG2, SLIT2, NOTCH1, EPHA7, ATRX, KDM6A, APC, RANBP2, RARA-AS1, C11orf30, ROS1, ARID2, TAF1, DICER1, MSH2, MSH6, TP53, KDM5A, JAK2, ALK, RB1, NOTCH2, and RICTOR.
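A model needs a fixed-length input, so mutation information over a gene panel like the one above is typically laid out as one position per gene. The sketch assumes that absent mutations are encoded as abundance 0.0 (a binary mutated/not-mutated flag would be an alternative) and uses a short illustrative subset of the listed genes rather than the full panel.

```python
# Illustrative subset of the osteosarcoma gene list; the fixed order defines
# which position in the feature vector belongs to which gene.
PANEL = ("KMT2C", "SOX9", "LRP1B", "ATRX", "TP53", "RB1")


def panel_vector(mutations: dict[str, float],
                 panel: tuple[str, ...] = PANEL) -> list[float]:
    """Mutation abundance per panel gene, 0.0 where no mutation was detected
    (assumed encoding). Genes outside the panel are ignored."""
    return [mutations.get(gene, 0.0) for gene in panel]


print(panel_vector({"TP53": 0.3, "RB1": 0.12, "EGFR": 0.4}))
# [0.0, 0.0, 0.0, 0.0, 0.3, 0.12]
```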

In some embodiments, the gene mutation information of the tumor patient is gene mutation information of the osteosarcoma lesion site.

One embodiment of the present application provides a tumor prognosis prediction system, including an obtaining module and a prediction module, where the obtaining module is configured to obtain feature information of a tumor patient, and the feature information at least reflects gene mutation information of the tumor patient; the prediction module is used for determining the prognosis prediction result of the tumor patient according to the tumor prognosis prediction model based on the characteristic information of the tumor patient.
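The two-module structure can be pictured as a small composition of objects. This is a minimal sketch under stated assumptions: the in-memory feature store, the patient IDs, the "non-negative score means good prognosis" convention, and the toy linear scorer standing in for a trained support vector machine are all hypothetical.

```python
from collections.abc import Callable, Mapping
from dataclasses import dataclass

FeatureVector = list[float]


@dataclass
class ObtainingModule:
    """Returns a patient's feature vector from a hypothetical in-memory store."""
    store: Mapping[str, FeatureVector]

    def obtain(self, patient_id: str) -> FeatureVector:
        return list(self.store[patient_id])


@dataclass
class PredictionModule:
    """Wraps a trained model given as a scoring callable; a non-negative
    score is read as a good prognosis (assumed convention)."""
    model: Callable[[FeatureVector], float]

    def predict(self, features: FeatureVector) -> str:
        return "good" if self.model(features) >= 0 else "poor"


@dataclass
class PrognosisSystem:
    """Chains the two modules: obtain features, then predict."""
    obtaining: ObtainingModule
    prediction: PredictionModule

    def run(self, patient_id: str) -> str:
        return self.prediction.predict(self.obtaining.obtain(patient_id))


# Toy linear scorer w . x + b standing in for a trained SVM decision function.
weights, bias = [0.3, -4.0], 1.7
system = PrognosisSystem(
    ObtainingModule({"P001": [0.2, 0.7], "P002": [0.2, 0.1]}),
    PredictionModule(lambda x: sum(w * xi for w, xi in zip(weights, x)) + bias),
)
print(system.run("P001"), system.run("P002"))  # poor good
```

Keeping the model behind a plain callable lets the prediction module stay unchanged whether the trained model is a support vector machine or a neural network, the two options the text names.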

In some embodiments, the system further comprises a training module configured to train an initial model with the characteristic information and prognosis information of a plurality of tumor patients to obtain the tumor prognosis prediction model.

In some embodiments, the training module is further configured to remove, from the gene mutation information of the plurality of tumor patients, information on mutated genes whose mutation abundance is below a set threshold.

In some embodiments, the training module is further configured to remove redundant gene mutation information from the gene mutation information of the plurality of tumor patients.

In some embodiments, the tumor prognosis prediction model is a support vector machine model, and the training module is further configured to: determine at least some genes as tumor prognosis prediction-related genes according to the contribution value of each item of gene mutation information in the characteristic information of a plurality of tumor patients to the support vector machine model; and train the initial model with the gene mutation information of the tumor prognosis prediction-related genes of the plurality of tumor patients and their prognosis information to obtain the tumor prognosis prediction model.

In some embodiments, the tumor prognosis prediction model is a support vector machine model, and the training module is further configured to optimize the parameters of the support vector machine model with a particle swarm optimization algorithm or a grid search method.

In some embodiments, the tumor is osteosarcoma.

One embodiment of the present application provides a tumor prognosis prediction apparatus including at least one processor and at least one memory; the at least one memory is configured to store computer instructions, and the at least one processor is configured to execute at least some of the computer instructions to implement the tumor prognosis prediction method.

One of the embodiments of the present application provides a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the method for tumor prognosis prediction.

One embodiment of the present application provides a tumor prognosis prediction system, including: at least one computer-readable storage medium comprising a set of instructions for tumor prognosis prediction; and at least one processor in communication with the at least one storage medium, wherein the at least one processor, when executing the set of instructions, is configured to: acquire characteristic information of a tumor patient, the characteristic information at least reflecting gene mutation information of the tumor patient; and determine the prognosis prediction result of the tumor patient with the tumor prognosis prediction model based on the characteristic information of the tumor patient.

Drawings

The present application will be further explained by way of exemplary embodiments, which will be described in detail by way of the accompanying drawings. These embodiments are not intended to be limiting, and in these embodiments like numerals are used to indicate like structures, wherein:

FIG. 1 is a schematic diagram of an application scenario of a tumor prognosis prediction system according to some embodiments of the present application;

FIG. 2 is an architectural diagram of a computing device shown in accordance with some embodiments of the present application;

FIG. 3 is a block diagram of a tumor prognosis prediction system according to some embodiments of the present application;

FIG. 4 is an exemplary flow chart of a tumor prognosis prediction method according to some embodiments of the present application;

FIG. 5 is an exemplary flow chart for determining gene mutation information for a tumor patient according to some embodiments of the present application;

FIG. 6 is an exemplary flow chart for training a tumor prognosis prediction model according to some embodiments of the present application;

FIG. 7 is a gene mutation heatmap of osteosarcoma patients according to exemplary embodiments of the present application;

FIG. 8 is a heat map of gene mutations in osteosarcoma patients with good therapeutic effect according to an exemplary embodiment of the present application;

FIG. 9 is a heat map of gene mutations in osteosarcoma patients with poor therapeutic effect according to exemplary embodiments of the present application; and

FIG. 10 is a schematic diagram illustrating verification of the prediction results of a tumor prognosis prediction model according to an exemplary embodiment of the present application.

Detailed Description

To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in describing the embodiments are briefly introduced below. Obviously, the drawings described below are only some examples or embodiments of the application, and a person skilled in the art can apply the application to other similar scenarios based on these drawings without inventive effort. Unless apparent from the context or otherwise indicated, like reference numbers in the figures refer to the same structure or operation.

It should be understood that the terms "system," "device," "unit," and/or "module" as used herein are a way of distinguishing different components, elements, parts, portions, or assemblies at different levels. However, these words may be replaced by other expressions that serve the same purpose.

As used in this application and the appended claims, the singular forms "a," "an," and "the" may also include the plural unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; these steps and elements do not form an exclusive list, and a method or apparatus may include other steps or elements.

Flow charts are used herein to illustrate operations performed by systems according to embodiments of the present application. It should be understood that the operations are not necessarily performed in the exact order shown. Rather, the steps may be processed in reverse order or simultaneously. Other operations may also be added to these processes, or one or more steps may be removed from them.

Fig. 1 is a schematic diagram illustrating an application scenario of a tumor prognosis prediction system 100 according to some embodiments of the present application. As shown in fig. 1, the tumor prognosis prediction system 100 can include a server 110, a network 120, and a database 130. In some embodiments, the database 130 can store basic patient information, disease history, and treatment plan data, and can also store patient genetic information, such as gene mutation information of the tumor patient 140 at the tumor site, genetic information of normal tissue of the tumor patient, reference genetic information, and the like. A biological tissue or fluid sample from a patient, such as tissue sample 145 from tumor patient 140, may be stored in a dedicated storage facility for further processing, such as gene sequencing. In particular, tissue sample 145 may comprise a tumor tissue sample from the patient or a tissue sample from another part of the patient's body. The server 110 may be used to process and analyze the relevant information to generate a prognosis prediction result. In some embodiments, the server 110 may obtain relevant information and/or data from the database 130 (e.g., gene mutation information of the tumor patient at the tumor site, basic information of the tumor patient, reference gene data, etc.), or directly obtain relevant information and/or data produced when staff or other equipment process the tissue sample 145 of the tumor patient 140.

The server 110 may be a single server or a server group. The server group may be centralized, such as a data center, or distributed, such as a distributed system. The server 110 may be local or remote. In some embodiments, the server 110 may be implemented on a cloud platform. By way of example only, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an inter-cloud, a multi-cloud, and the like, or any combination thereof. In some embodiments, server 110 may be implemented on a computing device 200 having at least one of the components shown in FIG. 2.

In some embodiments, the server 110 may include a processing engine 112. The processing engine 112 may be used to execute instructions (program code) of the server 110. For example, the processing engine 112 can execute instructions for analyzing the characteristic information of the tumor patient 140 to obtain a tumor prognosis prediction result. These instructions may be stored as computer instructions in a computer-readable storage medium (not shown). In some embodiments, the processing engine 112 may include one or more sub-processing devices (e.g., a single-core or multi-core processing device). By way of example only, the processing engine 112 may include a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), an Application Specific Instruction-set Processor (ASIP), a Graphics Processing Unit (GPU), a Physics Processing Unit (PPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), a Programmable Logic Device (PLD), a controller, a microcontroller unit, a Reduced Instruction Set Computer (RISC), a microprocessor, or the like, or any combination thereof.

The network 120 may provide a conduit for the exchange of information. In some embodiments, information may be exchanged between server 110 and database 130 via network 120. For example, server 110 may receive reference gene data in database 130 via network 120. In some embodiments, information related to tumor patient 140 and/or tissue sample 145 may be transmitted to server 110 and/or database 130 via network 120. For example, characteristic information (e.g., genetic mutation information, basic information, etc.) of the tumor patient 140 may be transmitted to the server 110 via the network 120. In some embodiments, the network 120 may be any type of wired or wireless network. For example, network 120 may include a cable network, a wired network, a fiber optic network, a telecommunications network, an intranet, the Internet, a Local Area Network (LAN), a Wide Area Network (WAN), a Wireless Local Area Network (WLAN), a Metropolitan Area Network (MAN), a Public Switched Telephone Network (PSTN), a Bluetooth network, a ZigBee network, a Near Field Communication (NFC) network, the like, or any combination thereof.

The database 130 may be used to store data and/or sets of instructions. In some embodiments, database 130 may store data obtained from server 110. In some embodiments, database 130 may store information and/or instructions that the server 110 executes or uses to perform the exemplary methods described herein. In some embodiments, the database 130 may store reference gene data. Specifically, the database 130 may store gene data from various genomic databases and/or gene data reported in the existing literature as having an influence (or a significant influence) on tumorigenesis. The genomic databases may include, but are not limited to, the COSMIC, ClinVar, HGMD, OMIM, TCGA, and GeneCards databases. In some embodiments, database 130 may include mass storage, removable storage, volatile read-write memory, read-only memory (ROM), the like, or any combination thereof. In some embodiments, database 130 may be implemented on a cloud platform. By way of example only, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an inter-cloud, a multi-cloud, and the like, or any combination thereof. In some embodiments, database 130 may be part of server 110.

In some embodiments, the tumor patient 140 may be a patient having one or more neoplastic diseases. The neoplastic disease may include a carcinoma, a sarcoma, a benign tumor, or the like, or any combination thereof. Specifically, carcinomas may include squamous cell carcinoma, adenocarcinoma, undifferentiated carcinoma, and the like. For example, squamous cell carcinoma may occur in the skin, esophagus, lung, cervix, vagina, vulva, penis, and the like. Adenocarcinoma may occur in the digestive tract, lung, uterine body, breast, ovary, prostate, thyroid, liver, kidney, pancreas, gallbladder, and the like. Sarcomas may include, but are not limited to: soft tissue sarcoma, osteosarcoma, malignant fibrous histiocytoma, bilateral sarcoma, rhabdomyosarcoma, lymphosarcoma, synovial sarcoma, leiomyosarcoma, etc. Benign tumors may include, but are not limited to, hamartoma, benign tumors of the pancreas, thyroid adenoma, mammary fibroma, uterine tumor, gastrointestinal leiomyoma, soft tissue fibroma, synovioma, ligament fibroma, and the like. In one embodiment of the present application, the tumor patient 140 can be an osteosarcoma patient. In some embodiments, the tumor patient 140 can be a patient with a tumor at any stage (e.g., early, middle, late, etc.). The tumor patient 140 can also be a patient at any stage of treatment (e.g., pre-treatment, under treatment, post-treatment, etc.).

In some embodiments, the tissue sample 145 may be used to reflect information about the tumor of the tumor patient 140. In particular, the tissue sample 145 may be a biological tissue or fluid sample taken from a tumor site (e.g., a target lesion) and/or a non-tumor site (e.g., a site other than the lesion) of the tumor patient 140. For example, tissue samples may include, but are not limited to: sputum, blood samples, fresh tissue (e.g., surgical tissue, biopsy tissue, etc.), paraffin-embedded tissue, urine, serous cavity effusion (e.g., ascites, pleural effusion, pericardial effusion, etc.), or tissue, cells, etc. extracted from a tumor site, or any combination thereof. In some embodiments, the tissue sample 145 may include tissue and cells of the tumor patient 140 from both the tumor site and sites other than the tumor. In some embodiments, the tissue sample 145 may include only tissue and cells of the tumor patient 140 from the tumor site.

In some embodiments, information related to the tumor patient 140 and/or the tissue sample 145 may be transmitted to one or more components of the tumor prognosis prediction system 100 (e.g., server 110, database 130) manually (e.g., personnel) or by machine (e.g., a robotic device, etc.).

FIG. 2 is a schematic diagram of an architecture of a computing device 200 shown in accordance with some embodiments of the present application. As shown in fig. 2, computing device 200 may include a processor 210, a memory 220, input/output interfaces 230, and communication ports 240. Server 110 and/or database 130 may be implemented on the computing device 200. For example, the processing engine 112 may be implemented on the computing device 200 and configured to perform the functions of the processing engine 112 in the present application.

The processor 210 may execute computing instructions (program code) and perform the functions of the server 110 described herein. The computing instructions may include programs, objects, components, data structures, procedures, modules, and functions (here, "functions" refers to the specific functions described in this application). For example, processor 210 may process instructions for tumor prognosis prediction in the tumor prognosis prediction system 100. In some embodiments, processor 210 may include microcontrollers, microprocessors, Reduced Instruction Set Computers (RISC), Application Specific Integrated Circuits (ASIC), Application Specific Instruction-set Processors (ASIP), Central Processing Units (CPU), Graphics Processing Units (GPU), Physics Processing Units (PPU), microcontroller units, Digital Signal Processors (DSP), Field Programmable Gate Arrays (FPGA), Advanced RISC Machines (ARM), programmable logic devices, any circuit or processor capable of executing one or more functions, or the like, or any combination thereof. For illustration only, only one processor 210 is depicted in FIG. 2, but it should be noted that the present application may include multiple processors.

Memory 220 may store data/information obtained from any component of the tumor prognosis prediction system 100. In some embodiments, memory 220 may include mass storage, removable storage, volatile read-write memory, Read Only Memory (ROM), and the like, or any combination thereof. Exemplary mass storage devices may include magnetic disks, optical disks, solid state drives, and the like. Exemplary removable storage may include flash drives, floppy disks, optical disks, memory cards, compact disks, removable hard disks, and the like. Volatile read-write memory may include Random Access Memory (RAM). RAM may include Dynamic RAM (DRAM), Double Data Rate Synchronous Dynamic RAM (DDR SDRAM), Static RAM (SRAM), Thyristor RAM (T-RAM), Zero-capacitor RAM (Z-RAM), and the like. ROM may include Mask ROM (MROM), Programmable ROM (PROM), Erasable Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), Compact Disc ROM (CD-ROM), Digital Versatile Disc ROM, and the like.

The input/output interface 230 may be used to input or output signals, data, or information. In some embodiments, the input/output interface 230 may allow a user (e.g., the tumor patient 140, a user of the tumor prognosis prediction system 100, etc.) to interact with the server 110. In some embodiments, the user may enter characteristic information of the tumor patient via the input/output interface 230. In some embodiments, input/output interface 230 may include an input device and an output device. Exemplary input devices may include a keyboard, mouse, touch screen, microphone, and the like, or any combination thereof. Exemplary output devices may include a display device, speakers, printer, projector, etc., or any combination thereof. Exemplary display devices may include Liquid Crystal Displays (LCDs), Light Emitting Diode (LED) based displays, flat panel displays, curved displays, television equipment, Cathode Ray Tubes (CRTs), and the like, or any combination thereof.

The communication port 240 may be connected to the network 120 for data communication. The connection may be a wired connection, a wireless connection, or a combination of both. The wired connection may include an electrical cable, an optical cable, or a telephone line, etc., or any combination thereof. The wireless connection may include Bluetooth, WiFi, WiMax, WLAN, ZigBee, mobile networks (e.g., 3G, 4G, or 5G, etc.), etc., or any combination thereof. In some embodiments, the communication port 240 may be a standardized port, such as RS232, RS485, and the like. In some embodiments, the communication port 240 may be a specially designed port.

FIG. 3 is a block diagram of a tumor prognosis prediction system according to some embodiments of the present application. As shown in fig. 3, the tumor prognosis prediction system may include an acquisition module 310, a prediction module 320, and a training module 330.

The acquisition module 310 may be used to acquire characteristic information of the tumor patient 140. In some embodiments, the characteristic information may at least reflect gene mutation information of the tumor patient. In some embodiments, the characteristic information of the tumor patient 140 may include gene mutation information and basic information of the tumor patient, among others.

The prediction module 320 may be used to determine a prognosis prediction result for a tumor patient. For example, the prediction module 320 may determine the prognosis prediction result of the tumor patient with a tumor prognosis prediction model based on the characteristic information of the tumor patient.

The training module 330 may be used to train the tumor prognosis prediction model. Specifically, the training module 330 can obtain the characteristic information and prognosis information of a plurality of tumor patients and train an initial model with them to obtain the tumor prognosis prediction model. In some embodiments, the training module 330 may remove information on mutated genes whose mutation abundance is below a set threshold. In some embodiments, the training module 330 may remove redundant gene mutation information from the gene mutation information. In some embodiments, the training module 330 may determine at least some genes as tumor prognosis prediction-related genes according to the contribution value of each item of gene mutation information in the characteristic information of a plurality of tumor patients to the support vector machine model. In some embodiments, the training module 330 may train the initial model with the gene mutation information of the tumor prognosis prediction-related genes of a plurality of tumor patients and their prognosis information to obtain the tumor prognosis prediction model. In some embodiments, the training module 330 may also optimize the parameters of the support vector machine model with a particle swarm optimization algorithm or a grid search method.

It should be understood that the system and its modules shown in FIG. 3 may be implemented in a variety of ways. For example, in some embodiments, the system and its modules may be implemented in hardware, software, or a combination of software and hardware. The hardware portion may be implemented with dedicated logic; the software portion may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the methods and systems described above may be implemented using computer-executable instructions and/or embodied in processor control code, provided, for example, on a carrier medium such as a disk, CD-ROM, or DVD-ROM, a programmable memory such as read-only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The system and its modules of the present application may be implemented by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field programmable gate arrays and programmable logic devices; by software executed by various types of processors; or by a combination of the above hardware circuits and software (e.g., firmware).

It should be noted that the above description of the tumor prognosis prediction system and its modules is only for convenience of description and is not intended to limit the present application to the scope of the illustrated embodiments. Those skilled in the art will appreciate that, once they understand the principle of the system, they may combine modules arbitrarily or form subsystems connected to other modules without departing from this principle. For example, in some embodiments, the obtaining module 310, the prediction module 320, and the training module 330 may be different modules in a system, or a single module may implement the functions of two or more of them. For example, the obtaining module 310 and the prediction module 320 may be a single module having both obtaining and prediction functions. As another example, the modules may share one storage module, or each module may have its own storage module. Such variations are within the scope of the present application.

FIG. 4 is an exemplary flow chart of a method for predicting tumor prognosis according to some embodiments of the present application. As shown in FIG. 4, the method for predicting tumor prognosis may include the following steps:

Step 410: obtain characteristic information of the tumor patient, the characteristic information at least reflecting gene mutation information of the tumor patient. Specifically, step 410 may be performed by the obtaining module 310.

In some embodiments, the characteristic information of the tumor patient 140 may include gene mutation information of the tumor patient, basic information of the tumor patient, and the like. In some embodiments, the characteristic information of the tumor patient may include only the gene mutation information. Specifically, the gene mutation information of the tumor patient may include the genes mutated on the DNA and their mutation abundances, and/or the tumor prognosis prediction-related genes on the DNA and their mutation abundances. The basic information of the tumor patient may reflect information related to the tumor patient other than the gene mutation information. For example, the basic information may include the patient's age, sex, smoking history, years of education, years of employment, sample storage time (e.g., blood storage time, tumor tissue storage time, or storage time of other normal tissue of the patient), treatment plan, etc., or any combination thereof. In some embodiments, the treatment plan may include the type of treatment (e.g., radiotherapy, chemotherapy, immunotherapy, etc.), the duration of treatment, the radiation dose, the drug dose, the name or type of drug, and the like. In some embodiments, the gene mutation information of the tumor patient may be gene mutation information at a tumor site (e.g., a target lesion). For example, the gene mutation information of an osteosarcoma patient may be gene mutation information at the osteosarcoma lesion site. In some embodiments, the tumor patient 140 may be a patient at any stage of the tumor (e.g., early, intermediate, or late) and/or at any stage of treatment (e.g., before, during, or after treatment).
For example, characteristic information of an osteosarcoma patient before treatment (e.g., chemotherapy) may be obtained to predict the prognosis of that treatment, thereby providing a reference for formulating and selecting a treatment plan.
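Before it can be fed to a model, the characteristic information described above has to be turned into a fixed-length numeric vector. The sketch below shows one possible encoding. The gene panel, the field names, and the choice to encode an undetected gene as abundance 0 are assumptions for illustration, not details given in this application.

```python
from typing import Dict, List, Sequence

# Hypothetical gene order; the embodiment uses a 315-gene panel,
# of which only five symbols are listed here.
GENE_PANEL: List[str] = ["KMT2C", "SOX9", "LRP1B", "NF1", "PRKDC"]


def build_feature_vector(
    gene_abundance: Dict[str, float],
    basic_info: Dict[str, float],
    basic_fields: Sequence[str],
    panel: Sequence[str] = GENE_PANEL,
) -> List[float]:
    """Concatenate per-gene mutation abundances with numeric basic information.

    Assumptions: a gene with no detected mutation is encoded as abundance 0.0,
    and categorical basic information (sex, treatment type, ...) has already
    been encoded as numbers.
    """
    genes = [float(gene_abundance.get(g, 0.0)) for g in panel]
    basics = [float(basic_info[f]) for f in basic_fields]
    return genes + basics


# Example: gene abundances as fractions, plus age and a 0/1 sex code.
x = build_feature_vector({"KMT2C": 0.21, "NF1": 0.05}, {"age": 15, "sex": 1}, ["age", "sex"])
print(x)  # [0.21, 0.0, 0.0, 0.05, 0.0, 15.0, 1.0]
```

Fixing the gene order once, and reusing it for every patient, keeps the training vectors (step 610) and the prediction vectors (step 410) aligned column by column.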

In some embodiments, obtaining/determining gene mutation information for the tumor patient 140 may include: obtaining a tissue sample 145 of the tumor patient 140, extracting DNA of the tissue sample, preparing a library of the DNA, performing gene sequencing according to the library to obtain a sequencing result, analyzing the sequencing result to determine gene mutation information of the tumor patient, and the like. For more details on determining the gene mutation information of the tumor patient 140, see FIG. 5 and its related description.

Step 420: determine the prognosis prediction result of the tumor patient using the tumor prognosis prediction model, based on the characteristic information of the tumor patient. Specifically, step 420 may be performed by the prediction module 320.

In some embodiments, the characteristic information of the tumor patient can be input into a trained tumor prognosis prediction model to obtain a prognosis prediction result of the tumor patient. In some embodiments, the tumor prognosis prediction model can be a supervised learning model. Specifically, the supervised learning model may include: one or more of a support vector machine model, a decision tree model, a neural network model, a nearest neighbor classifier and the like. The training procedure for the tumor prognosis prediction model can be seen in fig. 6 and its related description.
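For a trained support vector machine, the prediction of step 420 reduces to evaluating the kernel decision function f(x) = Σ αᵢyᵢK(xᵢ, x) + b on the patient's feature vector. The sketch below assumes a Gaussian (RBF) kernel, and assumes that the support vectors, dual coefficients, intercept and gamma come from a previously trained model. The function names, and the convention that a positive decision value means "poor prognosis", are hypothetical. Converting the decision value into a 0-1 score such as those in Table 3 is not shown.

```python
import math
from typing import Sequence


def rbf_kernel(a: Sequence[float], b: Sequence[float], gamma: float) -> float:
    """Gaussian (RBF) kernel K(a, b) = exp(-gamma * ||a - b||^2)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def svm_decision_value(
    x: Sequence[float],
    support_vectors: Sequence[Sequence[float]],
    dual_coef: Sequence[float],  # alpha_i * y_i for each support vector
    intercept: float,
    gamma: float,
) -> float:
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b for a trained binary SVM."""
    return sum(
        c * rbf_kernel(sv, x, gamma) for sv, c in zip(support_vectors, dual_coef)
    ) + intercept


def predict_prognosis(x, support_vectors, dual_coef, intercept, gamma) -> str:
    # Hypothetical label convention: positive side of the hyperplane = poor prognosis.
    f = svm_decision_value(x, support_vectors, dual_coef, intercept, gamma)
    return "poor prognosis" if f > 0 else "good prognosis"
```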

In some embodiments, the prognosis prediction result may be the prognostic status over a period of time (e.g., 5 years) after treatment. For example, prognosis prediction results may be classified into four categories according to changes in the target lesions: progressive disease (PD), stable disease (SD), partial remission (PR), and complete remission (CR). Specifically, PD may refer to an increase of 20% or more in the sum of the maximum diameters of the target lesions, or the appearance of new lesions (e.g., new lesions caused by tumor metastasis); SD may refer to a decrease in the sum of the maximum diameters of the target lesions that is insufficient to qualify as PR, or an increase that is insufficient to qualify as PD; PR may refer to a decrease of 30% or more in the sum of the maximum diameters of the target lesions, maintained for at least 4 weeks; CR may refer to the disappearance of all target lesions, with no new lesions and normal tumor markers, maintained for at least 4 weeks. In some embodiments, the prognosis prediction results may include good therapeutic effect and poor therapeutic effect. Specifically, good or poor therapeutic effect may be determined according to clinical criteria. For example, a tumor patient whose disease recurs within 5 years after treatment is considered to have a poor therapeutic effect, and one whose disease does not recur within 5 years is considered to have a good therapeutic effect. As another example, PD and SD may be classified as poor therapeutic effect, and PR and CR as good therapeutic effect. As another example, if the patient survives more than 5 years after the first treatment, the therapeutic effect is good; if the patient survives less than 5 years after the first treatment, the therapeutic effect is poor.
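The four-category and two-category outcome definitions above can be written as a small decision rule. This is a simplified sketch. It assumes that the baseline and current sums of maximum target-lesion diameters are available, and it treats the 4-week persistence requirement as a single "weeks sustained" number. The function and parameter names are hypothetical.

```python
def classify_response(
    baseline_sum_mm: float,
    current_sum_mm: float,
    new_lesions: bool,
    markers_normal: bool,
    weeks_sustained: float,
) -> str:
    """Map the change in the sum of maximum target-lesion diameters to PD/SD/PR/CR."""
    if baseline_sum_mm <= 0:
        raise ValueError("baseline sum of diameters must be positive")
    if new_lesions:
        return "PD"                          # any new lesion counts as progression
    change = (current_sum_mm - baseline_sum_mm) / baseline_sum_mm
    if change >= 0.20:                       # increase of 20% or more
        return "PD"
    if current_sum_mm == 0 and markers_normal and weeks_sustained >= 4:
        return "CR"                          # all target lesions gone, markers normal
    if change <= -0.30 and weeks_sustained >= 4:
        return "PR"                          # decrease of 30% or more, sustained
    return "SD"                              # neither enough shrinkage for PR nor growth for PD


def therapeutic_effect(category: str) -> str:
    """Two-class grouping from the text: PD/SD -> poor, PR/CR -> good."""
    return "good" if category in ("PR", "CR") else "poor"
```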

In alternative embodiments, the prognosis prediction results may be classified into other categories, which are not limited by the embodiments of the present application. For example, prognosis prediction results may be classified into three categories: good, moderate, and poor therapeutic effect. In some embodiments, the prognosis prediction result may also be a predicted value of a specific index. For example, prognosis prediction results may include, but are not limited to, disease remission rate, disease relapse rate, recurrence within a given number of years, disease survival rate, survival time, short-term mortality, long-term mortality, in-hospital mortality, out-of-hospital mortality, surgical mortality, and the like.

It should be noted that the above description related to the flow 400 is only for illustration and explanation, and does not limit the applicable scope of the present application. Various modifications and changes to flow 400 may occur to those skilled in the art in light of the teachings herein. However, such modifications and variations are intended to be within the scope of the present application.

FIG. 5 is an exemplary flow chart for determining gene mutation information for a tumor patient according to some embodiments of the present application. Specifically, the steps shown in fig. 5 may be performed by a worker (e.g., a doctor, a laboratory technician, an operator, etc.) and/or an instrument (e.g., a detector, an analyzer, etc.), etc. As shown in fig. 5, the process of determining the gene mutation information of the tumor patient may include:

Step 510: obtain a tissue sample from the tumor patient.

In some embodiments, the tissue sample 145 may reflect information related to the tumor. Specifically, the tissue sample 145 may be a biological tissue or body fluid sample taken from a tumor site (e.g., a target lesion) and/or a non-tumor site (e.g., a site other than the lesion) of the tumor patient 140. For example, tissue samples may include, but are not limited to, sputum, blood, fresh tissue (e.g., surgical tissue, punctured tissue, etc.), paraffin-embedded tissue, urine, serous cavity effusion (e.g., ascites, pleural effusion, pericardial effusion, etc.), or tissue or cells extracted from a tumor site, or any combination thereof. In some embodiments, the tissue sample 145 may include tissue or cells of the tumor patient 140 from a tumor site and from a site other than the tumor. In some embodiments, the tissue sample 145 may include only tissue or cells of the tumor patient 140 from the tumor site. In some embodiments, inclusion criteria may be formulated for the tissue sample 145. For example, the collected tissue sample may be required to be surgical tissue, fresh tissue, punctured tissue, tissue fixed in 10% neutral formalin, paraffin-embedded tissue, and the like. As another example, unstained paraffin sections may be provided as 10 sections of 5 μm or 5 sections of 10 μm; to ensure that the sectioned tissue contains a sufficient proportion of tumor cells (e.g., >70%), an HE-stained slide of the same tissue may be included (or the tumor cell content of the specimen after HE staining may be reported by mail). As a further example, for surgical or punctured tissue, the collected sample may be required to exceed 0.3 cm³ and be placed quickly into an EP tube.
As another example, sample shipping criteria may be established: unstained paraffin sections may be sent for examination at room temperature within 2 weeks of sectioning, e.g., in an EP tube whose opening is sealed with sealing film to prevent leakage during transport, with the pathology number of the specimen written on the application form. As another example, criteria for screening tissue samples may be established, such as sample rejection criteria: tissue not fixed in 10% neutral formalin, specimen information inconsistent with the application form, tissue autolysis or degeneration, and the like.
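The inclusion and rejection criteria above amount to a checklist that a laboratory system could apply to each incoming specimen. The sketch below encodes the criteria named in the text. The record fields are hypothetical. The 70% tumor-cell threshold is the example value given above, and applying the fixative check only to fixed specimens is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TissueSample:
    # Hypothetical record fields mirroring the criteria described above.
    sample_type: str                 # e.g. "surgical", "punctured", "paraffin_section"
    fixative: Optional[str]          # None for fresh (unfixed) tissue
    info_matches_form: bool          # specimen information agrees with the application form
    autolysis_or_degeneration: bool
    tumor_cell_fraction: float       # estimated from the HE-stained slide, 0..1
    volume_cm3: float = 0.0          # only checked for surgical/punctured tissue


def rejection_reasons(s: TissueSample, min_tumor_fraction: float = 0.70) -> List[str]:
    """Return the criteria the sample fails; an empty list means it is accepted."""
    reasons: List[str] = []
    if s.fixative is not None and s.fixative != "10% neutral formalin":
        reasons.append("not fixed in 10% neutral formalin")
    if not s.info_matches_form:
        reasons.append("specimen information inconsistent with application form")
    if s.autolysis_or_degeneration:
        reasons.append("tissue autolysis or degeneration")
    if s.tumor_cell_fraction <= min_tumor_fraction:
        reasons.append("tumor cell proportion not above threshold")
    if s.sample_type in ("surgical", "punctured") and s.volume_cm3 <= 0.3:
        reasons.append("tissue volume not above 0.3 cm3")
    return reasons
```

Returning every failed criterion, rather than stopping at the first, lets the laboratory report all problems with a rejected specimen at once.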

Step 520: extract DNA from the tissue sample.

In some embodiments, the method of extracting DNA of a tissue sample may include a cetyltrimethylammonium bromide method (CTAB method), a glass bead method, an ultrasonic method, a milling method, a freeze-thaw method, a guanidinium isothiocyanate method, an alkaline lysis method, an enzymatic method, and the like, or any combination thereof. In some embodiments, any known method can be used to extract DNA from a tissue sample, which is not limited by the embodiments of the present application.

Step 530: prepare a library from the DNA.

In some embodiments, library preparation may include some or all of the following steps: DNA fragmentation, end repair, bead-based fragment size selection, A-tailing, adapter ligation, PCR enrichment, hybridization capture, and the like. In addition, any known method may be used to prepare a DNA library from the tissue sample, which is not limited by the embodiments of the present application.

Step 540: perform gene sequencing on the library to obtain a sequencing result.

In some embodiments, the prepared library may be sequenced to obtain sequencing data. The sequencing technology may be high-throughput sequencing (next-generation sequencing, NGS). Sequencing technologies may include any one or more of single-molecule real-time sequencing (Pacific Biosciences), ion semiconductor sequencing (Ion Torrent), pyrosequencing (454), sequencing by synthesis (Illumina), sequencing by ligation (SOLiD), and chain-termination sequencing (Sanger). In addition, any known method may be used for gene sequencing, which is not limited by the embodiments of the present application.

Step 550: analyze the sequencing result to determine the gene mutation information of the tumor patient.

In some embodiments, data analysis may be performed on the sequencing data to obtain the gene mutation information of the tumor patient (including the genes mutated on the DNA and their mutation abundances, and/or the tumor prognosis prediction-related genes on the DNA, site mutation abundances, gene mutation abundances, etc.). In some embodiments, the gene mutation abundance may be the cumulative sum of the mutation abundances of those single nucleotide variation (SNV) sites in the sequencing result whose mutation abundance exceeds a set value. The set value may be 0.05%, 0.1%, 0.2%, 1%, 2%, 3%, 10%, and so forth. The mutation abundance of a mutation site refers to the proportion of reads carrying the mutant base at that site: mutation abundance = number of mutant reads / (number of mutant reads + number of wild-type reads), where a read is a short sequencing fragment. For example, sequencing shows that a patient carries the mutant gene KMT2C with 5 mutation sites whose mutation abundances are 1%, 3%, 4%, 6%, and 8%, and the set value is 2%. The mutation abundance of KMT2C is then the cumulative sum over the 4 sites above 2%, i.e., 3% + 4% + 6% + 8% = 21%. In some embodiments, data analysis may include: (1) removing adapter sequences from the sequencing data; (2) performing quality control and removing low-quality sequencing data (e.g., low-quality bases, excessively short reads, etc.); (3) aligning the processed sequencing data to reference gene data to identify mutant genes; (4) removing normal gene variation (e.g., polymorphisms, synonymous variants, etc.); and (5) obtaining the gene mutation information of the tumor patient.
In some embodiments, the reference gene data may be normal gene data (e.g., gene data from normal cells at a non-tumor site of the tumor patient, gene data of individuals without tumors, etc.), gene data of the corresponding tumor disease (e.g., the prognosis prediction-related genes of each tumor), or the like. Using the above sequencing method, 93 patients were sequenced: the target region coverage ranged from 98.2% to 99.6% (mean 99.41%); the per-sample mean sequencing depth of the target region ranged from 462.7× to 1252.89× (overall mean 705.51×); and the target region capture efficiency ranged from 75.6% to 84.6% (mean 80.01%). In some embodiments, the reference gene data may be stored in the database 130 and retrieved from it when needed. In some embodiments, the gene mutation abundance may also be determined using any known method, e.g., second-generation sequencing, BEAMING, PARE, etc.
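The two abundance definitions above translate directly into code. The sketch below reproduces the KMT2C worked example. It assumes that per-site mutant and wild-type read counts are already available from the alignment step, and the function names are illustrative.

```python
from typing import Iterable


def site_abundance(mutant_reads: int, wildtype_reads: int) -> float:
    """Mutation abundance of one site = mutant reads / (mutant reads + wild-type reads)."""
    total = mutant_reads + wildtype_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return mutant_reads / total


def gene_abundance(site_abundances: Iterable[float], threshold: float) -> float:
    """Sum the abundances of the gene's mutation sites that exceed the set value."""
    return sum(a for a in site_abundances if a > threshold)


# Worked example from the text: five KMT2C sites, set value 2%.
kmt2c_sites = [0.01, 0.03, 0.04, 0.06, 0.08]
kmt2c = gene_abundance(kmt2c_sites, threshold=0.02)
print(f"{kmt2c:.2%}")  # 21.00%
```

Because of the strict `>` comparison, a site whose abundance equals the set value is excluded, matching "greater than a set value" in the text.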

Sequencing showed that different mutant genes are distributed differently across patient samples. FIGS. 7-9 are gene mutation heat maps of osteosarcoma patients according to some embodiments of the present application: FIG. 7 is a gene mutation heat map of all osteosarcoma patients according to an exemplary embodiment of the present application; FIG. 8 is a gene mutation heat map of osteosarcoma patients with a good therapeutic effect according to an exemplary embodiment of the present application; and FIG. 9 is a gene mutation heat map of osteosarcoma patients with a poor therapeutic effect according to an exemplary embodiment of the present application.

In this example, tissue and cells may be extracted from the target lesions (osteosarcoma lesion sites) of osteosarcoma patients (the 93 osteosarcoma patient samples shown in FIG. 7), and the gene mutation information of the osteosarcoma patients may be determined from them. Specifically, the gene mutation information of osteosarcoma patients may be determined by the above procedure for determining gene mutation information of tumor patients.

In this example, mutations (e.g., gene mutation abundance) were mainly tested in 315 genes of each sample; these are genes reported in the prior literature to have a significant effect on cancer. In some alternative embodiments, the number of genes tested may be increased or decreased as appropriate. Heat maps of the top 29 mutated genes for all osteosarcoma patients, patients with good prognosis, and patients with poor prognosis are shown in FIGS. 7-9, where the left ordinate represents the proportion of samples (93 in FIG. 7) in which a given gene is mutated, the right ordinate represents the mutant gene, and the abscissa represents the samples. Specifically, in this example, the genes mutated in a high proportion of samples (partly shown in FIGS. 7-9) include: lysine methyltransferase 2C (KMT2C), SRY-box 9 (SOX9), LDL receptor related protein 1B (LRP1B), neurofibromin 1 (NF1), PRKDC, FAT atypical cadherin 1 (FAT1), slit guidance ligand 2 (SLIT2), NOTCH1, EPHA7, ATRX, lysine demethylase 6A (KDM6A), APC, RAN binding protein 2 (RANBP2), ROS proto-oncogene 1 (ROS1), EMSY (C11orf30), …, stromal antigen 2 (STAG2), polybromo 1 (PBRM1), MITF, cytochrome P450 family 2 subfamily C member 8 (CYP2C8), PIK3CA, PIK3CB, B-Raf proto-oncogene (BRAF), MET proto-oncogene (MET), …, BRCA2, cell division cycle 73 (CDC73), cyclin dependent kinase 12 (CDK12), CREB binding protein (CREBBP), catenin alpha 1 (CTNNA1), CYLD, EPHA3, EPHB1, ERBB3, ERBB4, ERBB receptor feedback inhibitor 1 (ERRFI1), FANCA, FANCD2, …, mutL homolog 1 (MLH1), MYC proto-oncogene (MYC), MYCN proto-oncogene (MYCN), NFKB inhibitor alpha (NFKBIA), PARK2, PIK3CG, PIK3R2, protein kinase C iota (PRKCI), patched 1 (PTCH1), ret proto-oncogene (RET), SET domain containing 2 (SETD2), SMAD4, SMARCA4, ….
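The left ordinate of the heat maps, the proportion of samples carrying a mutation in each gene, and the selection of the top-ranked genes can be computed as sketched below. The input format (a mapping from a sample ID to the set of genes mutated in that sample) and the alphabetical tie-breaking are assumptions; the sample IDs in the example are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def mutation_frequencies(samples: Dict[str, Set[str]], top_k: int) -> List[Tuple[str, float]]:
    """Fraction of samples with at least one mutation in each gene, for the top_k genes."""
    counts = Counter(gene for genes in samples.values() for gene in genes)
    n = len(samples)
    # Sort by descending count, then alphabetically for a stable, reproducible order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(gene, c / n) for gene, c in ranked[:top_k]]


# Toy input with four hypothetical samples.
toy = {"S1": {"KMT2C", "SOX9"}, "S2": {"KMT2C"}, "S3": {"LRP1B", "KMT2C"}, "S4": set()}
print(mutation_frequencies(toy, top_k=2))  # [('KMT2C', 0.75), ('LRP1B', 0.25)]
```

Using sets per sample counts a gene at most once per patient, even when it carries several mutated sites.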

In addition, after the 315 genes of each patient were sequenced, the mutation abundances of the different mutant genes were also found to differ among patient samples, as shown in Table 1.

Table 1: Mutant gene abundance information for each patient (only 10 patients with good prognosis and 10 patients with poor prognosis are shown as examples).

It should be noted that the above description of the process for determining gene mutation information of tumor patients is only for illustration and description and does not limit the scope of application of the present application. Those skilled in the art may, under the guidance of the present application, use gene mutation information of tumor patients obtained by other technical means for the purpose of prognosis prediction.

FIG. 6 is an exemplary flow chart for training a tumor prognosis prediction model according to some embodiments of the present application. Specifically, the process shown in FIG. 6 (e.g., step 610, step 620, etc.) may be performed by the training module 330. As shown in FIG. 6, an exemplary procedure for training a tumor prognosis prediction model may include:

Step 610: acquire characteristic information and prognosis information of a plurality of tumor patients.

In some embodiments, the characteristic information of the plurality of tumor patients may include gene mutation information, basic information, and the like. Specifically, the gene mutation information may include the genes mutated in the DNA of each tumor patient and their mutation abundances. In some embodiments, the gene mutation information of the plurality of tumor patients may be gene mutation information at a tumor site (e.g., a target lesion). For the specific method of determining this gene mutation information, refer to the process described in FIG. 5. The basic information of a tumor patient may reflect information related to the patient other than the gene mutation information. For example, the basic information may include the patient's age, sex, smoking history, years of education, years of employment, treatment plan, sample storage time, type of medication, etc., or any combination thereof.

In some embodiments, the prognosis information of the plurality of tumor patients may be classified into four categories according to changes in the target lesions: progressive disease (PD), stable disease (SD), partial remission (PR), and complete remission (CR). As another example, the prognosis may be classified as good therapeutic effect or poor therapeutic effect. In some embodiments, the prognosis may also be the value of a specific index. For example, the prognosis may include, but is not limited to, disease remission rate, disease relapse rate, recurrence within a given number of years, disease survival rate, survival time, short-term mortality, long-term mortality, in-hospital mortality, out-of-hospital mortality, surgical mortality, and the like. In some embodiments, the prognosis information described here may correspond to the prognosis prediction result determined in step 420.

Step 620: train an initial model using the characteristic information and prognosis information of the plurality of tumor patients to obtain the tumor prognosis prediction model. In some embodiments, the tumor prognosis prediction model may be a supervised learning model, for example one or more of a support vector machine model, a decision tree model, a neural network model, a nearest neighbor classifier, and the like. In this embodiment, a support vector machine model is taken as an example to describe the training process of the tumor prognosis prediction model.

In some embodiments, initial model parameters (e.g., the penalty parameter c (cost) and the kernel parameter g (gamma)) may be set to establish an initial support vector machine model. Optimal model parameters (e.g., c and g) may then be searched for by grid search based on the characteristic information and prognosis information of the plurality of tumor patients, so as to update and optimize the model. In some embodiments, a kernel function of the support vector machine model (e.g., a linear, polynomial, Gaussian (RBF), or sigmoid kernel) may be selected, and the model may be trained on the characteristic information and prognosis information of the plurality of tumor patients. In some embodiments, the optimal model parameters may be found by combining grid search with validation. For example, the model parameters (e.g., c and g) are varied over a grid, each resulting model is validated, and the optimal parameters are selected according to the validation results.
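Grid search combined with cross-validation can be sketched as follows. To stay library-agnostic, SVM training is hidden behind a hypothetical `score_fn` callable, which in practice might wrap libsvm or scikit-learn's `SVC`. The exponentially spaced grids are an illustrative assumption, and `toy_score` is only a stand-in with a known peak, not real model accuracy.

```python
import itertools
import math
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical callable: fit an SVM with (c, gamma) on train_idx and return its
# accuracy on val_idx. In practice this would wrap libsvm or scikit-learn's SVC.
ScoreFn = Callable[[List[int], List[int], float, float], float]


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Interleaved k-fold split of range(n) into (train, validation) index pairs."""
    splits = []
    for i in range(k):
        val = list(range(i, n, k))
        train = [j for j in range(n) if j % k != i]
        splits.append((train, val))
    return splits


def grid_search(
    n: int,
    k: int,
    c_grid: Sequence[float],
    gamma_grid: Sequence[float],
    score_fn: ScoreFn,
) -> Tuple[float, float, float]:
    """Return (best mean CV score, best c, best gamma) over the full grid."""
    splits = k_fold_indices(n, k)
    best: Optional[Tuple[float, float, float]] = None
    for c, gamma in itertools.product(c_grid, gamma_grid):
        mean_score = sum(score_fn(tr, va, c, gamma) for tr, va in splits) / k
        if best is None or mean_score > best[0]:
            best = (mean_score, c, gamma)
    if best is None:
        raise ValueError("grids must be non-empty")
    return best


# Exponentially spaced grids (illustrative ranges, an assumption).
C_GRID = [2.0 ** e for e in range(-5, 16, 2)]
GAMMA_GRID = [2.0 ** e for e in range(-15, 4, 2)]


def toy_score(train, val, c, gamma):
    # Stand-in for real SVM training: peaks at c = 2**3, gamma = 2**-5.
    return -((math.log2(c) - 3) ** 2 + (math.log2(gamma) + 5) ** 2)


best = grid_search(93, 5, C_GRID, GAMMA_GRID, toy_score)
print(best)
```

Searching over exponents rather than raw values covers several orders of magnitude of c and g with few grid points. A finer grid can then be placed around the best coarse point.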

In still other embodiments, a particle swarm optimization (PSO) algorithm may be used to optimize the parameters of the support vector machine model. Specifically, the parameters of the PSO algorithm may be initialized first; the PSO algorithm is then used to search for the optimal model parameters (e.g., the parameter pair c, g), which are used as the optimized model parameters. The PSO algorithm may include, but is not limited to, a basic PSO algorithm, an adaptive mutation PSO algorithm, and the like. The parameters of the PSO algorithm may include a local search capability parameter, a global search capability parameter, an elasticity coefficient of the velocity update, a maximum number of generations, a maximum population size, the number of cross-validation folds, the range of parameter c, the range of parameter g, etc., or any combination thereof. In some embodiments, the parameters of the PSO algorithm may be set initially either manually or automatically.
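A basic global-best PSO is sketched below. The mapping to the parameters listed above is an assumption: c1 is taken as the local search coefficient, c2 as the global search coefficient, and w as the velocity-update (inertia) coefficient. The default values are commonly used constriction-equivalent settings, not values from this application. The search runs over (log2 c, log2 g), and `toy_fitness` is a stand-in for cross-validated SVM accuracy.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_maximize(
    fitness: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],  # search range per parameter, e.g. (log2 c, log2 g)
    swarm_size: int = 20,     # population size
    max_iter: int = 100,      # maximum number of generations
    c1: float = 1.49618,      # local (cognitive) search coefficient
    c2: float = 1.49618,      # global (social) search coefficient
    w: float = 0.7298,        # inertia weight applied to the previous velocity
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Basic global-best PSO that maximizes `fitness` inside the box `bounds`."""
    rng = random.Random(seed)
    dim = len(bounds)
    vmax = [0.2 * (hi - lo) for lo, hi in bounds]  # velocity clamp per dimension
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(swarm_size)]
    vel = [[0.0] * dim for _ in range(swarm_size)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(swarm_size), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(max_iter):
        for i in range(swarm_size):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-vmax[d], min(vmax[d], v))
                lo, hi = bounds[d]
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))  # stay inside the range
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


def toy_fitness(z: Sequence[float]) -> float:
    # Stand-in for cross-validated SVM accuracy; z = (log2 c, log2 g), peak at (3, -5).
    return -((z[0] - 3.0) ** 2 + (z[1] + 5.0) ** 2)


best, best_fit = pso_maximize(toy_fitness, bounds=[(-5.0, 15.0), (-15.0, 3.0)])
print(best, best_fit)
```

When grid search and PSO are combined as described in the next paragraph, the `bounds` could be narrowed around the best grid point before running the swarm.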

In other embodiments, the grid search and the particle swarm optimization algorithm can be jointly adopted to optimize the parameters of the support vector machine model. For example, the parameters of the support vector machine model may be optimized by grid search and then optimized again by the particle swarm optimization algorithm.

In order to improve the model precision or the training efficiency, the feature information of a plurality of tumor patients can be further screened, and the screened feature information is utilized to carry out model training.

In some embodiments, mutant gene information whose mutation abundance is below a set threshold may be removed from the gene mutation information of the plurality of tumor patients. The gene mutation abundance may be the cumulative sum of the mutation abundances of multiple mutation sites in the gene; a threshold for site mutation abundance (e.g., 0.05%, 0.1%, 0.2%, 1%, 2%, 3%, etc.) may be set manually, and mutation information below the threshold may be removed. For example, the abundance of a mutation site whose abundance is below a certain value (e.g., 0.05%, 0.1%, 0.2%, etc.) may be excluded from the mutation abundance of the gene.

In some embodiments, redundant gene mutation information may be removed from the gene mutation information of the plurality of tumor patients. Specifically, the gene mutation information may contain two or more genes that are highly correlated with each other. In some embodiments, two genes are considered highly correlated when their mutations are identical or similar, or when their mutation abundance profiles are similar. Among such highly correlated genes, one or more may be treated as redundant. Removing redundant gene mutation information (e.g., keeping only one gene from each group of highly correlated genes) can effectively reduce the gene dimensionality without affecting the model training effect.
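One way to detect "similar mutation abundance profiles" is the Pearson correlation between genes' abundance columns across patients. The sketch below greedily keeps a gene only if it is not highly correlated with any gene already kept. The choice of Pearson correlation and the 0.9 cut-off are assumptions, not values from this application.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def drop_redundant_genes(columns: Dict[str, Sequence[float]], max_abs_r: float = 0.9) -> List[str]:
    """Keep a gene only if |r| with every previously kept gene is below max_abs_r.

    `columns` maps a gene symbol to its mutation abundance in each patient
    (same patient order for every gene).
    """
    kept: List[str] = []
    for gene, values in columns.items():
        if all(abs(pearson(values, columns[k])) < max_abs_r for k in kept):
            kept.append(gene)
    return kept


toy = {"A": [0.1, 0.2, 0.3, 0.4], "B": [0.2, 0.4, 0.6, 0.8], "C": [0.4, 0.1, 0.3, 0.2]}
print(drop_redundant_genes(toy))  # ['A', 'C']
```

The greedy pass keeps the first gene it meets in each correlated group, so the input order decides which member survives. Ordering genes by prior relevance (e.g., mutation frequency) would keep the most informative representative.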

In some embodiments, at least some of the genes relevant to tumor prognosis prediction may be determined based on the contribution of each gene's mutation information, within the characteristic information of the plurality of tumor patients, to the support vector machine model.
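Contribution can be measured by how much validation accuracy drops when a gene is removed. The sketch below applies this idea recursively as a backward-elimination wrapper: in each round, the gene whose removal hurts accuracy least is dropped, and genes dropped last rank as contributing most. `accuracy_fn` is a hypothetical callable that would retrain and validate the SVM on a feature subset. The toy weights in the example are invented for illustration only.

```python
from typing import Callable, FrozenSet, List, Sequence

# Hypothetical: validation accuracy of a model retrained on the given gene subset.
AccuracyFn = Callable[[FrozenSet[str]], float]


def backward_elimination_ranking(features: Sequence[str], accuracy_fn: AccuracyFn) -> List[str]:
    """Return features ordered from most to least contributing.

    In each round every remaining feature is removed in turn; the feature whose
    removal leaves the highest accuracy (i.e. is missed least) is eliminated.
    """
    remaining = set(features)
    eliminated: List[str] = []
    while remaining:
        least_useful = max(
            sorted(remaining),  # sorted for deterministic tie-breaking
            key=lambda f: accuracy_fn(frozenset(remaining - {f})),
        )
        remaining.remove(least_useful)
        eliminated.append(least_useful)
    return eliminated[::-1]


# Toy accuracy model: each gene adds a fixed, invented amount to a 0.5 baseline.
weights = {"KMT2C": 0.3, "SOX9": 0.2, "NF1": 0.0}
ranking = backward_elimination_ranking(list(weights), lambda s: 0.5 + sum(weights[g] for g in s))
print(ranking)  # ['KMT2C', 'SOX9', 'NF1']
```

Each round retrains once per remaining gene, so a full ranking of p genes costs on the order of p²/2 trainings. For a 315-gene panel it is common to eliminate several genes per round instead of one.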

In some embodiments, the individual gene mutation information in the characteristic information of the plurality of tumor patients may be further screened. Specifically, a recursive feature elimination method may be used: with the prediction accuracy of the model as the evaluation criterion, each gene's mutation information is eliminated in turn from the characteristic information to obtain a plurality of training sets, a model is trained on each training set, and the eliminated genes are ranked by contribution according to the resulting prediction accuracies. Finally, the gene mutation information is screened according to the contribution values to obtain at least some of the genes related to tumor prognosis prediction. In some embodiments, a random forest algorithm may alternatively be used to screen the gene mutation information in the characteristic information of the plurality of tumor patients.
Specifically, (1) decision trees are first constructed: the forest may be defined to contain P trees (e.g., 20 or 40 trees). A training sample set is drawn for each decision tree from the 93 samples by bootstrap sampling, so sampling is repeated P times in total; each training set is obtained by drawing 93 times with replacement from the 93 samples. At each node of a decision tree, m feature variables are drawn at random from the 315 feature variables, the best of them is selected for splitting according to the optimal split, and the tree is grown without pruning. (2) The P trained decision trees are combined into a random forest. Each tumor patient is predicted by the P decision trees, and the output of the random forest is obtained from their predictions by weighting or voting. While each decision tree is trained, the reduction in impurity contributed by each feature can be calculated; for the forest as a whole, the mean decrease in impurity of each feature can be calculated and used as the criterion for its contribution value. For example, the gene mutation information with the largest mean decrease in impurity is taken as the feature with the largest contribution, and so on, so that the contribution of each mutant gene to the model can be determined (as shown in Table 2) and at least some of the genes related to tumor prognosis prediction can be screened out. For example, the n (e.g., 20, 29, 40, 100, etc.) mutant genes with the largest contribution to the tumor prognosis prediction model may be selected, from among the mutant genes having a significant influence on tumor occurrence, as the genes related to tumor prognosis prediction.
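The bootstrap sampling, random feature subsets, and mean decrease in impurity (MDI) described above can be illustrated with a deliberately simplified forest. Each tree here is a single split (a depth-1 stump) rather than the unpruned full trees of the text, and Gini impurity with 0/1 labels is an assumption. The sketch therefore shows how MDI contributions are accumulated, not a complete random forest with voting.

```python
import random
from typing import List, Sequence


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_split_gain(X: Sequence[Sequence[float]], y: Sequence[int],
                    idx: Sequence[int], feature: int) -> float:
    """Largest weighted impurity decrease achievable by one threshold on `feature`."""
    parent = gini([y[i] for i in idx])
    values = sorted({X[i][feature] for i in idx})
    best = 0.0
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y[i] for i in idx if X[i][feature] <= t]
        right = [y[i] for i in idx if X[i][feature] > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / len(idx)
        best = max(best, parent - child)
    return best


def stump_forest_importance(X, y, n_trees: int = 40, m: int = 1, seed: int = 0) -> List[float]:
    """Mean decrease in impurity per feature over a forest of depth-1 trees."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    importance = [0.0] * p
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: n draws with replacement
        candidates = rng.sample(range(p), m)        # m random features for the root node
        gains = {f: best_split_gain(X, y, idx, f) for f in candidates}
        f_best = max(gains, key=lambda f: gains[f])
        importance[f_best] += gains[f_best]
    return [v / n_trees for v in importance]
```

With 93 patients and 315 gene columns, `X` would hold one row per patient, and the largest entries of the returned list would identify the top-n genes.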

Table 2: Contribution values of different mutant genes to the model

In some embodiments, the trained tumor prognosis prediction model may be validated. For example, for the support vector machine model, cross-validation may be used to verify the model performance. Specifically, the cross-validation method may include the hold-out method, K-fold cross-validation (K-CV), and leave-one-out cross-validation (LOO-CV). Taking LOO-CV as an example, with 93 samples in total, 1 sample is used as the validation sample and the remaining 92 samples are input into the initial support vector machine model as training samples; this process is repeated 93 times so that each sample serves once as the validation sample, yielding 93 validation results, which are combined to determine the final validation result of the trained tumor prognosis prediction model. Further, a receiver operating characteristic (ROC) curve may be drawn from the validation results for visual presentation (as shown in FIG. 10). As shown in FIG. 10, the points on the ROC curve represent the sensitivity and specificity of the osteosarcoma prognosis prediction model under different cut-off values (e.g., prognosis classification criteria). The ROC curve passes close to the upper-left corner, which reflects the high prediction accuracy of the osteosarcoma prognosis prediction model obtained in this embodiment; the area under the ROC curve (AUC) is 0.988, very close to 1, which reflects a good classification performance; in addition, the model has a high mean sensitivity (0.95) and mean specificity (0.97) across different cut-off values.
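The LOO-CV loop and the ROC summary statistics can be sketched without any machine-learning library. `fit_predict` is a hypothetical callable that would train the SVM on the training indices and return scores for the held-out indices. AUC is computed as the probability that a randomly chosen positive case scores above a randomly chosen negative one, which equals the area under the ROC curve. Treating label 1 as "poor prognosis" is an assumption.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical: train on train_idx, return one predicted score per index in test_idx.
FitPredict = Callable[[List[int], List[int]], List[float]]


def leave_one_out_scores(n: int, fit_predict: FitPredict) -> List[float]:
    """Train n times, each time holding out one sample, and collect its score."""
    scores: List[float] = []
    for i in range(n):
        train = [j for j in range(n) if j != i]
        scores.extend(fit_predict(train, [i]))
    return scores


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels: Sequence[int], scores: Sequence[float],
                            cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= cutoff are called positive."""
    tp = sum(1 for lab, s in zip(labels, scores) if lab == 1 and s >= cutoff)
    tn = sum(1 for lab, s in zip(labels, scores) if lab == 0 and s < cutoff)
    n_pos = sum(1 for lab in labels if lab == 1)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Sweeping `cutoff` over all observed scores and plotting (1 - specificity, sensitivity) traces the ROC curve itself.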

In this example, 6 additional osteosarcoma patients were selected (4 with a known poor prognosis and 2 with a known good prognosis). Gene mutation information from the osteosarcoma lesion site of each patient was obtained, and on that basis the prognosis of each of the 6 patients was predicted with the osteosarcoma prognosis prediction model trained in this embodiment (see Table 3; the decision threshold on the prediction value is set to 0.5, where a value below 0.5 indicates a good prognosis and a value above 0.5 indicates a poor prognosis). The predictions agree completely with the known prognoses.

TABLE 3 Comparison of predicted and actual osteosarcoma prognosis

Sample name	Prediction value	Predicted prognosis	Actual prognosis
Patient 1	0.335717	Good prognosis	Good prognosis
Patient 2	0.44896	Good prognosis	Good prognosis
Patient 3	0.67417	Poor prognosis	Poor prognosis
Patient 4	0.735268	Poor prognosis	Poor prognosis
Patient 5	0.756405	Poor prognosis	Poor prognosis
Patient 6	0.930926	Poor prognosis	Poor prognosis
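The 0.5 decision rule used for Table 3 can be expressed as a short Python sketch that reproduces the table's classification from its prediction values. The names TABLE_3, THRESHOLD, classify, and agreement are illustrative, and the treatment of a value exactly equal to 0.5 is an assumption, since the text only specifies the cases below and above 0.5.

```python
from typing import List, Tuple

# Prediction values and actual outcomes transcribed from Table 3.
TABLE_3: List[Tuple[str, float, str]] = [
    ("Patient 1", 0.335717, "Good prognosis"),
    ("Patient 2", 0.44896, "Good prognosis"),
    ("Patient 3", 0.67417, "Poor prognosis"),
    ("Patient 4", 0.735268, "Poor prognosis"),
    ("Patient 5", 0.756405, "Poor prognosis"),
    ("Patient 6", 0.930926, "Poor prognosis"),
]
THRESHOLD = 0.5


def classify(value: float, threshold: float = THRESHOLD) -> str:
    """Below the threshold -> good prognosis; otherwise poor.

    The text does not say how a value of exactly 0.5 is handled; treating it as
    "Poor prognosis" here is an assumption.
    """
    return "Good prognosis" if value < threshold else "Poor prognosis"


def agreement(rows: List[Tuple[str, float, str]]) -> float:
    """Fraction of patients whose predicted class matches the actual prognosis."""
    hits = sum(classify(value) == actual for _, value, actual in rows)
    return hits / len(rows)


if __name__ == "__main__":
    for name, value, actual in TABLE_3:
        print(f"{name}: {value:.6f} -> {classify(value)} (actual: {actual})")
    print(f"Agreement: {agreement(TABLE_3):.0%}")  # 100%
```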

It should be noted that the above description of flow 600 is provided only for illustration and explanation and does not limit the scope of application of the present application. Those skilled in the art may make various modifications and changes to flow 600 under the guidance of this disclosure; such modifications and changes remain within the scope of the present application.

The beneficial effects of the embodiments of the present application include, but are not limited to: (1) the prognosis of a tumor patient can be predicted from the patient's gene mutation information; (2) the accuracy of tumor prognosis prediction is improved; (3) the tumor prognosis prediction process is convenient to implement; and (4) a reference is provided for formulating and selecting treatment plans. It should be noted that different embodiments may produce different advantages; in different embodiments, any one or a combination of the above advantages, or any other advantage, may be obtained.

Having described the basic concepts above, it will be apparent to those skilled in the art that the foregoing detailed disclosure is intended only as an example and does not limit the present application. Although not explicitly stated herein, those skilled in the art may make various modifications, improvements, and adaptations to the present application. Such modifications, improvements, and adaptations are suggested by the present application and therefore fall within the spirit and scope of its exemplary embodiments.

Also, this application uses specific terms to describe its embodiments. References to "one embodiment," "an embodiment," and/or "some embodiments" mean a particular feature, structure, or characteristic related to at least one embodiment of the present application. Therefore, it should be emphasized and noted that two or more references to "an embodiment," "one embodiment," or "an alternative embodiment" in different places in this specification do not necessarily refer to the same embodiment. Furthermore, certain features, structures, or characteristics of one or more embodiments of the present application may be combined as appropriate.

Moreover, those skilled in the art will appreciate that aspects of the present application may be illustrated and described in terms of several patentable categories or contexts, including any new and useful process, machine, product, or combination of materials, or any new and useful improvement thereof. Accordingly, various aspects of the present application may be embodied entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.), or in a combination of hardware and software. The above hardware or software may be referred to as a "data block", "module", "engine", "unit", "component", or "system". Furthermore, aspects of the present application may be embodied as a computer product, including computer-readable program code, located on one or more computer-readable media.

The computer storage medium may comprise a propagated data signal with the computer program code embodied therewith, for example, on baseband or as part of a carrier wave. The propagated signal may take any of a variety of forms, including electromagnetic, optical, etc., or any suitable combination. A computer storage medium may be any computer-readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code located on a computer storage medium may be propagated over any suitable medium, including radio, cable, fiber optic cable, RF, or the like, or any combination of the preceding.

Computer program code required for the operation of various portions of the present application may be written in any one or more programming languages, including object-oriented programming languages such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, and Python; conventional programming languages such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, and ABAP; dynamic programming languages such as Python, Ruby, and Groovy; or other programming languages. The program code may execute entirely on the user's computer, partly on the user's computer as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, such as a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet), or in a cloud computing environment, or as a service, such as software as a service (SaaS).

Additionally, the order in which elements and sequences of the processes described herein are processed, the use of alphanumeric characters, or the use of other designations, is not intended to limit the order of the processes and methods described herein, unless explicitly claimed. While various presently contemplated embodiments of the invention have been discussed in the foregoing disclosure by way of example, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing server or mobile device.

Similarly, it should be noted that in the preceding description of embodiments of the application, various features are sometimes grouped together in a single embodiment, figure, or description thereof in order to streamline the disclosure and aid the understanding of one or more of the embodiments. This method of disclosure, however, does not imply that the claimed subject matter requires more features than are expressly recited in the claims. Indeed, an embodiment may have fewer than all of the features of a single embodiment disclosed above.

Some embodiments use numbers to describe quantities of components, attributes, and the like; it should be understood that such numbers used in describing the embodiments are, in some instances, qualified by the modifiers "about", "approximately", or "substantially". Unless otherwise stated, "about", "approximately", or "substantially" indicates that the stated number may vary by ± 20%. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximations that may change depending on the properties required by individual embodiments. In some embodiments, numerical parameters should take into account the specified significant digits and apply a general method of retaining digits. Although the numerical ranges and parameters used to define the broad scope of the present application are approximations, in specific examples such numerical values are set as precisely as practicable.

The entire contents of each patent, patent application publication, and other material cited in this application, such as articles, books, specifications, publications, and documents, are hereby incorporated by reference into this application, except for application history documents that are inconsistent with or conflict with the content of this application, and except for documents (currently or later appended to this application) that limit the broadest scope of the claims of this application. If the descriptions, definitions, and/or use of terms in the materials accompanying this application are inconsistent with or contrary to those of this application, the descriptions, definitions, and/or use of terms in this application shall prevail.

Finally, it should be understood that the embodiments described herein are merely illustrative of the principles of the embodiments of the present application. Other variations are also possible within the scope of the present application. Thus, by way of example, and not limitation, alternative configurations of the embodiments of the present application can be viewed as being consistent with the teachings of the present application. Accordingly, the embodiments of the present application are not limited to only those embodiments explicitly described and depicted herein.

Claims

1. A method for predicting prognosis of a tumor, comprising:

acquiring characteristic information of a tumor patient, wherein the characteristic information at least reflects gene mutation information of the tumor patient; and

determining a prognosis prediction result of the tumor patient according to a tumor prognosis prediction model based on the characteristic information of the tumor patient.

2. The method for predicting tumor prognosis according to claim 1, wherein the gene mutation information includes genes mutated in DNA and their mutation abundances, and/or tumor prognosis prediction related genes in DNA and their mutation abundances.

3. The method for predicting tumor prognosis according to claim 1, wherein acquiring the characteristic information of the tumor patient further comprises:

obtaining a tissue sample from the tumor patient;

extracting DNA of the tissue sample;

preparing a library of the DNA;

performing gene sequencing on the library to obtain a sequencing result;

analyzing the sequencing result to determine the gene mutation information of the tumor patient.

4. The method for predicting tumor prognosis according to claim 1, wherein the characteristic information further includes at least one of the following items of information of the tumor patient: age, gender, smoking history, years of education, years of employment, treatment regimen, and sample storage time.

5. The method for predicting tumor prognosis according to claim 1, wherein the tumor prognosis prediction model is a support vector machine model or a neural network model.

6. The method for predicting tumor prognosis according to claim 1, further comprising:

training an initial model using characteristic information and prognosis information of a plurality of tumor patients to obtain the tumor prognosis prediction model.

7. The method for predicting tumor prognosis according to claim 6, wherein training the initial model using the characteristic information and prognosis information of the plurality of tumor patients to obtain the tumor prognosis prediction model comprises:

removing, from the gene mutation information of the plurality of tumor patients, information on mutant genes whose mutation abundance is below a set threshold.

8. The method for predicting tumor prognosis according to claim 6, wherein training the initial model using the characteristic information and prognosis information of the plurality of tumor patients to obtain the tumor prognosis prediction model comprises:

removing redundant gene mutation information in the gene mutation information of the plurality of tumor patients.

9. The method for predicting tumor prognosis according to claim 6, wherein

the tumor prognosis prediction model is a support vector machine model; and

training the initial model using the characteristic information and prognosis information of the plurality of tumor patients to obtain the tumor prognosis prediction model comprises:

determining at least some of the genes as tumor prognosis prediction related genes according to the contribution value, to the support vector machine model, of each item of gene mutation information in the characteristic information of the plurality of tumor patients; and

training the initial model using the gene mutation information of the tumor prognosis prediction related genes and the prognosis information of the plurality of tumor patients to obtain the tumor prognosis prediction model.

10. The method for predicting tumor prognosis according to claim 6, wherein

the tumor prognosis prediction model is a support vector machine model; and

training the initial model to obtain the tumor prognosis prediction model further comprises: optimizing parameters of the support vector machine model using a particle swarm optimization algorithm or a grid search method.

11. The method for predicting tumor prognosis according to claim 1, wherein

the prognosis prediction result includes: progressive disease, stable disease, partial remission, and complete remission; or

the prognosis prediction result includes: good curative effect and poor curative effect.

12. The method of any one of claims 1-11, wherein the tumor is an osteosarcoma.

13. The method for predicting tumor prognosis according to claim 12, wherein the characteristic information reflects at least mutation information of at least one of the following genes in an osteosarcoma patient: KMT2C, SOX9, LRP1B, NF-1, PRKDC, FAT1, STAG2, SLIT2, NOTCH1, EPHA7, ATRX, KDM6A, APC, RANBP2, RARA-AS1, C11orf30, ROS1, ARID2, TAF1, DICER1, MSH2, MSH6, TP53, KDM5A, JAK2, ALK, RB1, NOTCH2, and RICTOR.

14. The method for predicting tumor prognosis according to claim 12, wherein the gene mutation information of the tumor patient is gene mutation information of an osteosarcoma lesion site.

15. A tumor prognosis prediction system, comprising an acquisition module and a prediction module, wherein:

the acquisition module is configured to acquire characteristic information of a tumor patient, the characteristic information at least reflecting gene mutation information of the tumor patient; and

the prediction module is configured to determine a prognosis prediction result of the tumor patient according to a tumor prognosis prediction model based on the characteristic information of the tumor patient.

16. The tumor prognosis prediction system according to claim 15, wherein the gene mutation information includes genes mutated in DNA and their mutation abundances, and/or tumor prognosis prediction related genes in DNA and their mutation abundances.

17. The tumor prognosis prediction system according to claim 15, wherein the characteristic information further comprises at least one of the following items of information of the tumor patient: age, gender, smoking history, years of education, years of employment, treatment regimen, and sample storage time.

18. The tumor prognosis prediction system of claim 15 wherein the tumor prognosis prediction model is a support vector machine model or a neural network model.

19. The tumor prognosis prediction system according to claim 15, further comprising a training module configured to train an initial model using characteristic information and prognosis information of a plurality of tumor patients to obtain the tumor prognosis prediction model.

20. The tumor prognosis prediction system according to claim 19, wherein the training module is further configured to remove, from the gene mutation information of the plurality of tumor patients, information on mutated genes whose mutation abundance is below a predetermined threshold.

21. The system of claim 19, wherein the training module is further configured to remove redundant gene mutation information from the gene mutation information of the plurality of tumor patients.

22. The tumor prognosis prediction system according to claim 19, wherein

the tumor prognosis prediction model is a support vector machine model;

the training module is further configured to:

23. The tumor prognosis prediction system according to claim 19, wherein

the tumor prognosis prediction model is a support vector machine model; and

the training module is further configured to optimize parameters of the support vector machine model using a particle swarm optimization algorithm or a grid search method.

24. The tumor prognosis prediction system according to claim 15, wherein

the prognosis prediction result includes: good curative effect and poor curative effect.

25. The system of any one of claims 15-24, wherein the tumor is an osteosarcoma.

26. The tumor prognosis prediction system according to claim 25, wherein the characteristic information reflects at least mutation information of at least one of the following genes in an osteosarcoma patient: KMT2C, SOX9, LRP1B, NF-1, PRKDC, FAT1, STAG2, SLIT2, NOTCH1, EPHA7, ATRX, KDM6A, APC, RANBP2, RARA-AS1, C11orf30, ROS1, ARID2, TAF1, DICER1, MSH2, MSH6, TP53, KDM5A, JAK2, ALK, RB1, NOTCH2, and RICTOR.

27. The tumor prognosis prediction system according to claim 26, wherein the gene mutation information of the tumor patient is gene mutation information of an osteosarcoma lesion site.

28. An apparatus for tumor prognosis prediction, the apparatus comprising at least one processor and at least one memory;

the at least one memory is for storing computer instructions;

the at least one processor is configured to execute at least a portion of the computer instructions to implement the method of any of claims 1-11.

29. A computer-readable storage medium storing computer instructions which, when executed by a processor, implement the method for predicting tumor prognosis according to any one of claims 1 to 11.

30. A tumor prognosis prediction system comprising:

at least one computer-readable storage medium comprising a set of instructions for prognosis prediction of a tumor; and

at least one processor in communication with the at least one storage medium, the at least one processor, when executing the set of instructions, configured to:

acquire characteristic information of a tumor patient, the characteristic information at least reflecting gene mutation information of the tumor patient; and