CN116153424B

CN116153424B - Monogenic pan-cancer prognosis analysis system and analysis method

Info

Publication number: CN116153424B
Application number: CN202310409654.9A
Authority: CN
Inventors: 张冠雄; 张海伦; 滕雁波; 崔冶; 杨丹; 高琛琛
Original assignee: Beijing Gaopu Biotechnology Co ltd
Current assignee: Beijing Gaopu Biotechnology Co ltd
Priority date: 2023-04-18
Filing date: 2023-04-18
Publication date: 2023-06-23
Anticipated expiration: 2043-04-18
Also published as: CN116153424A

Abstract

The application belongs to the technical field of biotechnology and relates in particular to a single-gene pan-cancer prognosis analysis system and analysis method. The analysis system comprises: an analysis instruction generation module for generating at least one sub-analysis instruction; a cache module for storing analysis data of the corresponding genes; analysis sub-modules which, on receiving the corresponding input parameters, first determine whether corresponding analysis result data are stored in the cache module, extract the data if so, and otherwise perform the analysis, store the resulting analysis result data, and then extract it; analysis item modules for matching the analysis sub-modules and acquiring the analysis result data extracted by all of them; and a report generation module for generating an analysis report from the received analysis result data according to a preset layout. The system and method improve analysis efficiency, generate reports automatically, speed up report generation, and reduce the error rate.

Description

Monogenic pan-cancer prognosis analysis system and analysis method

Technical Field

The application belongs to the technical field of biotechnology and relates in particular to a single-gene pan-cancer prognosis analysis system and analysis method.

Background

As tumor research has progressed, researchers have found that more and more genes play an important role in tumor initiation and progression. For example, TP53 is a well-known tumor suppressor gene; its product, the tumor suppressor protein p53, helps maintain genome stability and regulates cell growth, differentiation, and senescence, and is known as the "guardian of the genome".

Once researchers have identified a gene of interest, they urgently need to study its influence on tumors, but existing analysis service systems have at least the following disadvantages:

1) Programs run slowly. Most existing single-gene workflows are written as scripts; no connection exists between any two runs, so an analysis with the same parameters is performed many times. For example, to examine the differential expression of TP53 in liver cancer, after user 1 has run the program and obtained the result, user 2 must run it again to see the same result, which greatly reduces efficiency.

2) Reports are compiled slowly and with a high error rate. The output usually required from a run is an analysis report for the researcher, and existing service systems often compile it by hand: an analysis report template is chosen first, and an analyst then pastes the analysis results into the template one by one, which is slow and error-prone.

These two disadvantages make large-scale analysis labor-intensive and prevent high-quality service from being delivered to researchers in a timely manner.

Disclosure of Invention

To solve at least one technical problem in the prior art, the application provides a single-gene pan-cancer prognosis analysis system and analysis method.

In a first aspect, the present application discloses a single-gene pan-cancer prognosis analysis system comprising:

an analysis instruction generation module for generating at least one sub-analysis instruction according to the user's selection, each sub-analysis instruction comprising: basic information of a target gene selected by the user, one biological information analysis item selected for the target gene, at least one analysis means involved in that biological information analysis item, and the input parameters required to analyze the target gene with each analysis means;

a cache module which simultaneously stores a variety of different input parameters, the analysis means type data applied to those input parameters, and the corresponding historical analysis result data finally obtained;

analysis sub-modules whose number and types correspond to the number and types of analysis means involved in all biological information analysis items, wherein each analysis sub-module can analyze received input parameters with its corresponding analysis means to obtain corresponding analysis result data; before each analysis, the sub-module determines whether historical analysis result data corresponding to the input parameters and their analysis means type are stored in the cache module; if so, it extracts the historical analysis result data directly; otherwise, it analyzes the input parameters, sends the newly obtained analysis result data to the cache module for storage, and then extracts the newly obtained analysis result data;

analysis item modules whose number and types correspond to those of the biological information analysis items, wherein each analysis item module, on receiving a corresponding sub-analysis instruction, matches analysis sub-modules according to the analysis means types contained in that sub-analysis instruction, transmits the input parameters in the sub-analysis instruction to the corresponding analysis sub-modules, and finally acquires the analysis result data extracted by all of those analysis sub-modules;

and a report generation module for receiving all analysis result data acquired by the analysis item modules and generating an analysis report according to a preset layout.
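The relationship between a sub-analysis instruction, an analysis item module, and its analysis sub-modules can be illustrated with a minimal Python sketch. All names here (`SubAnalysisInstruction`, `SUB_MODULES`, `run_item_module`, the means type strings) are hypothetical and chosen only for illustration; the placeholder sub-modules return dummy results rather than performing any real analysis.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class SubAnalysisInstruction:
    """One sub-analysis instruction (field names are hypothetical)."""
    gene: str                              # basic information of the target gene
    item: str                              # the selected biological information analysis item
    means: Tuple[str, ...]                 # analysis means involved in that item
    params: Tuple[Tuple[str, str], ...]    # input parameters as (name, value) pairs


# Placeholder analysis sub-modules, keyed by analysis means type.
SUB_MODULES: Dict[str, Callable[[dict], dict]] = {
    "differential_expression": lambda p: {"means": "differential_expression", "gene": p["gene"]},
    "survival": lambda p: {"means": "survival", "gene": p["gene"]},
}


def run_item_module(instruction: SubAnalysisInstruction) -> List[dict]:
    """Match sub-modules by analysis means type and collect their results."""
    params = dict(instruction.params, gene=instruction.gene)
    results = []
    for means in instruction.means:
        sub_module = SUB_MODULES[means]      # match by analysis means type
        results.append(sub_module(params))   # forward the input parameters
    return results


instruction = SubAnalysisInstruction(
    gene="TP53",
    item="TCGA pan-cancer cohort differential expression",
    means=("differential_expression",),
    params=(("cohort", "TCGA"),),
)
results = run_item_module(instruction)
```

In this sketch one analysis item module may fan out to several sub-modules simply by listing more than one analysis means in the instruction.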

Optionally, in the cache module, each input parameter, the analysis means type data applied to it, and the analysis result data obtained from the analysis are stored together as a key-value pair, wherein:

the key is a module fingerprint comprising the input parameter and the analysis means type data applied to it;

the value is the analysis result data obtained from the analysis.

Optionally, when determining whether the cache module stores historical analysis result data corresponding to an input parameter and its analysis means type, the analysis sub-module first converts the input parameter and its analysis means type data into a module fingerprint, and makes the determination by checking whether the same module fingerprint is stored in the cache module;

correspondingly, after the analysis sub-module has analyzed the input parameter, it sends the input parameter, the analysis means type data applied to it, and the analysis result data obtained to the cache module for storage as a key-value pair.
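The fingerprint lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the text does not say how a module fingerprint is computed, so the SHA-256 hash over a sorted JSON encoding is an assumption, and the names `module_fingerprint`, `CacheModule`, `run_with_cache`, and `slow_analysis` are hypothetical. The cancer code "LIHC" is used only as an illustrative parameter value.

```python
import hashlib
import json


def module_fingerprint(means_type: str, params: dict) -> str:
    """Derive a key from the analysis means type and the input parameters."""
    # Sorting keys makes the fingerprint independent of parameter order.
    payload = json.dumps({"means": means_type, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class CacheModule:
    """In-memory stand-in for the cache module (key: fingerprint, value: result)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def put(self, key, value):
        self._store[key] = value


def run_with_cache(cache, means_type, params, analyze):
    """Return (result, was_cache_hit); analyze only on a cache miss."""
    key = module_fingerprint(means_type, params)
    cached = cache.get(key)
    if cached is not None:
        return cached, True
    result = analyze(params)
    cache.put(key, result)
    return result, False


calls = []


def slow_analysis(params):
    # Placeholder for a real analysis; records each actual run.
    calls.append(params["gene"])
    return {"gene": params["gene"], "status": "analyzed"}


cache = CacheModule()
params = {"gene": "TP53", "cancer": "LIHC"}
first, first_hit = run_with_cache(cache, "differential_expression", params, slow_analysis)
second, second_hit = run_with_cache(cache, "differential_expression", params, slow_analysis)
```

Here the second request, which a different user could have issued, is served from the cache, so `slow_analysis` runs only once.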

Optionally, when the analysis result data includes a picture and/or a table, each analysis sub-module further comprises:

a first conversion module for encoding the content of the picture with base64 and serializing the encoded content into a JSON string; and/or

a second conversion module for converting the table into a serializable data type and serializing it into a JSON string;

in this case, the analysis sub-module converts the analysis result data into a JSON string and transmits it to the cache module as the value of the corresponding key-value pair, each key-value pair being stored as a single file.
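The two conversions and the one-file-per-pair storage can be sketched with the Python standard library. The file naming scheme (fingerprint plus `.json`), the dictionary layout of the value, and the function names are assumptions made for illustration; the picture bytes and table contents are dummy data.

```python
import base64
import json
import os
import tempfile


def encode_picture(picture_bytes: bytes) -> str:
    """Base64-encode raw picture bytes so they can be placed in JSON."""
    return base64.b64encode(picture_bytes).decode("ascii")


def encode_table(header, rows) -> dict:
    """Convert a table into plain lists, which JSON can serialize."""
    return {"columns": list(header), "rows": [list(r) for r in rows]}


def store_result(cache_dir, fingerprint, picture_bytes=None, table=None) -> str:
    """Write one key-value pair as a single JSON file named after its key."""
    value = {}
    if picture_bytes is not None:
        value["picture"] = encode_picture(picture_bytes)
    if table is not None:
        value["table"] = encode_table(*table)
    path = os.path.join(cache_dir, fingerprint + ".json")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(value, fh)
    return path


def load_result(path) -> dict:
    """Read a stored value back and decode the picture, if present."""
    with open(path, encoding="utf-8") as fh:
        value = json.load(fh)
    if "picture" in value:
        value["picture"] = base64.b64decode(value["picture"])
    return value


with tempfile.TemporaryDirectory() as cache_dir:
    path = store_result(
        cache_dir,
        "demo-fingerprint",
        picture_bytes=b"\x89PNG-demo",
        table=(["cancer", "p_value"], [["LIHC", 0.01]]),
    )
    restored = load_result(path)
```

Keeping pictures and tables in one JSON string means a result that combines both still maps to exactly one file.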

Optionally, the report generation module comprises:

a template dividing module for dividing the main report template into at least one sub-template, the sub-templates being laid out on the report template in a preset typesetting manner, and each sub-template presenting the report data associated with the analysis result of a corresponding analysis item module;

a variable acquisition module for computing, from the analysis result data acquired for each sub-template, the values of the variables corresponding to that sub-template;

a variable construction module for constructing the variables of each sub-template and filling the variable values obtained by the variable acquisition module into the corresponding sub-template as report data;

a rendering director module for applying preset rendering to the content pre-presented by each sub-template to obtain the final analysis report, the formats of which include at least HTML and PDF.
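The division into sub-templates, the computation of variables, and the final rendering can be sketched with `string.Template` from the Python standard library. This is only an illustration under assumptions: the template text, the variable names, and the helper functions are hypothetical, the input data are dummy values, and only HTML output is shown (PDF output would need an additional conversion step that is not sketched here).

```python
from string import Template

# Main report template; $sections is where the sub-templates are laid out.
MAIN_TEMPLATE = Template("<html><body><h1>$title</h1>\n$sections\n</body></html>")

# One sub-template per analysis item module (only one shown here).
SUB_TEMPLATES = {
    "expression_difference": Template(
        "<section><h2>Expression difference</h2>"
        "<p>$n_significant of $n_total cancer types show a significant difference.</p>"
        "</section>"
    ),
}


def expression_difference_variables(result: dict) -> dict:
    """Compute the variable values of the sub-template from analysis result data."""
    p_values = result["p_values"]
    return {
        "n_total": len(p_values),
        "n_significant": sum(p < 0.05 for p in p_values.values()),
    }


VARIABLE_BUILDERS = {"expression_difference": expression_difference_variables}


def render_report(title: str, results_by_item: dict) -> str:
    """Fill every sub-template with its variables, then render the main template."""
    sections = []
    for item, result in results_by_item.items():
        variables = VARIABLE_BUILDERS[item](result)
        sections.append(SUB_TEMPLATES[item].substitute(variables))
    return MAIN_TEMPLATE.substitute(title=title, sections="\n".join(sections))


html = render_report(
    "TP53",
    {"expression_difference": {"p_values": {"LIHC": 0.01, "BRCA": 0.2}}},
)
```

Because each sub-template is tied to one analysis item module, adding a new item only requires a new sub-template and a matching variable builder.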

Optionally, the biological information analysis items and their corresponding analysis sub-modules are as follows:

1) a TCGA pan-cancer cohort differential expression analysis item, whose corresponding analysis sub-module comprises a pan-cancer single-gene differential expression module;

2) a prognostic efficacy analysis item for TCGA pan-cancer cohort gene expression, whose corresponding analysis sub-module comprises a pan-cancer single-gene survival module;

3) a correlation analysis item for TCGA pan-cancer cohort gene expression and molecular mechanism, whose corresponding analysis sub-modules comprise a single-sample enrichment analysis module, a pan-cancer single-gene score correlation module, CIBERSORTModule, MCPcounterModule, and xCellModule;

4) an analysis item for the influence of TCGA pan-cancer cohort mutations on prognosis, whose corresponding analysis sub-module comprises a pan-cancer single-gene SNV survival module;

5) an immunotherapy cohort differential expression analysis item, whose corresponding analysis sub-module comprises a gene expression module;

6) an immunotherapy cohort gene prognostic efficacy analysis item, whose corresponding analysis sub-module comprises a pan-cancer single-gene survival module;

7) a correlation analysis item for immunotherapy cohort gene expression and immunotherapy effect, whose corresponding analysis sub-module comprises a pan-cancer single-gene immunotherapy correlation module;

8) a validation cohort differential expression analysis item, whose corresponding analysis sub-module comprises a gene expression module;

9) a validation cohort prognostic efficacy analysis item, whose corresponding analysis sub-module comprises a gene survival module.
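The nine items listed above amount to a mapping from analysis item to analysis sub-modules, which could be held in a simple registry. The sketch below is illustrative only: the dictionary name and the descriptive key and value strings are hypothetical labels derived from the list, not identifiers taken from the application (except CIBERSORTModule, MCPcounterModule, and xCellModule, which the text names directly).

```python
# Registry: biological information analysis item -> analysis sub-modules.
ITEM_TO_SUB_MODULES = {
    "TCGA pan-cancer cohort differential expression": [
        "pan-cancer single-gene differential expression module",
    ],
    "TCGA pan-cancer cohort gene expression prognostic efficacy": [
        "pan-cancer single-gene survival module",
    ],
    "TCGA pan-cancer cohort gene expression vs molecular mechanism": [
        "single-sample enrichment analysis module",
        "pan-cancer single-gene score correlation module",
        "CIBERSORTModule",
        "MCPcounterModule",
        "xCellModule",
    ],
    "TCGA pan-cancer cohort mutation influence on prognosis": [
        "pan-cancer single-gene SNV survival module",
    ],
    "immunotherapy cohort differential expression": [
        "gene expression module",
    ],
    "immunotherapy cohort gene prognostic efficacy": [
        "pan-cancer single-gene survival module",
    ],
    "immunotherapy cohort gene expression vs immunotherapy effect": [
        "pan-cancer single-gene immunotherapy correlation module",
    ],
    "validation cohort differential expression": [
        "gene expression module",
    ],
    "validation cohort prognostic efficacy": [
        "gene survival module",
    ],
}

# Distinct sub-module types across all items (some are shared between items).
distinct_sub_modules = sorted({m for mods in ITEM_TO_SUB_MODULES.values() for m in mods})
```

Several items share a sub-module type, which is one reason a shared cache keyed by analysis means type and input parameters can serve more than one item.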

In a second aspect, the present application also discloses a single-gene pan-cancer prognosis analysis method comprising the following steps:

step one, generating at least one sub-analysis instruction according to the user's selection, each sub-analysis instruction comprising: basic information of a target gene selected by the user, one biological information analysis item selected for the target gene, at least one analysis means involved in that biological information analysis item, and the input parameters required to analyze the target gene with each analysis means;

step two, sending each sub-analysis instruction to the corresponding analysis item module, wherein each analysis item module matches analysis sub-modules according to the analysis means types contained in the sub-analysis instruction and sends the input parameters in the sub-analysis instruction to the corresponding analysis sub-modules;

wherein the number and types of the analysis item modules correspond to the number and types of the biological information analysis items;

step three, when each matched analysis sub-module receives its input parameters, first determining whether analysis result data corresponding to the input parameters and their analysis means type are stored in a cache module; if so, performing step four; otherwise, performing step five;

wherein the number and types of the analysis sub-modules correspond to the number and types of analysis means involved in all biological information analysis items; and

the cache module simultaneously stores a variety of different input parameters, the analysis means type data applied to those input parameters, and the corresponding analysis result data finally obtained;

step four, the analysis sub-module extracts the corresponding analysis result data from the cache module;

step five, the analysis sub-module analyzes the input parameters with the corresponding analysis means, sends the newly obtained analysis result data to the cache module for storage, and then extracts the newly obtained analysis result data;

step six, each analysis item module acquires the analysis result data extracted by all the analysis sub-modules it contains;

step seven, the report generation module generates an analysis report, according to a preset layout, from the analysis result data acquired by all analysis item modules.

Optionally, in step five, the analysis sub-module transmits each input parameter, the analysis means type data applied to it, and the analysis result data obtained to the cache module for storage as a key-value pair, wherein:

the value is the analysis result data obtained from the analysis and comprises pictures and/or tables;

correspondingly, this step further comprises the following processing of the analysis result data by the analysis sub-module:

when the analysis result data is a picture, encoding the content of the picture with base64 and serializing the encoded content into a JSON string;

when the analysis result data is a table, converting the table into a serializable data type and serializing it into a JSON string; and

when the analysis result data combines a picture and a table, processing each in the manner described above for pictures and tables respectively; and

transmitting the resulting JSON string to the cache module as the value of the corresponding key-value pair, each key-value pair being stored as a single file.

Optionally, in step three, the analysis sub-module first converts the input parameters and their analysis means type data into a module fingerprint, and determines whether the corresponding analysis result data is stored in the cache module by checking whether the same module fingerprint is stored there;

correspondingly, in step five, the analysis sub-module sends the input parameters, the analysis means type data applied to them, and the analysis result data obtained to the cache module for storage as a key-value pair.

Optionally, step seven comprises the following sub-steps:

step 7.1, dividing the main report template into at least one sub-template, the sub-templates being laid out on the report template in a preset typesetting manner, and each sub-template presenting the report data associated with the analysis result of a corresponding analysis item module;

step 7.2, computing, from the analysis result data acquired for each sub-template, the values of the variables corresponding to that sub-template;

step 7.3, constructing the variables of each sub-template and filling the obtained variable values into the corresponding sub-template as report data;

step 7.4, applying preset rendering to the content pre-presented by each sub-template to obtain the final analysis report, the formats of which include at least HTML and PDF.

The application has at least the following beneficial technical effects:

1) In the single-gene pan-cancer prognosis analysis system and analysis method, the user's instruction is decomposed along multiple dimensions to generate at least one sub-analysis instruction. In addition, before analyzing the input parameters with the corresponding analysis means, each analysis sub-module in each analysis item module directly extracts the historical analysis result data if it determines that such data is stored in the cache module, so the bioinformatics analysis step can be skipped and analysis efficiency improves. The more analysis item modules and analysis sub-modules there are, the more pronounced this improvement becomes. Furthermore, as the system is used, the amount of historical data stored in the cache module keeps growing, which raises the probability that a subsequent user's request matches historical analysis result data and thus further improves analysis efficiency;

2) In the single-gene pan-cancer prognosis analysis system and analysis method, the analysis data for each gene are stored as key-value pairs with module fingerprints as keys, which makes storing and extracting analysis result data more convenient and faster and thereby improves analysis efficiency;

3) In the single-gene pan-cancer prognosis analysis system and analysis method, pictures and/or tables in the analysis result data are converted into strings so that each key-value pair can be stored as a single file, which facilitates storage, identification, and extraction and significantly improves the efficiency and accuracy of analysis processing;

4) In the single-gene pan-cancer prognosis analysis system and analysis method, the main report template is divided into at least one sub-template, and the report data finally presented by each sub-template is associated with the analysis result of one analysis item module, so the report can be rendered automatically, report generation is faster, and the error rate is lower.

Definitions or interpretations of terms:

TCGA pan-cancer cohort differential expression analysis: The Cancer Genome Atlas (TCGA) database provides high-throughput microarray or sequencing data for tissues from more than 10,000 patients across 33 tumor types; here, the tumor and normal expression profile data of the 33 tumor types are analyzed to study differences in transcriptome expression of the gene;

TCGA pan-cancer cohort gene expression: the transcriptome expression level of the gene across the 33 tumor types in the TCGA database;

CIBERSORTModule: a tool that uses gene expression data to calculate the abundance of different cell types;

MCPcounterModule: an R package that uses transcriptome data to quantify the absolute abundance of 8 types of immune cells and 2 types of stromal cells;

xCellModule: an R package that estimates the abundance of cell types in bulk RNA samples from expression profile data;

TCGA pan-cancer cohort mutations: the somatic mutation status of the gene, studied using genomic data for the 33 tumor types in the TCGA database;

pan-cancer single-gene SNV survival module: studies the survival difference between SNV-mutant and wild-type status of the gene, using genomic data for the 33 tumor types in the TCGA database;

immunotherapy cohort differential expression: differential expression of the gene between drug-resistant and drug-sensitive groups in transcriptome cohorts treated with immunotherapy;

immunotherapy cohort gene prognosis: the effect of gene expression on prognosis after immunotherapy, studied in transcriptome cohorts treated with immunotherapy.

Drawings

FIG. 1 is a schematic structural diagram of one embodiment of the single-gene pan-cancer prognosis analysis system of the present application;

FIG. 2 is a schematic structural diagram of the report generation module in the single-gene pan-cancer prognosis analysis system of the present application;

FIG. 3 is a schematic diagram of the operational flow of the analysis sub-module in the single-gene pan-cancer prognosis analysis system and analysis method of the present application;

FIG. 4 is an example of the conversion of a picture in the single-gene pan-cancer prognosis analysis system and analysis method of the present application;

FIG. 5 is an example of the conversion of a table in the single-gene pan-cancer prognosis analysis system and analysis method of the present application;

FIG. 6 is an example of the simultaneous conversion of a table and a picture in the single-gene pan-cancer prognosis analysis system and analysis method of the present application;

FIG. 7 is a system frame diagram of one of the sub-templates in the single-gene pan-cancer prognosis analysis system and analysis method of the present application;

FIG. 8 is a profile of all analysis sub-modules in one embodiment of the single-gene pan-cancer prognosis analysis system and analysis method of the present application;

FIG. 9 is a flow chart of one embodiment of the single-gene pan-cancer prognosis analysis method of the present application.

Detailed Description

To make the purposes, technical solutions, and advantages of the present application clearer, the technical solutions in the embodiments of the present application are described in more detail below with reference to the accompanying drawings.

In a first aspect, the present application discloses a single-gene pan-cancer prognosis analysis system which, as shown in FIG. 1, may include an analysis instruction generation module 101, a cache module 102, analysis sub-modules 103, an analysis item module 104, and a report generation module 105.

The analysis instruction generation module 101 is configured to generate at least one sub-analysis instruction according to the user's selection, where each sub-analysis instruction includes basic information (such as a name or a code) of the target gene (i.e., the single gene) selected by the user (i.e., the gene of interest), one biological information analysis item selected for the target gene, at least one analysis means involved in that biological information analysis item, and the input parameters required to analyze the target gene with each analysis means. Taking single-gene pan-cancer prognosis analysis as an example, the biological information analysis item may be the TCGA pan-cancer cohort differential expression analysis item, and the analysis means corresponding to that item is differential expression analysis of the TCGA pan-cancer cohort expression profile. The selection of the specific input parameters (related to the single gene) required by an analysis means is a relatively mature technique in the art and is therefore not described in detail.

The cache module 102 (corresponding to the cache pool in FIG. 3) stores a variety of different input parameters, the analysis means type data applied to those input parameters, and the corresponding historical analysis result data finally obtained. It will be appreciated that the cache module may hold multiple sets of analysis result data obtained by processing the same gene under multiple biological information analysis items (i.e., with multiple analysis means).

The number and types of the analysis sub-modules 103 (in the embodiment shown in FIG. 1 there are two, namely analysis sub-module B₁ and analysis sub-module B₂) correspond to the number and types of analysis means involved in all biological information analysis items. Taking differential expression analysis of the TCGA pan-cancer cohort expression profile as an example, the corresponding analysis sub-module 103 may be the pan-cancer single-gene differential expression module. Each analysis sub-module 103 can analyze the received input parameters with its corresponding analysis means to obtain the corresponding analysis result data. In addition, referring to FIG. 3, before analyzing the input parameters, each analysis sub-module 103 determines whether the cache module 102 stores historical analysis result data corresponding to the input parameters and their analysis means type. If so, it extracts the historical analysis result data directly; if not, it analyzes the input parameters with the corresponding analysis means, sends the newly obtained analysis result data to the cache module 102 for storage (where it becomes historical analysis result data), and then extracts the newly obtained analysis result data.

The number and types of the analysis item modules 104 (in the embodiment shown in FIG. 1 there is one, namely analysis item module A₁) correspond to the number and types of the biological information analysis items. On receiving a corresponding sub-analysis instruction, each analysis item module 104 matches analysis sub-modules 103 according to the analysis means types contained in the sub-analysis instruction (that is, it selects the corresponding analysis sub-modules 103; in the embodiment of FIG. 1, two analysis sub-modules B₁ and B₂ are matched within one analysis item module 104) and transmits the input parameters in the sub-analysis instruction to the corresponding analysis sub-modules 103 (so that they perform the processing described above). Finally, it acquires the analysis result data extracted by all of these analysis sub-modules 103 as the final analysis result data of the analysis item module 104.

The report generation module 105 receives all analysis result data finally acquired by the analysis item module 104 and generates an analysis report according to a preset layout.

In summary, in the single-gene pan-cancer prognosis analysis system of the present application, before each analysis sub-module 103 in each analysis item module 104 analyzes the input parameters with the corresponding analysis means, it directly extracts the historical analysis result data if it determines that such data is stored in the cache module 102, so the bioinformatics analysis step can be skipped and analysis efficiency improves. The more analysis item modules 104 and analysis sub-modules 103 there are, the more pronounced this improvement becomes. Furthermore, because an analysis sub-module 103 that finds no matching data sends its newly obtained analysis result data to the cache module 102 for storage, the amount of historical data stored in the cache module 102 keeps growing as the system is used, which raises the probability that a subsequent user's request matches historical analysis result data and thus further improves analysis efficiency.

It should be noted that the biological information analysis items selected by the user for the target gene may be any of the various suitable analysis items currently known. In one implementation, referring to FIG. 8, the biological information analysis items (i.e., the corresponding analysis item modules 104), their corresponding analysis sub-modules 103 (i.e., the corresponding analysis means), and the functions of those analysis sub-modules 103 are as follows:

1) The TCGA pan-cancer cohort differential expression analysis item (also called the TCGA pan-cancer cohort differential expression analysis item module), whose corresponding analysis sub-module 103 includes the pan-cancer single-gene differential expression module (also called pancancer singlegenediffexp module); this module can perform differential expression analysis (i.e., the analysis means) on the TCGA pan-cancer cohort expression profile. Samples are grouped as tumor-adjacent tissue versus tumor tissue, the fold-change threshold is 0.585, and the significance threshold is 0.05.

2) The prognostic efficacy analysis item for TCGA pan-cancer cohort gene expression (also called the prognostic efficacy analysis item module for TCGA pan-cancer cohort gene expression), whose corresponding analysis sub-module 103 includes the pan-cancer single-gene survival module (also called pancancer singlegenesurvivinvalsmodule); this module can perform survival analysis on the TCGA pan-cancer cohort expression profile. The prognostic endpoints are overall survival (OS), disease-specific survival (DSS), and progression-free interval (PFI).

3) The correlation analysis item for TCGA pan-cancer cohort gene expression and molecular mechanism (also called the correlation analysis item module for TCGA pan-cancer cohort gene expression and molecular mechanism), whose corresponding analysis sub-modules 103 include the single-sample enrichment analysis module (also called SSGSEAModule), the pan-cancer single-gene score correlation module (also called pancancer singlegenexscorecormodule), CIBERSORTModule, MCPcounterModule, and xCellModule.

Specifically, the item is divided into the following three parts:

3.1 Correlation of HALLMARK pathway Activity

The single-sample enrichment analysis module is used to calculate HALLMARK pathway activity scores for the TCGA pan-cancer cohort expression profile, with the HALLMARK pathway genes obtained from the MSigDB database. The correlation between HALLMARK pathway activity and gene expression is then calculated with the pan-cancer single-gene score correlation module.

3.2 Expression correlation of immune checkpoints

After the immune checkpoint gene expression of the TCGA pan-cancer cohort is extracted, the correlation between immune checkpoint gene expression and the expression of the target gene is calculated with the pan-cancer single-gene score correlation module.

3.3 Correlation of immune cell infiltration ratio

The immune cell infiltration proportions of the TCGA pan-cancer cohort expression profile are calculated with CIBERSORTModule, MCPcounterModule, xCellModule, and the single-sample enrichment analysis module respectively, and the correlation between each immune cell infiltration proportion and gene expression is then calculated with the pan-cancer single-gene score correlation module.

4) An analysis item of the influence of the mutation of the TCGA cancer array on prognosis (also called as an analysis item module of the influence of the mutation of the TCGA cancer array on prognosis), wherein the corresponding analysis submodule 103 comprises a single gene SNV survival module of the cancer; the single gene SNV survival module of this pan-carcinoma can be used to assess the effect of genetic mutations on prognosis. The prognosis endpoint was selected OS, DSS, PFI.

5) An immunotherapy cohort expression differential analysis project (also called an immunotherapy cohort expression differential analysis project module), whose corresponding analysis submodule 103 includes a gene expression module (also called a geneexpression diffmodule); the gene expression module can be used to calculate the expression difference of genes in an immunotherapy cohort. The grouping was immunotherapeutically with response and no response.

6) An immunotherapeutic cohort gene prognostic efficacy analysis program (also referred to as an immunotherapeutic cohort gene prognostic efficacy analysis program module), whose corresponding analysis submodule 103 includes a single gene survival module for cancer; accordingly, the single gene survival module for pan-cancer can be used to assess prognostic efficacy of genes in immune treatment cohorts. Prognosis endpoint uses OS.

7) A correlation analysis item of immune treatment queue gene expression and immune treatment effect (also called a correlation analysis item module of immune treatment queue gene expression and immune treatment effect), wherein the corresponding analysis submodule 103 comprises a pan cancer single gene immune treatment correlation module (also called a pancancer singlegeneicbcormodule); this module can be used to assess the correlation of gene expression with immune therapy response.

8) A validation cohort expression differential analysis item (also called a validation cohort expression differential analysis item module), whose corresponding analysis submodule 103 includes a gene expression module (also called geneexpressionmodules); the gene expression module can be used to calculate the expression difference of genes in the validation cohort.

9) A validation cohort prognostic efficacy analysis item (also called a validation cohort prognostic efficacy analysis item module), whose corresponding analysis submodule 103 includes a gene survival module (also called a genesurvivinvalmodule); the gene survival module can be used to assess the prognostic efficacy of genes in a validation cohort.

It will be appreciated that the number of analysis sub-modules 103 corresponding to one biological information analysis item (i.e., analysis item module 104) may be one, or two or more, depending on the specific item. For example, the above-mentioned correlation analysis item of TCGA pan-cancer cohort gene expression and molecular mechanism may include 5 analysis sub-modules 103. In addition, since each of the above biological information analysis items (and the sub-modules they include) belongs to relatively mature technology, their principles are not described again here.

In one implementation, analysis-related data is stored in the cache module 102 as key-value pairs: each pair holds one set of input parameters, the analysis means type data used for those input parameters (which may be understood as the name of the corresponding analysis means or analysis sub-module), and the analysis result data obtained by the analysis. The key is a module fingerprint, which comprises the input parameters and the analysis means type data used for them; the value is the analysis result data obtained after the input parameters are analyzed by the corresponding analysis means.

It should be further noted that, for the same module (i.e., the analysis sub-module 103), if the parameters (i.e., the input parameters) of two module objects are completely identical, the module operation results (i.e., the analysis result data) must be identical. To this end, the present application defines a module fingerprint: "If a module M has N parameters P1, ..., Pn, then for any module object m of M whose N parameter values are p1, ..., pn, respectively, its module fingerprint is S(name=M, args=(P1=p1, ..., Pn=pn))."

From the above definition, it follows that for the same module, if the parameters of two module objects are the same, their module fingerprints are necessarily the same. Accordingly, when determining whether the cache module 102 stores historical analysis result data corresponding to the input parameters and their analysis means type, the analysis sub-module 103 converts the input parameters and the analysis means type data into a module fingerprint and determines whether the cache module stores the same module fingerprint.
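The fingerprint definition above can be illustrated with a minimal Python sketch. The function name `module_fingerprint`, the module name `PanCancerSingleGeneSurvivalModule`, the parameter names and the use of a SHA-256 hash are all assumptions made for illustration; the application only requires that identical module names and parameters yield identical fingerprints.

```python
import hashlib
import json


def module_fingerprint(module_name, params):
    """Build a deterministic fingerprint from a module name and its parameters."""
    # Sorting the keys makes the fingerprint independent of parameter order.
    canonical = json.dumps(
        {"name": module_name, "args": params}, sort_keys=True, ensure_ascii=False
    )
    # Hashing is an assumption: it gives a fixed-length key usable as a file name.
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical module name and parameters, for illustration only.
fp_a = module_fingerprint("PanCancerSingleGeneSurvivalModule", {"gene": "TP53", "endpoint": "OS"})
fp_b = module_fingerprint("PanCancerSingleGeneSurvivalModule", {"endpoint": "OS", "gene": "TP53"})
fp_c = module_fingerprint("PanCancerSingleGeneSurvivalModule", {"gene": "TP53", "endpoint": "DSS"})
```

Here `fp_a` and `fp_b` are equal because the parameters are the same, while `fp_c` differs because one parameter value has changed.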

Similarly, when no matching fingerprint is found, the analysis sub-module 103 needs to analyze the corresponding input parameters; after the analysis is completed, the input parameters, the analysis means type data used for them, and the newly obtained analysis result data are sent to the cache module 102 and stored as a key-value pair, so as to facilitate subsequent searching and matching.

Further, in the cache module 102, to facilitate storage, searching and extraction of key-value pairs, each key-value pair is stored as a single file; and because the analysis result data obtained by analyzing the biological information analysis items is mainly pictures and/or tables, the application provides a storage method, namely a text representation method, in order to store the pictures and the tables into a single file.

Specifically, a first conversion module and a second conversion module that perform conversion processing on the picture and the table respectively are preset in each analysis sub-module 103.

As shown in fig. 4, the first conversion module is configured to encode the content of the picture using base64 and then serialize the encoded content into a JSON string; as shown in fig. 5, the second conversion module is configured to convert the table into a serializable data type and then serialize the data into a JSON string. In addition, as shown in fig. 6, for any combination of pictures and tables, the conversion may be performed using the first conversion module and the second conversion module, respectively.

The analysis submodule 103 thus converts the analysis result data into a JSON string and transmits it to the cache module 102 as the value of the corresponding key-value pair. Each key-value pair is stored as a single file, which ensures that the analysis result data can be saved in text form to a single file and retrieved by later runs.
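A minimal sketch of this text representation method is given below. The function names, the dictionary layout, the `.json` file naming and the sample table values are assumptions for illustration (the values are placeholders, not real results); only the use of base64 for pictures, JSON serialization, and one file per key-value pair come from the description.

```python
import base64
import json
import os
import tempfile


def picture_to_serializable(image_bytes, image_format):
    # First conversion: base64-encode the raw picture bytes so they fit in JSON text.
    return {
        "type": "figure",
        "format": image_format,
        "content": base64.b64encode(image_bytes).decode("ascii"),
    }


def table_to_serializable(header, rows):
    # Second conversion: express the table as plain lists, which JSON can serialize.
    return {"type": "table", "header": list(header), "rows": [list(r) for r in rows]}


def save_value(cache_dir, fingerprint, items):
    # One key-value pair per file: the file name carries the key, the JSON text the value.
    path = os.path.join(cache_dir, fingerprint + ".json")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(items, fh)
    return path


def load_value(cache_dir, fingerprint):
    path = os.path.join(cache_dir, fingerprint + ".json")
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


cache_dir = tempfile.mkdtemp()
fake_png = b"\x89PNG\r\n\x1a\n-not-a-real-image"  # placeholder bytes
items = [
    picture_to_serializable(fake_png, "png"),
    table_to_serializable(["cancer", "hazard_ratio"], [["LUAD", 1.2], ["BRCA", 0.9]]),
]
save_value(cache_dir, "demo-fingerprint", items)
restored = load_value(cache_dir, "demo-fingerprint")
restored_bytes = base64.b64decode(restored[0]["content"])
```

After the round trip, `restored_bytes` matches the original picture bytes and the table rows are recovered unchanged, so a picture and a table can share one text file.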

It will be appreciated that, to save analysis result data to the cache module 102 and read it back from the cache module 102, a base class Result is defined herein. The base class has two methods, to_list and from_list: to_list converts a Result object into a serializable list, and from_list converts the list obtained by deserializing data read from the cache module 102 back into a Result object. All analysis sub-modules 103 must define a module result subclass. In addition, a Figure class is defined; it is a special result with two attributes, format and content, which store the format of the picture and the content of the picture, respectively.
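The Result and Figure classes described above might look roughly as follows. This is a sketch under assumptions: `SurvivalModuleResult` is a hypothetical module result subclass, the list layout is invented for illustration, and the sample table values are placeholders.

```python
import base64
import json


class Result:
    """Base class: subclasses convert themselves to and from a serializable list."""

    def to_list(self):
        raise NotImplementedError

    @classmethod
    def from_list(cls, data):
        raise NotImplementedError


class Figure(Result):
    """A special result with two attributes: picture format and picture content."""

    def __init__(self, format, content):
        self.format = format    # e.g. "png"
        self.content = content  # raw picture bytes

    def to_list(self):
        return [self.format, base64.b64encode(self.content).decode("ascii")]

    @classmethod
    def from_list(cls, data):
        fmt, encoded = data
        return cls(fmt, base64.b64decode(encoded))


class SurvivalModuleResult(Result):
    """Hypothetical module result subclass holding one figure and one table."""

    def __init__(self, figure, table):
        self.figure = figure
        self.table = table  # list of rows

    def to_list(self):
        return [self.figure.to_list(), self.table]

    @classmethod
    def from_list(cls, data):
        figure_data, table = data
        return cls(Figure.from_list(figure_data), table)


original = SurvivalModuleResult(Figure("png", b"raw-bytes"), [["OS", 0.03]])
text = json.dumps(original.to_list())                            # saved to the cache
restored = SurvivalModuleResult.from_list(json.loads(text))      # read back from the cache
```

Keeping serialization inside each result class means the cache only ever handles plain lists and JSON text, regardless of which sub-module produced the result.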

In summary, in the monogenic pan-cancer prognosis analysis system, the analysis efficiency can be obviously improved through the arrangement of the modules.

Fig. 2 shows a schematic diagram of the report generating module in an embodiment of the present invention, where the report generating module 105 may specifically include a template dividing module 1051, a variable obtaining module 1052, a variable constructing module 1053, and a rendering director module 1054.

Because the rapid monogenic pan-cancer prognosis analysis flow is very complex, the template dividing module 1051 divides the main report template into at least one sub-template in order to simplify report writing. Each sub-template may be a section of the report and presents the report data associated with the analysis result of the corresponding analysis item module 104. The sub-templates are typeset within the main report template according to a predetermined typesetting mode, which greatly reduces the complexity of report writing. Further, the report template is implemented based on HTML and the template language Jinja2.

Further, since the data in the report cannot be obtained directly from the flow result (i.e., the data extracted from the cache module 102), a small amount of calculation is required. Specifically, the variable obtaining module 1052 performs, for each sub-template, the calculation defined by that sub-template's variables on the analysis result data, so as to obtain the value of each variable of each sub-template. To facilitate the acquisition of template variables, the present application defines a variable acquisition (varget) base class, which has a single method, get, that returns the value of one sub-template variable.

Further, the variable construction module 1053 is configured to construct the variables of each sub-template and to fill the values obtained by the variable obtaining module 1052 into the corresponding sub-templates as report data. To this end, the present application defines a variable construction (Builder) base class, which has a single method, build; this method calls the get method of all the varget objects owned by the Builder, so that all the variables of a sub-template are obtained and then returned.
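The relationship between the two base classes can be sketched as below. The class name `VarGetter` (the text only gives "varget"), the `name` attribute, the arguments passed to `get` and `build`, and the two concrete getters are assumptions for illustration; the sample p-values are placeholders.

```python
class VarGetter:
    """Base class with a single method, get, returning one template variable's value."""

    name = None  # assumed: the template variable this getter fills

    def get(self, config, result, env):
        raise NotImplementedError


class TargetGene(VarGetter):
    name = "target_gene"

    def get(self, config, result, env):
        return config["gene"]


class SignificantCancerCount(VarGetter):
    name = "n_significant"

    def get(self, config, result, env):
        # The "small amount of calculation": count rows below the threshold.
        return sum(1 for row in result["survival_table"] if row["p"] < env["alpha"])


class Builder:
    """Base class with a single method, build, that calls every owned getter."""

    def __init__(self, getters):
        self.getters = getters

    def build(self, config, result, env):
        return {g.name: g.get(config, result, env) for g in self.getters}


builder = Builder([TargetGene(), SignificantCancerCount()])
variables = builder.build(
    config={"gene": "TP53"},
    result={"survival_table": [{"cancer": "A", "p": 0.01}, {"cancer": "B", "p": 0.20}]},
    env={"alpha": 0.05},
)
```

The resulting `variables` dictionary is what a sub-template would receive when it is rendered.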

The rendering director module 1054 (also called the print module; as described with reference to fig. 7, it may include a rendering director and report rendering) is configured to perform a predetermined rendering process on the content presented by each sub-template, so as to obtain the final analysis report, whose format includes at least HTML and PDF.

Specifically, the rendering Director (Render Director) principle is as follows:

a complete report rendering often requires three types of data:

(1) Configuration (config): i.e. user specified parameters. For example, single gene pan-cancer prognosis rapid analysis report rendering requires knowledge of which gene the user specifies as the target gene.

(2) Results (result): i.e., the flow results (analysis result data obtained by each analysis sub-module 103).

(3) Environment (env): some data is neither configuration nor result but is still needed when the report is rendered; such data is referred to as the environment. The environment is saved as JSON at a fixed location.

The application defines a report rendering Director base class. The Director has three parameters, config, result and env, which represent the configuration, the result and the environment, respectively. In addition, the Director has a method, direct, which calls the build method of every Builder it owns to obtain all the template variables, then constructs a Report object and returns it.
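A minimal sketch of the Director is shown below. It assumes that a Director subclass lists its Builders in a class attribute `builders` and that each Builder's `build` accepts config, result and env; `SectionBuilder`, `DemoDirector` and the stripped-down `Report` are hypothetical stand-ins, and the sample values are placeholders.

```python
class Report:
    """Stripped-down stand-in: holds the template variables, not yet rendered."""

    def __init__(self, all_vars):
        self.all_vars = all_vars
        self.html = None
        self.pdf = None


class SectionBuilder:
    """Hypothetical Builder: returns the variables of one sub-template."""

    def __init__(self, section, compute):
        self.section = section
        self.compute = compute

    def build(self, config, result, env):
        return {self.section: self.compute(config, result, env)}


class Director:
    builders = ()  # assumed: subclasses list the Builders they own

    def __init__(self, config, result, env):
        self.config = config  # user-specified parameters
        self.result = result  # flow results from the analysis sub-modules
        self.env = env        # data that is neither configuration nor result

    def direct(self):
        all_vars = {}
        for builder in self.builders:
            all_vars.update(builder.build(self.config, self.result, self.env))
        return Report(all_vars)


class DemoDirector(Director):
    builders = (
        SectionBuilder("summary", lambda c, r, e: {"gene": c["gene"]}),
        SectionBuilder(
            "survival",
            lambda c, r, e: {"n_rows": len(r["survival_table"]), "logo": e["logo_path"]},
        ),
    )


report = DemoDirector(
    config={"gene": "TP53"},
    result={"survival_table": [["OS", 0.03], ["DSS", 0.2]]},
    env={"logo_path": "logo.png"},
).direct()
```

The returned `report` carries one entry per sub-template in `all_vars` and has not been rendered yet.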

The report rendering principle is as follows:

the present application defines a Report class, which has three attributes:

(1) all_vars: all template variables constructed by the Director.

(2) html: the HTML report; empty before the render_html method is run.

(3) pdf: the PDF report; empty before the render_pdf method is run.

And there are five methods:

(1) render_html: rendering the HTML report.

(2) render_pdf: the HTML report is converted to a PDF report using wkhtmltopdf.

(3) to_html: save HTML report to file.

(4) to_pdf: saving the PDF report to the file.

(5) to_zip: the PDF report and all supplemental material are saved to a zip compression package.
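The Report class described above can be sketched as follows. Several points are assumptions made to keep the example self-contained: the standard-library `string.Template` stands in for the Jinja2 template, the `template_text` constructor argument and the `convert` parameter of `render_pdf` are not in the description, and `wkhtmltopdf_convert` assumes the wkhtmltopdf binary is installed and on PATH.

```python
import os
import string
import subprocess
import tempfile
import zipfile


def wkhtmltopdf_convert(html):
    # Assumes wkhtmltopdf is on PATH; "-" reads HTML from stdin and writes PDF to stdout.
    done = subprocess.run(
        ["wkhtmltopdf", "-", "-"],
        input=html.encode("utf-8"),
        stdout=subprocess.PIPE,
        check=True,
    )
    return done.stdout


class Report:
    def __init__(self, all_vars, template_text):
        self.all_vars = all_vars            # all template variables
        self.template_text = template_text  # assumed: template passed in as text
        self.html = None                    # empty until render_html is run
        self.pdf = None                     # empty until render_pdf is run

    def render_html(self):
        # string.Template is a stand-in for the Jinja2 template used by the application.
        self.html = string.Template(self.template_text).substitute(self.all_vars)
        return self.html

    def render_pdf(self, convert=wkhtmltopdf_convert):
        if self.html is None:
            self.render_html()
        self.pdf = convert(self.html)
        return self.pdf

    def to_html(self, path):
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(self.html)

    def to_pdf(self, path):
        with open(path, "wb") as fh:
            fh.write(self.pdf)

    def to_zip(self, path, supplements=None):
        # Save the PDF report and all supplemental material into one zip package.
        with zipfile.ZipFile(path, "w") as zf:
            zf.writestr("report.pdf", self.pdf)
            for name, data in (supplements or {}).items():
                zf.writestr(name, data)


report = Report({"gene": "TP53"}, "<html><body><h1>Report for $gene</h1></body></html>")
report.render_html()
# A stub converter is used here so the example runs without wkhtmltopdf installed.
report.render_pdf(convert=lambda html: b"%PDF-stub " + html.encode("utf-8"))
zip_path = os.path.join(tempfile.mkdtemp(), "report.zip")
report.to_zip(zip_path, {"supplement/table.csv": "cancer,p\n"})
with zipfile.ZipFile(zip_path) as zf:
    names = sorted(zf.namelist())
```

Passing the converter as an argument is only a convenience for the example; in the described system the conversion is performed by wkhtmltopdf.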

In sum, in the single-gene cancer prognosis analysis system, the automatic report rendering can be realized through the arrangement of the modules, the report generation speed is improved, and the error rate is reduced.

In a second aspect, the present application also discloses a monogenic pan-cancer prognosis analysis method. As shown in fig. 9, in a specific example, the method comprises the following steps:

Step S1, the analysis instruction generation module generates N sub-analysis instructions according to the user's selection, wherein each sub-analysis instruction comprises basic information of the target gene selected by the user, one biological information analysis item selected for the target gene (corresponding to one of the analysis item modules A₁, A₂, …, Aₙ), at least one of the analysis means involved in that biological information analysis item (corresponding to at least one of the matched analysis sub-modules B₁, B₂, …, Bₘ), and the input parameters used when the target gene is analyzed by the analysis means.

Step S2, the N sub-analysis instructions are respectively sent to the analysis project modules (A₁, A₂, …, Aₙ); each analysis project module matches the analysis means contained in its sub-analysis instruction to the corresponding analysis sub-modules (B₁, B₂, …, Bₘ) and transmits the input parameters in the sub-analysis instruction to those analysis sub-modules.
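Steps S1 and S2 can be illustrated with a short sketch. The `SubAnalysisInstruction` data class, the `catalog` mapping from analysis items to analysis means, and all item, means and sub-module names are hypothetical; the description does not specify how instructions are represented.

```python
from dataclasses import dataclass, field


@dataclass
class SubAnalysisInstruction:
    gene: str      # basic information of the target gene
    item: str      # one biological information analysis item
    means: list    # analysis means involved in that item
    params: dict = field(default_factory=dict)  # input parameters per analysis means


def generate_instructions(gene, selected_items, catalog):
    """Step S1: one sub-analysis instruction per selected analysis item."""
    instructions = []
    for item in selected_items:
        means = catalog[item]
        params = {m: {"gene": gene} for m in means}
        instructions.append(SubAnalysisInstruction(gene, item, means, params))
    return instructions


def dispatch(instruction, submodules):
    """Step S2: match each analysis means to its sub-module and pass on the parameters."""
    return [(submodules[m], instruction.params[m]) for m in instruction.means]


# Hypothetical catalog and sub-module registry.
catalog = {
    "item_expression_difference": ["means_expression_diff"],
    "item_prognosis": ["means_survival"],
}
submodules = {"means_expression_diff": "B1", "means_survival": "B2"}

instructions = generate_instructions(
    "TP53", ["item_expression_difference", "item_prognosis"], catalog
)
pairs = dispatch(instructions[1], submodules)
```

Each entry in `pairs` names the matched sub-module together with the input parameters it should receive.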

Step S3, when an analysis sub-module (B₁, B₂, …, Bₘ) receives the corresponding input parameters, it first judges whether the cache module stores historical analysis result data corresponding to the input parameters and their analysis means type; if so, step S4 is executed, otherwise step S5 is executed.

In step S3 of the preferred embodiment, the analysis sub-module converts the corresponding input parameters and the corresponding analysis means type data into module fingerprints; and judging whether the cache module stores the corresponding historical analysis result data or not by judging whether the cache module stores the same module fingerprint.

Step S4, the corresponding analysis sub-module extracts the corresponding historical analysis result data from the cache module.

Step S5, the corresponding analysis sub-module analyzes the input parameters using the corresponding analysis means, sends the newly obtained analysis result data to the cache module for storage, and then extracts the newly obtained analysis result data.

In step S5 of the preferred embodiment, the analysis sub-module sends the corresponding input parameters, the analysis means type data used for them and the analysis result data obtained by analysis to the cache module, where they are stored as a key-value pair.
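Steps S3 to S5 amount to a check-then-compute pattern around the cache. The sketch below is one possible arrangement, assuming a file-backed cache with one JSON file per fingerprint; `FileCache`, `run_with_cache`, `toy_analysis` and the module name `ToySurvivalModule` are hypothetical, and the returned table is a placeholder.

```python
import hashlib
import json
import os
import tempfile


class FileCache:
    """Hypothetical cache: one JSON file per key-value pair."""

    def __init__(self, directory):
        self.directory = directory

    def _path(self, key):
        return os.path.join(self.directory, key + ".json")

    def has(self, key):
        return os.path.exists(self._path(key))

    def get(self, key):
        with open(self._path(key), encoding="utf-8") as fh:
            return json.load(fh)

    def put(self, key, value):
        with open(self._path(key), "w", encoding="utf-8") as fh:
            json.dump(value, fh)


def fingerprint(module_name, params):
    canonical = json.dumps({"name": module_name, "args": params}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def run_with_cache(cache, module_name, params, analyze):
    key = fingerprint(module_name, params)
    if cache.has(key):             # step S3: is there a stored result?
        return cache.get(key)      # step S4: extract the historical result
    value = analyze(**params)      # step S5: analyze ...
    cache.put(key, value)          # ... store the new result ...
    return cache.get(key)          # ... then extract it


calls = []


def toy_analysis(gene, endpoint):
    calls.append((gene, endpoint))
    return {"gene": gene, "endpoint": endpoint, "table": [["placeholder", 0]]}


cache = FileCache(tempfile.mkdtemp())
params = {"gene": "TP53", "endpoint": "OS"}
first = run_with_cache(cache, "ToySurvivalModule", params, toy_analysis)
second = run_with_cache(cache, "ToySurvivalModule", params, toy_analysis)
```

The second call finds the stored fingerprint and returns the cached value, so `toy_analysis` runs only once.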

In some preferred embodiments, in step S5, the analysis sub-module further performs the following processing on the analysis result data:

Step S6, each analysis project module acquires the analysis result data extracted by all the analysis sub-modules it contains.

Step S7, the report generation module generates an analysis report, according to a predetermined layout mode, from the analysis result data acquired by all the analysis project modules.

In step S7 of the preferred embodiment, it comprises the following sub-steps:

Step 7.1, dividing the main report template into at least one sub-template, wherein the sub-templates are typeset in the main report template according to a preset typesetting mode, and each sub-template is used for presenting report data associated with the analysis result of a corresponding analysis project module;

As can be seen from the implementation process shown in fig. 9, the monogenic pan-cancer prognosis analysis method of the present application decomposes the user's instruction along multiple dimensions into a plurality of sub-analysis instructions, and a pre-judgment step before each analysis sub-module runs allows historical analysis result data to be reused, so that repeated analysis is avoided and analysis efficiency is significantly improved. Meanwhile, analysis result data, in particular pictures and tables, are converted into an easily identifiable storage format and stored uniformly (each as a single file), so that the data can be identified, retrieved and output accurately and efficiently, which further improves efficiency. In addition, the automatic report rendering step increases the report generation speed and reduces the error rate.

Finally, it should be noted that, in the monogenic pan-cancer prognosis analysis system and analysis method of the present application, when a user selects multiple biological information analysis items for the same target gene, each biological information analysis item (i.e., the analysis item module 104) is assigned at least one corresponding analysis sub-module 103 for processing, so as to obtain corresponding analysis result data (also called a module result). It can be appreciated that when a plurality of analysis project modules 104 are involved, their processing order may be chosen as needed, for example processing in a predetermined sequence or processing in parallel; in this embodiment, in order to further increase analysis efficiency, the parallel processing manner shown in fig. 9 is preferably adopted among the plurality of analysis project modules 104.

In addition, when one analysis item module 104 corresponds to two or more analysis sub-modules 103, those analysis sub-modules 103 may be run in a predetermined order (for a specific biological information analysis item, see the processing order of the modules in the correlation analysis item of TCGA pan-cancer cohort gene expression and molecular mechanism described above). Finally, the analysis result data extracted by the analysis sub-modules 103 can form a module result list (also called a flow result), so as to facilitate subsequent summarization and retrieval.
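The scheduling described in the two preceding paragraphs (analysis item modules in parallel, sub-modules of one item in a fixed order) could be arranged as in the sketch below. The use of a thread pool is an assumption, since the application only states that the items are processed in parallel; the item names and the lambda sub-modules are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def run_item(item_name, submodules, params):
    # Sub-modules of one analysis item run in their predetermined order;
    # their results form the module result list of that item.
    return item_name, [sub(params) for sub in submodules]


def run_items_in_parallel(items, params, max_workers=4):
    # Different analysis items are independent, so they are submitted concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(run_item, name, subs, params) for name, subs in items.items()
        ]
        return dict(f.result() for f in futures)


# Hypothetical items, each with an ordered list of sub-modules.
items = {
    "expression_difference": [lambda p: f"diff:{p['gene']}"],
    "prognostic_efficacy": [
        lambda p: f"surv:{p['gene']}:OS",
        lambda p: f"surv:{p['gene']}:DSS",
    ],
}
flow_result = run_items_in_parallel(items, {"gene": "TP53"})
```

The returned `flow_result` maps each analysis item to its ordered module result list, which is the structure a report generation step would then consume.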

The foregoing is merely specific embodiments of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions easily conceivable by those skilled in the art within the technical scope of the present application should be covered in the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A monogenic pan-cancer prognosis analysis system, comprising:

the report generation module is used for receiving all the analysis result data acquired by the analysis project module and generating an analysis report according to a preset layout mode;

in the cache module, the same input parameter, the analysis means type data adopted by the input parameter and the analysis result data obtained by analysis are stored in a key value pair storage mode, wherein:

the value refers to analysis result data obtained by analysis;

when judging whether the cache module stores the historical analysis result data corresponding to the input parameter and the analysis means type thereof, the analysis sub-module firstly converts the input parameter and the analysis means type data thereof into a module fingerprint, and the judgment is completed by judging whether the cache module stores the same module fingerprint;

Correspondingly, after the analysis submodule analyzes the corresponding input parameters, the input parameters, analysis means type data adopted by the input parameters and analysis result data obtained by analysis are sent to the cache module for storage in a key value pair mode;

the biological information analysis items and the corresponding analysis submodules are as follows:

3) The analysis submodule corresponding to the correlation analysis item of TCGA pan-cancer cohort gene expression and molecular mechanism comprises a single sample enrichment analysis module, a pan-cancer single gene score correlation module, CIBERSORTModule, MCPcounterModule and xCellModule;

2. The analysis system of claim 1, wherein when the analysis result data includes pictures and/or tables, each of the analysis sub-modules further includes:

at this time, the analysis submodule converts the analysis result data into a JSON character string and then transmits the JSON character string to the cache module to serve as the value of the corresponding key value pair, and each key value pair is stored as a single file.

3. The analysis system of claim 2, wherein the report generation module comprises:

the template dividing module is used for dividing the main report template into at least one sub-template, typesetting is carried out on the main report template by the sub-template according to a preset typesetting mode, and each sub-template is used for presenting report data associated with an analysis result of a corresponding analysis item module;

4. A monogenic pan-cancer prognosis analysis method, comprising the steps of:

step three, when each matched analysis sub-module receives corresponding input parameters, firstly judging whether historical analysis result data corresponding to the input parameters and analysis means types thereof are stored in a cache module, if so, performing step four, otherwise, performing step five;

the cache module simultaneously stores various different input parameters, analysis means type data adopted by the different input parameters and finally obtained corresponding historical analysis result data;

step four, the analysis submodule extracts the corresponding historical analysis result data in the cache module;

step seven, generating an analysis report according to a preset layout mode by using the analysis result data acquired by all analysis project modules through a report generation module;

in the fifth step, the analysis submodule transmits corresponding input parameters, analysis means type data adopted by the input parameters and analysis result data obtained by analysis to the cache module for storage in a storage mode of a key value pair, wherein:

the value refers to analysis result data obtained by analysis, and comprises pictures and/or tables;

In the third step, the analysis sub-module firstly converts corresponding input parameters and analysis means type data thereof into module fingerprints, and judges whether the corresponding historical analysis result data are stored in the cache module by judging whether the same module fingerprints are stored in the cache module;

The biological information analysis item in the first step and the corresponding analysis submodule in the second step are as follows:

5. The method according to claim 4, further comprising the following steps of processing the analysis result data by the analysis submodule:

and transmitting the obtained json character string to a cache module to serve as a value of a corresponding key value pair, and storing each key value pair into a single file.

6. The method according to claim 4, wherein the seventh step comprises the sub-steps of: