CN109671467B - Pathogen infection damage mechanism analysis method and device - Google Patents

Publication number
CN109671467B
Authority
CN
China
Prior art keywords
genes
whole genome
gene
pathogen
infection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811521645.4A
Other languages
Chinese (zh)
Other versions
CN109671467A (en)
Inventor
伯晓晨
何松
文昱琦
宋欣雨
刘祯
杨晓曦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Academy of Military Medical Sciences AMMS of PLA
Original Assignee
Academy of Military Medical Sciences AMMS of PLA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Academy of Military Medical Sciences AMMS of PLA
Priority to CN201811521645.4A
Publication of CN109671467A
Application granted
Publication of CN109671467B
Legal status: Active
Anticipated expiration

Classifications

    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

An embodiment of the invention provides a method and a device for analyzing a pathogen infection damage mechanism, belonging to the field of gene data processing. The method comprises the following steps: acquiring whole genome expression data after a plurality of genes are silenced; obtaining a gene expression rank sequence corresponding to the whole genome expression data; acquiring whole genome expression profile data after infection by a plurality of different pathogens; constructing an imprinted gene set for the pathogen infection based on the whole genome expression profile data; acquiring an enrichment score between the gene expression rank sequence and the imprinted gene set; and determining the damage mechanism of the pathogen infection according to the enrichment score. The method thereby fuses massive and diverse cross-platform transcriptome big data, so that it is not necessary to culture the pathogen from scratch, infect cells, and carry out large-scale experiments, which reduces research and development cost and shortens the detection period. Experimental errors are also reduced, so that the analysis of the damage mechanism of pathogen infection is more accurate.

Description

Pathogen infection damage mechanism analysis method and device
Technical Field
The invention relates to the field of gene data processing, in particular to a pathogen infection damage mechanism analysis method and a pathogen infection damage mechanism analysis device.
Background
Currently, existing differential expression gene analysis examines the gene expression differences caused by infection with an unknown pathogen, and then analyzes the damage mechanism of the infection with respect to the genes having the largest differential expression fold change. However, this requires culturing the pathogen and infecting cells from scratch and carrying out large-scale experiments, which is time-consuming and labor-intensive. In addition, analyzing single genes may miss some important pathways, and the differentially expressed genes obtained by different research institutions for the same biological system show little overlap.
Disclosure of Invention
The method and the device for analyzing the pathogen infection damage mechanism can solve the technical problem in the prior art that analyzing the damage mechanism of infection by an unknown pathogen is time-consuming and labor-intensive.
In order to achieve the above object, the embodiments of the present invention adopt the following technical solutions:
In a first aspect, an embodiment of the present invention provides a method for analyzing a pathogen infection damage mechanism, including: acquiring whole genome expression data after a plurality of genes are silenced; obtaining a gene expression rank sequence corresponding to the whole genome expression data; acquiring whole genome expression profile data after infection by a plurality of different pathogens; constructing an imprinted gene set for the pathogen infection based on the whole genome expression profile data; acquiring an enrichment score between the gene expression rank sequence and the imprinted gene set; and determining the damage mechanism of the pathogen infection based on the enrichment score. The method integrates massive and diverse cross-platform transcriptome big data, so that the pathogen infection damage mechanism can be analyzed without culturing pathogens from scratch, infecting cells, and performing large-scale experiments, which reduces research and development cost and shortens the detection period. Experimental errors are also reduced, so that the analysis of the damage mechanism of pathogen infection is more accurate.
In combination with the first aspect, embodiments of the present invention provide a first possible implementation manner of the first aspect, wherein the acquiring whole genome expression data after a plurality of genes are silenced includes: acquiring, from the LINCS database, whole genome expression data after a plurality of genes are silenced.
In combination with the first aspect, embodiments of the present invention provide a second possible implementation manner of the first aspect, wherein the constructing of the imprinted gene set for the pathogen infection based on the whole genome expression profile data includes: obtaining the differential expression quantity corresponding to the whole genome expression profile data; arranging the genes corresponding to the whole genome expression profile data from top to bottom in descending order of differential expression quantity to obtain a rank sequence; obtaining a plurality of genes from the top and the bottom of the rank sequence; and using the plurality of genes as the imprinted gene set for the pathogen infection.
In combination with the first aspect, embodiments of the present invention provide a third possible implementation manner of the first aspect, wherein the determining the damage mechanism of the pathogen infection based on the enrichment score includes: determining whether the enrichment score is positive; and if so, determining from the enrichment score that the cellular response to the gene silencing is consistent with the cellular response to the pathogen infection damage.
In combination with the third possible implementation manner of the first aspect, embodiments of the present invention provide a fourth possible implementation manner of the first aspect, wherein the determining whether the enrichment score is positive includes: determining whether a target gene subset of the imprinted gene set is located at the corresponding position of the gene expression rank sequence under the gene silencing; and if so, determining that the enrichment score is positive.
In combination with the first aspect, embodiments of the present invention provide a fifth possible implementation manner of the first aspect, wherein the determining the damage mechanism of the pathogen infection according to the enrichment score includes: constructing an association network between the genes used for the gene silencing and the pathogens used for the pathogen infection, with the enrichment score as the weight; obtaining all association relationships with the highest enrichment scores in the association network, wherein the association relationships represent the relationships between the genes and the pathogens used for the pathogen infection; and determining the damage mechanism of the pathogen infection according to the association relationships.
In a second aspect, an embodiment of the present invention provides an apparatus for analyzing a pathogen infection damage mechanism, including: a first obtaining unit, configured to obtain whole genome expression data after a plurality of genes are silenced; a first processing unit, configured to obtain a gene expression rank sequence corresponding to the whole genome expression data; a second obtaining unit, configured to obtain whole genome expression profile data after infection by a plurality of different pathogens; a second processing unit, configured to construct an imprinted gene set for the pathogen infection based on the whole genome expression profile data; a third obtaining unit, configured to obtain an enrichment score between the gene expression rank sequence and the imprinted gene set; and a fourth processing unit, configured to determine the damage mechanism of the pathogen infection based on the enrichment score.
With reference to the second aspect, an embodiment of the present invention provides a first possible implementation manner of the second aspect, where the first obtaining unit is further configured to: acquire, from the LINCS database, whole genome expression data after a plurality of genes are silenced.
With reference to the second aspect, an embodiment of the present invention provides a second possible implementation manner of the second aspect, where the second processing unit is further configured to: obtain the differential expression quantity corresponding to the whole genome expression profile data; arrange the genes corresponding to the whole genome expression profile data from top to bottom in descending order of differential expression quantity to obtain a rank sequence; obtain a plurality of genes from the top and the bottom of the rank sequence; and use the plurality of genes as the imprinted gene set for the pathogen infection.
With reference to the second aspect, an embodiment of the present invention provides a third possible implementation manner of the second aspect, where the fourth processing unit is further configured to: determine whether the enrichment score is positive; and if so, determine from the enrichment score that the cellular response to the gene silencing is consistent with the cellular response to the pathogen infection damage.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
According to the method and the device for analyzing the pathogen infection damage mechanism provided by the embodiments of the invention, whole genome expression data after a plurality of genes are silenced is acquired; a gene expression rank sequence corresponding to the whole genome expression data is obtained; whole genome expression profile data after infection by a plurality of different pathogens is acquired; an imprinted gene set for the pathogen infection is constructed based on the whole genome expression profile data; an enrichment score between the gene expression rank sequence and the imprinted gene set is acquired; and the damage mechanism of the pathogen infection is determined according to the enrichment score. Massive and diverse cross-platform transcriptome big data is thereby fused, so that it is not necessary to culture the pathogen from scratch, infect cells, and carry out large-scale experiments, which reduces research and development cost and shortens the detection period. Experimental errors are also reduced, so that the analysis of the damage mechanism of pathogen infection is more accurate.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and therefore should not be considered as limiting its scope; those skilled in the art can obtain other related drawings from these drawings without inventive effort.
FIG. 1 is a flowchart of a method for analyzing a pathogen infection damage mechanism according to a first embodiment of the present invention;
FIG. 2 is a functional block diagram of a pathogen infection damage mechanism analysis device according to a second embodiment of the present invention;
FIG. 3 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are obviously some, but not all, embodiments of the present invention. Thus, the following detailed description of the embodiments of the present invention, as presented in the figures, is not intended to limit the scope of the invention as claimed, but is merely representative of selected embodiments of the invention. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
First embodiment
Fig. 1 is a flowchart of a method for analyzing pathogen infection damage mechanism according to a first embodiment of the present invention. The specific process shown in FIG. 1 will be described in detail below.
Step S101, acquiring whole genome expression data after a plurality of genes are silenced.
As an embodiment, a large amount of whole genome expression data after gene silencing is obtained from the LINCS (Library of Integrated Network-based Cellular Signatures) database.
Optionally, after the whole genome expression data is acquired, the data is preprocessed by steps such as reducing the batch effect and selecting a trusted representative. For example, to reduce the error caused by the batch effect: the data structure adopted by the LINCS program is divided into 4 levels of data, wherein Level 1 data records the fluorescence values of 978 marker genes, Level 2 data records the normalized expression levels of the 978 marker genes, Level 3 data records the gene expression levels of the whole genome, and Level 4 data records the Z value of the whole genome expression levels after the batch effect is removed (the Z value can represent the expression level of a gene). Therefore, the Z-value-processed Level 4 data is selected as the initial expression profile data. As a further example, a trusted representative of the expression data may be selected. To eliminate, as far as possible, accidental errors introduced by expression profile measurement, the Pearson correlation coefficient between the expression profiles of repeated tests is calculated for the repeated tests of a given silenced gene under specific conditions (cell line, concentration, and measurement time).
If the repeated test has only one sample, this sample is selected as the representative of the repeated test; if the repeated test has two or more samples and the maximum correlation coefficient is greater than 0.2 (that is, above weak correlation), the two samples with the maximum correlation coefficient are found, and their average Z value is taken as the trusted representative of the repeated test; if the repeated test has two or more samples but the maximum correlation coefficient is less than 0.2 (that is, below weak correlation), the first of the two samples with the maximum correlation coefficient is taken as the trusted representative of the repeated test.
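The replicate-selection rule described above can be sketched as follows. This is a minimal illustration and not the patent's implementation: the function names `pearson` and `trusted_representative` are hypothetical, each replicate is assumed to be a list of Z values in the same gene order, and a maximum correlation exactly equal to 0.2 (a case the text leaves open) is assumed to fall into the "below weak correlation" branch.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: a constant profile counts as uncorrelated
    return cov / (sx * sy)


def trusted_representative(replicates, threshold=0.2):
    """Pick one representative Z-value profile from replicate profiles."""
    if len(replicates) == 1:
        return list(replicates[0])
    # Find the pair of replicates with the highest correlation.
    (i, j), r = max(
        (((i, j), pearson(replicates[i], replicates[j]))
         for i, j in combinations(range(len(replicates)), 2)),
        key=lambda item: item[1],
    )
    if r > threshold:
        # Above weak correlation: average the two best-correlated profiles.
        return [(a + b) / 2 for a, b in zip(replicates[i], replicates[j])]
    # Otherwise: keep the first profile of the best-correlated pair.
    return list(replicates[i])
```

For instance, with three replicates of which the first two are strongly correlated, the function returns the element-wise mean of those two profiles.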
Step S102, acquiring a gene expression rank sequence corresponding to the whole genome expression data.
In practical use, whole genome expression data can be converted into gene expression rank sequences to create highly reliable gene-silenced cell response datasets (i.e., gene expression rank sequences under gene silencing).
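As a rough sketch of this conversion, a gene expression rank sequence can be obtained by sorting genes by their Z value. The function name `rank_sequence`, the dictionary input format, and the alphabetical tie-breaking rule are assumptions made for illustration; the source does not specify them.

```python
def rank_sequence(z_values):
    """Order gene names from highest to lowest Z value.

    z_values: dict mapping gene name -> Z value for one silencing experiment.
    Ties are broken alphabetically (an assumption) so the order is reproducible.
    """
    return sorted(z_values, key=lambda gene: (-z_values[gene], gene))


# Hypothetical profile for one silenced gene.
profile = {"GENE_A": 0.1, "GENE_B": 2.0, "GENE_C": -1.5}
ranked = rank_sequence(profile)
```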
Step S103, acquiring whole genome expression profile data after infection by a plurality of different pathogens.
As one embodiment, whole genome expression profile data after infection with various pathogens from different sources is obtained from the GEO (Gene Expression Omnibus) database. Fusing massive and diverse cross-platform transcriptome big data reduces cost, shortens the cycle, and makes the approach extensible. Integrating cell response data from different laboratories in this way also reduces errors.
Step S104, constructing the imprinted gene set infected by the pathogen based on the whole genome expression profile data.
As an embodiment, step S104 includes: obtaining differential expression quantity corresponding to the whole genome expression profile data; arranging genes corresponding to the whole genome expression profile data from top to bottom according to the differential expression quantity from high to low to obtain a rank sequence; obtaining a plurality of genes from the top and bottom of the sequence; using the plurality of genes as a set of imprinted genes for infection by the pathogen.
In practical use, the fold change of the expression level of each gene under the experimental condition relative to its expression level under the control condition can be determined. If the expression of the gene does not differ between the two conditions, the ratio of the two is 1; if the ratio is greater than 1, the gene is up-regulated under the experimental condition (pathogen infection); if the ratio is less than 1, the gene is down-regulated under the experimental condition (pathogen infection). The more the ratio deviates from 1, the more significant the differential expression. The genes are then arranged from top to bottom in descending order of differential expression quantity to obtain a rank sequence. Several genes at the top and the bottom of the rank sequence (those with the greatest differential expression, for example the 250 top-ranked genes and the 250 bottom-ranked genes) are then selected as the imprinted gene set for the pathogen infection.
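The construction of the imprinted gene set can be sketched as below. This is an illustrative assumption rather than the patented procedure: the function name `imprinted_gene_set` and the dictionary inputs are hypothetical, genes without a positive control value are skipped, and `n` is capped so that the up-regulated and down-regulated lists never overlap on small inputs.

```python
def imprinted_gene_set(infected, control, n=250):
    """Return (up, down): the n most up- and n most down-regulated genes.

    infected, control: dicts mapping gene name -> expression level.
    """
    # Fold change = expression under infection / expression under control.
    ratios = {
        gene: infected[gene] / control[gene]
        for gene in infected
        if control.get(gene, 0) > 0  # assumption: skip unusable control values
    }
    # Rank from highest to lowest fold change (ties broken by gene name).
    ranked = sorted(ratios, key=lambda gene: (-ratios[gene], gene))
    n = min(n, len(ranked) // 2)  # keep the two ends disjoint
    up = ranked[:n]
    down = ranked[len(ranked) - n:]
    return up, down


# Hypothetical expression values, for illustration only.
infected = {"GENE_A": 8, "GENE_B": 1, "GENE_C": 4, "GENE_D": 2}
control = {"GENE_A": 2, "GENE_B": 2, "GENE_C": 2, "GENE_D": 2}
up, down = imprinted_gene_set(infected, control, n=1)
```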
Step S105, acquiring the enrichment score between the gene expression rank sequence and the imprinted gene set.
In practical use, the enrichment score between the gene expression rank sequence and the imprinted gene set can be calculated based on the Gene Set Enrichment Analysis (GSEA) algorithm. If the enrichment score is positive, the cellular response to silencing of the gene is consistent with the cellular response to infection with the pathogen, and the gene may be a key host gene underlying infection with the pathogen, so that the damage mechanism of the pathogen infection can be resolved.
Step S106, determining the damage mechanism of the pathogen infection according to the enrichment score.
As an embodiment, step S106 includes: determining whether the enrichment score is positive; and if so, determining from the enrichment score that the cellular response to the gene silencing is consistent with the cellular response to the pathogen infection damage.
Optionally, the determining whether the enrichment score is positive includes: determining whether a target gene subset of the imprinted gene set is located at the corresponding position of the gene expression rank sequence under the gene silencing; and if so, determining that the enrichment score is positive.
The target gene subset may be the up-regulated gene set or the down-regulated gene set in the imprinted gene set.
When the target gene subset is the up-regulated gene set, the corresponding position is the top of the gene expression rank sequence under the gene silencing. When the target gene subset is the down-regulated gene set, the corresponding position is the bottom of the gene expression rank sequence under the gene silencing.
For example, if the up-regulated gene set (down-regulated gene set) in the imprinted gene set of a pathogen infection tends to be at the top (bottom) of the gene expression rank sequence under gene silencing, the enrichment score is positive; if the up-regulated gene set (down-regulated gene set) in the imprinted gene set tends to be at the bottom (top) of the gene expression rank sequence under gene silencing, the enrichment score is negative. A significantly positive enrichment score indicates that the effect of the gene silencing is similar to that of the pathogen infection, that the cellular response to the gene silencing is consistent with the cellular response to the pathogen infection, and that the gene is potentially a key host gene for the pathogen infection. Because gene set enrichment analysis detects changes in a gene set rather than the expression change of a single gene, it effectively addresses the prior-art problem that analyzing single genes may miss some important pathways. It is also robust to noise and yields better results.
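The sign behavior described above can be illustrated with a simplified running-sum statistic. The sketch below is an assumption and not the exact GSEA computation referenced in the text: it uses an unweighted Kolmogorov-Smirnov style running sum, and it combines the up-regulated and down-regulated scores as `(es_up - es_down) / 2`, which is one plausible convention chosen here for illustration. The function names are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score of gene_set in ranked_genes.

    Positive when the set concentrates near the top of the ranking,
    negative when it concentrates near the bottom.
    """
    hits = set(gene_set) & set(ranked_genes)
    n_hit = len(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for gene in ranked_genes:
        # Step up on a set member, step down otherwise.
        running += 1.0 / n_hit if gene in hits else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the maximum deviation from zero
    return best


def combined_score(ranked_genes, up_set, down_set):
    """Positive when up_set sits near the top and down_set near the bottom."""
    es_up = enrichment_score(ranked_genes, up_set)
    es_down = enrichment_score(ranked_genes, down_set)
    return (es_up - es_down) / 2  # assumed combination rule


# Hypothetical rank sequence under silencing of one gene.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
consistent = combined_score(ranked, up_set={"g1", "g2"}, down_set={"g5", "g6"})
opposite = combined_score(ranked, up_set={"g5", "g6"}, down_set={"g1", "g2"})
```

In this toy case `consistent` is positive and `opposite` is negative, mirroring the top/bottom rule stated in the text.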
As another embodiment, step S106 includes: constructing an association network between the genes used for the gene silencing and the pathogens used for the pathogen infection, with the enrichment score as the weight; obtaining all association relationships with the highest enrichment scores in the association network, wherein the association relationships represent the relationships between the genes and the pathogens used for the pathogen infection; and determining the damage mechanism of the pathogen infection according to the association relationships.
Optionally, constructing the association network between the genes used for the gene silencing and the pathogens used for the pathogen infection, with the enrichment score as the weight, includes: constructing the association network using, as the weight, the enrichment score between each gene expression rank sequence and the imprinted gene set of each pathogen infection. That is, a connecting edge can be established between each gene and each pathogen, and the weight of the connecting edge is the enrichment score.
Optionally, the associations with the highest enrichment scores are regarded as credible associations. For example, the enrichment score between the GALC gene and the pathogen Cryptosporidium parvum is the highest, so the GALC gene may be a key host gene of Cryptosporidium parvum.
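A weighted gene-pathogen association network of this kind can be sketched as a simple edge list. The names `build_network` and `strongest_associations`, the placeholder gene and pathogen labels, and the scores are all hypothetical; reading "highest enrichment scores" as the best-scoring gene per pathogen is likewise an assumption, since the text does not fix the selection rule.

```python
def build_network(scores):
    """Turn {(gene, pathogen): enrichment score} into weighted edges."""
    return [(gene, pathogen, weight) for (gene, pathogen), weight in scores.items()]


def strongest_associations(edges):
    """For each pathogen, keep the gene whose edge has the highest weight."""
    best = {}
    for gene, pathogen, weight in edges:
        if pathogen not in best or weight > best[pathogen][1]:
            best[pathogen] = (gene, weight)
    return best


# Hypothetical enrichment scores, for illustration only.
scores = {
    ("GENE_A", "pathogen_X"): 0.62,
    ("GENE_B", "pathogen_X"): -0.15,
    ("GENE_A", "pathogen_Y"): 0.08,
    ("GENE_B", "pathogen_Y"): 0.41,
}
edges = build_network(scores)
top = strongest_associations(edges)
```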
Optionally, after constructing the association network between the genes used for the gene silencing and the pathogens used for the pathogen infection with the enrichment score as the weight, the method further includes: determining the damage mechanism of a pathogen infection through analysis of pathogen communities. For example, the 59 pathogens in the association network are divided into 8 communities, denoted 1, 2, …, 8, using a hierarchical clustering algorithm. Enrichment analysis is then carried out on the imprinted genes of each pathogen community using databases such as GO and KEGG. For example, community 7 is a typical community of mixed pathogen classes: it contains vaccinia virus and monkeypox virus as well as West Nile virus and Bacillus anthracis. Why do these seemingly unrelated viruses and bacteria cluster into one community? On the one hand, the KEGG pathway enrichment results show that the imprinted genes of the pathogens in this community are enriched in infection pathways related to herpes and papules, which is consistent with the skin herpes and papule phenotypes caused by Poxviridae, Bacillus anthracis and West Nile virus. On the other hand, signaling pathways related to the inflammatory response caused by bacterial infection, such as the Th17 cell differentiation pathway, the IL-17 signaling pathway and the NF-kB signaling pathway, are also enriched among the imprinted genes of this community, which indicates that an antibacterial target may at the same time serve as a host-directed antiviral target.
Next, the genes are divided into 69 communities, denoted 1, 2, …, 69, using a spectral clustering algorithm. A global analysis is then performed on the 8 pathogen communities and the 69 gene communities. For example, gene community 40 is most closely related to pathogen community 5, and the two are jointly enriched in biological processes such as the RIG-I-like receptor signaling pathway and RNA metabolism. Pathogen community 5 is also closely related to other gene communities; it contains three influenza A virus subtypes (such as H1N1), Sendai virus, respiratory syncytial virus, Haemophilus ducreyi and others, which suggests that the pathogens of this community have a wide range of host factors and offer considerable room for host target selection.
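The division of pathogens into communities can be illustrated with a small agglomerative (hierarchical) clustering routine. The sketch below is a stand-in under stated assumptions: the source names hierarchical clustering but not the distance or linkage, so Euclidean distance and average linkage are assumed here, each pathogen is assumed to be described by its vector of enrichment scores against the silenced genes, and the function name and toy data are hypothetical. The spectral clustering used for the gene communities is not shown.

```python
from math import sqrt


def euclidean(u, v):
    """Euclidean distance between two equal-length score vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def hierarchical_communities(profiles, k):
    """Average-linkage agglomerative clustering until k clusters remain.

    profiles: dict mapping pathogen name -> list of enrichment scores.
    """
    clusters = [[name] for name in sorted(profiles)]

    def linkage(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        total = sum(euclidean(profiles[a], profiles[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Merge the closest pair of clusters.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(cluster) for cluster in clusters]


# Hypothetical enrichment-score vectors for four pathogens.
profiles = {
    "p1": [1.0, 0.9],
    "p2": [0.9, 1.0],
    "p3": [-1.0, -0.8],
    "p4": [-0.9, -1.0],
}
communities = hierarchical_communities(profiles, k=2)
```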
Optionally, the pathogen is an unknown pathogen.
According to the pathogen infection damage mechanism analysis method provided by the invention, whole genome expression data after a plurality of genes are silenced is acquired; a gene expression rank sequence corresponding to the whole genome expression data is obtained; whole genome expression profile data after infection by a plurality of different pathogens is acquired; an imprinted gene set for the pathogen infection is constructed based on the whole genome expression profile data; an enrichment score between the gene expression rank sequence and the imprinted gene set is acquired; and the damage mechanism of the pathogen infection is determined according to the enrichment score. Massive and diverse cross-platform transcriptome big data is thereby fused, so that it is not necessary to culture the pathogen from scratch, infect cells, and carry out large-scale experiments, which reduces research and development cost and shortens the detection period. Experimental errors are also reduced, so that the analysis of the damage mechanism of pathogen infection is more accurate.
Second embodiment
Fig. 2 shows a pathogen infection damage mechanism analysis device that uses the pathogen infection damage mechanism analysis method of the first embodiment and corresponds to that method one-to-one. As shown in fig. 2, the pathogen infection damage mechanism analysis device 400 includes a first obtaining unit 410, a first processing unit 420, a second obtaining unit 430, a second processing unit 440, a third obtaining unit 450, and a fourth processing unit 460. The functions implemented by the first obtaining unit 410, the first processing unit 420, the second obtaining unit 430, the second processing unit 440, the third obtaining unit 450, and the fourth processing unit 460 correspond one-to-one to the corresponding steps in the first embodiment and, to avoid redundancy, are not described in detail in this embodiment.
A first obtaining unit 410, configured to obtain whole genome expression data after silencing a plurality of genes.
Optionally, the first obtaining unit 410 is further configured to: acquire, from the LINCS database, whole genome expression data after a plurality of genes are silenced.
The first processing unit 420 is configured to obtain a gene expression rank sequence corresponding to the whole genome expression data.
A second obtaining unit 430, configured to obtain whole genome expression profile data after infection by a plurality of different pathogens.
A second processing unit 440, configured to construct an imprinted gene set for the pathogen infection based on the whole genome expression profile data.
Optionally, the second processing unit 440 is further configured to: obtain the differential expression quantity corresponding to the whole genome expression profile data; arrange the genes corresponding to the whole genome expression profile data from top to bottom in descending order of differential expression quantity to obtain a rank sequence; obtain a plurality of genes from the top and the bottom of the rank sequence; and use the plurality of genes as the imprinted gene set for the pathogen infection.
A third obtaining unit 450, configured to obtain the enrichment score between the gene expression rank sequence and the imprinted gene set.
A fourth processing unit 460, configured to determine the damage mechanism of the pathogen infection based on the enrichment score.
Optionally, the fourth processing unit 460 is further configured to: determine whether the enrichment score is positive; and if so, determine from the enrichment score that the cellular response to the gene silencing is consistent with the cellular response to the pathogen infection damage.
Optionally, the determining whether the enrichment score is positive includes: determining whether a target gene subset of the imprinted gene set is located at the corresponding position of the gene expression rank sequence under the gene silencing; and if so, determining that the enrichment score is positive.
Optionally, the fourth processing unit 460 is further configured to: construct an association network between the genes used for the gene silencing and the pathogens used for the pathogen infection, with the enrichment score as the weight; obtain all association relationships with the highest enrichment scores in the association network, wherein the association relationships represent the relationships between the genes and the pathogens used for the pathogen infection; and determine the damage mechanism of the pathogen infection according to the association relationships.
Third embodiment
Fig. 3 is a schematic diagram of an electronic device 300. The electronic device 300 includes a memory 302, a processor 304, and a computer program 303 stored in the memory 302 and capable of running on the processor 304. When executed by the processor 304, the computer program 303 implements the pathogen infection damage mechanism analysis method of the first embodiment, which is not described here again to avoid repetition. Alternatively, when executed by the processor 304, the computer program 303 implements the functions of the modules/units of the pathogen infection damage mechanism analysis device of the second embodiment, which are likewise not described here again.
Illustratively, the computer program 303 may be partitioned into one or more modules/units, which are stored in the memory 302 and executed by the processor 304 to implement the present invention. One or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 303 in the electronic device 300. For example, the computer program 303 may be divided into the first obtaining unit 410, the first processing unit 420, the second obtaining unit 430, the second processing unit 440, the third obtaining unit 450, and the fourth processing unit 460 in the second embodiment, and specific functions of each unit are as described in the first embodiment or the second embodiment, which are not described herein again.
The electronic device 300 may be a desktop computer, a notebook, a palmtop, or a smart phone.
The memory 302 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The memory 302 is used for storing a program, and the processor 304 executes the program after receiving an execution instruction. The method defined by the flow disclosed in any of the foregoing embodiments of the present invention may be applied to the processor 304, or implemented by the processor 304.
The processor 304 may be an integrated circuit chip having signal processing capabilities. The processor 304 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor 304 may implement or perform the various methods, steps and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor or any conventional processor.
It is understood that the configuration shown in fig. 3 is only a schematic configuration of the electronic device 300, and the electronic device 300 may include more or fewer components than those shown in fig. 3. The components shown in fig. 3 may be implemented in hardware, software, or a combination thereof.
Fourth embodiment
An embodiment of the present invention further provides a storage medium in which instructions (a computer program) are stored. When the instructions are run on a computer and executed by a processor, the pathogen infection damage mechanism analysis method in the first embodiment is implemented, which is not described here again to avoid repetition. Alternatively, when the computer program is executed by a processor, it implements the functions of the modules/units in the pathogen infection damage mechanism analysis device according to the second embodiment, which are likewise not described here again.
From the above description of the embodiments, it will be clear to those skilled in the art that the present invention may be implemented by hardware, or by software plus a necessary general hardware platform. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and which includes several instructions that enable a computer device (such as a personal computer, a server, or a network device) to execute the methods of the various implementation scenarios of the present invention.
In summary, the pathogen infection damage mechanism analysis method and device provided by the present invention acquire whole genome expression data after a plurality of genes are silenced; obtain a gene expression rank sequence corresponding to the whole genome expression data; acquire whole genome expression profile data after infection by a plurality of different pathogens; construct a set of imprinted genes for infection by the pathogen based on the whole genome expression profile data; acquire the enrichment score of the gene expression rank sequence and the imprinted gene set; and determine the damage mechanism of pathogen infection according to the enrichment score. Massive and diverse cross-platform transcriptome big data are thereby fused, without the need to culture pathogens from scratch, infect cells, or carry out large-scale experiments, which reduces research and development cost and shortens the detection period. Experimental errors are also reduced, so that the analysis of the damage mechanism of pathogen infection is more accurate.
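As an illustration of how a rank sequence and an imprinted gene set might be combined into a signed score, the following Python sketch uses an unweighted Kolmogorov-Smirnov-style running sum. This statistic is an assumption chosen for illustration; this passage does not specify the exact formula, and the function names and gene labels are hypothetical.

```python
def rank_sequence(expression_change):
    """Order gene names from the highest to the lowest expression change."""
    return sorted(expression_change, key=expression_change.get, reverse=True)


def running_sum_score(ranked_genes, gene_set):
    """Return the signed maximum deviation of an unweighted running sum.

    Walking down the ranked list, the sum rises at every gene in
    `gene_set` and falls at every other gene. A positive result means
    the set is concentrated near the top, a negative one near the bottom.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def enrichment_score(ranked_genes, up_set, down_set):
    """Combine the up- and down-regulated halves of the gene set.

    The score is positive when up-regulated set genes sit near the top
    and down-regulated set genes sit near the bottom of the ranking.
    """
    up = running_sum_score(ranked_genes, up_set)
    down = running_sum_score(ranked_genes, down_set)
    return (up - down) / 2.0


# Hypothetical knockdown profile and infection gene set.
ranked = rank_sequence({"g1": 3.0, "g2": 2.0, "g3": 0.5,
                        "g4": -0.5, "g5": -2.0, "g6": -3.0})
score = enrichment_score(ranked, {"g1", "g2"}, {"g5", "g6"})
```

A positive `score` in this sketch corresponds to the case in which the cellular response to gene silencing is consistent with the response to pathogen infection; reversing the ranking flips the sign.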
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.

Claims (4)

1. A method for analyzing a pathogen infection damage mechanism, comprising:
acquiring whole genome expression data after a plurality of genes are silenced;
obtaining a gene expression rank sequence corresponding to the whole genome expression data;
acquiring whole genome expression profile data of a plurality of different pathogens after infection from a database;
constructing a set of imprinted genes for infection by the pathogen based on the whole genome expression profile data;
acquiring the enrichment score of the gene expression rank sequence and the imprinted gene set;
determining a damage mechanism of said pathogen infection based on said enrichment score;
wherein said constructing a set of imprinted genes for infection by said pathogen based on said whole genome expression profile data comprises:
obtaining differential expression quantity corresponding to the whole genome expression profile data;
arranging genes corresponding to the whole genome expression profile data from top to bottom according to the differential expression quantity from high to low to obtain a rank sequence;
obtaining a plurality of genes from the top and bottom of the rank sequence;
taking said plurality of genes as the set of imprinted genes for infection by said pathogen;
and, said determining the damage mechanism of said pathogen infection from said enrichment score comprises:
determining whether the enrichment score is positive;
if so, determining from the enrichment score that the cellular response to gene silencing is consistent with the cellular response to the pathogen infection injury;
wherein determining whether the enrichment score is positive comprises:
determining whether a subset of target genes in the imprinted gene set are located at corresponding positions of the gene expression rank sequence under the gene silencing;
if yes, determining the enrichment score as a positive number;
alternatively, said determining the damage mechanism of said pathogen infection from said enrichment score comprises:
constructing a network of associations between genes used for said gene silencing and pathogens used for said pathogen infection, weighted by said enrichment score;
obtaining all association relations with the highest enrichment scores in the association network, wherein the association relations represent the relations between the genes and the pathogens used for the pathogen infection;
and determining the damage mechanism of the pathogen infection according to the association relation.
2. The method of claim 1, wherein obtaining whole genome expression data after silencing the plurality of genes comprises:
acquiring, from the LINCS database, whole genome expression data after a plurality of genes are silenced.
3. A pathogen infection damage mechanism analysis device, comprising:
the first acquisition unit is used for acquiring whole genome expression data after a plurality of genes are silenced;
the first processing unit is used for acquiring a gene expression rank sequence corresponding to the whole genome expression data;
the second acquisition unit is used for acquiring whole genome expression profile data after infection of a plurality of different pathogens from the database;
a second processing unit for constructing a set of imprinted genes for the pathogen infection based on the whole genome expression profile data;
a third obtaining unit, configured to obtain the enrichment score of the gene expression rank sequence and the imprinted gene set;
a fourth processing unit for determining the damage mechanism of said pathogen infection based on said enrichment score;
the second processing unit is also used for acquiring the differential expression quantity corresponding to the whole genome expression profile data; arranging genes corresponding to the whole genome expression profile data from top to bottom according to the differential expression quantity from high to low to obtain a rank sequence; obtaining a plurality of genes from the top and bottom of the rank sequence; and taking said plurality of genes as the set of imprinted genes for infection by said pathogen;
said fourth processing unit is further for determining whether said enrichment score is a positive number; if so, determining from the enrichment score that the cellular response to gene silencing is consistent with the cellular response to the pathogen infection injury;
the fourth processing unit is specifically configured to determine whether a subset of target genes in the imprinted gene set is located at a position corresponding to the gene expression rank sequence under gene silencing; if yes, determining the enrichment score as a positive number;
the fourth processing unit is further configured to construct an association network between genes used for the gene silencing and pathogens used for the pathogen infection, weighted by the enrichment score; obtaining all association relations with the highest enrichment scores in the association network, wherein the association relations represent the relations between the genes and the pathogens used for the pathogen infection; and determining the damage mechanism of the pathogen infection according to the association relations.
4. The apparatus of claim 3, wherein the first obtaining unit is further configured to:
acquire, from the LINCS database, whole genome expression data after a plurality of genes are silenced.
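For illustration only, the gene set construction recited in claims 1 and 3 (ranking genes by differential expression and taking genes from the top and bottom of the rank sequence) might be sketched in Python as follows. The function name, the parameter `n`, and the gene labels are hypothetical; the claims do not state how many genes are taken from each end.

```python
def build_imprinted_gene_set(diff_expr, n):
    """Split off the n most up- and n most down-regulated genes.

    `diff_expr` maps gene names to differential expression values for
    one pathogen infection; `n` is an assumed cut-off per end.
    """
    ranked = sorted(diff_expr, key=diff_expr.get, reverse=True)
    if n < 1 or 2 * n > len(ranked):
        raise ValueError("n must be at least 1 and at most half the gene count")
    top_genes = set(ranked[:n])      # highest differential expression
    bottom_genes = set(ranked[-n:])  # lowest differential expression
    return top_genes, bottom_genes


# Hypothetical differential expression values.
example_diff = {"g1": 3.0, "g2": 1.5, "g3": 0.1,
                "g4": -0.2, "g5": -2.0, "g6": -4.0}
up_genes, down_genes = build_imprinted_gene_set(example_diff, n=2)
```

Keeping the top and bottom genes as two separate sets preserves the direction of regulation, which is needed if the later enrichment score is to be signed.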
CN201811521645.4A 2018-12-12 2018-12-12 Pathogen infection damage mechanism analysis method and device Active CN109671467B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811521645.4A CN109671467B (en) 2018-12-12 2018-12-12 Pathogen infection damage mechanism analysis method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811521645.4A CN109671467B (en) 2018-12-12 2018-12-12 Pathogen infection damage mechanism analysis method and device

Publications (2)

Publication Number Publication Date
CN109671467A CN109671467A (en) 2019-04-23
CN109671467B true CN109671467B (en) 2023-03-24

Family

ID=66144324

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811521645.4A Active CN109671467B (en) 2018-12-12 2018-12-12 Pathogen infection damage mechanism analysis method and device

Country Status (1)

Country Link
CN (1) CN109671467B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104212890A (en) * 2010-02-24 2014-12-17 布罗德研究所有限公司 Methods of diagnosing infectious disease pathogens and their drug sensitivity
CN105441585A (en) * 2014-09-29 2016-03-30 天津华大基因科技有限公司 Kit and application thereof in genital tract pathogen proliferation testing
CN105582526A (en) * 2016-02-25 2016-05-18 上海市公共卫生临床中心 Application of trefoil factor 2 in preparation of medicine for treating and preventing lung/bronchial acute inflammation diseases

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040038201A1 (en) * 2002-01-22 2004-02-26 Whitehead Institute For Biomedical Research Diagnostic and therapeutic applications for biomarkers of infection
US20070203083A1 (en) * 2003-06-13 2007-08-30 Mootha Vamsi K Methods Of Regulating Metabolism And Mitochondrial Function
US8748103B2 (en) * 2008-11-07 2014-06-10 Sequenta, Inc. Monitoring health and disease status using clonotype profiles
WO2014145631A1 (en) * 2013-03-15 2014-09-18 The Broad Institute, Inc. Dendritic cell response gene expression, compositions of matters and methods of use thereof
CN104032016B (en) * 2014-06-12 2016-03-30 山东农业大学 A kind of chicken intestinal diorder Salmonella infection is correlated with the detection method of microRNA
CN108664769B (en) * 2017-03-31 2021-09-21 中国科学院上海营养与健康研究所 Drug relocation method based on cancer genome and non-specific gene tag
CN107397748B (en) * 2017-08-09 2018-10-02 黄娇英 A kind of toad cake extract and preparation method thereof with anti-infectious function
CN108038352B (en) * 2017-12-15 2021-09-14 西安电子科技大学 Method for mining whole genome key genes by combining differential analysis and association rules
CN108830045B (en) * 2018-06-29 2021-04-20 深圳先进技术研究院 Biomarker system screening method based on multiomics

Also Published As

Publication number Publication date
CN109671467A (en) 2019-04-23

Similar Documents

Publication Publication Date Title
Xu et al. Genotype-free demultiplexing of pooled single-cell RNA-seq
Linard et al. Rapid alignment-free phylogenetic identification of metagenomic sequences
Hao et al. Limited agreement of independent RNAi screens for virus-required host genes owes more to false-negative than false-positive factors
Miller et al. Improving reliability and absolute quantification of human brain microarray data by filtering and scaling probes using RNA-Seq
Jia et al. A new exhaustive method and strategy for finding motifs in ChIP-enriched regions
Chain et al. Error, reproducibility and sensitivity: a pipeline for data processing of Agilent oligonucleotide expression arrays
Zhang Strictly standardized mean difference, standardized mean difference and classical t-test for the comparison of two groups
Achlioptas et al. Two-locus association mapping in subquadratic time
Wu et al. Joint analysis of GWAS and multi-omics QTL summary statistics reveals a large fraction of GWAS signals shared with molecular phenotypes
Atias et al. Large-scale analysis of Arabidopsis transcription reveals a basal co-regulation network
Zhao et al. Pitfalls of genotyping microbial communities with rapidly growing genome collections
Shen et al. A novel algorithm for detecting multiple covariance and clustering of biological sequences
Liu et al. puma 3.0: improved uncertainty propagation methods for gene and transcript expression analysis
Jurburg et al. The community ecology perspective of omics data
Van den Berge et al. Normalization benchmark of ATAC-seq datasets shows the importance of accounting for GC-content effects
CN109671467B (en) Pathogen infection damage mechanism analysis method and device
Molinari et al. Transcriptome analysis using RNA-Seq from experiments with and without biological replicates: a review
Dodson et al. Genetic sequence matching using D4M big data approaches
Andrews et al. Modelling dropouts for feature selection in scRNASeq experiments
Guo et al. DAM: A Bayesian method for detecting genome-wide associations on multiple diseases
Muzio et al. networkGWAS: a network-based approach to discover genetic associations
Liu et al. A hierarchical Bayesian model for single-cell clustering using RNA-sequencing data
Waddell et al. Measuring fit of sequence data to phylogenetic model: gain of power using marginal tests
Peris et al. Normalized global alignment for protein sequences
GUDODAGI et al. Customized Computational Environment for Investigations and Compression of Genomic Data.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant