CN116312796B - Metagenome abundance estimation method and system based on expectation maximization algorithm - Google Patents
- Publication number
- CN116312796B (application No. CN202310103910.1A)
- Authority
- CN
- China
- Prior art keywords
- species
- metagenome
- abundance
- reference genome
- sequencing data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The application belongs to the technical field of bioinformatics, and particularly relates to a metagenome abundance estimation method based on an expectation maximization (EM) algorithm. Based on the alignment information of metagenomic sequencing data, the method introduces the proportion of unique alignment positions on each reference genome to capture the influence of reference-genome similarity between species on alignment, and constructs a statistical model that describes the probability of observing unique alignments and multiple alignments. The EM algorithm is then used to solve the constructed statistical model and estimate species abundance in the metagenome. Because the application quantifies the influence of reference-genome similarity between species, it can improve the accuracy of metagenome abundance estimation at the species level, as well as the sensitivity and specificity of metagenomic species identification.
Description
Technical Field
The application belongs to the technical field of bioinformatics, and particularly relates to a metagenome abundance estimation method and system based on an expectation maximization algorithm.
Background Art
Metagenomic next-generation sequencing (mNGS) has broad application prospects in clinical microbiology, and a variety of computational methods based on metagenomic sequencing data have been developed for the rapid identification of pathogens in clinical samples. Among them, the metagenomic classifier Centrifuge classifies metagenomic sequencing data rapidly using the BWT (Burrows-Wheeler transform) and an FM (Ferragina-Manzini) index, while keeping the index comparatively small. Centrifuge also uses the EM (Expectation-Maximization) algorithm to estimate species abundance in metagenomic sequencing data. In addition, Centrifuge has a wide range of application scenarios and can analyze both short reads and long reads. Although Centrifuge performs well in species identification, it is less effective at identifying low-abundance species. Compared with mapping-based methods, Centrifuge assigns reads with lower precision, which affects its abundance estimates and leads to a high false-positive rate.
Based on the alignment results of metagenomic sequencing data, metagenomic classifiers mainly use two strategies to assign reads that have multiple alignments. The first strategy assigns reads directly according to the number of reads uniquely aligned to each species: when read i aligns to both species A and species B, and species A has more uniquely aligned reads than species B, read i is assigned to species A with a higher probability, or directly to species A. The second strategy constructs a probability model of the alignment results and estimates the sequence abundance or species abundance of each species by solving that model; representative algorithms are Centrifuge and Bracken. Compared with the first strategy, the second strategy gives a probabilistic interpretation of the assignment results. However, both Centrifuge and Bracken, the representative algorithms of the second strategy, have shortcomings in how their probability models are constructed. The probability model of Centrifuge (1) does not treat unique alignments and multiple alignments differently across species, and (2) does not characterize the effect of genome similarity on uniquely aligned reads. The probability model of Bracken (1) redistributes only the classification results of Kraken, (2) is not solved entirely from the observed samples, and (3) does not directly characterize the effect of genome similarity on uniquely aligned reads.
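The deterministic form of the first strategy can be illustrated with a minimal Python sketch. The input format (a mapping from read ID to the distinct species the read aligns to) and the function name `assign_by_unique_counts` are hypothetical, and breaking ties by species name is an arbitrary choice made only for reproducibility.

```python
from collections import Counter


def assign_by_unique_counts(read_hits: dict[str, list[str]]) -> Counter:
    """Sketch of the first strategy: resolve multiply aligned reads by unique-read counts.

    read_hits maps each read ID to the distinct species it aligns to; a read
    with exactly one species is a unique alignment.
    """
    # Count uniquely aligned reads per species.
    unique = Counter(hits[0] for hits in read_hits.values() if len(hits) == 1)
    counts = Counter(unique)
    for hits in read_hits.values():
        if len(hits) > 1:
            # Deterministic variant: the species with more unique reads takes the read.
            # Sorting first makes ties resolve to the alphabetically first species.
            best = max(sorted(hits), key=lambda sp: unique[sp])
            counts[best] += 1
    return counts


print(assign_by_unique_counts({"r1": ["A"], "r2": ["A"], "r3": ["B"], "r4": ["A", "B"]}))
# Counter({'A': 3, 'B': 1})
```

The "higher probability" variant would instead split each such read among its candidate species in proportion to their unique-read counts; either way, the split ignores how similar the candidate genomes are, which is the gap the second strategy and this application address.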
In view of this, the present application has been proposed.
Disclosure of Invention
To solve the above technical problems, the application, based on the alignment information of metagenomic sequencing data, introduces the proportion of unique alignment positions on each reference genome to quantify the influence of reference-genome similarity between species on alignment, and constructs a statistical model that describes the probability of observing unique alignments and multiple alignments. The proportion of unique alignment positions on each reference genome is calculated from the alignment information of the metagenomic sequencing data; the constructed statistical model is solved with an Expectation-Maximization (EM) algorithm to estimate species abundance in the metagenome, and the reference genome length is introduced in the Expectation step (E-step).
Therefore, the core objective of the application is to provide a metagenomic abundance assessment method and system.
In order to achieve the above purpose, the present application proposes the following technical scheme:
First, the application provides a metagenome abundance assessment method comprising the following steps:
1) obtaining the proportion of unique alignment positions on the reference genome;
2) obtaining the probability of unique alignment and multiple alignment of sequencing reads;
3) estimating metagenomic species abundance using an expectation maximization algorithm.
Further, step 1) is performed as follows: based on the alignment information of the metagenomic sequencing data, the number of uniquely aligned reads and the number of multiply aligned reads on the reference genome are counted, and the ratio of the number of reads uniquely aligned to the reference genome to the number of all reads aligned to the reference genome is calculated.
Further, step 2) is performed as follows: based on the proportion of unique alignment positions on the reference genome, the influence of reference-genome similarity between species on alignment is quantified, and a statistical model is constructed to characterize the probability of the observed unique alignments and multiple alignments.
Further, the statistical model is:
wherein,
r is the number of reads in the metagenomic sequencing data;
s is the number of species in the metagenome;
the abundances of species j and k, respectively, are the parameters to be estimated;
$l_j$ and $l_k$ are the average genome lengths of species j and k, respectively;
$C_{ij}$ is the probability that read i aligns to species j: when read i aligns only to species j, the probability equals $P_j$, the proportion of unique alignment positions on reference genome j; when read i is multiply aligned to species j, the probability equals $1-P_j$; when read i does not align to species j, the probability equals 0.
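As an orientation aid only, the following is one plausible form of the model and its EM updates, assuming it is modeled on Centrifuge's EM formulation with Centrifuge's 0/1 classification indicator replaced by the $C_{ij}$ defined above. The symbols $\theta_j$ (abundance of species j) and $\theta_j^{\text{new}}$ are hypothetical notation, and the exact form in the application may differ.

```latex
L(\theta \mid C) = \prod_{i=1}^{r} \sum_{j=1}^{s}
  \frac{\theta_j \, l_j}{\sum_{k=1}^{s} \theta_k \, l_k} \, C_{ij}

n_j = \sum_{i=1}^{r} \frac{\theta_j \, l_j \, C_{ij}}{\sum_{k=1}^{s} \theta_k \, l_k \, C_{ik}},
\qquad
\theta_j^{\text{new}} = \frac{n_j / l_j}{\sum_{k=1}^{s} n_k / l_k}
```

Under this form, the factor $\sum_k \theta_k l_k$ cancels in the posterior, so the E-step reduces to normalizing $\theta_j l_j C_{ij}$ across species for each read, and the M-step divides each expected read count by genome length before renormalizing.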
Further, step 3) specifically includes:
solving the statistical model constructed in step 2) with an expectation maximization algorithm to estimate species abundance in the metagenome.
Further, the solving specifically includes: introducing the reference genome length in the expectation step (E-step) to calculate the number $n_j$ of reads originating from species j, using the formula:
and then updating the abundance of species j based on $n_j$, using the formula:
Further, the metagenome in the above method is an infectious metagenome.
The application also provides a metagenome abundance estimation system, comprising the following components:
component 1): a component for calculating the proportion of unique alignment positions on the reference genome;
component 2): a component for computing the probability of unique alignment and multiple alignment of sequencing reads;
component 3): a component for estimating metagenomic species abundance.
Further, component 1) operates as follows: based on the alignment information of the metagenomic sequencing data, the number of uniquely aligned reads and the number of multiply aligned reads on the reference genome are counted, and the ratio of the number of reads uniquely aligned to the reference genome to the number of all reads aligned to the reference genome is calculated.
Further, component 2) operates as follows: based on the proportion of unique alignment positions on the reference genome, the influence of reference-genome similarity between species on alignment is quantified, and a statistical model is constructed to characterize the probability of the observed unique alignments and multiple alignments.
Further, the statistical model is:
wherein,
r is the number of reads in the metagenomic sequencing data;
s is the number of species in the metagenome;
the abundances of species j and k, respectively, are the parameters to be estimated;
$l_j$ and $l_k$ are the average genome lengths of species j and k, respectively;
$C_{ij}$ is the probability that read i aligns to species j: when read i aligns only to species j, the probability equals $P_j$, the proportion of unique alignment positions on reference genome j; when read i is multiply aligned to species j, the probability equals $1-P_j$; when read i does not align to species j, the probability equals 0.
Further, component 3) specifically includes:
solving the statistical model constructed by component 2) with an expectation maximization algorithm to estimate species abundance in the metagenome.
Further, the solving specifically includes: introducing the reference genome length in the expectation step (E-step) to calculate the number $n_j$ of reads originating from species j, using the formula:
and then updating the abundance of species j based on $n_j$, using the formula:
Further, the metagenome in the above system is an infectious metagenome.
The present application also provides an electronic device including: a processor and a memory; the processor is connected to a memory, wherein the memory is configured to store a computer program, and the processor is configured to invoke the computer program to perform the method according to any of the preceding claims.
The present application also provides a computer storage medium storing a computer program comprising program instructions which, when executed by a processor, perform a method as claimed in any one of the preceding claims.
The beneficial technical effects of the application are as follows:
the application quantifies the influence of reference-genome similarity between species on alignment information and treats uniquely and multiply aligned reads differently for different species, thereby improving the accuracy of metagenome abundance estimation at the species level and providing technical support for improving the sensitivity and specificity of metagenomic species identification.
Drawings
FIG. 1 is a flow chart of the metagenome abundance estimation method based on an expectation maximization algorithm of the present application;
FIG. 2 shows the correlation between the abundance estimates of different methods and the true values;
FIG. 3 shows the correlation between the abundance estimates of different methods and the true values after removal of 5% of outliers.
Detailed Description
Embodiments of the present application are described in detail below with reference to examples, but those skilled in the art will understand that the following examples only illustrate the present application and should not be construed as limiting its scope. Where specific conditions are not indicated in the examples, the examples were carried out under conventional conditions or conditions recommended by the manufacturer. Reagents or instruments for which no manufacturer is indicated are conventional, commercially available products.
Definitions of terms: unless otherwise defined below, all technical and scientific terms used in the detailed description of the application have the meaning commonly understood by one of ordinary skill in the art. Although the following terms are believed to be well understood by those skilled in the art, the following definitions are set forth to better explain the present application.
The term "about" in the present application denotes a range of accuracy that a person skilled in the art would understand as still ensuring the technical effect of the feature in question. The term generally means a deviation of ±10%, preferably ±5%, from the indicated value.
As used herein, the terms "comprising," "including," "having," "containing," or "involving" are inclusive or open-ended and do not exclude additional unrecited elements or method steps. The term "consisting of …" is considered to be a preferred embodiment of the term "comprising". If a certain group is defined below to contain at least a certain number of embodiments, this should also be understood to disclose a group that preferably consists of only these embodiments.
Furthermore, the terms first, second, third, (a), (b), (c), and the like in the description and in the claims, are used for distinguishing between similar elements and not necessarily for describing a sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and that the embodiments of the application described herein are capable of operation in other sequences than described or illustrated herein.
The application is illustrated below in connection with specific embodiments.
Example 1 Establishment of the estimation method
As shown in fig. 1, the metagenome abundance estimation method based on the expectation maximization algorithm provided by the application comprises the following steps:
S1, Alignment of sequencing data: select a suitable alignment tool and reference database, align the metagenomic sequencing data, and output the alignment information for each read. The alignment information must at least contain the identifier of each reference genome the read aligns to and must distinguish unique alignments from multiple alignments. For multiply aligned reads, the identifiers of all aligned reference genomes should be included.
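A minimal sketch of the per-read alignment information S1 calls for, assuming the aligner output has already been reduced to (read ID, reference genome ID) pairs. The function names are hypothetical, and treating several hits within one genome as a single unique alignment is an assumption, since the application does not say how intra-genome repeats are handled.

```python
from collections import defaultdict
from typing import Iterable


def group_alignments(records: Iterable[tuple[str, str]]) -> dict[str, set[str]]:
    """Collect, for each read, the set of reference genomes it aligns to."""
    hits: dict[str, set[str]] = defaultdict(set)
    for read_id, ref_id in records:
        hits[read_id].add(ref_id)
    return dict(hits)


def is_unique(refs: set[str]) -> bool:
    # Assumption: hits at several positions of one genome still count as unique.
    return len(refs) == 1


hits = group_alignments([("r1", "g1"), ("r1", "g1"), ("r2", "g1"), ("r2", "g2")])
print(hits["r1"], is_unique(hits["r1"]), is_unique(hits["r2"]))
# {'g1'} True False
```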
S2, Calculating the proportion of unique alignment positions on the reference genome: according to the alignment information of the metagenomic sequencing data, count the number of uniquely aligned reads and the number of multiply aligned reads on reference genome j (j = 1, 2, ..., n, where n is the number of reference genomes), and then calculate the proportion of unique alignment positions on reference genome j, i.e., the ratio of the number of reads uniquely aligned to reference genome j to the number of all reads aligned to reference genome j.
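The ratio in S2 can be sketched in a few lines of Python, assuming per-read reference sets like those produced in S1; the function name `unique_alignment_proportion` is hypothetical.

```python
from collections import Counter


def unique_alignment_proportion(read_hits: dict[str, set[str]]) -> dict[str, float]:
    """P_j: reads aligned only to genome j divided by all reads aligned to genome j."""
    unique: Counter = Counter()
    total: Counter = Counter()
    for refs in read_hits.values():
        for ref in refs:
            total[ref] += 1                 # every read that touches genome j
        if len(refs) == 1:
            unique[next(iter(refs))] += 1   # reads that touch only genome j
    return {ref: unique[ref] / total[ref] for ref in total}


p = unique_alignment_proportion({"r1": {"g1"}, "r2": {"g1", "g2"}, "r3": {"g2"}, "r4": {"g2"}})
print(p)  # g1 -> 0.5, g2 -> 2/3
```

A genome that shares much of its sequence with others attracts many multiply aligned reads, so its $P_j$ is low; this is how the proportion encodes reference-genome similarity.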
S3, Calculating the probability of unique alignment and multiple alignment of sequencing reads:
based on the proportion of unique alignment positions on the reference genome, the influence of reference-genome similarity between species on alignment is quantified, and a statistical model is constructed to characterize the probability of the observed unique alignments and multiple alignments.
The statistical model is as follows:
r is the number of reads in the metagenomic sequencing data;
s is the number of species in the metagenome;
the abundances of species j and k, respectively, are the parameters to be estimated;
$l_j$ and $l_k$ are the average genome lengths of species j and k, respectively;
$C_{ij}$ is the probability that read i aligns to species j: if read i aligns only to reference genome j, the probability is $P_j$; if read i is multiply aligned to reference genome j, the probability is $1-P_j$. $P_j$ is the proportion of unique alignment positions on reference genome j (calculated in step S2).
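The $C_{ij}$ rule in S3 can be sketched as a dense matrix with one row per read and one column per species. The function name and argument layout are hypothetical; a real data set would more likely use a sparse representation, since most entries are 0.

```python
def alignment_probabilities(
    reads: list[set[str]], p_unique: dict[str, float], species: list[str]
) -> list[list[float]]:
    """Build C as a list of rows, one per read, following the S3 definition."""
    col = {sp: j for j, sp in enumerate(species)}
    C = []
    for refs in reads:
        row = [0.0] * len(species)  # not aligned to species j -> 0
        for ref in refs:
            if len(refs) == 1:
                row[col[ref]] = p_unique[ref]        # unique alignment -> P_j
            else:
                row[col[ref]] = 1.0 - p_unique[ref]  # multiple alignment -> 1 - P_j
        C.append(row)
    return C


C = alignment_probabilities([{"A"}, {"A", "B"}], {"A": 0.5, "B": 0.8}, ["A", "B"])
print(C)
```

When $P_j$ is computed from the same alignments as in S2, every row has at least one positive entry: a read aligned uniquely to genome j implies $P_j > 0$, and a read multiply aligned to genome j implies $P_j < 1$.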
S4, Estimating metagenomic species abundance with the EM algorithm: based on the alignment information of the metagenomic sequencing data, species abundance is estimated as follows:
1) Initial step (I-step): the initial abundance of species j is:
s: the number of species in the metagenome.
2) Expectation step (E-step):
$n_j$: the number of reads originating from species j;
$C_{ij}$: the probability that read i originates from species j, which is 0 if read i does not align to species j, and the value defined in S3 if read i is uniquely or multiply aligned to species j;
3) Maximization step (M-step): update the abundance of species j:
the updated abundance of species j.
The EM algorithm stops iterating when the difference between the species abundance estimates before and after the update satisfies the following condition:
After the species abundances have been estimated, the estimated abundance of species j can be converted into the sequence abundance of that species by the following formula:
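The I-step, E-step, M-step, stopping rule and conversion can be tied together in a minimal Python sketch. Because the formulas themselves are not given in this text, the sketch assumes Centrifuge-style updates (E-step weight $\theta_j l_j C_{ij}$, M-step $\theta_j \propto n_j / l_j$, sequence abundance $\theta_j l_j / \sum_k \theta_k l_k$), a uniform initial value of 1/s, and a stopping rule based on the total absolute change; the tolerance, iteration cap and function names are hypothetical.

```python
def estimate_abundance(
    C: list[list[float]], lengths: list[float], tol: float = 1e-8, max_iter: int = 1000
) -> list[float]:
    """EM sketch (assumed Centrifuge-style updates) returning species abundances."""
    s = len(lengths)
    theta = [1.0 / s] * s  # I-step: assumed uniform start
    for _ in range(max_iter):
        # E-step: expected number of reads n_j from each species, weighted by genome length.
        n = [0.0] * s
        for row in C:
            w = [theta[j] * lengths[j] * row[j] for j in range(s)]
            total = sum(w)
            if total > 0:
                for j in range(s):
                    n[j] += w[j] / total
        # M-step: abundance proportional to expected reads per unit genome length.
        scaled = [n[j] / lengths[j] for j in range(s)]
        z = sum(scaled)
        if z == 0:
            raise ValueError("no read carries positive weight")
        new_theta = [x / z for x in scaled]
        # Stopping rule (assumed): total absolute change below tol.
        delta = sum(abs(a - b) for a, b in zip(new_theta, theta))
        theta = new_theta
        if delta < tol:
            break
    return theta


def to_sequence_abundance(theta: list[float], lengths: list[float]) -> list[float]:
    """Assumed conversion: share of reads per species, theta_j * l_j / sum_k theta_k * l_k."""
    w = [t * l for t, l in zip(theta, lengths)]
    z = sum(w)
    return [x / z for x in w]


lengths = [1.0, 2.0]
theta = estimate_abundance([[0.5, 0.0], [0.5, 0.0], [0.0, 0.7]], lengths)
print(theta, to_sequence_abundance(theta, lengths))
# theta ≈ [0.8, 0.2]; sequence abundance ≈ [2/3, 1/3]
```

Under these assumed updates, a uniquely aligned read has only one nonzero $C_{ij}$ and is therefore assigned wholly to that species, so $P_j$ only changes how multiply aligned reads are split between candidate species.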
Example 2 Evaluation of effect
This example compares the method of the present application with a conventional method for metagenomic abundance estimation.
1) Generation of simulated metagenomic data: based on the reference genomes of 4,078 bacteria and 200 archaea, the simulator Mason was used to generate 20 million reads of 100 bp, from which 10,000 reads were randomly drawn for abundance estimation.
2) Analysis of simulated metagenomic data: abundance was estimated on the simulated data using the widely used Centrifuge and the metagenomic abundance estimation method of the present application.
3) Correlation analysis between the abundance estimates of the two methods and the true abundances (FIG. 2) shows that the Pearson correlation coefficient of Centrifuge is only 0.26, whereas that of the metagenomic abundance estimation method of the present application is 0.6. After removing 5% of outlier estimates from both methods, the Pearson correlation coefficient of Centrifuge increases only to 0.46, whereas that of the method of the present application increases to 0.95 (FIG. 3).
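The evaluation metric can be sketched in Python. The application does not state how the 5% of outliers were identified, so the sketch assumes they are the points with the largest absolute difference between estimate and true abundance; the function names are hypothetical. On Python 3.10 and later, `statistics.correlation` also computes Pearson's r.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pearson_without_outliers(est: list[float], ref: list[float], frac: float = 0.05) -> float:
    """Drop the frac of points with the largest |estimate - truth| (assumed criterion)."""
    n_drop = round(frac * len(est))
    order = sorted(range(len(est)), key=lambda i: abs(est[i] - ref[i]))
    keep = order[: len(est) - n_drop]
    return pearson([est[i] for i in keep], [ref[i] for i in keep])


truth = [float(i) for i in range(20)]
estimate = truth.copy()
estimate[5] = 100.0  # a single outlying estimate
print(pearson(estimate, truth), pearson_without_outliers(estimate, truth))
```

With 20 species, removing 5% drops exactly one point, so the single outlier above is removed and the remaining estimates correlate perfectly with the truth.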
As can be seen from Example 2, compared with the widely used Centrifuge, the metagenomic abundance estimation method provided by the application markedly improves the accuracy of metagenomic abundance estimation, which helps improve the sensitivity and specificity of metagenomic species identification.
Finally, it should be noted that the above embodiments are merely illustrative of the technical solution of the present application, and not limiting thereof; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the application.
Claims (5)
1. A method for assessing metagenomic abundance, comprising the steps of:
1) obtaining the proportion of unique alignment positions on the reference genome;
2) obtaining the probability of unique alignment and multiple alignment of sequencing reads;
3) estimating metagenomic species abundance using an expectation maximization algorithm;
step 2) is performed as follows: based on the proportion of unique alignment positions on the reference genome, the influence of reference-genome similarity between species on alignment is quantified, and a statistical model is constructed to characterize the probability of the observed unique alignments and multiple alignments;
the statistical model is as follows:
wherein,
r is the number of reads in the metagenomic sequencing data;
s is the number of species in the metagenome;
the abundances of species j and k, respectively, are the parameters to be estimated;
$l_j$ and $l_k$ are the average genome lengths of species j and k, respectively;
$C_{ij}$ is the probability that read i aligns to species j: when read i aligns only to species j, the probability equals $P_j$, the proportion of unique alignment positions on reference genome j; when read i is multiply aligned to species j, the probability equals $1-P_j$; when read i does not align to species j, the probability equals 0;
the estimation of step 3) is performed as follows:
solving the statistical model constructed in step 2) with an expectation maximization algorithm to estimate species abundance in the metagenome;
the solving is performed as follows: introducing the reference genome length in the expectation step to calculate the number $n_j$ of reads originating from species j, using the formula:
and then updating the abundance of species j based on $n_j$, using the formula:
2. The assessment method according to claim 1, wherein
step 1) is performed as follows: based on the alignment information of the metagenomic sequencing data, the ratio of the number of reads uniquely aligned to the reference genome to the number of all reads aligned to the reference genome is calculated.
3. The assessment method according to any one of claims 1-2, wherein said metagenome is an infectious metagenome.
4. An electronic device, comprising: a processor and a memory; the processor is connected to a memory, wherein the memory is adapted to store a computer program, the processor being adapted to invoke the computer program to perform the method of any of claims 1-3.
5. A computer storage medium, characterized in that the computer storage medium stores a computer program comprising program instructions which, when executed by a processor, perform the method of any of claims 1-3.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211690228 | 2022-12-27 | ||
CN2022116902289 | 2022-12-27 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116312796A (en) | 2023-06-23 |
CN116312796B (en) | 2023-11-14 |
Family
ID=86800458
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310103910.1A Active CN116312796B (en) | 2022-12-27 | 2023-02-07 | Metagenome abundance estimation method and system based on expectation maximization algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116312796B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104911194A (en) * | 2015-06-04 | 2015-09-16 | Shandong Agricultural University (山东农业大学) | Wheat male sterility genes WMS and application of anther specific promoter thereof |
CN105408909A (en) * | 2013-07-09 | 2016-03-16 | Lexogen GmbH (莱克斯奥根有限公司) | Transcript determination method |
CN113186311A (en) * | 2021-04-27 | 2021-07-30 | Peking Union Medical College Hospital, Chinese Academy of Medical Sciences (中国医学科学院北京协和医院) | Application of vaginal microorganism in differential diagnosis of chronic pelvic pain syndrome |
CN113337590A (en) * | 2021-06-03 | 2021-09-03 | BGI Genomics Co., Ltd., Shenzhen (深圳华大基因股份有限公司) | Second-generation sequencing method and library construction method |
CN113337589A (en) * | 2021-05-24 | 2021-09-03 | South China University of Technology (华南理工大学) | Method for screening genes related to synthesis of target compound and application |
CN114402084A (en) * | 2019-06-27 | 2022-04-26 | 赛福医药公司 | Developing classifiers for stratifying patients |
CN115331737A (en) * | 2022-08-11 | 2022-11-11 | Huang Kun (黄琨) | Method for analyzing pathogenic bacteria in intestinal flora and quantifying regional characteristics of flora |
- 2023-02-07: application CN202310103910.1A filed in CN (CN116312796B, active)
Non-Patent Citations (2)
Title |
---|
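RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome; Bo Li et al.; BMC Bioinformatics; pp. 1-16 * |
Study of differences in silkworm gene expression before and after high-temperature treatment using serial analysis of gene expression (SAGE); Bao Zhongzan (鲍忠赞) et al.; Sericultural Science (蚕业科学); pp. 456-467 * |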
Also Published As
Publication number | Publication date |
---|---|
CN116312796A (en) | 2023-06-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |