CN112200270B - Data partition filling method for correcting high-throughput omics data loss - Google Patents


Info

Publication number
CN112200270B
Authority
CN
China
Prior art keywords
data
filling
group
partition
deletion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011285428.7A
Other languages
Chinese (zh)
Other versions
CN112200270A (en)
Inventor
刘骁
冀树伸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jin Fu Kang Biotechnology Shanghai Ltd By Share Ltd
Original Assignee
Jin Fu Kang Biotechnology Shanghai Ltd By Share Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jin Fu Kang Biotechnology Shanghai Ltd By Share Ltd
Priority to CN202011285428.7A
Publication of CN112200270A
Application granted
Publication of CN112200270B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/29 Graphical models, e.g. Bayesian networks
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biophysics (AREA)
  • Chemical & Material Sciences (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Biotechnology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention discloses a data partition filling method for correcting missing values in high-throughput omics data, comprising the following steps: calculating the partition critical values Blow and Bup from the grouping of the high-throughput omics data expression matrix and the distribution of data detection, so as to partition the data; sorting the data by amount of missing values from most to least, and dividing the data into three partitions (true missing, unstable missing and technical missing) according to the partition critical values; and filling the data of the three partitions with the corresponding filling algorithms. The invention brings the filled data closer to reality: on the one hand, it reduces the negative influence of data distortion on data grouping; on the other hand, it avoids the excessive reshaping of data caused by using a single filling algorithm. Experiments show that the method restores data robustly and that, compared with other methods, the grouping result obtained from data filled by this method is closest to the real result, which demonstrates the effectiveness of the method.

Description

Data partition filling method for correcting high-throughput omics data loss
Technical Field
The invention belongs to the field of biological information data analysis, and particularly relates to a data partition filling method for correcting high-throughput omics data loss.
Background
High-throughput omics technology, developed after the year 2000, has gradually become one of the most important means of studying the world of microscopic molecules and is widely applied in life science research fields such as genomics, transcriptomics and proteomics. However, owing to the high sensitivity of high-throughput detection instruments and to the random fluctuation and time-dependent nature of biomolecules, some biomolecules are often not detected, i.e., their detection value is zero or close to zero. When a machine learning model is trained on a data set containing many missing values, the missing values can greatly degrade the performance of the model and can lead to misinterpretation of biological significance. Recovering the missing data and restoring its real expression as far as possible is an important challenge in omics data analysis.
At present, filling algorithms for high-throughput omics data all use a single fixed mode; common ones include mean filling, median filling and KNN filling. In practical experiments, however, detection values go missing for several reasons, the common ones being: 1) true missing: the molecule does not exist in the sample; 2) unstable missing: the expression of the molecule is unstable, so that it is detected in some measurements and not in others; 3) technical missing: owing to instability of the detection instrument, a molecule that is detected in most samples has an empty detection value in a few. In this context, a single fixed mode cannot meet the computational requirements, whereas filling adaptively with a combination of algorithms matched to each situation can effectively avoid filling distortion.
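To make the limitation of a single fixed mode concrete, the following is a minimal sketch of mean or median filling applied uniformly to every zero. The function name and the toy values are hypothetical and are not taken from the patent.

```python
from statistics import mean, median

def fill_fixed(values, how):
    """Replace every zero with one statistic of the detected values."""
    detected = [v for v in values if v != 0]
    if not detected:
        return list(values)
    stat = mean(detected) if how == "mean" else median(detected)
    return [stat if v == 0 else v for v in values]

# Hypothetical molecule detected in only 1 of 6 samples.
rare = [0, 0, 0, 0, 0, 8.0]
filled = fill_fixed(rare, "mean")
```

In this toy case a molecule that was almost never detected ends up with the same value in every sample, which is the kind of distortion the partitioned approach is meant to avoid.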
Disclosure of Invention
The invention aims to overcome the problems in the prior art by providing a data partition filling method for correcting missing values in high-throughput omics data. The method models the distribution of missing data caused by different factors in large high-throughput omics experimental data sets and fills each model with a different algorithm. Missing values caused by unstable molecular expression are the difficult case for a filling algorithm; for these, filling without overfitting is achieved with a Bayesian algorithm based on the overall missing probability and the intra-group missing probability of the data.
To achieve the above technical purpose and technical effect, the invention is realized by the following technical scheme:
A data partition filling method for correcting high-throughput omics data loss, the method comprising the following steps:
Step one: calculating the partition critical values Blow and Bup from the grouping of the high-throughput omics data expression matrix and the distribution of data detection, so as to partition the data;
Step two: sorting the data by amount of missing values from most to least, and dividing the data into three partitions (true missing, unstable missing and technical missing) according to the partition critical values;
Step three: filling the data of the three partitions with the corresponding filling algorithms.
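The partition rule of step two can be sketched as follows. The boundary handling (strictly below Blow is true missing, strictly above Bup is technical missing, everything else is unstable missing) follows the detailed filling steps of this description. The function name, molecule labels and detection rates are hypothetical, and the thresholds 0.15 and 0.5 are borrowed from the embodiment.

```python
def assign_partition(detection_rate, b_low, b_up):
    """Return the partition name for one molecule in one group."""
    if detection_rate < b_low:
        return "true_missing"
    if detection_rate > b_up:
        return "technical_missing"
    return "unstable_missing"

# Hypothetical detection rates for four molecules in one group.
rates = {"P1": 0.05, "P2": 0.30, "P3": 0.90, "P4": 0.50}
b_low, b_up = 0.15, 0.5  # example thresholds reported in the embodiment

# Sort from most missing (lowest detection rate) to least missing.
ordered = sorted(rates, key=lambda m: rates[m])
partitions = {m: assign_partition(rates[m], b_low, b_up) for m in ordered}
```

A molecule whose detection rate equals a threshold falls into the unstable partition under this reading, since the source only states "less than the minimum" and "greater than the maximum".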
Further, the specific steps of step one are as follows:
(1) Calculating the detection rate of each molecule in each group of the high-throughput omics data expression matrix: detection rate of a molecule in group i = number of samples in group i whose detection value is not 0 / total number of samples in group i;
(2) Calculating the partition critical values Blow and Bup: for each group of the high-throughput omics data expression matrix, dividing all molecules contained in the samples of the group into three clusters by detected expression amount using the k-means algorithm, and calculating the median detection rate of the molecules in each cluster; the smallest and largest of the three medians are the partition critical values Blow and Bup.
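The two sub-steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the patented implementation: the source does not say how a molecule's "expression amount" is summarised, so the mean of its non-zero values is assumed here; the k-means initialisation (smallest, middle and largest value) is also a choice made for reproducibility. All names and toy values are hypothetical.

```python
from statistics import mean, median

def detection_rate(values):
    """Fraction of samples in a group whose detection value is not 0."""
    return sum(1 for v in values if v != 0) / len(values)

def kmeans_1d(points, k=3, iters=100):
    """Lloyd's k-means on scalars; returns one cluster label per point."""
    s = sorted(points)
    # Assumption: deterministic start at evenly spaced order statistics.
    centers = [s[i * (len(s) - 1) // (k - 1)] for i in range(k)]
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(p - centers[c])) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append(mean(members) if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return labels

def partition_thresholds(group_matrix):
    """group_matrix maps molecule -> detection values across one group's samples."""
    names = list(group_matrix)
    rates = [detection_rate(group_matrix[m]) for m in names]
    # Assumption: expression amount = mean of the non-zero detection values.
    expr = [mean([v for v in group_matrix[m] if v != 0] or [0.0]) for m in names]
    labels = kmeans_1d(expr, k=3)
    medians = [median(r for r, lab in zip(rates, labels) if lab == c)
               for c in set(labels)]
    return min(medians), max(medians)

toy = {
    "P1": [0, 0, 0, 0, 0, 0, 0, 1.0],
    "P2": [0, 0, 0, 0, 0, 0, 1.2, 0],
    "P3": [0, 0, 10.0, 11.0, 0, 9.0, 0, 0],
    "P4": [0, 10.5, 0, 0, 9.5, 0, 10.0, 11.0],
    "P5": [100.0, 98.0, 0, 101.0, 99.0, 102.0, 100.0, 97.0],
    "P6": [105.0, 0, 103.0, 99.0, 101.0, 100.0, 98.0, 102.0],
}
b_low, b_up = partition_thresholds(toy)
```

On real data a library k-means with several restarts would likely be preferable; the hand-written version is used only to keep the sketch self-contained.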
Further, the specific steps of step three are as follows:
(1) Filling of true missing values: molecules whose detection rate is less than the minimum critical value are not filled;
(2) Filling of unstable missing values:
For missing values caused by unstable expression of the molecule itself, the number of samples to fill is first predicted with a Bayesian algorithm, and filling is then performed. First, the potential missing rate missp of the molecule in the group is calculated with the formula:
missp = PA × (PBA / ((PBA × PA) + (0.05 × (1 - PA)))), where PBA is the intra-group missing rate of the molecule in the data set and PA is the overall missing rate of the molecule in the data set. Next, the number IN of samples in the group to be filled for the molecule is calculated with the formula IN = min(Mj / 2, (1 - missp) × Mi), where Mi is the number of samples in the group in which the molecule is not detected and Mj is the number of samples in the group in which it is detected. Finally, IN samples are selected at random from the samples in the group whose detection value is 0 and are filled with the non-zero minimum value in the group;
(3) Filling of technical missing values: for molecules whose detection rate is greater than the maximum critical value, null values are filled with the median of the detection values of the molecule in the group.
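Written in fraction form, the missp formula has the shape of Bayes' theorem for two hypotheses. Reading the constant 0.05 as the probability of a missing observation under the complementary hypothesis is an interpretation offered here, not something the source states.

```latex
\mathrm{missp}
  = \frac{P_{BA}\, P_A}{P_{BA}\, P_A + 0.05\,(1 - P_A)},
\qquad
IN = \min\!\left(\frac{M_j}{2},\; (1 - \mathrm{missp})\, M_i\right)
```

For example, with PA = 0.2 and PBA = 0.4 the numerator is 0.08 and the denominator is 0.08 + 0.04 = 0.12, so missp is about 0.667; with Mi = 4 and Mj = 6 this gives IN = min(3, 1.33) = 1.33, and the source does not specify how this value is rounded to a whole number of samples.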
The invention has the beneficial effects that:
According to the technical characteristics of high-throughput omics detection, a regression algorithm is used to build a model of data missingness; the data are partitioned into the three cases of true missing, unstable missing and technical missing, and the filling is then computed using a minimum-value algorithm, a Bayesian algorithm and a median algorithm. The filled data are therefore closer to reality: on the one hand, the negative influence of data distortion on data grouping is reduced; on the other hand, the excessive reshaping of data caused by using a single filling algorithm is avoided. Experiments show that the method restores data robustly and that, compared with other methods, the grouping result obtained from data filled by this method is closest to the real result, which demonstrates the effectiveness of the method.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention and do not constitute a limitation of the invention. In the drawings:
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a regression fitting trend graph of the correlation between detection rate and expression amount in each group of samples according to the present invention;
FIG. 3 is a comparison chart of the number of proteins in the samples before filling according to the present invention;
FIG. 4 is a comparison chart of the number of proteins in the samples after filling according to the present invention;
FIG. 5 is a cluster diagram of the samples before data filling according to the present invention;
FIG. 6 is a cluster diagram of the samples after data filling according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in FIG. 1, a data partition filling method for correcting high-throughput omics data loss comprises the following steps:
Step one: calculating the partition critical values Blow and Bup from the grouping of the high-throughput omics data expression matrix and the distribution of data detection, so as to partition the data, with the following specific steps:
(1) Calculating the detection rate of each molecule in each group of the high-throughput omics data expression matrix: detection rate of a molecule in group i = number of samples in group i whose detection value is not 0 / total number of samples in group i;
(2) Calculating the partition critical values Blow and Bup: for each group of the high-throughput omics data expression matrix, dividing all molecules contained in the samples of the group into three clusters by detected expression amount using the k-means algorithm, and calculating the median detection rate of the molecules in each cluster; the smallest and largest medians are the partition critical values Blow and Bup;
Step two: sorting the data by amount of missing values from most to least, and dividing the data into three partitions (true missing, unstable missing and technical missing) according to the partition critical values;
Step three: filling the data of the three partitions with the corresponding filling algorithms, with the following specific steps:
(1) Filling of true missing values: molecules whose detection rate is less than the minimum critical value are not filled;
(2) Filling of unstable missing values:
For missing values caused by unstable expression of the molecule itself, the number of samples to fill is first predicted with a Bayesian algorithm, and filling is then performed. First, the potential missing rate missp of the molecule in the group is calculated with the formula:
missp = PA × (PBA / ((PBA × PA) + (0.05 × (1 - PA)))), where PBA is the intra-group missing rate of the molecule in the data set and PA is the overall missing rate of the molecule in the data set. Next, the number IN of samples in the group to be filled for the molecule is calculated with the formula IN = min(Mj / 2, (1 - missp) × Mi), where Mi is the number of samples in the group in which the molecule is not detected and Mj is the number of samples in the group in which it is detected. Finally, IN samples are selected at random from the samples in the group whose detection value is 0 and are filled with the non-zero minimum value in the group;
(3) Filling of technical missing values: for molecules whose detection rate is greater than the maximum critical value, null values are filled with the median of the detection values of the molecule in the group.
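The filling rules of step three can be sketched for a single molecule in a single group as follows. Several details are assumptions rather than statements from the source: IN is rounded down to an integer, PBA is computed as Mi divided by the group size, and the function names, seed and toy values are hypothetical. Molecules in the true missing partition are simply left untouched, so no function is needed for them.

```python
import random
from statistics import median

def potential_missing_rate(p_a, p_ba, alt=0.05):
    """missp from the overall missing rate PA and the intra-group rate PBA."""
    return p_a * (p_ba / ((p_ba * p_a) + (alt * (1 - p_a))))

def fill_count(values, p_a):
    """IN = min(Mj / 2, (1 - missp) * Mi) for one molecule in one group."""
    m_i = sum(1 for v in values if v == 0)   # samples not detected
    m_j = len(values) - m_i                  # samples detected
    missp = potential_missing_rate(p_a, m_i / len(values))
    # Assumption: the source does not say how IN is rounded; floor is used.
    return int(min(m_j / 2, (1 - missp) * m_i))

def fill_unstable(values, p_a, rng):
    """Fill IN randomly chosen zeros with the group's non-zero minimum."""
    n = fill_count(values, p_a)
    zeros = [i for i, v in enumerate(values) if v == 0]
    floor_value = min(v for v in values if v != 0)
    filled = list(values)
    for i in rng.sample(zeros, n):
        filled[i] = floor_value
    return filled

def fill_technical(values):
    """Replace every zero with the median of the group's detected values."""
    med = median(v for v in values if v != 0)
    return [med if v == 0 else v for v in values]

group_values = [0, 0, 0, 0, 4.0, 6.0, 5.0, 7.0, 8.0, 9.0]
unstable_filled = fill_unstable(group_values, p_a=0.2, rng=random.Random(0))
technical_filled = fill_technical([0, 5.0, 7.0, 9.0])
```

With these toy numbers, Mi = 4, Mj = 6 and missp is about 0.667, so one of the four zeros is filled with the group minimum 4.0; which zero is chosen depends on the random seed.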
Taking proteome data from blood samples of liver cancer patients as an example, where g1 to g7 represent 7 different clinical disease states and stages:
1. Proteomics experiments are performed on the clinical samples with a mass spectrometer. The signal value of each detected protein in each sample is recorded in the data matrix produced by the mass spectrometry analysis, and entries with no detected signal are recorded as 0, i.e. as missing values.
2. The correlation between the detection rate and the expression amount of each protein in each group of samples is calculated, and a regression fit is performed using locally weighted polynomial regression;
FIG. 2 shows the regression fitting trend of the correlation between detection rate and expression amount in each group of samples. Black dots represent the different detected proteins, the abscissa is the missing rate of each protein, the ordinate is the expression value of each protein, line A is the fitted curve of expression change, and lines B and C are the two partition values Blow and Bup. Overall, the detection rate and the expression amount of a protein are positively correlated. Following the fitted curve, however, proteins in the section with a low detection rate are hardly expressed, which is the low-expression section; in the middle section the fitted curve rises steeply, at about 45 degrees, which is the transition section; and once the detection rate exceeds the critical value the expression amount rises steadily, which is the stable section. These three sections correspond to true missing, unstable missing and technical missing respectively.
Missing values for proteins in the low-expression section are attributed to an expression amount too low for the sensitivity of the mass spectrometer to detect. Missing values for proteins in the stable section are attributed to insufficient accuracy of the mass spectrometer, which fails to detect a protein that is expressed. Missing values for proteins in the intermediate transition section, which lies between the sensitivity and accuracy limits of the mass spectrometer, are attributed to unstable detection caused by unstable protein expression.
3. The data are divided into 3 clusters with the k-means algorithm, and the median detection rate of the molecules in each cluster is calculated; the smallest and largest medians, i.e. the partition critical values Blow and Bup, are 0.15 and 0.5 respectively;
4. Partitioning: the data are sorted by amount of missing values and divided into three partitions (true missing, unstable missing and technical missing) according to the two missing-rate partition values 0.15 and 0.5;
5. The data of the different partitions are filled with different filling algorithms to obtain the filled, recovered data matrix;
5.1 Filling of true missing values: molecules whose detection rate is less than the minimum critical value are not filled;
5.2 Filling of unstable missing values:
For missing values caused by unstable expression of the molecule itself, the number of samples to fill is first predicted with a Bayesian algorithm, and filling is then performed. First, the potential missing rate missp of the molecule in the group is calculated with the formula missp = PA × (PBA / ((PBA × PA) + (0.05 × (1 - PA)))), where PBA is the intra-group missing rate of the molecule in the data set and PA is the overall missing rate of the molecule in the data set. Next, the number IN of samples in the group to be filled for the molecule is calculated with the formula IN = min(Mj / 2, (1 - missp) × Mi), where Mi is the number of samples in the group in which the molecule is not detected and Mj is the number of samples in the group in which it is detected. Finally, IN samples are selected at random from the samples in the group whose detection value is 0 and are filled with the non-zero minimum value in the group;
5.3 Filling of technical missing values: for molecules whose detection rate is greater than the maximum critical value, null values are filled with the median of the detection values of the molecule in the group;
FIGS. 3 and 4 compare the number of proteins before and after filling. After filling, the differences in the number of detected proteins among the samples of each group become small and the variation between groups is relatively stable. This agrees with clinical logic: samples of the same group belong to the same clinical stage, so large differences in the number of expressed proteins should not occur;
FIGS. 5 and 6 show the sample cluster diagrams before and after filling. Before data filling, an approximate clustering tendency of the samples by group can be seen, but samples from different groups are interspersed within clusters, which is a clustering error. After data filling, samples belonging to the same group gather in the same branch, and the order in which the different branches are linked agrees with the clinical logic of disease progression for each group, so the clinical meaning of each group can be explained accurately.
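Step 2 of the example fits detection rate against expression amount with locally weighted regression. A minimal pure-Python sketch of evaluating such a fit at one point is given below. It assumes a degree-1 local polynomial, a tricube kernel and a nearest-neighbour bandwidth, none of which are specified in the source; the function name and the toy data are hypothetical.

```python
def loess_point(x0, xs, ys, frac=0.5):
    """Locally weighted linear fit evaluated at x0 (tricube weights)."""
    n = len(xs)
    k = max(2, int(frac * n))
    # Bandwidth: distance from x0 to its k-th nearest neighbour.
    dists = sorted(abs(x - x0) for x in xs)
    h = dists[k - 1] or 1e-12
    w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
    sw = sum(w)
    if sw == 0:
        return sum(ys) / n  # degenerate neighbourhood: fall back to the mean
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)

# Hypothetical check on exactly linear data: the local fit should follow the line.
xs = [i / 10 for i in range(11)]
ys = [2 * x + 1 for x in xs]
fit = loess_point(0.5, xs, ys)
```

Evaluating the function over a grid of detection rates would trace a curve comparable in role to line A of FIG. 2, although the actual fitting settings used for that figure are not given.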
In the description herein, references to the description of "one embodiment," "an example," "a specific example," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are described in the specification and illustrated only to illustrate the principle of the present invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the present invention, which fall within the scope of the invention as claimed.

Claims (2)

1. A data partition filling method for correcting high-throughput omics data loss, characterized in that the method comprises the following steps:
Step one: calculating the partition critical values Blow and Bup from the grouping of the high-throughput omics data expression matrix and the distribution of data detection, so as to partition the data;
Step two: sorting the data by amount of missing values from most to least, and dividing the data into three partitions (true missing, unstable missing and technical missing) according to the partition critical values;
Step three: filling the data of the three partitions with the corresponding filling algorithms;
the specific steps of step one are as follows:
(1) Calculating the detection rate of each molecule in each group of the high-throughput omics data expression matrix: detection rate of a molecule in group i = number of samples in group i whose detection value is not 0 / total number of samples in group i;
(2) Calculating the partition critical values Blow and Bup: for each group of the high-throughput omics data expression matrix, dividing all molecules contained in the samples of the group into three clusters by detected expression amount using the k-means algorithm, and calculating the median detection rate of the molecules in each cluster; the smallest and largest medians are the partition critical values Blow and Bup.
2. The data partition filling method for correcting high-throughput omics data loss according to claim 1, characterized in that the specific steps of step three are as follows:
(1) Filling of true missing values: molecules whose detection rate is less than the minimum critical value are not filled;
(2) Filling of unstable missing values:
for missing values caused by unstable expression of the molecule itself, the number of samples to fill is first predicted with a Bayesian algorithm, and filling is then performed: first, the potential missing rate missp of the molecule in the group is calculated with the formula missp = PA × (PBA / ((PBA × PA) + (0.05 × (1 - PA)))), where PBA is the intra-group missing rate of the molecule in the data set and PA is the overall missing rate of the molecule in the data set; next, the number IN of samples in the group to be filled for the molecule is calculated with the formula IN = min(Mj / 2, (1 - missp) × Mi), where Mi is the number of samples in the group in which the molecule is not detected and Mj is the number of samples in the group in which it is detected; finally, IN samples are selected at random from the samples in the group whose detection value is 0 and are filled with the non-zero minimum value in the group;
(3) Filling of technical missing values: for molecules whose detection rate is greater than the maximum critical value, null values are filled with the median of the detection values of the molecule in the group.
CN202011285428.7A 2020-11-17 2020-11-17 Data partition filling method for correcting high-throughput omics data loss Active CN112200270B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011285428.7A CN112200270B (en) 2020-11-17 2020-11-17 Data partition filling method for correcting high-throughput omics data loss


Publications (2)

Publication Number Publication Date
CN112200270A (en) 2021-01-08
CN112200270B (en) 2022-12-20

Family

ID=74034013


Country Status (1)

Country Link
CN (1) CN112200270B (en)




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant