CN106778078A - DNA sequence similarity comparison method based on the Kendall correlation coefficient - Google Patents
DNA sequence similarity comparison method based on the Kendall correlation coefficient
- Publication number
- CN106778078A (application CN201611186639.9A)
- Authority
- CN
- China
- Prior art keywords
- dna sequence
- dna
- sequence dna
- words
- kendall
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
Landscapes
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Analytical Chemistry (AREA)
- Biophysics (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Theoretical Computer Science (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The present invention discloses a DNA sequence similarity comparison method based on the Kendall correlation coefficient. It comprises the following steps: 1) obtain the N DNA sequences to be compared; 2) choose a length k, obtain the k-words of each DNA sequence with a sliding window, and combine them into the corresponding vector; 3) using the k-words obtained in step 2), count the occurrences of each k-word in a DNA sequence and compute the frequency vector of k-word occurrences, denoted x_i, with the k-word frequencies of all DNA sequences denoted X = {x_i}; 4) combine the k-word vectors of the N DNA sequences pairwise, giving N(N − 1)/2 combinations, and denote the k-word frequency vectors of each combination x, y; 5) for the k-word frequency vectors x, y of each combination, compute the corresponding Kendall correlation coefficient; 6) build the N × N similarity coefficient matrix of the N DNA sequences to obtain their similarity and evolutionary relationship graph. The invention improves the effectiveness of DNA sequence similarity comparison, reduces computational complexity, and shortens computation time.
Description
Technical field
The present invention relates to the fields of computing and bioinformatics, and in particular to a DNA sequence similarity comparison method based on the Kendall correlation coefficient.
Background art
The central task of bioinformatics is to extract conceptual knowledge from vast amounts of DNA sequence data. Bioinformaticians must not only find efficient means of storing data but also develop effective data-analysis tools. Only with new and effective tools can DNA sequence information be turned into biological knowledge, the structural and functional information it contains be understood, and its biological significance be fully grasped.
The theoretical basis of DNA sequence comparison is evolutionary theory. If two DNA sequences are sufficiently similar, they may be inferred to share a common evolutionary ancestor and to have diverged through genetic variation such as residue substitution, deletion of residues or DNA fragments, and sequence recombination. DNA sequence similarity and DNA sequence homology are distinct concepts: similarity is a quantifiable parameter, whereas homology must be confirmed by evolutionary evidence. DNA sequence comparison essentially applies a specific mathematical model or algorithm to find the maximum number of matching bases between two or more DNA sequences.
Huang Yujuan, Wang Tianming et al. used the frequency and position of k-words in a DNA sequence to construct a probability distribution that expresses the distance between two vectors: the smaller the value, the closer the species. Vinga and Almeida proposed a word-frequency-based DNA sequence comparison method. Occurrences of all words of length k are counted with a sliding window to obtain a k-word count or frequency vector. Each DNA sequence is thereby mapped to a vector in a high-dimensional Euclidean space, so similarity comparison between DNA sequences becomes comparison between vectors.
Pairwise DNA sequence comparison applies a specific algorithm to two DNA sequences to find the best similarity match between them. The Kendall correlation coefficient is widely used for correlation prediction in time series, hydrology, water quality, and similar fields, but it has not been used for DNA sequence similarity matching.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art. It provides a DNA sequence similarity comparison method based on the Kendall correlation coefficient that builds an N × N similarity coefficient matrix over N DNA sequences and obtains their evolutionary relationships. The method also improves both the effectiveness and the computational efficiency of DNA sequence similarity comparison.
The technical solution adopted by the invention is as follows.
A DNA sequence similarity comparison method based on the Kendall correlation coefficient comprises the following steps:
1) obtain the N DNA sequences to be compared;
2) choose a length k, obtain the k-words of each DNA sequence with a sliding window, and combine them into the corresponding vector;
3) using the k-words obtained in step 2), count the occurrences of each k-word in a DNA sequence, i.e. compute the frequency vector of k-words in the DNA sequence, denoted x_i;
4) combine the k-word vectors of the N DNA sequences pairwise, giving N(N − 1)/2 combinations; the vectors of each combination are denoted X = {x_i}, Y = {y_i};
5) for the k-word frequency vectors x_i, y_i of each combination, compute the corresponding Kendall correlation coefficient;
6) build the N × N correlation matrix of the N DNA sequences to obtain their similarity information and evolutionary relationship graph.
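Step 4) compares every unordered pair of the N sequences exactly once, giving N(N − 1)/2 pairs. For the 20-species embodiment, that is 190 Kendall computations. Below is a minimal standard-library sketch of the pairing. It uses hypothetical placeholder labels in place of real k-word vectors.

```python
from itertools import combinations
from math import comb

def sequence_pairs(labels):
    """All unordered pairs (i < j); each pair is compared once in step 4)."""
    return list(combinations(labels, 2))

labels = [f"seq{i}" for i in range(20)]  # hypothetical labels, N = 20
pairs = sequence_pairs(labels)
print(len(pairs), comb(20, 2))  # 190 190
print(pairs[0])                 # ('seq0', 'seq1')
```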
Further, in step 2), the frequency vector of the words of length k is computed for each DNA sequence.
Further, in step 5), the Kendall correlation coefficient of the k-words of the DNA sequences can be obtained as follows:
A) obtain the k-words of the DNA sequence A to be compared, whose length is n, by sliding a window of width k along A;
B) compute how often each k-word occurs: x_i = the number of times the i-th k-word occurs in DNA sequence A;
C) for the combined vectors X, Y, compute the Kendall correlation coefficient τ, where t_x is the number of concordant pairs in {x_i}, {y_i}, t_y is the number of discordant pairs, and T is the total number of distinct k-words in {x_i, y_i};
D) t_x and t_y in step C) are obtained as follows. A pair of positions (i, j) is concordant, and counts toward t_x, when (x_i − x_j)(y_i − y_j) is positive. It is discordant, and counts toward t_y, when (x_i − x_j)(y_i − y_j) is negative.
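The formulas referred to in steps A) to C) are not legible here. The definitions below are an assumption, not the patent's own formulas. The first two lines give the standard sliding-window definitions that step 2) describes. The third line gives the standard Kendall tau-a normalization. Here a_1 a_2 … a_n are the bases of A, and w_i is the i-th k-word in a fixed ordering.

```latex
W_k(A) = \{\, a_j a_{j+1} \cdots a_{j+k-1} : 1 \le j \le n-k+1 \,\}, \qquad
x_i = \bigl|\{\, j : 1 \le j \le n-k+1,\; a_j a_{j+1} \cdots a_{j+k-1} = w_i \,\}\bigr|,
\qquad
\tau = \frac{t_x - t_y}{T(T-1)/2}
```

Take A = ATAACTA with n = 7 and k = 2. There are n − k + 1 = 6 windows (AT, TA, AA, AC, CT, TA), so the count for TA is 2.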
The resulting Kendall correlation coefficient τ takes values in [−1, 1]. The closer τ is to 1, the stronger the correlation between the two DNA sequences. τ = −1 indicates a negative correlation between them, and τ close to 0 indicates that the two DNA sequences are uncorrelated.
An N × N Kendall correlation matrix is then built. This matrix is symmetric with 1 on its diagonal. It gives the pairwise similarity of the N DNA sequences, from which their evolutionary relationships are constructed.
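The pairwise coefficient and the symmetric matrix can be sketched as below. The normalization is an assumption, since the text does not show the formula for τ. This sketch uses the standard tau-a form (t_x − t_y)/(T(T − 1)/2). Here t_x and t_y are the concordant and discordant pair counts, and T is the number of distinct k-words present in either sequence, matching the T = 7 example in the embodiment. The function names are hypothetical.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Tau-a: (concordant - discordant) / (T(T-1)/2); tied pairs count toward neither."""
    T = len(x)
    if T != len(y) or T < 2:
        raise ValueError("x and y must have the same length T >= 2")
    t_x = t_y = 0  # concordant and discordant pair counts
    for i, j in combinations(range(T), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            t_x += 1
        elif s < 0:
            t_y += 1
    return (t_x - t_y) / (T * (T - 1) / 2)

def tau_for_pair(cx, cy):
    """Restrict two k-word count dicts to the words seen in either sequence (T of them)."""
    words = sorted(set(cx) | set(cy))
    return kendall_tau_a([cx.get(w, 0) for w in words], [cy.get(w, 0) for w in words])

def kendall_matrix(counts):
    """Symmetric N x N matrix; diagonal fixed at 1, upper triangle computed and mirrored."""
    n = len(counts)
    m = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        m[i][j] = m[j][i] = tau_for_pair(counts[i], counts[j])
    return m

a = {"AT": 1, "TA": 2, "AA": 1, "AC": 1, "CT": 1}           # 2-words of ATAACTA
b = {"AC": 2, "CA": 1, "AA": 1, "CT": 1, "TT": 1, "TA": 1}  # 2-words of ACAACTTA
print(tau_for_pair(a, b))  # 0.0
```

Under tau-a, a count vector with ties correlated with itself gives τ below 1. Here tau_for_pair(a, a) is 0.4. The 1s on the diagonal are therefore a convention rather than a computed value. SciPy's `scipy.stats.kendalltau` computes the tie-corrected tau-b by default. That would give different values for count vectors with many ties.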
The method of the invention works as follows:
- obtain the k-word frequency vectors of the DNA sequences to be analyzed with a sliding window;
- combine the k-word vectors of the N DNA sequences pairwise;
- compute the Kendall correlation coefficient of each pair of k-word frequency vectors.

This allows similarity detection across multiple DNA sequences, and the results effectively reflect the evolutionary relationships among them. The method is simple: only one symmetric matrix needs to be built, and its main diagonal (top left to bottom right) is all 1s. This reduces computational complexity and improves efficiency. As the characteristic value describing DNA sequence similarity, the Kendall coefficient gives good prediction accuracy.
Brief description of the drawings
The invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a flow chart of the DNA sequence similarity comparison method based on the Kendall correlation coefficient of the invention;
Fig. 2 is the evolutionary relationship graph of the DNA sequences obtained with the DNA sequence similarity comparison method based on the Kendall correlation coefficient of the invention.
Detailed description of the embodiments
As shown in Fig. 1 and Fig. 2, the method of the invention is further illustrated using the DNA sequences of 20 species as the objects of analysis. As shown in Fig. 1, the DNA sequence similarity comparison method based on the Kendall correlation coefficient of this embodiment comprises the following steps:
1) the DNA sequences of 20 species are selected as the initial DNA sequences; their names and lengths are shown in Table 1;
Species name | DNA sequence length (bp) |
---|---|
baboon | 16522 |
bluewhale | 16403 |
cat | 17010 |
common_chimpanzee | 16564 |
cow | 16339 |
fin_whale | 16399 |
gibbon | 16473 |
gorilla | 16365 |
grayseal | 16798 |
harborseal | 16827 |
horse | 16661 |
human | 16570 |
mouse | 16296 |
opossum | 17085 |
orangutan | 16390 |
pigmy_chimpanzee | 16555 |
platypus | 17020 |
rat | 16301 |
wallaroo | 16897 |
whiterhinoceros | 16833 |
Table 1: DNA sequence information for the 20 species
2) For the initial DNA sequences from step 1), obtain their k-words and combine them to obtain the k-word frequency vector of each initial DNA sequence (see Vinga, S., Almeida, J. S. Alignment-free sequence comparison: a review [J]. Bioinformatics, 2003: 513-523). The method uses a sliding window to find how often each short DNA sequence of length k occurs in the DNA sequence under test. With the 4 DNA bases {A, T, G, C}, taking k = 2 gives 4^2 = 16 possible k-words, and k = 3 gives 4^3 = 64. For example, for the test DNA fragment A = ATAACTA, the 2-words are W_2 = {AT, TA, AA, TT, AG, GA, AC, CA, CT, …}, and A's frequency vector has values {1, 2, 1, 0, 0, 0, 1, 0, 1, 0, …}. For the test DNA fragment B = ACAACTTA, the k-word frequency vector is {0, 1, 1, 1, 0, 0, 2, 1, 1, 0, …};
3) The N DNA sequences yield N k-word frequency vectors. Combining them pairwise gives N(N − 1)/2 combinations, and the frequency vectors of each combination are denoted X, Y;
4) Compute the Kendall correlation coefficient τ. Here t_x is the number of concordant pairs among the k-word frequencies {x_i, y_i}, t_y is the number of discordant pairs, and T is the total number of distinct k-words in {x_i, y_i}. For fragments A and B of step 2), the total number of distinct k-words is T = 7;
5) t_x and t_y in step 4) are obtained as follows. A pair of positions (i, j) for which (x_i − x_j) × (y_i − y_j) is positive is a concordant pair and counts toward t_x. A pair for which it is negative is a discordant pair and counts toward t_y;
6) Build the N × N Kendall correlation matrix. This matrix is symmetric with 1 on its diagonal, so it can generally be stored as an upper triangular matrix. Similarity and distance are negatively related, so before building the evolutionary relationship graph we negate the similarity values to convert them into distances. The evolutionary relationship graph is then built from these distances (see Fig. 2).
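The k-word counts in the worked example of step 2) can be checked with a short script, along with the value T = 7 in step 4). This sketch assumes overlapping windows with step 1. The ordering uses only the nine 2-words actually listed in the text, since the rest of the ordering is elided there.

```python
from collections import Counter

def kword_counts(seq, k):
    """Overlapping k-word counts from a sliding window of step 1."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

order = ["AT", "TA", "AA", "TT", "AG", "GA", "AC", "CA", "CT"]  # the 2-words listed in step 2)
a = kword_counts("ATAACTA", 2)
b = kword_counts("ACAACTTA", 2)
print([a[w] for w in order])  # [1, 2, 1, 0, 0, 0, 1, 0, 1]
print([b[w] for w in order])  # [0, 1, 1, 1, 0, 0, 2, 1, 1]
print(len(set(a) | set(b)))   # 7, the T of step 4)
print(4 ** 2, 4 ** 3)         # 16 64
```

Assume the standard tau-a normalization, τ = (t_x − t_y)/(T(T − 1)/2). The T = 7 positions then give t_x = 3 concordant and t_y = 3 discordant pairs, so τ = (3 − 3)/21 = 0 for this toy pair.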
Result analysis: we computed the Pearson correlation coefficient between the Kendall-based similarity and the edit distance and found a correlation of −0.94. This shows that the DNA sequence similarity computed with the method of the invention is highly accurate. Since it is also fast to compute, it offers a very effective alternative to edit distance.
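The result analysis compares the Kendall similarities with edit distance using Pearson's correlation. A minimal pure-Python sketch of both measures follows. It uses the classic unit-cost Levenshtein distance, which is an assumption since the text does not say which edit-distance variant was used. The function names are hypothetical.

```python
from math import sqrt

def edit_distance(s, t):
    """Unit-cost Levenshtein distance with a rolling DP row, O(len(s) * len(t))."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cs != ct)))  # substitution or match
        prev = cur
    return prev[-1]

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

print(edit_distance("ATAACTA", "ACAACTTA"))  # 2
```

Given the τ values and edit distances for the same list of sequence pairs, `pearson(taus, distances)` yields the kind of coefficient reported above. A value near −1 means high similarity goes with small edit distance. The Levenshtein loop is quadratic in sequence length. For sequences of about 16,500 bases it is therefore far slower than k-word counting, which is the cost the method aims to avoid.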
The above is only an embodiment of the invention and does not limit the scope of its claims. The scope of protection of the invention likewise includes any equivalent structure or equivalent process transformation made using the description and drawings of the invention. It also includes any direct or indirect application of them in other related technical fields.
Claims (5)
1. A DNA sequence similarity comparison method based on the Kendall correlation coefficient, characterized in that it comprises the following steps:
1) obtain the N DNA sequences to be compared;
2) choose a length k, obtain the k-words of each DNA sequence with a sliding window, and combine them into the corresponding vector;
3) using the k-words obtained in step 2), count the occurrences of each k-word in a DNA sequence, i.e. compute the frequency vector of k-words in the DNA sequence, denoted x_i;
4) combine the k-word vectors of the N DNA sequences pairwise, giving N(N − 1)/2 combinations; the vectors of each combination are denoted X = {x_i}, Y = {y_i};
5) for the k-word frequency vectors x_i, y_i of each combination, compute the corresponding Kendall correlation coefficient;
6) build the N × N correlation matrix of the N DNA sequences to obtain their similarity information and evolutionary relationship graph.
2. The DNA sequence similarity comparison method based on the Kendall correlation coefficient according to claim 1, characterized in that in step 2), the frequency vector of the words of length k is computed for each DNA sequence.
3. The DNA sequence similarity comparison method based on the Kendall correlation coefficient according to claim 1, characterized in that in step 5), the Kendall correlation coefficient of the k-words of the DNA sequences is obtained as follows:
A) obtain the k-words of the DNA sequence A to be compared, whose length is n, by sliding a window of width k along A;
B) compute how often each k-word occurs: x_i = the number of times the i-th k-word occurs in DNA sequence A;
C) for the combined vectors X, Y, compute the Kendall correlation coefficient τ, where t_x is the number of concordant pairs in {x_i}, {y_i}, t_y is the number of discordant pairs, and T is the total number of distinct k-words in {x_i, y_i};
D) t_x and t_y in step C) are obtained as follows. A pair of positions (i, j) is concordant, and counts toward t_x, when (x_i − x_j)(y_i − y_j) is positive. It is discordant, and counts toward t_y, when (x_i − x_j)(y_i − y_j) is negative.
4. The DNA sequence similarity comparison method based on the Kendall correlation coefficient according to claim 1, characterized in that the resulting Kendall correlation coefficient τ takes values in [−1, 1]. The closer τ is to 1, the stronger the correlation between the two DNA sequences; τ = −1 indicates a negative correlation between them; and τ close to 0 indicates that the two DNA sequences are uncorrelated.
5. The DNA sequence similarity comparison method based on the Kendall correlation coefficient according to claim 1, characterized in that in step 6 an N × N Kendall correlation matrix is built. This matrix is symmetric with 1 on its diagonal and gives the pairwise similarity of the N DNA sequences, from which the evolutionary relationships of the N DNA sequences are constructed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611186639.9A CN106778078B (en) | 2016-12-20 | 2016-12-20 | DNA sequence similarity comparison method based on the Kendall correlation coefficient |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106778078A true CN106778078A (en) | 2017-05-31 |
CN106778078B CN106778078B (en) | 2019-04-09 |
Family ID: 58896076
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611186639.9A Expired - Fee Related CN106778078B (en) | 2016-12-20 | 2016-12-20 | DNA sequence similarity comparison method based on the Kendall correlation coefficient |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106778078B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108846262A (en) * | 2018-05-31 | 2018-11-20 | 广西大学 | Method for constructing phylogenetic trees from DFT-based RNA secondary structure distances |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040101846A1 (en) * | 2002-11-22 | 2004-05-27 | Collins Patrick J. | Methods for identifying suitable nucleic acid probe sequences for use in nucleic acid arrays |
CN102732609A (en) * | 2011-04-08 | 2012-10-17 | 博奥生物有限公司 | Method for detecting similarity of oligonucleotide and target genome |
WO2014019164A1 (en) * | 2012-08-01 | 2014-02-06 | 深圳华大基因研究院 | Method and device for analyzing microbial community composition |
CN104395900A (en) * | 2013-03-15 | 2015-03-04 | 北京未名博思生物智能科技开发有限公司 | Spatial arithmetic method of sequence alignment |
CN104657628A (en) * | 2015-01-08 | 2015-05-27 | 深圳华大基因科技服务有限公司 | Proton-based transcriptome sequencing data comparison and analysis method and system |
WO2016058089A1 (en) * | 2014-10-17 | 2016-04-21 | The Hospital For Sick Children | Dna methylation markers for overgrowth syndromes |
EP3081257A1 (en) * | 2015-04-17 | 2016-10-19 | Sorin CRM SAS | Active implantable medical device for cardiac stimulation comprising means for detecting a remodelling or reverse remodelling phenomenon of the patient |
CN106203471A (en) * | 2016-06-22 | 2016-12-07 | 南京航空航天大学 | A spectral clustering method incorporating the Kendall tau distance metric |
Non-Patent Citations (1)
Title |
---|
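Huang Yujuan (黄玉娟): "Research and application of models for k-word-based DNA sequence analysis", China Doctoral Dissertations Full-text Database (Basic Sciences) * |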
Also Published As
Publication number | Publication date |
---|---|
CN106778078B (en) | 2019-04-09 |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20190409 |