CN102419774B - Method for clustering single nucleotide polymorphism (SNP) data - Google Patents

Info

Publication number
CN102419774B
Authority
CN
China
Prior art keywords
data
snp
dense cell
cluster
dimension
Legal status
Expired - Fee Related
Application number
CN 201110418812
Other languages
Chinese (zh)
Other versions
CN102419774A (en)
Inventor
吴悦
贾敏
雷州
刘宗田
Current Assignee
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date
2011-12-15
Filing date
2011-12-15
Publication date
2013-04-03
2011-12-15 Application filed by University of Shanghai for Science and Technology
2011-12-15 Priority to CN 201110418812
2012-04-18 Publication of CN102419774A
2013-04-03 Application granted
2013-04-03 Publication of CN102419774B
2015-12-15 Expired - Fee Related (current legal status)
Anticipated expiration

Abstract

The invention discloses a method for clustering single nucleotide polymorphism (SNP) data. The method comprises the following steps:

- First, the raw SNP data are preprocessed and converted into a data format the method can process.
- Second, the preprocessed SNP data are divided into grids. Each dimension of the SNP data is split into three grid cells according to the value of each SNP site in each sample.
- Third, the density of the grid cells is computed to obtain the subspaces that contain clusters.
- Fourth, the obtained subspaces are clustered to obtain the classified SNP data. Each cluster is a set of co-expressed SNP sites.
- Finally, the clustering result is stored in a file.

The method solves the problem of clustering high-dimensional categorical data and clusters SNP data quickly and with high quality.

Description

A clustering method for SNP data
Technical field
The present invention relates to techniques for clustering large-scale, high-dimensional categorical data, and in particular to a clustering method for SNP data. It belongs to the field of computer technology.
Background
Clustering of high-dimensional data has become an important research direction in data mining. Technological progress keeps making data collection easier, so databases keep growing in size and complexity. Trade transaction data, Web documents and gene expression data, for example, commonly have hundreds or thousands of dimensions (attributes), or even more.
SNP stands for single nucleotide polymorphism: DNA sequence polymorphism at the genome level caused by a variation of a single nucleotide. Current studies estimate that the human genome contains about 3,000,000 SNP sites. SNP data are the values (genotypes) of SNP sites across samples and are high-dimensional categorical data.
Because of the "curse of dimensionality", many clustering methods that perform well in low-dimensional data spaces fail to produce good clusters in high-dimensional spaces. To address this problem, R. Agrawal first proposed the concept of subspace clustering for high-dimensional data. However, subspace clustering algorithms apply only to continuous data and are not suited to categorical data.
Summary of the invention
The object of the present invention is to solve the problems of the prior art described above. It provides a clustering method for SNP data that extends the data types handled by subspace clustering algorithms to categorical data. This effectively solves the problem of clustering SNP data and improves clustering accuracy. To achieve this object, the technical solution adopted by the present invention comprises the following operation steps:
A. Preprocess the raw SNP data and convert them into a data format that the clustering method can process;
B. Divide the preprocessed SNP data into grids;
C. Compute the density of the divided grid cells to obtain the subspaces that contain clusters;
D. Cluster the subspaces obtained in step C to obtain the classified SNP data;
E. Save the clustering result to a file.
In step A above, the raw SNP data are preprocessed and converted into a data format that the clustering method can process, as follows:
A1) Data encoding. The data exported from SNP chip genotyping take the following form. Each SNP site carries one genotyping result, and there are four kinds of result in total:
- wild-type homozygous AA;
- mutant heterozygous AB;
- mutant homozygous BB;
- the genotyping-failure flag NC.
AA is encoded as 0, AB as 1 and BB as 2;
A2) Data cleaning. If an entire row consists of NC, the whole row is deleted. If a row contains a few NC values, each of them is replaced by the value of the next sample at the same site. If more than 10% of the values in a row are NC, the entire row is deleted.
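Steps A1) and A2) can be sketched in Python as below. The sketch rests on several assumptions that the text does not state:
- Each row holds one SNP site and each column one sample. This is consistent with the later steps, which treat samples as dimensions and clusters as sets of SNP sites.
- The 10% rule is checked before any NC is replaced.
- An NC in the last sample(s), which has no "next sample", takes the nearest earlier call.

The names `clean_and_encode` and `GENOTYPE_CODE` are hypothetical.

```python
NC = "NC"  # genotyping-failure flag
GENOTYPE_CODE = {"AA": 0, "AB": 1, "BB": 2}


def clean_and_encode(rows, max_nc_fraction=0.10):
    """Apply steps A1)-A2) to {site_id: [genotype call per sample]}.

    Returns {site_id: [0/1/2 code per sample]} for the rows that survive cleaning.
    """
    encoded = {}
    for site, calls in rows.items():
        n_nc = calls.count(NC)
        # A2) drop all-NC rows and rows with more than 10% NC calls.
        if n_nc == len(calls) or n_nc > max_nc_fraction * len(calls):
            continue
        filled = list(calls)
        # Replace each NC with the next sample's call at the same site; scanning
        # right to left lets a run of NCs take the first real call after it.
        following = None
        for j in reversed(range(len(filled))):
            if filled[j] == NC:
                if following is not None:
                    filled[j] = following
            else:
                following = filled[j]
        # Assumption: trailing NCs have no next sample, so they take the
        # nearest earlier call instead.
        preceding = None
        for j, call in enumerate(filled):
            if call == NC:
                filled[j] = preceding
            else:
                preceding = call
        # A1) encode AA/AB/BB as 0/1/2.
        encoded[site] = [GENOTYPE_CODE[c] for c in filled]
    return encoded
```

With ten samples, a row with one NC (exactly 10%) is kept and filled, while a row with two NCs is dropped.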
In step B above, the preprocessed SNP data are divided into grids as follows. Every dimension of the SNP data is split into 3 grid cells according to the value of each SNP site in each sample; the three cells correspond to the codes 0, 1 and 2. K denotes the dimensionality of the space, and at this point K = 1;
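A minimal sketch of step B follows, assuming each row is an SNP site and each column a sample.
- Each sample is one dimension, and its three grid cells correspond to the codes 0, 1 and 2.
- A cell is represented as a tuple of (sample, code) pairs, a single pair while K = 1, mapped to the set of SNP sites it contains.
- A cell's density is taken to be the number of sites in it. This is an assumption: the text does not define density further.

The function name `grid_cells` is hypothetical.

```python
def grid_cells(data, codes=(0, 1, 2)):
    """Step B) sketch: split every sample (dimension) into 3 grid cells.

    data: {site_id: [code per sample]}, all rows the same length.
    Returns {((sample, code),): frozenset(site_ids)}; the size of each
    site set serves as the cell's density (an assumption).
    """
    n_samples = len(next(iter(data.values()))) if data else 0
    cells = {}
    for d in range(n_samples):
        for v in codes:
            cells[((d, v),)] = frozenset(s for s, row in data.items() if row[d] == v)
    return cells
```

For example, `grid_cells({"s1": [0, 2], "s2": [0, 1]})` returns six cells, and cell `((0, 0),)` holds both sites.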
In step C above, a subspace is a set of dense cells, and a dense cell is a grid cell whose density exceeds the density threshold. The density of the divided grid cells is computed, and the subspaces containing clusters are obtained, as follows:
C1) Compute the density of all grid cells in the K-dimensional space to obtain the set of dense cells in the K-dimensional space;
C2) Generate the (K+1)-dimensional candidate dense cell set according to the generation rule for K-dimensional candidate dense cell sets;
C3) Determine whether the (K+1)-dimensional candidate dense cell set is empty.
- If it is not empty, set K = K+1 and return to step C2) to continue generating higher-dimensional subspaces.
- If it is empty, the K-dimensional subspace is the highest-dimensional subspace containing clusters, and the method proceeds to step D;
In step C2) above, the K-dimensional candidate dense cell set is generated by the following rule:
Input: $D_{k-1}$, the set of all (k-1)-dimensional dense cells.
Output: $C_k$, the set of k-dimensional candidate dense cells.
$C_k$ is generated as follows. Take any two dense cells $u_1$ and $u_2$ from $D_{k-1}$. They are joined if and only if both of these hold:
- $u_1$ and $u_2$ have identical values in their first k-2 samples;
- they have different values in the (k-1)-th sample, with $u_1.a_{k-1} < u_2.a_{k-1}$.
The join takes the values of $u_1$ in its first k-2 samples, the value of $u_1$ in its (k-1)-th sample and the value of $u_2$ in its (k-1)-th sample. The result is the k-dimensional candidate dense cell $u = (u_1.a_1, \ldots, u_1.a_{k-2}, u_1.a_{k-1}, u_2.a_{k-1})$, and the set of all such $u$ is $C_k$. Here $a_i$ denotes the vector of the i-th dimension, and "<" denotes lexicographic order.
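Steps C1) to C3), together with the join rule above, can be sketched as follows. Cells are tuples of (sample, code) pairs ordered by sample, and each cell is stored with its set of SNP sites, loosely mirroring the List<Set<Integer>> storage the invention mentions. Storing the sets means a joined cell's sites are simply the intersection of its two parents' site sets.

Two choices here are assumptions, not statements of the text:
- The last components of the two joined cells must come from different samples; otherwise a cell would hold two codes for one sample.
- The density of each new candidate set is evaluated before the next join.

As the text specifies, no Apriori-style subset pruning is performed. `join_candidates` and `mine_dense_cells` are hypothetical names.

```python
from itertools import combinations


def join_candidates(dense_prev):
    """C2) sketch: join (k-1)-dimensional dense cells into k-dimensional candidates.

    dense_prev: {cell: frozenset(site_ids)}, a cell being a tuple of
    (sample, code) pairs sorted by sample.
    """
    candidates = {}
    # Sorting makes u1 precede u2 in lexicographic order for every pair.
    for u1, u2 in combinations(sorted(dense_prev), 2):
        # First k-2 components equal; last components from different samples
        # (the different-sample requirement is an assumption).
        if u1[:-1] == u2[:-1] and u1[-1][0] < u2[-1][0]:
            # Sites in the joined cell are exactly the sites in both parents.
            candidates[u1 + (u2[-1],)] = dense_prev[u1] & dense_prev[u2]
    return candidates


def mine_dense_cells(cells_1d, threshold):
    """C1)-C3) sketch without subset pruning.

    cells_1d: {((sample, code),): frozenset(site_ids)} from the grid division.
    Returns {k: {cell: site_ids}} for every dimensionality k with dense cells.
    """
    levels = {}
    # C1) at K = 1: a cell is dense if it holds more than `threshold` sites.
    dense = {c: s for c, s in cells_1d.items() if len(s) > threshold}
    k = 1
    while dense:
        levels[k] = dense
        candidates = join_candidates(dense)
        # C3) an empty candidate set means k is the highest dimensionality.
        if not candidates:
            break
        # C1) again, now for the (k+1)-dimensional candidates.
        dense = {c: s for c, s in candidates.items() if len(s) > threshold}
        k += 1
    return levels
```

The loop also stops when a candidate set exists but none of its cells is dense, because `dense` becomes empty.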
In step D above, "classified SNP data" means that after clustering each class is a set of co-expressed SNP sites. The subspaces obtained in step C are clustered as follows:
D1) Map each subspace to a graph G = <V, E>, where G denotes the graph, V its vertex set and E its edge set.
- A vertex $v_i$ of the graph represents the set of SNP sites in dense cell i of the subspace.
- An edge $e_{ij}$ of the graph indicates that dense cells i and j are coplanar;
D2) Perform a depth-first search on the graph built in step D1) to obtain its connected subgraphs. Each connected subgraph is one cluster, and each cluster is a set of co-expressed SNP sites.
In step D1) above, an edge $e_{ij}$ indicating that dense cells i and j are coplanar means the following:
The two dense cells i and j have identical values in k-1 samples and differ in only one sample.
Since SNP data are biallelic, all dense cells in the one-dimensional case are coplanar.
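Steps D1) and D2) can be sketched as below:
- Dense cells are grouped by subspace, i.e. the tuple of samples they span.
- An edge joins two cells of the same subspace whose codes differ in exactly one sample.
- An iterative depth-first search collects each connected subgraph.

Taking a cluster to be the union of the site sets of its cells is an interpretation of "a connected subgraph is a cluster". The names `coplanar` and `cluster_subspaces` are hypothetical.

```python
from collections import defaultdict


def coplanar(cell_a, cell_b):
    """Cells over the same samples are coplanar if their codes differ in exactly one sample."""
    return sum(1 for (_, va), (_, vb) in zip(cell_a, cell_b) if va != vb) == 1


def cluster_subspaces(dense_cells):
    """D1)-D2) sketch.

    dense_cells: {cell: frozenset(site_ids)}, a cell being a tuple of (sample, code) pairs.
    Returns {subspace: [set of site ids per cluster]}, subspace = tuple of samples.
    """
    by_subspace = defaultdict(list)
    for cell in dense_cells:
        by_subspace[tuple(d for d, _ in cell)].append(cell)
    result = {}
    for subspace, cells in by_subspace.items():
        # D1) vertices are dense cells; edges link coplanar pairs.
        adjacency = {c: [o for o in cells if o != c and coplanar(c, o)] for c in cells}
        seen, clusters = set(), []
        for start in cells:
            if start in seen:
                continue
            # D2) iterative depth-first search over one connected subgraph.
            stack, members = [start], set()
            seen.add(start)
            while stack:
                cell = stack.pop()
                members |= dense_cells[cell]
                for neighbour in adjacency[cell]:
                    if neighbour not in seen:
                        seen.add(neighbour)
                        stack.append(neighbour)
            clusters.append(members)
        result[subspace] = clusters
    return result
```

In a one-dimensional subspace any two cells differ in their only sample, so they are always coplanar and merge into a single cluster. This matches the remark about biallelic SNP data.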
In step E above, the clustering result comprises how many clusters there are in total, which clusters each subspace (of every dimensionality) contains, and which SNP sites make up each cluster.
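Step E) only says which information the saved result contains. The text layout produced by the hypothetical `format_result` below is therefore an assumption. It takes clusters grouped by subspace, as {subspace (tuple of sample indices): [set of SNP site ids, ...]}, and reports the three items listed in step E.

```python
import io


def format_result(clusters_by_subspace):
    """Step E) sketch: render the clustering result as text.

    clusters_by_subspace: {subspace (tuple of sample indices): [set of site ids, ...]}.
    The layout is a hypothetical format, not one prescribed by the text.
    """
    buf = io.StringIO()
    total = sum(len(clusters) for clusters in clusters_by_subspace.values())
    buf.write(f"total clusters: {total}\n")
    for subspace, clusters in sorted(clusters_by_subspace.items()):
        buf.write(f"subspace {subspace}: {len(clusters)} cluster(s)\n")
        for i, sites in enumerate(clusters, 1):
            buf.write(f"  cluster {i}: {' '.join(sorted(map(str, sites)))}\n")
    return buf.getvalue()
```

To save it, write the returned string to a file, e.g. `with open("clusters.txt", "w", encoding="utf-8") as f: f.write(format_result(clusters))`. The file name is illustrative.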
Compared with the prior art, the clustering method for SNP data of the present invention has the following prominent features and notable advantages:
(1) The present invention extends the data types handled by subspace clustering algorithms to categorical data. It thereby effectively solves the problem of clustering SNP data.
(2) Unlike traditional subspace clustering algorithms, the present invention performs no pruning. This lowers the processing efficiency of the algorithm, but it ensures that no useful information is pruned away, which improves clustering accuracy and quality.
(3) Unlike conventional subspace clustering algorithms, the present invention stores the dense cell sets in a List<Set<Integer>> structure. This makes adding and deleting dense cells convenient and efficient and greatly improves memory utilization. Execution is therefore considerably faster, which compensates for the slowdown caused by not pruning.
Description of drawings
Fig. 1 is a flowchart of the clustering method for SNP data of the present invention;
Fig. 2 is a flowchart of preprocessing the raw SNP data in the present invention;
Fig. 3 is a flowchart of computing the density of the divided grid cells to obtain the subspaces containing clusters in the present invention;
Fig. 4 is a flowchart of clustering the obtained subspaces in the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and specific embodiments.
Embodiment one:
Referring to Fig. 1, the clustering method for SNP data is characterized by the following steps:
A. Preprocess the raw SNP data and convert them into a data format that the clustering method can process;
B. Divide the preprocessed SNP data into grids;
C. Compute the density of the divided grid cells to obtain the subspaces that contain clusters;
D. Cluster the subspaces obtained in step C to obtain the classified SNP data;
E. Save the clustering result to a file.
Embodiment two:
This embodiment is essentially the same as Embodiment one, with the following particular features:
Referring to Figs. 2 to 4, in step A the raw SNP data are preprocessed and converted into a data format that the clustering method can process, as follows:
A1) Data encoding. The data exported from SNP chip genotyping take the following form. Each SNP site carries one genotyping result, and there are four kinds of result in total:
- wild-type homozygous AA;
- mutant heterozygous AB;
- mutant homozygous BB;
- the genotyping-failure flag NC.
AA is encoded as 0, AB as 1 and BB as 2;
A2) Data cleaning. If an entire row consists of NC, the whole row is deleted. If a row contains a few NC values, each of them is replaced by the value of the next sample at the same site. If more than 10% of the values in a row are NC, the entire row is deleted.
In step B, the preprocessed SNP data are divided into grids as follows. Every dimension of the SNP data is split into 3 grid cells according to the value of each SNP site in each sample. K denotes the dimensionality of the space, and at this point K = 1.
In step C, a subspace is a set of dense cells, and a dense cell is a grid cell whose density exceeds the density threshold. The density of the divided grid cells is computed, and the subspaces containing clusters are obtained, as follows:
C1) Compute the density of all grid cells in the K-dimensional space to obtain the set of dense cells in the K-dimensional space;
C2) Generate the (K+1)-dimensional candidate dense cell set according to the generation rule for K-dimensional candidate dense cell sets;
C3) Determine whether the (K+1)-dimensional candidate dense cell set is empty.
- If it is not empty, set K = K+1 and return to step C2) to continue generating higher-dimensional subspaces.
- If it is empty, the K-dimensional subspace is the highest-dimensional subspace containing clusters, and the method proceeds to step D.
In step C2), the K-dimensional candidate dense cell set is generated by the following rule:
Input: $D_{k-1}$, the set of all (k-1)-dimensional dense cells.
Output: $C_k$, the set of k-dimensional candidate dense cells.
$C_k$ is generated as follows. Take any two dense cells $u_1$ and $u_2$ from $D_{k-1}$. They are joined if and only if both of these hold:
- $u_1$ and $u_2$ have identical values in their first k-2 samples;
- they have different values in the (k-1)-th sample, with $u_1.a_{k-1} < u_2.a_{k-1}$.
The join takes the values of $u_1$ in its first k-2 samples, the value of $u_1$ in its (k-1)-th sample and the value of $u_2$ in its (k-1)-th sample. The result is the k-dimensional candidate dense cell $u = (u_1.a_1, \ldots, u_1.a_{k-2}, u_1.a_{k-1}, u_2.a_{k-1})$, and the set of all such $u$ is $C_k$. Here $a_i$ denotes the vector of the i-th dimension, and "<" denotes lexicographic order.
In step D, "classified SNP data" means that after clustering each class is a set of co-expressed SNP sites. The subspaces obtained in step C are clustered as follows:
D1) Map each subspace to a graph G = <V, E>, where G denotes the graph, V its vertex set and E its edge set.
- A vertex $v_i$ of the graph represents the set of SNP sites in dense cell i of the subspace.
- An edge $e_{ij}$ of the graph indicates that dense cells i and j are coplanar;
D2) Perform a depth-first search on the graph built in step D1) to obtain its connected subgraphs. Each connected subgraph is one cluster, and each cluster is a set of co-expressed SNP sites.
In step D1), an edge $e_{ij}$ indicating that dense cells i and j are coplanar means the following:
The two dense cells i and j have identical values in k-1 samples and differ in only one sample;
Since SNP data are biallelic, all dense cells in the one-dimensional case are coplanar.
In step E, the clustering result comprises how many clusters there are in total, which clusters each subspace (of every dimensionality) contains, and which SNP sites make up each cluster.
Embodiment three:
Referring to Figs. 1 to 4, the clustering method for SNP data of the present invention is illustrated here using the clustering of SNP data from hypertension patients as an example. The specific steps are as follows:
(1) Preprocess the raw SNP data and convert them into a data format that the clustering method can process, as shown in Fig. 2. The specific steps are:
a) Data encoding. The data exported from SNP chip genotyping take the following form. Each SNP site carries one genotyping result, and there are four kinds of result in total:
- wild-type homozygous AA;
- mutant heterozygous AB;
- mutant homozygous BB;
- the genotyping-failure flag NC.
AA is encoded as 0, AB as 1 and BB as 2;
b) Data cleaning. If an entire row consists of NC, the whole row is deleted. If a row contains a few NC values, each of them is replaced by the value of the next sample at the same site. If more than 10% of the values in a row are NC, the entire row is deleted.
(2) Divide the preprocessed SNP data into grids.
The specific operation is as follows. Every dimension of the SNP data is split into 3 grid cells according to the value of each SNP site in each sample. K denotes the dimensionality of the space, and at this point K = 1;
(3) Compute the density of the divided grid cells to obtain the subspaces that contain clusters.
Here a subspace is a set of dense cells, and a dense cell is a grid cell whose density exceeds the density threshold. As shown in Fig. 3, the specific steps are:
a) Compute the density of all grid cells in the K-dimensional space to obtain the set of dense cells in the K-dimensional space;
b) Generate the (K+1)-dimensional candidate dense cell set according to the generation rule for K-dimensional candidate dense cell sets;
The K-dimensional candidate dense cell set is generated by the following rule:
Input: $D_{k-1}$, the set of all (k-1)-dimensional dense cells.
Output: $C_k$, the set of k-dimensional candidate dense cells.
$C_k$ is generated as follows. Take any two dense cells $u_1$ and $u_2$ from $D_{k-1}$. They are joined if and only if both of these hold:
- $u_1$ and $u_2$ have identical values in their first k-2 samples;
- they have different values in the (k-1)-th sample, with $u_1.a_{k-1} < u_2.a_{k-1}$.
The join takes the values of $u_1$ in its first k-2 samples, the value of $u_1$ in its (k-1)-th sample and the value of $u_2$ in its (k-1)-th sample. The result is the k-dimensional candidate dense cell $u = (u_1.a_1, \ldots, u_1.a_{k-2}, u_1.a_{k-1}, u_2.a_{k-1})$, and the set of all such $u$ is $C_k$. Here $a_i$ denotes the vector of the i-th dimension, and "<" denotes lexicographic order.
c) Determine whether the (K+1)-dimensional candidate dense cell set is empty.
- If it is not empty, set K = K+1 and return to step (b) to continue generating higher-dimensional subspaces.
- If it is empty, the K-dimensional subspace is the highest-dimensional subspace containing clusters, and the method proceeds to step (4);
(4) Cluster the subspaces obtained in step (3) to obtain the classified SNP data.
Here, "classified SNP data" means that after clustering each class is a set of co-expressed SNP sites. As shown in Fig. 4, the specific steps are:
a) Map each subspace to a graph G = <V, E>, where G denotes the graph, V its vertex set and E its edge set.
- A vertex $v_i$ of the graph represents the set of SNP sites in dense cell i of the subspace.
- An edge $e_{ij}$ of the graph indicates that dense cells i and j are coplanar;
Here an edge $e_{ij}$ indicating that dense cells i and j are coplanar means the following:
The two dense cells i and j have identical values in k-1 samples and differ in only one sample.
Since SNP data are biallelic, all dense cells in the one-dimensional case are coplanar.
b) Perform a depth-first search on the graph built in step (a) to obtain its connected subgraphs. Each connected subgraph is one cluster, and each cluster is a set of co-expressed SNP sites.
(5) Save the clustering result to a file.
Here the clustering result comprises how many clusters there are in total, which clusters each subspace (of every dimensionality) contains, and which SNP sites make up each cluster.
Experimental results show that the present invention extends the data types handled by subspace clustering algorithms to categorical data. It effectively solves the problem of clustering SNP data and improves clustering accuracy and quality.
The clustering method for SNP data of the present invention has been described in detail above only to help understand the method and its core idea. Those of ordinary skill in the art may change the specific embodiments and the scope of application in accordance with the method and idea of the present invention. In summary, this description should not be construed as limiting the present invention.

Claims (5)

1. A clustering method for SNP data, characterized in that the specific operation steps are as follows:
A. Preprocess the raw SNP data and convert them into a data format that the clustering method can process, the specific steps being:
A1) Data encoding. The data exported from SNP chip genotyping take the following form. Each SNP site carries one genotyping result, and there are four kinds of result in total: wild-type homozygous AA, mutant heterozygous AB, mutant homozygous BB, and the genotyping-failure flag NC. AA is encoded as 0, AB as 1 and BB as 2;
A2) Data cleaning. If an entire row consists of NC, the whole row is deleted. If a row contains a few NC values, each of them is replaced by the value of the next sample at the same site. If more than 10% of the values in a row are NC, the entire row is deleted;
B. Divide the preprocessed SNP data into grids;
C. Compute the density of the divided grid cells to obtain the subspaces that contain clusters;
D. Cluster the subspaces obtained in step C to obtain the classified SNP data;
E. Save the clustering result to a file.
2. The clustering method for SNP data according to claim 1, characterized in that in step B the preprocessed SNP data are divided into grids as follows: every dimension of the SNP data is split into 3 grid cells according to the value of each SNP site in each sample, K denotes the dimensionality of the space, and at this point K = 1.
3. The clustering method for SNP data according to claim 1, characterized in that a subspace in step C is a set of dense cells, a dense cell being a grid cell whose density exceeds the density threshold, and in that the density of the divided grid cells is computed and the subspaces containing clusters are obtained as follows:
C1) Compute the density of all grid cells in the K-dimensional space to obtain the set of dense cells in the K-dimensional space;
C2) Generate the (K+1)-dimensional candidate dense cell set according to the generation rule for K-dimensional candidate dense cell sets, the specific steps being:
Input: $D_{k-1}$, the set of all (k-1)-dimensional dense cells;
Output: $C_k$, the set of k-dimensional candidate dense cells;
$C_k$ is generated as follows. Take any two dense cells $u_1$ and $u_2$ from $D_{k-1}$. They are joined if and only if $u_1$ and $u_2$ have identical values in their first k-2 samples and different values in the (k-1)-th sample, with $u_1.a_{k-1} < u_2.a_{k-1}$. The join takes the values of $u_1$ in its first k-2 samples, the value of $u_1$ in its (k-1)-th sample and the value of $u_2$ in its (k-1)-th sample. The result is the k-dimensional candidate dense cell $u = (u_1.a_1, \ldots, u_1.a_{k-2}, u_1.a_{k-1}, u_2.a_{k-1})$, and the set of all such $u$ is $C_k$. Here $a_i$ denotes the vector of the i-th dimension, and "<" denotes lexicographic order;
C3) Determine whether the (K+1)-dimensional candidate dense cell set is empty. If it is not empty, set K = K+1 and return to step C2) to continue generating higher-dimensional subspaces. If it is empty, the K-dimensional subspace is the highest-dimensional subspace containing clusters, and the method proceeds to step D.
4. The clustering method for SNP data according to claim 1, characterized in that the classified SNP data obtained in step D means that after clustering each class is a set of co-expressed SNP sites, and in that the subspaces obtained in step C are clustered as follows:
D1) Map each subspace to a graph G = <V, E>, where G denotes the graph, V its vertex set and E its edge set. A vertex $v_i$ of the graph represents the set of SNP sites in dense cell i of the subspace, and an edge $e_{ij}$ of the graph indicates that dense cells i and j are coplanar. Dense cells i and j are coplanar when they have identical values in k-1 samples and differ in only one sample. Since SNP data are biallelic, all dense cells in the one-dimensional case are coplanar;
D2) Perform a depth-first search on the graph built in step D1) to obtain its connected subgraphs. Each connected subgraph is one cluster, and each cluster is a set of co-expressed SNP sites.
5. The clustering method for SNP data according to claim 1, characterized in that the clustering result in step E comprises how many clusters there are in total, which clusters each subspace (of every dimensionality) contains, and which SNP sites make up each cluster.
CN 201110418812 2011-12-15 2011-12-15 Method for clustering single nucleotide polymorphism (SNP) data Expired - Fee Related CN102419774B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110418812 CN102419774B (en) 2011-12-15 2011-12-15 Method for clustering single nucleotide polymorphism (SNP) data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201110418812 CN102419774B (en) 2011-12-15 2011-12-15 Method for clustering single nucleotide polymorphism (SNP) data

Publications (2)

Publication Number Publication Date
CN102419774A 2012-04-18
CN102419774B 2013-04-03

Family

ID=45944187

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110418812 Expired - Fee Related CN102419774B (en) 2011-12-15 2011-12-15 Method for clustering single nucleotide polymorphism (SNP) data

Country Status (1)

Country Link
CN (1) CN102419774B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106339416B (en) * 2016-08-15 2019-11-08 Changshu Institute of Technology (常熟理工学院) Educational data clustering method based on grid fast searching density peaks
CN106909942B (en) * 2017-02-28 2022-09-13 Beijing University of Posts and Telecommunications (北京邮电大学) Subspace clustering method and device for high-dimensionality big data

Citations (1)

Publication number Priority date Publication date Assignee Title
CN101211355A (en) * 2006-12-30 2008-07-02 Institute of Computing Technology, Chinese Academy of Sciences (中国科学院计算技术研究所) Image inquiry method based on clustering

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
TWI396106B (en) * 2009-08-17 2013-05-11 Univ Nat Pingtung Sci & Tech Grid-based data clustering method

Non-Patent Citations (3)

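Title
王峻 (Wang Jun). Research and application of single nucleotide polymorphism analysis algorithms (单核苷酸多态性分析算法的研究与应用). Doctoral dissertation, Harbin Institute of Technology, 2011-04-30, pp. 50-53. *
胡泱 (Hu Yang), 陈刚 (Chen Gang). An effective grid- and density-based clustering analysis algorithm (一种有效的基于网格和密度的聚类分析算法). Computer Applications (计算机应用), 2003, vol. 23, no. 12, pp. 64-67. *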

Also Published As

Publication number Publication date
CN102419774A (en) 2012-04-18

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130403

Termination date: 20151215

EXPY Termination of patent right or utility model