CN107526946A - Gene expression data cancer classification method combining self-learning and low-rank representation - Google Patents

Info

Publication number
CN107526946A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611207518.8A
Other languages
Chinese (zh)
Other versions
CN107526946B (en)
Inventor
於东军
夏春秋
韩珂
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Science and Technology filed Critical Nanjing University of Science and Technology
Priority to CN201611207518.8A priority Critical patent/CN107526946B/en
Publication of CN107526946A publication Critical patent/CN107526946A/en
Application granted granted Critical
Publication of CN107526946B publication Critical patent/CN107526946B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Landscapes

  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Public Health (AREA)
  • Bioethics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention discloses a gene expression data cancer classification method combining self-learning and low-rank representation, comprising: Step 1, for a given cancer gene expression data set, merging the data to construct a data matrix and normalizing it; Step 2, decomposing the resulting data matrix by low-rank representation to obtain a low-rank matrix and a sparse matrix; Step 3, using the label information of the training set to calculate the initial point of each category in the low-rank matrix and in the sparse matrix; Step 4, applying an unsupervised clustering method to the low-rank matrix and to the sparse matrix, obtaining one prediction result from each; Step 5, comparing the two prediction results: if no sample receives the same prediction in both, or the maximum number of iterations is reached, outputting the prediction result based on the low-rank matrix; otherwise, moving the samples with identical predictions from the test set to the training set and returning to Step 3. The invention can improve prediction accuracy when only a small number of labeled samples are available, reducing the time and labor cost of labeling samples.

Description

Gene expression data cancer classification method combining self-learning and low-rank representation
Technical Field
The invention relates to the field of bioinformatics gene expression and cancer classification, in particular to a gene expression data cancer classification method combining self-learning and low-rank representation.
Background
Cancer is a fatal disease caused by the abnormal growth of cells, and no completely effective treatment is available so far. Early diagnosis can effectively help cancer treatment, so accurately classifying and predicting cancers is a very valuable problem. With the development of high-throughput technology, gene expression data on cancer is accumulating rapidly, and machine learning has advanced considerably in recent years, so it has become possible to predict cancer categories using gene expression data and machine learning, for example: (1) Chen, X.Y. and Jian, C.R. Gene expression based on standardized restriction analysis. Neurocomputing 2014; 143; (2) Liao, Q., Guan, N. and Zhang, Q. Gauss-Seidel based non-innovative amplification for gene expression. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE; 2016, p. 2364-2368; (3) Liu, J.X., et al. RPCA-Based Tumor Classification Using Gene Expression Data. IEEE ACM T Comput Bi 2015; 12(4): 964-970. However, most existing methods are either unsupervised or supervised, and each kind has its own defects.
Unsupervised learning approaches propose a model to discover latent structure in unlabeled data. Since all samples are unlabeled, label information cannot be used for error correction while training the model. As a result, unsupervised models tend to have poor prediction capability and cannot provide useful prediction accuracy. Supervised learning, in contrast, trains models on labeled data. Because label data can be used in training, models obtained by supervised learning can provide higher prediction accuracy. However, training a model with a supervised method requires a large amount of labeled data, and labeling is very expensive and consumes a great deal of manpower and time, especially for gene expression data.
In view of these defects, which neither kind of learning method can overcome on its own, semi-supervised learning offers a new approach: models are trained with a large amount of unlabeled data and a small amount of labeled data, which can provide much better prediction than unsupervised methods. The self-learning model that the present method improves on is a traditional semi-supervised learning method: samples predicted with higher reliability are added to the training set, training and prediction are iterated, and finally all data in the test set are classified. A number of effective semi-supervised methods have already been used to analyze cancer gene expression data, such as: (1) Cai, X.F., et al. Local and Global prediction Semi-collaborative on Random basis for cancer classification. IEEE J Biomed Health 2014; 18(2) 500-. However, the processing of gene expression data remains challenging:
(1) gene expression data with high dimensionality
Since each feature of gene expression data corresponds to one gene, and humans have no fewer than 25,000 genes, gene expression data often has tens of thousands of feature components. Traditional classification methods are very sensitive to noise and redundancy in such high-dimensional data and have difficulty providing accurate predictions;
(2) data set of gene expression data is small
Because measuring gene expression with gene microarray technology is expensive in money, time, and labor, a data set obtained at one time is very small, often containing only dozens or hundreds of samples, and it is difficult to train an effective model with so little data.
Disclosure of Invention
The invention aims to provide a gene expression data cancer classification method combining self-learning and low-rank representation, which solves the problems of cancer classification prediction by using gene expression data in the prior art: the data dimension is high, the test set is small, and the labeled data is few.
The technical solution for realizing the purpose of the invention is as follows: a gene expression data cancer classification method combining self-learning and low-rank representation comprises the following steps:
step 1, for a given cancer gene expression data set, a set with label data is a training set, and a non-label data set is a testing set; merging the data to construct a data matrix X, and carrying out normalization processing;
step 2, decomposing the obtained data matrix by using a low-rank expression method to obtain a low-rank matrix Z and a sparse matrix E;
step 3, using the label information of the training set, calculating the initial point coordinates $p^{(i)}$ of each category i on the low-rank matrix Z and on the sparse matrix E, respectively;
step 4, applying an unsupervised clustering method to the low-rank matrix Z and to the sparse matrix E, obtaining the prediction results $l_Z$ and $l_E$ based on the low-rank matrix Z and the sparse matrix E, respectively;
step 5, comparing the two prediction results $l_Z$ and $l_E$: if no sample receives the same prediction in both, or the maximum number of iterations is reached, outputting the prediction result $l_Z$ based on the low-rank matrix; otherwise, removing the samples with the same prediction from the test set, adding them to the training set, and returning to step 3.
Compared with the prior art, the invention has the following notable advantages: (1) the method incorporates low-rank representation and can extract essential global features from the original high-dimensional data; (2) the method uses both the low-rank matrix and the sparse matrix obtained from the low-rank representation decomposition, and is therefore more effective than traditional methods based on low-rank representation, which use the information in only one of the two matrices.
Drawings
FIG. 1 is an exemplary flow chart of a method for cancer classification incorporating self-learning and low rank representation of gene expression data.
FIG. 2 is a schematic diagram of a cancer gene expression data set, wherein (a), (b) and (c) are an original data matrix and a low-rank matrix and a sparse matrix after low-rank decomposition, respectively, and each expression value in the matrix corresponds to a gray value of a pixel point. In the bar above each matrix, each color patch corresponds to a category of cancer.
Detailed Description
Using gene expression data for cancer classification prediction is a typical high-dimensional, small-sample problem. To address it, the method borrows low-rank representation, a feature extraction technique commonly used for matrix recovery in image processing, and obtains the intrinsic low-dimensional structure of the data by constraining the rank of the data matrix.
By using a semi-supervised learning method and a feature extraction method, the problem of using gene expression data for cancer classification prediction can be solved.
Embodiments of the present invention will now be described in detail, by way of example, with reference to the accompanying drawings.
As shown in FIG. 1, according to the preferred embodiment of the present invention, the gene expression data cancer classification method combining self-learning and low-rank representation is used for class prediction of samples in a cancer gene expression data set. To reflect actual application conditions, part of the data is treated as unlabeled data and defined as the test set; the remaining samples are defined as the training set. During training and prediction, only the label information of the samples in the training set may be used; the class information of the test set is used only for comparison with the predicted classes of the test set. The classification prediction is divided into two stages, a feature extraction stage and a training and prediction stage, as shown in FIG. 1; the implementation of the two stages is described in detail below.
(1) Feature extraction stage
Firstly, combining the feature vectors in the training set and the test set to construct a data matrix X, wherein the obtained data matrix X needs to satisfy the following requirements:
$$X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{d \times n}$$
where $x_i$ is the column vector of one gene expression data sample, with vector dimension d. X contains n samples in total, where n is the sum of the numbers of samples in the training set and the test set. Each vector needs to be normalized.
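The source does not say which normalization is applied, so the following is only a minimal sketch that assumes each sample vector is scaled to unit Euclidean length. The function name `build_data_matrix`, the `eps` guard, and the toy shapes are illustrative and do not come from the patent.

```python
import numpy as np

def build_data_matrix(train, test, eps=1e-12):
    """Stack training and test samples as columns of X (d x n), then
    scale every column to unit L2 norm (assumed normalization)."""
    X = np.hstack([train, test]).astype(float)         # shape (d, n_train + n_test)
    norms = np.linalg.norm(X, axis=0, keepdims=True)   # one norm per sample
    return X / np.maximum(norms, eps)                  # eps avoids division by zero

# Toy data: d = 5 genes, 3 labeled and 4 unlabeled samples.
rng = np.random.default_rng(0)
X = build_data_matrix(rng.random((5, 3)), rng.random((5, 4)))
print(X.shape)
```

Keeping the training samples in the leading columns makes it easy to track which columns of X (and later of Z and E) carry labels.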
Secondly, decomposing the given data matrix X by using a low-rank expression method, wherein the obtained low-rank matrix Z and the obtained sparse matrix E need to meet the following conditions:
$$\min_{Z,E} \|Z\|_* + \lambda \|E\|_{2,1}$$
$$\text{s.t. } X = XZ + E$$
where $Z = [z_1, z_2, \ldots, z_n] \in \mathbb{R}^{n \times n}$; $\|Z\|_* = \sum_i \sigma_i(Z)$ is the nuclear norm of Z, and $\sigma_i(Z)$ is the i-th singular value of Z; $\|E\|_{2,1}$ denotes the $\ell_{2,1}$ norm of E; λ is the balance parameter. In this example, selecting λ = 2 gives better results. The Alternating Direction Method of Multipliers (ADMM) may be used to solve the above problem.
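The patent only states that an alternating direction multiplier scheme may be used, without giving the iteration. The sketch below follows the commonly used inexact augmented Lagrangian formulation of low-rank representation, which introduces an auxiliary variable J with the extra constraint Z = J; it takes the $\ell_{2,1}$ norm in its usual sense as the sum of the Euclidean norms of the columns of E. The penalty schedule (`mu`, `rho`, `mu_max`), the tolerance, and all function names are assumptions chosen for illustration; only `lam=2.0` comes from the text.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def l21_shrink(Q, tau):
    """Column-wise shrinkage: proximal operator of tau * l2,1 norm."""
    norms = np.linalg.norm(Q, axis=0)
    scale = np.maximum(norms - tau, 0.0) / np.maximum(norms, 1e-12)
    return Q * scale

def lrr(X, lam=2.0, rho=1.1, mu=1e-2, mu_max=1e6, tol=1e-6, max_iter=500):
    """Approximately solve min ||Z||_* + lam * ||E||_{2,1}  s.t.  X = XZ + E."""
    d, n = X.shape
    Z = np.zeros((n, n)); J = np.zeros((n, n)); E = np.zeros((d, n))
    Y1 = np.zeros((d, n)); Y2 = np.zeros((n, n))       # Lagrange multipliers
    XtX = X.T @ X
    inv = np.linalg.inv(np.eye(n) + XtX)               # reused in every Z update
    for _ in range(max_iter):
        J = svt(Z + Y2 / mu, 1.0 / mu)                 # low-rank surrogate of Z
        Z = inv @ (XtX - X.T @ E + J + (X.T @ Y1 - Y2) / mu)
        R = X - X @ Z
        E = l21_shrink(R + Y1 / mu, lam / mu)          # column-sparse error term
        r1, r2 = R - E, Z - J                          # constraint residuals
        Y1 += mu * r1
        Y2 += mu * r2
        mu = min(rho * mu, mu_max)
        if max(np.abs(r1).max(), np.abs(r2).max()) < tol:
            break
    return Z, E

rng = np.random.default_rng(0)
X = rng.random((20, 8))
Z, E = lrr(X)
print(Z.shape, E.shape)
```

Each column of Z then serves as an n-dimensional representation of one sample, and each column of E as a d-dimensional one, which is how the two matrices are consumed in the training and prediction stage.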
FIG. 2 is a schematic representation of a cancer gene expression data set comprising a total of 14 cancer types (BR: breast cancer, PR: prostate cancer, LU: lung cancer, CO: colorectal adenocarcinoma, LY: lymphoma, BL: transitional cell carcinoma of the bladder, ML: melanoma, UT: endometrial adenocarcinoma, LE: leukemia, RE: renal cell carcinoma, PA: pancreatic cancer, OV: ovarian adenocarcinoma, MS: pleural mesothelioma, CNS: central nervous system cancer), with 198 samples, each having 11370 feature components. The data distribution in (a) shows no significant structure. In (b) the structure of the data distribution is obvious: cancer samples of the same category fall into the same subspace, so classification based on this matrix is clearly better than classification based on the original matrix. The data distribution in (c) also shows some structure: because of sparsity, the matrix has few non-zero values, and feature components with more non-zero values have a larger influence on the final classification result. Classification based on the sparse matrix provides a different perspective and plays an auxiliary role in the final classification result.
(2) Training and prediction phase
Unlike traditional supervised methods, in which training and prediction are separate, this method is a semi-supervised clustering method in which training and prediction of the model are carried out simultaneously. As shown in FIG. 1, training and prediction iterate through three steps.
First step: the initial points of the matrix Z and the matrix E are computed separately. The initial point coordinates $p_Z^{(i)}$ of each category i in Z are calculated as follows:
$$p_Z^{(i)} = \frac{\sum_{j=1}^{n_l^{(i)}} z_{l,j}^{(i)}}{n_l^{(i)}}$$
where $n_l^{(i)}$ is the number of points in the i-th cluster of $Z_l$, $z_{l,j}^{(i)}$ is the j-th sample of the i-th cluster of $Z_l$, and $Z_l$ is the matrix composed of the labeled samples in Z. The initial point coordinates $p_E^{(i)}$ of each category i in E are obtained in the same way.
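The initial point of a category is simply the mean of the labeled columns belonging to that category. A minimal sketch, in which the function name `initial_points` and the toy matrix are illustrative:

```python
import numpy as np

def initial_points(M_l, labels):
    """Per-class mean of the labeled columns.

    M_l    : (m, n_l) matrix whose columns are the labeled samples
             (the labeled part of Z or of E).
    labels : (n_l,) class label of each column.
    Returns the sorted class values and an (m, k) matrix of centres,
    one column per class.
    """
    labels = np.asarray(labels)
    classes = np.unique(labels)
    centres = np.stack(
        [M_l[:, labels == c].mean(axis=1) for c in classes], axis=1
    )
    return classes, centres

Z_l = np.array([[1.0, 3.0, 10.0],
                [2.0, 4.0, 20.0]])
classes, P = initial_points(Z_l, [0, 0, 1])
print(P)  # columns: centre of class 0, centre of class 1
```

The same function can be applied to the labeled columns of E to obtain the second set of initial points.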
Second step: starting from the initial points $p_Z^{(i)}$ and $p_E^{(i)}$ obtained in the first step, an unsupervised clustering method is applied to the low-rank matrix Z and to the sparse matrix E. In this example, the standard K-means algorithm is selected as the unsupervised clustering method, and the Minkowski distance is selected as the distance metric to achieve better results. The prediction results based on the low-rank matrix and the sparse matrix are expressed as the vectors $l_Z$ and $l_E$, respectively.
Third step: the two prediction results $l_Z$ and $l_E$ are compared, and either suitable unlabeled samples are selected as labeled samples and added to the next iteration, or the iteration ends and the result is output.
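The order of the Minkowski distance is not recoverable from the text, so the sketch below exposes it as a hypothetical parameter `p` (with `p=2` giving the Euclidean distance). It is a plain K-means loop seeded with the given centres rather than random ones; keeping the mean as the centre update for `p` other than 2 is a heuristic assumption, and the function name is illustrative.

```python
import numpy as np

def seeded_kmeans(M, centres, p=2.0, max_iter=100):
    """K-means over the columns of M, started from the given centres.

    M       : (m, n) matrix, one sample per column.
    centres : (m, k) initial centres, one per class.
    p       : order of the Minkowski distance (assumed, not from the source).
    Returns (labels, centres); labels index the columns of `centres`.
    """
    centres = np.array(centres, dtype=float)
    k = centres.shape[1]
    labels = np.full(M.shape[1], -1)
    for _ in range(max_iter):
        # Minkowski distance from every sample to every centre: shape (n, k).
        dist = np.stack(
            [(np.abs(M - centres[:, [c]]) ** p).sum(axis=0) ** (1.0 / p)
             for c in range(k)],
            axis=1,
        )
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                                  # assignments are stable
        labels = new_labels
        for c in range(k):
            if np.any(labels == c):                # keep old centre if cluster empties
                centres[:, c] = M[:, labels == c].mean(axis=1)
    return labels, centres

M = np.array([[0.0, 0.0, 10.0, 10.0],
              [0.0, 1.0, 10.0, 11.0]])
labels, centres = seeded_kmeans(M, np.array([[0.0, 9.0], [0.0, 9.0]]))
print(labels)
```

Because the k-th centre is seeded from the labeled samples of class k, the cluster index returned for a sample can be read directly as its predicted class.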
The manner of selecting the appropriate sample is as follows:
(1) an unlabeled sample is selected as a labeled sample to be added to the next iteration if and only if the following holds:
$$l_{z_i} = l_{e_i}$$
where $l_{z_i}$ is the prediction for the i-th sample in the clustering result $l_Z$, and $l_{e_i}$ is the prediction for the i-th sample in the clustering result $l_E$;
(2) a set S is defined, initialized to an empty set, and all samples meeting the above criteria are placed in it.
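The selection rule amounts to keeping the unlabeled samples on which the two clusterings agree. A minimal sketch, assuming predictions are stored as integer arrays indexed by sample and that `unlabeled_idx` (a hypothetical name) lists the columns still in the test set:

```python
import numpy as np

def agreeing_samples(l_Z, l_E, unlabeled_idx):
    """Return the set S: unlabeled samples given the same class by both views."""
    l_Z, l_E = np.asarray(l_Z), np.asarray(l_E)
    unlabeled_idx = np.asarray(unlabeled_idx)
    agree = l_Z[unlabeled_idx] == l_E[unlabeled_idx]
    return unlabeled_idx[agree]

S = agreeing_samples([0, 1, 1, 2, 0], [0, 1, 2, 2, 1], unlabeled_idx=[2, 3, 4])
print(S)  # only sample 3 is predicted identically by both clusterings
```

An empty result corresponds to S being the empty set, which is one of the two termination conditions.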
The specific criteria for judging whether the algorithm is finished are as follows:
(1) if the number of iterations reaches the maximum, the algorithm terminates; otherwise, the samples with the same prediction are removed from the test set and added to the training set, and the next iteration begins;
(2) if S is an empty set, the algorithm terminates; otherwise, the samples with the same prediction are removed from the test set and added to the training set, and the next iteration begins.
If the algorithm is not terminated, the training set and the test set are updated as follows: if sample i is in S, $z_i$ is removed from $Z_u$ and added to $Z_l$, and $e_i$ is removed from $E_u$ and added to $E_l$. Here $Z_l$ is the matrix composed of the labeled samples in Z and $Z_u$ is the matrix composed of the unlabeled samples in Z; $E_l$ is the matrix composed of the labeled samples in E and $E_u$ is the matrix composed of the unlabeled samples in E.
After the algorithm terminates, $l_Z$ is returned and output as the prediction result.
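The whole training and prediction stage can be sketched as one loop over the matrices Z and E. This is an illustrative reading of the description, not the patented implementation: it assumes classes are numbered 0 to k-1, that every class has at least one labeled sample, that the Minkowski order is a free parameter `p`, and that moving a sample into the training set means fixing its label to the agreed prediction. All function and variable names are hypothetical.

```python
import numpy as np

def cluster(M, labels, labeled, k, p=2.0, max_iter=100):
    """K-means on the columns of M, seeded with the labeled class means."""
    centres = np.stack(
        [M[:, labeled & (labels == c)].mean(axis=1) for c in range(k)], axis=1
    )
    pred = np.full(M.shape[1], -1)
    for _ in range(max_iter):
        dist = np.stack(
            [(np.abs(M - centres[:, [c]]) ** p).sum(axis=0) ** (1.0 / p)
             for c in range(k)],
            axis=1,
        )
        new_pred = dist.argmin(axis=1)
        if np.array_equal(new_pred, pred):
            break
        pred = new_pred
        for c in range(k):
            if np.any(pred == c):
                centres[:, c] = M[:, pred == c].mean(axis=1)
    return pred

def self_training(Z, E, train_idx, y_train, k, max_rounds=10):
    """Iterate: cluster Z and E, promote agreeing test samples, repeat.

    Assumes max_rounds >= 1. Returns l_Z, the prediction from the
    low-rank matrix in the last round.
    """
    n = Z.shape[1]
    labels = np.full(n, -1)
    labels[train_idx] = y_train
    labeled = np.zeros(n, dtype=bool)
    labeled[train_idx] = True
    for _ in range(max_rounds):                    # stop at the maximum round count
        l_Z = cluster(Z, labels, labeled, k)
        l_E = cluster(E, labels, labeled, k)
        S = np.flatnonzero(~labeled & (l_Z == l_E))
        if S.size == 0:                            # stop when no sample agrees
            break
        labels[S] = l_Z[S]                         # move S from test set to training set
        labeled[S] = True
    return l_Z

Z = np.array([[0.0, 0.1, 0.0, 5.0, 5.1, 5.0],
              [0.0, 0.0, 0.1, 5.0, 5.0, 5.1]])
E = 0.5 * Z
print(self_training(Z, E, train_idx=[0, 3], y_train=[0, 1], k=2))
```

Using boolean masks over the columns, rather than physically splitting Z and E into labeled and unlabeled parts, keeps sample indices stable across iterations.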

Claims (9)

1. A gene expression data cancer classification method combining self-learning and low-rank representation is characterized by comprising the following steps:
step 1, for a given cancer gene expression data set, a set with label data is a training set, and a non-label data set is a testing set; merging the data to construct a data matrix X, and carrying out normalization processing;
step 2, decomposing the obtained data matrix by using a low-rank expression method to obtain a low-rank matrix Z and a sparse matrix E;
step 3, using the label information of the training set, calculating the initial point coordinates $p^{(i)}$ of each category i on the low-rank matrix Z and on the sparse matrix E, respectively;
step 4, applying an unsupervised clustering method to the low-rank matrix Z and to the sparse matrix E, obtaining the prediction results $l_Z$ and $l_E$ based on the low-rank matrix Z and the sparse matrix E, respectively;
step 5, comparing the two prediction results $l_Z$ and $l_E$: if no sample receives the same prediction in both, or the maximum number of iterations is reached, outputting the prediction result $l_Z$ based on the low-rank matrix; otherwise, removing the samples with the same prediction from the test set, adding them to the training set, and returning to step 3.
2. The method for cancer classification based on gene expression data combining self-learning and low-rank representation according to claim 1, wherein: the given cancer gene expression dataset of step 1 comprises tagged data and untagged data, wherein the tag is a cancer class.
3. The method for classifying cancer by gene expression data combining self-learning and low rank expression according to claim 1, wherein the data matrix X obtained in step 1 satisfies the following requirements:
$$X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{d \times n}$$
where $x_i$ is the column vector of one gene expression data sample, the vector dimension is d, X contains n samples in total, and n is the sum of the numbers of samples in the training set and the test set; each vector needs to be normalized.
4. The method for cancer classification based on gene expression data combining self-learning and low-rank representation according to claim 1, wherein: in step 2, the given data matrix X is decomposed by using a low-rank representation method, and the obtained low-rank matrix Z and sparse matrix E need to satisfy the following conditions:
$$\min_{Z,E} \|Z\|_* + \lambda \|E\|_{2,1}$$
$$\text{s.t. } X = XZ + E$$
where $\|Z\|_* = \sum_i \sigma_i(Z)$ is the nuclear norm of Z, and $\sigma_i(Z)$ is the i-th singular value of Z; $\|E\|_{2,1}$ denotes the $\ell_{2,1}$ norm of E; λ is the balance parameter.
5. The method for cancer classification based on gene expression data combining self-learning and low-rank representation according to claim 1, wherein: in step 3, the initial point coordinates $p_Z^{(i)}$ of each category i in Z are calculated as follows:
$$p_Z^{(i)} = \frac{\sum_{j=1}^{n_l^{(i)}} z_{l,j}^{(i)}}{n_l^{(i)}}$$
where $n_l^{(i)}$ is the number of points in the i-th cluster of $Z_l$, $z_{l,j}^{(i)}$ is the j-th sample of the i-th cluster of $Z_l$, and $Z_l$ is the matrix composed of the labeled samples in Z; the initial point coordinates $p_E^{(i)}$ of each category i in E are calculated as follows:
$$p_E^{(i)} = \frac{\sum_{j=1}^{n_l^{(i)}} e_{l,j}^{(i)}}{n_l^{(i)}}$$
where $e_{l,j}^{(i)}$ is the j-th sample of the i-th cluster of $E_l$, and $E_l$ is the matrix composed of the labeled samples in E.
6. The method for cancer classification based on gene expression data combining self-learning and low-rank representation according to claim 1, wherein: in step 4, an unsupervised clustering method is applied to the low-rank matrix Z and the sparse matrix E; the method needs to determine initial cluster centers and to select a distance metric to measure the similarity of two samples; in this step, the initial points are $p_Z^{(i)}$ and $p_E^{(i)}$, respectively, and the distance metric is the Minkowski distance.
7. The method for cancer classification based on gene expression data combining self-learning and low-rank representation according to claim 1, wherein: in step 5, the two prediction results $l_Z$ and $l_E$ are compared, and unlabeled samples are selected as labeled samples and added to the next iteration, specifically as follows:
(1) an unlabeled sample is selected as a labeled sample to be added to the next iteration if and only if the following holds:
$$l_{z_i} = l_{e_i}$$
where $l_{z_i}$ is the prediction for the i-th sample in the clustering result $l_Z$, and $l_{e_i}$ is the prediction for the i-th sample in the clustering result $l_E$;
(2) a set S is defined, initialized to an empty set, and all samples meeting the above criteria are placed in it.
8. The method for cancer classification based on gene expression data combining self-learning and low-rank representation according to claim 1, wherein: in step 5, the specific criteria for judging whether the algorithm is finished are as follows:
(1) if the number of iterations reaches the maximum, the algorithm terminates; otherwise, the samples with the same prediction are removed from the test set and added to the training set, and the method returns to step 3 for the next iteration;
(2) if S is an empty set, the algorithm terminates; otherwise, the samples with the same prediction are removed from the test set and added to the training set, and the method returns to step 3 for the next iteration;
after the algorithm terminates, $l_Z$ is returned and output as the prediction result.
9. The method for cancer classification based on gene expression data combining self-learning and low-rank representation according to claim 7 or 8, wherein: in step 5, removing the samples with the same prediction from the test set and adding them to the training set is specifically defined as follows:
if sample i is in S, $z_i$ is removed from $Z_u$ and added to $Z_l$, and $e_i$ is removed from $E_u$ and added to $E_l$; where $Z_l$ is the matrix composed of the labeled samples in Z and $Z_u$ is the matrix composed of the unlabeled samples in Z; $E_l$ is the matrix composed of the labeled samples in E and $E_u$ is the matrix composed of the unlabeled samples in E.
CN201611207518.8A 2016-12-23 2016-12-23 Gene expression data cancer classification method combining self-learning and low-rank representation Active CN107526946B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611207518.8A CN107526946B (en) 2016-12-23 2016-12-23 Gene expression data cancer classification method combining self-learning and low-rank representation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611207518.8A CN107526946B (en) 2016-12-23 2016-12-23 Gene expression data cancer classification method combining self-learning and low-rank representation

Publications (2)

Publication Number Publication Date
CN107526946A true CN107526946A (en) 2017-12-29
CN107526946B CN107526946B (en) 2021-07-06

Family

ID=60748589

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611207518.8A Active CN107526946B (en) 2016-12-23 2016-12-23 Gene expression data cancer classification method combining self-learning and low-rank representation

Country Status (1)

Country Link
CN (1) CN107526946B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140025689A1 (en) * 2012-04-24 2014-01-23 International Business Machines Corporation Determining a similarity between graphs
CN102722892A (en) * 2012-06-13 2012-10-10 西安电子科技大学 SAR (synthetic aperture radar) image change detection method based on low-rank matrix factorization
CN103400143A (en) * 2013-07-12 2013-11-20 中国科学院自动化研究所 Data subspace clustering method based on multiple view angles
CN103793600A (en) * 2014-01-16 2014-05-14 西安电子科技大学 Isolated component analysis and linear discriminant analysis combined cancer forecasting method
CN105427296A (en) * 2015-11-11 2016-03-23 北京航空航天大学 Ultrasonic image low-rank analysis based thyroid lesion image identification method
CN106096654A (en) * 2016-06-13 2016-11-09 南京信息工程大学 A kind of cell atypia automatic grading method tactful based on degree of depth study and combination
CN106202968A (en) * 2016-07-28 2016-12-07 北京博源兴康科技有限公司 The data analysing method of cancer and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ANINDYA HALDER ET AL.: ""Semi-supervised fuzzy K-NN for cancer classification from microarray gene expression data"", 《2014 FIRST INTERNATIONAL CONFERENCE ON AUTOMATION, CONTROL, ENERGY AND SYSTEMS (ACES)》 *
刘晋苏: ""微小型无人直升机航拍动态阴影检测研究"", 《中国优秀硕士学位论文全文数据库 工程科技II辑》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108169728A (en) * 2018-01-12 2018-06-15 西安电子科技大学 Range extension target detection method based on Minkowski distances
CN109378039A (en) * 2018-08-20 2019-02-22 中国矿业大学 Oncogene based on discrete constraint and the norm that binds expresses spectral-data clustering method
CN109378039B (en) * 2018-08-20 2022-02-25 中国矿业大学 Tumor gene expression profile data clustering method based on discrete constraint and capping norm
CN109671468A (en) * 2018-12-13 2019-04-23 韶关学院 A kind of feature gene selection and cancer classification method
CN109671468B (en) * 2018-12-13 2023-08-15 韶关学院 Characteristic gene selection and cancer classification method
CN109903166A (en) * 2018-12-25 2019-06-18 阿里巴巴集团控股有限公司 A kind of data Risk Forecast Method, device and equipment
CN109903166B (en) * 2018-12-25 2024-01-30 创新先进技术有限公司 Data risk prediction method, device and equipment

Also Published As

Publication number Publication date
CN107526946B (en) 2021-07-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant