CN110033824A - Gene expression profile classification method based on shared dictionary learning - Google Patents

Publication number
CN110033824A
Authority
CN
China
Prior art keywords
dictionary
matrix
classification
projection
sparse
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910296287.XA
Other languages
Chinese (zh)
Inventor
彭绍亮
刘伟
李非
杨亚宁
李肯立
潘佳铭
骆嘉伟
刘云浩
田李
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-04-13
Filing date
2019-04-13
Publication date
2019-07-19
Application filed by Hunan University
Priority to CN201910296287.XA
Publication of CN110033824A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00: ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10: Gene or protein expression profiling; Expression-ratio estimation or normalisation

Landscapes

  • Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of gene expression profile classification and discloses a gene expression profile classification method based on shared dictionary learning, an application of sparse dictionary learning to the mining of biological big data. The method first constructs a shared dictionary that can draw on the samples of all classes. It then trains the dictionary and, at the same time, a projection matrix; projecting test samples with this matrix widens the distance between samples of different types. Finally, the class of a test sample is determined from the distance between the sparse coding coefficient vectors obtained by reconstructing the sample with the dictionary. The method classifies gene expression profile data quickly and efficiently, which helps to distinguish cancer types and their subtypes, supports understanding of tumor pathogenesis at the molecular level, and offers a gene-level approach to treating tumors. Because samples are shared across classes, the method keeps a stable projection ability even when few samples are available, and its classification accuracy is substantially higher than that of general classification methods.

Description

A gene expression profile classification method based on shared dictionary learning
Technical field:
The invention belongs to the field of gene expression profile classification. It relates in particular to the classification of tumor cell line gene expression profile data, and specifically to a gene expression profile classification method based on shared dictionary learning.
Background:
Tumors seriously threaten human life and health, and researchers have long sought the best means of curing them. However, there are many tumor types, and even a single tumor type can be divided into many subtypes, each requiring different treatment. Classifying tumors accurately and quickly therefore maximizes the effect of treatment and can prolong or even save the life of the patient. Classifying tumors from their gene expression profiles is a relatively new approach. It is fast, the classification process is automated, and it saves considerable manpower and material resources, so it has become a research hotspot in cancer classification. However, most conventional machine learning methods currently achieve low accuracy on tumor gene expression profiles, and a better-suited classification method is needed. Dictionary learning is comparatively well suited to gene expression profile data, but general dictionary learning methods focus only on improving the ability of the trained dictionary to reconstruct samples and neglect its ability to discriminate between them. In addition, gene expression profile data contain a large amount of redundancy and noise, so ordinary dictionary learning methods cannot make full use of the sample data to extract the required features. As a result, classifiers built on ordinary dictionary learning have weak classification ability, which degrades the final classification results and keeps classification performance below the required standard.
Summary of the invention:
The technical problem to be solved by the invention is to make full use of the dictionary's ability to reconstruct samples in dictionary learning while, through sample sharing, drawing on the samples of all classes, so as to improve the mapping and discrimination of features and thereby overcome the low classification accuracy of conventional methods on tumor cell line gene expression profiles. To achieve this purpose, the invention adopts the following technical solution:
A gene expression profile classification method based on shared dictionary learning, comprising the following steps:
Step 1: Initialize the overall dictionary and the projection matrix P, including the following sub-steps:
1.1. Input the gene expression profile training sample set Y = [Y_1, Y_2, ..., Y_c], where c is the total number of classes and each Y_i (i = 1, ..., c) is the subset of the training data belonging to class i.
1.2. Initialize the overall dictionary with a random number sequence. The overall dictionary consists of the shared dictionary D_0, which draws on the samples of all classes, and the class-specific dictionary D = [D_1, D_2, ..., D_c], where D_i is the sub-dictionary corresponding to training subset Y_i.
1.3. Initialize the projection matrix P with the principal component analysis (PCA) transformation matrix of the training sample set Y.
Step 2: Compute and update the sparse coding coefficient matrix, including the following sub-steps:
2.1. Obtain the sparse coding coefficient matrix of the overall dictionary from the partitioned coefficient matrices. The class-specific coefficient matrix is X = [X_1, X_2, ..., X_c], where X_i is the coefficient sub-matrix of class i; X^T is the transpose of X; X_0 is the sparse coefficient matrix of Y on D; and (X_0)^T is the transpose of X_0.
2.2. Obtain the minimization objective function through sparse representation (the formula is an image in the original publication and is not reproduced in this text). The objective function contains the following terms, where c is the total number of classes:
the discriminative fidelity term, which minimizes the error with which the dictionary reconstructs each class of projected training samples. This strengthens the representation of same-class samples and weakens the representation of different-class samples, so that the dictionary restores the projected samples as faithfully as possible;
the sparsity term, in which the parameter λ adjusts the degree of sparsity and so keeps the sparse coefficient matrix sparse;
the coefficient discrimination term f(X), which adjusts the distribution of X so that the within-class scatter is minimized and the between-class scatter is maximized, ensuring that the dictionary can discriminate between the classes of the training sample set Y.
2.3. Fix the values of the dictionary and the projection matrix P in the objective function, so that the only unknown is the sparse coding coefficient matrix. Compute the coding coefficients of each class with an iterative projection method, then combine the per-class coding coefficients into the sparse coding coefficient matrix.
Step 3: Update the projection matrix P. Fix the values of the dictionary and the sparse coding coefficient matrix in the objective function, and apply the projection matrix P directly to the training sample set Y.
Step 4: Update the dictionary. Fix the values of the projection matrix P and the sparse coding coefficient matrix in the objective function, compute the sub-dictionary of each class with the iterative projection method, then combine the class sub-dictionaries into the overall dictionary.
Step 5: Seek a locally optimal solution of the objective function by gradient descent. During the solution, Step 3 and Step 4 are executed in a loop until the reconstruction error levels off and no longer changes, which yields the final dictionary and projection matrix P.
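The initialization in Step 1 can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the function name, the argument names, the atom counts and the choice to normalize atoms to unit length are assumptions made for illustration, and the dictionary is assumed to live in the projected (low-dimensional) space because it reconstructs projected samples.

```python
import numpy as np

def init_dictionary_and_projection(Y, labels, n_shared, n_per_class,
                                   n_components, seed=0):
    """Hypothetical helper illustrating Step 1.

    Y      : (n_genes, n_samples) training matrix, one sample per column.
    labels : (n_samples,) class label of each column of Y.
    Returns the shared dictionary D0, a dict of class sub-dictionaries,
    and the PCA projection matrix P of shape (n_components, n_genes).
    """
    rng = np.random.default_rng(seed)

    # Step 1.3: PCA transformation matrix from the SVD of the centered data.
    Y_centered = Y - Y.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(Y_centered, full_matrices=False)
    P = U[:, :n_components].T  # rows are the leading principal directions

    # Step 1.2: random atoms (assumed unit-norm) in the projected space.
    def random_atoms(k):
        A = rng.standard_normal((n_components, k))
        return A / np.linalg.norm(A, axis=0, keepdims=True)

    D0 = random_atoms(n_shared)  # shared dictionary
    D_classes = {c: random_atoms(n_per_class) for c in np.unique(labels)}
    return D0, D_classes, P
```

The overall dictionary can then be assembled by concatenating `D0` and the class sub-dictionaries column-wise, for example with `np.hstack`.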
Step 6: Determine the class of the test data from the distance between sparse coding vectors, including the following sub-steps:
6.1. Pass in the dictionary and projection matrix P obtained in Step 5 together with the test data set y.
6.2. Project y with the projection matrix P into a low-dimensional space to obtain the projected sample.
6.3. In the low-dimensional space, represent the projected sample sparsely and linearly with the dictionary to obtain the sparse coding vector u.
6.4. Use the distance between sparse coding vectors u as the criterion for the final classification of the projected sample.
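Step 6 can be sketched in NumPy as follows. Several points are assumptions, not statements of the patented method: the text does not say which vectors the coding vector u is compared with, so this sketch compares u with the mean training coding vector of each class; the sparse code is computed with ISTA (iterative soft thresholding) on a plain lasso objective; and all function and parameter names are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft thresholding, the proximal operator of the L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(D, z, lam=0.1, n_iter=200):
    """ISTA for min_u 0.5 * ||z - D u||^2 + lam * ||u||_1 (assumed objective)."""
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the gradient
    u = np.zeros(D.shape[1])
    for _ in range(n_iter):
        u = soft_threshold(u - D.T @ (D @ u - z) / L, lam / L)
    return u

def classify(y, P, D, class_mean_codes, lam=0.1):
    """Return the label whose mean coding vector is closest to the code of y."""
    z = P @ y                    # 6.2: project into the low-dimensional space
    u = sparse_code(D, z, lam)   # 6.3: sparse linear representation over D
    dists = {c: np.linalg.norm(u - m) for c, m in class_mean_codes.items()}
    return min(dists, key=dists.get)  # 6.4: smallest distance decides
```

Here `class_mean_codes` would be built from the training coefficient matrix, one mean vector per class.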
Compared with the prior art, the invention discloses a gene expression profile classification method based on shared dictionary learning. While attending to the dictionary's ability to reconstruct samples, the method uses sample sharing to improve feature extraction and discrimination, so that tumor cell line gene expression profile data can be classified accurately and quickly. The method overcomes shortcomings of conventional classification methods and general dictionary learning, namely weak sample discrimination and poor classification performance.
Brief description of the drawings:
Fig. 1 is a flowchart of the dictionary training process;
Fig. 2 is a schematic diagram of the shared dictionary learning method.
Specific embodiment:
The invention is described in further detail below with reference to the accompanying drawings and a specific embodiment.
The dictionary training process shown in Fig. 1 comprises the five steps Step 1 to Step 5, as follows:
Step 1: Initialize the overall dictionary and the projection matrix P, including the following sub-steps:
1.1. Input the gene expression profile training sample set Y = [Y_1, Y_2, ..., Y_c], where c is the total number of classes and each Y_i (i = 1, ..., c) is the subset of the training data belonging to class i.
1.2. Initialize the overall dictionary with a random number sequence. The overall dictionary consists of the shared dictionary D_0, which draws on the samples of all classes, and the class-specific dictionary D = [D_1, D_2, ..., D_c], where D_i is the sub-dictionary corresponding to training subset Y_i.
1.3. Initialize the projection matrix P with the principal component analysis (PCA) transformation matrix of the training sample set Y.
Step 2: Compute and update the sparse coding coefficient matrix, including the following sub-steps:
2.1. Obtain the sparse coding coefficient matrix of the overall dictionary from the partitioned coefficient matrices. The class-specific coefficient matrix is X = [X_1, X_2, ..., X_c], where X_i is the coefficient sub-matrix of class i; X^T is the transpose of X; X_0 is the sparse coefficient matrix of Y on D; and (X_0)^T is the transpose of X_0.
2.2. Obtain the minimization objective function through sparse representation (the formula is an image in the original publication and is not reproduced in this text). The objective function contains the following terms, where c is the total number of classes:
the discriminative fidelity term, which minimizes the error with which the dictionary reconstructs each class of projected training samples. This strengthens the representation of same-class samples and weakens the representation of different-class samples, so that the dictionary restores the projected samples as faithfully as possible;
the sparsity term, in which the parameter λ adjusts the degree of sparsity and so keeps the sparse coefficient matrix sparse;
the coefficient discrimination term f(X), which adjusts the distribution of X so that the within-class scatter is minimized and the between-class scatter is maximized, ensuring that the dictionary can discriminate between the classes of the training sample set Y.
The principle of the shared dictionary learning method is shown in Fig. 2.
2.3. Fix the values of the dictionary and the projection matrix P in the objective function, so that the only unknown is the sparse coding coefficient matrix. Compute the coding coefficients of each class with an iterative projection method, then combine the per-class coding coefficients into the sparse coding coefficient matrix.
Step 3: Update the projection matrix P. Fix the values of the dictionary and the sparse coding coefficient matrix in the objective function, and apply the projection matrix P directly to the training sample set Y.
Step 4: Update the dictionary. Fix the values of the projection matrix P and the sparse coding coefficient matrix in the objective function, compute the sub-dictionary of each class with the iterative projection method, then combine the class sub-dictionaries into the overall dictionary.
Step 5: Seek a locally optimal solution of the objective function by gradient descent. During the solution, Step 3 and Step 4 are executed in a loop until the reconstruction error levels off and no longer changes, which yields the final dictionary and projection matrix P.
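The alternating structure of the training process (fix some variables, update the others, stop when the reconstruction error levels off) can be sketched as follows. This is a simplified stand-in under stated assumptions, not the patented procedure: the objective is reduced to a plain lasso term without the discriminative fidelity and f(X) terms, the sparse coding uses ISTA, the dictionary update uses a regularized least-squares step followed by atom normalization, and the projection matrix P is kept fixed. All names are hypothetical.

```python
import numpy as np

def soft_threshold(V, t):
    return np.sign(V) * np.maximum(np.abs(V) - t, 0.0)

def ista(D, Z, lam, n_iter=100):
    """Batch ISTA for min_X 0.5 * ||Z - D X||_F^2 + lam * ||X||_1."""
    L = max(np.linalg.norm(D, 2) ** 2, 1e-12)  # guard against an all-zero D
    X = np.zeros((D.shape[1], Z.shape[1]))
    for _ in range(n_iter):
        X = soft_threshold(X - D.T @ (D @ X - Z) / L, lam / L)
    return X

def update_dictionary(Z, X, eps=1e-8):
    """Regularized least-squares dictionary update, then atom normalization."""
    G = X @ X.T + eps * np.eye(X.shape[0])
    D = np.linalg.solve(G, X @ Z.T).T
    norms = np.linalg.norm(D, axis=0)
    norms[norms < 1e-12] = 1.0  # leave unused (zero) atoms untouched
    return D / norms

def train(Y, P, D_init, lam=0.1, tol=1e-4, max_iter=50):
    """Alternate coding and dictionary updates until the error stabilizes."""
    Z = P @ Y                                  # projected training samples
    D = D_init.copy()
    X = ista(D, Z, lam)                        # sparse coding with D fixed
    errors = [np.linalg.norm(Z - D @ X)]
    for _ in range(max_iter):
        D = update_dictionary(Z, X)            # dictionary update with X fixed
        X = ista(D, Z, lam)
        errors.append(np.linalg.norm(Z - D @ X))
        # Stop when the reconstruction error no longer changes appreciably.
        if abs(errors[-2] - errors[-1]) <= tol * max(errors[-2], 1e-12):
            break
    return D, X, errors
```

The returned `errors` list records the reconstruction error after each pass, which makes the stopping behaviour easy to inspect.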
Step 6: Determine the class of the test data from the distance between sparse coding vectors, including the following sub-steps:
6.1. Pass in the dictionary and projection matrix P obtained in Step 5 together with the test data set y.
6.2. Project y with the projection matrix P into a low-dimensional space to obtain the projected sample.
6.3. In the low-dimensional space, represent the projected sample sparsely and linearly with the dictionary to obtain the sparse coding vector u.
6.4. Use the distance between sparse coding vectors u as the criterion for the final classification of the projected sample.
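The coefficient discrimination term f(X) of step 2.2 is described only in words (minimize the within-class distribution, maximize the between-class distribution). One common way to express such a term, assumed here and not confirmed by the text, is the Fisher-style difference between the trace of the within-class scatter and the trace of the between-class scatter of the coding vectors. The function name is hypothetical.

```python
import numpy as np

def fisher_term(X, labels):
    """Assumed form of f(X): tr(S_W) - tr(S_B) of the coding vectors.

    X      : (n_atoms, n_samples) coefficient matrix, one code per column.
    labels : (n_samples,) class label of each column.
    Smaller values mean tighter classes that lie further apart.
    """
    overall_mean = X.mean(axis=1, keepdims=True)
    within, between = 0.0, 0.0
    for c in np.unique(labels):
        Xc = X[:, labels == c]
        class_mean = Xc.mean(axis=1, keepdims=True)
        within += np.sum((Xc - class_mean) ** 2)               # tr(S_W) part
        between += Xc.shape[1] * np.sum((class_mean - overall_mean) ** 2)
    return within - between
```

With this form, codes that cluster by class give a lower value than codes that are mixed across classes, which is the behaviour the description asks of f(X).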
The above is only a preferred embodiment of the invention. The scope of protection of the invention is not limited to this embodiment; all technical solutions that fall within the concept of the invention belong to its scope of protection. It should be noted that, for those of ordinary skill in the art, improvements and modifications made without departing from the principles of the invention are also to be regarded as within the scope of protection of the invention.

Claims (1)

1. A gene expression profile classification method based on shared dictionary learning, characterized by comprising the following steps:
Step 1: initializing the overall dictionary and the projection matrix P, including the following sub-steps:
1.1. inputting the gene expression profile training sample set Y = [Y_1, Y_2, ..., Y_c], where c is the total number of classes and each Y_i (i = 1, ..., c) is the subset of the training data belonging to class i;
1.2. initializing the overall dictionary with a random number sequence, the overall dictionary consisting of the shared dictionary D_0, which draws on the samples of all classes, and the class-specific dictionary D = [D_1, D_2, ..., D_c], where D_i is the sub-dictionary corresponding to training subset Y_i;
1.3. initializing the projection matrix P with the principal component analysis (PCA) transformation matrix of the training sample set Y;
Step 2: computing and updating the sparse coding coefficient matrix, including the following sub-steps:
2.1. obtaining the sparse coding coefficient matrix of the overall dictionary from the partitioned coefficient matrices, where the class-specific coefficient matrix is X = [X_1, X_2, ..., X_c], X_i is the coefficient sub-matrix of class i, X^T is the transpose of X, X_0 is the sparse coefficient matrix of Y on D, and (X_0)^T is the transpose of X_0;
2.2. obtaining the minimization objective function through sparse representation (the formula is an image in the original publication and is not reproduced in this text), where c is the total number of classes and the objective function contains: the discriminative fidelity term, which minimizes the error with which the dictionary reconstructs each class of projected training samples, thereby strengthening the representation of same-class samples and weakening the representation of different-class samples, so that the dictionary restores the projected samples as faithfully as possible; the sparsity term, in which the parameter λ adjusts the degree of sparsity and so keeps the sparse coefficient matrix sparse; and the coefficient discrimination term f(X), which adjusts the distribution of X so that the within-class scatter is minimized and the between-class scatter is maximized, ensuring that the dictionary can discriminate between the classes of the training sample set Y;
2.3. fixing the values of the dictionary and the projection matrix P in the objective function, so that the only unknown is the sparse coding coefficient matrix, computing the coding coefficients of each class with an iterative projection method, and combining the per-class coding coefficients into the sparse coding coefficient matrix;
Step 3: updating the projection matrix P by fixing the values of the dictionary and the sparse coding coefficient matrix in the objective function and applying the projection matrix P directly to the training sample set Y;
Step 4: updating the dictionary by fixing the values of the projection matrix P and the sparse coding coefficient matrix in the objective function, computing the sub-dictionary of each class with the iterative projection method, and combining the class sub-dictionaries into the overall dictionary;
Step 5: seeking a locally optimal solution of the objective function by gradient descent, wherein Step 3 and Step 4 are executed in a loop until the reconstruction error levels off and no longer changes, yielding the final dictionary and projection matrix P;
Step 6: determining the class of the test data from the distance between sparse coding vectors, including the following sub-steps:
6.1. passing in the dictionary and projection matrix P obtained in Step 5 together with the test data set y;
6.2. projecting y with the projection matrix P into a low-dimensional space to obtain the projected sample;
6.3. in the low-dimensional space, representing the projected sample sparsely and linearly with the dictionary to obtain the sparse coding vector u;
6.4. using the distance between sparse coding vectors u as the criterion for the final classification of the projected sample.
CN201910296287.XA 2019-04-13 2019-04-13 A kind of gene expression profile classification method based on shared dictionary learning Pending CN110033824A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910296287.XA CN110033824A (en) 2019-04-13 2019-04-13 A kind of gene expression profile classification method based on shared dictionary learning

Publications (1)

Publication Number Publication Date
CN110033824A true CN110033824A (en) 2019-07-19

Family

ID=67238193

Country Status (1)

Country Link
CN (1) CN110033824A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105844223A (en) * 2016-03-18 2016-08-10 常州大学 Face expression algorithm combining class characteristic dictionary learning and shared dictionary learning
US20180165809A1 (en) * 2016-12-02 2018-06-14 Panagiotis Stanitsas Computer vision for cancerous tissue recognition
CN108573263A (en) * 2018-05-10 2018-09-25 西安理工大学 A kind of dictionary learning method of co-ordinative construction rarefaction representation and low-dimensional insertion
CN108985177A (en) * 2018-06-21 2018-12-11 南京师范大学 A kind of facial image classification method of the quick low-rank dictionary learning of combination sparse constraint

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115896997A (en) * 2022-11-14 2023-04-04 东华大学 Online big data dictionary generation method based on fiber flow multi-branch index
CN116052769A (en) * 2023-02-15 2023-05-02 哈尔滨工业大学 Cell gene expression quantity reproduction method and system based on sparse coding
CN116052769B (en) * 2023-02-15 2024-06-25 哈尔滨工业大学 Cell gene expression quantity reproduction method and system based on sparse coding

Similar Documents

Publication Publication Date Title
CN102930301A (en) Image classification method based on characteristic weight learning and nuclear sparse representation
CN104346459B (en) A kind of text classification feature selection approach based on term frequency and chi
CN115240772B (en) Method for analyzing single cell pathway activity based on graph neural network
Wang et al. Human peripheral blood leukocyte classification method based on convolutional neural network and data augmentation
CN110033824A (en) A kind of gene expression profile classification method based on shared dictionary learning
CN108171012A (en) A kind of gene sorting method and device
Wang et al. Random forest and Bayesian prediction for Hepatitis B virus reactivation
CN103258147A (en) Parallel evolution super-network DNA micro array gene data sorting system and method based on GPU
CN103593674A (en) Cervical lymph node ultrasonoscopy feature selection method
Petukhov et al. Case-control analysis of single-cell RNA-seq studies
Chen et al. Multi-granularity regularized re-balancing for class incremental learning
Wei et al. Secuer: Ultrafast, scalable and accurate clustering of single-cell RNA-seq data
Hassanzadeh et al. EEvoU-Net: An ensemble of evolutionary deep fully convolutional neural networks for medical image segmentation
Huang Feature Selection and Feature Stability Measurement Method for High‐Dimensional Small Sample Data Based on Big Data Technology
Ouyang et al. Predicting multiple types of associations between miRNAs and diseases based on graph regularized weighted tensor decomposition
CN108388918A (en) Data characteristics selection method with structure retention performance
Yang et al. Random subspace aggregation for cancer prediction with gene expression profiles
CN106021225B (en) A kind of Chinese Maximal noun phrase recognition methods based on the simple noun phrase of Chinese
CN104268564B (en) It is a kind of based on the sparse Gene Expression Data Analysis method for blocking power
Xu et al. Gene mutation classification using CNN and BiGRU network
Vidyasagar Probabilistic methods in cancer biology
Kulan et al. In silico identification of critical proteins associated with learning process and immune system for Down syndrome
Xu et al. CGUFS: A clustering-guided unsupervised feature selection algorithm for gene expression data
CN105975983A (en) Data classification method based on sparse Fisher vector coding
Li et al. An ensemble classification algorithm of micro-blog sentiment based on feature selection and differential evolution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190719