CN110033824A - A gene expression profile classification method based on shared dictionary learning - Google Patents
- Publication number
- CN110033824A
- Authority
- CN
- China
- Prior art keywords
- dictionary
- matrix
- classification
- projection
- sparse
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
- G16B25/10—Gene or protein expression profiling; Expression-ratio estimation or normalisation
Landscapes
- Health & Medical Sciences (AREA)
- Genetics & Genomics (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
Abstract
The invention belongs to the field of gene expression profile classification and discloses a gene expression profile classification method based on shared dictionary learning, an application of sparse dictionary learning to mining biological big data. The method first constructs a shared dictionary that captures the features common to samples of all classes. It then trains the dictionary and, at the same time, a projection matrix; projecting the test samples with this matrix widens the distance between samples of different classes. Finally, the class of a test sample is determined from the distances between the coefficient vectors obtained by coding the test sample with the dictionary. The method classifies gene expression profile data quickly and effectively, which helps distinguish cancer types and their subtypes, helps explain the pathogenesis of tumors at the molecular level, and provides gene-level solutions for the thorough treatment of tumors. Because samples share the common dictionary, the method keeps a stable projection ability even with few samples, and its classification accuracy is substantially higher than that of general classification methods.
Description
Technical field:
The invention belongs to the field of gene expression profile classification, more particularly to a classification method for tumor cell line gene expression profile data, and in particular to a gene expression profile classification method based on shared dictionary learning.
Background art:
Tumors are diseases that seriously threaten human life and health, and researchers have long sought the best means of curing them. However, there are many tumor types, even a single tumor can be divided into many different subtypes, and different subtypes are treated differently. Accurate and rapid tumor classification therefore lets treatment be as effective as possible and can prolong or even save patients' lives. Classifying tumors from tumor gene expression profiles is a relatively new approach: it is fast, automates the classification process, saves considerable human and material resources, and has become a research hotspot in cancer classification. However, most current conventional machine learning methods achieve generally low accuracy on tumor gene expression profiles, so more suitable classification methods need to be designed. Dictionary learning classification is well suited to gene expression profile data, but general dictionary learning methods focus only on improving the trained dictionary's ability to reconstruct samples and ignore its ability to discriminate between them. Moreover, gene expression profile data contain a large amount of redundancy and noise, so general dictionary learning methods cannot make full use of the sample data to obtain the required features. As a result, the classifiers of general dictionary learning methods have weak discriminative ability, which degrades the final classification results and leaves classification performance below the required standard.
Summary of the invention:
The technical problem addressed by the invention is to make full use of the dictionary's ability to reconstruct samples in dictionary learning while, through sample sharing, capturing the features of samples of all classes, thereby improving the mapping and discriminative ability of the features and solving the low classification accuracy of conventional methods on tumor cell line gene expression profile data. To achieve this object, the invention adopts the following technical solution:
A gene expression profile classification method based on shared dictionary learning comprises the following steps:
Step 1: Initialize the dictionary $\hat{D}$ and the projection matrix $P$, as follows:
1.1. Input the gene expression profile training sample set $Y = [Y_1, Y_2, \ldots, Y_c]$, where $c$ is the total number of classes and $Y_i$ ($i = 1, \ldots, c$) is the subset of the training data belonging to class $i$.
1.2. Initialize the dictionary $\hat{D} = [D_0, D]$ with random numbers, where $D_0$ is the shared dictionary, which captures the features common to samples of all classes, and $D = [D_1, D_2, \ldots, D_c]$ is the class-specific dictionary, $D_i$ being the sub-dictionary corresponding to the training subset $Y_i$.
1.3. Initialize the projection matrix $P$ with the principal component analysis (PCA) transformation matrix of the training sample set $Y$.
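The initialization in Step 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes NumPy, a genes × samples layout (one column per sample), unit-norm dictionary atoms, and a dictionary that lives in the projected $m$-dimensional space (since it reconstructs $PY$). The helper names, atom counts and `seed` are hypothetical.

```python
import numpy as np


def init_projection_pca(Y, m):
    """Step 1.3 sketch: P = leading m principal directions of Y, as rows.

    Y is genes x samples, one column per sample (assumed layout).
    """
    Yc = Y - Y.mean(axis=1, keepdims=True)          # center each gene
    U, _, _ = np.linalg.svd(Yc, full_matrices=False)
    return U[:, :m].T                               # P has shape (m, n_genes)


def init_dictionary(m, n_shared, atoms_per_class, n_classes, seed=0):
    """Step 1.2 sketch: random D_hat = [D0, D1, ..., Dc] with unit-norm atoms.

    Atom counts and unit-norm columns are assumptions, not from the patent.
    """
    rng = np.random.default_rng(seed)
    n_atoms = n_shared + atoms_per_class * n_classes
    D_hat = rng.standard_normal((m, n_atoms))
    D_hat /= np.linalg.norm(D_hat, axis=0, keepdims=True)
    D0 = D_hat[:, :n_shared]                        # shared dictionary
    subdicts = [D_hat[:, n_shared + i * atoms_per_class:
                      n_shared + (i + 1) * atoms_per_class]
                for i in range(n_classes)]          # class sub-dictionaries D_i
    return D_hat, D0, subdicts


# Usage on synthetic data: 50 genes, 30 samples, 2 classes.
rng = np.random.default_rng(1)
Y = rng.standard_normal((50, 30))
P = init_projection_pca(Y, m=10)
D_hat, D0, subdicts = init_dictionary(m=10, n_shared=4, atoms_per_class=3, n_classes=2)
```

Because the PCA directions are orthonormal, $P$ starts with orthonormal rows, which later update sketches can preserve.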
Step 2: Compute and update the sparse coding coefficient matrix $\hat{X}$, as follows:
2.1. Partition the sparse coding coefficient matrix $\hat{X}$ to match the dictionary $\hat{D}$ as $\hat{X} = [X_0^T, X^T]^T$, where $X_0$ is the sparse coefficient matrix of $Y$ over the shared dictionary $D_0$, $X = [X_1, X_2, \ldots, X_c]$ with $X_i$ the sub-coefficient matrix of class $i$, and $X^T$ and $X_0^T$ are the transposes of $X$ and $X_0$.
2.2. Obtain, through sparse representation, the objective function to be minimized. It is the sum of three terms. The first is the discriminative fidelity term, summed over the $c$ classes, in which $\hat{X}_i$ is the sub-coefficient matrix of the class-$i$ training samples; it minimizes the error with which $\hat{D}$ reconstructs each class of training samples after projection by $P$, which strengthens the representation of samples of the same class, weakens the representation of samples of other classes, and ensures that the projected samples are recovered as faithfully as possible. The second is the sparsity term, whose weight is set by the parameter $\lambda$ and which keeps the sparse coefficient matrix $\hat{X}$ sparse. The third is the coefficient discrimination term $f(X)$, which adjusts the distribution of $X$ so that the within-class scatter is minimized and the between-class scatter is maximized, ensuring that $\hat{D}$ discriminates well on the training sample set $Y$.
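The text does not give the explicit form of each term. Purely as an illustration, the sketch below assumes a squared Frobenius fidelity term $\sum_i \|PY_i - \hat{D}\hat{X}_i\|_F^2$, an $\ell_1$ sparsity term weighted by $\lambda$, and a Fisher-style $f(X) = \operatorname{tr}(S_W) - \operatorname{tr}(S_B)$ (within-class minus between-class scatter of the codes) weighted by a hypothetical parameter `gamma`, in the spirit of Fisher discrimination dictionary learning. $\hat{X}$ is stacked as $[X_0; X]$, so $\hat{D}\hat{X} = D_0X_0 + DX$. The function names are hypothetical.

```python
import numpy as np


def fisher_term(X, labels):
    """Assumed f(X) = tr(S_W) - tr(S_B) over the class-specific codes X."""
    mean_all = X.mean(axis=1, keepdims=True)
    tr_sw, tr_sb = 0.0, 0.0
    for k in np.unique(labels):
        Xk = X[:, labels == k]
        mk = Xk.mean(axis=1, keepdims=True)
        tr_sw += np.sum((Xk - mk) ** 2)                       # within-class scatter
        tr_sb += Xk.shape[1] * np.sum((mk - mean_all) ** 2)   # between-class scatter
    return tr_sw - tr_sb


def objective(Y, labels, P, D0, D, X0, X, lam=0.1, gamma=1.0):
    """Assumed objective: per-class fidelity + lam*||X_hat||_1 + gamma*f(X)."""
    D_hat = np.hstack([D0, D])                  # D_hat = [D0, D]
    X_hat = np.vstack([X0, X])                  # X_hat = [X0^T, X^T]^T
    fidelity = 0.0
    for k in np.unique(labels):
        idx = labels == k
        fidelity += np.sum((P @ Y[:, idx] - D_hat @ X_hat[:, idx]) ** 2)
    return fidelity + lam * np.abs(X_hat).sum() + gamma * fisher_term(X, labels)
```

For example, codes `[[0, 0, 2, 2]]` with labels `[0, 0, 1, 1]` give $f = -4$ (tight, well-separated classes), while `[[0, 2, 0, 2]]` gives $f = 4$, so minimizing $f$ favors the first arrangement.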
2.3. Fix the values of the dictionary $\hat{D}$ and the projection matrix $P$ in the objective function, so that the only unknown is the sparse coding coefficient matrix $\hat{X}$. Compute the class sub-coefficient matrices $\hat{X}_i$ with the projection iterative method, and combine them into the sparse coding coefficient matrix $\hat{X}$.
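The "projection iterative method" is not defined in the text. A common choice for this kind of $\ell_1$-regularized coding is the iterative shrinkage-thresholding algorithm (ISTA), a proximal-gradient method, which the sketch below uses as a stand-in. It assumes the sparsity term is $\lambda\|\hat{X}\|_1$, codes the projected data $Z = PY$, and for brevity drops $f(X)$; without that term the classes decouple, so coding them one at a time here only illustrates how the class sub-matrices are reassembled into $\hat{X}$. All names are hypothetical.

```python
import numpy as np


def soft_threshold(V, t):
    """Proximal operator of t * ||V||_1 (element-wise shrinkage)."""
    return np.sign(V) * np.maximum(np.abs(V) - t, 0.0)


def ista(Z, D_hat, lam=0.1, n_iter=200):
    """Minimize ||Z - D_hat X||_F^2 + lam * ||X||_1 by ISTA."""
    L = 2.0 * np.linalg.norm(D_hat, 2) ** 2        # Lipschitz constant of the gradient
    X = np.zeros((D_hat.shape[1], Z.shape[1]))
    for _ in range(n_iter):
        grad = 2.0 * D_hat.T @ (D_hat @ X - Z)
        X = soft_threshold(X - grad / L, lam / L)
    return X


def code_by_class(Z, labels, D_hat, lam=0.1):
    """Code each class's projected samples separately, then reassemble X_hat."""
    X_hat = np.zeros((D_hat.shape[1], Z.shape[1]))
    for k in np.unique(labels):
        idx = labels == k
        X_hat[:, idx] = ista(Z[:, idx], D_hat, lam)
    return X_hat
```

With an identity dictionary the solution is plain shrinkage: `ista(np.array([[3.0], [0.05], [-2.0]]), np.eye(3), lam=0.1)` returns `[[2.95], [0.0], [-1.95]]`.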
Step 3: Update the projection matrix $P$. Fix the values of the dictionary $\hat{D}$ and the sparse coding coefficient matrix $\hat{X}$ in the objective function, and solve for $P$ directly on the training sample set $Y$.
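Step 3 does not state the update rule for $P$. Assuming, as the PCA initialization suggests, that $P$ keeps orthonormal rows ($PP^T = I$) and is chosen so that the training samples can be recovered from the current low-dimensional reconstruction, i.e. minimizing $\|Y - P^T\hat{D}\hat{X}\|_F^2$, the update has the closed-form orthogonal Procrustes solution $P = UV^T$, where $U\Sigma V^T$ is the thin SVD of $\hat{D}\hat{X}Y^T$. This is a sketch of one plausible rule, not the patented one; `update_projection` is a hypothetical name.

```python
import numpy as np


def update_projection(Y, D_hat, X_hat):
    """Step 3 sketch (assumed rule): orthogonal Procrustes update of P.

    Minimizes ||Y - P^T (D_hat X_hat)||_F^2 subject to P P^T = I,
    whose solution is P = U V^T with U S V^T = svd(D_hat X_hat Y^T).
    """
    A = D_hat @ X_hat @ Y.T                        # m x n_genes
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ Vt                                  # rows remain orthonormal


# Usage: data generated exactly by a known orthonormal P_true is recovered.
rng = np.random.default_rng(0)
P_true = np.linalg.qr(rng.standard_normal((8, 3)))[0].T   # 3 x 8, orthonormal rows
M = rng.standard_normal((3, 20))                           # low-dimensional codes
Y = P_true.T @ M                                           # samples explained by P_true
P = update_projection(Y, np.eye(3), M)
```

Keeping $P$ orthonormal also prevents the trivial solution in which $P$ shrinks toward zero to make the reconstruction error vanish.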
Step 4: Update the dictionary $\hat{D}$. Fix the values of the projection matrix $P$ and the sparse coding coefficient matrix $\hat{X}$ in the objective function, compute the class sub-dictionaries $D_i$ with the projection iterative method, and combine them into the dictionary $\hat{D}$.
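Step 4 can be read as block-coordinate projected gradient descent: each sub-dictionary (the shared $D_0$ first, then $D_1, \ldots, D_c$) takes gradient steps on $\|PY - \hat{D}\hat{X}\|_F^2$ with the other atoms fixed, and each atom is then projected back onto the unit ball $\|d_j\|_2 \le 1$. The step size, iteration count and unit-ball constraint are assumptions, and the helper names are hypothetical.

```python
import numpy as np


def update_subdictionary(Z, D_hat, X_hat, cols, n_iter=50):
    """Projected-gradient update of the atoms D_hat[:, cols] (one sub-dictionary).

    Z = P Y is the projected training data; the other atoms and all
    coefficients stay fixed. Atoms are projected onto ||d_j||_2 <= 1.
    """
    D_hat = D_hat.copy()
    Xi = X_hat[cols, :]                            # coefficient rows of these atoms
    step = 1.0 / (2.0 * np.linalg.norm(Xi, 2) ** 2 + 1e-12)  # 1 / Lipschitz constant
    for _ in range(n_iter):
        R = Z - D_hat @ X_hat                      # current residual
        D_hat[:, cols] += step * 2.0 * R @ Xi.T    # gradient step on this block
        norms = np.maximum(np.linalg.norm(D_hat[:, cols], axis=0), 1.0)
        D_hat[:, cols] /= norms                    # projection onto the unit ball
    return D_hat


def update_dictionary(Z, D_hat, X_hat, blocks):
    """Step 4 sketch: update D0, then D1..Dc, keeping them combined in D_hat."""
    for cols in blocks:
        D_hat = update_subdictionary(Z, D_hat, X_hat, cols)
    return D_hat


# Usage on random data: three blocks of two atoms each (D0, D1, D2).
rng = np.random.default_rng(0)
Z = rng.standard_normal((4, 10))
D_init = rng.standard_normal((4, 6))
D_init /= np.linalg.norm(D_init, axis=0)
X_hat = rng.standard_normal((6, 10))
D_new = update_dictionary(Z, D_init, X_hat, [slice(0, 2), slice(2, 4), slice(4, 6)])
```

With a step of $1/L$ and a convex constraint set, each block update does not increase the reconstruction error.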
Step 5: Seek a local optimum of the objective function by gradient descent. During the solution, Steps 3 and 4 are executed in a loop until the reconstruction error levels off and no longer changes, which yields the final dictionary $\hat{D}$ and projection matrix $P$.
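Putting Steps 1 to 5 together, the following sketch alternates sparse coding, the $P$ update and the dictionary update, and stops once the relative change in reconstruction error falls below a hypothetical tolerance `tol`. The text names only Steps 3 and 4 inside the loop; re-running the sparse coding of Step 2 on each pass is an assumption, as are the individual update rules (ISTA, orthogonal Procrustes, projected gradient). The shared/class-specific partition and $f(X)$ are omitted to keep the sketch short.

```python
import numpy as np


def soft(V, t):
    """Soft-thresholding, the proximal operator of t * ||V||_1."""
    return np.sign(V) * np.maximum(np.abs(V) - t, 0.0)


def train_shared_dictionary(Y, n_atoms, m, lam=0.05, tol=1e-4, max_iter=100, seed=0):
    """Alternating sketch of Steps 1-5 (every update rule here is an assumption).

    Y is genes x samples; returns the dictionary, P, and the error history.
    """
    rng = np.random.default_rng(seed)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    P = np.linalg.svd(Yc, full_matrices=False)[0][:, :m].T   # Step 1.3: PCA init
    D = rng.standard_normal((m, n_atoms))
    D /= np.linalg.norm(D, axis=0)                            # Step 1.2: random init
    errors = []
    for it in range(max_iter):
        Z = P @ Y
        # Step 2: ISTA sparse coding with D and P fixed
        L = 2.0 * np.linalg.norm(D, 2) ** 2
        X = np.zeros((n_atoms, Y.shape[1]))
        for _ in range(100):
            X = soft(X - 2.0 * D.T @ (D @ X - Z) / L, lam / L)
        # Step 3: orthogonal Procrustes update of P with D and X fixed
        U, s, Vt = np.linalg.svd(D @ X @ Y.T, full_matrices=False)
        P = U @ Vt
        Z = P @ Y
        # Step 4: projected-gradient dictionary update with P and X fixed
        Lx = 2.0 * np.linalg.norm(X, 2) ** 2 + 1e-12
        for _ in range(20):
            D = D + 2.0 * (Z - D @ X) @ X.T / Lx
            D /= np.maximum(np.linalg.norm(D, axis=0), 1.0)
        # Step 5: stop once the reconstruction error has leveled off
        errors.append(np.linalg.norm(Z - D @ X) ** 2)
        if len(errors) > 1 and abs(errors[-2] - errors[-1]) <= tol * errors[-2]:
            break
    return D, P, errors


# Usage on synthetic data: 20 genes, 30 samples.
rng = np.random.default_rng(0)
Y = rng.standard_normal((20, 30))
D, P, errors = train_shared_dictionary(Y, n_atoms=8, m=5)
```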
Step 6: Determine the class of the test data from the distances between sparse coding vectors, as follows:
6.1. Take the dictionary $\hat{D}$ and the projection matrix $P$ obtained in Step 5, and input the test data set $y$.
6.2. Project $y$ with the projection matrix $P$ into a low-dimensional space, obtaining the projected sample $\tilde{y} = Py$.
6.3. In the low-dimensional space, represent $\tilde{y}$ as a sparse linear combination of the atoms of $\hat{D}$ to obtain the sparse coding vector $u$.
6.4. Use the distances between sparse coding vectors $u$ as the decision criterion for the final classification of $\tilde{y}$.
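The text does not say which distance between sparse codes is used in Step 6.4. The sketch below assumes the Euclidean distance between the test code $u$ and a per-class prototype, taken as the mean code of that class's training samples, and uses ISTA as the coder; `sparse_code`, `class_prototypes` and `classify` are hypothetical helpers.

```python
import numpy as np


def sparse_code(z, D_hat, lam=0.05, n_iter=200):
    """ISTA coding of one projected sample z over D_hat (assumed coder)."""
    L = 2.0 * np.linalg.norm(D_hat, 2) ** 2
    u = np.zeros(D_hat.shape[1])
    for _ in range(n_iter):
        v = u - 2.0 * D_hat.T @ (D_hat @ u - z) / L
        u = np.sign(v) * np.maximum(np.abs(v) - lam / L, 0.0)
    return u


def class_prototypes(U, labels):
    """Mean training code per class (hypothetical prototype choice)."""
    return {k: U[:, labels == k].mean(axis=1) for k in np.unique(labels)}


def classify(y, P, D_hat, prototypes, lam=0.05):
    z = P @ y                                   # Step 6.2: project the test sample
    u = sparse_code(z, D_hat, lam)              # Step 6.3: sparse coding vector u
    # Step 6.4: label of the nearest prototype in coefficient space
    return min(prototypes, key=lambda k: np.linalg.norm(u - prototypes[k]))


# Toy usage: identity P and D_hat, two classes with well-separated codes.
U_train = np.array([[1.0, 0.8, 0.0, 0.1],
                    [0.0, 0.1, 0.9, 1.0]])
labels = np.array(["A", "A", "B", "B"])
protos = class_prototypes(U_train, labels)
pred = classify(np.array([0.9, 0.1]), np.eye(2), np.eye(2), protos)
```

Here `pred` is `"A"`, because the test code lies far closer to the class-A prototype `[0.9, 0.05]` than to the class-B prototype `[0.05, 0.95]`.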
Compared with existing inventions, the disclosed gene expression profile classification method based on shared dictionary learning uses the sharing property of the dictionary to improve the feature extraction and discriminative ability of samples while still emphasizing the dictionary's ability to reconstruct samples, so it can classify tumor cell line gene expression profile data accurately and quickly. The method overcomes two shortcomings of conventional classification methods and general dictionary learning: weak sample discrimination and poor classification performance.
Brief description of the drawings:
Fig. 1 is a flow chart of the dictionary training process;
Fig. 2 is a schematic diagram of the shared dictionary learning method.
Specific embodiments:
The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
The dictionary training process shown in Fig. 1 comprises five steps, Step 1 to Step 5, as follows:
Step 1: Initialize the dictionary $\hat{D}$ and the projection matrix $P$, as follows:
1.1. Input the gene expression profile training sample set $Y = [Y_1, Y_2, \ldots, Y_c]$, where $c$ is the total number of classes and $Y_i$ ($i = 1, \ldots, c$) is the subset of the training data belonging to class $i$.
1.2. Initialize the dictionary $\hat{D} = [D_0, D]$ with random numbers, where $D_0$ is the shared dictionary, which captures the features common to samples of all classes, and $D = [D_1, D_2, \ldots, D_c]$ is the class-specific dictionary, $D_i$ being the sub-dictionary corresponding to the training subset $Y_i$.
1.3. Initialize the projection matrix $P$ with the principal component analysis (PCA) transformation matrix of the training sample set $Y$.
Step 2: Compute and update the sparse coding coefficient matrix $\hat{X}$, as follows:
2.1. Partition the sparse coding coefficient matrix $\hat{X}$ to match the dictionary $\hat{D}$ as $\hat{X} = [X_0^T, X^T]^T$, where $X_0$ is the sparse coefficient matrix of $Y$ over the shared dictionary $D_0$, $X = [X_1, X_2, \ldots, X_c]$ with $X_i$ the sub-coefficient matrix of class $i$, and $X^T$ and $X_0^T$ are the transposes of $X$ and $X_0$.
2.2. Obtain, through sparse representation, the objective function to be minimized. It is the sum of three terms. The first is the discriminative fidelity term, summed over the $c$ classes, in which $\hat{X}_i$ is the sub-coefficient matrix of the class-$i$ training samples; it minimizes the error with which $\hat{D}$ reconstructs each class of training samples after projection by $P$, which strengthens the representation of samples of the same class, weakens the representation of samples of other classes, and ensures that the projected samples are recovered as faithfully as possible. The second is the sparsity term, whose weight is set by the parameter $\lambda$ and which keeps the sparse coefficient matrix $\hat{X}$ sparse. The third is the coefficient discrimination term $f(X)$, which adjusts the distribution of $X$ so that the within-class scatter is minimized and the between-class scatter is maximized, ensuring that $\hat{D}$ discriminates well on the training sample set $Y$. The principle of the shared dictionary learning method is shown in Fig. 2.
2.3. Fix the values of the dictionary $\hat{D}$ and the projection matrix $P$ in the objective function, so that the only unknown is the sparse coding coefficient matrix $\hat{X}$. Compute the class sub-coefficient matrices $\hat{X}_i$ with the projection iterative method, and combine them into the sparse coding coefficient matrix $\hat{X}$.
Step 3: Update the projection matrix $P$. Fix the values of the dictionary $\hat{D}$ and the sparse coding coefficient matrix $\hat{X}$ in the objective function, and solve for $P$ directly on the training sample set $Y$.
Step 4: Update the dictionary $\hat{D}$. Fix the values of the projection matrix $P$ and the sparse coding coefficient matrix $\hat{X}$ in the objective function, compute the class sub-dictionaries $D_i$ with the projection iterative method, and combine them into the dictionary $\hat{D}$.
Step 5: Seek a local optimum of the objective function by gradient descent. During the solution, Steps 3 and 4 are executed in a loop until the reconstruction error levels off and no longer changes, which yields the final dictionary $\hat{D}$ and projection matrix $P$.
Step 6: Determine the class of the test data from the distances between sparse coding vectors, as follows:
6.1. Take the dictionary $\hat{D}$ and the projection matrix $P$ obtained in Step 5, and input the test data set $y$.
6.2. Project $y$ with the projection matrix $P$ into a low-dimensional space, obtaining the projected sample $\tilde{y} = Py$.
6.3. In the low-dimensional space, represent $\tilde{y}$ as a sparse linear combination of the atoms of $\hat{D}$ to obtain the sparse coding vector $u$.
6.4. Use the distances between sparse coding vectors $u$ as the decision criterion for the final classification of $\tilde{y}$.
The above is only a preferred embodiment of the present invention. The scope of protection of the present invention is not limited to this embodiment; all technical solutions within the concept of the present invention fall within its scope of protection. It should be noted that, for those of ordinary skill in the art, improvements and modifications made without departing from the principles of the present invention should also be regarded as falling within the scope of protection of the present invention.
Claims (1)
1. A gene expression profile classification method based on shared dictionary learning, characterized in that it comprises the following steps:
Step 1: Initialize the dictionary $\hat{D}$ and the projection matrix $P$, as follows:
1.1. Input the gene expression profile training sample set $Y = [Y_1, Y_2, \ldots, Y_c]$, where $c$ is the total number of classes and $Y_i$ ($i = 1, \ldots, c$) is the subset of the training data belonging to class $i$;
1.2. Initialize the dictionary $\hat{D} = [D_0, D]$ with random numbers, where $D_0$ is the shared dictionary, which captures the features common to samples of all classes, and $D = [D_1, D_2, \ldots, D_c]$ is the class-specific dictionary, $D_i$ being the sub-dictionary corresponding to the training subset $Y_i$;
1.3. Initialize the projection matrix $P$ with the principal component analysis (PCA) transformation matrix of the training sample set $Y$;
Step 2: Compute and update the sparse coding coefficient matrix $\hat{X}$, as follows:
2.1. Partition the sparse coding coefficient matrix $\hat{X}$ to match the dictionary $\hat{D}$ as $\hat{X} = [X_0^T, X^T]^T$, where $X_0$ is the sparse coefficient matrix of $Y$ over the shared dictionary $D_0$, $X = [X_1, X_2, \ldots, X_c]$ with $X_i$ the sub-coefficient matrix of class $i$, and $X^T$ and $X_0^T$ are the transposes of $X$ and $X_0$;
2.2. Obtain, through sparse representation, the objective function to be minimized, which is the sum of three terms: the discriminative fidelity term, summed over the $c$ classes, in which $\hat{X}_i$ is the sub-coefficient matrix of the class-$i$ training samples, which minimizes the error with which $\hat{D}$ reconstructs each class of training samples after projection by $P$, thereby strengthening the representation of samples of the same class, weakening the representation of samples of other classes, and ensuring that the projected samples are recovered as faithfully as possible; the sparsity term, whose weight is set by the parameter $\lambda$ and which keeps the sparse coefficient matrix $\hat{X}$ sparse; and the coefficient discrimination term $f(X)$, which adjusts the distribution of $X$ so that the within-class scatter is minimized and the between-class scatter is maximized, ensuring that $\hat{D}$ discriminates well on the training sample set $Y$;
2.3. Fix the values of the dictionary $\hat{D}$ and the projection matrix $P$ in the objective function, so that the only unknown is the sparse coding coefficient matrix $\hat{X}$; compute the class sub-coefficient matrices $\hat{X}_i$ with the projection iterative method, and combine them into the sparse coding coefficient matrix $\hat{X}$;
Step 3: Update the projection matrix $P$: fix the values of the dictionary $\hat{D}$ and the sparse coding coefficient matrix $\hat{X}$ in the objective function, and solve for $P$ directly on the training sample set $Y$;
Step 4: Update the dictionary $\hat{D}$: fix the values of the projection matrix $P$ and the sparse coding coefficient matrix $\hat{X}$ in the objective function, compute the class sub-dictionaries $D_i$ with the projection iterative method, and combine them into the dictionary $\hat{D}$;
Step 5: Seek a local optimum of the objective function by gradient descent, executing Steps 3 and 4 in a loop during the solution until the reconstruction error levels off and no longer changes, which yields the final dictionary $\hat{D}$ and projection matrix $P$;
Step 6: Determine the class of the test data from the distances between sparse coding vectors, as follows:
6.1. Take the dictionary $\hat{D}$ and the projection matrix $P$ obtained in Step 5, and input the test data set $y$;
6.2. Project $y$ with the projection matrix $P$ into a low-dimensional space, obtaining the projected sample $\tilde{y} = Py$;
6.3. In the low-dimensional space, represent $\tilde{y}$ as a sparse linear combination of the atoms of $\hat{D}$ to obtain the sparse coding vector $u$;
6.4. Use the distances between sparse coding vectors $u$ as the decision criterion for the final classification of $\tilde{y}$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910296287.XA CN110033824A (en) | 2019-04-13 | 2019-04-13 | A gene expression profile classification method based on shared dictionary learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110033824A true CN110033824A (en) | 2019-07-19 |
Family
ID=67238193
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910296287.XA Pending CN110033824A (en) | 2019-04-13 | 2019-04-13 | A gene expression profile classification method based on shared dictionary learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110033824A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105844223A (en) * | 2016-03-18 | 2016-08-10 | 常州大学 | Face expression algorithm combining class characteristic dictionary learning and shared dictionary learning |
US20180165809A1 (en) * | 2016-12-02 | 2018-06-14 | Panagiotis Stanitsas | Computer vision for cancerous tissue recognition |
CN108573263A (en) * | 2018-05-10 | 2018-09-25 | 西安理工大学 | A dictionary learning method jointly using structured sparse representation and low-dimensional embedding |
CN108985177A (en) * | 2018-06-21 | 2018-12-11 | 南京师范大学 | A facial image classification method using fast low-rank dictionary learning combined with sparse constraints |
- 2019-04-13: CN201910296287.XA filed in CN; CN110033824A, status: Pending
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115896997A (en) * | 2022-11-14 | 2023-04-04 | 东华大学 | Online big data dictionary generation method based on fiber flow multi-branch index |
CN116052769A (en) * | 2023-02-15 | 2023-05-02 | 哈尔滨工业大学 | Cell gene expression quantity reproduction method and system based on sparse coding |
CN116052769B (en) * | 2023-02-15 | 2024-06-25 | 哈尔滨工业大学 | Cell gene expression quantity reproduction method and system based on sparse coding |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102930301A (en) | Image classification method based on characteristic weight learning and nuclear sparse representation | |
CN104346459B (en) | A text classification feature selection method based on term frequency and chi-square | |
CN115240772B (en) | Method for analyzing single cell pathway activity based on graph neural network | |
Wang et al. | Human peripheral blood leukocyte classification method based on convolutional neural network and data augmentation | |
CN110033824A (en) | A gene expression profile classification method based on shared dictionary learning | |
CN108171012A (en) | A gene classification method and device | |
Wang et al. | Random forest and Bayesian prediction for Hepatitis B virus reactivation | |
CN103258147A (en) | Parallel evolution super-network DNA micro array gene data sorting system and method based on GPU | |
CN103593674A (en) | Cervical lymph node ultrasonoscopy feature selection method | |
Petukhov et al. | Case-control analysis of single-cell RNA-seq studies | |
Chen et al. | Multi-granularity regularized re-balancing for class incremental learning | |
Wei et al. | Secuer: Ultrafast, scalable and accurate clustering of single-cell RNA-seq data | |
Hassanzadeh et al. | EEvoU-Net: An ensemble of evolutionary deep fully convolutional neural networks for medical image segmentation | |
Huang | Feature Selection and Feature Stability Measurement Method for High‐Dimensional Small Sample Data Based on Big Data Technology | |
Ouyang et al. | Predicting multiple types of associations between miRNAs and diseases based on graph regularized weighted tensor decomposition | |
CN108388918A (en) | Data feature selection method with structure-preserving property | |
Yang et al. | Random subspace aggregation for cancer prediction with gene expression profiles | |
CN106021225B (en) | A Chinese maximal noun phrase recognition method based on Chinese simple noun phrases | |
CN104268564B (en) | A gene expression data analysis method based on sparse truncated power | |
Xu et al. | Gene mutation classification using CNN and BiGRU network | |
Vidyasagar | Probabilistic methods in cancer biology | |
Kulan et al. | In silico identification of critical proteins associated with learning process and immune system for Down syndrome | |
Xu et al. | CGUFS: A clustering-guided unsupervised feature selection algorithm for gene expression data | |
CN105975983A (en) | Data classification method based on sparse Fisher vector coding | |
Li et al. | An ensemble classification algorithm of micro-blog sentiment based on feature selection and differential evolution |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190719 |