CN108763873A - Gene classification method and related device - Google Patents
- Publication number: CN108763873A
- Application number: CN201810522807.XA
- Authority: CN (China)
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The invention discloses a gene classification method and related devices. The method includes: standardizing the input gene data that serve as training samples, the gene data including several attributes; establishing a logistic regression optimization function that expresses the expected margin of all training samples, built from the attribute weight vector and from the difference between the distance from each training sample to its different-class neighbor samples and the distance from that training sample to its same-class neighbor samples; establishing a minimization model with the attribute weight vector as the norm constraint term, and computing the attribute weight vector iteratively; ranking the attribute features by the values of the resulting weight vector, training a classifier on the training gene data according to the ranked attribute features, and obtaining the optimal feature set that serves as the basis for classification, so that gene data to be classified can be classified. The gene classification method and related devices provided by the invention can achieve higher classification accuracy.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a gene classification method and system. The invention also relates to a gene classification device and a computer-readable storage medium.
Background art
In recent years, cancer research has attracted the attention of more and more researchers in medicine and biology. DNA microarray technology makes it possible to study cancer at the gene level and is one of the research methods currently in use. A DNA microarray, also called a gene chip, is an array of DNA probes distributed on a specially coated substrate. A single experiment can measure the expression data of thousands of genes, which provides a data basis for disease research.
After a large amount of gene expression data has been obtained, the genes that carry important disease information must be selected from it; that is, the genes are classified according to the information they carry. In the prior art this is done with machine learning methods. However, cancer gene expression data sets contain very few samples while each sample contains thousands of gene dimensions, so such methods require substantial computing resources. Moreover, only some of these genes play a decisive role in the classification problem. These issues make traditional machine learning methods very difficult to apply, and satisfactory classification results cannot be obtained.
Summary of the invention
In view of this, the present invention provides a gene classification method and related devices that can achieve higher classification accuracy.
To achieve the above object, the present invention provides the following technical solutions:
A gene classification method, including:
standardizing the input gene data that serve as training samples, the gene data including several attributes;
establishing a logistic regression optimization function that expresses the expected margin of all training samples, built from the attribute weight vector and from the difference between the distance from each training sample to its different-class neighbor samples and the distance from that training sample to its same-class neighbor samples; establishing a minimization model with the attribute weight vector as the norm constraint term; and computing the attribute weight vector iteratively;
ranking the attribute features by the values of the resulting weight vector, training a classifier on the training gene data according to the ranked attribute features, and obtaining the optimal feature set that serves as the basis for classification, so that gene data to be classified can be classified.
Optionally, standardizing the input gene data that serve as training samples includes applying min-max (range) normalization or z-score (standard deviation) normalization to those data.
Optionally, min-max normalization is applied to the input gene data that serve as training samples, with the following transformation function:
$$x_{ij}' = \frac{x_{ij} - \min_{i} x_{ij}}{\max_{i} x_{ij} - \min_{i} x_{ij}}$$
where the input training sample set is expressed as $\{(x_i, y_i)\}_{i=1}^{N}$, $x_i \in \mathbb{R}^I$; $y_i \in \{1, 2, \ldots, C\}$ is the label of sample $x_i$ and indicates its class; $N$ is the total number of training samples; $I$ is the dimension of the gene data; $x_{ij}$ is the value of attribute $j$ of the $i$-th sample; $\max_{i} x_{ij}$ is the maximum of attribute $j$ over all training samples; and $\min_{i} x_{ij}$ is the minimum of attribute $j$ over all training samples.
Optionally, the minimization model established is expressed as follows:
where $w^{t+1}$ denotes the attribute weight vector at the $(t+1)$-th iteration; $z_i^{t+1} = |x_i - \alpha_i H_i^{NM}| - |x_i - \beta_i H_i^{NH}|$; $H_i^{NM} \in \mathbb{R}^{I \times k}$ is the matrix of neighbor samples of $x_i$ among the different-class samples; $H_i^{NH} \in \mathbb{R}^{I \times k}$ is the matrix of neighbor samples of $x_i$ among the same-class samples; $\alpha_i$ is the coefficient vector of the different-class samples with respect to $x_i$; $\beta_i$ is the coefficient vector of the same-class samples with respect to $x_i$; $k$ is the number of neighbors, set a priori; and $T$ is the preset number of iterations. The input training sample set is expressed as $\{(x_i, y_i)\}_{i=1}^{N}$, $x_i \in \mathbb{R}^I$; $y_i \in \{1, 2, \ldots, C\}$ is the label of sample $x_i$ and indicates its class; $N$ is the total number of training samples; and $I$ is the dimension of the gene data.
Optionally, $\alpha_i$ and $\beta_i$ are obtained by solving the following minimization models, respectively:
where $w^t$ denotes the attribute weight vector at the $t$-th iteration.
Optionally, training a classifier on the training gene data according to the ranked attribute features and obtaining the optimal feature set that serves as the basis for classification includes:
performing ten-fold cross-validation on the training gene data according to the ranked attribute features, and selecting the attribute subset that gives the best classification performance to form the optimal feature set.
Optionally, the method further includes: performing feature selection on the training gene data according to the optimal feature set to obtain a new training sample set;
and classifying the gene data to be classified includes:
standardizing the gene data to be classified;
performing feature selection on the standardized gene data to be classified according to the optimal feature set;
for the gene data to be classified after feature selection, finding its nearest-neighbor sample in the new training sample set, and predicting the class of the gene data to be classified from the class of that nearest-neighbor sample.
A gene classification system, including:
a processing submodule, configured to standardize the input gene data that serve as training samples, the gene data including several attributes;
a computation submodule, configured to establish a logistic regression optimization function that expresses the expected margin of all training samples, built from the attribute weight vector and from the difference between the distance from each training sample to its different-class neighbor samples and the distance from that training sample to its same-class neighbor samples; to establish a minimization model with the attribute weight vector as the norm constraint term; and to compute the attribute weight vector iteratively;
a training submodule, configured to rank the attribute features by the values of the resulting weight vector, to train a classifier on the training gene data according to the ranked attribute features, and to obtain the optimal feature set that serves as the basis for classification, so that gene data to be classified can be classified.
A gene classification device, including:
a memory, configured to store a computer program;
a processor, configured to implement the steps of the gene classification method described above when executing the computer program.
A computer-readable storage medium on which a computer program is stored, the computer program implementing the steps of the gene classification method described above when executed by a processor.
As can be seen from the above technical solutions, the gene classification method provided by the present invention first standardizes the input gene data that serve as training samples. It then establishes a minimization model for the attribute weight vector of the gene data and computes that weight vector iteratively. Specifically, a logistic regression optimization function that expresses the expected margin of all training samples is established from the attribute weight vector and from the difference between the distance from each training sample to its different-class neighbor samples and the distance from that training sample to its same-class neighbor samples, and the minimization model is established with the attribute weight vector as the norm constraint term. The attribute features are then ranked by the values of the resulting weight vector, a classifier is trained on the training gene data according to the ranked attribute features, and the optimal feature set that serves as the basis for classification is obtained, so that gene data to be classified can be classified.
The gene classification method provided by the present invention is based on the margin-maximization principle. It establishes a logistic regression optimization function that expresses the expected margin of all training samples from the attribute weight vector and from the difference between the distance from each training sample to its different-class neighbor samples and the distance from that training sample to its same-class neighbor samples, and it establishes a minimization model with the attribute weight vector as the norm constraint term in order to solve for the attribute weight vector. The weight vector obtained in this way is sparser, so that gene classification can achieve higher classification accuracy.
The gene classification system provided by the present invention can achieve the above beneficial effects.
The gene classification device provided by the present invention can achieve the above beneficial effects.
The computer-readable storage medium provided by the present invention can achieve the above beneficial effects.
Brief description of the drawings
In order to explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a gene classification method provided by an embodiment of the present invention;
Fig. 2 is a flowchart of the method for classifying gene data to be classified in a gene classification method provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of a gene classification system provided by an embodiment of the present invention.
Detailed description of the embodiments
To enable those skilled in the art to better understand the technical solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings of those embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Referring to Fig. 1, a gene classification method provided by an embodiment of the present invention includes the following steps:
S10: Standardize the input gene data that serve as training samples, the gene data including several attributes.
The training sample set of the input gene expression data is expressed as $\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i \in \mathbb{R}^I$; $y_i \in \{1, 2, \ldots, C\}$ is the label of sample $x_i$ and indicates its class; $N$ is the total number of training samples; and $I$ is the dimension of the gene data, meaning that the gene data include $I$ attributes. Each class represents one disease.
Standardizing the gene data means removing the unit constraints of the attribute data and converting the data into dimensionless pure values.
Because the features of the gene data have different units in each dimension, this step standardizes the gene data in the training sample set, removing the unit constraints of each attribute and converting the data into dimensionless pure values, so that indicators of different units or magnitudes can be compared and weighted.
In a specific implementation, the training gene data may be standardized by min-max normalization or z-score normalization, but the method is not limited to these; other standardization methods may also be used and all fall within the protection scope of the present invention.
In one specific implementation of this method, min-max normalization is applied in this step to the input gene data that serve as training samples, with the following transformation function:
$$x_{ij}' = \frac{x_{ij} - \min_{i} x_{ij}}{\max_{i} x_{ij} - \min_{i} x_{ij}}$$
where $x_{ij}$ is the value of attribute $j$ of the $i$-th sample, $\max_{i} x_{ij}$ is the maximum of attribute $j$ over all training samples, and $\min_{i} x_{ij}$ is the minimum of attribute $j$ over all training samples.
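As an illustration only, the min-max normalization described above can be sketched in Python with NumPy. The function names `fit_min_max` and `min_max_scale`, the toy matrix, and the handling of constant attributes (mapped to 0 to avoid division by zero) are assumptions made for this sketch and do not come from the original text.

```python
import numpy as np

def fit_min_max(X):
    """Return the per-attribute minimum and maximum over the training samples (rows)."""
    return X.min(axis=0), X.max(axis=0)

def min_max_scale(X, col_min, col_max):
    """Map each attribute to [0, 1] using the training minimum and maximum.

    Assumption: an attribute that is constant in the training set is mapped to 0.
    """
    span = col_max - col_min
    safe_span = np.where(span == 0, 1.0, span)  # avoid division by zero
    return (X - col_min) / safe_span

# Toy training matrix: 3 samples (rows) with 3 attributes (columns).
X_train = np.array([[2.0, 10.0, 5.0],
                    [4.0, 30.0, 5.0],
                    [6.0, 20.0, 5.0]])
col_min, col_max = fit_min_max(X_train)
X_scaled = min_max_scale(X_train, col_min, col_max)
```

Keeping the fitted minimum and maximum separate from the scaling step makes it possible to reuse the training statistics later for a sample to be classified.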
S11: Establish a logistic regression optimization function that expresses the expected margin of all training samples, built from the attribute weight vector and from the difference between the distance from each training sample to its different-class neighbor samples and the distance from that training sample to its same-class neighbor samples; establish a minimization model with the attribute weight vector as the norm constraint term; and compute the attribute weight vector iteratively.
In this step, the weight vector of the gene data attributes is computed from the training samples. Specifically, a minimization model is established for the attribute weight vector of the gene data, and the attribute weight vector is obtained by solving that model.
Here, different-class neighbor samples are samples in the training sample set that do not belong to the same class as the current training sample and are neighbors of the current training sample; same-class neighbor samples are samples in the training sample set that belong to the same class as the current training sample and are neighbors of the current training sample. In this method, the minimization model is built on a logistic regression optimization function that expresses the expected margin of all training samples, established from the attribute weight vector and from the difference between the distance from each training sample to its different-class neighbor samples and the distance from that training sample to its same-class neighbor samples.
In one embodiment, this step specifically includes the following process:
S110: At $t = 0$, initialize $w^t = [w_1, w_2, \ldots, w_I]^T = [1/I, 1/I, \ldots, 1/I]^T$, where $w_j$ is the weight of attribute $j$. Set the number of iterations to $T$.
S111: Solve the established minimization model, which is expressed as follows:
where $w^{t+1}$ denotes the attribute weight vector at the $(t+1)$-th iteration; $z_i^{t+1} = |x_i - \alpha_i H_i^{NM}| - |x_i - \beta_i H_i^{NH}|$; $H_i^{NM} \in \mathbb{R}^{I \times k}$ is the matrix of neighbor samples of $x_i$ among the different-class samples; $H_i^{NH} \in \mathbb{R}^{I \times k}$ is the matrix of neighbor samples of $x_i$ among the same-class samples; $\alpha_i$ is the coefficient vector of the different-class samples with respect to $x_i$; $\beta_i$ is the coefficient vector of the same-class samples with respect to $x_i$; and $k$ is the number of neighbors, set a priori.
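The margin vector $z_i$ defined above can be sketched as follows. This is a minimal sketch under stated assumptions: the product of a neighbor matrix and its coefficient vector is read as the matrix-vector product `H @ coefficients` (the reading that matches the stated shape $I \times k$), the absolute value is taken element-wise, and the function name `margin_vector` and all numeric values are hypothetical.

```python
import numpy as np

def margin_vector(x, H_nm, alpha, H_nh, beta):
    """Element-wise margin of sample x.

    H_nm, H_nh: (I, k) matrices whose columns are the k nearest different-class
    and same-class neighbors of x. alpha, beta: length-k coefficient vectors.
    Assumption: the neighbor combination is the matrix-vector product H @ coeff.
    """
    different_class_point = H_nm @ alpha  # combination of different-class neighbors
    same_class_point = H_nh @ beta        # combination of same-class neighbors
    return np.abs(x - different_class_point) - np.abs(x - same_class_point)

# Toy example with I = 2 attributes and k = 2 neighbors.
x = np.array([1.0, 0.0])
H_nm = np.array([[3.0, 5.0],
                 [0.0, 0.0]])
H_nh = np.array([[1.0, 2.0],
                 [1.0, 1.0]])
alpha = np.array([0.5, 0.5])
beta = np.array([1.0, 0.0])
z = margin_vector(x, H_nm, alpha, H_nh, beta)
```

A positive entry of `z` means that, on that attribute, the sample lies farther from its different-class neighbors than from its same-class neighbors, which is what a discriminative attribute is expected to show.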
The method of this embodiment is based on the margin-maximization principle. It establishes a logistic regression optimization function that expresses the expected margin of all training samples from the attribute weight vector and from the difference between the distance from each training sample to its different-class neighbor samples and the distance from that training sample to its same-class neighbor samples, and it uses the attribute weight vector as the norm constraint term. In this embodiment the attribute weight vector is specifically used as an L1 norm constraint term to establish the minimization model and solve for the attribute weight vector. The weight vector obtained by this method is sparser.
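The formula of the minimization model is not reproduced in this text, so the following is only a sketch of one plausible form. It assumes the objective $\min_{w \ge 0} \frac{1}{N}\sum_i \log(1 + \exp(-w^T z_i)) + \lambda \|w\|_1$ and solves it by proximal gradient descent. The objective, the non-negativity constraint, the solver, the function name `solve_weights`, and the parameters `lam`, `step`, and `n_steps` are all assumptions, not the method stated in the source.

```python
import numpy as np

def solve_weights(Z, lam=0.1, step=0.1, n_steps=500):
    """Proximal gradient descent for an assumed L1-regularized logistic loss.

    Z: (N, I) matrix whose rows are the margin vectors z_i.
    Returns a non-negative weight vector of length I.
    """
    n_samples, n_attributes = Z.shape
    w = np.full(n_attributes, 1.0 / n_attributes)  # uniform start, as in S110
    for _ in range(n_steps):
        scores = Z @ w
        # Gradient of the mean of log(1 + exp(-scores)) with respect to w.
        grad = -(Z.T @ (1.0 / (1.0 + np.exp(scores)))) / n_samples
        # Gradient step, then L1 shrinkage combined with projection onto w >= 0.
        w = np.maximum(w - step * grad - step * lam, 0.0)
    return w

# Toy margins: attribute 0 is consistently positive, attribute 1 is symmetric noise.
Z = np.array([[1.0, 0.1],
              [1.0, -0.1],
              [1.0, 0.1],
              [1.0, -0.1]])
w = solve_weights(Z)
```

In this toy case the L1 term drives the weight of the uninformative attribute to exactly zero while the informative attribute keeps a positive weight, which illustrates the sparsity the text refers to.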
In this embodiment, $\alpha_i$ and $\beta_i$ can be obtained by solving the following minimization models, respectively:
where $w^t$ denotes the attribute weight vector at the $t$-th iteration.
S112: If $\|w^{t+1} - w^t\| \le \theta$, output the weight vector and let $w = w^{t+1}$; otherwise let $t = t + 1$ and return to step S111.
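The iteration in steps S110 to S112 can be sketched as a generic loop. In this sketch the per-iteration solve of step S111 is passed in as a hypothetical callable `update`; the default values of `theta` and `max_iter` are illustrative, and the toy update rule is a stand-in used only to show the stopping test, not the model of the source.

```python
import numpy as np

def iterate_weights(update, n_attributes, theta=0.01, max_iter=38):
    """Repeat w <- update(w) until the change in w is at most theta.

    update: hypothetical callable standing in for the solve in step S111.
    Returns the final weight vector and the number of iterations performed.
    """
    w = np.full(n_attributes, 1.0 / n_attributes)  # S110: uniform initial weights
    for t in range(max_iter):
        w_next = update(w)                          # S111: solve for the next weights
        if np.linalg.norm(w_next - w) <= theta:     # S112: convergence test
            return w_next, t + 1
        w = w_next
    return w, max_iter

# Toy stand-in update that moves halfway toward a fixed target each iteration.
target = np.array([1.0, 0.0, 0.0])
w_final, n_iter = iterate_weights(lambda w: 0.5 * (w + target), n_attributes=3)
```

Capping the loop at `max_iter` reflects the preset iteration count $T$, so the procedure terminates even if the tolerance is never reached.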
S12: Rank the attribute features by the values of the resulting weight vector, train a classifier on the training gene data according to the ranked attribute features, and obtain the optimal feature set that serves as the basis for classification, so that gene data to be classified can be classified.
In this step, the attribute features are ranked by the values of the weight vector obtained from the iteration in the previous step.
In a specific implementation, ten-fold cross-validation can be performed on the training gene data according to the ranked attribute features, and the attribute subset that gives the best classification performance is selected to form the optimal feature set. Specifically, a KNN classifier can be used to train on the training samples.
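One way to realize this selection step is sketched below using NumPy only. The sketch assumes that candidate subsets are the top-m attributes of the ranking for m = 1, 2, ..., that a 1-nearest-neighbor rule with Euclidean distance is the KNN classifier, and that ties are resolved in favor of the smaller subset. The function names, the fold assignment, and the toy data are hypothetical.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test):
    """1-nearest-neighbor prediction with squared Euclidean distance."""
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[d.argmin(axis=1)]

def cv_accuracy(X, y, n_folds=10, seed=0):
    """Accuracy of the 1-NN rule under n_folds-fold cross-validation."""
    idx = np.random.default_rng(seed).permutation(len(y))
    correct = 0
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)  # all samples outside the current fold
        correct += (knn_predict(X[train], y[train], X[fold]) == y[fold]).sum()
    return correct / len(y)

def select_features(X, y, w, n_folds=10):
    """Rank attributes by decreasing weight and keep the best top-m prefix."""
    order = np.argsort(-w)
    best_acc, best_m = -1.0, 0
    for m in range(1, len(order) + 1):
        acc = cv_accuracy(X[:, order[:m]], y, n_folds)
        if acc > best_acc:  # strict comparison keeps the smaller subset on ties
            best_acc, best_m = acc, m
    return order[:best_m], best_acc

# Toy data: attribute 0 separates the two classes, attributes 1 and 2 are noise.
rng = np.random.default_rng(1)
y = np.array([0] * 10 + [1] * 10)
X = np.column_stack([y + 0.01 * np.arange(20),
                     rng.normal(size=20),
                     rng.normal(size=20)])
w = np.array([0.9, 0.05, 0.0])
selected, acc = select_features(X, y, w)
```

For data with thousands of attributes, evaluating every prefix length is costly; restricting the search to attributes with nonzero weight would be a natural shortcut, although the source does not state this.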
The above describes the model-training process of the gene classification method provided by the present invention, which is carried out on training samples of gene data. The following describes how the input gene data to be classified are classified.
After the optimal feature set is obtained, the gene classification method of this embodiment further includes step S13: performing feature selection on the training gene data according to the optimal feature set to obtain a new training sample set.
In this step, the optimal feature set obtained is used to select features from the gene data in the training sample set, which yields the new training sample set.
Referring to Fig. 2, classifying a gene data sample to be classified specifically includes the following process:
S20: Standardize the gene data to be classified.
The input gene data to be classified are expressed as $x$, $x \in \mathbb{R}^I$.
In a specific implementation, the gene data to be classified may be standardized by min-max normalization or z-score normalization, but the method is not limited to these; other standardization methods may also be used and all fall within the protection scope of the present invention.
In one specific implementation of this embodiment, min-max normalization is applied to the input gene data to be classified, with the following transformation function:
$$x_j' = \frac{x_j - \min_{i} x_{ij}}{\max_{i} x_{ij} - \min_{i} x_{ij}}$$
where $x_j$ is the value of attribute $j$ of the gene data to be classified, $\max_{i} x_{ij}$ is the maximum of attribute $j$ over all training samples, and $\min_{i} x_{ij}$ is the minimum of attribute $j$ over all training samples.
S21: Perform feature selection on the standardized gene data to be classified according to the optimal feature set.
Feature selection is performed on the gene data $x$ to be classified according to the optimal feature set obtained, which yields the gene data $x'$.
S22: For the gene data to be classified after feature selection, find its nearest-neighbor sample in the new training sample set, and predict the class of the gene data to be classified from the class of that nearest-neighbor sample.
For the gene data $x'$ to be classified after feature selection, its nearest neighbor is found in the new training sample set, and the class of the gene data to be classified is then predicted from the class of the nearest-neighbor sample obtained.
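Steps S20 to S22 can be sketched together as follows. This is a minimal sketch that assumes min-max normalization with the training minimum and maximum, Euclidean distance for the nearest-neighbor search, and attributes that are constant in the training set being left unscaled. The function name `classify` and the toy values are hypothetical.

```python
import numpy as np

def classify(x, X_train_raw, y_train, feature_idx):
    """Normalize x with training statistics, select features, and apply 1-NN."""
    col_min = X_train_raw.min(axis=0)
    col_max = X_train_raw.max(axis=0)
    span = np.where(col_max > col_min, col_max - col_min, 1.0)  # guard constant attributes
    X_sel = ((X_train_raw - col_min) / span)[:, feature_idx]    # new training sample set
    x_sel = ((x - col_min) / span)[feature_idx]                 # S20 and S21
    nearest = np.argmin(((X_sel - x_sel) ** 2).sum(axis=1))     # S22: nearest neighbor
    return y_train[nearest]

# Toy training set with 2 attributes; only attribute 0 is in the optimal feature set.
X_train_raw = np.array([[0.0, 100.0],
                        [10.0, 0.0],
                        [1.0, 50.0],
                        [9.0, 60.0]])
y_train = np.array([1, 2, 1, 2])
predicted = classify(np.array([2.0, 0.0]), X_train_raw, y_train, feature_idx=[0])
```

The sample to be classified is scaled with the training minimum and maximum rather than its own values, which matches the definition of the transformation function given for step S20.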
The gene classification method of the present invention is described in detail below with a specific example.
The purpose of this specific example is to distinguish two different leukemias: acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML). The data set provided is divided into two subsets: 38 training samples (27 ALL, 11 AML), used to select genes and adjust the classifier weights; and 34 test samples (20 ALL, 14 AML), used to evaluate the performance of the results. Each sample has 7129 features, corresponding to normalized gene expression values extracted from microarray images. ALL is treated as class 1 and AML as class 2. The specific implementation process is as follows:
The model-training process includes the following steps:
(a) The training sample set of the input gene expression data is expressed as $\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i \in \mathbb{R}^I$; $y_i \in \{1, 2, \ldots, C\}$ is the label of sample $x_i$ and indicates its class; $N$ is the total number of training samples; and $I$ is the dimension of the gene data, meaning that the gene data include $I$ attributes. Each class represents one disease. This specific example includes the two classes ALL and AML, with $N = 38$ and $I = 7129$.
(b) Min-max normalization is applied to the gene data in the training sample set, with the following transformation function:
$$x_{ij}' = \frac{x_{ij} - \min_{i} x_{ij}}{\max_{i} x_{ij} - \min_{i} x_{ij}}$$
where $x_{ij}$ is the value of attribute $j$ of the $i$-th sample, $\max_{i} x_{ij}$ is the maximum of attribute $j$ over all training samples, and $\min_{i} x_{ij}$ is the minimum of attribute $j$ over all training samples.
(c) Compute the attribute weight vector.
(1) At $t = 0$, initialize $w^0 = [w_1, w_2, \ldots, w_I]^T = [1/I, 1/I, \ldots, 1/I]^T$, where $w_j$ is the weight of attribute $j$. Set the number of iterations to $T = 38$ and the allowable error to $\theta = 0.01$.
(2) Solve the established minimization model, which is expressed as follows:
where $w^{t+1}$ denotes the attribute weight vector at the $(t+1)$-th iteration; $z_i^{t+1} = |x_i - \alpha_i H_i^{NM}| - |x_i - \beta_i H_i^{NH}|$; $H_i^{NM} \in \mathbb{R}^{I \times k}$ is the matrix of neighbor samples of $x_i$ among the different-class samples; $H_i^{NH} \in \mathbb{R}^{I \times k}$ is the matrix of neighbor samples of $x_i$ among the same-class samples; $\alpha_i$ is the coefficient vector of the different-class samples with respect to $x_i$; $\beta_i$ is the coefficient vector of the same-class samples with respect to $x_i$; and $k$ is the number of neighbors, set a priori.
In this specific example, $k$ can be chosen from the set $\{2, 4, \ldots, 10\}$ by leave-one-out cross-validation.
(3) If $\|w^{t+1} - w^t\| \le \theta$, output the weight vector and let $w = w^{t+1}$; otherwise let $t = t + 1$ and return to step (2).
(d) Obtain the optimal feature set. The attribute features are ranked by the values of the weight vector computed in the above steps. According to the ranked attribute features, ten-fold cross-validation is performed on the training sample set with a KNN classifier, and the attribute subset that gives the best classification performance is selected to form the optimal feature set $F$.
(e) Feature selection is performed on the training sample set according to the optimal feature set $F$, which yields the training sample set after feature selection.
The classification process includes the following steps:
(a) Input the gene data $x$ to be classified, $x \in \mathbb{R}^{7129}$.
(b) Apply min-max normalization to the gene data to be classified, with the following transformation function:
$$x_j' = \frac{x_j - \min_{i} x_{ij}}{\max_{i} x_{ij} - \min_{i} x_{ij}}$$
(c) Perform feature selection on the gene data $x$ to be classified according to the optimal feature set $F$; the resulting gene data are denoted $x'$.
(d) For $x'$, find its nearest-neighbor sample in the new training sample set, and predict the class of the gene data $x$ to be classified from the class of that nearest-neighbor sample.
The 34 test samples of dimension 7129 were classified with this method. The results were compared with those obtained by the traditional Relief algorithm and the LH-Relief algorithm on the same data set, as shown in Table 1.
Table 1
| Method | Recognition rate (%) | Precision (%) | Recall (%) | F-measure (%) |
| This method | 99.13 | 99.14 | 98.75 | 98.97 |
| Relief algorithm | 74.35 | 72.19 | 71.88 | 71.73 |
| LH-Relief algorithm | 76.09 | 75.53 | 72.04 | 72.69 |
As the above results show, compared with the conventional methods, this method achieves a higher recognition rate and higher classification accuracy on gene data.
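For reference, the four measures reported in Table 1 can be computed from predicted and true labels as sketched below. The sketch assumes the standard binary definitions with one class treated as positive; the source does not state how its figures were averaged. The function name `binary_metrics` and the toy labels are hypothetical and are unrelated to the results in Table 1.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (recognition rate, precision, recall, F-measure) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f_measure

# Toy labels only: class 1 is treated as the positive class.
y_true = [1, 1, 1, 2, 2, 2, 2, 1]
y_pred = [1, 1, 2, 2, 2, 2, 1, 1]
metrics = binary_metrics(y_true, y_pred)
```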
Correspondingly, referring to Fig. 3, an embodiment of the present invention also provides a gene classification system, including:
a processing submodule 30, configured to standardize the input gene data that serve as training samples, the gene data including several attributes;
a computation submodule 31, configured to establish a logistic regression optimization function that expresses the expected margin of all training samples, built from the attribute weight vector and from the difference between the distance from each training sample to its different-class neighbor samples and the distance from that training sample to its same-class neighbor samples; to establish a minimization model with the attribute weight vector as the norm constraint term; and to compute the attribute weight vector iteratively;
a training submodule 32, configured to rank the attribute features by the values of the resulting weight vector, to train a classifier on the training gene data according to the ranked attribute features, and to obtain the optimal feature set that serves as the basis for classification, so that gene data to be classified can be classified.
As can be seen from the above, the gene classification system provided by this embodiment is based on the margin-maximization principle. It establishes a logistic regression optimization function that expresses the expected margin of all training samples from the attribute weight vector and from the difference between the distance from each training sample to its different-class neighbor samples and the distance from that training sample to its same-class neighbor samples, and it establishes a minimization model with the attribute weight vector as the norm constraint term in order to solve for the attribute weight vector. The weight vector obtained in this way is sparser, so that gene classification can achieve higher classification accuracy.
In a specific implementation, for the way each module of the gene classification system of this embodiment processes gene data, refer to the detailed description in the above embodiment of the gene classification method; the details are not repeated here.
Correspondingly, an embodiment of the present invention also provides a gene classification device, including:
a memory, configured to store a computer program;
a processor, configured to implement the steps of the gene classification method described above when executing the computer program.
The gene classification device provided by this embodiment is based on the margin-maximization principle. It establishes a logistic regression optimization function that expresses the expected margin of all training samples from the attribute weight vector and from the difference between the distance from each training sample to its different-class neighbor samples and the distance from that training sample to its same-class neighbor samples, and it establishes a minimization model with the attribute weight vector as the norm constraint term in order to solve for the attribute weight vector. The weight vector obtained in this way is sparser, so that gene classification can achieve higher classification accuracy.
Correspondingly, an embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored, the computer program implementing the steps of the gene classification method described above when executed by a processor.
When the computer program stored on the computer-readable storage medium provided by this embodiment is executed by a processor, it operates on the margin-maximization principle. It establishes a logistic regression optimization function that expresses the expected margin of all training samples from the attribute weight vector and from the difference between the distance from each training sample to its different-class neighbor samples and the distance from that training sample to its same-class neighbor samples, and it establishes a minimization model with the attribute weight vector as the norm constraint term in order to solve for the attribute weight vector. The weight vector obtained in this way is sparser, so that gene classification can achieve higher classification accuracy.
The gene classification method and related devices provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. It should be noted that those of ordinary skill in the art can make several improvements and modifications to the present invention without departing from its principle, and these improvements and modifications also fall within the protection scope of the claims of the present invention.
Claims (10)
1. A gene classification method, characterized by including:
standardizing the input gene data that serve as training samples, the gene data including several attributes;
establishing a logistic regression optimization function that expresses the expected margin of all training samples, built from the attribute weight vector and from the difference between the distance from each training sample to its different-class neighbor samples and the distance from that training sample to its same-class neighbor samples; establishing a minimization model with the attribute weight vector as the norm constraint term; and computing the attribute weight vector iteratively;
ranking the attribute features by the values of the resulting weight vector, training a classifier on the training gene data according to the ranked attribute features, and obtaining the optimal feature set that serves as the basis for classification, so that gene data to be classified can be classified.
2. The gene classification method according to claim 1, characterized in that standardizing the input gene data serving as training samples comprises: applying min-max (deviation) standardization or standard-deviation standardization to the input gene data serving as training samples.
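As an illustration of the standard-deviation option named in claim 2, the following is a minimal sketch under stated assumptions, not the patented implementation. The function name `zscore_columns`, the use of the population standard deviation, and the handling of constant attributes are all assumptions made for the example.

```python
import math

def zscore_columns(samples):
    """Standardize each attribute (column) to zero mean and unit standard deviation."""
    n = len(samples)
    dims = len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(dims)]
    stds = []
    for j in range(dims):
        # Population variance of attribute j over all training samples.
        var = sum((row[j] - means[j]) ** 2 for row in samples) / n
        # Assumption: a constant attribute gets std 1.0 to avoid division by zero.
        stds.append(math.sqrt(var) or 1.0)
    return [[(row[j] - means[j]) / stds[j] for j in range(dims)] for row in samples]

# Two samples with two attributes; the second attribute is constant.
scaled = zscore_columns([[1.0, 5.0], [3.0, 5.0]])
```

The means and standard deviations would normally be kept so that data to be classified can be scaled in the same way.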
3. The gene classification method according to claim 1, characterized in that min-max (deviation) standardization is applied to the input gene data serving as training samples, with the following transfer function:
$$x_{ij}^{*} = \frac{x_{ij} - \min_{1 \le i \le N} x_{ij}}{\max_{1 \le i \le N} x_{ij} - \min_{1 \le i \le N} x_{ij}}$$
where the input training sample set is expressed as $\{(x_i, y_i)\}_{i=1}^{N}$, $x_i \in \mathbb{R}^{I}$; $y_i \in \{1, 2, \ldots, C\}$ is the label of sample $x_i$ and indicates its class; $N$ is the total number of training samples; $I$ is the dimension of the gene data; $x_{ij}$ is the value of attribute $j$ of the $i$-th sample;
$\max_{1 \le i \le N} x_{ij}$ is the maximum of attribute $j$ over all training samples;
$\min_{1 \le i \le N} x_{ij}$ is the minimum of attribute $j$ over all training samples.
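The min-max standardization of claim 3 maps each attribute onto the range [0, 1] using that attribute's minimum and maximum over the training samples. A minimal sketch follows. The function name `min_max_scale` and the choice to map a constant attribute to 0.0 are assumptions for illustration.

```python
def min_max_scale(samples):
    """Scale each attribute (column) to [0, 1] using its min and max over all samples."""
    n_attrs = len(samples[0])
    lows = [min(row[j] for row in samples) for j in range(n_attrs)]
    highs = [max(row[j] for row in samples) for j in range(n_attrs)]
    scaled = []
    for row in samples:
        scaled.append([
            # Assumption: a constant attribute (max == min) is mapped to 0.0.
            (row[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
            for j in range(n_attrs)
        ])
    return scaled, lows, highs

# Three samples with two attributes each.
scaled, lows, highs = min_max_scale([[2.0, 10.0], [4.0, 30.0], [6.0, 20.0]])
```

Returning `lows` and `highs` makes it possible to apply the same transformation later to gene data that is to be classified.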
4. The gene classification method according to claim 1, characterized in that the minimization model that is built is expressed as follows:
[equation image not reproduced in this text]
where $w^{t+1}$ is the weight vector of the attributes at iteration $t+1$; $z_i^{t+1} = |x_i - \alpha_i H_i^{NM}| - |x_i - \beta_i H_i^{NH}|$; $H_i^{NM} \in \mathbb{R}^{I \times k}$ is the matrix of neighbor samples of $x_i$ among the other-class samples; $H_i^{NH} \in \mathbb{R}^{I \times k}$ is the matrix of neighbor samples of $x_i$ among the same-class samples; $\alpha_i$ is the coefficient vector of the other-class samples with respect to $x_i$; $\beta_i$ is the coefficient vector of the same-class samples with respect to $x_i$; $k$ is the number of neighbors, set a priori; and $T$ is the set number of iterations. The input training sample set is expressed as $\{(x_i, y_i)\}_{i=1}^{N}$, $x_i \in \mathbb{R}^{I}$; $y_i \in \{1, 2, \ldots, C\}$ is the label of sample $x_i$ and indicates its class; $N$ is the total number of training samples; and $I$ is the dimension of the gene data.
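To make the margin vector $z_i$ concrete, the sketch below computes it for one sample and evaluates a logistic loss with a norm penalty. Several points are assumptions, because the model equation itself is not reproduced in this text: the neighbor matrix is multiplied by the coefficient vector (the only dimensionally consistent reading for an $I \times k$ matrix and a length-$k$ vector), the absolute value is taken element by element, the norm term is an L1 norm, and the names `margin_vector`, `l1_logistic_objective` and `lam` are hypothetical.

```python
import math

def mat_vec(H, coeffs):
    """Multiply an I x k matrix (given as a list of I rows) by a length-k vector."""
    return [sum(h * c for h, c in zip(row, coeffs)) for row in H]

def margin_vector(x, H_nm, alpha, H_nh, beta):
    """Element-wise |x - H_nm*alpha| - |x - H_nh*beta| for one sample x."""
    miss = mat_vec(H_nm, alpha)  # combination of other-class neighbors
    hit = mat_vec(H_nh, beta)    # combination of same-class neighbors
    return [abs(xj - mj) - abs(xj - hj) for xj, mj, hj in zip(x, miss, hit)]

def log_loss(m):
    """Numerically stable log(1 + exp(-m))."""
    return max(-m, 0.0) + math.log1p(math.exp(-abs(m)))

def l1_logistic_objective(w, margins, lam):
    """Assumed objective: sum_i log(1 + exp(-w . z_i)) + lam * ||w||_1."""
    loss = sum(log_loss(sum(wj * zj for wj, zj in zip(w, z))) for z in margins)
    return loss + lam * sum(abs(wj) for wj in w)

# One sample with I = 2 attributes and k = 2 neighbors per class.
x = [1.0, 0.0]
H_nm = [[3.0, 5.0], [0.0, 0.0]]  # each column is one other-class neighbor
H_nh = [[1.0, 2.0], [1.0, 1.0]]  # each column is one same-class neighbor
z = margin_vector(x, H_nm, [0.5, 0.5], H_nh, [1.0, 0.0])
```

A positive entry of `z` means that, on that attribute, the sample lies farther from its other-class neighbors than from its same-class neighbors, which is what a large weight on that attribute rewards.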
5. The gene classification method according to claim 4, characterized in that $\alpha_i$ and $\beta_i$ are obtained respectively by solving the following minimization models:
[equation images not reproduced in this text]
where $w^{t}$ is the weight vector of the attributes at iteration $t$.
6. The gene classification method according to claim 1, characterized in that performing classification training on the gene data serving as training samples according to the ranked attribute features, to obtain the optimal feature set that serves as the basis of classification, comprises:
performing ten-fold cross-validation on the gene data serving as training samples according to the ranked attribute features, and selecting the attribute subset that gives the best classification result to form the optimal feature set.
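The selection step of claim 6 can be sketched as follows. This is an illustration under assumptions rather than the method as filed: the candidate subsets are taken to be the top-m attributes of the ranking, the classifier is a 1-nearest-neighbor rule, folds are assigned by index modulo 10 rather than at random, and all function names are hypothetical.

```python
def nearest_label(query, train_X, train_y, features):
    """Label of the training sample closest to `query` on the chosen attributes."""
    best_dist, best_label = float("inf"), None
    for row, label in zip(train_X, train_y):
        dist = sum((query[j] - row[j]) ** 2 for j in features)
        if dist < best_dist:
            best_dist, best_label = dist, label
    return best_label

def cv_accuracy(X, y, features, folds=10):
    """Cross-validated accuracy of the 1-NN rule restricted to `features`."""
    correct = 0
    for fold in range(folds):
        train_idx = [i for i in range(len(X)) if i % folds != fold]
        test_idx = [i for i in range(len(X)) if i % folds == fold]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        for i in test_idx:
            if nearest_label(X[i], train_X, train_y, features) == y[i]:
                correct += 1
    return correct / len(X)

def select_top_features(X, y, weights, folds=10):
    """Rank attributes by weight (descending) and keep the best top-m prefix."""
    ranked = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    best_acc, best_subset = -1.0, []
    for m in range(1, len(ranked) + 1):
        acc = cv_accuracy(X, y, ranked[:m], folds)
        if acc > best_acc:  # strict ">" keeps the smallest subset on ties
            best_acc, best_subset = acc, ranked[:m]
    return best_subset, best_acc

# Toy data: attribute 0 separates the two classes, attribute 1 is noise.
X = [[(i % 2) + 0.01 * i, float((i * 7) % 5)] for i in range(20)]
y = [i % 2 for i in range(20)]
subset, acc = select_top_features(X, y, weights=[0.9, 0.1])
```

On this toy data the informative attribute alone already classifies every held-out sample correctly, so the noise attribute is left out of the selected subset.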
7. The gene classification method according to claim 1, characterized by further comprising: performing feature selection on the gene data serving as training samples according to the optimal feature set, to obtain a new training sample set;
the method of classifying the gene data to be classified comprises:
standardizing the gene data to be classified;
performing feature selection on the standardized gene data to be classified according to the optimal feature set;
for the gene data to be classified after feature selection, finding its nearest-neighbor sample in the new training sample set, and predicting the class of the gene data to be classified from the class of that nearest-neighbor sample.
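The three classification steps of claim 7 (standardize, select features, find the nearest neighbor) might look like the sketch below. It assumes min-max scaling with the minimum and maximum taken from the training data and squared Euclidean distance; the claim does not fix either choice, and the function name `classify` and its parameters are hypothetical.

```python
def classify(sample, train_X, train_y, lows, highs, features):
    """Predict the class of `sample` with a 1-nearest-neighbor rule.

    `train_X` is assumed to be already scaled and already reduced to the
    attributes listed in `features`; `lows` and `highs` are the per-attribute
    minimum and maximum taken from the original training data.
    """
    # Steps 1 and 2: scale the sample and keep only the selected attributes.
    scaled = [
        (sample[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
        for j in features
    ]
    # Step 3: the nearest training sample decides the predicted class.
    best_dist, best_label = float("inf"), None
    for row, label in zip(train_X, train_y):
        dist = sum((a - b) ** 2 for a, b in zip(scaled, row))
        if dist < best_dist:
            best_dist, best_label = dist, label
    return best_label

# Training set reduced to one selected attribute, with class labels 1 and 2.
train_X = [[0.0], [0.1], [0.9], [1.0]]
train_y = [1, 1, 2, 2]
label = classify([8.0, 123.0], train_X, train_y,
                 lows=[0.0, 0.0], highs=[10.0, 200.0], features=[0])
```

Here the raw value 8.0 scales to 0.8, whose nearest training value is 0.9, so the sample receives that neighbor's class.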
8. A gene classification system, characterized by comprising:
a processing submodule for standardizing input gene data serving as training samples, the gene data comprising several attributes;
a computation submodule for building, from the difference between the margin separating a training sample from its other-class neighbor samples and the margin separating the training sample from its same-class neighbor samples, and from the weight vector of the attributes, a logistic regression optimization function representing the expected margin difference over all training samples, for building a minimization model with the weight vector of the attributes as a norm constraint term, and for calculating the weight vector of the attributes by iterative computation;
a training submodule for ranking the attribute features according to the values of the obtained weight vector, and for performing classification training on the gene data serving as training samples according to the ranked attribute features, to obtain an optimal feature set that serves as the basis of classification, so as to classify gene data to be classified.
9. A gene classification device, characterized by comprising:
a memory for storing a computer program;
a processor for implementing the steps of the gene classification method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the gene classification method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810522807.XA CN108763873A (en) | 2018-05-28 | 2018-05-28 | A kind of gene sorting method and relevant device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810522807.XA CN108763873A (en) | 2018-05-28 | 2018-05-28 | A kind of gene sorting method and relevant device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108763873A (en) | 2018-11-06 |
Family ID: 64002900
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201810522807.XA, published as CN108763873A (en) | Pending | 2018-05-28 | 2018-05-28 | A kind of gene sorting method and relevant device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108763873A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109670552A (en) * | 2018-12-24 | 2019-04-23 | Soochow University | An image classification method, device, equipment and readable storage medium |
CN113971604A (en) * | 2020-07-22 | 2022-01-25 | China Mobile (Suzhou) Software Technology Co., Ltd. | Data processing method, device and storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2009067655A2 (en) * | 2007-11-21 | 2009-05-28 | University Of Florida Research Foundation, Inc. | Methods of feature selection through local learning; breast and prostate cancer prognostic markers |
CN101923604A (en) * | 2010-07-23 | 2010-12-22 | Fujian Normal University | Classification method for weighted KNN oncogene expression profiles based on neighborhood rough set |
CN104598774A (en) * | 2015-02-04 | 2015-05-06 | Henan Normal University | Feature gene selection method based on logistic and relevant information entropy |
CN105938523A (en) * | 2016-03-31 | 2016-09-14 | Shaanxi Normal University | Feature selection method and application based on feature identification degree and independence |
CN106991296A (en) * | 2017-04-01 | 2017-07-28 | Dalian University of Technology | Ensemble classifier method based on the greedy feature selecting of randomization |
CN107193993A (en) * | 2017-06-06 | 2017-09-22 | Soochow University | The medical data sorting technique and device selected based on local learning characteristic weight |
CN107563435A (en) * | 2017-08-30 | 2018-01-09 | Harbin Institute of Technology Shenzhen Graduate School | Higher-dimension unbalanced data sorting technique based on SVM |
Non-Patent Citations (3)
Title |
---|
Hongmin Cai et al.: "Feature weight estimation for gene selection: a local hyperlinear learning approach", BMC Bioinformatics * |
Yijun Sun et al.: "A Local-Learning-Based Feature Selection for High-Dimensional Data Analysis", IEEE Transactions on Pattern Analysis and Machine Intelligence * |
Pan Wei et al.: "Research on a feature selection method based on margin loss and L1-norm regularization", Intelligent Computer and Applications * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Brancati et al. | A deep learning approach for breast invasive ductal carcinoma detection and lymphoma multi-classification in histological images | |
Derrac et al. | Fuzzy nearest neighbor algorithms: Taxonomy, experimental analysis and prospects | |
US7088854B2 (en) | Method and apparatus for generating special-purpose image analysis algorithms | |
CN114730463A (en) | Multi-instance learner for tissue image classification | |
Argyriou et al. | An algorithm for transfer learning in a heterogeneous environment | |
US20180165413A1 (en) | Gene expression data classification method and classification system | |
Javed et al. | Multiplex cellular communities in multi-gigapixel colorectal cancer histology images for tissue phenotyping | |
CN106991430A (en) | A kind of cluster number based on point of proximity method automatically determines Spectral Clustering | |
CN108629373A (en) | A kind of image classification method, system, equipment and computer readable storage medium | |
CN112800927B (en) | Butterfly image fine-granularity identification method based on AM-Softmax loss | |
Intrator | Making a low-dimensional representation suitable for diverse tasks | |
Qin et al. | Spot detection and image segmentation in DNA microarray data | |
Kumar et al. | An amalgam method efficient for finding of cancer gene using CSC from micro array data | |
CN108763873A (en) | A kind of gene sorting method and relevant device | |
Menaka et al. | Chromenet: A CNN architecture with comparison of optimizers for classification of human chromosome images | |
Yang et al. | High throughput analysis of breast cancer specimens on the grid | |
Vengatesan et al. | The performance analysis of microarray data using occurrence clustering | |
Krishnapuram et al. | Joint classifier and feature optimization for cancer diagnosis using gene expression data | |
Arora | Classification of human metaspread images using convolutional neural networks | |
Weber et al. | Perron cluster analysis and its connection to graph partitioning for noisy data | |
Rathore et al. | CBISC: a novel approach for colon biopsy image segmentation and classification | |
Yao et al. | Augdmc: Data augmentation guided deep multiple clustering | |
CN116612307A (en) | Solanaceae disease grade identification method based on transfer learning | |
CN113177602B (en) | Image classification method, device, electronic equipment and storage medium | |
CN113838519B (en) | Gene selection method and system based on adaptive gene interaction regularization elastic network model |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20181106 |