CN107193993A - Medical data classification method and device based on local learning feature weight selection - Google Patents
- Publication number
- CN107193993A (application number CN201710419357.7A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a medical data classification method based on local learning feature weight selection. Attribute values of the samples are first obtained from a training sample set, and the weight vector corresponding to the attributes is computed from those values using a gradient-descent weight update. This guarantees convergence, so the stopping criterion of the algorithm is reached quickly, which reduces both computation time and computational complexity. Feature selection is then performed according to the computed weight vector to obtain an optimal feature subset. The data sample to be evaluated is normalized, restricted to the optimal feature subset, and then classified, so that the sample is reduced in dimension. The method provided by the embodiments of the invention therefore achieves dimensionality reduction while also lowering computational complexity and computation time. The invention further provides a medical data classification device based on local learning feature weight selection, which achieves the same technical effects.
Description
Technical field
The present invention relates to the field of medical diagnosis, and more specifically to a medical data classification method and device based on local learning feature weight selection.
Background
With the development of artificial intelligence, computer technology plays an increasingly important role in medicine and has brought artificial intelligence into the medical field. Medical diagnosis systems that combine computer technology with the authoritative knowledge and experience of medical experts from many fields can solve a variety of clinical problems effectively and serve as an aid to diagnosis.
DNA microarray technology, i.e. the gene chip, has been introduced into medical diagnosis systems. A gene chip can quantitatively measure the expression levels of a large number of genes at the same time, and these data can be used to study the nature of organisms. However, the development of DNA microarray technology has led to explosive growth in gene expression data, and selecting the important genes from this large volume of data poses a new challenge for the prior art.
The local hyperplane (Local Hyperplane, LH-Relief) algorithm can reduce the dimensionality of large volumes of gene expression data: it screens out uninformative gene expression data, selects the important genes, and reduces redundancy. However, when the algorithm is applied to noisy or high-dimensional data, its convergence cannot be guaranteed, so its computational complexity is high.
Therefore, how to reduce the computational complexity of the algorithm while reducing the dimensionality of large volumes of gene data is a problem that those skilled in the art need to solve.
Summary of the invention
The object of the invention is to provide a medical data classification method based on local learning feature weight selection that reduces the computational complexity of the algorithm while reducing the dimensionality of large volumes of gene data.
To achieve this object, the embodiments of the invention provide the following technical solution:
A medical data classification method based on local learning feature weight selection, comprising:
S101: obtaining a first sample set of medical data, and obtaining first sample attributes;
S102: setting an initial weight vector for the first sample attributes, and taking the initial weight vector as the current weight vector;
S103: updating the current weight vector by a gradient-descent update to obtain the next weight vector after one iteration;
S104: judging whether the decision rule holds; if so, taking the next weight vector as the final weight vector and performing S105; if not, taking the next weight vector as the current weight vector and returning to S103; wherein the decision rule is $\|w^{t+1} - w^{t}\| \le \theta$, $w^{t}$ is the current weight vector, $w^{t+1}$ is the next weight vector, and $\theta$ is the stopping criterion;
S105: performing feature selection according to the final weight vector to obtain a feature index subset;
S106: performing feature selection on the first sample set according to the feature index subset to obtain a second sample set after feature selection;
S107: obtaining first data to be evaluated, and performing feature selection according to the feature index subset to obtain second data to be evaluated;
S108: classifying the second data to be evaluated on the second sample set to obtain a classification result.
Preferably, obtaining the first sample set of medical data and obtaining the first sample attributes comprises:
obtaining the first sample set of medical data, obtaining the first sample attributes, and applying min-max normalization to the first sample set.
Preferably, updating the current weight vector by a gradient-descent update to obtain the next weight vector after one iteration comprises:
updating the current weight vector by the update rule [formula omitted in source] to obtain the next weight vector $w^{t+1}$ after one iteration, where $J(w)$ is the optimization objective function, obtained by maximizing $J(w) = (z_i^{t+1})^{T} w^{t+1}$.
Preferably, obtaining the first data to be evaluated and performing feature selection according to the feature index subset to obtain the second data to be evaluated comprises:
obtaining the first data to be evaluated, applying min-max normalization, and performing feature selection according to the feature index subset to obtain the second data to be evaluated.
Preferably, classifying the second data to be evaluated on the second sample set to obtain a classification result comprises:
classifying the second data to be evaluated on the second sample set using a k-nearest-neighbor classifier to obtain the classification result.
A medical data classification device based on local learning feature weight selection, comprising:
a first sample set acquisition module, configured to obtain a first sample set of medical data and obtain first sample attributes;
an initial weight vector setting module, configured to set an initial weight vector for the first sample attributes and take the initial weight vector as the current weight vector;
a next weight vector acquisition module, configured to update the current weight vector by a gradient-descent update and obtain the next weight vector after one iteration;
a judgment module, configured to judge whether the decision rule holds; if so, to take the next weight vector as the final weight vector and call the feature index subset acquisition module; if not, to take the next weight vector as the current weight vector and call the next weight vector acquisition module; wherein the decision rule is $\|w^{t+1} - w^{t}\| \le \theta$, $w^{t}$ is the current weight vector, $w^{t+1}$ is the next weight vector, and $\theta$ is the stopping criterion;
the feature index subset acquisition module, configured to perform feature selection according to the final weight vector and obtain a feature index subset;
a second sample set acquisition module, configured to perform feature selection on the first sample set according to the feature index subset and obtain a second sample set after feature selection;
a second to-be-evaluated data acquisition module, configured to obtain first data to be evaluated and perform feature selection according to the feature index subset to obtain second data to be evaluated;
a classification module, configured to classify the second data to be evaluated on the second sample set and obtain a classification result.
Preferably, the first sample set acquisition module is specifically configured to:
obtain the first sample set of medical data, obtain the first sample attributes, and apply min-max normalization to the first sample set.
Preferably, the next weight vector acquisition module is specifically configured to:
update the current weight vector by the update rule [formula omitted in source] to obtain the next weight vector $w^{t+1}$ after one iteration, where $J(w)$ is the optimization objective function, obtained by maximizing $J(w) = (z_i^{t+1})^{T} w^{t+1}$.
Preferably, the second to-be-evaluated data acquisition module is specifically configured to:
obtain the first data to be evaluated, apply min-max normalization, and perform feature selection according to the feature index subset to obtain the second data to be evaluated.
Preferably, the classification module is specifically configured to:
classify the second data to be evaluated on the second sample set using a k-nearest-neighbor classifier to obtain the classification result.
With the above scheme, the medical data classification method based on local learning feature weight selection provided by the embodiments of the invention first obtains the attribute values of the samples from the training sample set and computes the weight vector corresponding to the attributes from those values using a gradient-descent weight update. This guarantees convergence, so the stopping criterion of the algorithm is reached quickly, which reduces both computation time and computational complexity. Feature selection is then performed according to the computed weight vector to obtain an optimal feature subset. The data sample to be evaluated is normalized, restricted to the optimal feature subset, and then classified, so that the sample is reduced in dimension. The method provided by the embodiments of the invention therefore achieves dimensionality reduction while also lowering computational complexity and computation time. The invention further provides a medical data classification device based on local learning feature weight selection, which achieves the same technical effects.
Brief description of the drawings
To explain the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a medical data classification method disclosed in an embodiment of the invention;
Fig. 2 is a structural diagram of a medical data classification device disclosed in an embodiment of the invention;
Fig. 3 compares the convergence results of a medical data classification method disclosed in an embodiment of the invention with those of LH-Relief;
Fig. 4 compares the average performance of a medical data classification method disclosed in an embodiment of the invention with that of LH-Relief.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are obviously only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the invention without creative effort fall within the scope of protection of the invention.
Referring to Fig. 1, an embodiment of the invention discloses a medical data classification method based on local learning feature weight selection. Specifically:
S101: Obtain a first sample set of medical data and obtain the first sample attributes.
Specifically, the first sample set of medical data $\{(x_i, y_i)\}_{i=1}^{N}$ is obtained, and the sample attributes of the first sample set are taken as the first sample attributes. Here $x_i \in \mathbb{R}^{I}$, $y_i \in \{1, 2, \ldots, C\}$ is the label of $x_i$ and indicates its class, N is the number of training samples, I is the dimension of a sample, and C is the total number of classes.
S102: Set an initial weight vector for the first sample attributes and take the initial weight vector as the current weight vector.
Specifically, the initial weight vector is set to $w^{0} = [1/I, 1/I, \ldots, 1/I]^{T}$. With t denoting the iteration count, t = 0 at this point, i.e. iteration has not yet started, and the initial weight vector $w^{0}$ is taken as the current weight vector $w^{t}$.
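The initialization in S102 can be sketched as follows. This is only an illustration: the function name `initial_weights` is an assumption and does not appear in the source.

```python
from typing import List


def initial_weights(num_features: int) -> List[float]:
    """Return the uniform starting vector w^0 = [1/I, ..., 1/I] for I features."""
    if num_features <= 0:
        raise ValueError("num_features must be positive")
    return [1.0 / num_features] * num_features


# Example with I = 4 features (hypothetical size).
w0 = initial_weights(4)
```

Every feature starts with the same weight, so no attribute is favored before the first iteration.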
S103: Update the current weight vector by a gradient-descent update to obtain the next weight vector after one iteration.
Specifically, one iteration is performed: the current weight vector $w^{t}$ is updated once by the gradient-descent update to obtain the next weight vector $w^{t+1}$.
S104: Judge whether the decision rule holds; if so, take the next weight vector as the final weight vector and perform S105; if not, take the next weight vector as the current weight vector and return to S103. The decision rule is $\|w^{t+1} - w^{t}\| \le \theta$, where $w^{t}$ is the current weight vector, $w^{t+1}$ is the next weight vector, and $\theta$ is the stopping criterion.
Specifically, a stopping criterion $\theta$ is set and it is judged whether $\|w^{t+1} - w^{t}\| \le \theta$ holds. If it holds, the next weight vector $w^{t+1}$ is taken as the final weight vector $w = [w_1, w_2, \ldots, w_I]^{T} \in \mathbb{R}^{I}$ and S105 is performed. If it does not hold, the next weight vector $w^{t+1}$ is taken as the current weight vector $w^{t}$ and the method returns to S103 for a new iteration.
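The loop formed by S103 and S104 can be sketched as below. Several points are assumptions rather than statements of the source: the norm is taken to be Euclidean, the update is passed in as a generic callable because the update formula is not reproduced in this text, and the `max_iter` guard and the toy update are purely illustrative.

```python
import math
from typing import Callable, List

Vector = List[float]


def l2_distance(a: Vector, b: Vector) -> float:
    """Euclidean norm of a - b (assumed reading of ||w^{t+1} - w^t||)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def iterate_until_stable(w0: Vector, update: Callable[[Vector], Vector],
                         theta: float, max_iter: int = 1000) -> Vector:
    """Repeat S103 until the decision rule of S104 holds."""
    w_t = list(w0)
    for _ in range(max_iter):          # max_iter is a safety guard, not in the source
        w_next = update(w_t)           # S103: one update of the current vector
        if l2_distance(w_next, w_t) <= theta:
            return w_next              # S104: rule holds, w_next is the final vector
        w_t = w_next                   # S104: rule fails, next becomes current
    return w_t


# Toy stand-in for the real update: move halfway towards a fixed target.
target = [0.7, 0.2, 0.1]


def toy_update(w: Vector) -> Vector:
    return [wi + 0.5 * (ti - wi) for wi, ti in zip(w, target)]


w_final = iterate_until_stable([1 / 3] * 3, toy_update, theta=1e-6)
```

With the toy update the step size halves on every pass, so the loop stops once consecutive vectors differ by at most `theta`.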
S105: Perform feature selection according to the final weight vector to obtain a feature index subset.
Specifically, feature selection is performed according to the final weight vector w, guided by classification accuracy, to obtain the corresponding feature index subset $F$. This reduces the feature dimension of the first samples and thereby reduces the amount of computation and the computation time.
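A simplified sketch of turning a weight vector into a feature index subset is given below. The source selects the subset according to classification accuracy and does not give the exact procedure; this sketch substitutes a plain "keep the m largest weights" rule, and both the function name `top_weighted_features` and the parameter `m` are hypothetical.

```python
from typing import List


def top_weighted_features(w: List[float], m: int) -> List[int]:
    """Return the indices of the m largest weights, in ascending index order.

    Simplification: the source chooses the subset by classification accuracy;
    here m is simply given by the caller.
    """
    if not 0 < m <= len(w):
        raise ValueError("m must be between 1 and len(w)")
    ranked = sorted(range(len(w)), key=lambda j: w[j], reverse=True)
    return sorted(ranked[:m])


# Hypothetical final weights for I = 4 features.
F = top_weighted_features([0.05, 0.40, 0.10, 0.45], m=2)
```

In an accuracy-driven variant, one could evaluate several values of `m` with a classifier and keep the subset that scores best.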
S106: Perform feature selection on the first sample set according to the feature index subset to obtain a second sample set after feature selection.
Specifically, feature selection is performed on the first sample set $\{(x_i, y_i)\}_{i=1}^{N}$ according to the feature index subset $F$ to obtain the second sample set, in which each sample satisfies $x_i \in \mathbb{R}^{|F|}$ with $|F| < I$.
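Restricting samples to a feature index subset amounts to keeping only the selected columns. A minimal sketch, with `select_features` as an illustrative name and made-up data:

```python
from typing import List, Sequence


def select_features(samples: Sequence[Sequence[float]],
                    feature_idx: Sequence[int]) -> List[List[float]]:
    """Keep only the columns listed in feature_idx, for every sample."""
    return [[row[j] for j in feature_idx] for row in samples]


# First sample set with I = 4 attributes (values are made up).
first_set = [[0.1, 0.9, 0.3, 0.7],
             [0.4, 0.2, 0.8, 0.6]]
second_set = select_features(first_set, [1, 3])   # |F| = 2 < I = 4
```

The same function can be applied to a single sample to be evaluated by wrapping it in a one-element list.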
S107: Obtain first data to be evaluated, and perform feature selection according to the feature index subset to obtain second data to be evaluated.
Specifically, the first data sample to be evaluated $x \in \mathbb{R}^{I}$ is obtained; at this point x has not been reduced in dimension and its dimension is I. Feature selection is performed on the data sample according to the feature index subset $F$ to obtain the second data to be evaluated $x'$.
S108: Classify the second data to be evaluated on the second sample set to obtain a classification result.
Specifically, the second data to be evaluated $x'$ is classified on the second sample set to obtain a classification result. This classification result can be used to diagnose the first data sample to be evaluated x.
Thus, the medical data classification method based on local learning feature weight selection provided by the embodiments of the invention first obtains the attribute values of the samples from the training sample set and computes the weight vector corresponding to the attributes from those values using a gradient-descent weight update. This guarantees convergence, so the stopping criterion of the algorithm is reached quickly, which reduces both computation time and computational complexity. Feature selection is then performed according to the computed weight vector to obtain an optimal feature subset. The data sample to be evaluated is normalized, restricted to the optimal feature subset, and then classified, so that the sample is reduced in dimension. The method provided by the embodiments of the invention therefore achieves dimensionality reduction while also lowering computational complexity and computation time.
An embodiment of the invention discloses a specific medical data classification method based on local learning feature weight selection. Unlike the previous embodiment, this embodiment further specifies S101; the other steps are substantially the same as in the previous embodiment, to which reference may be made for details, and are not repeated here. Specifically, S101 comprises:
obtaining the first sample set of medical data, obtaining the first sample attributes, and applying min-max normalization to the first sample set.
Specifically, the first sample set of medical data $\{(x_i, y_i)\}_{i=1}^{N}$ is obtained, and the sample attributes of the first sample set are taken as the first sample attributes. Here $x_i \in \mathbb{R}^{I}$, $y_i \in \{1, 2, \ldots, C\}$ is the label of $x_i$ and indicates its class, N is the number of training samples, I is the dimension of a sample, and C is the total number of classes.
It should be noted that different feature attributes often have different dimensions and units, which can affect the results of data analysis. To eliminate this effect, the first sample set $\{(x_i, y_i)\}_{i=1}^{N}$ is min-max normalized so that the feature attribute data become comparable. The min-max normalization transfer function is $x_{ij}' = \dfrac{x_{ij} - \min_{1 \le i \le N} x_{ij}}{\max_{1 \le i \le N} x_{ij} - \min_{1 \le i \le N} x_{ij}}$, where $x_{ij}$ is the j-th attribute of the i-th sample, $\max_{1 \le i \le N} x_{ij}$ is the maximum of attribute j over all training samples, and $\min_{1 \le i \le N} x_{ij}$ is the minimum of attribute j over all the data. After normalization every index of the feature data is on the same order of magnitude, which makes comprehensive comparative evaluation of the data easier. The feature data used in the embodiments of the invention are the data after min-max normalization.
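Min-max normalization of a sample set can be sketched as follows. The function names are illustrative, and mapping a constant attribute (maximum equal to minimum) to 0.0 is an assumption made here to avoid division by zero; the source does not address that case.

```python
from typing import List, Sequence, Tuple


def fit_min_max(samples: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-attribute minimum and maximum over all training samples."""
    columns = list(zip(*samples))
    return [min(c) for c in columns], [max(c) for c in columns]


def min_max_scale(row: Sequence[float], mins: Sequence[float],
                  maxs: Sequence[float]) -> List[float]:
    """Apply x' = (x - min) / (max - min) attribute by attribute."""
    return [(x - lo) / (hi - lo) if hi > lo else 0.0   # constant attribute: assumed 0.0
            for x, lo, hi in zip(row, mins, maxs)]


# Two attributes on very different scales (values are made up).
train = [[1.0, 10.0], [3.0, 30.0], [2.0, 20.0]]
mins, maxs = fit_min_max(train)
scaled = [min_max_scale(row, mins, maxs) for row in train]
```

After scaling, both attributes lie in the range 0 to 1, so neither dominates a distance computation merely because of its unit.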
An embodiment of the invention discloses a specific medical data classification method based on local learning feature weight selection. Unlike the previous embodiment, this embodiment further specifies S103; the other steps are substantially the same as in the previous embodiment, to which reference may be made for details, and are not repeated here. Specifically, S103 comprises:
updating the current weight vector by the update rule [formula omitted in source] to obtain the next weight vector $w^{t+1}$ after one iteration, where $J(w)$ is the optimization objective function, obtained by maximizing $J(w) = (z_i^{t+1})^{T} w^{t+1}$.
Specifically, $J(w)$ is solved by maximizing the objective [formula omitted in source], and the next weight vector $w^{t+1}$ is updated accordingly.
Here the two neighbor sample matrices [formulas omitted in source] are those of sample $x_i$ among the samples of other classes and among the samples of its own class, respectively, and k is the number of neighbors, set a priori. $\alpha_i$ and $\beta_i$ are the coefficient vectors of sample $x_i$ over the other-class samples and the same-class samples, respectively. $\alpha_i$ is obtained by solving one optimization problem [formula omitted in source], and $\beta_i$ by solving another [formula omitted in source]. The optimization objective function $J(w)$ can therefore be used with the update formula [formula omitted in source] to update the current weight vector $w^{t}$ and obtain the next weight vector $w^{t+1}$ after one iteration.
The gradient-descent weight update guarantees convergence. Once convergence is guaranteed, the stopping criterion of the algorithm is reached sooner, which lowers the computational complexity and shortens the computation time.
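For a linear objective of the form J(w) = z^T w with z held fixed, the gradient with respect to w is simply z. Because the update formula itself is not reproduced in this text, the following is only a sketch under stated assumptions: a plain gradient step in the direction that increases J (the text maximizes J), with a hypothetical step size `eta` and no further constraint or normalization of w.

```python
from typing import List


def objective(w: List[float], z: List[float]) -> float:
    """J(w) = z^T w for a fixed vector z."""
    return sum(zj * wj for zj, wj in zip(z, w))


def gradient_step(w: List[float], z: List[float], eta: float) -> List[float]:
    """One step along the gradient of J(w) = z^T w, which equals z.

    eta is a hypothetical step size; the source's actual rule may differ.
    """
    return [wj + eta * zj for wj, zj in zip(w, z)]


# Made-up values: z rewards the first attribute and penalizes the second.
w_t = [0.5, 0.5]
z = [1.0, -1.0]
w_next = gradient_step(w_t, z, eta=0.1)
```

In the method described above, z depends on the current weights through the neighbor computations, so it would be recomputed before each step.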
An embodiment of the invention discloses a specific medical data classification method based on local learning feature weight selection. Unlike the previous embodiment, this embodiment further specifies S107; the other steps are substantially the same as in the previous embodiment, to which reference may be made for details, and are not repeated here. Specifically, S107 comprises:
obtaining the first data to be evaluated, applying min-max normalization, and performing feature selection according to the feature index subset to obtain the second data to be evaluated.
Specifically, the data sample to be evaluated x is obtained as the first data to be evaluated, where $x \in \mathbb{R}^{I}$, and the first data to be evaluated is normalized using the min-max normalization method introduced in the above embodiment, i.e. [formula omitted in source].
It should be noted that the first data to be evaluated used in the invention are the data after min-max normalization. Normalizing the first data to be evaluated likewise prevents differences in dimension and unit between features from affecting the analysis results. After normalization every index of the data to be evaluated is on the same order of magnitude and is suitable for comprehensive comparative evaluation.
An embodiment of the invention discloses a specific medical data classification method based on local learning feature weight selection. Unlike the previous embodiment, this embodiment further specifies S108; the other steps are substantially the same as in the previous embodiment, to which reference may be made for details, and are not repeated here. Specifically, S108 comprises:
classifying the second data to be evaluated on the second sample set using a k-nearest-neighbor classifier to obtain the classification result.
Specifically, on the basis of the second sample set, the second data to be evaluated $x'$ is classified using a k-nearest-neighbor classifier to obtain the classification result. This classification result can be used to diagnose the first data sample to be evaluated x.
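A k-nearest-neighbor classification of one evaluation sample against a labeled sample set can be sketched as below. The Euclidean distance and the majority vote are assumptions (the source only names the classifier), and the data are made up.

```python
import math
from collections import Counter
from typing import List, Sequence


def knn_classify(train_x: Sequence[Sequence[float]], train_y: Sequence[int],
                 query: Sequence[float], k: int) -> int:
    """Return the majority label among the k training samples nearest to query."""
    if not 0 < k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training samples")
    # Sort (distance, label) pairs so the closest samples come first.
    by_distance = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Second sample set after feature selection (|F| = 2) with class labels 1 and 2.
second_set: List[List[float]] = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 0.8]]
labels = [1, 1, 2, 2]
result = knn_classify(second_set, labels, query=[0.2, 0.1], k=3)
```

Here two of the three nearest samples carry label 1, so the evaluation sample is assigned to class 1.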
A medical data classification device based on local learning feature weight selection provided by an embodiment of the invention is introduced below. The medical data classification device described below and the medical data classification method described above may be cross-referenced.
Referring to Fig. 2, a medical data classification device based on local learning feature weight selection provided by an embodiment of the invention specifically comprises:
A first sample set acquisition module 201, configured to obtain a first sample set of medical data and obtain the first sample attributes.
Specifically, the first sample set acquisition module 201 obtains the first sample set of medical data $\{(x_i, y_i)\}_{i=1}^{N}$ and takes the sample attributes of the first sample set as the first sample attributes. Here $x_i \in \mathbb{R}^{I}$, $y_i \in \{1, 2, \ldots, C\}$ is the label of $x_i$ and indicates its class, N is the number of training samples, I is the dimension of a sample, and C is the total number of classes.
An initial weight vector setting module 202, configured to set an initial weight vector for the first sample attributes and take the initial weight vector as the current weight vector.
Specifically, the initial weight vector setting module 202 sets the initial weight vector to $w^{0} = [1/I, 1/I, \ldots, 1/I]^{T}$. With t denoting the iteration count, t = 0 at this point, i.e. iteration has not yet started, and the initial weight vector $w^{0}$ is taken as the current weight vector $w^{t}$.
A next weight vector acquisition module 203, configured to update the current weight vector by a gradient-descent update and obtain the next weight vector after one iteration.
Specifically, the next weight vector acquisition module 203 performs one iteration on the current weight vector: the current weight vector $w^{t}$ is updated once by the gradient-descent update to obtain the next weight vector $w^{t+1}$.
A judgment module 204, configured to judge whether the decision rule holds; if so, to take the next weight vector as the final weight vector and call the feature index subset acquisition module; if not, to take the next weight vector as the current weight vector and call the next weight vector acquisition module. The decision rule is $\|w^{t+1} - w^{t}\| \le \theta$, where $w^{t}$ is the current weight vector, $w^{t+1}$ is the next weight vector, and $\theta$ is the stopping criterion.
Specifically, a stopping criterion $\theta$ is set in the judgment module 204, which judges whether $\|w^{t+1} - w^{t}\| \le \theta$ holds. If so, the next weight vector $w^{t+1}$ is taken as the final weight vector $w = [w_1, w_2, \ldots, w_I]^{T} \in \mathbb{R}^{I}$ and the feature index subset acquisition module 205 is called. If not, the next weight vector $w^{t+1}$ is taken as the current weight vector $w^{t}$ and the next weight vector acquisition module 203 is called again for a new iteration.
The feature index subset acquisition module 205, configured to perform feature selection according to the final weight vector and obtain a feature index subset.
Specifically, the feature index subset acquisition module 205 performs feature selection according to the final weight vector w, guided by classification accuracy, to obtain the corresponding feature index subset $F$. This reduces the feature dimension of the first samples and thereby reduces the amount of computation and the computation time.
A second sample set acquisition module 206, configured to perform feature selection on the first sample set according to the feature index subset and obtain a second sample set after feature selection.
Specifically, the second sample set acquisition module 206 performs feature selection on the first sample set $\{(x_i, y_i)\}_{i=1}^{N}$ according to the feature index subset $F$ to obtain the second sample set, in which each sample satisfies $x_i \in \mathbb{R}^{|F|}$ with $|F| < I$.
A second to-be-evaluated data acquisition module 207, configured to obtain first data to be evaluated and perform feature selection according to the feature index subset to obtain second data to be evaluated.
Specifically, the second to-be-evaluated data acquisition module 207 obtains the first data sample to be evaluated $x \in \mathbb{R}^{I}$; at this point x has not been reduced in dimension and its dimension is I. Feature selection is performed on the data sample according to the feature index subset $F$ to obtain the second data to be evaluated $x'$.
A classification module 208, configured to classify the second data to be evaluated on the second sample set and obtain a classification result.
Specifically, the classification module 208 classifies the second data to be evaluated $x'$ on the second sample set to obtain a classification result. This classification result can be used to diagnose the first data sample to be evaluated x.
Thus, the medical data classification device based on local learning feature weight selection provided by the embodiments of the invention first obtains the attribute values of the samples through the first sample set acquisition module 201. In the next weight vector acquisition module 203, the weight vector corresponding to the attributes is computed from those values using a gradient-descent weight update. This guarantees convergence, so the stopping criterion of the algorithm is reached quickly, which reduces both computation time and computational complexity. The second sample set acquisition module 206 performs feature selection according to the computed weight vector to obtain an optimal feature subset. The second to-be-evaluated data acquisition module 207 normalizes the data sample to be evaluated and restricts it to the optimal feature subset, and the sample is then classified, so that it is reduced in dimension. The device provided by the embodiments of the invention therefore achieves dimensionality reduction while also lowering computational complexity and computation time.
An embodiment of the invention discloses a specific medical data classification device based on local learning feature weight selection. Unlike the previous embodiment, this embodiment further specifies the first sample set acquisition module 201; the other contents are substantially the same as in the previous embodiment, to which reference may be made for details, and are not repeated here. The first sample set acquisition module 201 is specifically configured to:
obtain the first sample set of medical data, obtain the first sample attributes, and apply min-max normalization to the first sample set.
Specifically, the first sample set acquisition module 201 obtains the first sample set of medical data $\{(x_i, y_i)\}_{i=1}^{N}$ and takes the sample attributes of the first sample set as the first sample attributes. Here $x_i \in \mathbb{R}^{I}$, $y_i \in \{1, 2, \ldots, C\}$ is the label of $x_i$ and indicates its class, N is the number of training samples, I is the dimension of a sample, and C is the total number of classes.
It should be noted that different feature attributes often have different dimensions and units, which can affect the results of data analysis. To eliminate this effect, the first sample set $\{(x_i, y_i)\}_{i=1}^{N}$ is min-max normalized so that the feature attribute data become comparable. The min-max normalization transfer function is $x_{ij}' = \dfrac{x_{ij} - \min_{1 \le i \le N} x_{ij}}{\max_{1 \le i \le N} x_{ij} - \min_{1 \le i \le N} x_{ij}}$, where $x_{ij}$ is the j-th attribute of the i-th sample, $\max_{1 \le i \le N} x_{ij}$ is the maximum of attribute j over all training samples, and $\min_{1 \le i \le N} x_{ij}$ is the minimum of attribute j over all the data. After normalization every index of the feature data is on the same order of magnitude, which makes comprehensive comparative evaluation of the data easier. The feature data used in the embodiments of the invention are the data after min-max normalization.
An embodiment of the present invention discloses a specific medical data classification device based on local learning feature weight selection. Unlike the previous embodiment, this embodiment further specifies the next weight vector acquisition module 203. The remaining content is substantially the same as in the previous embodiment, to which reference may be made, and is not repeated here. The next weight vector acquisition module 203 is specifically configured to:
update the current weight vector by the update rule [formula omitted] to obtain the next weight vector $w_{t+1}$ after one iteration, where $J(w)$ is the optimization objective function, obtained by maximizing $J(w) = (z_i^{t+1})^{T} w_{t+1}$.
Specifically, in the next weight vector acquisition module 203, the objective [formula omitted] is first maximized to solve for $J(w)$, and the next weight vector $w_{t+1}$ is then updated.
Here the two matrices [formulas omitted] are the neighbor sample matrices of sample $x_i$ among the different-class samples and among the same-class samples, respectively, and $k$ is the number of neighbors, set a priori. $\alpha_i$ and $\beta_i$ are the coefficient vectors of $x_i$ with respect to the different-class samples and the same-class samples, respectively. Solving the first optimization problem [formula omitted] yields $\alpha_i$, and solving the second optimization problem [formula omitted] yields $\beta_i$. With $J(w)$ available, the update formula [formula omitted] is applied to the current weight vector $w_t$ to obtain the next weight vector $w_{t+1}$ after one iteration. The optimization objective function $J(w)$ is obtained by maximizing $J(w) = (z_i^{t+1})^{T} w_{t+1}$.
Updating the weights by gradient descent ensures convergence. Once convergence is ensured, the stopping criterion of the algorithm is reached sooner, which reduces both the computational complexity and the computation time.
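As a reading aid only, the gradient of the stated objective can be written out. The text gives $J(w) = (z_i^{t+1})^{T} w_{t+1}$; treating the vector $z = z_i^{t+1}$ as fixed, $J$ is linear in $w$, so its gradient is $z$ itself. The generic gradient step shown below, including the step size $\eta$ and the plus sign (a step in the direction that increases $J$), is an assumption made for illustration and is not the patent's update formula.

```latex
J(w) = z^{\top} w
\quad\Longrightarrow\quad
\nabla_{w} J(w) = z,
\qquad
w_{t+1} = w_{t} + \eta\, z
\quad (\eta > 0 \text{ assumed}).
```

Under this reading, each iteration moves the weight vector along $z$, and the change $\|w_{t+1} - w_t\| = \eta \|z\|$ is what a stopping test of the form $\|w_{t+1} - w_t\| \le \theta$ would monitor.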
An embodiment of the present invention discloses a specific medical data classification device based on local learning feature weight selection. Unlike the previous embodiment, this embodiment further specifies the second to-be-assessed data acquisition module 207. The remaining content is substantially the same as in the previous embodiment, to which reference may be made, and is not repeated here. The second to-be-assessed data acquisition module 207 is specifically configured to:
obtain the first data to be assessed, apply deviation standardization to it, and perform feature selection according to the feature index subset to obtain the second data to be assessed.
Specifically, the second to-be-assessed data acquisition module 207 obtains a data sample $x$ to be assessed as the first data to be assessed, where $x \in \mathbb{R}^{I}$. The first data to be assessed is standardized with the deviation standardization method introduced in the embodiment above, i.e. $x_{j}^{*} = \dfrac{x_{j} - \min_{n} x_{nj}}{\max_{n} x_{nj} - \min_{n} x_{nj}}$, where $x_j$ is the $j$-th attribute of $x$ and the maximum and minimum of attribute $j$ are taken over the training samples.
Note that the first data to be assessed used in the present invention is the data after deviation standardization. Standardizing the first data to be assessed likewise prevents differences in dimension and unit between features from affecting the analysis results. After standardization, all indicators of the data to be assessed are of the same order of magnitude, which makes the data suitable for comprehensive comparison and evaluation.
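A single sample to be assessed has no minimum or maximum of its own per attribute, so one plausible reading is that it is scaled with the range taken from the training samples. The sketch below follows that reading; the function name `scale_with_training_range` and the toy numbers are assumptions made here, not part of the source.

```python
import numpy as np

def scale_with_training_range(x, train_min, train_max):
    """Scale one sample with the per-attribute range of the training set (assumed reading)."""
    x = np.asarray(x, dtype=float)
    train_min = np.asarray(train_min, dtype=float)
    train_max = np.asarray(train_max, dtype=float)
    span = train_max - train_min
    # Assumption: a constant training attribute is mapped to 0 instead of dividing by zero.
    safe_span = np.where(span == 0, 1.0, span)
    return (x - train_min) / safe_span

# Hypothetical training range for two attributes, and one new sample.
train_min = np.array([1.0, 200.0])
train_max = np.array([3.0, 400.0])
x_new = np.array([2.5, 450.0])
x_scaled = scale_with_training_range(x_new, train_min, train_max)
print(x_scaled)
```

Note that a new sample can fall outside the training range, in which case the scaled value lies outside [0, 1] (the second attribute here). Whether such values should be clipped is a design choice the text does not address.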
An embodiment of the present invention discloses a specific medical data classification device based on local learning feature weight selection. Unlike the previous embodiment, this embodiment further specifies the classification module 208. The remaining content is substantially the same as in the previous embodiment, to which reference may be made, and is not repeated here. The classification module 208 is specifically configured to:
classify the second data to be assessed on the second sample set with a k-nearest-neighbor classifier to obtain a classification result.
Specifically, on the basis of the second sample set, the classification module 208 classifies the second data to be assessed $x'$ with a k-nearest-neighbor classifier and obtains a classification result. This classification result can be used to diagnose the first data sample to be assessed $x$.
An embodiment of the present invention discloses a medical data classification method based on local learning feature weights, which specifically includes the following.
The embodiment was tested on the embryonal tumor (CNS) data set, which contains 34 patient samples in total, each with 7129 genes. The 34 samples comprise 25 classic medulloblastomas (C) and 9 desmoplastic medulloblastomas (D), so there are 2 classes. The CNS data set is divided into two subsets: 23 training samples (6 C, 17 D), used to select genes and adjust the weights of the classifier, and 11 test samples (3 C, 8 D), used to evaluate the performance of the results obtained by the system. Every sample has 7129 features. C is treated as the first class and D as the second class.
The implementation is divided into two modules, as follows.
Model training module:
S301: input the medical data sample set $\{(x_i, y_i)\}_{i=1}^{N}$ as the first sample set, where $x_i \in \mathbb{R}^{I}$; $y_i \in \{1, 2, \ldots, C\}$ is the label of $x_i$ and indicates its class; $N$ is the number of training samples, $I$ is the dimension of the samples, and $C$ is the total number of classes. Here $N = 23$, $I = 7129$, and $C = 2$.
S302: apply deviation standardization to the first sample set. The transfer function is $x_{ij}^{*} = \dfrac{x_{ij} - \min_{n} x_{nj}}{\max_{n} x_{nj} - \min_{n} x_{nj}}$, where $x_{ij}$ is the $j$-th attribute of the $i$-th sample, $\max_{n} x_{nj}$ is the maximum of attribute $j$ over all training sample data, and $\min_{n} x_{nj}$ is the minimum of attribute $j$ over all the data.
S303: set the initial weight vector of the first sample attributes, $w_0 = [1/I, 1/I, \ldots, 1/I]^{T}$, and take the initial weight vector as the current weight vector. Here $t$ is the iteration count; at this point $t = 0$ and iteration has not started, so the initial weight vector $w_0$ serves as the current weight vector $w_t$. A total of 30 iterations are carried out.
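The uniform initialization in S303 can be illustrated as follows. This is a minimal sketch; the function name `initial_weights` is hypothetical, and only the value 1/I per attribute and I = 7129 come from the text.

```python
import numpy as np

def initial_weights(num_features):
    """Return w_0 = [1/I, ..., 1/I]^T for I = num_features attributes."""
    return np.full(num_features, 1.0 / num_features)

w0 = initial_weights(7129)  # I = 7129 genes in the CNS experiment
print(w0.shape, w0[0], w0.sum())
```

Every attribute starts with the same weight and the weights sum to 1, so no gene is favored before the first iteration.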
S304: update the current weight vector by gradient descent to obtain the next weight vector after one iteration.
Specifically, the objective [formula omitted] is maximized to solve for the optimization objective function $J(w)$, and the next weight vector $w_{t+1}$ is then updated.
Here the two matrices [formulas omitted] are the neighbor sample matrices of sample $x_i$ among the different-class samples and among the same-class samples, respectively, and $k$ is the number of neighbors, set a priori. $\alpha_i$ and $\beta_i$ are the coefficient vectors of $x_i$ with respect to the different-class samples and the same-class samples, respectively. Solving the first optimization problem [formula omitted] yields $\alpha_i$, and solving the second optimization problem [formula omitted] yields $\beta_i$. With $J(w)$ available, the update formula [formula omitted] is applied to the current weight vector $w_t$ to obtain the next weight vector $w_{t+1}$ after one iteration.
S305: judge whether the determination rule holds. If so, take the next weight vector as the final weight vector and perform S306; if not, take the next weight vector as the current weight vector and return to S304. The determination rule is $\|w_{t+1} - w_t\| \le \theta$, where $w_t$ is the current weight vector, $w_{t+1}$ is the next weight vector, and $\theta$ is the stopping criterion.
Specifically, the stopping criterion is set to $\theta = 0.001$, and it is judged whether $\|w_{t+1} - w_t\| \le \theta$ holds. If it holds, the next weight vector $w_{t+1}$ is taken as the final weight vector $w = [w_1, w_2, \ldots, w_I]^{T} \in \mathbb{R}^{7129}$ and S306 is performed. If it does not hold, the next weight vector $w_{t+1}$ is taken as the current weight vector $w_t$ and the method returns to S304 for a new iteration.
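The control flow of S304 and S305 (iterate, then stop once the weight change falls to θ or the iteration budget is used up) can be sketched as below. The actual update formula is not available in this text, so `update_step` is a hypothetical callable supplied by the caller, and the toy update used in the example is purely illustrative. The Euclidean norm is also an assumption, since the text does not name the norm.

```python
import numpy as np

def iterate_weights(w0, update_step, theta=0.001, max_iter=30):
    """Repeat w <- update_step(w) until ||w_next - w|| <= theta or max_iter is reached.

    Returns the final weight vector and the number of iterations performed.
    """
    w = np.asarray(w0, dtype=float)
    for t in range(max_iter):
        w_next = update_step(w)
        # Determination rule from S305 (Euclidean norm assumed).
        if np.linalg.norm(w_next - w) <= theta:
            return w_next, t + 1
        w = w_next
    return w, max_iter

# Toy stand-in for the real update: move halfway toward a fixed target vector.
target = np.array([0.7, 0.2, 0.1])
toy_update = lambda w: w + 0.5 * (target - w)

w_final, n_iter = iterate_weights(np.full(3, 1.0 / 3.0), toy_update)
print(n_iter, w_final)
```

With θ = 0.001 and at most 30 iterations, as in the text, the loop ends either on convergence or when the budget is exhausted, whichever comes first.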
S306: perform feature selection according to the final weight vector to obtain a feature index subset.
Specifically, feature selection is performed by classification accuracy according to the final weight vector $w$, yielding the corresponding feature index subset $F$. This reduces the feature dimension of the first samples and thereby the amount of computation and the computation time.
S307: perform feature selection on the first sample set according to the feature index subset to obtain the second sample set after feature selection.
Specifically, feature selection is applied to the first sample set $\{(x_i, y_i)\}_{i=1}^{N}$ according to the feature index subset $F$, giving the second sample set, in which each sample satisfies $x_i \in \mathbb{R}^{|F|}$ with $|F| < 7129$.
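Steps S306 and S307 can be sketched as ranking attributes by their final weights and keeping only the selected columns. The text says the subset is chosen by classification accuracy; in this sketch the subset size `num_selected` is simply given, which is an assumption, and the function names are hypothetical.

```python
import numpy as np

def top_feature_indices(w, num_selected):
    """Indices of the num_selected largest weights, returned in ascending index order."""
    order = np.argsort(-np.asarray(w, dtype=float), kind="stable")
    return np.sort(order[:num_selected])

def select_features(X, feature_idx):
    """Keep only the selected attributes; works for a sample matrix or a single sample."""
    return np.asarray(X)[..., feature_idx]

# Toy final weight vector over 4 attributes and a toy first sample set.
w = np.array([0.10, 0.40, 0.05, 0.45])
X_first = np.array([[1, 2, 3, 4],
                    [5, 6, 7, 8]])

F = top_feature_indices(w, num_selected=2)
X_second = select_features(X_first, F)
print(F, X_second)
```

The same index subset is applied unchanged to any sample to be assessed, so the training data and the assessed data end up in the same reduced feature space.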
Evaluation module:
S308: obtain the first data to be assessed.
Specifically, a data sample $x$ to be assessed is input as the first data sample to be assessed, where $x \in \mathbb{R}^{I}$.
S309: apply deviation standardization to the first data to be assessed.
Specifically, the data sample $x$ to be assessed, where $x \in \mathbb{R}^{I}$, is taken as the first data to be assessed and standardized with the deviation standardization method introduced in the embodiment above, i.e. $x_{j}^{*} = \dfrac{x_{j} - \min_{n} x_{nj}}{\max_{n} x_{nj} - \min_{n} x_{nj}}$, where $x_j$ is the $j$-th attribute of $x$ and the maximum and minimum of attribute $j$ are taken over the training samples.
S310: perform feature selection on the first data to be assessed according to the feature index subset $F$ to obtain the second data to be assessed $x'$.
S311: classify the second data to be assessed on the second sample set with a k-nearest-neighbor classifier to obtain a classification result.
Specifically, on the basis of the second sample set, the second data to be assessed $x'$ is classified with a k-nearest-neighbor classifier to obtain a classification result. This classification result can be used to diagnose the first data sample to be assessed $x$.
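The k-nearest-neighbor classification in S311 can be sketched without any external machine learning library. This is a minimal illustration under stated assumptions: Euclidean distance, majority voting, k = 3, the function name `knn_predict`, and the toy data are all choices made here, since the text does not specify them.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Return the majority label among the k training samples closest to x."""
    X_train = np.asarray(X_train, dtype=float)
    dists = np.linalg.norm(X_train - np.asarray(x, dtype=float), axis=1)
    nearest = np.argsort(dists, kind="stable")[:k]
    votes = Counter(np.asarray(y_train)[nearest].tolist())
    return votes.most_common(1)[0][0]

# Toy second sample set (already standardized and feature-selected), labels 1 and 2.
X_second = np.array([[0.0, 0.0],
                     [0.1, 0.2],
                     [0.9, 1.0],
                     [1.0, 0.8],
                     [0.8, 0.9]])
y = np.array([1, 1, 2, 2, 2])

label = knn_predict(X_second, y, [0.85, 0.95], k=3)
print(label)
```

Because the classifier relies on distances, it only makes sense to call it on data that went through the same standardization and feature selection as the second sample set.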
The present invention proposes a medical data classification method based on local learning feature weights, which improves the LH-RELIEF feature selection method. A feature combination $F$, with $1 \le \mathrm{length}(F) \le 7129$, is extracted from the 23 training samples of dimension 7129, and the 11 test samples of dimension 7129 are then classified. The method proposed in this experiment is compared with the LH-RELIEF algorithm on the same data set. 78 training samples are drawn at random 10 times; the average convergence result is shown in Fig. 3 and the average performance result in Fig. 4. It can be seen that the present invention converges faster than the MSVM-RFE algorithm and, when the same number of genes is selected, has better classification performance.
Table 1 compares the best average classification performance obtained by each of the two methods. The present invention improves on the LH-RELIEF method by about 2 percentage points (70.91% versus 69.09%).
Table 1. Comparison of the best classification performance of LH-RELIEF and the present invention
Method | Recognition rate (%) |
---|---|
The present invention | 70.91 (10) |
LH-RELIEF | 69.09 (10) |
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the other embodiments, and for the parts that are identical or similar the embodiments may be referred to one another.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A medical data classification method based on local learning feature weight selection, characterized by comprising:
S101: obtaining a first sample set of medical data and obtaining first sample attributes;
S102: setting an initial weight vector of the first sample attributes and taking the initial weight vector as the current weight vector;
S103: updating the current weight vector by gradient descent to obtain the next weight vector after one iteration;
S104: judging whether a determination rule holds; if so, taking the next weight vector as the final weight vector and performing S105; if not, taking the next weight vector as the current weight vector and returning to S103; wherein the determination rule is $\|w_{t+1} - w_t\| \le \theta$, $w_t$ is the current weight vector, $w_{t+1}$ is the next weight vector, and $\theta$ is the stopping criterion;
S105: performing feature selection according to the final weight vector to obtain a feature index subset;
S106: performing feature selection on the first sample set according to the feature index subset to obtain a second sample set after feature selection;
S107: obtaining first data to be assessed, and performing feature selection according to the feature index subset to obtain second data to be assessed;
S108: classifying the second data to be assessed on the second sample set to obtain a classification result.
2. The medical data classification method according to claim 1, characterized in that obtaining the first sample set of medical data and obtaining the first sample attributes comprises:
obtaining the first sample set of medical data, obtaining the first sample attributes, and applying deviation standardization to the first sample set.
3. The medical data classification method according to claim 1, characterized in that updating the current weight vector by gradient descent to obtain the next weight vector after one iteration comprises:
updating the current weight vector by the update rule [formula omitted] to obtain the next weight vector $w_{t+1}$ after one iteration, where $J(w)$ is the optimization objective function, obtained by maximizing $J(w) = (z_i^{t+1})^{T} w_{t+1}$.
4. The medical data classification method according to claim 1, characterized in that obtaining the first data to be assessed and performing feature selection according to the feature index subset to obtain the second data to be assessed comprises:
obtaining the first data to be assessed, applying deviation standardization to it, and performing feature selection according to the feature index subset to obtain the second data to be assessed.
5. The medical data classification method according to any one of claims 1 to 4, characterized in that classifying the second data to be assessed on the second sample set to obtain a classification result comprises:
classifying the second data to be assessed on the second sample set with a k-nearest-neighbor classifier to obtain the classification result.
6. A medical data classification device based on local learning feature weight selection, characterized by comprising:
a first sample set acquisition module, configured to obtain a first sample set of medical data and obtain first sample attributes;
an initial weight vector setting module, configured to set an initial weight vector of the first sample attributes and take the initial weight vector as the current weight vector;
a next weight vector acquisition module, configured to update the current weight vector by gradient descent to obtain the next weight vector after one iteration;
a judgment module, configured to judge whether a determination rule holds; if so, to take the next weight vector as the final weight vector and call a feature index subset acquisition module; if not, to take the next weight vector as the current weight vector and call the next weight vector acquisition module; wherein the determination rule is $\|w_{t+1} - w_t\| \le \theta$, $w_t$ is the current weight vector, $w_{t+1}$ is the next weight vector, and $\theta$ is the stopping criterion;
the feature index subset acquisition module, configured to perform feature selection according to the final weight vector to obtain a feature index subset;
a second sample set acquisition module, configured to perform feature selection on the first sample set according to the feature index subset to obtain a second sample set after feature selection;
a second to-be-assessed data acquisition module, configured to obtain first data to be assessed and perform feature selection according to the feature index subset to obtain second data to be assessed;
a classification module, configured to classify the second data to be assessed on the second sample set to obtain a classification result.
7. The medical data classification device according to claim 6, characterized in that the first sample set acquisition module is specifically configured to:
obtain the first sample set of medical data, obtain the first sample attributes, and apply deviation standardization to the first sample set.
8. The medical data classification device according to claim 6, characterized in that the next weight vector acquisition module is specifically configured to:
update the current weight vector by the update rule [formula omitted] to obtain the next weight vector $w_{t+1}$ after one iteration, where $J(w)$ is the optimization objective function, obtained by maximizing $J(w) = (z_i^{t+1})^{T} w_{t+1}$.
9. The medical data classification device according to claim 6, characterized in that the second to-be-assessed data acquisition module is specifically configured to:
obtain the first data to be assessed, apply deviation standardization to it, and perform feature selection according to the feature index subset to obtain the second data to be assessed.
10. The medical data classification device according to any one of claims 6 to 9, characterized in that the classification module is specifically configured to:
classify the second data to be assessed on the second sample set with a k-nearest-neighbor classifier to obtain the classification result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710419357.7A CN107193993A (en) | 2017-06-06 | 2017-06-06 | The medical data sorting technique and device selected based on local learning characteristic weight |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710419357.7A CN107193993A (en) | 2017-06-06 | 2017-06-06 | The medical data sorting technique and device selected based on local learning characteristic weight |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107193993A true CN107193993A (en) | 2017-09-22 |
Family
ID=59877175
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710419357.7A Pending CN107193993A (en) | 2017-06-06 | 2017-06-06 | The medical data sorting technique and device selected based on local learning characteristic weight |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107193993A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108763873A (en) * | 2018-05-28 | 2018-11-06 | 苏州大学 | A kind of gene sorting method and relevant device |
CN109243561A (en) * | 2018-08-10 | 2019-01-18 | 上海交通大学 | Model optimization method and system of treatment scheme recommendation system |
CN113657499A (en) * | 2021-08-17 | 2021-11-16 | 中国平安财产保险股份有限公司 | Rights and interests allocation method and device based on feature selection, electronic equipment and medium |
JP2022500798A (en) * | 2019-01-29 | 2022-01-04 | 深▲せん▼市商▲湯▼科技有限公司Shenzhen Sensetime Technology Co., Ltd. | Image processing methods and equipment, computer equipment and computer storage media |
CN113971604A (en) * | 2020-07-22 | 2022-01-25 | 中移(苏州)软件技术有限公司 | Data processing method, device and storage medium |
- 2017-06-06: CN application CN201710419357.7A, published as CN107193993A (status: active, Pending)
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108763873A (en) * | 2018-05-28 | 2018-11-06 | 苏州大学 | A kind of gene sorting method and relevant device |
CN109243561A (en) * | 2018-08-10 | 2019-01-18 | 上海交通大学 | Model optimization method and system of treatment scheme recommendation system |
CN109243561B (en) * | 2018-08-10 | 2020-07-28 | 上海交通大学 | Model optimization method and system of treatment scheme recommendation system |
JP2022500798A (en) * | 2019-01-29 | 2022-01-04 | 深▲せん▼市商▲湯▼科技有限公司Shenzhen Sensetime Technology Co., Ltd. | Image processing methods and equipment, computer equipment and computer storage media |
JP7076648B2 (en) | 2019-01-29 | 2022-05-27 | 深▲セン▼市商▲湯▼科技有限公司 | Image processing methods and equipment, computer equipment and computer storage media |
CN113971604A (en) * | 2020-07-22 | 2022-01-25 | 中移(苏州)软件技术有限公司 | Data processing method, device and storage medium |
CN113657499A (en) * | 2021-08-17 | 2021-11-16 | 中国平安财产保险股份有限公司 | Rights and interests allocation method and device based on feature selection, electronic equipment and medium |
CN113657499B (en) * | 2021-08-17 | 2023-08-11 | 中国平安财产保险股份有限公司 | Rights and interests distribution method and device based on feature selection, electronic equipment and medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107193993A (en) | The medical data sorting technique and device selected based on local learning characteristic weight | |
CN103544506B (en) | A kind of image classification method and device based on convolutional neural networks | |
US7236623B2 (en) | Analyte recognition for urinalysis diagnostic system | |
CN103489009B (en) | Mode identification method based on adaptive correction neutral net | |
CN110163234A (en) | A kind of model training method, device and storage medium | |
CN106845421A (en) | Face characteristic recognition methods and system based on multi-region feature and metric learning | |
CN109117380A (en) | A kind of method for evaluating software quality, device, equipment and readable storage medium storing program for executing | |
CN110222782A (en) | There are supervision two-category data analysis method and system based on Density Clustering | |
CN111834010A (en) | COVID-19 detection false negative identification method based on attribute reduction and XGboost | |
CN108416364A (en) | Integrated study data classification method is merged in subpackage | |
CN106326913A (en) | Money laundering account determination method and device | |
CN112633337A (en) | Unbalanced data processing method based on clustering and boundary points | |
CN110363230A (en) | Stacking integrated sewage handling failure diagnostic method based on weighting base classifier | |
CN110533116A (en) | Based on the adaptive set of Euclidean distance at unbalanced data classification method | |
CN112801231B (en) | Decision model training method and device for business object classification | |
CN111639882A (en) | Deep learning-based power utilization risk judgment method | |
CN113159216A (en) | Positive sample expansion method for surface defect detection | |
CN111414930B (en) | Deep learning model training method and device, electronic equipment and storage medium | |
Gunter et al. | Variable selection for optimal decision making | |
Oliveira et al. | A multi-objective approach for calibration and detection of cervical cells nuclei | |
CN101299242A (en) | Method and device for determining threshold value in human body skin tone detection | |
CN111639688B (en) | Local interpretation method of Internet of things intelligent model based on linear kernel SVM | |
Sangalli et al. | Expert load matters: operating networks at high accuracy and low manual effort | |
CN112488188A (en) | Feature selection method based on deep reinforcement learning | |
CN107766887A (en) | A kind of local weighted deficiency of data mixes clustering method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20170922 |