CN107193993A - Medical data classification method and device based on local-learning feature weight selection


Publication number
CN107193993A
Authority
CN
China
Prior art keywords
data
sample
assessed
weight
weight vectors
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710419357.7A
Other languages
Chinese (zh)
Inventor
张莉
黄晓娟
王邦军
张召
李凡长
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University
Priority to CN201710419357.7A
Publication of CN107193993A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28: Databases characterised by their database models, e.g. relational or object models
    • G06F16/284: Relational databases
    • G06F16/285: Clustering or classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a medical data classification method based on local-learning feature weight selection. Attribute values of the samples are first obtained from a training sample set, and the weight vector corresponding to the attributes is computed from those values with a gradient-descent weight update. This guarantees convergence, so the stopping criterion of the algorithm is reached quickly, which reduces both computation time and computational complexity. Feature selection is then performed according to the computed weight vector to obtain an optimal feature subset. The data sample to be evaluated is standardized, reduced to the optimal feature subset, and then classified, so that the data sample is dimensionally reduced. The method provided by the embodiments of the invention therefore achieves dimensionality reduction while also lowering computational complexity and computation time. The invention also provides a medical data classification device based on local-learning feature weight selection, which achieves the same technical effects.

Description

Medical data classification method and device based on local-learning feature weight selection
Technical field
The present invention relates to the field of medical diagnosis, and more specifically to a medical data classification method and device based on local-learning feature weight selection.
Background
With the development of artificial intelligence, computer technology also plays an important role in medicine, bringing artificial intelligence into the medical field. Combining computer technology with the authoritative knowledge and experience of medical experts from many fields has produced medical diagnosis systems that can effectively solve various clinical problems and assist diagnosis.
DNA microarray technology, i.e. the gene chip, has been introduced into medical diagnosis systems. A gene chip makes it possible to analyze the expression levels of a large number of genes quantitatively at the same time, and these data can be used to study the nature of living organisms. However, the development of DNA microarray technology has led to explosive growth in gene expression data, and selecting the important genes from this large volume of gene expression data poses a new challenge for the prior art.
The local hyperplane Relief (Local Hyperplane, LH-Relief) algorithm can reduce the dimensionality of large amounts of gene expression data: it screens out useless gene expression data, selects the important genes, and reduces redundancy. However, when the algorithm is applied to noisy data and high-dimensional data, its convergence cannot be guaranteed, so its computational complexity is high.
Therefore, how to reduce the computational complexity of the algorithm while reducing the dimensionality of large amounts of gene data is a problem that those skilled in the art need to solve.
Summary of the invention
An object of the present invention is to provide a medical data classification method based on local-learning feature weight selection, so as to reduce the computational complexity of the algorithm while reducing the dimensionality of large amounts of gene data.
To achieve the above object, the embodiments of the present invention provide the following technical solution:
A medical data classification method based on local-learning feature weight selection, comprising:
S101: Obtain a first sample set of medical data and obtain the first sample attributes;
S102: Set an initial weight vector for the first sample attributes, and take the initial weight vector as the current weight vector;
S103: Update the current weight vector by a gradient-descent update rule to obtain the next weight vector after one iteration;
S104: Judge whether the decision rule holds; if so, take the next weight vector as the final weight vector and perform S105; if not, take the next weight vector as the current weight vector and return to S103; the decision rule is $\|w_{t+1} - w_t\| \le \theta$, where $w_t$ is the current weight vector, $w_{t+1}$ is the next weight vector, and $\theta$ is the stopping threshold;
S105: Perform feature selection according to the final weight vector to obtain a feature index subset;
S106: Perform feature selection on the first sample set according to the feature index subset to obtain a second sample set;
S107: Obtain first data to be evaluated, and perform feature selection on it according to the feature index subset to obtain second data to be evaluated;
S108: Classify the second data to be evaluated on the second sample set to obtain a classification result.
Preferably, obtaining the first sample set of medical data and obtaining the first sample attributes comprises:
Obtaining the first sample set of medical data, obtaining the first sample attributes, and applying deviation (min-max) standardization to the first sample set;
Preferably, updating the current weight vector by the gradient-descent update rule to obtain the next weight vector after one iteration comprises:
Updating the current weight vector by the update rule [formula not reproduced] to obtain the next weight vector $w_{t+1}$ after one iteration, where $J(w)$ is the optimization objective function, obtained by maximizing $J(w) = (z_i^{t+1})^{T} w_{t+1}$.
Preferably, obtaining the first data to be evaluated and performing feature selection according to the feature index subset to obtain the second data to be evaluated comprises:
Obtaining the first data to be evaluated, applying deviation standardization to it, and performing feature selection according to the feature index subset to obtain the second data to be evaluated.
Preferably, classifying the second data to be evaluated on the second sample set to obtain a classification result comprises:
Classifying the second data to be evaluated on the second sample set with a k-nearest-neighbor classifier to obtain a classification result.
A medical data classification device based on local-learning feature weight selection, comprising:
A first sample set acquisition module, configured to obtain a first sample set of medical data and obtain the first sample attributes;
An initial weight vector setting module, configured to set an initial weight vector for the first sample attributes and take the initial weight vector as the current weight vector;
A next weight vector acquisition module, configured to update the current weight vector by a gradient-descent update rule to obtain the next weight vector after one iteration;
A judgment module, configured to judge whether the decision rule holds; if so, to take the next weight vector as the final weight vector and call the feature index subset acquisition module; if not, to take the next weight vector as the current weight vector and call the next weight vector acquisition module; the decision rule is $\|w_{t+1} - w_t\| \le \theta$, where $w_t$ is the current weight vector, $w_{t+1}$ is the next weight vector, and $\theta$ is the stopping threshold;
The feature index subset acquisition module, configured to perform feature selection according to the final weight vector to obtain a feature index subset;
A second sample set acquisition module, configured to perform feature selection on the first sample set according to the feature index subset to obtain a second sample set;
A second to-be-evaluated data acquisition module, configured to obtain first data to be evaluated and perform feature selection on it according to the feature index subset to obtain second data to be evaluated;
A classification module, configured to classify the second data to be evaluated on the second sample set to obtain a classification result.
Preferably, the first sample set acquisition module is specifically configured to:
Obtain the first sample set of medical data, obtain the first sample attributes, and apply deviation (min-max) standardization to the first sample set.
Preferably, the next weight vector acquisition module is specifically configured to:
Update the current weight vector by the update rule [formula not reproduced] to obtain the next weight vector $w_{t+1}$ after one iteration, where $J(w)$ is the optimization objective function, obtained by maximizing $J(w) = (z_i^{t+1})^{T} w_{t+1}$.
Preferably, the second to-be-evaluated data acquisition module is specifically configured to:
Obtain the first data to be evaluated, apply deviation standardization to it, and perform feature selection according to the feature index subset to obtain the second data to be evaluated.
Preferably, the classification module is specifically configured to:
Classify the second data to be evaluated on the second sample set with a k-nearest-neighbor classifier to obtain a classification result.
As the above solution shows, the medical data classification method based on local-learning feature weight selection provided by the embodiments of the present invention first obtains the attribute values of the samples from the training sample set and computes the weight vector corresponding to the attributes with a gradient-descent weight update. This guarantees convergence, so the stopping criterion of the algorithm is reached quickly, which reduces both computation time and computational complexity. Feature selection is performed according to the computed weight vector to obtain an optimal feature subset; the data sample to be evaluated is standardized, reduced to the optimal feature subset, and then classified, so that the data sample is dimensionally reduced. The method therefore achieves dimensionality reduction while also lowering computational complexity and computation time. The invention also provides a medical data classification device based on local-learning feature weight selection, which achieves the same technical effects.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a medical data classification method disclosed in an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a medical data classification device disclosed in an embodiment of the present invention;
Fig. 3 compares the convergence results of the medical data classification method disclosed in an embodiment of the present invention with those of LH-Relief;
Fig. 4 compares the average performance of the medical data classification method disclosed in an embodiment of the present invention with that of LH-Relief.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
Referring to Fig. 1, an embodiment of the present invention discloses a medical data classification method based on local-learning feature weight selection. Specifically:
S101: Obtain a first sample set of medical data and obtain the first sample attributes.
Specifically, the first sample set of medical data $\{(x_i, y_i)\}_{i=1}^{N}$ is obtained, and its sample attributes are taken as the first sample attributes. Here $x_i \in \mathbb{R}^{I}$; $y_i \in \{1, 2, \ldots, C\}$ is the label of $x_i$ and indicates its class; $N$ is the number of training samples, $I$ is the sample dimension, and $C$ is the total number of classes.
S102: Set an initial weight vector for the first sample attributes, and take the initial weight vector as the current weight vector.
Specifically, the initial weight vector is set to $w_0 = [1/I, 1/I, \ldots, 1/I]^{T}$. With $t$ denoting the iteration count, $t = 0$ at this point, i.e. iteration has not yet started, and $w_0$ is taken as the current weight vector $w_t$.
S103: Update the current weight vector by a gradient-descent update rule to obtain the next weight vector after one iteration.
Specifically, one iteration is carried out: the current weight vector $w_t$ is updated once by the gradient-descent update rule, giving the next weight vector $w_{t+1}$.
S104: Judge whether the decision rule holds; if so, take the next weight vector as the final weight vector and perform S105; if not, take the next weight vector as the current weight vector and return to S103. The decision rule is $\|w_{t+1} - w_t\| \le \theta$, where $w_t$ is the current weight vector, $w_{t+1}$ is the next weight vector, and $\theta$ is the stopping threshold.
Specifically, a stopping threshold $\theta$ is set, and $\|w_{t+1} - w_t\| \le \theta$ is tested. If it holds, the next weight vector $w_{t+1}$ is taken as the final weight vector $w = [w_1, w_2, \ldots, w_I]^{T} \in \mathbb{R}^{I}$ and S105 is performed. If it does not hold, the next weight vector $w_{t+1}$ is taken as the current weight vector $w_t$, and the method returns to S103 for a new iteration.
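The control flow of S102 to S104 can be sketched as follows. This is only an illustrative sketch under stated assumptions: the norm is taken to be the Euclidean norm, `max_iter` is a safety cap that does not appear in the source, and the update rule itself is passed in as a caller-supplied function because its formula is not reproduced in this text. The names `iterate_weights` and `toy_update` are hypothetical.

```python
import math
from typing import Callable, List

Vector = List[float]


def l2_distance(a: Vector, b: Vector) -> float:
    """Euclidean norm of a - b (assumed meaning of ||.|| in the decision rule)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def iterate_weights(update: Callable[[Vector], Vector], dim: int,
                    theta: float = 1e-6, max_iter: int = 1000) -> Vector:
    """Run S102-S104: start from uniform weights, stop when ||w_next - w|| <= theta."""
    w = [1.0 / dim] * dim                    # S102: w_0 = [1/I, ..., 1/I]
    for _ in range(max_iter):                # max_iter: assumed safety cap
        w_next = update(w)                   # S103: one update step
        if l2_distance(w_next, w) <= theta:  # S104: decision rule
            return w_next                    # final weight vector
        w = w_next                           # next becomes current
    return w


# Toy stand-in for the real update rule: move halfway toward a fixed target.
target = [0.7, 0.2, 0.1]


def toy_update(w: Vector) -> Vector:
    return [wi + 0.5 * (ti - wi) for wi, ti in zip(w, target)]


w_final = iterate_weights(toy_update, dim=3)
```

With the toy update the loop stops after a few dozen iterations, and `w_final` lies within the threshold of `target`; a real update rule would be substituted for `toy_update`.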
S105: Perform feature selection according to the final weight vector to obtain a feature index subset.
Specifically, feature selection is performed according to the final weight vector $w$, guided by classification accuracy, to obtain the corresponding feature index subset $F$. This reduces the feature dimension of the first samples and thereby the amount and time of computation.
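One simple way to turn a weight vector into a feature index subset is to keep the indices of the largest weights. The sketch below assumes a fixed subset size `m`; the source instead chooses the subset with reference to classification accuracy and does not spell out that procedure, so `select_feature_indices` and `m` are hypothetical.

```python
from typing import List


def select_feature_indices(w: List[float], m: int) -> List[int]:
    """Return the indices of the m largest weights, in ascending index order."""
    ranked = sorted(range(len(w)), key=lambda j: w[j], reverse=True)
    return sorted(ranked[:m])


# Four features; the two with the largest weights are kept.
F = select_feature_indices([0.05, 0.40, 0.10, 0.45], m=2)
```

In practice `m` could be chosen by trying several values and keeping the one with the best validation accuracy, which would be closer to the accuracy-guided selection described above.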
S106: Perform feature selection on the first sample set according to the feature index subset to obtain a second sample set.
Specifically, feature selection is performed on the first sample set $\{(x_i, y_i)\}_{i=1}^{N}$ according to the feature index subset $F$, giving the second sample set, in which each sample satisfies $x_i \in \mathbb{R}^{|F|}$ with $|F| < I$.
S107: Obtain first data to be evaluated, and perform feature selection on it according to the feature index subset to obtain second data to be evaluated.
Specifically, a first data sample to be evaluated $x \in \mathbb{R}^{I}$ is obtained. At this point $x$ has not been dimensionally reduced and its dimension is $I$. Feature selection is performed on the data sample according to the feature index subset $F$, giving the second data to be evaluated $x'$.
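S106 and S107 apply the same index subset to the training samples and to the sample to be evaluated. A minimal sketch, assuming samples are stored as plain Python lists and the subset is a list of zero-based column indices (the function names are hypothetical):

```python
from typing import List, Sequence


def project(x: Sequence[float], F: Sequence[int]) -> List[float]:
    """Keep only the components of x whose indices are in F."""
    return [x[j] for j in F]


def project_dataset(X: Sequence[Sequence[float]], F: Sequence[int]) -> List[List[float]]:
    """Apply the same projection to every sample of a data set."""
    return [project(x, F) for x in X]


F = [1, 3]                                           # feature index subset
X_train = [[0.1, 0.9, 0.3, 0.2], [0.8, 0.1, 0.5, 0.7]]
X_second = project_dataset(X_train, F)               # S106: second sample set
x_second = project([0.4, 0.6, 0.0, 0.5], F)          # S107: second data to be evaluated
```

Using one projection function for both steps keeps the training samples and the evaluated sample in the same reduced feature space.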
S108: Classify the second data to be evaluated on the second sample set to obtain a classification result.
Specifically, the second data to be evaluated $x'$ is classified on the second sample set to obtain a classification result. This classification result can be used to make a diagnosis for the first data sample to be evaluated $x$.
Thus, the medical data classification method based on local-learning feature weight selection provided by this embodiment first obtains the attribute values of the samples from the training sample set and computes the weight vector corresponding to the attributes with a gradient-descent weight update. This guarantees convergence, so the stopping criterion of the algorithm is reached quickly, which reduces both computation time and computational complexity. Feature selection is performed according to the computed weight vector to obtain an optimal feature subset; the data sample to be evaluated is standardized, reduced to the optimal feature subset, and then classified, so that the data sample is dimensionally reduced. The method therefore achieves dimensionality reduction while also lowering computational complexity and computation time.
An embodiment of the present invention discloses a specific medical data classification method based on local-learning feature weight selection. It differs from the previous embodiment in that S101 is further defined; the other steps are substantially the same as in the previous embodiment, to which reference may be made for details that are not repeated here. Specifically, S101 comprises:
Obtaining the first sample set of medical data, obtaining the first sample attributes, and applying deviation (min-max) standardization to the first sample set;
Specifically, the first sample set of medical data $\{(x_i, y_i)\}_{i=1}^{N}$ is obtained, and its sample attributes are taken as the first sample attributes. Here $x_i \in \mathbb{R}^{I}$; $y_i \in \{1, 2, \ldots, C\}$ is the label of $x_i$ and indicates its class; $N$ is the number of training samples, $I$ is the sample dimension, and $C$ is the total number of classes.
It should be noted that different feature attributes often have different dimensions and units, which affects the results of data analysis. To eliminate this effect, deviation standardization is applied to the first sample set $\{(x_i, y_i)\}_{i=1}^{N}$ so that the feature attribute data become comparable. The transfer function of deviation standardization is $x_{ij}' = \dfrac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}$, where $x_{ij}$ is the $j$-th attribute of the $i$-th sample, $\max_i x_{ij}$ is the maximum of attribute $j$ over all training sample data, and $\min_i x_{ij}$ is the minimum of attribute $j$ over all the data. After standardization every index of the feature data is on the same order of magnitude, which makes the data better suited to comprehensive comparison and evaluation. The feature data used in the embodiments of the present invention are the data after deviation standardization.
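The standardization step can be sketched as below, assuming that deviation standardization denotes the usual min-max scaling of each attribute to [0, 1] using the minimum and maximum observed in the training samples. Mapping a constant attribute (maximum equal to minimum) to 0.0 is an added assumption to avoid division by zero; the source does not discuss that case. The function names are hypothetical.

```python
from typing import List, Tuple


def fit_min_max(X: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Per-attribute minimum and maximum over all training samples."""
    columns = list(zip(*X))
    return [min(c) for c in columns], [max(c) for c in columns]


def min_max_scale(x: List[float], mins: List[float], maxs: List[float]) -> List[float]:
    """Scale one sample attribute by attribute: (x_j - min_j) / (max_j - min_j)."""
    return [
        (v - lo) / (hi - lo) if hi > lo else 0.0   # constant attribute: assumed 0.0
        for v, lo, hi in zip(x, mins, maxs)
    ]


X_train = [[1.0, 10.0], [3.0, 20.0], [5.0, 40.0]]
mins, maxs = fit_min_max(X_train)
X_scaled = [min_max_scale(x, mins, maxs) for x in X_train]
```

Keeping `mins` and `maxs` from the training set allows a later sample to be evaluated to be scaled with the same transfer function.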
An embodiment of the present invention discloses a specific medical data classification method based on local-learning feature weight selection. It differs from the previous embodiment in that S103 is further defined; the other steps are substantially the same as in the previous embodiment, to which reference may be made for details that are not repeated here. Specifically, S103 comprises:
Updating the current weight vector by the update rule [formula not reproduced] to obtain the next weight vector $w_{t+1}$ after one iteration, where $J(w)$ is the optimization objective function, obtained by maximizing $J(w) = (z_i^{t+1})^{T} w_{t+1}$.
Specifically, $J(w)$ is solved by maximizing [formula not reproduced], and the next weight vector $w_{t+1}$ is updated accordingly.
Here [formula not reproduced] and [formula not reproduced] are, respectively, the different-class and same-class samples in the nearest-neighbor sample matrix of sample $x_i$, and $k$ is the number of neighbors, set a priori. $\alpha_i$ and $\beta_i$ are the coefficient vectors of $x_i$ on the different-class samples and the same-class samples, respectively. $\alpha_i$ is obtained by solving the optimization problem [formula not reproduced], and $\beta_i$ is obtained by solving the optimization problem [formula not reproduced].
Therefore, given the optimization objective function $J(w)$, the formula [formula not reproduced] can be used to update the current weight vector $w_t$ and obtain the next weight vector $w_{t+1}$ after one iteration.
The gradient-descent weight update guarantees convergence. Once convergence is guaranteed, the stopping criterion of the algorithm is reached quickly, which lowers computational complexity and computation time.
An embodiment of the present invention discloses a specific medical data classification method based on local-learning feature weight selection. It differs from the previous embodiment in that S107 is further defined; the other steps are substantially the same as in the previous embodiment, to which reference may be made for details that are not repeated here. Specifically, S107 comprises:
Obtaining the first data to be evaluated, applying deviation standardization to it, and performing feature selection according to the feature index subset to obtain the second data to be evaluated.
Specifically, the data sample to be evaluated $x \in \mathbb{R}^{I}$ is obtained as the first data to be evaluated and is standardized with the deviation standardization method introduced in the above embodiment, i.e. $x_j' = \dfrac{x_j - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}$.
It should be noted that the first data to be evaluated used in the present invention are the data after deviation standardization. Applying deviation standardization to the first data to be evaluated likewise prevents differences in dimension and unit between features from affecting the data analysis results. After standardization every index of the data to be evaluated is on the same order of magnitude, which makes the data suitable for comprehensive comparison and evaluation.
An embodiment of the present invention discloses a specific medical data classification method based on local-learning feature weight selection. It differs from the previous embodiment in that S108 is further defined; the other steps are substantially the same as in the previous embodiment, to which reference may be made for details that are not repeated here. Specifically, S108 comprises:
Classifying the second data to be evaluated on the second sample set with a k-nearest-neighbor classifier to obtain a classification result.
Specifically, on the basis of the second sample set, the second data to be evaluated $x'$ is classified with a k-nearest-neighbor classifier to obtain a classification result. This classification result can be used to make a diagnosis for the first data sample to be evaluated $x$.
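The k-nearest-neighbor classification step can be sketched as follows. This is an illustrative sketch, assuming Euclidean distance and a plain majority vote among the `k` closest training samples; the source names the classifier but does not fix the distance measure, the value of `k`, or tie handling, and `knn_predict` is a hypothetical name.

```python
import math
from collections import Counter
from typing import List, Sequence


def knn_predict(train_X: Sequence[Sequence[float]], train_y: Sequence[int],
                x: Sequence[float], k: int = 3) -> int:
    """Label x by majority vote among its k nearest training samples."""
    # Pair each training sample's distance to x with its label, nearest first.
    by_distance = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Second sample set (already reduced to the selected features) with class labels.
train_X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
train_y = [1, 1, 1, 2, 2, 2]

label_a = knn_predict(train_X, train_y, [0.2, 0.1], k=3)
label_b = knn_predict(train_X, train_y, [5.5, 5.2], k=3)
```

An odd `k` avoids tied votes in the two-class case; with more classes a tie-breaking policy would have to be chosen explicitly.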
The medical data classification device based on local-learning feature weight selection provided by the embodiments of the present invention is introduced below. The medical data classification device described below and the medical data classification method described above may be cross-referenced.
Referring to Fig. 2, a medical data classification device based on local-learning feature weight selection provided by an embodiment of the present invention specifically comprises:
A first sample set acquisition module 201, configured to obtain a first sample set of medical data and obtain the first sample attributes.
Specifically, the first sample set acquisition module 201 obtains the first sample set of medical data $\{(x_i, y_i)\}_{i=1}^{N}$ and takes its sample attributes as the first sample attributes. Here $x_i \in \mathbb{R}^{I}$; $y_i \in \{1, 2, \ldots, C\}$ is the label of $x_i$ and indicates its class; $N$ is the number of training samples, $I$ is the sample dimension, and $C$ is the total number of classes.
An initial weight vector setting module 202, configured to set an initial weight vector for the first sample attributes and take the initial weight vector as the current weight vector.
Specifically, the initial weight vector setting module 202 sets the initial weight vector to $w_0 = [1/I, 1/I, \ldots, 1/I]^{T}$. With $t$ denoting the iteration count, $t = 0$ at this point, i.e. iteration has not yet started, and $w_0$ is taken as the current weight vector $w_t$.
A next weight vector acquisition module 203, configured to update the current weight vector by a gradient-descent update rule to obtain the next weight vector after one iteration.
Specifically, the next weight vector acquisition module 203 carries out one iteration on the current weight vector: $w_t$ is updated once by the gradient-descent update rule, giving the next weight vector $w_{t+1}$.
A judgment module 204, configured to judge whether the decision rule holds; if so, to take the next weight vector as the final weight vector and call the feature index subset acquisition module; if not, to take the next weight vector as the current weight vector and call the next weight vector acquisition module. The decision rule is $\|w_{t+1} - w_t\| \le \theta$, where $w_t$ is the current weight vector, $w_{t+1}$ is the next weight vector, and $\theta$ is the stopping threshold.
Specifically, a stopping threshold $\theta$ is set in the judgment module 204, and $\|w_{t+1} - w_t\| \le \theta$ is tested. If it holds, the next weight vector $w_{t+1}$ is taken as the final weight vector $w = [w_1, w_2, \ldots, w_I]^{T} \in \mathbb{R}^{I}$ and the feature index subset acquisition module 205 is called. If it does not hold, the next weight vector $w_{t+1}$ is taken as the current weight vector $w_t$, and the next weight vector acquisition module 203 is called again for a new iteration.
The feature index subset acquisition module 205, configured to perform feature selection according to the final weight vector to obtain a feature index subset.
Specifically, the feature index subset acquisition module 205 performs feature selection according to the final weight vector $w$, guided by classification accuracy, to obtain the corresponding feature index subset $F$. This reduces the feature dimension of the first samples and thereby the amount and time of computation.
A second sample set acquisition module 206, configured to perform feature selection on the first sample set according to the feature index subset to obtain a second sample set.
Specifically, the second sample set acquisition module 206 performs feature selection on the first sample set $\{(x_i, y_i)\}_{i=1}^{N}$ according to the feature index subset $F$, giving the second sample set, in which each sample satisfies $x_i \in \mathbb{R}^{|F|}$ with $|F| < I$.
A second to-be-evaluated data acquisition module 207, configured to obtain first data to be evaluated and perform feature selection on it according to the feature index subset to obtain second data to be evaluated.
Specifically, the second to-be-evaluated data acquisition module 207 obtains a first data sample to be evaluated $x \in \mathbb{R}^{I}$. At this point $x$ has not been dimensionally reduced and its dimension is $I$. Feature selection is performed on the data sample according to the feature index subset $F$, giving the second data to be evaluated $x'$.
A classification module 208, configured to classify the second data to be evaluated on the second sample set to obtain a classification result.
Specifically, the classification module 208 classifies the second data to be evaluated $x'$ on the second sample set to obtain a classification result. This classification result can be used to make a diagnosis for the first data sample to be evaluated $x$.
Thus, in the medical data classification device based on local-learning feature weight selection provided by this embodiment, the first sample set acquisition module 201 first obtains the attribute values of the samples, and the next weight vector acquisition module 203 computes the weight vector corresponding to the attributes from those values with a gradient-descent weight update. This guarantees convergence, so the stopping criterion of the algorithm is reached quickly, which reduces both computation time and computational complexity. The second sample set acquisition module 206 performs feature selection according to the computed weight vector to obtain an optimal feature subset; the second to-be-evaluated data acquisition module 207 standardizes the data sample to be evaluated and reduces it to the optimal feature subset; the reduced sample is then classified, so that the data sample is dimensionally reduced. The device therefore achieves dimensionality reduction while also lowering computational complexity and computation time.
An embodiment of the present invention discloses a specific medical data classification device based on local-learning feature weight selection. It differs from the previous embodiment in that the first sample set acquisition module 201 is further defined; the rest is substantially the same as in the previous embodiment, to which reference may be made for details that are not repeated here. The first sample set acquisition module 201 is specifically configured to:
Obtain the first sample set of medical data, obtain the first sample attributes, and apply deviation (min-max) standardization to the first sample set.
Specifically, the first sample set acquisition module 201 obtains the first sample set of medical data $\{(x_i, y_i)\}_{i=1}^{N}$ and takes its sample attributes as the first sample attributes. Here $x_i \in \mathbb{R}^{I}$; $y_i \in \{1, 2, \ldots, C\}$ is the label of $x_i$ and indicates its class; $N$ is the number of training samples, $I$ is the sample dimension, and $C$ is the total number of classes.
It should be noted that different feature attributes often have different dimensions and units, which affects the results of data analysis. To eliminate this effect, deviation standardization is applied to the first sample set $\{(x_i, y_i)\}_{i=1}^{N}$ so that the feature attribute data become comparable. The transfer function of deviation standardization is $x_{ij}' = \dfrac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}$, where $x_{ij}$ is the $j$-th attribute of the $i$-th sample, $\max_i x_{ij}$ is the maximum of attribute $j$ over all training sample data, and $\min_i x_{ij}$ is the minimum of attribute $j$ over all the data. After standardization every index of the feature data is on the same order of magnitude, which makes the data better suited to comprehensive comparison and evaluation. The feature data used in the embodiments of the present invention are the data after deviation standardization.
An embodiment of the present invention discloses a specific medical data classification device based on local learning feature weight selection. Unlike the previous embodiment, this embodiment further limits the next weight vector acquisition module 203; the remaining content is substantially the same as in the previous embodiment, to which reference may be made for details, and is not repeated here. The next weight vector acquisition module 203 is specifically configured to:
Update the current weight vector by the gradient-based update rule to obtain the next weight vector $w^{t+1}$ after one iteration, where J(w) is the optimization objective function, obtained by maximizing $J(w) = (z_i^{t+1})^T w^{t+1}$.
Specifically, the next weight vector acquisition module 203 first solves the objective function J(w) by maximization and then updates the next weight vector $w^{t+1}$.
Here the two neighbor sample matrices of sample $x_i$ are formed from its heterogeneous (different-class) neighbors and its same-class neighbors, respectively, and k is the number of neighbors, set a priori. $\alpha_i$ and $\beta_i$ are the coefficient vectors of $x_i$ on the heterogeneous samples and on the same-class samples, respectively; each is obtained by solving its own optimization problem.
J(w) can therefore be used in the update formula to update the current weight vector $w^t$ and obtain the next weight vector $w^{t+1}$ after one iteration, where the optimization objective function is obtained by maximizing $J(w) = (z_i^{t+1})^T w^{t+1}$.
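To make the objective $J(w) = (z_i)^T w$ concrete, the sketch below computes a margin vector for one sample and evaluates the objective. The text does not reproduce the definition of $z_i$, so the sketch assumes the usual RELIEF-style margin: the absolute difference between $x_i$ and its reconstructed heterogeneous neighbor, minus the absolute difference between $x_i$ and its reconstructed same-class neighbor. The function names, the toy values, and the way the coefficient vectors are supplied (rather than solved for) are all illustrative assumptions.

```python
def combine(neighbors, coeffs):
    """Linear combination of k neighbor vectors with the given coefficients."""
    dim = len(neighbors[0])
    return [sum(c * nb[j] for c, nb in zip(coeffs, neighbors)) for j in range(dim)]


def margin_vector(x, misses, alpha, hits, beta):
    """Assumed RELIEF-style margin: |x - miss point| - |x - hit point|, per attribute."""
    miss_point = combine(misses, alpha)  # reconstructed different-class neighbor
    hit_point = combine(hits, beta)      # reconstructed same-class neighbor
    return [abs(a - m) - abs(a - h) for a, m, h in zip(x, miss_point, hit_point)]


def objective(z, w):
    """J(w) = z^T w for a single sample."""
    return sum(zj * wj for zj, wj in zip(z, w))


x = [1.0, 2.0]
misses = [[3.0, 2.0], [5.0, 2.0]]  # k = 2 different-class neighbors
hits = [[1.0, 3.0], [1.0, 1.0]]    # k = 2 same-class neighbors
z = margin_vector(x, misses, [0.5, 0.5], hits, [0.5, 0.5])
J = objective(z, [0.5, 0.5])
```

In this toy case only the first attribute separates the sample from the other class, so only the first component of the margin vector is positive; an update that increases J(w) would therefore favor that attribute.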
Because the gradient-descent weight update guarantees convergence, the stopping criterion of the algorithm is reached sooner, which reduces both the computational complexity and the computation time.
An embodiment of the present invention discloses a specific medical data classification device based on local learning feature weight selection. Unlike the previous embodiment, this embodiment further limits the second to-be-assessed data acquisition module 207; the remaining content is substantially the same as in the previous embodiment, to which reference may be made for details, and is not repeated here. The second to-be-assessed data acquisition module 207 is specifically configured to:
Obtain the first data to be assessed, perform deviation standardization on it, and perform feature selection according to the feature index subset to obtain the second data to be assessed.
Specifically, the second to-be-assessed data acquisition module 207 obtains the data sample x to be assessed as the first data to be assessed, where $x \in \mathbb{R}^I$. The first data to be assessed are standardized by the deviation standardization method described in the above embodiment, i.e., $x_j \leftarrow \frac{x_j - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}$, where the maximum and minimum of attribute j are those defined above.
It should be noted that the first data to be assessed used in the present invention are the data after deviation standardization. Standardizing the first data to be assessed likewise prevents differences in dimension and unit between features from affecting the analysis results; it puts every feature of the data to be assessed on the same order of magnitude, which is suitable for comprehensive comparison and evaluation.
An embodiment of the present invention discloses a specific medical data classification device based on local learning feature weight selection. Unlike the previous embodiment, this embodiment further limits the classification module 208; the remaining content is substantially the same as in the previous embodiment, to which reference may be made for details, and is not repeated here. The classification module 208 is specifically configured to:
Classify the second data to be assessed on the second sample set using a k-nearest-neighbor classifier to obtain the classification result.
Specifically, on the basis of the second sample set, the classification module 208 classifies the second data to be assessed x′ using a k-nearest-neighbor classifier and obtains the classification result. This classification result can be used to diagnose the first data sample to be assessed x.
An embodiment of the present invention discloses a medical data classification method based on local learning feature weights, which specifically includes the following.
This embodiment is tested on the embryonal tumor (CNS) data set, which contains 34 patient samples in total, each with 7129 genes. The 34 samples comprise 25 classic medulloblastomas (C) and 9 desmoplastic medulloblastomas (D), giving 2 classes. The CNS data set is divided into two subsets: 23 training samples (6 C, 17 D), used to select genes and adjust the weights of the classifier, and 11 test samples (3 C, 8 D), used to evaluate the performance of the resulting system. Each sample has 7129 features. C is treated as the first class and D as the second class. The implementation is divided into two modules, as follows:
Model training module:
S301: Input the medical data sample set $\{(x_i, y_i)\}_{i=1}^{N}$ as the first sample set, where $x_i \in \mathbb{R}^I$, and $y_i \in \{1, 2, \ldots, C\}$ is the label of $x_i$, indicating its class; N is the number of training samples, I is the dimension of the samples, and C is the total number of classes. Here N = 23, I = 7129, and C = 2.
S302: Perform deviation standardization on the first sample set. The transfer function is $x_{ij} \leftarrow \frac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}$, where $x_{ij}$ is the j-th attribute of the i-th sample, $\max_i x_{ij}$ is the maximum of attribute j over all training samples, and $\min_i x_{ij}$ is the minimum of attribute j over all the data.
S303: Set the initial weight vector of the first sample attributes, $w^0 = [1/I, 1/I, \ldots, 1/I]^T$, and take the initial weight vector as the current weight vector. Here t is the iteration index; at this point t = 0 and iteration has not yet started, so the initial weight vector $w^0$ serves as the current weight vector $w^t$. The number of iterations is 30 in total.
S304: Update the current weight vector by a gradient-descent update to obtain the next weight vector after one iteration.
Specifically, the optimization objective function J(w) is solved by maximization, and the next weight vector $w^{t+1}$ is then updated.
Here the two neighbor sample matrices of sample $x_i$ are formed from its heterogeneous (different-class) neighbors and its same-class neighbors, respectively, and k is the number of neighbors, set a priori. $\alpha_i$ and $\beta_i$ are the coefficient vectors of $x_i$ on the heterogeneous samples and on the same-class samples, respectively; each is obtained by solving its own optimization problem.
J(w) can therefore be used in the update formula to update the current weight vector $w^t$ and obtain the next weight vector $w^{t+1}$ after one iteration.
S305: Determine whether the decision rule holds. If so, take the next weight vector as the final weight vector and perform S306; if not, take the next weight vector as the current weight vector and return to S304. The decision rule is $\|w^{t+1} - w^t\| \le \theta$, where $w^t$ is the current weight vector, $w^{t+1}$ is the next weight vector, and θ is the stopping criterion.
Specifically, the stopping criterion is set to θ = 0.001, and it is determined whether $\|w^{t+1} - w^t\| \le \theta$ holds. If it holds, the next weight vector $w^{t+1}$ is taken as the final weight vector $w = [w_1, w_2, \ldots, w_I]^T \in \mathbb{R}^{7129}$ and S306 is performed; if it does not hold, $w^{t+1}$ is taken as the current weight vector $w^t$ and the process returns to S304 for a new iteration.
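The control flow of S303 to S305 can be sketched as follows. Because the update formula itself is not reproduced in this text, the sketch takes the update as a caller-supplied function; `iterate_weights` and `toy_update` are hypothetical names, and `toy_update` (which moves the weights halfway toward a fixed target) is only a stand-in for the real gradient-based update. Treating 30 as an upper bound on the number of iterations is likewise an assumption.

```python
import math


def iterate_weights(update, num_features, theta=0.001, max_iter=30):
    """Iterate w^{t+1} = update(w^t) until ||w^{t+1} - w^t|| <= theta."""
    w = [1.0 / num_features] * num_features  # w^0 = [1/I, ..., 1/I]
    for t in range(max_iter):
        w_next = update(w)
        change = math.sqrt(sum((a - b) ** 2 for a, b in zip(w_next, w)))
        w = w_next
        if change <= theta:  # decision rule of S305
            return w, t + 1
    return w, max_iter


def toy_update(w, target=(0.7, 0.3)):
    """Stand-in update (assumption): move halfway toward a fixed target."""
    return [wi + 0.5 * (ti - wi) for wi, ti in zip(w, target)]


final_w, iterations = iterate_weights(toy_update, num_features=2)
```

With this stand-in update the change between successive weight vectors halves at every step, so the loop stops well before the 30-iteration limit.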
S306: Perform feature selection according to the final weight vector to obtain the feature index subset.
Specifically, feature selection is performed by classification accuracy according to the final weight vector w, yielding the corresponding feature index subset F. This reduces the feature dimension of the first samples and thereby the amount and time of computation.
S307: Perform feature selection on the first sample set according to the feature index subset to obtain the second sample set after feature selection.
Specifically, feature selection is performed on the first sample set $\{(x_i, y_i)\}_{i=1}^{N}$ according to the feature index subset F to obtain the second sample set, in which each sample $x_i \in \mathbb{R}^{|F|}$, with |F| < 7129.
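One simple way to turn a final weight vector into a feature index subset is to rank the features by weight and keep the highest-weighted ones. The sketch below does this for a fixed subset size; the names `select_top_features` and `project` and the parameter `num_selected` are hypothetical. The text selects features by classification accuracy, so in practice the subset size would presumably be chosen by comparing accuracy across candidate sizes, which this sketch does not do.

```python
def select_top_features(weights, num_selected):
    """Return the indices of the num_selected largest weights, in ascending index order."""
    ranked = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    return sorted(ranked[:num_selected])


def project(samples, feature_index):
    """Keep only the selected attributes of every sample."""
    return [[row[j] for j in feature_index] for row in samples]


weights = [0.10, 0.50, 0.05, 0.35]  # toy final weight vector
F = select_top_features(weights, num_selected=2)
second_set = project([[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]], F)
```

Each projected sample has |F| attributes, which is the dimensionality reduction described in S307.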
Evaluation module:
S308: Obtain the first data to be assessed.
Specifically, the data sample x to be assessed is input as the first data sample to be assessed, $x \in \mathbb{R}^I$.
S309: Perform deviation standardization on the first data to be assessed.
Specifically, the first data to be assessed x are standardized by the deviation standardization method described in the above embodiment, i.e., $x_j \leftarrow \frac{x_j - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}$, where the maximum and minimum of attribute j are those defined in S302.
S310: Perform feature selection on the first data to be assessed according to the feature index subset F to obtain the second data to be assessed x′.
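Steps S309 and S310 amount to applying the training-set scaling to the new sample and then keeping only the selected attributes. The following sketch assumes that the per-attribute minima and maxima come from the training samples and that constant attributes map to 0.0; `prepare_sample` and its parameter names are hypothetical.

```python
def prepare_sample(x, mins, maxs, feature_index):
    """Min-max scale x with the training minima/maxima, then keep the selected attributes."""
    scaled = [
        (value - lo) / (hi - lo) if hi > lo else 0.0  # assumption: constant attribute -> 0.0
        for value, lo, hi in zip(x, mins, maxs)
    ]
    return [scaled[j] for j in feature_index]


x_prime = prepare_sample(
    [2.0, 25.0, 5.0],
    mins=[1.0, 10.0, 0.0],
    maxs=[3.0, 30.0, 10.0],
    feature_index=[1, 2],
)
```

Reusing the training minima and maxima keeps the sample to be assessed on the same scale as the second sample set against which it is later classified.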
S311: Classify the second data to be assessed on the second sample set using a k-nearest-neighbor classifier to obtain the classification result.
Specifically, on the basis of the second sample set, the second data to be assessed x′ are classified using a k-nearest-neighbor classifier to obtain the classification result. This classification result can be used to diagnose the first data sample to be assessed x.
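A k-nearest-neighbor classifier of the kind used in S311 can be sketched as below. The text does not state the distance measure or the value of k used by the classifier, so Euclidean distance and k = 3 are assumptions, as are the function name `knn_classify` and the toy data.

```python
import math
from collections import Counter


def knn_classify(train_x, train_y, query, k=3):
    """Label the query by majority vote among its k nearest training samples."""
    distances = sorted(
        (math.dist(sample, query), label) for sample, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy second sample set: two features per sample, class labels 1 and 2.
train_x = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [1.0, 1.0], [0.9, 0.8], [0.8, 1.0]]
train_y = [1, 1, 1, 2, 2, 2]
label_a = knn_classify(train_x, train_y, [0.15, 0.10])
label_b = knn_classify(train_x, train_y, [0.90, 0.90])
```

With two classes, an odd k avoids tied votes.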
The present invention proposes a medical data classification method based on local learning feature weights that improves on the LH-RELIEF feature selection method. It extracts a feature combination F, with 1 ≤ length(F) ≤ 7129, from 23 training samples of 7129 dimensions, and classifies 11 test samples of 7129 dimensions. The proposed method is compared with the LH-RELIEF algorithm on the same data set, with 78 training samples drawn at random 10 times; the average convergence results are shown in Fig. 3 and the average performance results in Fig. 4. It can be seen that the present invention converges faster than the MSVM-RFE algorithm and, when the same number of genes is selected, achieves better classification performance.
Table 1 compares the best average classification performance obtained by each of the two methods. The present invention improves on the LH-RELIEF method by about 2 percentage points.
Table 1. Comparison of the best classification performance of LH-RELIEF and the present invention
Method                   Recognition rate (%)
The present invention    70.91 (10)
LH-RELIEF                69.09 (10)
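As a quick check of the stated improvement, the difference between the two recognition rates in Table 1 works out as follows:

```latex
70.91\% - 69.09\% = 1.82 \text{ percentage points} \approx 2 \text{ percentage points}
```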
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments; for parts that are the same or similar, the embodiments may be referred to one another.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A medical data classification method based on local learning feature weight selection, characterized by comprising:
S101: obtaining a first sample set of medical data and obtaining first sample attributes;
S102: setting an initial weight vector of the first sample attributes, and taking the initial weight vector as the current weight vector;
S103: updating the current weight vector by a gradient-descent update to obtain the next weight vector after one iteration;
S104: determining whether a decision rule holds; if so, taking the next weight vector as the final weight vector and performing S105; if not, taking the next weight vector as the current weight vector and returning to S103; wherein the decision rule is $\|w^{t+1} - w^t\| \le \theta$, $w^t$ is the current weight vector, $w^{t+1}$ is the next weight vector, and θ is the stopping criterion;
S105: performing feature selection according to the final weight vector to obtain a feature index subset;
S106: performing feature selection on the first sample set according to the feature index subset to obtain a second sample set after feature selection;
S107: obtaining first data to be assessed, and performing feature selection according to the feature index subset to obtain second data to be assessed;
S108: classifying the second data to be assessed on the second sample set to obtain a classification result.
2. The medical data classification method according to claim 1, characterized in that obtaining the first sample set of medical data and obtaining the first sample attributes comprises:
obtaining the first sample set of medical data, obtaining the first sample attributes, and performing deviation standardization on the first sample set.
3. The medical data classification method according to claim 1, characterized in that updating the current weight vector by a gradient-descent update to obtain the next weight vector after one iteration comprises:
updating the current weight vector by the update rule to obtain the next weight vector $w^{t+1}$ after one iteration, wherein J(w) is the optimization objective function, obtained by maximizing $J(w) = (z_i^{t+1})^T w^{t+1}$.
4. The medical data classification method according to claim 1, characterized in that obtaining the first data to be assessed and performing feature selection according to the feature index subset to obtain the second data to be assessed comprises:
obtaining the first data to be assessed, performing deviation standardization, and performing feature selection according to the feature index subset to obtain the second data to be assessed.
5. The medical data classification method according to any one of claims 1 to 4, characterized in that classifying the second data to be assessed on the second sample set to obtain the classification result comprises:
classifying the second data to be assessed on the second sample set using a k-nearest-neighbor classifier to obtain the classification result.
6. A medical data classification device based on local learning feature weight selection, characterized by comprising:
a first sample set acquisition module, configured to obtain a first sample set of medical data and obtain first sample attributes;
an initial weight vector setting module, configured to set an initial weight vector of the first sample attributes and take the initial weight vector as the current weight vector;
a next weight vector acquisition module, configured to update the current weight vector by a gradient-descent update to obtain the next weight vector after one iteration;
a judgment module, configured to determine whether a decision rule holds; if so, to take the next weight vector as the final weight vector and call a feature index subset acquisition module; if not, to take the next weight vector as the current weight vector and call the next weight vector acquisition module; wherein the decision rule is $\|w^{t+1} - w^t\| \le \theta$, $w^t$ is the current weight vector, $w^{t+1}$ is the next weight vector, and θ is the stopping criterion;
the feature index subset acquisition module, configured to perform feature selection according to the final weight vector to obtain a feature index subset;
a second sample set acquisition module, configured to perform feature selection on the first sample set according to the feature index subset to obtain a second sample set after feature selection;
a second to-be-assessed data acquisition module, configured to obtain first data to be assessed and perform feature selection according to the feature index subset to obtain second data to be assessed;
a classification module, configured to classify the second data to be assessed on the second sample set to obtain a classification result.
7. The medical data classification device according to claim 6, characterized in that the first sample set acquisition module is specifically configured to:
obtain the first sample set of medical data, obtain the first sample attributes, and perform deviation standardization on the first sample set.
8. The medical data classification device according to claim 6, characterized in that the next weight vector acquisition module is specifically configured to:
update the current weight vector by the update rule to obtain the next weight vector $w^{t+1}$ after one iteration, wherein J(w) is the optimization objective function, obtained by maximizing $J(w) = (z_i^{t+1})^T w^{t+1}$.
9. The medical data classification device according to claim 6, characterized in that the second to-be-assessed data acquisition module is specifically configured to:
obtain the first data to be assessed, perform deviation standardization, and perform feature selection according to the feature index subset to obtain the second data to be assessed.
10. The medical data classification device according to any one of claims 6 to 9, characterized in that the classification module is specifically configured to:
classify the second data to be assessed on the second sample set using a k-nearest-neighbor classifier to obtain the classification result.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710419357.7A CN107193993A (en) 2017-06-06 2017-06-06 The medical data sorting technique and device selected based on local learning characteristic weight

Publications (1)

Publication Number Publication Date
CN107193993A true CN107193993A (en) 2017-09-22

Family

ID=59877175

Country Status (1)

Country Link
CN (1) CN107193993A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108763873A (en) * 2018-05-28 2018-11-06 苏州大学 A kind of gene sorting method and relevant device
CN109243561A (en) * 2018-08-10 2019-01-18 上海交通大学 Model optimization method and system of treatment scheme recommendation system
CN109243561B (en) * 2018-08-10 2020-07-28 上海交通大学 Model optimization method and system of treatment scheme recommendation system
JP2022500798A (en) * 2019-01-29 2022-01-04 深▲せん▼市商▲湯▼科技有限公司Shenzhen Sensetime Technology Co., Ltd. Image processing methods and equipment, computer equipment and computer storage media
JP7076648B2 (en) 2019-01-29 2022-05-27 深▲セン▼市商▲湯▼科技有限公司 Image processing methods and equipment, computer equipment and computer storage media
CN113971604A (en) * 2020-07-22 2022-01-25 中移(苏州)软件技术有限公司 Data processing method, device and storage medium
CN113657499A (en) * 2021-08-17 2021-11-16 中国平安财产保险股份有限公司 Rights and interests allocation method and device based on feature selection, electronic equipment and medium
CN113657499B (en) * 2021-08-17 2023-08-11 中国平安财产保险股份有限公司 Rights and interests distribution method and device based on feature selection, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170922