CN106529598A - Classification method and system based on imbalanced medical image data set - Google Patents

Info

Publication number
CN106529598A
Authority
CN
China
Prior art keywords
sample
subset
medical image
image data
classifier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610997896.4A
Other languages
Chinese (zh)
Other versions
CN106529598B (en)
Inventor
韩赫
李建强
张苓琳
胡启东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology
Priority to CN201610997896.4A
Publication of CN106529598A
Application granted
Publication of CN106529598B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/50 Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03 Recognition of patterns in medical or anatomical images

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a classification method and system based on an imbalanced medical image data set. The method comprises: extracting the green channel component of an original medical image; correcting the extracted grayscale image using histogram equalization; extracting a texture feature, a wavelet feature and an auxiliary wheel feature from the corrected image; ranking the extracted feature samples according to the distance between samples; dividing the ranked samples into uniform feature subsets while ensuring the difference between the subsets; training the feature subsets with an SVM algorithm and a BP neural network algorithm to produce sub-classifiers; and combining the sub-classifiers and voting to obtain a final classification result. With this technical scheme, the negative-sample classification accuracy in multi-class ensemble learning is improved significantly, and the negative-sample accuracy of multi-classifier training is clearly improved for highly skewed data sets such as those in the medical field. This helps reduce misdiagnosis and thus improves the practical value of the classifier.

Description

A classification method and system based on an imbalanced medical image data set
Technical field
The invention belongs to the field of machine learning, and in particular relates to a classification method and system based on an imbalanced medical image data set.
Background art
In many real-world machine learning classification tasks, the training data set of the classifier has a highly imbalanced distribution, i.e. some classes contain far more samples than others. Traditional learning algorithms optimize the overall accuracy of the classifier and therefore tend to misclassify minority-class samples as majority-class samples, yet in many practical problems the accuracy on the minority class matters more. Examples include disease diagnosis, credit card fraud detection and network intrusion detection. Data sets in such problems, for instance in the medical field, share a common characteristic: the sample distribution is highly skewed, with far more positive samples (i.e. normal samples) than negative samples (i.e. diseased samples). A classifier trained on such a data set is clearly biased and tends to misclassify negative samples as positive. For a patient this is very serious: it leads to misdiagnosis, and the best window for treatment is missed. Effectively improving the classification accuracy of negative samples is therefore essential. Similarly, in credit card fraud detection the loss from rejecting one legitimate customer is far smaller than the loss from letting one fraud through. Learning methods that achieve higher accuracy on the minority class therefore tend to have more practical significance.
In practice, the widespread problem of imbalanced data sets has hindered the transfer of machine learning results to real applications. The paper "Class Imbalance Problem in Data Mining: Review" by Rushi Longadge and Snehalata Dongre, published in the journal "International Journal of Computer Science and Network", Vol. 2, Issue 1, February 2013, analyzes and summarizes the existing methods for this problem. They fall into three categories: sampling, algorithms, and feature selection. Sampling is further divided into undersampling and oversampling. The most widely used undersampling method is random undersampling, which balances the samples by randomly removing samples from the majority class. The drawback is that the useful information in the removed majority-class samples is removed as well; this information loss affects the accuracy of the final classifier. The most widely used oversampling method is random oversampling, which reaches a balanced distribution by replicating minority-class samples. This method also has a drawback: the additionally generated data increase the training time, and the extra, similar minority-class samples may cause the classifier to overfit. Algorithm-level methods usually introduce "cost-sensitive" learning, i.e. they raise the misclassification cost of minority-class samples, which is similar to increasing the weights of minority-class samples so that the data distribution is balanced in a weighted sense. The drawback is that there is no generally applicable value for the weight difference between majority-class and minority-class samples, so it usually has to be set from experience or by repeated testing. Feature selection chooses a subset of the existing feature set so that the classifier reaches its best performance, which benefits high-dimensional feature training sets. However, as with algorithm-level methods, there is no generally applicable subset, and the choice again requires empirical judgment and repeated testing.
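The two random sampling baselines described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function names, the `seed` parameter and the toy data are assumptions made for the example and do not come from the cited review.

```python
import random

def random_undersample(majority, minority, seed=0):
    """Drop majority samples at random until both classes have equal size."""
    rng = random.Random(seed)
    return rng.sample(majority, len(minority)), list(minority)

def random_oversample(majority, minority, seed=0):
    """Replicate minority samples at random until both classes have equal size."""
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(majority), list(minority) + extra

majority = list(range(10))      # toy majority class: 10 samples
minority = [100, 101, 102]      # toy minority class: 3 samples

under_major, under_minor = random_undersample(majority, minority)
over_major, over_minor = random_oversample(majority, minority)
```

Undersampling keeps only 3 of the 10 majority samples, which is the information loss mentioned above; oversampling pads the minority class with 7 duplicates, which is the source of the extra training time and the overfitting risk.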
The paper "Toward Scalable Learning with Non-Uniform Class and Cost Distributions: A Case Study in Credit Card Fraud Detection" by Philip K. Chan and Salvatore J. Stolfo, published at the conference "International Conference on Knowledge Discovery & Data Mining", 1998, pages 164-168, proposes a uniform sampling approach. Unlike the preceding sampling methods, it neither discards majority-class samples, which would lose useful information, nor generates extra sample points, which would increase the training time or cause the resulting classifier to overfit. The procedure is as follows:
1. First, divide the number of majority-class samples in the training set by the number of minority-class samples and round the result up to determine the number of training subsets.
2. Then divide the majority class evenly into that number of parts.
3. Then take one of the parts and, where it has fewer samples than the minority class, make up the shortfall by drawing samples at random from the other parts.
4. Finally, combine it with all minority-class samples to form a uniform sample subset, and generate all subsets in the same way.
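The four steps above can be sketched as follows. This is an illustrative reading of the procedure, not code from the cited paper; the function name `balanced_subsets`, the `seed` parameter and the toy data are assumptions, and the sketch assumes the majority class is strictly larger than the minority class.

```python
import math
import random

def balanced_subsets(majority, minority, seed=0):
    """Build balanced (majority part, minority) pairs following steps 1-4."""
    rng = random.Random(seed)
    # Step 1: number of subsets = ceil(|majority| / |minority|)
    n_subsets = math.ceil(len(majority) / len(minority))
    # Step 2: split the majority class evenly (part sizes differ by at most one)
    base, rem = divmod(len(majority), n_subsets)
    parts, start = [], 0
    for i in range(n_subsets):
        size = base + (1 if i < rem else 0)
        parts.append(majority[start:start + size])
        start += size
    subsets = []
    for i, part in enumerate(parts):
        # Step 3: make up the shortfall with random draws from the other parts
        others = [s for j, p in enumerate(parts) if j != i for s in p]
        filler = rng.sample(others, len(minority) - len(part))
        # Step 4: pair the filled part with all minority samples
        subsets.append((part + filler, list(minority)))
    return subsets

majority = list(range(10))            # toy majority class: 10 samples
minority = ["a", "b", "c", "d"]       # toy minority class: 4 samples
subsets = balanced_subsets(majority, minority)
```

With 10 majority and 4 minority samples this gives 3 subsets, each holding 4 majority samples and all 4 minority samples, and every majority sample appears in at least one subset.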
This method uses all samples, so no sample information is lost, while it also balances the minority class within each training set. The final experiments show that this sampling method improves not only the overall accuracy of the ensemble classifier but also, markedly, the classification accuracy on minority-class samples.
In summary, among the solutions to training set imbalance in ensemble learning, sampling methods applied in preprocessing tend to have better applicability. However, the sampling methods above only consider equalizing the numbers of minority-class and majority-class samples in the training set and do not take into account a property of ensemble learning algorithms, namely the diversity among sub-classifiers. To obtain a good ensemble, the individual learners should be "good and different": each individual learner must have a certain accuracy, i.e. its performance must not be too poor, and there must be diversity, i.e. the learners must differ from one another. With the same learning algorithm, the simplest way to increase diversity is to increase the difference between training sets.
Summary of the invention
To address the training set imbalance problem in ensemble learning, the present invention provides a classification method and system based on an imbalanced medical image data set.
The invention proposes a new sampling method. Before sampling, the Minkowski distance between each majority-class sample and the center point of the minority-class samples is computed; when samples equal in number to the minority class are drawn from the majority class, the more distant samples are drawn first. In this way the training sets are kept uniform while the difference between them is increased. By the properties of ensemble learning, this improves not only the accuracy on the minority class but also the overall accuracy of the system, which is of strong practical significance for real applications of the classifier.
To achieve the above object, the present invention adopts the following technical scheme:
The present invention provides a classification method based on an imbalanced medical image data set, comprising:
extracting the green channel component of an original medical image;
correcting the extracted grayscale image using histogram equalization;
extracting texture features, wavelet features and auxiliary wheel features from the corrected image;
sorting the extracted feature samples by inter-sample distance;
dividing the sorted samples into uniform feature subsets while ensuring the difference between subsets;
training the feature subsets with an SVM algorithm and a BP neural network algorithm respectively to produce sub-classifiers;
combining the sub-classifiers and voting to obtain the final classification result.
Preferably, the green channel component is the green component among the three components (red, green, blue) of a color medical image.
Preferably, the histogram equalization is a method that automatically adjusts image contrast by means of a gray-level transformation.
Preferably, the grayscale image is the extracted green channel component image.
Preferably, the texture features, wavelet features and auxiliary wheel features are, respectively, the features extracted from the medical image after texture analysis, after wavelet transformation, and after processing by the auxiliary wheel method.
Preferably, the inter-sample distance is the distance computed with the Minkowski distance formula.
Preferably, the sorting by inter-sample distance proceeds as follows, taking three classes as an example: first compute the center point of the samples in the smallest class; then sort the samples in the second-smallest class from far to near by their Minkowski distance to the center point of the smallest class; then compute the center point of all samples in the smallest and second-smallest classes; finally sort the samples in the majority class from far to near by their Minkowski distance to this center point. Cases with more classes are handled by analogy.
Preferably, the uniform feature subsets are divided as follows: the number of samples in the largest class of the training set is divided by the number of samples in the smallest class and the result is rounded up to determine the number of training subsets; every class other than the smallest class is then divided evenly into that number of parts; each of these classes then contributes one part, and where the part has fewer samples than the smallest class, the shortfall is made up from the part adjacent to it; finally, the equal numbers of samples from all classes are combined into a uniform sample subset, and all uniform subsets are generated in the same way.
Preferably, the difference between subsets is ensured because the data set is ordered after sorting by distance, so the subsets obtained by dividing it by the number of training subsets are likewise ordered and differ from one another, i.e. their distances range from far to near.
Preferably, training the feature subsets with the SVM algorithm and the BP neural network algorithm respectively to produce sub-classifiers means that the divided feature subsets are given to the SVM algorithm and to the BP neural network algorithm for training, generating twice as many sub-classifiers as there are feature subsets.
Preferably, combining the sub-classifiers and voting to obtain the final classification result means that a test medical image is classified by each trained sub-classifier, the classification results are counted, and the class with the most votes is the final classification result.
The present invention also provides a classification system based on an imbalanced medical image data set, comprising:
a green channel classification extraction device, configured to extract the green channel component of an original medical image;
a histogram equalization device, configured to correct the extracted grayscale image using histogram equalization;
a feature extraction device, configured to extract texture features, wavelet features and auxiliary wheel features from the corrected image;
a sample sorting device, configured to sort the extracted feature samples by inter-sample distance;
a uniform sampling device, configured to divide the sorted samples into uniform feature subsets while ensuring the difference between subsets;
a sub-classifier training device, configured to train the feature subsets with an SVM algorithm and a BP neural network algorithm respectively to produce sub-classifiers;
a result voting device, configured to combine the sub-classifiers and vote to obtain the final classification result.
Preferably, the green channel component is the green component among the three components (red, green, blue) of a color medical image.
Preferably, the histogram equalization is a method that automatically adjusts image contrast by means of a gray-level transformation.
Preferably, the grayscale image is the extracted green channel component image.
Preferably, the texture features, wavelet features and auxiliary wheel features are, respectively, the features extracted from the medical image after texture analysis, after wavelet transformation, and after processing by the auxiliary wheel method.
Preferably, the inter-sample distance is the distance computed with the Minkowski distance formula.
Preferably, the sample sorting device operates as follows, taking three classes as an example: first compute the center point of the samples in the smallest class; then sort the samples in the second-smallest class from far to near by their Minkowski distance to the center point of the smallest class; then compute the center point of all samples in the smallest and second-smallest classes; finally sort the samples in the majority class from far to near by their Minkowski distance to this center point. Cases with more classes are handled by analogy.
Preferably, the uniform sampling device operates as follows: the number of samples in the largest class of the training set is divided by the number of samples in the smallest class and the result is rounded up to determine the number of training subsets; every class other than the smallest class is then divided evenly into that number of parts; each of these classes then contributes one part, and where the part has fewer samples than the smallest class, the shortfall is made up from the part adjacent to it; finally, the equal numbers of samples from all classes are combined into a uniform sample subset, and all uniform subsets are generated in the same way.
Preferably, the difference between subsets is ensured because the data set is ordered after sorting by distance, so the subsets obtained by dividing it by the number of training subsets are likewise ordered and differ from one another, i.e. their distances range from far to near.
Preferably, the sub-classifier training device gives the divided feature subsets to the SVM algorithm and to the BP neural network algorithm for training, generating twice as many sub-classifiers as there are feature subsets.
The classification system based on an imbalanced medical image data set according to claim 12, characterized in that the result voting device operates as follows: a test medical image is classified by each trained sub-classifier, the classification results are counted, and the class with the most votes is the final classification result.
The new sampling method proposed by the present invention markedly improves the negative-sample classification accuracy in multi-class ensemble learning, and clearly improves the negative-sample accuracy of multi-classifier training on data sets with a highly skewed sample distribution, such as those in the medical field. This helps reduce misdiagnosis and thus improves the practical value of the classifier.
Description of the drawings
The present invention may be better understood from the following detailed description of its embodiments in conjunction with the accompanying drawings, in which similar reference numerals indicate similar parts:
Fig. 1 shows a detailed block diagram of the classification system based on an imbalanced medical image data set according to an embodiment of the invention;
Fig. 2 shows a detailed block diagram of the classification method based on an imbalanced medical image data set according to an embodiment of the invention;
Fig. 3 shows a schematic diagram of uniform sampling according to an embodiment of the invention.
Detailed description of the embodiments
The features and exemplary embodiments of various aspects of the present invention are described in detail below. The following description covers many specific details in order to provide a thorough understanding of the invention. It will be apparent to those skilled in the art, however, that the invention can be practiced without some of these details. The description of the embodiments below is given only to provide a clearer understanding of the invention by showing examples of it. The invention is in no way limited to any specific configuration or algorithm set forth below, but covers any modification, replacement and improvement of the relevant elements, components and algorithms without departing from the spirit of the invention.
In view of the problems described above, the present invention proposes a classification method and system based on an imbalanced medical image data set. An example of the classification method and system according to the invention is described with reference to Fig. 1 and Fig. 2. Fig. 1 shows a detailed block diagram of the classification system based on an imbalanced medical image data set according to an embodiment of the invention; Fig. 2 shows a detailed block diagram of the classification method based on an imbalanced medical image data set according to an embodiment of the invention.
As shown in Fig. 1, a classification system based on an imbalanced medical image data set according to the invention comprises a green channel classification extraction device 101, a histogram equalization device 102, a feature extraction device 103, a sample sorting device 104, a uniform sampling device 105, a sub-classifier training device 106 and a result voting device 107. Their functions are, respectively: extracting the green channel component of the original medical image (i.e. performing step S201); correcting the extracted grayscale image using histogram equalization (i.e. performing step S202); extracting texture features, wavelet features and auxiliary wheel features from the corrected image (i.e. performing step S203); sorting the extracted feature samples by inter-sample distance (i.e. performing step S204); dividing the sorted samples into uniform feature subsets while ensuring the difference between subsets (i.e. performing step S205); training the feature subsets with the SVM algorithm and the BP neural network algorithm respectively to produce sub-classifiers (i.e. performing step S206); and combining the sub-classifiers and voting to obtain the final classification result (i.e. performing step S207).
Specifically, the sample sorting device 104 introduces the Minkowski distance to compute the distance between samples; the sorting rule is to order the samples in the majority class from far to near by their Minkowski distance to the center point of the minority class. The uniform sampling device 105 performs uniform sampling on the sorted sample set; because the sample set is ordered, the resulting sample subsets differ from one another. An example of the classification method and system based on an imbalanced medical image data set according to the invention is given below:
The detailed process is introduced here using fundus images as an example. A color fundus image contains three components: red, green and blue. The red component has the highest brightness but low contrast between blood vessels and background, so the target vessels are hard to distinguish from the fundus background; the blue component has low contrast and low brightness and suffers from severe noise; the green component has moderate brightness and relatively high contrast between vessels and background, so it reflects the distribution of blood vessels in the color fundus image well. The green channel (G channel) component is therefore extracted from the training set.
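Extracting the green channel amounts to keeping the second value of every RGB pixel. The sketch below assumes, purely for illustration, that an image is stored as rows of `(R, G, B)` tuples; the function name and the pixel values are hypothetical.

```python
def green_channel(rgb_image):
    """Return the G component of an image stored as rows of (R, G, B) tuples."""
    return [[pixel[1] for pixel in row] for row in rgb_image]

# Toy 2x2 color image; the values are made up for the example.
rgb = [
    [(200, 90, 30), (210, 120, 40)],
    [(190, 60, 20), (220, 150, 50)],
]
gray = green_channel(rgb)
```

The result is a single-channel (grayscale) image with the same height and width as the input.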
Histogram equalization is a method that automatically adjusts image contrast by means of a gray-level transformation. The basic idea is to derive the gray-level transformation function from the probability density function of the gray levels; it is a histogram modification method based on the cumulative distribution function. Histogram equalization is therefore used to correct the grayscale images obtained by extracting the green channel from the training set.
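A minimal sketch of CDF-based histogram equalization follows. The patent does not give the exact mapping, so the common formulation that rescales the cumulative histogram to the full gray range is assumed here; the function name, the `levels` parameter and the toy image are illustrative.

```python
def equalize_histogram(gray, levels=256):
    """CDF-based histogram equalization of a 2-D list of integer gray levels."""
    pixels = [v for row in gray for v in row]
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    # Cumulative distribution function (as counts) over the gray levels
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)   # CDF at the darkest level present
    n = len(pixels)
    if n == cdf_min:                          # single gray level: nothing to stretch
        return [row[:] for row in gray]
    # Look-up table mapping each gray level to its rescaled CDF value
    lut = [max(0, round((c - cdf_min) * (levels - 1) / (n - cdf_min))) for c in cdf]
    return [[lut[v] for v in row] for row in gray]

gray = [[52, 55], [61, 59]]                   # toy low-contrast image
equalized = equalize_histogram(gray)
```

The four closely spaced gray levels of the toy image are spread over the whole range from 0 to 255, which is the contrast adjustment described above.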
From the corrected grayscale images, feature sets are extracted by wavelet transformation, by the auxiliary wheel method and by texture analysis respectively, and serve as three independent data sets for the subsequent training of classifiers. The training set thus becomes three independent feature sets: the wavelet feature set, the auxiliary wheel feature set and the texture feature set.
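As an illustration of what a wavelet feature vector might look like, the sketch below applies a one-level 2-D Haar decomposition and summarizes each sub-band by its mean absolute coefficient. The source names neither the wavelet family nor the statistics used, so the Haar wavelet, the averaging normalization, the chosen statistic and all function names are assumptions.

```python
def haar_level1(gray):
    """One-level 2-D Haar decomposition (averaging form) of an even-sized image."""
    approx, horiz, vert, diag = [], [], [], []
    for r in range(0, len(gray), 2):
        rows = ([], [], [], [])
        for c in range(0, len(gray[0]), 2):
            a, b = gray[r][c], gray[r][c + 1]          # top-left, top-right
            d, e = gray[r + 1][c], gray[r + 1][c + 1]  # bottom-left, bottom-right
            rows[0].append((a + b + d + e) / 4)        # block average
            rows[1].append((a - b + d - e) / 4)        # left minus right
            rows[2].append((a + b - d - e) / 4)        # top minus bottom
            rows[3].append((a - b - d + e) / 4)        # diagonal difference
        for band, row in zip((approx, horiz, vert, diag), rows):
            band.append(row)
    return approx, horiz, vert, diag

def wavelet_features(gray):
    """Mean absolute coefficient of each sub-band as a 4-value feature vector."""
    features = []
    for band in haar_level1(gray):
        coeffs = [abs(v) for row in band for v in row]
        features.append(sum(coeffs) / len(coeffs))
    return features

features = wavelet_features([[4, 2], [0, 2]])
```

Each image is reduced to one fixed-length vector, so the vectors of all training images form a feature set that can be sorted and sampled like any other sample set.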
These three medical-domain data sets share a common characteristic: the sample distribution is highly skewed, with far more positive samples (i.e. normal samples) than negative samples (i.e. diseased samples). A classifier trained on such a data set is clearly biased and tends to misclassify negative samples as positive. For a patient this is very serious: it leads to misdiagnosis, and the best window for treatment is missed. Effectively improving the classification accuracy of negative samples is therefore essential.
As described in the background art, the existing solutions cannot fully solve this problem. Building on the existing methods, a sampling method is therefore proposed that both keeps the distribution of positive and negative samples in each training set balanced and increases the difference between the samples of different training subsets, thereby effectively improving the classification accuracy of negative samples and the overall accuracy of the classifier. Concretely, the Minkowski distance is introduced to compute the distance between samples, using the following formula:
where $d_{12}$ is the distance between samples $x_1$ and $x_2$, p represents the dimension of the sample point attributes, and k is the index of the attribute values.
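For reference, the Minkowski distance between two samples is commonly written as shown below. This is the standard textbook form; reading n as the number of attributes per sample and p as the order parameter of the distance is an assumption here and may differ from the symbol definitions given in the text.

```latex
d_{12} = \left( \sum_{k=1}^{n} \lvert x_{1k} - x_{2k} \rvert^{p} \right)^{1/p}
```

In this form, p = 1 gives the Manhattan distance and p = 2 gives the Euclidean distance.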
Taking three classes as an example, first compute the center point of the samples in the smallest class; then sort the samples in the second-smallest class from far to near by their Minkowski distance to the center point of the smallest class; then compute the center point of all samples in the smallest and second-smallest classes; finally sort the samples in the majority class from far to near by their Minkowski distance to this center point. The sorted samples are then ready for the subsequent difference-preserving sampling.
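The sorting procedure can be sketched as follows for any number of classes. The function names, the default order `p=2` and the toy feature vectors are assumptions for illustration; the center point is taken to be the per-attribute mean, which the source does not state explicitly.

```python
def minkowski(x1, x2, p=2):
    """Minkowski distance of order p between two feature vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x1, x2)) ** (1 / p)

def centroid(samples):
    """Per-attribute mean of a list of feature vectors (assumed center point)."""
    return [sum(col) / len(samples) for col in zip(*samples)]

def sort_classes(classes, p=2):
    """classes: lists of feature vectors, ordered from smallest to largest class."""
    ordered = [list(classes[0])]      # the smallest class is left as it is
    pooled = list(classes[0])         # all samples of the classes handled so far
    for samples in classes[1:]:
        center = centroid(pooled)
        # Sort far to near with respect to the center of the pooled smaller classes
        ordered.append(sorted(samples, key=lambda s: minkowski(s, center, p),
                              reverse=True))
        pooled.extend(samples)
    return ordered

smallest = [[0, 0], [2, 0]]
second = [[1, 1], [1, 5], [1, 3]]
largest = [[1, 2], [1, 10], [1, -4], [1, 4]]
ordered = sort_classes([smallest, second, largest])
```

In the toy data the second-smallest class is sorted against the center (1, 0) of the smallest class, and the largest class against the center (1, 1.8) of the first two classes pooled together.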
The improved uniform sampling is then applied to each of the three sorted feature sample sets, as shown in Fig. 3. Taking three classes as an example, the number of samples in the largest class (i.e. the first class) of the training set is first divided by the number of samples in the smallest class (i.e. the third class) and the result is rounded up to determine the number of training subsets. Every class other than the smallest class is then divided evenly into that number of parts. Each of these classes then contributes one part, and where the part has fewer samples than the smallest class, the shortfall is made up from the part adjacent to it. Finally, the equal numbers of samples from all classes are combined into a uniform sample subset, and all subsets are generated in the same way.
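One possible reading of this sampling step is sketched below. The source does not say which neighbour supplies the fill, so the sketch assumes that the samples immediately following a part are used, and the samples immediately preceding it when the end of the list is reached; the function name and the toy data are hypothetical.

```python
import math

def uniform_subsets(sorted_classes):
    """sorted_classes: distance-sorted sample lists, smallest class first."""
    smallest = sorted_classes[0]
    m = len(smallest)
    # Number of subsets = ceil(size of largest class / size of smallest class)
    n_subsets = math.ceil(max(len(c) for c in sorted_classes) / m)
    subsets = []
    for i in range(n_subsets):
        subset = [list(smallest)]                      # all smallest-class samples
        for samples in sorted_classes[1:]:
            start = i * len(samples) // n_subsets      # where part i begins
            start = min(start, len(samples) - m)       # keep the window in range
            subset.append(samples[start:start + m])    # part i plus adjacent fill
        subsets.append(subset)
    return subsets

smallest = ["s1", "s2"]
middle = ["m1", "m2", "m3"]                # already sorted far to near
largest = ["a", "b", "c", "d", "e"]        # already sorted far to near
subsets = uniform_subsets([smallest, middle, largest])
```

Because each class list is already ordered from far to near, consecutive windows cover different distance ranges, which is what makes the resulting subsets differ from one another.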
At this point, with the new sampling method, the generated training subsets not only have a balanced distribution of positive and negative samples but also differ from one another, so both the negative-sample accuracy and the overall accuracy of the classifier assembled from the sub-classifiers trained on them are improved. The sampling method applies to both binary and multi-class classification.
Next, the feature subsets sampled from the three feature data sets obtained in the previous step are each trained with the support vector machine and BP neural network learning algorithms, yielding mutually independent sub-classifiers, twice as many as there are feature subsets.
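The "two learners per subset" structure can be sketched independently of the concrete algorithms. In the sketch below two deliberately simple stand-in learners (nearest centroid and nearest neighbour) take the place of the SVM and the BP neural network, which in practice would come from a machine learning library; all function names and the toy data are hypothetical.

```python
def train_nearest_centroid(X, y):
    """Stand-in learner A (NOT an SVM): predict the class of the nearest centroid."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    centers = {label: [sum(col) / len(rows) for col in zip(*rows)]
               for label, rows in groups.items()}
    def predict(x):
        return min(centers, key=lambda label: sum(
            (a - b) ** 2 for a, b in zip(x, centers[label])))
    return predict

def train_nearest_neighbor(X, y):
    """Stand-in learner B (NOT a BP network): predict the label of the nearest sample."""
    def predict(x):
        nearest = min(range(len(X)), key=lambda j: sum(
            (a - b) ** 2 for a, b in zip(x, X[j])))
        return y[nearest]
    return predict

def train_sub_classifiers(subsets, trainers):
    """Train every learner on every subset: len(subsets) * len(trainers) models."""
    return [train(X, y) for X, y in subsets for train in trainers]

subsets = [
    ([[0.0, 0.0], [1.0, 1.0]], ["normal", "diseased"]),
    ([[0.2, 0.1], [0.9, 1.2]], ["normal", "diseased"]),
]
models = train_sub_classifiers(subsets, [train_nearest_centroid, train_nearest_neighbor])
```

Two subsets and two learners give four sub-classifiers, matching the statement that the number of sub-classifiers is twice the number of feature subsets.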
Finally, all the mutually independent sub-classifiers are combined: a test fundus image is classified by each trained sub-classifier, the classification results are counted, and the class with the most votes is the final classification result.
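The voting step can be sketched with a simple vote count. The three toy sub-classifiers below are hypothetical threshold rules used only to exercise the function; tie handling is not specified in the source, and here a tie is resolved in favour of the label that was voted first.

```python
from collections import Counter

def majority_vote(sub_classifiers, sample):
    """Classify a sample with every sub-classifier and return the most common label."""
    votes = Counter(clf(sample) for clf in sub_classifiers)
    return votes.most_common(1)[0][0]

# Hypothetical sub-classifiers standing in for the trained models
sub_classifiers = [
    lambda x: "diseased" if x[0] > 0.5 else "normal",
    lambda x: "diseased" if x[1] > 0.5 else "normal",
    lambda x: "diseased" if sum(x) > 1.5 else "normal",
]
result = majority_vote(sub_classifiers, [0.9, 0.2])
```

For the sample `[0.9, 0.2]` the votes are one "diseased" and two "normal", so the combined result is "normal".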
The method and system are applicable not only to fundus image classification but also to the classification of other imbalanced medical images.
It should be made clear that the invention is not limited to the specific configurations and processes described above and shown in the figures. For brevity, detailed descriptions of known methods are omitted here. In the embodiments above, several specific steps are described and shown as examples. The method of the invention is not, however, limited to the specific steps described and shown; those skilled in the art, having understood the spirit of the invention, may make various changes, modifications and additions, or change the order of the steps.
The functional blocks shown in the structural block diagrams described above may be implemented as hardware, software, firmware or a combination thereof. When implemented in hardware, they may be, for example, electronic circuits, application-specific integrated circuits (ASICs), appropriate firmware, plug-ins, function cards and so on. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The programs or code segments may be stored in a machine-readable medium, or transmitted over a transmission medium or communication link by a data signal carried in a carrier wave. A "machine-readable medium" may include any medium that can store or transmit information. Examples of machine-readable media include electronic circuits, semiconductor memory devices, ROM, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber-optic media, radio frequency (RF) links, and so on. Code segments may be downloaded via computer networks such as the Internet or an intranet.
The invention may be embodied in other specific forms without departing from its spirit and essential characteristics. For example, the algorithms described in particular embodiments may be modified while the system architecture remains within the essential spirit of the invention. The present embodiments are therefore to be regarded in all respects as illustrative and not restrictive; the scope of the invention is defined by the appended claims rather than by the foregoing description, and all changes that fall within the meaning and range of equivalents of the claims are therefore included within the scope of the invention.

Claims (10)

1. A classification method based on an imbalanced medical image data set, characterized by comprising:
extracting the green channel component of the original medical image;
correcting the extracted grayscale image by histogram equalization;
extracting texture features, wavelet features, and contour features from the corrected image;
sorting the extracted feature samples by inter-sample distance;
dividing the sorted samples into uniform feature subsets while ensuring diversity between the subsets;
training on each feature subset with both the SVM algorithm and the BP neural network algorithm to produce sub-classifiers;
combining the sub-classifiers and voting to obtain the final classification result.
2. The classification method based on an imbalanced medical image data set according to claim 1, characterized in that the sorting by inter-sample distance proceeds as follows: first, the center point of the samples in the smallest class is calculated, and each sample in the second-smallest class is sorted from far to near by its Minkowski distance to that center point; next, the center point of all samples in the smallest and second-smallest classes is calculated; finally, each sample in the majority class is sorted from far to near by its Minkowski distance to this center point. Cases with more classes are handled by analogy.
3. The classification method based on an imbalanced medical image data set according to claim 1, characterized in that the division into uniform feature subsets proceeds as follows: the number of samples in the largest class of the training set is divided by the number of samples in the smallest class and the result is rounded up, which determines the number of training subsets; each class other than the smallest is then divided evenly into that number of parts; one part is taken from each of these classes, and the shortfall relative to the sample count of the smallest class is made up with samples drawn from the adjacent part; finally, the equal numbers of samples from all classes are gathered into one uniform subset. All uniform subsets are generated by analogy.
4. The classification method based on an imbalanced medical image data set according to claim 1, characterized in that the diversity between subsets is ensured as follows: because the data set is ordered by the distance sorting, the subsets obtained by dividing it according to the number of training subsets are likewise ordered and differ from one another, that is, their distances run from far to near.
5. The classification method based on an imbalanced medical image data set according to claim 1, characterized in that training on each feature subset with the SVM algorithm and the BP neural network algorithm to produce sub-classifiers means that each divided feature subset is given to both the SVM algorithm and the BP neural network algorithm for training, generating twice as many sub-classifiers as there are feature subsets.
6. A classification system based on an imbalanced medical image data set, characterized by comprising:
a green channel component extraction device, configured to extract the green channel component of the original medical image;
a histogram equalization device, configured to correct the extracted grayscale image by histogram equalization;
a feature extraction device, configured to extract texture features, wavelet features, and contour features from the corrected image;
a sample sorting device, configured to sort the extracted feature samples by inter-sample distance;
a uniform sampling device, configured to divide the sorted samples into uniform feature subsets while ensuring diversity between the subsets;
a sub-classifier training device, configured to train on each feature subset with both the SVM algorithm and the BP neural network algorithm to produce sub-classifiers;
a result voting device, configured to combine the sub-classifiers and vote to obtain the final classification result.
7. The classification system based on an imbalanced medical image data set according to claim 6, characterized in that the sample sorting device operates as follows, taking three classes as an example: first, the center point of the samples in the smallest class is calculated, and each sample in the second-smallest class is sorted from far to near by its Minkowski distance to that center point; next, the center point of all samples in the smallest and second-smallest classes is calculated; finally, each sample in the majority class is sorted from far to near by its Minkowski distance to this center point. Cases with more classes are handled by analogy.
8. The classification system based on an imbalanced medical image data set according to claim 6, characterized in that the uniform sampling device operates as follows: the number of samples in the largest class of the training set is divided by the number of samples in the smallest class and the result is rounded up, which determines the number of training subsets; each class other than the smallest is then divided evenly into that number of parts; one part is taken from each of these classes, and the shortfall relative to the sample count of the smallest class is made up with samples drawn from the adjacent part; finally, the equal numbers of samples from all classes are gathered into one uniform subset. All uniform subsets are generated by analogy.
9. The classification system based on an imbalanced medical image data set according to claim 6, characterized in that the diversity between subsets is ensured as follows: because the data set is ordered by the distance sorting, the subsets obtained by dividing it according to the number of training subsets are likewise ordered and differ from one another, that is, their distances run from far to near.
10. The classification system based on an imbalanced medical image data set according to claim 6, characterized in that the sub-classifier training device gives each divided feature subset to both the SVM algorithm and the BP neural network algorithm for training, generating twice as many sub-classifiers as there are feature subsets.
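
For illustration only, the preprocessing recited in claims 1 and 6 (green channel extraction followed by histogram equalization) might be sketched as below. This is a minimal sketch and not the patented implementation: the function names `green_channel` and `equalize_histogram`, the nested-list image layout with 8-bit (R, G, B) pixels, and the standard cumulative-histogram equalization formula are all assumptions that the claims do not specify.

```python
def green_channel(rgb_image):
    """Return the green component of an image given as rows of (R, G, B) tuples."""
    return [[pixel[1] for pixel in row] for row in rgb_image]


def equalize_histogram(gray, levels=256):
    """Equalize a non-empty grayscale image (rows of ints in [0, levels))."""
    # Histogram of gray levels.
    hist = [0] * levels
    for row in gray:
        for value in row:
            hist[value] += 1

    # Cumulative distribution of the histogram.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Only one gray level is present, so there is nothing to spread out.
        return [row[:] for row in gray]

    # Map each level so that the occupied levels cover the full output range.
    scale = (levels - 1) / (total - cdf_min)
    lut = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [[lut[value] for value in row] for row in gray]


# Usage: a hypothetical 2x2 image whose green values are 10, 20, 20, 30.
image = [[(0, 10, 0), (0, 20, 0)],
         [(0, 20, 0), (0, 30, 0)]]
equalized = equalize_histogram(green_channel(image))
```

In practice an image library would normally be used for both steps; plain lists are used here only to keep the sketch self-contained.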
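
The distance sorting of claims 2 and 7 and the uniform subset division of claims 3 and 8 can be read as the following sketch. It is one possible interpretation under stated assumptions, not the definitive procedure: the helper names (`minkowski`, `centroid`, `sort_by_distance`, `uniform_subsets`) are hypothetical, the Minkowski order `p` defaults to 2 because the claims give no value, and the "adjacent part" used to make up a shortfall is assumed to be the samples that follow a part in the sorted order (the window is shifted back at the end of a class).

```python
import math


def minkowski(a, b, p=2):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def centroid(samples):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def sort_by_distance(classes, p=2):
    """Sort each class far-to-near relative to the center of all smaller classes.

    classes maps a label to its list of feature vectors. Returns the labels
    ordered from smallest to largest class, and the per-class sorted samples.
    """
    order = sorted(classes, key=lambda label: len(classes[label]))
    sorted_classes = {order[0]: list(classes[order[0]])}
    seen = list(classes[order[0]])
    for label in order[1:]:
        center = centroid(seen)
        sorted_classes[label] = sorted(
            classes[label],
            key=lambda s: minkowski(s, center, p),
            reverse=True,  # far to near
        )
        seen.extend(classes[label])
    return order, sorted_classes


def uniform_subsets(order, sorted_classes):
    """Build ceil(n_max / n_min) subsets with n_min samples of every class."""
    smallest = sorted_classes[order[0]]
    n_min = len(smallest)
    n_max = len(sorted_classes[order[-1]])
    k = math.ceil(n_max / n_min)
    subsets = []
    for i in range(k):
        subset = {order[0]: list(smallest)}
        for label in order[1:]:
            samples = sorted_classes[label]
            n = len(samples)
            # Start of part i; shift back if fewer than n_min samples remain.
            start = min(i * n // k, n - n_min)
            subset[label] = samples[start:start + n_min]
        subsets.append(subset)
    return subsets
```

Because each class is already ordered from far to near, consecutive subsets draw from different distance ranges, which is how claims 4 and 9 describe the diversity between subsets.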
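
The training and voting steps of claims 1, 5, 6, and 10 might be organized as in the sketch below. It shows only the bookkeeping: every subset is passed to every training routine, so two routines yield twice as many sub-classifiers as subsets, and the final label is the majority vote. The names `train_sub_classifiers` and `majority_vote` are hypothetical, the trainers are stand-in callables rather than real SVM or BP neural network training, and the tie-breaking rule (first label seen) is an assumption, since the claims do not state one.

```python
from collections import Counter


def train_sub_classifiers(subsets, trainers):
    """Train every trainer on every subset.

    Each trainer is a callable taking a subset and returning a classifier,
    itself a callable mapping a sample to a label. With two trainers (for
    example one SVM routine and one BP network routine) the result holds
    2 * len(subsets) sub-classifiers.
    """
    return [train(subset) for subset in subsets for train in trainers]


def majority_vote(classifiers, sample):
    """Return the label predicted by the most sub-classifiers."""
    votes = Counter(classifier(sample) for classifier in classifiers)
    # Ties resolve to the label that was voted for first (an assumption).
    return votes.most_common(1)[0][0]


# Usage with stand-in trainers; each "subset" here is just a threshold.
def fake_svm(subset):
    return lambda x: "pos" if x > subset else "neg"


def fake_bp(subset):
    return lambda x: "pos" if x >= subset + 1 else "neg"


ensemble = train_sub_classifiers([0, 1, 2], [fake_svm, fake_bp])
label = majority_vote(ensemble, 2.5)
```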
CN201610997896.4A 2016-11-11 2016-11-11 Method and system for classifying medical image data sets based on imbalance Active CN106529598B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610997896.4A CN106529598B (en) 2016-11-11 2016-11-11 Method and system for classifying medical image data sets based on imbalance

Publications (2)

Publication Number Publication Date
CN106529598A true CN106529598A (en) 2017-03-22
CN106529598B CN106529598B (en) 2020-05-08

Family

ID=58351504

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610997896.4A Active CN106529598B (en) 2016-11-11 2016-11-11 Method and system for classifying medical image data sets based on imbalance

Country Status (1)

Country Link
CN (1) CN106529598B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101989289A (en) * 2009-08-06 2011-03-23 Fujitsu Ltd. Data clustering method and device
CN104091073A (en) * 2014-07-11 2014-10-08 National University of Defense Technology Sampling method for imbalanced transaction data of virtual assets
CN104809226A (en) * 2015-05-07 2015-07-29 Wuhan University Method for early classification of imbalanced multivariate time series data
CN105760889A (en) * 2016-03-01 2016-07-13 University of Science and Technology of China Efficient imbalanced data set classification method
CN106056130A (en) * 2016-05-18 2016-10-26 Tianjin University Combined downsampling linear discriminant classification method for imbalanced data sets

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
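JI-JIANG YANG et al.: "Exploiting ensemble learning for automatic cataract detection and grading", Computer Methods and Programs in Biomedicine *
Hu Zhijun et al.: "Fast support vector machine classification algorithm based on distance sorting", Computer Applications and Software *
Chen Hongbo: "Research on crop leaf disease recognition based on multi-classifier selective ensemble", China Master's Theses Full-text Database, Information Science and Technology *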

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108230322A (en) * 2018-01-28 2018-06-29 Zhejiang University Fundus feature detection device based on weak sample labeling
CN108230322B (en) * 2018-01-28 2021-11-09 Zhejiang University Fundus feature detection device based on weak sample labeling
CN108846405A (en) * 2018-04-11 2018-11-20 Dongguan Disai Software Technology Co., Ltd. Imbalanced medical insurance data classification method based on SSGAN
CN111758105A (en) * 2018-05-18 2020-10-09 Google LLC Learning data augmentation policies
CN108805091A (en) * 2018-06-15 2018-11-13 Beijing ByteDance Network Technology Co., Ltd. Method and apparatus for generating a model
CN108805091B (en) * 2018-06-15 2021-08-10 Beijing ByteDance Network Technology Co., Ltd. Method and apparatus for generating a model
CN111046891A (en) * 2018-10-11 2020-04-21 Hangzhou Hikvision Digital Technology Co., Ltd. Training method of license plate recognition model, and license plate recognition method and device
CN110069997A (en) * 2019-03-22 2019-07-30 Beijing ByteDance Network Technology Co., Ltd. Scene classification method and device and electronic equipment
CN110069997B (en) * 2019-03-22 2021-07-20 Beijing ByteDance Network Technology Co., Ltd. Scene classification method and device and electronic equipment
CN110704662A (en) * 2019-10-17 2020-01-17 Guangdong University of Technology Image classification method and system
CN112138394A (en) * 2020-10-16 2020-12-29 Tencent Technology (Shenzhen) Co., Ltd. Image processing method, image processing device, electronic equipment and computer readable storage medium
CN112491797A (en) * 2020-10-28 2021-03-12 Beijing University of Technology Intrusion detection method and system based on imbalanced industrial control data set

Also Published As

Publication number Publication date
CN106529598B (en) 2020-05-08

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant