CN111091163A - Minimum distance classification method and device, computer equipment and storage medium - Google Patents

Minimum distance classification method and device, computer equipment and storage medium

Info

Publication number
CN111091163A
CN111091163A (application CN202010210657.6A; granted publication CN111091163B)
Authority
CN
China
Prior art keywords
feature vector
test sample
feature
class
calculating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010210657.6A
Other languages
Chinese (zh)
Other versions
CN111091163B (en)
Inventor
周才健
周柔刚
杨亮亮
盛锦华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Guangyuan Intelligent Technology Co ltd
Jinhua Mstar Intelligent Technology Co ltd
Suzhou Huicui Intelligent Technology Co Ltd
Hangzhou Huicui Intelligent Technology Co ltd
Original Assignee
Guangdong Guangyuan Intelligent Technology Co ltd
Jinhua Mstar Intelligent Technology Co ltd
Suzhou Huicui Intelligent Technology Co Ltd
Hangzhou Huicui Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Guangyuan Intelligent Technology Co ltd, Jinhua Mstar Intelligent Technology Co ltd, Suzhou Huicui Intelligent Technology Co Ltd, Hangzhou Huicui Intelligent Technology Co ltd filed Critical Guangdong Guangyuan Intelligent Technology Co ltd
Priority to CN202010210657.6A priority Critical patent/CN111091163B/en
Publication of CN111091163A publication Critical patent/CN111091163A/en
Application granted granted Critical
Publication of CN111091163B publication Critical patent/CN111091163B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147Distances to closest patterns, e.g. nearest neighbour classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a minimum distance classification method and apparatus, a computer device and a storage medium. The method comprises the following steps: acquiring a training sample data set, where the training sample data set comprises a plurality of classes, each class comprises a plurality of training samples, each training sample carries a class label, and each training sample comprises a plurality of feature vectors and their values; calculating classification parameters from the training sample data set, where the classification parameters, used to calculate a weighted normalized corrected distance, comprise the central feature vector corresponding to each feature vector of each class, an overall mean of each feature vector and an overall standard deviation of each feature vector; constructing a classifier from the classification parameters; and inputting the feature vectors of a test sample into the classifier to obtain the predicted class of the test sample. By adopting the method, the classification accuracy for test samples can be improved.

Description

Minimum distance classification method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a minimum distance classification method and apparatus, a computer device, and a storage medium.
Background
The existing minimum distance classification method is one of the most basic classification methods: it calculates the distance D from an unknown-class vector X to the center vector of each known class (such as classes A, B and C), and then assigns X to the class with the minimum distance D.
However, when the existing minimum distance classification method is applied to classification problems with uneven distributions, classification errors easily occur. For example, as shown in fig. 1, in display-screen defect detection, the light-emitting electronic elements mainly have defects such as "dead spots", "dead lines" and "Mura", and the surface has defects such as "scratches" and "dirt". "Dead spots" are defects of a single pixel or of adjacent pixels numbering no more than 4; "dead lines" are linear defects of varying lengths; "Mura" appears as blocks of varying sizes and shapes; and surface "scratches" and "dirt" can be regarded as curves and blocks of varying shapes. Suppose classification type A is "points" and classification type B is "lines". In feature dimensions such as area and perimeter, the "points" are highly aggregated while the "lines" are loosely distributed. An unknown-class vector X may actually belong to type B ("lines") and yet lie farther from the class B center than from the class A center; X is then classified as a class A "point", resulting in a classification error.
Disclosure of Invention
In view of the above, there is a need to provide a minimum distance classification method, apparatus, computer device and storage medium capable of improving the accuracy of test sample classification.
A minimum distance classification method, the method comprising:
acquiring a training sample data set; the training sample data set comprises a plurality of classes, each class comprises a plurality of training samples, each training sample is provided with a class label, and each training sample comprises a plurality of feature vectors and values of the feature vectors;
calculating classification parameters according to the training sample data set; the classification parameters comprise central feature vectors corresponding to the feature vectors of each category for calculating weighted normalized correction distance, a total mean value of each feature vector and a total standard deviation of each feature vector;
constructing a classifier according to the classification parameters;
and inputting the feature vector of the test sample into the classifier to obtain the prediction category of the test sample.
In one embodiment, said calculating classification parameters from said training sample data set comprises: and calculating the overall mean value and the overall standard deviation of each feature vector according to the value of each feature vector in the training sample data set.
In one embodiment, after calculating an overall mean and an overall standard deviation of each of the feature vectors according to the value of each of the feature vectors in the training sample data set, the method includes: calculating a central feature vector corresponding to each feature vector of each category according to the value of each feature vector in all training samples in each category, the overall mean value of each feature vector and the overall standard deviation of each feature vector; wherein the central feature vector is used to calculate a weighted normalized correction distance.
In one embodiment, after calculating the central feature vector corresponding to each feature vector of each class according to the value of each feature vector in all training samples in each class, the overall mean value of each feature vector and the overall standard deviation of each feature vector, the method includes: and calculating the weight of each feature vector according to the value of each feature vector of all the training samples of all the classes, the overall mean value of each feature vector, the overall standard deviation of each feature vector and the central feature vector corresponding to each feature vector of all the classes.
In one embodiment, after calculating the central feature vector corresponding to each feature vector of each class according to the value of each feature vector in all training samples in each class, the overall mean value of each feature vector and the overall standard deviation of each feature vector, the method includes: and calculating the distance distribution standard deviation of each category according to the values of all the feature vectors of all the training samples of each category, the overall mean value of all the feature vectors, the overall standard deviation of all the feature vectors and the central feature vector corresponding to all the feature vectors in each category.
In one embodiment, the inputting the feature vector of the test sample into the classifier to obtain the prediction category of the test sample includes: obtaining the value of the feature vector of the test sample; calculating a feature vector normalized value of the feature vector of the test sample according to the value of the feature vector of the test sample, a population mean of the feature vector of the test sample, and a population standard deviation of the feature vector of the test sample; calculating weighted normalized correction distances between all the feature vectors of the test sample and each category according to the feature vector normalized values of all the feature vectors of the test sample and the classification parameters; and selecting the category corresponding to the minimum weighted normalized correction distance as the prediction category of the test sample.
In one embodiment, the inputting the feature vector of the test sample into the classifier to obtain the prediction category of the test sample includes: obtaining the value of the feature vector of the test sample; calculating a feature vector normalized value of the feature vector of the test sample according to the value of the feature vector of the test sample, a population mean of the feature vector of the test sample, and a population standard deviation of the feature vector of the test sample; calculating weighted normalized correction distances between all the feature vectors of the test sample and each class according to the feature vector normalized values of all the feature vectors of the test sample, the weights of all the feature vectors of the test sample, the corresponding central feature vector of all the feature vectors of the test sample in each class and the distance distribution standard deviation of each class; and selecting the category corresponding to the minimum weighted normalized correction distance as the prediction category of the test sample.
A minimum distance classification apparatus, the apparatus comprising:
the training sample data set acquisition module is used for acquiring a training sample data set; the training sample data set comprises a plurality of classes, each class comprises a plurality of training samples, each training sample is provided with a class label, and each training sample comprises a plurality of feature vectors and values of the feature vectors;
the classification parameter calculation module is used for calculating classification parameters according to the training sample data set; the classification parameters comprise central feature vectors corresponding to the feature vectors of each category for calculating weighted normalized correction distance, a total mean value of each feature vector and a total standard deviation of each feature vector;
the classifier building module is used for building a classifier according to the classification parameters;
and the prediction category acquisition module is used for inputting the feature vector of the test sample into the classifier to obtain the prediction category of the test sample.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring a training sample data set; the training sample data set comprises a plurality of classes, each class comprises a plurality of training samples, each training sample is provided with a class label, and each training sample comprises a plurality of feature vectors and values of the feature vectors;
calculating classification parameters according to the training sample data set; the classification parameters comprise central feature vectors corresponding to the feature vectors of each category for calculating weighted normalized correction distance, a total mean value of each feature vector and a total standard deviation of each feature vector;
constructing a classifier according to the classification parameters;
and inputting the feature vector of the test sample into the classifier to obtain the prediction category of the test sample.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring a training sample data set; the training sample data set comprises a plurality of classes, each class comprises a plurality of training samples, each training sample is provided with a class label, and each training sample comprises a plurality of feature vectors and values of the feature vectors;
calculating classification parameters according to the training sample data set; the classification parameters comprise central feature vectors corresponding to the feature vectors of each category for calculating weighted normalized correction distance, a total mean value of each feature vector and a total standard deviation of each feature vector;
constructing a classifier according to the classification parameters;
and inputting the feature vector of the test sample into the classifier to obtain the prediction category of the test sample.
The minimum distance classification method, apparatus, computer device and storage medium construct a classifier from classification parameters that include, for calculating the weighted normalized corrected distance, the central feature vectors of each class, the overall mean of each feature vector and the overall standard deviation of each feature vector. The method fully considers the differences in scale among the feature vectors and the differences in distribution compactness among the classes. It applies Z-score normalization to map the features into a uniform range (features are centered near 0 with variance 1); it calculates the weight of each feature vector in the distance calculation according to each feature vector's degree of contribution; and when calculating the distance between a test sample and each class's central feature vector, it performs a distance standardization correction using the standard deviation of the distances between the training samples and each class's central feature vector. The method requires no iterative training, and classification accuracy is significantly improved after adaptive weighting and standardized-correction optimization: the distance can be corrected per class during calculation, making the distance between a feature vector and the corresponding class's feature vectors more accurate, thereby ensuring classification accuracy.
Drawings
FIG. 1 is a schematic diagram of an embodiment of uneven distribution of defect detection for a display panel;
FIG. 2 is a flow diagram of a minimum distance classification method in one embodiment;
FIG. 3 is a block diagram of a minimum distance classification apparatus in one embodiment;
FIG. 4 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In one embodiment, as shown in fig. 1, in display-screen defect detection, consider the distribution of the "perimeter" feature vector. The feature value of a detection sample is 20; for the "point" class the feature mean is 2.5 with standard deviation 1, and for the "line" class the feature mean is 50 with standard deviation 20. According to these distributions the detection sample belongs to the "line" class, and it does in fact belong to the "line" class; however, the existing minimum distance classification method assigns it to the "point" class, since |20 − 2.5| = 17.5 < |20 − 50| = 30.
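This motivating example can be checked numerically. The sketch below contrasts a plain minimum-distance rule with one that measures distance in units of each class's standard deviation; the class names, means and standard deviations come from the example above, while the per-class standardization is an illustrative fix rather than the patent's exact formula.

```python
# Class statistics for the "perimeter" feature, taken from the example above.
classes = {
    "point": {"mean": 2.5, "std": 1.0},
    "line": {"mean": 50.0, "std": 20.0},
}

def naive_predict(x):
    # Plain minimum distance to each class mean.
    return min(classes, key=lambda c: abs(x - classes[c]["mean"]))

def standardized_predict(x):
    # Distance measured in units of each class's standard deviation.
    return min(classes, key=lambda c: abs(x - classes[c]["mean"]) / classes[c]["std"])

x = 20.0  # "perimeter" value of the test sample
print(naive_predict(x))         # the naive rule picks "point" (17.5 < 30)
print(standardized_predict(x))  # scaling by spread picks "line" (1.5 < 17.5)
```

The standardized rule recovers the correct "line" class because 20 lies only 1.5 standard deviations from the "line" mean but 17.5 standard deviations from the "point" mean.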
In one embodiment, as shown in fig. 2, there is provided a minimum distance classification method, comprising the steps of:
s110, acquiring a training sample data set; the training sample data set comprises a plurality of classes, each class comprises a plurality of training samples, each training sample is provided with a class label, and each training sample comprises a plurality of feature vectors and values of the feature vectors.
The training sample data set is the sample data used to train the classifier. Each training sample is obtained by extracting a plurality of features from an image; for example, a plurality of images of various display-screen defects are captured, and the images are preprocessed to obtain the sample data. Each defect can be classified manually, and a class label is set for each training sample accordingly. In display-screen defect detection the feature vectors include the average gray value, perimeter, area, perimeter-to-area ratio, compactness, maximum Feret diameter and ellipticity; of course, different feature vectors can be set for different application fields. Each feature vector of each training sample in the training sample data set has a corresponding value, and the same feature vector of different training samples may have the same or different values.
S120, calculating classification parameters according to the training sample data set; the classification parameters comprise central feature vectors corresponding to the feature vectors of each category for calculating the weighted normalized correction distance, a total mean value of each feature vector and a total standard deviation of each feature vector.
The central feature vector of each class is equal to the mean of the feature vectors over all training samples of that class; because the training samples carry class labels, the central feature vector can be calculated from the training samples sharing the same class label, and it is the mean of a given feature vector. The overall mean of a feature vector is the mean of that feature vector over all training samples, and the overall standard deviation of a feature vector is the standard deviation of that feature vector over all training samples; since there are multiple feature vectors, an overall mean and an overall standard deviation can be calculated for each one.
S130, constructing a classifier according to the classification parameters.
The specific process of constructing the classifier through the classification parameters comprises the following steps: the classifier is obtained by training a preset type of neural network through classification parameters, and the trained classifier can output prediction categories according to the input feature vectors of the test samples. The preset type of neural network comprises LeNet-5, AlexNet, ZFNET, VGG-16, GoogLeNet, ResNet and other neural networks, and of course, the preset type of neural network also comprises a logistic regression function.
S140, inputting the feature vector of the test sample into the classifier to obtain the prediction category of the test sample.
The classifier calculates the distance between the feature vector and the feature vector of each category for the test sample, and takes the category with the minimum distance as the prediction category of the test sample.
In the minimum distance classification method, the classifier is constructed by the classification parameters including the central feature vector of each category, the overall mean value of each feature vector and the overall standard deviation of each feature vector, which are used for calculating the weighted normalized correction distance, so that the distance can be corrected according to the category during calculation, the distance between the feature vector and the feature vector of the corresponding category is ensured to be more accurate, and the classification accuracy is ensured.
In one embodiment, the step S120 includes: and calculating the overall mean value and the overall standard deviation of each feature vector according to the value of each feature vector in the training sample data set.
The training sample data comprises the feature vectors of all the training samples. The overall mean of a feature vector is the mean of that feature vector over all training samples; because there are multiple feature vectors, an overall mean can be calculated for each feature vector. Likewise, the overall standard deviation of a feature vector is the standard deviation of that feature vector over all training samples, and one can be calculated for each feature vector. For example, with training samples A, B and C (3 samples in total), where the value of feature vector V is T1 for sample A, T2 for sample B and T3 for sample C, the overall mean of feature vector V is

M = (T1 + T2 + T3) / 3

and the overall standard deviation of feature vector V is

Dev = sqrt( ((T1 − M)² + (T2 − M)² + (T3 − M)²) / 3 )
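A minimal sketch of these two statistics for one feature vector V across all training samples, matching the three-sample example; the values T1, T2, T3 are placeholders. Note the population form of the standard deviation (dividing by N, consistent with the formula above, rather than the sample form dividing by N − 1).

```python
import math

def overall_mean(values):
    # Mean of one feature vector's values over all training samples.
    return sum(values) / len(values)

def overall_std(values):
    # Population standard deviation, i.e. dividing by N rather than N - 1.
    m = overall_mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

t = [2.0, 4.0, 6.0]  # values of feature vector V for samples A, B, C
print(overall_mean(t))  # 4.0
print(overall_std(t))   # sqrt(8/3), about 1.633
```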
In one embodiment, after the step of calculating a global mean and a global standard deviation of each of the feature vectors according to the value of each of the feature vectors in the training sample data set, the method comprises: calculating a central feature vector corresponding to each feature vector of each category according to the value of each feature vector in all training samples in each category, the overall mean value of each feature vector and the overall standard deviation of each feature vector; wherein the central feature vector is used to calculate a weighted normalized correction distance.
Wherein, according to the feature vector values, the overall mean and the overall standard deviation, the specific process for calculating the central feature vector of each class is as follows: calculate the normalized value of each feature vector of each training sample from the feature vector value, the overall mean and the overall standard deviation; then, for each class, take the mean of the normalized values of a given feature vector over all training samples of that class. This mean is the central feature vector of that class for that feature vector. For example, suppose class B includes training samples A1 and A2 (2 samples in total), the value of feature vector V is T1 for sample A1 and T2 for sample A2, and feature vector V has overall mean M and overall standard deviation Dev. The normalized value of feature vector V for training sample A1 is

X1 = (T1 − M) / Dev

the normalized value of feature vector V for training sample A2 is

X2 = (T2 − M) / Dev

and the mean of the normalized values of feature vector V over all training samples of class B is

m = (X1 + X2) / 2
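The center computation above amounts to z-scoring each sample's feature value with the overall statistics and averaging the z-scores within a class. A short sketch, with illustrative values for M, Dev and the class B samples:

```python
M, Dev = 10.0, 4.0            # overall mean / std of feature vector V (illustrative)
class_b_values = [6.0, 14.0]  # feature V for training samples A1, A2 of class B

def normalize(t, mean, std):
    # Z-score normalization of a single feature value.
    return (t - mean) / std

x = [normalize(t, M, Dev) for t in class_b_values]
m = sum(x) / len(x)           # central feature vector of V for class B
print(x)  # [-1.0, 1.0]
print(m)  # 0.0
```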
In one embodiment, after the step of calculating the central feature vector corresponding to each feature vector of each class according to the value of each feature vector in all training samples in each class, the overall mean value of each feature vector and the overall standard deviation of each feature vector, the method includes: and calculating the weight of each feature vector according to the value of each feature vector of all the training samples of all the classes, the overall mean value of each feature vector, the overall standard deviation of each feature vector and the central feature vector corresponding to each feature vector of all the classes.
Wherein calculating the weight of each feature vector from the feature vector values, the overall mean, the overall standard deviation and the central feature vectors specifically comprises: calculating the normalized value of each feature vector of each training sample from the feature vector value, the overall mean and the overall standard deviation; calculating the intra-class distance of a given feature vector from the normalized values of the training samples and the central feature vectors, where the intra-class distance is the sum, over the classes, of the distances between the feature vector and each class's central feature vector; calculating the inter-class distance of the feature vector from the central feature vector of each class and the overall mean of the feature vector; taking the ratio of the inter-class distance to the intra-class distance as the contribution of the feature vector; and taking the ratio of the feature vector's contribution to the total contribution of all feature vectors as the weight of the feature vector.
For example, suppose the training sample data includes classes B1 and B2, class B1 includes training samples A1 and A2, class B2 includes training samples A3 and A4, the normalized values of feature vector V for A1, A2, A3 and A4 are X1, X2, X3 and X4 respectively, and the central feature vectors of feature vector V for classes B1 and B2 are m1 and m2 respectively. The intra-class distance of feature vector V is:

d_intra = |X1 − m1| + |X2 − m1| + |X3 − m2| + |X4 − m2|

With M the overall mean of feature vector V, the inter-class distance of feature vector V is:

d_inter = |m1 − M| + |m2 − M|

The contribution of feature vector V is:

c = d_inter / d_intra

If the total contribution of all the feature vectors is C, the weight of feature vector V is: w = c / C.
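The weight computation above can be sketched as follows. The helper handles one feature; the normalized sample values and class centers are illustrative, and the overall mean of the normalized values is taken as 0 (an assumption consistent with z-score normalization, where the patent's formulas may instead use the raw overall mean).

```python
def contribution(samples_by_class, centers, overall_mean=0.0):
    # samples_by_class: {class: [normalized values of this feature]}
    # centers: {class: central feature value m_k for this feature}
    d_intra = sum(abs(x - centers[k])
                  for k, xs in samples_by_class.items() for x in xs)
    d_inter = sum(abs(m - overall_mean) for m in centers.values())
    return d_inter / d_intra  # ratio of inter-class to intra-class distance

samples = {"B1": [-1.2, -0.8], "B2": [0.7, 1.3]}
centers = {"B1": -1.0, "B2": 1.0}
c = contribution(samples, centers)       # contribution of feature V

# Weights are contributions normalized by the total contribution C.
contribs = {"V": c, "U": 3.0}            # "U" is a pretend second feature
total = sum(contribs.values())
weights = {f: v / total for f, v in contribs.items()}
print(weights)
```

Here d_intra = 0.2 + 0.2 + 0.3 + 0.3 = 1.0 and d_inter = 1.0 + 1.0 = 2.0, so feature V's contribution is 2.0 and its weight is 2.0 / 5.0 = 0.4.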
In one embodiment, after the step of calculating the central feature vector corresponding to each feature vector of each class according to the value of each feature vector in all training samples in each class, the overall mean value of each feature vector and the overall standard deviation of each feature vector, the method includes: and calculating the distance distribution standard deviation of each category according to the values of all the feature vectors of all the training samples of each category, the overall mean value of all the feature vectors, the overall standard deviation of all the feature vectors and the central feature vector corresponding to all the feature vectors in each category.
Calculating the distance distribution standard deviation of each category according to the feature vector, the overall mean, the overall standard deviation and the central feature vector specifically comprises the following steps: calculating a normalized value of each feature vector of each training sample according to the feature vector, the overall mean and the overall standard deviation; calculating the weighted distance between the feature vector of each training sample and the central feature vector of the class according to the normalized value of each feature vector, the weight of each feature vector and the central feature vector of each class; calculating the weighted distance average value of all training samples of the class to obtain the sample-center distance average value of the class; and calculating the standard deviation of the distance distribution according to the weighted distance and the mean value of the sample-center distance.
For example, the weighted distance between the i-th training sample and the central feature vector m_k of the class k to which it belongs is:

D_ik = sqrt( Σ_{j=1..J} W_j · (X_ij − m_kj)² )

where j indexes the feature vectors, J is the total number of feature vectors, X_ij is the normalized value of the j-th feature vector of the i-th training sample, W_j is the weight of the j-th feature vector, and m_kj is the j-th component of the central feature vector of class k. The "sample-center distance" mean of class k is calculated from the weighted distances:

μ_k = (1 / n_k) Σ_{i=1..n_k} D_ik

and the "sample-center distance" standard deviation (the distance distribution standard deviation) of class k is calculated from the weighted distances and the "sample-center distance" mean:

σ_k = sqrt( (1 / n_k) Σ_{i=1..n_k} (D_ik − μ_k)² )

where k indexes the classes, K is the total number of classes, i indexes the training samples, n_k is the number of training samples in class k, j indexes the feature vectors, and J is the total number of feature vectors.
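These per-class distance statistics can be sketched directly from the formulas above; the weights, samples and center below are illustrative, and the samples are assumed to be already z-score normalized.

```python
import math

def weighted_distance(x, center, weights):
    # Weighted Euclidean distance D_ik between one sample and a class center.
    return math.sqrt(sum(w * (xi - mi) ** 2
                         for xi, mi, w in zip(x, center, weights)))

def distance_stats(samples, center, weights):
    # "Sample-center distance" mean and population standard deviation
    # of one class's training samples.
    d = [weighted_distance(x, center, weights) for x in samples]
    mu = sum(d) / len(d)
    sigma = math.sqrt(sum((di - mu) ** 2 for di in d) / len(d))
    return mu, sigma

W = [0.4, 0.6]                             # feature weights W_j
class_k_samples = [[1.0, 2.0], [3.0, 2.0]] # normalized samples of class k
m_k = [2.0, 2.0]                           # central feature vector of class k
mu, sigma = distance_stats(class_k_samples, m_k, W)
print(mu, sigma)
```

Both samples here sit at weighted distance sqrt(0.4) from the center, so the distance mean is sqrt(0.4) and the distance distribution standard deviation is 0.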
In one embodiment, the inputting the feature vector of the test sample into the classifier to obtain the prediction category of the test sample includes: obtaining the value of the feature vector of the test sample;
calculating a feature vector normalized value of the feature vector of the test sample according to the value of the feature vector of the test sample, a population mean of the feature vector of the test sample, and a population standard deviation of the feature vector of the test sample; calculating weighted normalized correction distances between all the feature vectors of the test sample and each category according to the feature vector normalized values of all the feature vectors of the test sample and the classification parameters; and selecting the category corresponding to the minimum weighted normalized correction distance as the prediction category of the test sample.
In one embodiment, the inputting the feature vector of the test sample into the classifier to obtain the prediction category of the test sample includes: obtaining the value of the feature vector of the test sample; calculating a feature vector normalized value of the feature vector of the test sample according to the value of the feature vector of the test sample, a population mean of the feature vector of the test sample, and a population standard deviation of the feature vector of the test sample; calculating weighted normalized correction distances between all the feature vectors of the test sample and each class according to the feature vector normalized values of all the feature vectors of the test sample, the weights of all the feature vectors of the test sample, the corresponding central feature vector of all the feature vectors of the test sample in each class and the distance distribution standard deviation of each class; and selecting the category corresponding to the minimum weighted normalized correction distance as the prediction category of the test sample.
For example, a test sample x of an unknown class is normalized; the jth feature vector normalized value of the test sample is:

$$x_j = \frac{\mathrm{raw}x_j - M_j}{\mathrm{Dev}_j}$$
wherein $\mathrm{raw}x_j$ denotes the jth feature vector of the test sample x, $M_j$ is the overall mean of the jth feature vector, and $\mathrm{Dev}_j$ is the overall standard deviation of the jth feature vector. According to the classifier in the above embodiment, the weighted normalized correction distance between the feature vector of the test sample x of unknown class and the center vector of class k is calculated:

$$D(x, k) = \frac{1}{\mathrm{distDev}_k}\sqrt{\sum_{j=1}^{J} w_j \left(x_j - m_{kj}\right)^2}$$
wherein j indexes the feature vectors, J is the total number of feature vectors, $w_j$ is the weight assigned to feature vector j in the class prediction calculation, $m_{kj}$ is the mean of the jth feature vector in class k, and $\mathrm{distDev}_k$ is the "sample-center distance" standard deviation (distance distribution standard deviation) of class k. The weighted normalized correction distance between the test sample x of unknown class and the center vector of each class is calculated, and the class corresponding to the minimum weighted normalized correction distance is selected as the prediction class of the test sample x.
In one embodiment, the execution subject of the minimum distance classification method of the present application is a minimum distance classifier with adaptive weights and normalization correction. The classifier comprises: the overall mean $M_j$ of each feature vector and the overall standard deviation $\mathrm{Dev}_j$ of each feature vector, used for data normalization; and the central feature vector $m_k$ of each class, the weight $w_j$ of each feature vector, and the distance distribution standard deviation $\mathrm{distDev}_k$ of each class, used for calculating the weighted normalized correction distance.
The process of constructing the classifier is as follows:
Step 1, preparing training data: a training sample data set containing K classes is prepared, and each training sample in the data set carries a class label. Let the total number of training samples be N and the number of training samples in class k be $n_k$; the raw data of each training sample comprises J feature vectors, and the jth feature vector of the ith training sample is denoted $\mathrm{raw}X_{ij}$. For example, in display screen detection, a CCD (Charge Coupled Device) camera collects images of the screen under test in different lighting states, and the feature vectors of each defect can be obtained through preprocessing and Blob analysis (Blob analysis extracts and labels the connected domains of a binary image after foreground/background separation); the feature vectors include average gray value, perimeter, area, perimeter-to-area ratio, compactness, maximum Feret diameter, ellipticity, and the like.
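The data layout of Step 1 can be sketched as follows. The feature names are examples drawn from the display-screen defect setting above; the array shapes and all variable names are assumptions for illustration.

```python
import numpy as np

# Data-layout sketch: N samples, J Blob-analysis features per sample, and an
# integer class label per sample. Random values stand in for real raw features.
feature_names = ["mean_gray", "perimeter", "area", "perimeter_area_ratio",
                 "compactness", "max_feret", "ellipticity"]
raw_X = np.random.default_rng(0).normal(size=(6, len(feature_names)))  # rawX_ij
labels = np.array([0, 0, 1, 1, 2, 2])                                  # class label per sample
K, (N, J) = len(np.unique(labels)), raw_X.shape
```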
Step 2, calculating the overall mean and the overall standard deviation of each feature vector over all training samples. The overall mean of the jth feature vector is:

$$M_j = \frac{1}{N}\sum_{i=1}^{N} \mathrm{raw}X_{ij}$$
The overall standard deviation of the jth feature vector is:

$$\mathrm{Dev}_j = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\mathrm{raw}X_{ij} - M_j\right)^2}$$
Step 3, normalizing the training sample data set to obtain the normalized training sample data set X. The normalized value of the jth feature vector of the ith training sample is:

$$X_{ij} = \frac{\mathrm{raw}X_{ij} - M_j}{\mathrm{Dev}_j}$$
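The Z-Score normalization of Step 3 can be sketched as below. The epsilon-style guard for constant features is an assumption, not something stated in the patent.

```python
import numpy as np

# Z-score normalization: X_ij = (rawX_ij - M_j) / Dev_j.
raw_X = np.array([[1.0, 10.0],
                  [3.0, 30.0],
                  [5.0, 20.0]])
M, Dev = raw_X.mean(axis=0), raw_X.std(axis=0)
# Guard against a zero std (constant feature) to avoid division by zero.
X = (raw_X - M) / np.where(Dev > 0, Dev, 1.0)
```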
Step 4, calculating the mean of each feature vector of each class from the normalized training sample data set X. Class k has $n_k$ training samples, and the mean of its jth feature vector is:

$$m_{kj} = \frac{1}{n_k}\sum_{i \in \text{class } k} X_{ij}$$

wherein $m_k$ is the central feature vector of the kth class.
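The class centers of Step 4 can be sketched as a per-class column mean over the normalized data; names and the toy data are illustrative.

```python
import numpy as np

# m_kj: mean of normalized feature j over the n_k samples of class k.
X = np.array([[0.0, 0.0], [0.2, 0.1],   # class 0
              [1.0, 1.0], [1.2, 0.9]])  # class 1
labels = np.array([0, 0, 1, 1])
centers = np.vstack([X[labels == k].mean(axis=0) for k in np.unique(labels)])
```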
Step 5, calculating the weight of each feature vector: the clustering of the training samples is analyzed, and the weight of each feature vector in classification prediction is determined by the contribution of that single feature vector to the clustering. A feature vector that contributes to the clustering should appear "compact within the same class and well separated between different classes", so the contribution of a single feature vector j can be expressed as "inter-class distance / intra-class distance". The sum of the intra-class distances of feature vector j is:

$$d_j^{\mathrm{intra}} = \sum_{k=1}^{K}\sum_{i=1}^{n_k}\left(X_{ij} - m_{kj}\right)^2$$
wherein k indexes the categories, K is the total number of categories, i indexes the training samples, and $n_k$ is the number of training samples in class k. The sum of the inter-class distances of feature vector j is:

$$d_j^{\mathrm{inter}} = \sum_{k=1}^{K}\sum_{l=k+1}^{K}\left(m_{kj} - m_{lj}\right)^2$$
The contribution of feature vector j to the clustering is:

$$c_j = \frac{d_j^{\mathrm{inter}}}{d_j^{\mathrm{intra}}}$$
The weight assigned to feature vector j in the classification prediction calculation is:

$$w_j = \frac{c_j}{\sum_{j'=1}^{J} c_{j'}}$$
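Step 5 can be sketched as below. The exact distance forms are not recoverable from the text, so squared deviations for the intra-class term and squared pairwise center gaps for the inter-class term are assumptions; all variable names are illustrative. In the toy data, feature 0 separates the classes while feature 1 does not, so feature 0 should receive the larger weight.

```python
import numpy as np

X = np.array([[0.0, 0.0], [0.2, 0.8],   # class 0
              [1.0, 0.1], [1.2, 0.9]])  # class 1
labels = np.array([0, 0, 1, 1])
classes = np.unique(labels)
centers = np.vstack([X[labels == k].mean(axis=0) for k in classes])

# Per-feature spread inside classes (assumed squared-deviation form).
intra = sum(((X[labels == k] - centers[i]) ** 2).sum(axis=0)
            for i, k in enumerate(classes))
# Per-feature gaps between class centers (assumed pairwise squared form).
inter = sum((centers[a] - centers[b]) ** 2
            for a in range(len(classes)) for b in range(a + 1, len(classes)))

contrib = inter / intra          # c_j = inter-class / intra-class
w = contrib / contrib.sum()      # normalized weights, summing to 1
```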
Step 6, calculating the distance between each training sample and the central feature vector of the class to which it belongs, and calculating the distance distribution standard deviation from these distances. The weighted distance between the ith training sample and the central feature vector $m_k$ of the class k to which it belongs is:

$$d_{ik} = \sqrt{\sum_{j=1}^{J} w_j \left(X_{ij} - m_{kj}\right)^2}$$
Then, the "sample-center distance" mean of class k is calculated from the weighted distances:

$$\mathrm{distMean}_k = \frac{1}{n_k}\sum_{i=1}^{n_k} d_{ik}$$
Finally, the "sample-center distance" standard deviation (distance distribution standard deviation) of class k is calculated from the weighted distances and the "sample-center distance" mean:

$$\mathrm{distDev}_k = \sqrt{\frac{1}{n_k}\sum_{i=1}^{n_k}\left(d_{ik} - \mathrm{distMean}_k\right)^2}$$
wherein k indexes the categories, K is the total number of categories, i indexes the training samples, $n_k$ is the number of training samples in class k, j indexes the feature vectors, and J is the total number of feature vectors.
The classifier is used as follows:
Step 1, acquiring a test sample x of unknown class and normalizing it; the normalized value of the jth feature vector of the test sample is:

$$x_j = \frac{\mathrm{raw}x_j - M_j}{\mathrm{Dev}_j}$$

wherein $\mathrm{raw}x_j$ denotes the jth feature vector of the test sample x, $M_j$ is the overall mean of the jth feature vector, and $\mathrm{Dev}_j$ is the overall standard deviation of the jth feature vector.
Step 2, calculating the weighted normalized correction distance between the feature vector of the test sample x of unknown class and the center vector of class k according to the classifier:

$$D(x, k) = \frac{1}{\mathrm{distDev}_k}\sqrt{\sum_{j=1}^{J} w_j \left(x_j - m_{kj}\right)^2}$$

wherein j indexes the feature vectors, J is the total number of feature vectors, $w_j$ is the weight assigned to feature vector j in the class prediction calculation, $m_{kj}$ is the mean of the jth feature vector in class k, and $\mathrm{distDev}_k$ is the "sample-center distance" standard deviation (distance distribution standard deviation) of class k.
Step 3, calculating the weighted normalized correction distance between the test sample x of unknown class and the center vector of each class, and selecting the class corresponding to the minimum weighted normalized correction distance as the prediction class of the test sample x.
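The prediction procedure above can be sketched end to end as follows. The classifier parameters here (means, stds, centers, weights, per-class distance stds) are made-up values for illustration, and all names are assumptions.

```python
import numpy as np

# Assumed trained parameters of the classifier.
M, Dev = np.array([0.6, 0.5]), np.array([0.5, 0.45])   # overall mean / std per feature
centers = np.array([[-1.0, -0.9],                      # class 0 center (normalized space)
                    [ 1.0,  0.9]])                     # class 1 center
w = np.array([0.7, 0.3])                               # feature weights
dist_dev = np.array([0.4, 0.6])                        # per-class distance distribution stds

# Normalize the unknown sample with the training statistics, compute the
# weighted distance to every center, divide by that class's distance std,
# and take the argmin as the predicted class.
raw_x = np.array([1.1, 0.95])
x = (raw_x - M) / Dev
d = np.sqrt((w * (x - centers) ** 2).sum(axis=1)) / dist_dev
pred = int(np.argmin(d))
```

Here `x` normalizes to (1.0, 1.0), which lies next to the class-1 center, so the argmin selects class 1.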
The minimum distance classification method constructs a classifier from classification parameters that include the central feature vector of each class, the overall mean of each feature vector, and the overall standard deviation of each feature vector, used for calculating the weighted normalized correction distance. It fully considers the differences in scale among the feature vectors and the differences in the distribution density of each class along each feature vector: Z-Score normalization is applied to bring every feature into a uniform range (centered near 0 with variance 1); the weight of each feature vector in the distance calculation is derived from its contribution to the clustering; and when the distance between the test sample and each class central feature vector is calculated, the distance is corrected using the standard deviation of the distances between the training samples of that class and its central feature vector. The method requires no iterative training; after adaptive weighting and normalization correction, classification accuracy is significantly improved, and because the distance can be corrected per class, the distance between the feature vector and the central feature vector of the corresponding class is more accurate, thereby guaranteeing classification accuracy.
It should be understood that although the steps in the flowchart of fig. 2 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not restricted to the exact order shown and may be performed in other orders. Moreover, at least some of the steps in fig. 2 may comprise multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different moments, and which are not necessarily executed sequentially but may be performed in turn or alternately with other steps or with sub-steps or stages of other steps.
In one embodiment, as shown in fig. 3, there is provided a minimum distance classification apparatus including: a training sample data set obtaining module 210, a classification parameter calculating module 220, a classifier constructing module 230, and a prediction category obtaining module 240, wherein:
a training sample data set obtaining module 210, configured to obtain a training sample data set; the training sample data set comprises a plurality of classes, each class comprises a plurality of training samples, each training sample is provided with a class label, and each training sample comprises a plurality of feature vectors and values of the feature vectors.
A classification parameter calculation module 220, configured to calculate a classification parameter according to the training sample data set; the classification parameters comprise central feature vectors corresponding to the feature vectors of each category for calculating the weighted normalized correction distance, a total mean value of each feature vector and a total standard deviation of each feature vector.
A classifier construction module 230 configured to construct a classifier according to the classification parameters.
And a prediction category obtaining module 240, configured to input the feature vector of the test sample into the classifier, so as to obtain a prediction category of the test sample.
In one embodiment, the classification parameter calculation module 220 is further configured to calculate a total mean and a total standard deviation of each of the feature vectors according to a value of each of the feature vectors in the training sample data set.
In one embodiment, the classification parameter calculation module 220 is further configured to calculate a central feature vector corresponding to each feature vector of each class according to a value of each feature vector in all training samples in each class, a total mean of each feature vector, and a total standard deviation of each feature vector; wherein the central feature vector is used to calculate a weighted normalized correction distance.
In one embodiment, the classification parameter calculation module 220 is further configured to calculate a weight of each feature vector according to a value of each feature vector of all the training samples of all the classes, a global mean of each feature vector, a global standard deviation of each feature vector, and a central feature vector corresponding to each feature vector of all the classes.
In one embodiment, the classification parameter calculation module 220 is further configured to calculate a standard deviation of distance distribution for each class according to values of all the feature vectors of all the training samples of each class, a total mean of all the feature vectors, a total standard deviation of all the feature vectors, and a central feature vector corresponding to all the feature vectors in each class.
In one embodiment, the prediction category obtaining module 240 includes: a feature vector acquisition unit for acquiring a value of a feature vector of a test sample; a normalized value calculation unit configured to calculate a normalized value of a feature vector of the test sample based on a value of the feature vector of the test sample, a population mean of the feature vector of the test sample, and a population standard deviation of the feature vector of the test sample; a modified distance calculation unit, configured to calculate a weighted normalized modified distance between all the feature vectors of the test sample and each class according to the feature vector normalized values of all the feature vectors of the test sample and the classification parameters; and the selection unit is used for selecting the category corresponding to the minimum weighted normalized correction distance as the prediction category of the test sample.
In one embodiment, the prediction category obtaining module 240 includes: a feature vector acquisition unit for acquiring a value of a feature vector of a test sample; a normalized value calculation unit configured to calculate a normalized value of a feature vector of the test sample based on a value of the feature vector of the test sample, a population mean of the feature vector of the test sample, and a population standard deviation of the feature vector of the test sample; a modified distance calculation unit, configured to calculate a weighted normalized modified distance between all the feature vectors of the test sample and each class according to a feature vector normalized value of all the feature vectors of the test sample, weights of all the feature vectors of the test sample, a central feature vector corresponding to all the feature vectors of the test sample in each class, and a distance distribution standard deviation of each class; and the selection unit is used for selecting the category corresponding to the minimum weighted normalized correction distance as the prediction category of the test sample.
For the specific definition of the minimum distance classification device, reference may be made to the above definition of the minimum distance classification method, which is not described herein again. The modules in the minimum distance classification device can be wholly or partially realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 4. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing training sample data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a minimum distance classification method.
Those skilled in the art will appreciate that the architecture shown in fig. 4 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
acquiring a training sample data set; the training sample data set comprises a plurality of classes, each class comprises a plurality of training samples, each training sample is provided with a class label, and each training sample comprises a plurality of feature vectors and values of the feature vectors;
calculating classification parameters according to the training sample data set; the classification parameters comprise central feature vectors corresponding to the feature vectors of each category for calculating weighted normalized correction distance, a total mean value of each feature vector and a total standard deviation of each feature vector;
constructing a classifier according to the classification parameters;
and inputting the feature vector of the test sample into the classifier to obtain the prediction category of the test sample.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring a training sample data set; the training sample data set comprises a plurality of classes, each class comprises a plurality of training samples, each training sample is provided with a class label, and each training sample comprises a plurality of feature vectors and values of the feature vectors;
calculating classification parameters according to the training sample data set; the classification parameters comprise central feature vectors corresponding to the feature vectors of each category for calculating weighted normalized correction distance, a total mean value of each feature vector and a total standard deviation of each feature vector;
constructing a classifier according to the classification parameters;
and inputting the feature vector of the test sample into the classifier to obtain the prediction category of the test sample.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A minimum distance classification method, the method comprising:
acquiring a training sample data set; the training sample data set comprises a plurality of classes, each class comprises a plurality of training samples, each training sample is provided with a class label, and each training sample comprises a plurality of feature vectors and values of the feature vectors;
calculating classification parameters according to the training sample data set; the classification parameters comprise central feature vectors corresponding to the feature vectors of each category for calculating weighted normalized correction distance, a total mean value of each feature vector and a total standard deviation of each feature vector;
constructing a classifier according to the classification parameters;
and inputting the feature vector of the test sample into the classifier to obtain the prediction category of the test sample.
2. The method according to claim 1, wherein said calculating classification parameters from said set of training sample data comprises:
and calculating the overall mean value and the overall standard deviation of each feature vector according to the value of each feature vector in the training sample data set.
3. The method according to claim 2, comprising, after calculating a global mean and a global standard deviation for each of the feature vectors from the values of each of the feature vectors in the set of training sample data:
calculating a central feature vector corresponding to each feature vector of each category according to the value of each feature vector in all training samples in each category, the overall mean value of each feature vector and the overall standard deviation of each feature vector; wherein the central feature vector is used to calculate a weighted normalized correction distance.
4. The method of claim 3, wherein after calculating the central feature vector corresponding to each feature vector of each class according to the value of each feature vector in all training samples in each class, the overall mean of each feature vector and the overall standard deviation of each feature vector, the method comprises:
and calculating the weight of each feature vector according to the value of each feature vector of all the training samples of all the classes, the overall mean value of each feature vector, the overall standard deviation of each feature vector and the central feature vector corresponding to each feature vector of all the classes.
5. The method of claim 3, wherein after calculating the central feature vector corresponding to each feature vector of each class according to the value of each feature vector in all training samples in each class, the overall mean of each feature vector and the overall standard deviation of each feature vector, the method comprises:
and calculating the distance distribution standard deviation of each category according to the values of all the feature vectors of all the training samples of each category, the overall mean value of all the feature vectors, the overall standard deviation of all the feature vectors and the central feature vector corresponding to all the feature vectors in each category.
6. The method of claim 1, wherein inputting the feature vector of the test sample into the classifier to obtain a prediction class of the test sample comprises:
obtaining the value of the feature vector of the test sample;
calculating a feature vector normalized value of the feature vector of the test sample according to the value of the feature vector of the test sample, a population mean of the feature vector of the test sample, and a population standard deviation of the feature vector of the test sample;
calculating weighted normalized correction distances between all the feature vectors of the test sample and each category according to the feature vector normalized values of all the feature vectors of the test sample and the classification parameters;
and selecting the category corresponding to the minimum weighted normalized correction distance as the prediction category of the test sample.
7. The method of claim 5, wherein inputting the feature vector of the test sample into the classifier to obtain a prediction class of the test sample comprises:
obtaining the value of the feature vector of the test sample;
calculating a feature vector normalized value of the feature vector of the test sample according to the value of the feature vector of the test sample, a population mean of the feature vector of the test sample, and a population standard deviation of the feature vector of the test sample;
calculating weighted normalized correction distances between all the feature vectors of the test sample and each class according to the feature vector normalized values of all the feature vectors of the test sample, the weights of all the feature vectors of the test sample, the corresponding central feature vector of all the feature vectors of the test sample in each class and the distance distribution standard deviation of each class;
and selecting the category corresponding to the minimum weighted normalized correction distance as the prediction category of the test sample.
8. A minimum distance classification apparatus, characterized in that the apparatus comprises:
the training sample data set acquisition module is used for acquiring a training sample data set; the training sample data set comprises a plurality of classes, each class comprises a plurality of training samples, each training sample is provided with a class label, and each training sample comprises a plurality of feature vectors and values of the feature vectors;
the classification parameter calculation module is used for calculating classification parameters according to the training sample data set; the classification parameters comprise central feature vectors corresponding to the feature vectors of each category for calculating weighted normalized correction distance, a total mean value of each feature vector and a total standard deviation of each feature vector;
the classifier building module is used for building a classifier according to the classification parameters;
and the prediction category acquisition module is used for inputting the feature vector of the test sample into the classifier to obtain the prediction category of the test sample.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202010210657.6A 2020-03-24 2020-03-24 Minimum distance classification method and device, computer equipment and storage medium Active CN111091163B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010210657.6A CN111091163B (en) 2020-03-24 2020-03-24 Minimum distance classification method and device, computer equipment and storage medium


Publications (2)

Publication Number Publication Date
CN111091163A true CN111091163A (en) 2020-05-01
CN111091163B CN111091163B (en) 2021-05-11

Family

ID=70400661

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010210657.6A Active CN111091163B (en) 2020-03-24 2020-03-24 Minimum distance classification method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111091163B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860671A (en) * 2020-07-28 2020-10-30 中山大学 Classification model training method and device, terminal equipment and readable storage medium
CN112116018A (en) * 2020-09-25 2020-12-22 奇安信科技集团股份有限公司 Sample classification method, apparatus, computer device, medium, and program product

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101488188A (en) * 2008-11-10 2009-07-22 西安电子科技大学 SAR image classification method based on SVM classifier of mixed nucleus function
CN101561867A (en) * 2009-05-19 2009-10-21 华中科技大学 Human body detection method based on Gauss shape feature
CN104361352A (en) * 2014-11-13 2015-02-18 东北林业大学 Solid wood panel defect separation method based on compressed sensing
CN104751166A (en) * 2013-12-30 2015-07-01 中国科学院深圳先进技术研究院 Spectral angle and Euclidean distance based remote-sensing image classification method


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Ren Jing et al., "An improved minimum distance classifier algorithm: the weighted minimum distance classifier", Journal of Computer Applications (《计算机应用》) *
Guo Yaqin et al., "An improved minimum distance classifier: NN-MDC", Software Guide (《软件导刊》) *

Also Published As

Publication number Publication date
CN111091163B (en) 2021-05-11

Similar Documents

Publication Publication Date Title
CN105354595B (en) Robust visual image classification method and system
US8565538B2 (en) Detecting and labeling places using runtime change-point detection
CN109784293B (en) Multi-class target object detection method and device, electronic equipment and storage medium
CN110210625B (en) Modeling method and device based on transfer learning, computer equipment and storage medium
CN111931931A (en) Deep neural network training method and device for pathology full-field image
US20070041638A1 (en) Systems and methods for real-time object recognition
CN108564102A (en) Image clustering evaluation of result method and apparatus
CN111091163B (en) Minimum distance classification method and device, computer equipment and storage medium
CN114694143B (en) Cell image recognition method and device based on optical means
CN110020674B (en) Cross-domain self-adaptive image classification method for improving local category discrimination
CN116894985A (en) Semi-supervised image classification method and semi-supervised image classification system
CN117153268A (en) Cell category determining method and system
CN113762005A (en) Method, device, equipment and medium for training feature selection model and classifying objects
CN116524296A (en) Training method and device of equipment defect detection model and equipment defect detection method
CN111814853B (en) Decorrelation clustering method and device under data selection deviation
CN115170838A (en) Data screening method and device
CN113076823A (en) Training method of age prediction model, age prediction method and related device
CN113837173A (en) Target object detection method and device, computer equipment and storage medium
JP2020181265A (en) Information processing device, system, information processing method, and program
CN115700821B (en) Cell identification method and system based on image processing
CN111553418B (en) Method and device for detecting neuron reconstruction errors and computer equipment
CN115641472A (en) Universal field self-adaption method based on unified optimal transport framework
CN118196647A (en) Domain adaptive remote sensing image classification method based on unbalanced similarity and class mapping
CN111860603A (en) Method, device, equipment and storage medium for identifying rice ears in picture
CN114529720A (en) Interpretability evaluation method and device of neural network, storage medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant