CN111091163A - Minimum distance classification method and device, computer equipment and storage medium - Google Patents

Minimum distance classification method and device, computer equipment and storage medium

Info

Publication number
CN111091163A
CN111091163A (application CN202010210657.6A; granted publication CN111091163B)
Authority
CN
China
Prior art keywords
feature vector
test sample
feature
class
calculating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010210657.6A
Other languages
Chinese (zh)
Other versions
CN111091163B (en)
Inventor
周才健
周柔刚
杨亮亮
盛锦华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Guangyuan Intelligent Technology Co ltd
Jinhua Mstar Intelligent Technology Co ltd
Suzhou Huicui Intelligent Technology Co Ltd
Hangzhou Huicui Intelligent Technology Co ltd
Original Assignee
Guangdong Guangyuan Intelligent Technology Co ltd
Jinhua Mstar Intelligent Technology Co ltd
Suzhou Huicui Intelligent Technology Co Ltd
Hangzhou Huicui Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Guangyuan Intelligent Technology Co ltd, Jinhua Mstar Intelligent Technology Co ltd, Suzhou Huicui Intelligent Technology Co Ltd, Hangzhou Huicui Intelligent Technology Co ltd filed Critical Guangdong Guangyuan Intelligent Technology Co ltd
Priority to CN202010210657.6A priority Critical patent/CN111091163B/en
Publication of CN111091163A publication Critical patent/CN111091163A/en
Application granted granted Critical
Publication of CN111091163B publication Critical patent/CN111091163B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147Distances to closest patterns, e.g. nearest neighbour classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a minimum distance classification method and apparatus, a computer device and a storage medium. The method comprises the following steps: acquiring a training sample data set, where the training sample data set comprises a plurality of classes, each class comprises a plurality of training samples, each training sample carries a class label, and each training sample comprises a plurality of feature vectors and their values; calculating classification parameters from the training sample data set, where the classification parameters, used to calculate a weighted normalized corrected distance, comprise the central feature vector corresponding to each feature vector of each class, an overall mean of each feature vector and an overall standard deviation of each feature vector; constructing a classifier from the classification parameters; and inputting the feature vectors of a test sample into the classifier to obtain the predicted class of the test sample. By adopting the method, the classification accuracy for test samples can be improved.

Description

Minimum distance classification method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a minimum distance classification method and apparatus, a computer device, and a storage medium.
Background
The existing minimum distance classification method is one of the most basic classification methods: it calculates the distance D from an unknown-class vector X to the center vector of each known class (such as classes A, B and C), and then assigns X to the class with the minimum distance D.
However, when the existing minimum distance classification method is applied to classification problems with uneven distributions, classification errors easily occur. For example, as shown in fig. 1, in display-screen defect detection, the light-emitting electronic elements mainly have defects such as "dead spots", "dead lines" and "Mura", and the surface has defects such as "scratches" and "dirt". "Dead spots" are defects of a single pixel or of adjacent pixels numbering no more than 4; "dead lines" are linear defects of varying lengths; "Mura" appears as blocks of varying sizes and shapes; and surface "scratches" and "dirt" can be regarded as curves and blocks of varying shapes. Suppose classification type A is "points" and classification type B is "lines". In feature dimensions such as area and perimeter, the "points" are highly aggregated while the "lines" are loosely distributed. An unknown-class vector X may actually belong to type B ("lines") and yet lie farther from the class B center than from the class A center; X is then classified as a class A "point", resulting in a classification error.
Disclosure of Invention
In view of the above, there is a need to provide a minimum distance classification method, apparatus, computer device and storage medium capable of improving the accuracy of test sample classification.
A minimum distance classification method, the method comprising:
acquiring a training sample data set; the training sample data set comprises a plurality of classes, each class comprises a plurality of training samples, each training sample is provided with a class label, and each training sample comprises a plurality of feature vectors and values of the feature vectors;
calculating classification parameters according to the training sample data set; the classification parameters comprise central feature vectors corresponding to the feature vectors of each category for calculating weighted normalized correction distance, a total mean value of each feature vector and a total standard deviation of each feature vector;
constructing a classifier according to the classification parameters;
and inputting the feature vector of the test sample into the classifier to obtain the prediction category of the test sample.
In one embodiment, said calculating classification parameters from said training sample data set comprises: and calculating the overall mean value and the overall standard deviation of each feature vector according to the value of each feature vector in the training sample data set.
In one embodiment, after calculating an overall mean and an overall standard deviation of each of the feature vectors according to the value of each of the feature vectors in the training sample data set, the method includes: calculating a central feature vector corresponding to each feature vector of each category according to the value of each feature vector in all training samples in each category, the overall mean value of each feature vector and the overall standard deviation of each feature vector; wherein the central feature vector is used to calculate a weighted normalized correction distance.
In one embodiment, after calculating the central feature vector corresponding to each feature vector of each class according to the value of each feature vector in all training samples in each class, the overall mean value of each feature vector and the overall standard deviation of each feature vector, the method includes: and calculating the weight of each feature vector according to the value of each feature vector of all the training samples of all the classes, the overall mean value of each feature vector, the overall standard deviation of each feature vector and the central feature vector corresponding to each feature vector of all the classes.
In one embodiment, after calculating the central feature vector corresponding to each feature vector of each class according to the value of each feature vector in all training samples in each class, the overall mean value of each feature vector and the overall standard deviation of each feature vector, the method includes: and calculating the distance distribution standard deviation of each category according to the values of all the feature vectors of all the training samples of each category, the overall mean value of all the feature vectors, the overall standard deviation of all the feature vectors and the central feature vector corresponding to all the feature vectors in each category.
In one embodiment, the inputting the feature vector of the test sample into the classifier to obtain the prediction category of the test sample includes: obtaining the value of the feature vector of the test sample; calculating a feature vector normalized value of the feature vector of the test sample according to the value of the feature vector of the test sample, a population mean of the feature vector of the test sample, and a population standard deviation of the feature vector of the test sample; calculating weighted normalized correction distances between all the feature vectors of the test sample and each category according to the feature vector normalized values of all the feature vectors of the test sample and the classification parameters; and selecting the category corresponding to the minimum weighted normalized correction distance as the prediction category of the test sample.
In one embodiment, the inputting the feature vector of the test sample into the classifier to obtain the prediction category of the test sample includes: obtaining the value of the feature vector of the test sample; calculating a feature vector normalized value of the feature vector of the test sample according to the value of the feature vector of the test sample, a population mean of the feature vector of the test sample, and a population standard deviation of the feature vector of the test sample; calculating weighted normalized correction distances between all the feature vectors of the test sample and each class according to the feature vector normalized values of all the feature vectors of the test sample, the weights of all the feature vectors of the test sample, the corresponding central feature vector of all the feature vectors of the test sample in each class and the distance distribution standard deviation of each class; and selecting the category corresponding to the minimum weighted normalized correction distance as the prediction category of the test sample.
A minimum distance classification apparatus, the apparatus comprising:
the training sample data set acquisition module is used for acquiring a training sample data set; the training sample data set comprises a plurality of classes, each class comprises a plurality of training samples, each training sample is provided with a class label, and each training sample comprises a plurality of feature vectors and values of the feature vectors;
the classification parameter calculation module is used for calculating classification parameters according to the training sample data set; the classification parameters comprise central feature vectors corresponding to the feature vectors of each category for calculating weighted normalized correction distance, a total mean value of each feature vector and a total standard deviation of each feature vector;
the classifier building module is used for building a classifier according to the classification parameters;
and the prediction category acquisition module is used for inputting the feature vector of the test sample into the classifier to obtain the prediction category of the test sample.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring a training sample data set; the training sample data set comprises a plurality of classes, each class comprises a plurality of training samples, each training sample is provided with a class label, and each training sample comprises a plurality of feature vectors and values of the feature vectors;
calculating classification parameters according to the training sample data set; the classification parameters comprise central feature vectors corresponding to the feature vectors of each category for calculating weighted normalized correction distance, a total mean value of each feature vector and a total standard deviation of each feature vector;
constructing a classifier according to the classification parameters;
and inputting the feature vector of the test sample into the classifier to obtain the prediction category of the test sample.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring a training sample data set; the training sample data set comprises a plurality of classes, each class comprises a plurality of training samples, each training sample is provided with a class label, and each training sample comprises a plurality of feature vectors and values of the feature vectors;
calculating classification parameters according to the training sample data set; the classification parameters comprise central feature vectors corresponding to the feature vectors of each category for calculating weighted normalized correction distance, a total mean value of each feature vector and a total standard deviation of each feature vector;
constructing a classifier according to the classification parameters;
and inputting the feature vector of the test sample into the classifier to obtain the prediction category of the test sample.
The minimum distance classification method, apparatus, computer device and storage medium construct a classifier from classification parameters that include, for calculating the weighted normalized corrected distance, the central feature vectors of each class, the overall mean of each feature vector and the overall standard deviation of each feature vector. The method fully considers the differences in scale among the feature vectors and the differences in distribution compactness among the classes. It applies Z-score normalization to map the features into a uniform range (features are centered near 0 with variance 1); it calculates the weight of each feature vector in the distance calculation according to each feature vector's degree of contribution; and when calculating the distance between a test sample and each class's central feature vector, it performs a distance standardization correction using the standard deviation of the distances between the training samples and each class's central feature vector. The method requires no iterative training, and classification accuracy is significantly improved after adaptive weighting and standardized-correction optimization: the distance can be corrected per class during calculation, making the distance between a feature vector and the corresponding class's feature vectors more accurate, thereby ensuring classification accuracy.
Drawings
FIG. 1 is a schematic diagram of an embodiment of uneven distribution of defect detection for a display panel;
FIG. 2 is a flow diagram of a minimum distance classification method in one embodiment;
FIG. 3 is a block diagram of a minimum distance classification apparatus in one embodiment;
FIG. 4 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In one embodiment, as shown in fig. 1, in display-screen defect detection, consider the distribution of the "perimeter" feature vector. The feature value of a detection sample is 20; for the "point" class the feature mean is 2.5 with standard deviation 1, and for the "line" class the feature mean is 50 with standard deviation 20. According to these distributions the detection sample belongs to the "line" class, and it does in fact belong to the "line" class; however, the existing minimum distance classification method assigns it to the "point" class, since |20 − 2.5| = 17.5 < |20 − 50| = 30.
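This motivating example can be checked numerically. The sketch below contrasts a plain minimum-distance rule with one that measures distance in units of each class's standard deviation; the class names, means and standard deviations come from the example above, while the per-class standardization is an illustrative fix rather than the patent's exact formula.

```python
# Class statistics for the "perimeter" feature, taken from the example above.
classes = {
    "point": {"mean": 2.5, "std": 1.0},
    "line": {"mean": 50.0, "std": 20.0},
}

def naive_predict(x):
    # Plain minimum distance to each class mean.
    return min(classes, key=lambda c: abs(x - classes[c]["mean"]))

def standardized_predict(x):
    # Distance measured in units of each class's standard deviation.
    return min(classes, key=lambda c: abs(x - classes[c]["mean"]) / classes[c]["std"])

x = 20.0  # "perimeter" value of the test sample
print(naive_predict(x))         # the naive rule picks "point" (17.5 < 30)
print(standardized_predict(x))  # scaling by spread picks "line" (1.5 < 17.5)
```

The standardized rule recovers the correct "line" class because 20 lies only 1.5 standard deviations from the "line" mean but 17.5 standard deviations from the "point" mean.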
In one embodiment, as shown in fig. 2, there is provided a minimum distance classification method, comprising the steps of:
s110, acquiring a training sample data set; the training sample data set comprises a plurality of classes, each class comprises a plurality of training samples, each training sample is provided with a class label, and each training sample comprises a plurality of feature vectors and values of the feature vectors.
The training sample data set is the sample data used to train the classifier. Each training sample is obtained by extracting a plurality of features from an image; for example, a plurality of images of various display-screen defects are captured, and the images are preprocessed to obtain the sample data. Each defect can be classified manually, and a class label is set for each training sample accordingly. In display-screen defect detection the feature vectors include the average gray value, perimeter, area, perimeter-to-area ratio, compactness, maximum Feret diameter and ellipticity; of course, different feature vectors can be set for different application fields. Each feature vector of each training sample in the training sample data set has a corresponding value, and the same feature vector of different training samples may have the same or different values.
S120, calculating classification parameters according to the training sample data set; the classification parameters comprise central feature vectors corresponding to the feature vectors of each category for calculating the weighted normalized correction distance, a total mean value of each feature vector and a total standard deviation of each feature vector.
The central feature vector of each class is equal to the mean of the feature vectors over all training samples of that class; because the training samples carry class labels, the central feature vector can be calculated from the training samples sharing the same class label, and it is the mean of a given feature vector. The overall mean of a feature vector is the mean of that feature vector over all training samples, and the overall standard deviation of a feature vector is the standard deviation of that feature vector over all training samples; since there are multiple feature vectors, an overall mean and an overall standard deviation can be calculated for each one.
S130, constructing a classifier according to the classification parameters.
The specific process of constructing the classifier through the classification parameters comprises the following steps: the classifier is obtained by training a preset type of neural network through classification parameters, and the trained classifier can output prediction categories according to the input feature vectors of the test samples. The preset type of neural network comprises LeNet-5, AlexNet, ZFNET, VGG-16, GoogLeNet, ResNet and other neural networks, and of course, the preset type of neural network also comprises a logistic regression function.
S140, inputting the feature vector of the test sample into the classifier to obtain the prediction category of the test sample.
The classifier calculates the distance between the feature vector and the feature vector of each category for the test sample, and takes the category with the minimum distance as the prediction category of the test sample.
In the minimum distance classification method, the classifier is constructed by the classification parameters including the central feature vector of each category, the overall mean value of each feature vector and the overall standard deviation of each feature vector, which are used for calculating the weighted normalized correction distance, so that the distance can be corrected according to the category during calculation, the distance between the feature vector and the feature vector of the corresponding category is ensured to be more accurate, and the classification accuracy is ensured.
In one embodiment, the step S120 includes: and calculating the overall mean value and the overall standard deviation of each feature vector according to the value of each feature vector in the training sample data set.
The training sample data comprises the feature vectors of all the training samples. The overall mean of a feature vector is the mean of that feature vector over all training samples; because there are multiple feature vectors, an overall mean can be calculated for each feature vector. Likewise, the overall standard deviation of a feature vector is the standard deviation of that feature vector over all training samples, and one can be calculated for each feature vector. For example, with training samples A, B and C (3 samples in total), where the value of feature vector V is T1 for sample A, T2 for sample B and T3 for sample C, the overall mean of feature vector V is

M = (T1 + T2 + T3) / 3

and the overall standard deviation of feature vector V is

Dev = sqrt( ((T1 − M)² + (T2 − M)² + (T3 − M)²) / 3 )
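A minimal sketch of these two statistics for one feature vector V across all training samples, matching the three-sample example; the values T1, T2, T3 are placeholders. Note the population form of the standard deviation (dividing by N, consistent with the formula above, rather than the sample form dividing by N − 1).

```python
import math

def overall_mean(values):
    # Mean of one feature vector's values over all training samples.
    return sum(values) / len(values)

def overall_std(values):
    # Population standard deviation, i.e. dividing by N rather than N - 1.
    m = overall_mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

t = [2.0, 4.0, 6.0]  # values of feature vector V for samples A, B, C
print(overall_mean(t))  # 4.0
print(overall_std(t))   # sqrt(8/3), about 1.633
```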
In one embodiment, after the step of calculating a global mean and a global standard deviation of each of the feature vectors according to the value of each of the feature vectors in the training sample data set, the method comprises: calculating a central feature vector corresponding to each feature vector of each category according to the value of each feature vector in all training samples in each category, the overall mean value of each feature vector and the overall standard deviation of each feature vector; wherein the central feature vector is used to calculate a weighted normalized correction distance.
Wherein, according to the feature vector values, the overall mean and the overall standard deviation, the specific process for calculating the central feature vector of each class is as follows: calculate the normalized value of each feature vector of each training sample from the feature vector value, the overall mean and the overall standard deviation; then, for each class, take the mean of the normalized values of a given feature vector over all training samples of that class. This mean is the central feature vector of that class for that feature vector. For example, suppose class B includes training samples A1 and A2 (2 samples in total), the value of feature vector V is T1 for sample A1 and T2 for sample A2, and feature vector V has overall mean M and overall standard deviation Dev. The normalized value of feature vector V for training sample A1 is

X1 = (T1 − M) / Dev

the normalized value of feature vector V for training sample A2 is

X2 = (T2 − M) / Dev

and the mean of the normalized values of feature vector V over all training samples of class B is

m = (X1 + X2) / 2
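The center computation above amounts to z-scoring each sample's feature value with the overall statistics and averaging the z-scores within a class. A short sketch, with illustrative values for M, Dev and the class B samples:

```python
M, Dev = 10.0, 4.0            # overall mean / std of feature vector V (illustrative)
class_b_values = [6.0, 14.0]  # feature V for training samples A1, A2 of class B

def normalize(t, mean, std):
    # Z-score normalization of a single feature value.
    return (t - mean) / std

x = [normalize(t, M, Dev) for t in class_b_values]
m = sum(x) / len(x)           # central feature vector of V for class B
print(x)  # [-1.0, 1.0]
print(m)  # 0.0
```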
In one embodiment, after the step of calculating the central feature vector corresponding to each feature vector of each class according to the value of each feature vector in all training samples in each class, the overall mean value of each feature vector and the overall standard deviation of each feature vector, the method includes: and calculating the weight of each feature vector according to the value of each feature vector of all the training samples of all the classes, the overall mean value of each feature vector, the overall standard deviation of each feature vector and the central feature vector corresponding to each feature vector of all the classes.
Wherein calculating the weight of each feature vector from the feature vector values, the overall mean, the overall standard deviation and the central feature vectors specifically comprises: calculating the normalized value of each feature vector of each training sample from the feature vector value, the overall mean and the overall standard deviation; calculating the intra-class distance of a given feature vector from the normalized values of the training samples and the central feature vectors, where the intra-class distance is the sum, over the classes, of the distances between the feature vector and each class's central feature vector; calculating the inter-class distance of the feature vector from the central feature vector of each class and the overall mean of the feature vector; taking the ratio of the inter-class distance to the intra-class distance as the contribution of the feature vector; and taking the ratio of the feature vector's contribution to the total contribution of all feature vectors as the weight of the feature vector.
For example, suppose the training sample data includes classes B1 and B2, class B1 includes training samples A1 and A2, class B2 includes training samples A3 and A4, the normalized values of feature vector V for A1, A2, A3 and A4 are X1, X2, X3 and X4 respectively, and the central feature vectors of feature vector V for classes B1 and B2 are m1 and m2 respectively. The intra-class distance of feature vector V is:

d_intra = |X1 − m1| + |X2 − m1| + |X3 − m2| + |X4 − m2|

With M the overall mean of feature vector V, the inter-class distance of feature vector V is:

d_inter = |m1 − M| + |m2 − M|

The contribution of feature vector V is:

c = d_inter / d_intra

If the total contribution of all the feature vectors is C, the weight of feature vector V is: w = c / C.
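The weight computation above can be sketched as follows. The helper handles one feature; the normalized sample values and class centers are illustrative, and the overall mean of the normalized values is taken as 0 (an assumption consistent with z-score normalization, where the patent's formulas may instead use the raw overall mean).

```python
def contribution(samples_by_class, centers, overall_mean=0.0):
    # samples_by_class: {class: [normalized values of this feature]}
    # centers: {class: central feature value m_k for this feature}
    d_intra = sum(abs(x - centers[k])
                  for k, xs in samples_by_class.items() for x in xs)
    d_inter = sum(abs(m - overall_mean) for m in centers.values())
    return d_inter / d_intra  # ratio of inter-class to intra-class distance

samples = {"B1": [-1.2, -0.8], "B2": [0.7, 1.3]}
centers = {"B1": -1.0, "B2": 1.0}
c = contribution(samples, centers)       # contribution of feature V

# Weights are contributions normalized by the total contribution C.
contribs = {"V": c, "U": 3.0}            # "U" is a pretend second feature
total = sum(contribs.values())
weights = {f: v / total for f, v in contribs.items()}
print(weights)
```

Here d_intra = 0.2 + 0.2 + 0.3 + 0.3 = 1.0 and d_inter = 1.0 + 1.0 = 2.0, so feature V's contribution is 2.0 and its weight is 2.0 / 5.0 = 0.4.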
In one embodiment, after the step of calculating the central feature vector corresponding to each feature vector of each class according to the value of each feature vector in all training samples in each class, the overall mean value of each feature vector and the overall standard deviation of each feature vector, the method includes: and calculating the distance distribution standard deviation of each category according to the values of all the feature vectors of all the training samples of each category, the overall mean value of all the feature vectors, the overall standard deviation of all the feature vectors and the central feature vector corresponding to all the feature vectors in each category.
Calculating the distance distribution standard deviation of each category according to the feature vector, the overall mean, the overall standard deviation and the central feature vector specifically comprises the following steps: calculating a normalized value of each feature vector of each training sample according to the feature vector, the overall mean and the overall standard deviation; calculating the weighted distance between the feature vector of each training sample and the central feature vector of the class according to the normalized value of each feature vector, the weight of each feature vector and the central feature vector of each class; calculating the weighted distance average value of all training samples of the class to obtain the sample-center distance average value of the class; and calculating the standard deviation of the distance distribution according to the weighted distance and the mean value of the sample-center distance.
For example, the weighted distance between the i-th training sample and the central feature vector m_k of the class k to which it belongs is:

D_ik = sqrt( Σ_{j=1..J} W_j · (X_ij − m_kj)² )

where j indexes the feature vectors, J is the total number of feature vectors, X_ij is the normalized value of the j-th feature vector of the i-th training sample, W_j is the weight of the j-th feature vector, and m_kj is the j-th component of the central feature vector of class k. The "sample-center distance" mean of class k is calculated from the weighted distances:

μ_k = (1 / n_k) Σ_{i=1..n_k} D_ik

and the "sample-center distance" standard deviation (the distance distribution standard deviation) of class k is calculated from the weighted distances and the "sample-center distance" mean:

σ_k = sqrt( (1 / n_k) Σ_{i=1..n_k} (D_ik − μ_k)² )

where k indexes the classes, K is the total number of classes, i indexes the training samples, n_k is the number of training samples in class k, j indexes the feature vectors, and J is the total number of feature vectors.
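These per-class distance statistics can be sketched directly from the formulas above; the weights, samples and center below are illustrative, and the samples are assumed to be already z-score normalized.

```python
import math

def weighted_distance(x, center, weights):
    # Weighted Euclidean distance D_ik between one sample and a class center.
    return math.sqrt(sum(w * (xi - mi) ** 2
                         for xi, mi, w in zip(x, center, weights)))

def distance_stats(samples, center, weights):
    # "Sample-center distance" mean and population standard deviation
    # of one class's training samples.
    d = [weighted_distance(x, center, weights) for x in samples]
    mu = sum(d) / len(d)
    sigma = math.sqrt(sum((di - mu) ** 2 for di in d) / len(d))
    return mu, sigma

W = [0.4, 0.6]                             # feature weights W_j
class_k_samples = [[1.0, 2.0], [3.0, 2.0]] # normalized samples of class k
m_k = [2.0, 2.0]                           # central feature vector of class k
mu, sigma = distance_stats(class_k_samples, m_k, W)
print(mu, sigma)
```

Both samples here sit at weighted distance sqrt(0.4) from the center, so the distance mean is sqrt(0.4) and the distance distribution standard deviation is 0.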
In one embodiment, the inputting the feature vector of the test sample into the classifier to obtain the prediction category of the test sample includes: obtaining the value of the feature vector of the test sample;
calculating a feature vector normalized value of the feature vector of the test sample according to the value of the feature vector of the test sample, a population mean of the feature vector of the test sample, and a population standard deviation of the feature vector of the test sample; calculating weighted normalized correction distances between all the feature vectors of the test sample and each category according to the feature vector normalized values of all the feature vectors of the test sample and the classification parameters; and selecting the category corresponding to the minimum weighted normalized correction distance as the prediction category of the test sample.
In one embodiment, the inputting the feature vector of the test sample into the classifier to obtain the prediction category of the test sample includes: obtaining the value of the feature vector of the test sample; calculating a feature vector normalized value of the feature vector of the test sample according to the value of the feature vector of the test sample, a population mean of the feature vector of the test sample, and a population standard deviation of the feature vector of the test sample; calculating weighted normalized correction distances between all the feature vectors of the test sample and each class according to the feature vector normalized values of all the feature vectors of the test sample, the weights of all the feature vectors of the test sample, the corresponding central feature vector of all the feature vectors of the test sample in each class and the distance distribution standard deviation of each class; and selecting the category corresponding to the minimum weighted normalized correction distance as the prediction category of the test sample.
For example, a test sample x of an unknown class is normalized; the jth feature vector normalized value of the test sample is:

$$x_j = \frac{\mathrm{raw}x_j - M_j}{\mathrm{Dev}_j}$$
wherein $\mathrm{raw}x_j$ denotes the jth feature vector of the test sample x, $M_j$ is the overall mean of the jth feature vector, and $\mathrm{Dev}_j$ is the overall standard deviation of the jth feature vector. According to the classifier in the above embodiment, the weighted normalized correction distance between the feature vector of the test sample x of unknown class and the center vector of class k is calculated:

$$D(x, k) = \frac{1}{\mathrm{distDev}_k}\sqrt{\sum_{j=1}^{J} w_j \left(x_j - m_{kj}\right)^2}$$
wherein j indexes the feature vectors, J is the total number of feature vectors, $w_j$ is the weight assigned to feature vector j in the class prediction calculation, $m_{kj}$ is the mean of the jth feature vector in class k, and $\mathrm{distDev}_k$ is the "sample-center distance" standard deviation (distance distribution standard deviation) of class k. The weighted normalized correction distance between the test sample x of unknown class and the center vector of each class is calculated, and the class corresponding to the minimum weighted normalized correction distance is selected as the prediction class of the test sample x.
In one embodiment, the execution subject of the minimum distance classification method of the present application is a minimum distance classifier with adaptive weights and normalization correction. The classifier comprises: the overall mean $M_j$ of each feature vector and the overall standard deviation $\mathrm{Dev}_j$ of each feature vector, used for data normalization; and the central feature vector $m_k$ of each class, the weight $w_j$ of each feature vector, and the distance distribution standard deviation $\mathrm{distDev}_k$ of each class, used for calculating the weighted normalized correction distance.
The process of constructing the classifier is as follows:
Step 1, preparing training data: a training sample data set containing K classes is prepared, and each training sample in the data set carries a class label. Let the total number of training samples be N and the number of training samples in class k be $n_k$; the raw data of each training sample comprises J feature vectors, and the jth feature vector of the ith training sample is denoted $\mathrm{raw}X_{ij}$. For example, in display screen detection, a CCD (Charge Coupled Device) camera collects images of the screen under test in different lighting states, and the feature vectors of each defect can be obtained through preprocessing and Blob analysis (Blob analysis extracts and labels the connected domains of a binary image after foreground/background separation); the feature vectors include average gray value, perimeter, area, perimeter-to-area ratio, compactness, maximum Feret diameter, ellipticity, and the like.
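The data layout of Step 1 can be sketched as follows. The feature names are examples drawn from the display-screen defect setting above; the array shapes and all variable names are assumptions for illustration.

```python
import numpy as np

# Data-layout sketch: N samples, J Blob-analysis features per sample, and an
# integer class label per sample. Random values stand in for real raw features.
feature_names = ["mean_gray", "perimeter", "area", "perimeter_area_ratio",
                 "compactness", "max_feret", "ellipticity"]
raw_X = np.random.default_rng(0).normal(size=(6, len(feature_names)))  # rawX_ij
labels = np.array([0, 0, 1, 1, 2, 2])                                  # class label per sample
K, (N, J) = len(np.unique(labels)), raw_X.shape
```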
Step 2, calculating the overall mean and the overall standard deviation of each feature vector over all training samples. The overall mean of the jth feature vector is:

$$M_j = \frac{1}{N}\sum_{i=1}^{N} \mathrm{raw}X_{ij}$$
The overall standard deviation of the jth feature vector is:

$$\mathrm{Dev}_j = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\mathrm{raw}X_{ij} - M_j\right)^2}$$
Step 3, normalizing the training sample data set to obtain the normalized training sample data set X. The normalized value of the jth feature vector of the ith training sample is:

$$X_{ij} = \frac{\mathrm{raw}X_{ij} - M_j}{\mathrm{Dev}_j}$$
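The Z-Score normalization of Step 3 can be sketched as below. The epsilon-style guard for constant features is an assumption, not something stated in the patent.

```python
import numpy as np

# Z-score normalization: X_ij = (rawX_ij - M_j) / Dev_j.
raw_X = np.array([[1.0, 10.0],
                  [3.0, 30.0],
                  [5.0, 20.0]])
M, Dev = raw_X.mean(axis=0), raw_X.std(axis=0)
# Guard against a zero std (constant feature) to avoid division by zero.
X = (raw_X - M) / np.where(Dev > 0, Dev, 1.0)
```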
Step 4, calculating the mean of each feature vector of each class from the normalized training sample data set X. Class k has $n_k$ training samples, and the mean of its jth feature vector is:

$$m_{kj} = \frac{1}{n_k}\sum_{i \in \text{class } k} X_{ij}$$

wherein $m_k$ is the central feature vector of the kth class.
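The class centers of Step 4 can be sketched as a per-class column mean over the normalized data; names and the toy data are illustrative.

```python
import numpy as np

# m_kj: mean of normalized feature j over the n_k samples of class k.
X = np.array([[0.0, 0.0], [0.2, 0.1],   # class 0
              [1.0, 1.0], [1.2, 0.9]])  # class 1
labels = np.array([0, 0, 1, 1])
centers = np.vstack([X[labels == k].mean(axis=0) for k in np.unique(labels)])
```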
Step 5, calculating the weight of each feature vector: the clustering of the training samples is analyzed, and the weight of each feature vector in classification prediction is determined by the contribution of that single feature vector to the clustering. A feature vector that contributes to the clustering should appear "compact within the same class and well separated between different classes", so the contribution of a single feature vector j can be expressed as "inter-class distance / intra-class distance". The sum of the intra-class distances of feature vector j is:

$$d_j^{\mathrm{intra}} = \sum_{k=1}^{K}\sum_{i=1}^{n_k}\left(X_{ij} - m_{kj}\right)^2$$
wherein k indexes the categories, K is the total number of categories, i indexes the training samples, and $n_k$ is the number of training samples in class k. The sum of the inter-class distances of feature vector j is:

$$d_j^{\mathrm{inter}} = \sum_{k=1}^{K}\sum_{l=k+1}^{K}\left(m_{kj} - m_{lj}\right)^2$$
The contribution of feature vector j to the clustering is:

$$c_j = \frac{d_j^{\mathrm{inter}}}{d_j^{\mathrm{intra}}}$$
The weight assigned to feature vector j in the classification prediction calculation is:

$$w_j = \frac{c_j}{\sum_{j'=1}^{J} c_{j'}}$$
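Step 5 can be sketched as below. The exact distance forms are not recoverable from the text, so squared deviations for the intra-class term and squared pairwise center gaps for the inter-class term are assumptions; all variable names are illustrative. In the toy data, feature 0 separates the classes while feature 1 does not, so feature 0 should receive the larger weight.

```python
import numpy as np

X = np.array([[0.0, 0.0], [0.2, 0.8],   # class 0
              [1.0, 0.1], [1.2, 0.9]])  # class 1
labels = np.array([0, 0, 1, 1])
classes = np.unique(labels)
centers = np.vstack([X[labels == k].mean(axis=0) for k in classes])

# Per-feature spread inside classes (assumed squared-deviation form).
intra = sum(((X[labels == k] - centers[i]) ** 2).sum(axis=0)
            for i, k in enumerate(classes))
# Per-feature gaps between class centers (assumed pairwise squared form).
inter = sum((centers[a] - centers[b]) ** 2
            for a in range(len(classes)) for b in range(a + 1, len(classes)))

contrib = inter / intra          # c_j = inter-class / intra-class
w = contrib / contrib.sum()      # normalized weights, summing to 1
```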
Step 6, calculating the distance between each training sample and the central feature vector of the class to which it belongs, and calculating the distance distribution standard deviation from these distances. The weighted distance between the ith training sample and the central feature vector $m_k$ of the class k to which it belongs is:

$$d_{ik} = \sqrt{\sum_{j=1}^{J} w_j \left(X_{ij} - m_{kj}\right)^2}$$
Then, the "sample-center distance" mean of class k is calculated from the weighted distances:

$$\mathrm{distMean}_k = \frac{1}{n_k}\sum_{i=1}^{n_k} d_{ik}$$
Finally, the "sample-center distance" standard deviation (distance distribution standard deviation) of class k is calculated from the weighted distances and the "sample-center distance" mean:

$$\mathrm{distDev}_k = \sqrt{\frac{1}{n_k}\sum_{i=1}^{n_k}\left(d_{ik} - \mathrm{distMean}_k\right)^2}$$
wherein k indexes the categories, K is the total number of categories, i indexes the training samples, $n_k$ is the number of training samples in class k, j indexes the feature vectors, and J is the total number of feature vectors.
The classifier is used as follows:
Step 1, acquiring a test sample x of unknown class and normalizing it; the normalized value of the jth feature vector of the test sample is:

$$x_j = \frac{\mathrm{raw}x_j - M_j}{\mathrm{Dev}_j}$$

wherein $\mathrm{raw}x_j$ denotes the jth feature vector of the test sample x, $M_j$ is the overall mean of the jth feature vector, and $\mathrm{Dev}_j$ is the overall standard deviation of the jth feature vector.
Step 2, calculating the weighted normalized correction distance between the feature vector of the test sample x of unknown class and the center vector of class k according to the classifier:

$$D(x, k) = \frac{1}{\mathrm{distDev}_k}\sqrt{\sum_{j=1}^{J} w_j \left(x_j - m_{kj}\right)^2}$$

wherein j indexes the feature vectors, J is the total number of feature vectors, $w_j$ is the weight assigned to feature vector j in the class prediction calculation, $m_{kj}$ is the mean of the jth feature vector in class k, and $\mathrm{distDev}_k$ is the "sample-center distance" standard deviation (distance distribution standard deviation) of class k.
Step 3, calculating the weighted normalized correction distance between the test sample x of unknown class and the center vector of each class, and selecting the class corresponding to the minimum weighted normalized correction distance as the prediction class of the test sample x.
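The prediction procedure above can be sketched end to end as follows. The classifier parameters here (means, stds, centers, weights, per-class distance stds) are made-up values for illustration, and all names are assumptions.

```python
import numpy as np

# Assumed trained parameters of the classifier.
M, Dev = np.array([0.6, 0.5]), np.array([0.5, 0.45])   # overall mean / std per feature
centers = np.array([[-1.0, -0.9],                      # class 0 center (normalized space)
                    [ 1.0,  0.9]])                     # class 1 center
w = np.array([0.7, 0.3])                               # feature weights
dist_dev = np.array([0.4, 0.6])                        # per-class distance distribution stds

# Normalize the unknown sample with the training statistics, compute the
# weighted distance to every center, divide by that class's distance std,
# and take the argmin as the predicted class.
raw_x = np.array([1.1, 0.95])
x = (raw_x - M) / Dev
d = np.sqrt((w * (x - centers) ** 2).sum(axis=1)) / dist_dev
pred = int(np.argmin(d))
```

Here `x` normalizes to (1.0, 1.0), which lies next to the class-1 center, so the argmin selects class 1.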
The minimum distance classification method constructs a classifier from classification parameters that include the central feature vector of each class, the overall mean of each feature vector, and the overall standard deviation of each feature vector, used for calculating the weighted normalized correction distance. It fully considers the differences in scale among the feature vectors and the differences in the distribution density of each class along each feature vector: Z-Score normalization is applied to bring every feature into a uniform range (centered near 0 with variance 1); the weight of each feature vector in the distance calculation is derived from its contribution to the clustering; and when the distance between the test sample and each class central feature vector is calculated, the distance is corrected using the standard deviation of the distances between the training samples of that class and its central feature vector. The method requires no iterative training; after adaptive weighting and normalization correction, classification accuracy is significantly improved, and because the distance can be corrected per class, the distance between the feature vector and the central feature vector of the corresponding class is more accurate, thereby guaranteeing classification accuracy.
It should be understood that although the steps in the flowchart of fig. 2 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not restricted to the exact order shown and may be performed in other orders. Moreover, at least some of the steps in fig. 2 may comprise multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different moments, and which are not necessarily executed sequentially but may be performed in turn or alternately with other steps or with sub-steps or stages of other steps.
In one embodiment, as shown in fig. 3, there is provided a minimum distance classification apparatus including: a training sample data set obtaining module 210, a classification parameter calculating module 220, a classifier constructing module 230, and a prediction category obtaining module 240, wherein:
a training sample data set obtaining module 210, configured to obtain a training sample data set; the training sample data set comprises a plurality of classes, each class comprises a plurality of training samples, each training sample is provided with a class label, and each training sample comprises a plurality of feature vectors and values of the feature vectors.
A classification parameter calculation module 220, configured to calculate a classification parameter according to the training sample data set; the classification parameters comprise central feature vectors corresponding to the feature vectors of each category for calculating the weighted normalized correction distance, a total mean value of each feature vector and a total standard deviation of each feature vector.
A classifier construction module 230 configured to construct a classifier according to the classification parameters.
And a prediction category obtaining module 240, configured to input the feature vector of the test sample into the classifier, so as to obtain a prediction category of the test sample.
In one embodiment, the classification parameter calculation module 220 is further configured to calculate a total mean and a total standard deviation of each of the feature vectors according to a value of each of the feature vectors in the training sample data set.
In one embodiment, the classification parameter calculation module 220 is further configured to calculate a central feature vector corresponding to each feature vector of each class according to a value of each feature vector in all training samples in each class, a total mean of each feature vector, and a total standard deviation of each feature vector; wherein the central feature vector is used to calculate a weighted normalized correction distance.
In one embodiment, the classification parameter calculation module 220 is further configured to calculate a weight of each feature vector according to a value of each feature vector of all the training samples of all the classes, a global mean of each feature vector, a global standard deviation of each feature vector, and a central feature vector corresponding to each feature vector of all the classes.
In one embodiment, the classification parameter calculation module 220 is further configured to calculate a standard deviation of distance distribution for each class according to values of all the feature vectors of all the training samples of each class, a total mean of all the feature vectors, a total standard deviation of all the feature vectors, and a central feature vector corresponding to all the feature vectors in each class.
In one embodiment, the prediction category obtaining module 240 includes: a feature vector acquisition unit for acquiring a value of a feature vector of a test sample; a normalized value calculation unit configured to calculate a normalized value of a feature vector of the test sample based on a value of the feature vector of the test sample, a population mean of the feature vector of the test sample, and a population standard deviation of the feature vector of the test sample; a modified distance calculation unit, configured to calculate a weighted normalized modified distance between all the feature vectors of the test sample and each class according to the feature vector normalized values of all the feature vectors of the test sample and the classification parameters; and the selection unit is used for selecting the category corresponding to the minimum weighted normalized correction distance as the prediction category of the test sample.
In one embodiment, the prediction category obtaining module 240 includes: a feature vector acquisition unit for acquiring a value of a feature vector of a test sample; a normalized value calculation unit configured to calculate a normalized value of a feature vector of the test sample based on a value of the feature vector of the test sample, a population mean of the feature vector of the test sample, and a population standard deviation of the feature vector of the test sample; a modified distance calculation unit, configured to calculate a weighted normalized modified distance between all the feature vectors of the test sample and each class according to a feature vector normalized value of all the feature vectors of the test sample, weights of all the feature vectors of the test sample, a central feature vector corresponding to all the feature vectors of the test sample in each class, and a distance distribution standard deviation of each class; and the selection unit is used for selecting the category corresponding to the minimum weighted normalized correction distance as the prediction category of the test sample.
For the specific definition of the minimum distance classification device, reference may be made to the above definition of the minimum distance classification method, which is not described herein again. The modules in the minimum distance classification device can be wholly or partially realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 4. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing training sample data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a minimum distance classification method.
Those skilled in the art will appreciate that the architecture shown in fig. 4 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
acquiring a training sample data set; the training sample data set comprises a plurality of classes, each class comprises a plurality of training samples, each training sample is provided with a class label, and each training sample comprises a plurality of feature vectors and values of the feature vectors;
calculating classification parameters according to the training sample data set; the classification parameters comprise central feature vectors corresponding to the feature vectors of each category for calculating weighted normalized correction distance, a total mean value of each feature vector and a total standard deviation of each feature vector;
constructing a classifier according to the classification parameters;
and inputting the feature vector of the test sample into the classifier to obtain the prediction category of the test sample.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring a training sample data set; the training sample data set comprises a plurality of classes, each class comprises a plurality of training samples, each training sample is provided with a class label, and each training sample comprises a plurality of feature vectors and values of the feature vectors;
calculating classification parameters according to the training sample data set; the classification parameters comprise central feature vectors corresponding to the feature vectors of each category for calculating weighted normalized correction distance, a total mean value of each feature vector and a total standard deviation of each feature vector;
constructing a classifier according to the classification parameters;
and inputting the feature vector of the test sample into the classifier to obtain the prediction category of the test sample.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A minimum distance classification method, the method comprising:
acquiring a training sample data set; the training sample data set comprises a plurality of classes, each class comprises a plurality of training samples, each training sample is provided with a class label, and each training sample comprises a plurality of feature vectors and values of the feature vectors;
calculating classification parameters according to the training sample data set; the classification parameters comprise central feature vectors corresponding to the feature vectors of each category for calculating weighted normalized correction distance, a total mean value of each feature vector and a total standard deviation of each feature vector;
constructing a classifier according to the classification parameters;
and inputting the feature vector of the test sample into the classifier to obtain the prediction category of the test sample.
2. The method according to claim 1, wherein said calculating classification parameters from said set of training sample data comprises:
and calculating the overall mean value and the overall standard deviation of each feature vector according to the value of each feature vector in the training sample data set.
3. The method according to claim 2, comprising, after calculating a global mean and a global standard deviation for each of the feature vectors from the values of each of the feature vectors in the set of training sample data:
calculating a central feature vector corresponding to each feature vector of each category according to the value of each feature vector in all training samples in each category, the overall mean value of each feature vector and the overall standard deviation of each feature vector; wherein the central feature vector is used to calculate a weighted normalized correction distance.
4. The method of claim 3, wherein after calculating the central feature vector corresponding to each feature vector of each class according to the value of each feature vector in all training samples in each class, the overall mean of each feature vector and the overall standard deviation of each feature vector, the method comprises:
and calculating the weight of each feature vector according to the value of each feature vector of all the training samples of all the classes, the overall mean value of each feature vector, the overall standard deviation of each feature vector and the central feature vector corresponding to each feature vector of all the classes.
5. The method of claim 3, wherein after calculating the central feature vector corresponding to each feature vector of each class according to the value of each feature vector in all training samples in each class, the overall mean of each feature vector and the overall standard deviation of each feature vector, the method comprises:
and calculating the distance distribution standard deviation of each category according to the values of all the feature vectors of all the training samples of each category, the overall mean value of all the feature vectors, the overall standard deviation of all the feature vectors and the central feature vector corresponding to all the feature vectors in each category.
6. The method of claim 1, wherein inputting the feature vector of the test sample into the classifier to obtain a prediction class of the test sample comprises:
obtaining the value of the feature vector of the test sample;
calculating a feature vector normalized value of the feature vector of the test sample according to the value of the feature vector of the test sample, a population mean of the feature vector of the test sample, and a population standard deviation of the feature vector of the test sample;
calculating weighted normalized correction distances between all the feature vectors of the test sample and each category according to the feature vector normalized values of all the feature vectors of the test sample and the classification parameters;
and selecting the category corresponding to the minimum weighted normalized correction distance as the prediction category of the test sample.
7. The method of claim 5, wherein inputting the feature vector of the test sample into the classifier to obtain a prediction class of the test sample comprises:
obtaining the value of the feature vector of the test sample;
calculating a feature vector normalized value of the feature vector of the test sample according to the value of the feature vector of the test sample, a population mean of the feature vector of the test sample, and a population standard deviation of the feature vector of the test sample;
calculating weighted normalized correction distances between all the feature vectors of the test sample and each class according to the feature vector normalized values of all the feature vectors of the test sample, the weights of all the feature vectors of the test sample, the corresponding central feature vector of all the feature vectors of the test sample in each class and the distance distribution standard deviation of each class;
and selecting the category corresponding to the minimum weighted normalized correction distance as the prediction category of the test sample.
8. A minimum distance classification apparatus, characterized in that the apparatus comprises:
the training sample data set acquisition module is used for acquiring a training sample data set; the training sample data set comprises a plurality of classes, each class comprises a plurality of training samples, each training sample is provided with a class label, and each training sample comprises a plurality of feature vectors and values of the feature vectors;
the classification parameter calculation module is used for calculating classification parameters according to the training sample data set; the classification parameters comprise central feature vectors corresponding to the feature vectors of each category for calculating weighted normalized correction distance, a total mean value of each feature vector and a total standard deviation of each feature vector;
the classifier building module is used for building a classifier according to the classification parameters;
and the prediction category acquisition module is used for inputting the feature vector of the test sample into the classifier to obtain the prediction category of the test sample.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202010210657.6A 2020-03-24 2020-03-24 Minimum distance classification method and device, computer equipment and storage medium Active CN111091163B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010210657.6A CN111091163B (en) 2020-03-24 2020-03-24 Minimum distance classification method and device, computer equipment and storage medium


Publications (2)

Publication Number Publication Date
CN111091163A true CN111091163A (en) 2020-05-01
CN111091163B CN111091163B (en) 2021-05-11

Family

ID=70400661

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010210657.6A Active CN111091163B (en) 2020-03-24 2020-03-24 Minimum distance classification method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111091163B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860671A (en) * 2020-07-28 2020-10-30 中山大学 Classification model training method and device, terminal equipment and readable storage medium
CN112116018A (en) * 2020-09-25 2020-12-22 奇安信科技集团股份有限公司 Sample classification method, apparatus, computer device, medium, and program product

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101488188A (en) * 2008-11-10 2009-07-22 西安电子科技大学 SAR image classification method based on SVM classifier of mixed nucleus function
CN101561867A (en) * 2009-05-19 2009-10-21 华中科技大学 Human body detection method based on Gauss shape feature
CN104361352A (en) * 2014-11-13 2015-02-18 东北林业大学 Solid wood panel defect separation method based on compressed sensing
CN104751166A (en) * 2013-12-30 2015-07-01 中国科学院深圳先进技术研究院 Spectral angle and Euclidean distance based remote-sensing image classification method


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Ren Jing et al., "An improved minimum distance classifier algorithm: the weighted minimum distance classifier", Journal of Computer Applications (《计算机应用》) *
Guo Yaqin et al., "An improved minimum distance classifier: NN-MDC", Software Guide (《软件导刊》) *

Also Published As

Publication number Publication date
CN111091163B (en) 2021-05-11

Similar Documents

Publication Publication Date Title
CN105354595B (en) Robust visual image classification method and system
US8565538B2 (en) Detecting and labeling places using runtime change-point detection
CN109784293B (en) Multi-class target object detection method and device, electronic equipment and storage medium
CN110210625B (en) Modeling method and device based on transfer learning, computer equipment and storage medium
CN111931931A (en) Deep neural network training method and device for pathology full-field image
US20070041638A1 (en) Systems and methods for real-time object recognition
CN108564102A (en) Image clustering evaluation of result method and apparatus
CN111091163B (en) Minimum distance classification method and device, computer equipment and storage medium
CN114694143B (en) Cell image recognition method and device based on optical means
CN110020674B (en) Cross-domain self-adaptive image classification method for improving local category discrimination
CN116894985A (en) Semi-supervised image classification method and semi-supervised image classification system
CN117153268A (en) Cell category determining method and system
CN113762005A (en) Method, device, equipment and medium for training feature selection model and classifying objects
CN116524296A (en) Training method and device of equipment defect detection model and equipment defect detection method
CN111814853B (en) Decorrelation clustering method and device under data selection deviation
CN115170838A (en) Data screening method and device
CN113076823A (en) Training method of age prediction model, age prediction method and related device
CN113837173A (en) Target object detection method and device, computer equipment and storage medium
JP2020181265A (en) Information processing device, system, information processing method, and program
CN115700821B (en) Cell identification method and system based on image processing
CN111553418B (en) Method and device for detecting neuron reconstruction errors and computer equipment
CN115641472A (en) Universal field self-adaption method based on unified optimal transport framework
CN118196647A (en) Domain adaptive remote sensing image classification method based on unbalanced similarity and class mapping
CN111860603A (en) Method, device, equipment and storage medium for identifying rice ears in picture
CN114529720A (en) Interpretability evaluation method and device of neural network, storage medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant