CN116246102A - Image classification method and system based on self-encoder and decision tree - Google Patents

Image classification method and system based on self-encoder and decision tree Download PDF

Info

Publication number
CN116246102A
CN116246102A CN202310070830.0A
Authority
CN
China
Prior art keywords
sample
encoder
self
nearest neighbor
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310070830.0A
Other languages
Chinese (zh)
Inventor
黄祎婧
王辉
黄宇廷
韩星宇
曹学儒
范自柱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Jiaotong University
Original Assignee
East China Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China Jiaotong University filed Critical East China Jiaotong University
Priority to CN202310070830.0A priority Critical patent/CN116246102A/en
Publication of CN116246102A publication Critical patent/CN116246102A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

An image classification method and system based on a self-encoder and a decision tree, wherein the method comprises the following steps: collecting image sample data and converting each image sample into a pixel information matrix/vector; learning the characterization information of the image samples with a self-encoder network model, and compressing and extracting the low-dimensional feature information of the image samples with the encoder; updating the nearest neighbor value corresponding to each sample while iteratively solving the optimal weight parameters of the self-encoder network; constructing a decision tree model from the low-dimensional sample feature information extracted by the trained self-encoder network model, using the iteratively obtained nearest neighbor value of each sample as its label; and obtaining the low-dimensional feature information of a new sample with the self-encoder, inputting it into the decision tree to obtain its nearest neighbor value, searching the nearest neighbor field in the training set, and taking the category with the largest number in the nearest neighbor field as the prediction result. The invention can obtain the low-dimensional feature information of the target, predicts the category of the sample, and the prediction result is interpretable.

Description

Image classification method and system based on self-encoder and decision tree
Technical Field
The invention relates to an image classification method and system based on a self-encoder and a decision tree, belonging to the technical field of machine learning and deep learning.
Background
A large number of image samples obtained in research are unlabeled, and manually labeling such large amounts of data is impractical; traditional machine learning methods and deep learning methods therefore aim to classify or identify unlabeled samples by means of a sample data set with label information. The classification task is a basic task of traditional machine learning, in which samples with unknown label information need to be classified by using the sample label information of a training set. Image classification is a popular field of research today and has given rise to many classical traditional machine learning methods and improved algorithms.
Classical traditional machine learning algorithms include decision trees, Bayesian classifiers, support vector machines, K-nearest neighbor classifiers and the like, and these methods perform well on small structured data sets. When complex data such as high-dimensional data are input, most machine learning algorithms face the curse of dimensionality and their classification performance degrades. On large sample data sets, deep learning networks can greatly improve both the speed of the algorithm and the classification accuracy.
The difference between traditional machine learning algorithms and deep learning algorithms is that a deep learning network does not require hand-crafted features or feature analysis; when the amount of data increases, the number of network layers can be deepened to obtain better learning performance, whereas the performance of a machine learning algorithm no longer improves beyond a certain limit. However, machine learning algorithms share the characteristic of interpretability: the process that produces a specific output can be observed intuitively. Current deep network methods achieve good classification results but lack interpretability, and the classification process of the network is opaque.
With the diversification of data collection paths, image classification often faces the challenge of high-dimensional complex data. To address the large computational complexity and long running time of basic machine learning methods for image classification when facing high-dimensional complex data, a neural network is used to process the sample information. By exploiting the interpretability of the decision tree model, the algorithm result retains a certain interpretability while the classification accuracy is improved; combining the self-encoder network with the decision tree improves both the interpretability of the classification and the generalization capability of the model.
Disclosure of Invention
The invention aims to solve the problems of image classification, and provides an image classification method based on a self-encoder and a decision tree.
The technical scheme of the invention is as follows: an image classification method based on a self-encoder and a decision tree comprises the following steps:
(1) Collecting data, acquiring original RGB image data, and converting an image sample into a pixel information matrix/vector;
(2) Inputting the collected image data into a self-encoder, performing self-encoder characterization learning on the image samples through the encoder and the decoder by using a feedforward neural network, and extracting low-dimensional structural feature information of the image samples by using an encoder part; constructing a network according to the sample image data, adding sparsity constraint and correlation constraint to a loss function of the network, and updating weight parameters of the self-coding network in an iterative mode;
(3) In the iterative process of solving the optimal weight parameters of the network, calculating the distance between the image samples based on the low-dimensional sample vector obtained by the encoder, and updating the nearest neighbor value corresponding to each image sample under the constraint of the nearest neighbor distance;
(4) Based on the trained encoder in the self-encoder network model, extracting low-dimensional characteristic information of a sample, combining the sample nearest neighbor value obtained through iteration as a corresponding sample label, constructing a decision tree with the nearest neighbor value as a leaf node by using a CART method, and simultaneously adjusting parameters of the self-encoder network;
(5) And obtaining low-dimensional characteristic information of the new sample by using the self-encoder, inputting a decision tree to obtain nearest neighbor values, searching the nearest neighbor field of the training sample by KNN, and taking the category with the largest number in the nearest neighbor field as a prediction result.
The self-encoder characterization learning step includes:
according to sample image data, constructing a network, adding sparsity constraint and correlation constraint to a loss function of the network, wherein the loss function of the self-coding network under the unconstrained condition is as follows:
X = (x_1, x_2, \ldots, x_n)
\hat{X} = (\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_n)
J_{ave}(W, b) = \frac{1}{n} \sum_{i=1}^{n} (\hat{x}_i - x_i)^2
wherein X is the input sample vector; x_i is the i-th feature of the input sample vector X; n is the dimension of the input vector (for an image sample of size 28 pixels × 28 pixels, the dimension of the corresponding input sample vector is n = 784); \hat{X} is the reconstructed sample vector output by the network; \hat{x}_i is the i-th feature of the reconstructed sample vector \hat{X}; J_{ave}(W, b) is the unconstrained loss function of the self-encoder network, which measures the average difference between the reconstructed sample \hat{X} and the original sample X; and W, b are the weights and biases of the self-encoder network, respectively;
the loss function of the self-encoding network after adding a sparse constraint to the hidden layer output of the self-encoder network is:
\hat{\rho}_i = \frac{1}{n} \sum_{j=1}^{n} a_i(x_j)
KL(\rho \| \hat{\rho}_i) = \rho \log \frac{\rho}{\hat{\rho}_i} + (1 - \rho) \log \frac{1 - \rho}{1 - \hat{\rho}_i}
J_{sparse}(W, b) = J_{ave}(W, b) + \gamma \sum_{i=1}^{h} KL(\rho \| \hat{\rho}_i)
wherein \hat{\rho}_i is the average activation of the i-th hidden-layer neuron in the self-encoder network; n is the dimension of the input vector; x_j is the j-th feature of the input sample vector; a_i(x_j) is the activation value of the i-th neuron for input x_j; KL(\rho \| \hat{\rho}_i) is the relative entropy, a penalty factor measuring the difference between the two distributions; h is the number of neurons in the hidden layer; ρ is the sparsity parameter; γ is the KL-divergence constraint parameter; and J_{sparse}(W, b) is the sparsity-constrained loss function of the self-encoder network;
self-encoder network loss function after similarity constraint is added to sparse self-encoding neural network:
Figure BDA0004064740080000043
wherein ,Jre (W, b) is a self-encoder network loss function incorporating sparsity constraints and similarity constraints, μ is a similarity parameter, n is the dimension of the input vector,
Figure BDA0004064740080000044
to reconstruct the ith feature of the sample vector, as a limitation to increase the sample-to-sample variance as much as possible;
The weight parameters of the self-encoding network are updated iteratively; the iteration of the self-encoding neural network uses the quasi-Newton method L-BFGS, and the maximum number of iterations is set to 300.
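For illustration only, the following is a minimal NumPy/SciPy sketch of a single-hidden-layer sparse reconstruction loss of the form J_sparse = J_ave + γ·ΣKL(ρ‖\hat{ρ}_i), optimized with L-BFGS as stated above. The sigmoid activations, the batch-wise average activation, the toy dimensions, and the omission of the similarity term Ω are assumptions of this sketch, not the patent's reference implementation.

import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unpack(theta, n_in, n_hid):
    # Split a flat parameter vector into encoder/decoder weights and biases.
    i = 0
    W1 = theta[i:i + n_hid * n_in].reshape(n_hid, n_in); i += n_hid * n_in
    b1 = theta[i:i + n_hid]; i += n_hid
    W2 = theta[i:i + n_in * n_hid].reshape(n_in, n_hid); i += n_in * n_hid
    b2 = theta[i:i + n_in]
    return W1, b1, W2, b2

def sparse_ae_loss(theta, X, n_hid, rho=0.05, gamma=0.5):
    n_in = X.shape[1]
    W1, b1, W2, b2 = unpack(theta, n_in, n_hid)
    H = sigmoid(X @ W1.T + b1)                          # hidden activations a_i(x)
    X_hat = sigmoid(H @ W2.T + b2)                      # reconstruction
    j_ave = np.mean((X_hat - X) ** 2)                   # average reconstruction error J_ave
    rho_hat = np.clip(H.mean(axis=0), 1e-6, 1 - 1e-6)   # average activation per hidden neuron
    kl = rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))
    return j_ave + gamma * np.sum(kl)                   # J_sparse

# Toy usage with small dimensions; gradients are approximated numerically here,
# and initial weights are set to 0 as in the text.
X = np.random.rand(20, 16)
n_hid = 8
theta0 = np.zeros(2 * n_hid * X.shape[1] + n_hid + X.shape[1])
result = minimize(sparse_ae_loss, theta0, args=(X, n_hid),
                  method="L-BFGS-B", options={"maxiter": 300})

For realistic 784-dimensional inputs an analytic gradient would be supplied to L-BFGS instead of relying on finite differences.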
The calculating the distance between the image samples comprises:
(1) The nearest neighbor value of each sample is updated at every iteration of the self-encoder network; the distance parameter between samples is preset before training, and the maximum and minimum nearest neighbor values are limited; under this limitation, the minimum nearest neighbor value of a sample is 1 and the maximum nearest neighbor value is 10, that is, if the nearest neighbor value of a sample is 0 it is corrected to 1, and if it exceeds 10 it is corrected to 10;
In the iterative calculation process, a new low-dimensional sample vector can be obtained from the updated weights W and bias terms b; the distance between the compressed feature vectors of two samples after passing through the self-encoding neural network with two hidden layers is calculated as follows:
X_i' = h_1(X_i) = \sigma_1(W_1 X_i + b_1)
X_i'' = h_2(X_i') = \sigma_2(W_2 X_i' + b_2)
D(X_i'', X_j'') = \sqrt{\sum_{s=1}^{m} (x''_{is} - x''_{js})^2}
wherein W_1 and W_2 are the weights of the first hidden layer h_1 and the second hidden layer h_2 of the encoder network, respectively; b_1 and b_2 are the corresponding bias terms; σ_1 and σ_2 are the output functions of the hidden layers h_1 and h_2; X_i' denotes the vector of the i-th input sample X_i after passing through the hidden layer h_1; X_i'' denotes the vector of X_i' after passing through the hidden layer h_2; D(X_i'', X_j'') is the Euclidean distance between the extracted low-dimensional sample vectors X_i'' and X_j''; m is the dimension of the low-dimensional feature vector X''; x''_{is} is the s-th feature of the sample vector X_i''; and x''_{js} is the s-th feature of the sample vector X_j''; the distance parameter α between samples determines whether another sample can be a nearest neighbor of a given sample: when its distance to the sample is greater than α, i.e. D(X_i'', X_j'') > α, it cannot be a neighbor of that sample; based on the distance parameter, the nearest neighbor value of the i-th sample is the number of samples whose distance to it is less than α;
(2) The self-encoding neural network is provided with two hidden layers; for 784×1 sample data, the dimension of the first hidden layer d_1 is set to 196×1 and the dimension of the second hidden layer d_2 is set to 20×1;
(3) The parameters of the network are initialized: the initial values of the self-encoder network weight parameters are set to 0, the sparsity parameter ρ is set to 0.05, the coefficient γ of the sparsity penalty factor is set to 0.5, and the similarity constraint parameter μ is set to 3×10^{-3}.
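As an illustration of the neighbor-value update described above, the sketch below computes pairwise Euclidean distances between the encoded low-dimensional samples, counts the neighbors within the distance parameter α, and corrects the count into the stated range [1, 10]; the function and variable names are hypothetical.

import numpy as np

def nearest_neighbor_values(Z, alpha, k_min=1, k_max=10):
    # Z: (num_samples, m) matrix of low-dimensional encoded vectors X''.
    diff = Z[:, None, :] - Z[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))      # pairwise Euclidean distances D(X_i'', X_j'')
    counts = (D < alpha).sum(axis=1) - 1       # neighbors within alpha, excluding the sample itself
    return np.clip(counts, k_min, k_max)       # corrected nearest neighbor values

A call such as K = nearest_neighbor_values(encoded_samples, alpha=1.5) would yield one neighbor value per sample; the value 1.5 merely stands in for the preset distance parameter.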
The decision tree with the nearest neighbor value as the leaf node is constructed as follows:
taking the final output X' after iteration of the self-encoder as a sample vector for constructing a decision tree, and taking the nearest neighbor number of each sample obtained by iteration as a new label of the sample;
the decision tree is generated by adopting a CART algorithm, and a full binary tree is generated, and the method for calculating the base Ny index is as follows;
Figure BDA0004064740080000051
Figure BDA0004064740080000052
wherein Gini (B) represents the purity of sample set B in the decision tree; v i Represents the proportion of samples of class i (i=1,.); gini (B, q) represents the base index of attribute q; t represents the attribute q= { q 1 ,q 2 ,...,q T Number of values; b (B) t Indicating that all values at the t-th branch node are q t Is a sample set of the sample set.
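A minimal scikit-learn sketch of this step: the encoded features are the inputs and each sample's nearest neighbor value is its label, with DecisionTreeClassifier acting as a CART-style learner that splits on the Gini index. The synthetic arrays Z and K below are placeholders for the quantities produced in the previous steps.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

Z = np.random.rand(100, 20)                       # placeholder for encoded vectors X''
K = np.random.randint(1, 11, size=100)            # placeholder nearest neighbor values (1..10)

cart = DecisionTreeClassifier(criterion="gini")   # CART: binary splits on the Gini index
cart.fit(Z, K)                                    # neighbor value used as the sample label
K_new = cart.predict(np.random.rand(5, 20))       # predicted neighbor values for new samples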
The adjusting of the parameters of the self-encoder network includes:
for a test sample X_z, a new verification sample vector X_z'' is first generated by the trained self-encoding neural network, and the corresponding neighbor value K_z is obtained through the constructed CART decision tree;
calculating the distance between samples:
D(X_z'', X_i'') = \sqrt{\sum_{s=1}^{m} (x''_{zs} - x''_{is})^2}
wherein D(X_z'', X_i'') represents the Euclidean distance between the two samples; m is the dimension of the low-dimensional feature vector X''; x''_{zs} represents the s-th feature of the sample vector X_z''; and x''_{is} represents the s-th feature of the sample vector X_i'';
then searching a corresponding neighbor sample set in the training set through a KNN algorithm, and classifying the samples by using labels of the neighbor sample set; if the classification effect is excellent, namely the classification accuracy reaches 85% or more, the generated network model and decision tree are reserved; otherwise, the parameters of the self-encoder neural network are adjusted to achieve better classification effect.
The search of the nearest neighbor field of the training sample is as follows:
Z = (z_1, z_2, \ldots, z_n)
Z'' = (z''_1, z''_2, \ldots, z''_m)
D(Z'', X_i'') = \sqrt{\sum_{s=1}^{m} (z''_s - x''_{is})^2}
N_{K_z} = \{X_{K_1}, X_{K_2}, \ldots, X_{K_z} \mid D(Z'', X_i'') < \alpha, K_z = 1, \ldots, 10\}
P_z = \arg\max_{i \in \{1, \ldots, C\}} p_i
wherein Z is a training sample; z_i is the i-th feature of Z; Z'' is the low-dimensional feature vector extracted by the encoder; z''_i is the i-th feature of the low-dimensional feature vector Z''; D(Z'', X_i'') is the Euclidean distance metric; m is the dimension of the low-dimensional feature vectors X'' and Z''; K_z is the optimal nearest neighbor value corresponding to the training sample, output by the decision tree; N_{K_z} is the nearest neighbor sample set of the training sample; α is the distance parameter; X_{K_i} is the i-th neighbor sample in N_{K_z}; p_i is the probability of occurrence of the corresponding label in the nearest neighbor sample set N_{K_z}; P_z is the predicted label of the training sample Z; and C is the number of label categories of the samples.
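To make the final classification step concrete, the sketch below takes a new encoded sample, its neighbor value K_z from the decision tree, and the encoded training set with class labels, and returns the majority class among the K_z nearest training samples within distance α. The fallback to the single closest sample when no neighbor satisfies the distance constraint is an assumption not stated in the text.

import numpy as np
from collections import Counter

def predict_label(z_new, Z_train, y_train, k_z, alpha):
    d = np.sqrt(((Z_train - z_new) ** 2).sum(axis=1))      # D(Z'', X_i'') for all training samples
    order = np.argsort(d)
    neighbors = [i for i in order[:k_z] if d[i] < alpha]   # nearest neighbor field
    if not neighbors:
        neighbors = [int(order[0])]                        # assumed fallback
    votes = Counter(y_train[i] for i in neighbors)
    return votes.most_common(1)[0][0]                      # category with the largest count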
The invention discloses a system of an image classification method based on a self-encoder and a decision tree, which comprises an image input conversion module, a training module, a feature extraction module, a nearest neighbor module, a decision tree module and a classification module.
The image input conversion module is used for acquiring image sample data, acquiring original RGB image data and acquiring a matrix/vector of pixel information of an image sample.
The training module inputs the collected image data into the self-encoder, and solves the self-encoder network weight parameters through a back propagation algorithm by using a feedforward neural network.
The feature extraction module learns the characterization information of the image sample by using a self-encoder network model and extracts typical feature information of the image sample based on an encoder.
And the nearest neighbor module calculates the distance between the image samples based on the output result of the corresponding encoder in the iterative process of solving the optimal weight parameter of the network, and simultaneously updates the nearest neighbor value corresponding to each image sample under the constraint of the nearest neighbor distance.
The decision tree module extracts compression characteristic information of samples based on the trained encoder in the self-encoder network model, and combines the optimal neighbor number of each sample as a label to construct a decision tree based on the CART method.
The classification module compresses sample characteristic information by using a self-encoder and inputs the sample characteristic information into a decision tree to obtain the corresponding optimal nearest neighbor numerical value, searches the corresponding nearest neighbor field of the sample, and takes the category with the largest number in the nearest neighbor field in the KNN as a prediction result.
In the system of the image classification method based on the self-encoder and the decision tree, firstly, an image input conversion module is utilized to process data, then a training module is adopted to train the data, a feature extraction module is generated based on the training module to extract data features, the data features are input into the decision tree module to obtain nearest neighbor values, the nearest neighbor module is used for searching the nearest neighbor field, and finally, a classification module is adopted to output a prediction result.
The invention has the beneficial effects that the self-encoder network is utilized to process the image sample, compress the characteristics of the sample and extract the low-dimensional characteristics and structure of the sample as far as possible; during training of the self-encoder network, continuously searching for the neighbor number of the sample meeting the given distance according to the distance between the extracted low-dimensional feature vectors; and constructing a decision tree by using the nearest neighbor value of the sample and the extracted low-dimensional sample characteristics, acquiring the nearest neighbor value of a new image sample of the unknown label by using the decision tree, and judging the category of the image by adopting a nearest neighbor algorithm.
The method provided by the invention can obtain the low-dimensional feature information of the target, predict the category of a sample, and provide interpretability of the prediction result.
Drawings
FIG. 1 is a flow chart of a method for classifying images from an encoder and decision tree according to the present invention;
FIG. 2 is a schematic diagram of decision tree generation.
Detailed Description
As shown in fig. 1, the image classification method based on the self-encoder and the decision tree of the present embodiment includes:
s101, acquiring data, acquiring original RGB image data, and converting an image sample into a pixel information matrix/vector.
S102, inputting the collected image data into the self-encoder, performing characterization learning on the image samples through the encoder and the decoder by using a feedforward neural network, and extracting low-dimensional structural feature information of the image samples by utilizing the encoder part.
The loss function of the corresponding self-encoder network under unconstrained conditions is:
X = (x_1, x_2, \ldots, x_n)
\hat{X} = (\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_n)
J_{ave}(W, b) = \frac{1}{n} \sum_{i=1}^{n} (\hat{x}_i - x_i)^2
wherein X is the input sample vector, x_i is the i-th feature of the input sample vector X, n is the dimension of the input vector (for an image sample of size 28 pixels × 28 pixels, the dimension of the corresponding input sample vector is n = 784), \hat{X} is the output reconstructed sample vector, \hat{x}_i is the i-th feature of the reconstructed sample vector \hat{X}, J_{ave}(W, b) is the unconstrained loss function of the self-encoder network measuring the average difference between the reconstructed sample \hat{X} and the original sample X, and W, b are the weights and biases of the self-encoder network, respectively.
The loss function of the self-encoding network after adding the sparse constraint to the hidden layer output of the self-encoder network is:
\hat{\rho}_i = \frac{1}{n} \sum_{j=1}^{n} a_i(x_j)
KL(\rho \| \hat{\rho}_i) = \rho \log \frac{\rho}{\hat{\rho}_i} + (1 - \rho) \log \frac{1 - \rho}{1 - \hat{\rho}_i}
J_{sparse}(W, b) = J_{ave}(W, b) + \gamma \sum_{i=1}^{h} KL(\rho \| \hat{\rho}_i)
wherein \hat{\rho}_i is the average activation of the i-th hidden-layer neuron in the self-encoder network, n is the dimension of the input vector, x_j is the j-th feature of the input sample vector, a_i(x_j) is the activation value of the i-th neuron for input x_j, KL(\rho \| \hat{\rho}_i) is the relative entropy, h is the number of neurons of the hidden layer, ρ is the sparsity parameter, γ is the KL-divergence constraint parameter, and J_{sparse}(W, b) is the sparsity-constrained loss function of the self-encoder network;
adding the similarity constraint to the sparse self-encoding neural network gives:
J_{re}(W, b) = J_{sparse}(W, b) + \mu \, \Omega(\hat{X})
wherein J_{re}(W, b) is the self-encoder network loss function incorporating the sparsity constraint and the similarity constraint, μ is the similarity parameter, n is the dimension of the input vector, \hat{x}_i is the i-th feature of the reconstructed sample vector, and \Omega(\hat{X}) denotes the similarity-constraint term computed from the reconstructed features \hat{x}_i, which acts as a limitation to increase the difference between samples as much as possible.
S103, in the iterative process of solving the optimal weight parameters of the network, calculating the distance between the image samples based on the low-dimensional sample vector obtained by the encoder, and updating the nearest neighbor value corresponding to each image sample under the constraint of the nearest neighbor distance;
the corresponding distance calculation formula is expressed as:
X_i' = h_1(X_i) = \sigma_1(W_1 X_i + b_1)
X_i'' = h_2(X_i') = \sigma_2(W_2 X_i' + b_2)
D(X_i'', X_j'') = \sqrt{\sum_{s=1}^{m} (x''_{is} - x''_{js})^2}
wherein W_1 and W_2 are the weights of the first hidden layer h_1 and the second hidden layer h_2 of the encoder network, respectively, b_1 and b_2 are the corresponding bias terms, σ_1 and σ_2 are the output functions of the hidden layers h_1 and h_2, X_i' denotes the vector of the i-th input sample X_i after passing through the hidden layer h_1, X_i'' denotes the vector of X_i' after passing through the hidden layer h_2, D(X_i'', X_j'') is the Euclidean distance between the extracted low-dimensional sample vectors X_i'' and X_j'', m is the dimension of the low-dimensional feature vector X'', x''_{is} is the s-th feature of the sample vector X_i'', and x''_{js} is the s-th feature of the sample vector X_j''. The distance parameter α between samples determines whether another sample can be a nearest neighbor of a given sample: when its distance to the sample is greater than α, i.e. D(X_i'', X_j'') > α, it cannot be a neighbor of that sample. Based on the distance parameter, the nearest neighbor value of the i-th sample is the number of samples whose distance to it is less than α.
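A small sketch of the two-hidden-layer encoding defined by the formulas above, mapping a 784-dimensional input to the 20-dimensional vector X''; the sigmoid choice for the output functions σ_1 and σ_2 is an assumption.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(X, W1, b1, W2, b2):
    # X: (num_samples, 784); W1: (196, 784); W2: (20, 196).
    X1 = sigmoid(X @ W1.T + b1)    # X' = h1(X) = sigma1(W1 X + b1), 196-dimensional
    X2 = sigmoid(X1 @ W2.T + b2)   # X'' = h2(X') = sigma2(W2 X' + b2), 20-dimensional
    return X2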
S104, extracting low-dimensional characteristic information of a sample based on an encoder in the trained self-encoder network model, combining a sample nearest neighbor numerical value obtained through iteration as a corresponding sample label, and constructing a decision tree model by using a CART method;
the corresponding calculation method of the base index is expressed as follows;
Figure BDA0004064740080000101
Figure BDA0004064740080000102
wherein Gini (B) represents the purity, v, of sample set B in the decision tree i Represents the proportion of samples of class i (i=1,., C) in sample set B, gini (B, q) represents the base index of attribute q, T represents attribute q= { q 1 ,q 2 ,...,q T Number of values of B t Indicating that all values at the t-th branch node are q t Is a sample set of the sample set.
The corresponding model fine-tuning includes:
for a test sample X_z, a new validation sample vector X_z'' is first generated by the generated self-encoding neural network, and the corresponding neighbor value K_z is obtained through the constructed CART decision tree;
calculating the distance between samples:
D(X_z'', X_i'') = \sqrt{\sum_{s=1}^{m} (x''_{zs} - x''_{is})^2}
wherein D(X_z'', X_i'') represents the Euclidean distance between the two samples, m is the dimension of the low-dimensional feature vector X'', x''_{zs} represents the s-th feature of the sample vector X_z'', and x''_{is} is the s-th feature of the sample vector X_i''.
Searching a corresponding neighbor sample set in the training set through a KNN algorithm, classifying the samples by using labels of the neighbor sample set, and if the classification effect is excellent, namely the classification accuracy reaches 85% or more, reserving the generated network model and decision tree; otherwise, the parameters of the self-encoder neural network are adjusted to achieve better classification effect.
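The acceptance check in S104 can be sketched as follows: the trained self-encoder and decision tree are kept if the KNN classification accuracy on the validation samples reaches the 85% threshold stated above, and otherwise the network parameters (for example ρ, γ, μ and α) would be adjusted and training repeated. How "classification effect" is measured as plain accuracy is an assumption of this sketch.

import numpy as np

def keep_model(y_true, y_pred, threshold=0.85):
    # y_true: ground-truth labels of the validation samples;
    # y_pred: labels predicted via the KNN vote over the neighbor sample set.
    accuracy = np.mean(np.asarray(y_true) == np.asarray(y_pred))
    return accuracy >= threshold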
S105, obtaining low-dimensional characteristic information of a new sample by using a self-encoder, inputting a decision tree to obtain nearest neighbor values, searching for corresponding nearest neighbors by KNN, and taking the category with the largest number in the nearest neighbor field as a prediction result;
the nearest neighbor field of the corresponding search training sample is expressed as:
Z = (z_1, z_2, \ldots, z_n)
Z'' = (z''_1, z''_2, \ldots, z''_m)
D(Z'', X_i'') = \sqrt{\sum_{s=1}^{m} (z''_s - x''_{is})^2}
N_{K_z} = \{X_{K_1}, X_{K_2}, \ldots, X_{K_z} \mid D(Z'', X_i'') < \alpha, K_z = 1, \ldots, 10\}
P_z = \arg\max_{i \in \{1, \ldots, C\}} p_i
wherein Z is a training sample, z_i is the i-th feature of Z, Z'' is the low-dimensional feature vector extracted by the encoder, z''_i is the i-th feature of the low-dimensional feature vector Z'', D(Z'', X_i'') is the Euclidean distance metric, m is the dimension of the low-dimensional feature vectors X'' and Z'', K_z is the optimal nearest neighbor value corresponding to the training sample output by the decision tree, N_{K_z} is the nearest neighbor sample set of the training sample, α is the distance parameter, X_{K_i} is the i-th neighbor sample in N_{K_z}, p_i is the probability of occurrence of the corresponding label in the nearest neighbor sample set N_{K_z}, P_z is the predicted label of the training sample Z, and C is the number of label categories of the samples.
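Putting S101 through S105 together, a hypothetical end-to-end prediction for one new image might look like the lines below, reusing the helpers from the earlier sketches (encode, the fitted cart tree, predict_label) together with trained weights W1, b1, W2, b2 and the encoded training set Z_train with labels y_train; all of these names, the grayscale flattening, and the normalization are illustrative assumptions rather than the patent's exact procedure.

import numpy as np

x = new_image.reshape(1, -1).astype(np.float64) / 255.0       # S101: image -> pixel vector
z_new = encode(x, W1, b1, W2, b2)[0]                          # S102: low-dimensional features
k_z = int(cart.predict(z_new[None, :])[0])                    # S104: neighbor value from the tree
label = predict_label(z_new, Z_train, y_train, k_z, alpha)    # S105: vote in the nearest neighbor field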
The embodiment of the system for realizing the image classification method based on the self-encoder and the decision tree comprises an image input conversion module, a training module, a feature extraction module, a nearest neighbor module, a decision tree module and a classification module; the image transfer-in conversion module is connected with the training module, the training module is connected with the feature extraction module, the feature extraction module is connected with the decision tree module, the decision tree module is connected with the nearest neighbor module, and the nearest neighbor module is connected with the classification module.
The image input conversion module of the system is used for collecting image sample data, acquiring original RGB image data, and obtaining the pixel information matrix/vector of each image sample.
The training module of the system inputs the collected image data into the self-encoder, and solves the self-encoder network weight parameters through a back propagation algorithm by using a feedforward neural network.
The characteristic extraction module of the system learns the characteristic information of the image sample by utilizing a self-encoder network model and extracts the typical characteristic information of the image sample based on an encoder.
In the iterative process of solving the optimal weight parameter of the network, the nearest neighbor module of the system calculates the distance between the image samples based on the output result of the corresponding encoder, and updates the nearest neighbor value corresponding to each image sample under the constraint of the nearest neighbor distance.
The decision tree module of the system extracts compression characteristic information of samples based on the trained encoder in the self-encoder network model, combines the optimal neighbor number of each sample as a label to construct a decision tree, and constructs based on a CART method.
The classification module of the system compresses sample characteristic information by utilizing a self-encoder and inputs the sample characteristic information into a decision tree to obtain the corresponding optimal nearest neighbor numerical value, searches the corresponding nearest neighbor field of the sample, and takes the category with the largest number in the nearest neighbor field in the KNN as a prediction result.
In the embodiment, the self-encoder network is utilized to process the image samples, compress the characteristics of the samples, and extract the low-dimensional characteristics and structures of the samples as far as possible; during training of the self-encoder network, continuously searching for the neighbor number of the sample meeting the given distance according to the distance between the extracted low-dimensional feature vectors; and constructing a decision tree by using the nearest neighbor value of the sample and the extracted low-dimensional sample characteristics, acquiring the nearest neighbor value of a new image sample of the unknown label by using the decision tree, and judging the category of the image by adopting a nearest neighbor algorithm.
The technical principles of the present invention have been described above in connection with specific embodiments, which are provided for the purpose of explaining the principles of the present invention and are not to be construed as limiting the scope of the present invention in any way. Other embodiments of the invention will be apparent to those skilled in the art from consideration of this specification without undue burden.

Claims (7)

1. A method of classifying images based on a self-encoder and a decision tree, the method comprising the steps of:
(1) Collecting data, acquiring original RGB image data, and converting an image sample into a pixel information matrix/vector;
(2) Inputting the collected image data into a self-encoder, performing self-encoder characterization learning on the image samples through the encoder and the decoder by using a feedforward neural network, and extracting low-dimensional structural feature information of the image samples by using an encoder part; constructing a network according to the sample image data, adding sparsity constraint and correlation constraint to a loss function of the network, and updating weight parameters of the self-coding network in an iterative mode;
(3) In the iterative process of solving the optimal weight parameters of the network, calculating the distance between the image samples based on the low-dimensional sample vector obtained by the encoder, and updating the nearest neighbor value corresponding to each image sample under the constraint of the nearest neighbor distance;
(4) Based on the trained encoder in the self-encoder network model, extracting low-dimensional characteristic information of a sample, combining the sample nearest neighbor value obtained through iteration as a corresponding sample label, constructing a decision tree with the nearest neighbor value as a leaf node by using a CART method, and simultaneously adjusting parameters of the self-encoder network;
(5) And obtaining low-dimensional characteristic information of the new sample by using the self-encoder, inputting a decision tree to obtain nearest neighbor values, searching the nearest neighbor field of the training sample by KNN, and taking the category with the largest number in the nearest neighbor field as a prediction result.
2. The method of image classification based on a self-encoder and decision tree according to claim 1, wherein the self-encoder characterization learning step comprises:
according to sample image data, constructing a network, adding sparsity constraint and correlation constraint to a loss function of the network, wherein the loss function of the self-coding network under the unconstrained condition is as follows:
X = (x_1, x_2, \ldots, x_n)
\hat{X} = (\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_n)
J_{ave}(W, b) = \frac{1}{n} \sum_{i=1}^{n} (\hat{x}_i - x_i)^2
wherein X is the input sample vector; x_i is the i-th feature of the input sample vector X; n is the dimension of the input vector (for an image sample of size 28 pixels × 28 pixels, the dimension of the corresponding input sample vector is n = 784); \hat{X} is the reconstructed sample vector output by the network; \hat{x}_i is the i-th feature of the reconstructed sample vector \hat{X}; J_{ave}(W, b) is the unconstrained loss function of the self-encoder network, which measures the average difference between the reconstructed sample \hat{X} and the original sample X; and W, b are the weights and biases of the self-encoder network, respectively;
the loss function of the self-encoding network after adding a sparse constraint to the hidden layer output of the self-encoder network is:
Figure FDA0004064740050000027
Figure FDA0004064740050000028
Figure FDA0004064740050000029
wherein ,
Figure FDA00040647400500000210
is the average activation of hidden layer neurons in the self-encoder network; n is the dimension of the input vector; x is x j A j-th feature that is an input sample vector; a, a i (x j ) Is the ith neuron at input x j A lower activation value; />
Figure FDA00040647400500000211
Is the relative entropy, which is a penalty factor for measuring the difference between two distributions; h is the number of neurons of the hidden layer; ρ is a sparsity parameter; gamma is a KL divergence constraint parameter; j (J) sparse (W, b) is a sparse loss function from the encoder network; />
Figure FDA00040647400500000212
Mean activation of neurons for the hidden layer; />
The loss function of the self-encoder network after adding the similarity constraint to the sparse self-encoding neural network is:
J_{re}(W, b) = J_{sparse}(W, b) + \mu \, \Omega(\hat{X})
wherein J_{re}(W, b) is the self-encoder network loss function incorporating the sparsity constraint and the similarity constraint; μ is the similarity parameter; n is the dimension of the input vector; \hat{x}_i is the i-th feature of the reconstructed sample vector; and \Omega(\hat{X}) denotes the similarity-constraint term computed from the reconstructed features \hat{x}_i, which acts as a limitation to increase the difference between samples as much as possible;
and the weight parameters of the self-encoding network are updated iteratively, wherein the iteration of the self-encoding neural network uses the quasi-Newton method L-BFGS and the maximum number of iterations is set to 300.
3. The method of image classification based on a self-encoder and decision tree according to claim 1, wherein said calculating the distance between the image samples comprises:
(1) The nearest neighbor value of each sample is updated at every iteration of the self-encoder network; the distance parameter between samples is preset before training, and the maximum and minimum nearest neighbor values are limited; under this limitation, the minimum nearest neighbor value of a sample is 1 and the maximum nearest neighbor value is 10, that is, if the nearest neighbor value of a sample is 0 it is corrected to 1, and if it exceeds 10 it is corrected to 10;
in the iterative calculation process, a new low-dimensional sample vector can be obtained from the updated weights W and bias terms b; the distance between the compressed feature vectors of two samples after passing through the self-encoding neural network with two hidden layers is calculated as follows:
X_i' = h_1(X_i) = \sigma_1(W_1 X_i + b_1)
X_i'' = h_2(X_i') = \sigma_2(W_2 X_i' + b_2)
D(X_i'', X_j'') = \sqrt{\sum_{s=1}^{m} (x''_{is} - x''_{js})^2}
wherein W_1 and W_2 are the weights of the first hidden layer h_1 and the second hidden layer h_2 of the encoder network, respectively; b_1 and b_2 are the corresponding bias terms; σ_1 and σ_2 are the output functions of the hidden layers h_1 and h_2; X_i' denotes the vector of the i-th input sample X_i after passing through the hidden layer h_1; X_i'' denotes the vector of X_i' after passing through the hidden layer h_2; D(X_i'', X_j'') is the Euclidean distance between the extracted low-dimensional sample vectors X_i'' and X_j''; m is the dimension of the low-dimensional feature vector X''; x''_{is} is the s-th feature of the sample vector X_i''; and x''_{js} is the s-th feature of the sample vector X_j''; the distance parameter α between samples determines whether another sample can be a nearest neighbor of a given sample: when its distance to the sample is greater than α, i.e. D(X_i'', X_j'') > α, it cannot be a neighbor of that sample; based on the distance parameter, the nearest neighbor value of the i-th sample is the number of samples whose distance to it is less than α;
(2) The self-encoding neural network is provided with two hidden layers; for 784×1 sample data, the dimension of the first hidden layer d_1 is set to 196×1 and the dimension of the second hidden layer d_2 is set to 20×1;
(3) The parameters of the network are initialized: the initial value of the self-encoder network weight parameters is set to 0, the sparsity parameter ρ is set to 0.05, the coefficient γ of the sparsity penalty factor is set to 0.5, and the similarity constraint parameter μ is set to 3×10^{-3}.
4. The method for classifying images based on a self-encoder and decision tree according to claim 1, wherein the construction of the decision tree with nearest neighbor values as leaf nodes is as follows:
taking the final output X' after iteration of the self-encoder as a sample vector for constructing a decision tree, and taking the nearest neighbor number of each sample obtained by iteration as a new label of the sample;
the decision tree is generated by adopting a CART algorithm, and a full binary tree is generated, and the method for calculating the base Ny index is as follows;
Figure FDA0004064740050000041
/>
Figure FDA0004064740050000042
wherein Gini (B) represents the purity of sample set B in the decision tree; v i Represents the i (i=i) th in sample set BThe proportion of samples of class C); gini (B, q) represents the base index of attribute q; t represents the attribute q= { q 1 ,q 2 ,...,q T Number of values; b (B) t Indicating that all values at the t-th branch node are q t Is a sample set of the sample set.
5. The method of image classification based on a self-encoder and decision tree according to claim 1, wherein said adjusting parameters of the self-encoder network comprises:
for a test sample X_z, first generating a new verification sample vector X_z'' by the generated self-encoding neural network, and obtaining the corresponding neighbor value K_z through the constructed CART decision tree;
calculating the distance between samples:
D(X_z'', X_i'') = \sqrt{\sum_{s=1}^{m} (x''_{zs} - x''_{is})^2}
wherein D(X_z'', X_i'') represents the Euclidean distance between the two samples; m is the dimension of the low-dimensional feature vector X''; x''_{zs} represents the s-th feature of the sample vector X_z''; and x''_{is} represents the s-th feature of the sample vector X_i'';
then searching a corresponding neighbor sample set in the training set through a KNN algorithm, and classifying the samples by using labels of the neighbor sample set; if the classification effect is excellent, namely the classification accuracy reaches 85% or more, the generated network model and decision tree are reserved.
6. The image classification method based on a self-encoder and decision tree according to claim 1, wherein the search of the nearest neighbor field of the training samples is as follows:
Z = (z_1, z_2, \ldots, z_n)
Z'' = (z''_1, z''_2, \ldots, z''_m)
D(Z'', X_i'') = \sqrt{\sum_{s=1}^{m} (z''_s - x''_{is})^2}
N_{K_z} = \{X_{K_1}, X_{K_2}, \ldots, X_{K_z} \mid D(Z'', X_i'') < \alpha, K_z = 1, \ldots, 10\}
P_z = \arg\max_{i \in \{1, \ldots, C\}} p_i
wherein Z is a training sample; z_i is the i-th feature of Z; Z'' is the low-dimensional feature vector extracted by the encoder; z''_i is the i-th feature of the low-dimensional feature vector Z''; D(Z'', X_i'') is the Euclidean distance metric; m is the dimension of the low-dimensional feature vectors X'' and Z''; K_z is the optimal nearest neighbor value corresponding to the training sample, output by the decision tree; N_{K_z} is the nearest neighbor sample set of the training sample; α is the distance parameter; X_{K_i} is the i-th neighbor sample in N_{K_z}; p_i is the probability of occurrence of the corresponding label in the nearest neighbor sample set N_{K_z}; P_z is the predicted label of the training sample Z; and C is the number of label categories of the samples.
7. A system for implementing a self-encoder and decision tree based image classification method according to any of claims 1-6, said system comprising an image input conversion module, a training module, a feature extraction module, a nearest neighbor module, a decision tree module and a classification module:
the image input conversion module is used for acquiring image sample data, acquiring original RGB image data and acquiring a matrix/vector of pixel information of an image sample;
the training module inputs the collected image data into the self-encoder, and solves the weight parameters of the self-encoder by using a feedforward neural network through a back propagation algorithm;
the characteristic extraction module is used for learning the characteristic information of the image sample by utilizing a self-encoder network model and extracting the typical characteristic information of the image sample based on an encoder;
the nearest neighbor module calculates the distance between the image samples based on the output result of the corresponding encoder in the iterative process of solving the optimal weight parameter of the network, and updates the nearest neighbor value corresponding to each image sample under the constraint of the nearest neighbor distance;
the decision tree module extracts compression characteristic information of samples based on the trained encoder in the self-encoder network model, and constructs a decision tree based on a CART method by combining the optimal neighbor number of each sample as a label;
the classification module compresses sample characteristic information by using a self-encoder and inputs the sample characteristic information into a decision tree to obtain the corresponding optimal nearest neighbor numerical value, searches the corresponding nearest neighbor field of the sample, and takes the category with the largest number in the nearest neighbor field in the KNN as a prediction result.
In the system of the image classification method based on the self-encoder and the decision tree, firstly, an image input conversion module is utilized to process data, then a training module is adopted to train the data, a feature extraction module is generated based on the training module to extract data features, the data features are input into the decision tree module to obtain nearest neighbor values, the nearest neighbor module is used for searching the nearest neighbor field, and finally, a classification module is adopted to output a prediction result.
CN202310070830.0A 2023-02-07 2023-02-07 Image classification method and system based on self-encoder and decision tree Pending CN116246102A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310070830.0A CN116246102A (en) 2023-02-07 2023-02-07 Image classification method and system based on self-encoder and decision tree

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310070830.0A CN116246102A (en) 2023-02-07 2023-02-07 Image classification method and system based on self-encoder and decision tree

Publications (1)

Publication Number Publication Date
CN116246102A true CN116246102A (en) 2023-06-09

Family

ID=86634328

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310070830.0A Pending CN116246102A (en) 2023-02-07 2023-02-07 Image classification method and system based on self-encoder and decision tree

Country Status (1)

Country Link
CN (1) CN116246102A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116939210A (en) * 2023-09-13 2023-10-24 瀚博半导体(上海)有限公司 Image compression method and device based on self-encoder
CN116939210B (en) * 2023-09-13 2023-11-17 瀚博半导体(上海)有限公司 Image compression method and device based on self-encoder
CN117454277A (en) * 2023-10-11 2024-01-26 深圳励剑智能科技有限公司 Data management method, system and medium based on artificial intelligence
CN117454277B (en) * 2023-10-11 2024-06-25 深圳励剑智能科技有限公司 Data management method, system and medium based on artificial intelligence

Similar Documents

Publication Publication Date Title
Yang et al. A survey of DNN methods for blind image quality assessment
CN109063565B (en) Low-resolution face recognition method and device
CN105138973B (en) The method and apparatus of face authentication
CN116246102A (en) Image classification method and system based on self-encoder and decision tree
CN113221641B (en) Video pedestrian re-identification method based on generation of antagonism network and attention mechanism
CN112765352A (en) Graph convolution neural network text classification method based on self-attention mechanism
CN110188827A (en) A kind of scene recognition method based on convolutional neural networks and recurrence autocoder model
CN110942091A (en) Semi-supervised few-sample image classification method for searching reliable abnormal data center
CN113627266A (en) Video pedestrian re-identification method based on Transformer space-time modeling
CN111967358B (en) Neural network gait recognition method based on attention mechanism
CN113222072A (en) Lung X-ray image classification method based on K-means clustering and GAN
Wang et al. Accelerated manifold embedding for multi-view semi-supervised classification
CN114255371A (en) Small sample image classification method based on component supervision network
CN113052017A (en) Unsupervised pedestrian re-identification method based on multi-granularity feature representation and domain adaptive learning
CN116110089A (en) Facial expression recognition method based on depth self-adaptive metric learning
CN114780767A (en) Large-scale image retrieval method and system based on deep convolutional neural network
CN113065520A (en) Multi-modal data-oriented remote sensing image classification method
CN109784244B (en) Low-resolution face accurate identification method for specified target
Yao A compressed deep convolutional neural networks for face recognition
CN115392474B (en) Local perception graph representation learning method based on iterative optimization
CN111461061A (en) Pedestrian re-identification method based on camera style adaptation
CN115049894A (en) Target re-identification method of global structure information embedded network based on graph learning
CN115664970A (en) Network abnormal point detection method based on hyperbolic space
CN113269235B (en) Assembly body change detection method and device based on unsupervised learning
CN114882007A (en) Image anomaly detection method based on memory network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination