CN116091849A - Tire pattern classification method, system, medium and equipment based on grouping decoder - Google Patents

Tire pattern classification method, system, medium and equipment based on grouping decoder

Info

Publication number
CN116091849A
CN116091849A CN202310376437.4A CN202310376437A CN116091849A CN 116091849 A CN116091849 A CN 116091849A CN 202310376437 A CN202310376437 A CN 202310376437A CN 116091849 A CN116091849 A CN 116091849A
Authority
CN
China
Prior art keywords
tire pattern
pattern image
network
tire
teacher network
Prior art date
Legal status
Granted
Application number
CN202310376437.4A
Other languages
Chinese (zh)
Other versions
CN116091849B (en)
Inventor
刘萌
厉盛华
周迪
郭杰
宁阳
马玉玲
Current Assignee
Shandong Jianzhu University
Original Assignee
Shandong Jianzhu University
Priority date
Filing date
Publication date
Application filed by Shandong Jianzhu University filed Critical Shandong Jianzhu University
Priority to CN202310376437.4A priority Critical patent/CN116091849B/en
Publication of CN116091849A publication Critical patent/CN116091849A/en
Application granted granted Critical
Publication of CN116091849B publication Critical patent/CN116091849B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/765Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/02Knowledge representation; Symbolic representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/778Active pattern-learning, e.g. online learning of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of image processing and provides a tire pattern classification method, system, medium and equipment based on a grouping decoder, which adopt the following scheme: the tire pattern image type is obtained based on a tire pattern image dataset and a trained tire pattern classification model. The training process of the tire pattern classification model comprises: constructing a teacher network and a student network based on transfer learning; adjusting the structure of the teacher network by replacing the global average pooling layer in the teacher network with a linear projection layer and a grouping decoder, so that the classification probability of the tire pattern image is obtained through linear projection and grouping decoding; and performing knowledge distillation from the teacher network to the student network to obtain a KL divergence loss and a cross entropy loss. The method achieves lower model complexity and lower computational overhead.

Description

Tire pattern classification method, system, medium and equipment based on grouping decoder
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a tire pattern classification method, system, medium and equipment based on a grouping decoder.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
At present, little research work has been done on the classification of tire pattern images. The existing processing flow can be summarized as follows: 1) a convolutional neural network is used to extract features from the tire image; 2) the extracted features are input into a support vector machine classifier or a pre-trained deep image classification model for prediction. To further improve classification accuracy, some methods use a tire image representation algorithm that combines the discrete wavelet transform with scale-invariant features.
Although these methods improve the accuracy of tire pattern image classification to a certain extent, the texture features extracted by traditional methods represent only low-level image information and lack high-level visual semantic features, which causes a semantic gap; moreover, traditional methods have large memory requirements and high computational cost.
Meanwhile, existing classification heads, such as those based on global average pooling and those based on attention, have drawbacks. A classification head based on global average pooling must recognize multiple objects with different locations and sizes, which makes average pooling less suitable; attention-based classification heads do improve results, but are often computationally expensive.
Disclosure of Invention
In order to solve at least one technical problem in the background art, the invention provides a tire pattern classification method, system, medium and equipment based on a grouping decoder, which bridge the gap between low-level visual features and high-level semantics through a lightweight tire pattern image classification model with a flexible and efficient classification head, while keeping model complexity and computational cost low.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
a first aspect of the present invention provides a tire pattern classification method based on a group decoder, comprising the steps of:
acquiring a tire pattern image dataset;
obtaining a tire pattern image type based on the tire pattern image dataset and the trained tire pattern classification model; the training process of the tire pattern classification model comprises the following steps:
constructing a teacher network and a student network based on transfer learning; adjusting the structure of the teacher network by replacing the global average pooling layer in the teacher network with a linear projection layer and a grouping decoder, and obtaining the classification probability of the tire pattern image through linear projection and grouping decoding;
and performing knowledge distillation from the teacher network to the student network to obtain a KL divergence loss and a cross entropy loss.
Further, preprocessing is performed after the tire pattern image dataset is acquired, specifically:
processing the tire pattern image dataset with contrast-limited adaptive histogram equalization to obtain a first tire pattern image dataset;
and processing the first tire pattern image dataset by means of data enhancement to obtain a second tire pattern image dataset (an illustrative code sketch of this preprocessing is given after the following sub-steps).
Further, processing the tire pattern image dataset with contrast-limited adaptive histogram equalization to obtain the first tire pattern image dataset specifically includes:
dividing tire pattern image data to obtain mutually disjoint sub-blocks with the same size;
calculating a histogram of each sub-block;
clipping the histogram and reassigning the pixel points;
performing histogram equalization on the sub-blocks after the pixels are reassigned;
and reconstructing the gray value of each sub-block after equalization by using a bilinear interpolation algorithm to obtain a first tire pattern image data set.
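An illustrative sketch of this preprocessing is given below, using OpenCV's CLAHE implementation for the contrast-limited adaptive histogram equalization and torchvision transforms for the data enhancement; the clip limit, tile grid, 224×224 crop and normalization statistics are assumptions made for illustration and are not values fixed by the invention.

```python
# Sketch of the preprocessing: contrast-limited adaptive histogram equalization
# (first tire pattern image dataset) followed by data enhancement (second dataset).
# The clipLimit/tileGridSize values, the 224x224 crop and the normalization
# statistics are illustrative assumptions.
import cv2
from PIL import Image
from torchvision import transforms

clahe = cv2.createCLAHE(clipLimit=4.0, tileGridSize=(8, 8))

def enhance(path: str) -> Image.Image:
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)     # tire pattern image
    return Image.fromarray(clahe.apply(gray)).convert("RGB")

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),                # random horizontal flip
    transforms.RandomCrop(224, pad_if_needed=True),   # random 224x224 crop
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet statistics (assumed)
                         std=[0.229, 0.224, 0.225]),
])

# tensor = augment(enhance("tire_pattern.png"))
```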
Further, replacing the global average pooling layer in the teacher network with a linear projection layer and a grouping decoder, and obtaining the classification probability of the tire pattern image through linear projection and grouping decoding, specifically includes:
projecting the output feature map of the convolution layer of the teacher network as the input of the projection layer, and reshaping the projection result to obtain local features;
introducing k query vectors of dimension d to form a query matrix, where k is the predefined number of groups in the grouping decoder;
dividing the local features equally along the feature dimension, and inputting each part of the local features together with the query matrix into a cross-attention mechanism to obtain global feature information related to the query matrix;
and obtaining, based on the global feature information related to the query matrix, the classification probability of the tire pattern image through an expansion mapping.
Further, the KL divergence loss and the cross entropy loss are weighted and summed to obtain the total training loss of the tire pattern classification model.
Further, performing knowledge distillation from the teacher network to the student network to obtain the KL divergence loss and the cross entropy loss specifically includes:
inputting the tire pattern image dataset into the teacher network to obtain the output of the fully connected layer of the teacher network;
processing the output of the fully connected layer of the teacher network with a softmax activation function to obtain soft labels;
loading the tire pattern image dataset into the student network, loading the file obtained by training the teacher network, and obtaining the output of the fully connected layer processed by the student network;
processing the output of the fully connected layer of the student network with a softmax activation function at a first temperature to obtain soft predictions;
performing a KL divergence calculation based on the soft labels and the soft predictions to obtain the KL divergence loss;
processing the output of the fully connected layer of the student network with a softmax activation function at a second temperature to obtain hard predictions;
and performing a cross entropy loss calculation between the hard predictions and the real label of each picture, namely the hard label, to obtain the cross entropy loss.
Further, the teacher network employs ResNet50 and the student network employs Efficientnet-V2-S.
A second aspect of the present invention provides a tire pattern classification system based on a grouping decoder, comprising:
a data acquisition module configured to: acquiring a tire pattern image dataset;
a classification module configured to: obtaining a tire pattern image type based on the tire pattern image dataset and the trained tire pattern classification model; the training process of the tire pattern classification model comprises the following steps:
constructing a teacher network and a student network based on transfer learning; adjusting the structure of the teacher network by replacing the global average pooling layer in the teacher network with a linear projection layer and a grouping decoder, and obtaining the classification probability of the tire pattern image through linear projection and grouping decoding;
and performing knowledge distillation from the teacher network to the student network to obtain a KL divergence loss and a cross entropy loss.
A third aspect of the present invention provides a computer-readable storage medium.
A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps in the tire pattern classification method based on the grouping decoder of the first aspect.
A fourth aspect of the invention provides an electronic device.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps in the tire pattern classification method based on the grouping decoder of the first aspect when executing the program.
Compared with the prior art, the invention has the beneficial effects that:
1. The invention provides a novel tire pattern image classification method, in which the tire pattern image type is obtained based on a tire pattern image dataset and a trained tire pattern classification model. The training process of the tire pattern classification model comprises: constructing a teacher network and a student network based on transfer learning; adjusting the structure of the teacher network by replacing its global average pooling layer with a linear projection layer and a grouping decoder, and obtaining the classification probability of the tire pattern image through linear projection and grouping decoding. This effectively alleviates the over-fitting problem and improves classification accuracy while reducing the consumption of computing resources.
2. The invention provides a training strategy for tire pattern classification based on knowledge distillation: knowledge distillation is performed from the teacher network to the student network to obtain a KL divergence loss and a cross entropy loss, so that a lightweight model is guided by a larger, more complex and better-performing model. This reduces model latency and compresses the network parameters while improving model accuracy.
3. The invention proposes a grouping decoder mechanism that not only provides a better speed-accuracy trade-off, but also improves the accuracy of the model.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
FIG. 1 is a flow chart of knowledge distillation provided by an embodiment of the present invention.
Detailed Description
The invention will be further described with reference to the drawings and examples.
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
The purpose of tire pattern image classification is to distinguish different types of tire images according to their features so as to minimize classification error. The information contained in a tire pattern image can provide important clues for police investigation and help accelerate the solving of criminal cases, effectively improving the efficiency of police work; it can also provide important evidence for quickly and accurately assigning responsibility between the two parties in traffic accident handling. Because the textures of different tire patterns are similar, manually judging the tire pattern type is quite laborious.
Embodiment 1
As shown in FIG. 1, the present embodiment provides a tire pattern image classification method based on a grouping decoder, comprising the following steps:
Step 1: acquiring an original tire pattern image dataset with real labels;
the tire pattern image dataset is denoted S, and a tire pattern image in it is denoted I;
the category label of the tire pattern image I is denoted c ∈ {1, 2, …, C}, where C represents the number of categories;
Step 2: preprocessing the tire pattern image dataset to obtain a training dataset;
In step 2, the preprocessing specifically includes:
Step 201: using contrast-limited adaptive histogram equalization to enhance the contrast of the images and suppress image noise; the tire pattern image dataset obtained after the processing of step 201 is denoted S1.
The step 201 specifically includes:
Step 2011: dividing the tire pattern image I into m mutually disjoint sub-blocks of the same size;
Step 2012: computing the histogram h(i) of each sub-block, where h(i) denotes the number of pixels located at gray level i;
Step 2013: clipping the histogram and redistributing the clipped pixels, specifically comprising:
clipping the sub-block histogram h(i) at the clipping amplitude R, where R is determined by N_x and N_y (the numbers of pixels of each sub-block in the x and y directions), the number of gray levels B and the clipping coefficient e;
distributing the clipped pixels equally over all gray levels: if the total number of pixels exceeding the clipping amplitude R is N_c, the number of pixels allocated to each gray level is N_c / B, and the histogram after pixel redistribution is denoted h'(i) (a code sketch of steps 2012-2014 is given after step 2015);
Step 2014: performing histogram equalization on each sub-block after pixel redistribution;
Step 2015: reconstructing the gray value of each sub-block after equalization by using a bilinear interpolation algorithm, specifically: for a pixel point p, the coordinates of the center pixel points of the four sub-blocks surrounding p are determined, p is transformed by the mapping functions of these four surrounding sub-blocks to obtain four mapped values f1, f2, f3 and f4, and the gray value at p is then obtained by bilinearly interpolating f1, f2, f3 and f4 according to the distances between p and the four sub-block centers.
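As referenced in step 2013 above, the following NumPy sketch illustrates the per-sub-block computation of steps 2012-2014: histogram, clipping at the amplitude R, equal redistribution of the excess, and the resulting equalization mapping. The specific clipping rule R = e·N_x·N_y/B and the function name are assumptions made only for illustration.

```python
# Per-sub-block histogram processing (steps 2012-2014), assuming integer gray
# levels in [0, B).  The clip rule R = e * nx * ny / B is an assumed form.
import numpy as np

def clahe_subblock_mapping(block: np.ndarray, B: int = 256, e: float = 4.0) -> np.ndarray:
    """Return the equalization mapping (look-up table) of one sub-block."""
    nx, ny = block.shape
    hist = np.bincount(block.ravel(), minlength=B).astype(np.float64)  # step 2012
    R = e * nx * ny / B                       # clipping amplitude (assumed form)
    excess = np.maximum(hist - R, 0.0).sum()  # total number of clipped pixels
    hist = np.minimum(hist, R)                # step 2013: clip the histogram
    hist += excess / B                        # redistribute the excess equally
    cdf = np.cumsum(hist) / hist.sum()        # step 2014: histogram equalization
    return np.round(cdf * (B - 1)).astype(np.uint8)

# Usage: mapping = clahe_subblock_mapping(sub_block); equalized = mapping[sub_block]
```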
step 202: obtaining a derivative tire pattern image data set by adopting a data enhancement mode; the specific steps of data enhancement here include: the processed tire pattern image is subjected to random inversion in the horizontal direction, the tire pattern image is randomly cut into images with the length and the width of 224, normalization of image data is carried out, and a new data set formed by the normalized images is recorded as follows: s2, performing operation.
Step 3: constructing a teacher network based on the grouping decoder and pre-training the teacher network;
the teacher network construction process based on the block decoder comprises the following steps:
constructing the teacher network based on transfer learning, adjusting the structure of the teacher network, replacing the global average pooling layer in the teacher network with a linear projection layer and a grouping decoder, and modifying the output dimension of the fully connected layer of the teacher network to the number of categories C in the tire pattern image dataset.
The grouping decoder mainly comprises three parts: a cross-attention layer, a feed-forward layer and a group fully connected pooling layer.
Cross-attention layer: the k query vectors are used as queries to select useful information from the input local features.
Feed-forward layer: a feed-forward neural network is used to synthesize all the information.
Group fully connected pooling layer: an expansion mapping that maps the information represented by the k query vectors to probability values over the C image classes.
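To make the structure of this classification head concrete, the following is a minimal PyTorch sketch of a grouping-decoder head: linear projection of the backbone feature map, k learnable query vectors with cross-attention, a feed-forward layer, and a group fully connected pooling that expands the k query representations into C class probabilities (detailed in steps 301-307 below). All layer sizes, the use of nn.MultiheadAttention, and the module and parameter names are illustrative assumptions rather than the patent's exact implementation.

```python
# Illustrative grouping-decoder classification head (sizes/names are assumptions).
import torch
import torch.nn as nn

class GroupingDecoderHead(nn.Module):
    def __init__(self, in_channels: int, d: int, k: int, num_classes: int):
        super().__init__()
        assert d % k == 0 and num_classes % k == 0
        self.proj = nn.Linear(in_channels, d)            # linear projection layer
        self.queries = nn.Parameter(torch.randn(k, d))   # k query vectors of dimension d
        self.cross_attn = nn.MultiheadAttention(d, num_heads=k, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d, d), nn.GELU(), nn.Linear(d, d))  # feed-forward layer
        # group fully connected pooling: each query predicts its own subset of classes
        self.group_fc = nn.ModuleList([nn.Linear(d, num_classes // k) for _ in range(k)])

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (B, C1, H, W) output feature map of the backbone's last convolution stage
        b = feat.size(0)
        x = self.proj(feat.flatten(2).transpose(1, 2))    # (B, H*W, d) local features
        q = self.queries.unsqueeze(0).expand(b, -1, -1)   # (B, k, d) query matrix
        z, _ = self.cross_attn(q, x, x)                   # cross-attention layer
        z = z + self.ffn(z)                               # feed-forward layer
        # expansion mapping: concatenate the k per-group predictions into C scores
        scores = torch.cat([fc(z[:, s]) for s, fc in enumerate(self.group_fc)], dim=1)
        return torch.sigmoid(scores)                      # classification probability

# Example: a ResNet50 backbone (2048 channels) with k = 8 groups and C = 8 classes
# head = GroupingDecoderHead(in_channels=2048, d=512, k=8, num_classes=8)
# probs = head(features)          # Loss1 = nn.BCELoss()(probs, one_hot_targets)
```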
In this embodiment, the teacher network adopts ResNet50, and the ResNet50 network is pre-trained on the ImageNet dataset.
In step 3, constructing and pre-training the teacher network based on the grouping decoder specifically includes:
Step 301: inputting the tire pattern image data into the teacher network and obtaining the output of the convolution layers of the teacher network, denoted F ∈ ℝ^(H×W×C1), where H and W respectively represent the height and width of the convolution output feature map and C1 represents the number of its channels;
Step 302: inputting the output F of the teacher network convolution layers into the linear projection layer, projecting F to d dimensions, and reshaping the result to obtain the local features X ∈ ℝ^(HW×d);
Step 303: introducing k query vectors of dimension d, which form the query matrix Q, where k is the predefined number of groups in the grouping decoder.
Step 304: dividing the obtained local features X equally along the feature dimension into k parts, each part of local features being denoted X_i, i = 1, …, k.
Step 305: inputting each X_i together with the query matrix Q into the cross-attention mechanism to obtain the feature information of the i-th group related to the query matrix, denoted head_i:
head_i = Attention(Q·W_i^Q, X_i·W_i^K, X_i·W_i^V) (1)
Attention(Q', K', V') = softmax(Q'·K'^T / √d_h)·V' (2)
Z = Concat(head_1, …, head_k)·W^O (3)
wherein Attention in formula (2) denotes the traditional scaled dot-product attention mechanism, and W_i^Q, W_i^K and W_i^V denote the learnable projection matrices associated with the query, key and value of the i-th group.
Thereafter, as in formula (3), the results of the k heads are spliced together, and all the local feature information related to the query matrix, denoted Z, is obtained through a layer of fully connected mapping, where W^O is a learnable projection matrix.
In order to obtain the probability information of the input image with respect to the image categories, this embodiment designs an expansion mapping that maps the representations of the k query vectors into probability values over the C image categories: for the s-th group, a group mapping o_s = z_s·W_s + b_s is applied, where z_s is the s-th row of the matrix Z and W_s and b_s are learnable parameters of the s-th group; splicing the prediction results of the k queries together yields the predicted values of the current image over the C categories, denoted O.
Step 306: passing the obtained output O through a Sigmoid activation function to obtain the classification probability of the tire pattern image, denoted P.
Step 307: the classification probability of the obtained tire pattern image is input into a binary cross entropy Loss function, and the binary cross entropy Loss is calculated and recorded as Loss1.
Step 4: constructing a student network and training the student network by using a knowledge distillation mechanism;
the step 4 specifically comprises the following steps:
step 401: constructing a student network;
in the embodiment, the student network adopts Efficientnet-V2-S to pretrain the Efficientnet-V2-S network on the ImageNet data set; modifying the output dimension of the student network full-connection layer into the number C of categories in the tire pattern image dataset;
step 402: carrying out knowledge distillation based on a student network and a teacher network to obtain KL divergence loss and cross entropy loss functions; the method specifically comprises the following steps:
step 4021: in the knowledge distillation process, the tire pattern image data set S2 is input into a teacher network to obtain the output of a full-connection layer of the teacher network
Figure SMS_70
Step 4022: the obtained output
Figure SMS_71
Processing with a distillation temperature T=t softmax to obtain an output soft label, denoted +.>
Figure SMS_72
Step 4023: in the knowledge distillation process, the tire pattern image data set S2 is loaded to a student network, and the parameter file of the trained teacher network is loaded in the student network, and the full-connection layer processed by the student network is output
Figure SMS_73
Step 4024: will result in a full link layer output
Figure SMS_74
The output soft prediction is recorded as +.f by the softmax activation function treatment with distillation temperature t=t>
Figure SMS_75
Step 4025: the soft label in step 4022
Figure SMS_76
And soft prediction in step 4024 +.>
Figure SMS_77
Obtaining KL divergence calculation to obtain KL divergence loss +.>
Figure SMS_78
Step 4026: outputting the full connection layer obtained in the step 4023
Figure SMS_79
Treatment with a softmax activation function at a distillation temperature t=1 gives an output hard prediction denoted +.>
Figure SMS_80
Step 4027: hard the result of the calculation in step 4026Prediction
Figure SMS_81
Performing cross entropy loss calculation with real tag of each picture, i.e. hard tag, to obtain cross entropy loss +.>
Figure SMS_82
Step 403: introduction of parameters
Figure SMS_83
Weighting the KL divergence loss and the cross entropy loss function to obtain a final optimized objective function total loss function +.>
Figure SMS_84
For network training, wherein +.>
Figure SMS_85
The formula of (2) is: />
Figure SMS_86
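The training objective of steps 4021-403 can be sketched in PyTorch as follows. Building the pre-trained backbones with torchvision, the temperature value, the weight α, the T² scaling on the KL term and the convex combination α·L_KD + (1 − α)·L_CE are all assumptions, since the text above only states that the two losses are weighted by a parameter α.

```python
# Sketch of the knowledge-distillation training objective (steps 4021-403).
# Backbone choices follow the embodiment (ResNet50 teacher, EfficientNetV2-S
# student); T, alpha, the T^2 scaling and the convex combination are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

C = 8  # number of tire pattern categories (illustrative value)

teacher = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
teacher.fc = nn.Linear(teacher.fc.in_features, C)   # in the patent, the GAP layer is further
                                                    # replaced by the grouping-decoder head
student = models.efficientnet_v2_s(weights=models.EfficientNet_V2_S_Weights.IMAGENET1K_V1)
student.classifier[1] = nn.Linear(student.classifier[1].in_features, C)

def distillation_loss(student_logits, teacher_logits, hard_labels,
                      T: float = 4.0, alpha: float = 0.7) -> torch.Tensor:
    p_t = F.softmax(teacher_logits / T, dim=1)            # step 4022: soft labels
    log_p_s = F.log_softmax(student_logits / T, dim=1)    # step 4024: soft predictions
    l_kd = F.kl_div(log_p_s, p_t, reduction="batchmean") * T * T   # step 4025: KL loss
    l_ce = F.cross_entropy(student_logits, hard_labels)   # steps 4026-4027: hard prediction / CE
    return alpha * l_kd + (1.0 - alpha) * l_ce            # step 403: weighted total loss

# Training step (teacher frozen after its own pre-training):
# teacher.eval()
# with torch.no_grad():
#     t_logits = teacher(images)
# loss = distillation_loss(student(images), t_logits, labels)
# loss.backward()
```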
Embodiment 2
The present embodiment provides a tire pattern classification system based on the grouping decoder, comprising:
a data acquisition module configured to: acquiring a tire pattern image dataset;
a classification module configured to: obtaining a tire pattern image type based on the tire pattern image dataset and the trained tire pattern classification model; the training process of the tire pattern classification model comprises the following steps:
constructing a teacher network and a student network based on transfer learning; adjusting the structure of the teacher network by replacing the global average pooling layer in the teacher network with a linear projection layer and a grouping decoder, and obtaining the classification probability of the tire pattern image through linear projection and grouping decoding;
and performing knowledge distillation from the teacher network to the student network to obtain a KL divergence loss and a cross entropy loss.
Embodiment 3
The present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps in the tire pattern classification method based on the grouping decoder of Embodiment 1.
Embodiment 4
The present embodiment provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps in the tire pattern classification method based on the grouping decoder of Embodiment 1 when executing the program.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention; various modifications and variations can be made by those skilled in the art. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall be included in the protection scope of the present invention.

Claims (10)

1. A tire pattern classification method based on a grouping decoder, characterized by comprising the following steps:
acquiring a tire pattern image dataset;
obtaining a tire pattern image type based on the tire pattern image dataset and the trained tire pattern classification model; the training process of the tire pattern classification model comprises the following steps:
constructing a teacher network and a student network based on transfer learning; adjusting the structure of the teacher network by replacing a global average pooling layer in the teacher network with a linear projection layer and a grouping decoder, and obtaining the classification probability of the tire pattern image through linear projection and grouping decoding;
and performing knowledge distillation from the teacher network to the student network to obtain a KL divergence loss and a cross entropy loss.
2. The tire pattern classification method based on the grouping decoder according to claim 1, wherein preprocessing is performed after the tire pattern image dataset is acquired, specifically:
processing the tire pattern image dataset with contrast-limited adaptive histogram equalization to obtain a first tire pattern image dataset;
and processing the first tire pattern image data set in a data enhancement mode to obtain a second tire pattern image data set.
3. The tire pattern classification method based on the grouping decoder according to claim 2, wherein processing the tire pattern image dataset with contrast-limited adaptive histogram equalization to obtain the first tire pattern image dataset specifically comprises:
dividing tire pattern image data to obtain mutually disjoint sub-blocks with the same size;
calculating a histogram of each sub-block;
clipping the histogram and reassigning the pixel points;
performing histogram equalization on the sub-blocks after the pixels are reassigned;
and reconstructing the gray value of each sub-block after equalization by using a bilinear interpolation algorithm to obtain a first tire pattern image data set.
4. The tire pattern classification method based on the grouping decoder according to claim 1, wherein replacing the global average pooling layer in the teacher network with the linear projection layer and the grouping decoder and obtaining the classification probability of the tire pattern image through linear projection and grouping decoding specifically comprises:
projecting the output feature map of the convolution layer of the teacher network as the input of the projection layer, and reshaping the projection result to obtain local features;
introducing a matrix formed by k query vectors of dimension d as the query matrix, wherein k is the predefined number of groups in the grouping decoder;
dividing the local features equally according to feature dimensions, and inputting each part of local features and a query matrix into a cross attention mechanism to obtain global feature information related to the query matrix;
based on the global feature information related to the query matrix, obtaining the classification probability of the tire pattern image through an expansion mapping.
5. The tire pattern classification method based on the grouping decoder according to claim 1, wherein the KL divergence loss and the cross entropy loss are weighted and summed as the total training loss of the tire pattern classification model.
6. The tire pattern classification method based on the grouping decoder according to claim 1, wherein performing knowledge distillation from the teacher network to the student network to obtain the KL divergence loss and the cross entropy loss specifically comprises:
inputting the tire pattern image data set into a teacher network to obtain the output of a full-connection layer of the teacher network;
the output of the full-connection layer of the teacher network is processed by a softmax activation function to obtain an output soft label;
loading the tire pattern image data set to a student network, and outputting the tire pattern image data set through a full-connection layer processed by the student network;
the output of the full-connection layer of the student network is processed by a softmax activation function at a first temperature to obtain an output soft prediction;
performing KL divergence calculation based on the soft label and soft prediction to obtain KL divergence loss;
the output of the full-connection layer obtained by the student network is processed by a softmax activation function at the second temperature to obtain an output hard prediction;
and performing cross entropy loss calculation between the hard prediction and the real label of each picture, namely the hard label, so as to obtain cross entropy loss.
7. The tire pattern classification method based on the grouping decoder according to claim 1, wherein the teacher network employs ResNet50 and the student network employs Efficientnet-V2-S.
8. A tire pattern classification system based on a grouping decoder, comprising:
a data acquisition module configured to: acquiring a tire pattern image dataset;
a classification module configured to: obtaining a tire pattern image type based on the tire pattern image dataset and the trained tire pattern classification model; the training process of the tire pattern classification model comprises the following steps:
constructing a teacher network and a student network based on transfer learning; adjusting the structure of the teacher network by replacing a global average pooling layer in the teacher network with a linear projection layer and a grouping decoder, and obtaining the classification probability of the tire pattern image through linear projection and grouping decoding;
and performing knowledge distillation from the teacher network to the student network to obtain a KL divergence loss and a cross entropy loss.
9. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the tire pattern classification method based on the grouping decoder according to any one of claims 1-7.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the tire pattern classification method based on the grouping decoder according to any one of claims 1-7 when executing the program.
CN202310376437.4A 2023-04-11 2023-04-11 Tire pattern classification method, system, medium and equipment based on grouping decoder Active CN116091849B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310376437.4A CN116091849B (en) 2023-04-11 2023-04-11 Tire pattern classification method, system, medium and equipment based on grouping decoder


Publications (2)

Publication Number Publication Date
CN116091849A true CN116091849A (en) 2023-05-09
CN116091849B CN116091849B (en) 2023-07-25

Family

ID=86206803

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310376437.4A Active CN116091849B (en) 2023-04-11 2023-04-11 Tire pattern classification method, system, medium and equipment based on grouping decoder

Country Status (1)

Country Link
CN (1) CN116091849B (en)


Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111832440A (en) * 2020-06-28 2020-10-27 高新兴科技集团股份有限公司 Construction method of human face feature extraction model, computer storage medium and equipment
WO2022051856A1 (en) * 2020-09-09 2022-03-17 Huawei Technologies Co., Ltd. Method and system for training a neural network model using adversarial learning and knowledge distillation
CN112784964A (en) * 2021-01-27 2021-05-11 西安电子科技大学 Image classification method based on bridging knowledge distillation convolution neural network
US20220383072A1 (en) * 2021-05-28 2022-12-01 Samsung Sds Co., Ltd. Knowledge distillation method based on regression task and computing device for executing the method
CN113592007A (en) * 2021-08-05 2021-11-02 哈尔滨理工大学 Knowledge distillation-based bad picture identification system and method, computer and storage medium
CN113887610A (en) * 2021-09-29 2022-01-04 内蒙古工业大学 Pollen image classification method based on cross attention distillation transducer
CN114528928A (en) * 2022-02-11 2022-05-24 杭州慧看智能科技有限公司 Two-training image classification algorithm based on Transformer
CN114882278A (en) * 2022-05-09 2022-08-09 浙江理工大学 Tire pattern classification method and device based on attention mechanism and transfer learning
CN115187660A (en) * 2022-06-22 2022-10-14 贵州思索电子有限公司 Knowledge distillation-based multi-person human body posture estimation method and system
CN115375910A (en) * 2022-09-14 2022-11-22 清华大学 Point cloud feature extraction method and device based on attention mechanism
CN115294407A (en) * 2022-09-30 2022-11-04 山东大学 Model compression method and system based on preview mechanism knowledge distillation
CN115661484A (en) * 2022-12-16 2023-01-31 山东建筑大学 Lightweight tire pattern classification method and system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JI WON YOON: "TutorNet: Towards Flexible Knowledge Distillation for End-to-End Speech Recognition", 《IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING》, vol. 29, pages 1626 - 1638, XP011854158, DOI: 10.1109/TASLP.2021.3071662 *
SNEHA CHAUDHARI等: "An Attentive Survey of Attention Models", 《ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY》, vol. 12, no. 5, pages 1 - 32, XP058662906, DOI: 10.1145/3465055 *
朱相荣等: "基于知识蒸馏的维汉神经翻译模型解码速度提升方法", 《计算机应用与软件》, vol. 39, no. 11, pages 180 - 186 *
高文建: "基于深度学习的宫颈细胞学图像分类算法研究", 《中国优秀硕士学位论文全文数据库 医药卫生科技辑》, vol. 2023, no. 1, pages 068 - 105 *

Also Published As

Publication number Publication date
CN116091849B (en) 2023-07-25

Similar Documents

Publication Publication Date Title
CN109711481B (en) Neural networks for drawing multi-label recognition, related methods, media and devices
CN111768432B (en) Moving target segmentation method and system based on twin deep neural network
CN111444878B (en) Video classification method, device and computer readable storage medium
CN110874563A (en) Method and apparatus for providing integrated feature maps through multiple image outputs of CNN
CN110874566B (en) Method and device for generating data set, learning method and learning device using same
CN111914797A (en) Traffic sign identification method based on multi-scale lightweight convolutional neural network
CN109871790B (en) Video decoloring method based on hybrid neural network model
Salem et al. Semantic image inpainting using self-learning encoder-decoder and adversarial loss
Yuan et al. GAN-based image steganography for enhancing security via adversarial attack and pixel-wise deep fusion
CN114842034B (en) Picture true and false detection method based on amplified fuzzy operation trace
CN112149526A (en) Lane line detection method and system based on long-distance information fusion
CN114882278A (en) Tire pattern classification method and device based on attention mechanism and transfer learning
JP6935868B2 (en) Image recognition device, image recognition method, and program
CN116091849B (en) Tire pattern classification method, system, medium and equipment based on grouping decoder
KR102305981B1 (en) Method for Training to Compress Neural Network and Method for Using Compressed Neural Network
CN116977624A (en) Target identification method, system, electronic equipment and medium based on YOLOv7 model
CN116543250A (en) Model compression method based on class attention transmission
Özyurt et al. A new method for classification of images using convolutional neural network based on Dwt-Svd perceptual hash function
CN116912144A (en) Data enhancement method based on discipline algorithm and channel attention mechanism
CN113780301B (en) Self-adaptive denoising machine learning application method for defending against attack
CN115935257A (en) Classification recognition method, computer device, and storage medium
CN115294424A (en) Sample data enhancement method based on generation countermeasure network
CN113609918B (en) Short video classification method based on zero-order learning
Islam et al. Faster R-CNN based traffic sign detection and classification
CN116996680B (en) Method and device for training video data classification model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant