CN115439681A - Image multi-classification network structure based on feature remapping and training method - Google Patents

Image multi-classification network structure based on feature remapping and training method Download PDF

Info

Publication number
CN115439681A
CN115439681A CN202210978719.7A CN202210978719A CN115439681A CN 115439681 A CN115439681 A CN 115439681A CN 202210978719 A CN202210978719 A CN 202210978719A CN 115439681 A CN115439681 A CN 115439681A
Authority
CN
China
Prior art keywords
classification
network
classification network
remapping
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210978719.7A
Other languages
Chinese (zh)
Inventor
邹腊梅
李广磊
连志祥
王皓
谢佳
钟胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN202210978719.7A priority Critical patent/CN115439681A/en
Publication of CN115439681A publication Critical patent/CN115439681A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image multi-classification network structure based on feature remapping and a training method. The normal sample classification result of the multi-classification network is corrected through the trained two-classification network, so that the recognition accuracy of the multi-classification network on the normal sample classification can be improved, and particularly, the multi-classification network with higher recognition accuracy on the normal sample classification can be obtained under the condition that the normal samples are fewer.

Description

Image multi-classification network structure based on feature remapping and training method
Technical Field
The invention belongs to the technical field of computer vision deep learning, and particularly relates to an image multi-classification network structure based on feature remapping and a training method.
Background
In recent years, with the increasing construction of cities, the task of detecting underground drainage pipelines is becoming burdensome. Traditional underground drainline inspection requires a large amount of manpower to complete, which presents not only time challenges but also efficiency challenges. The method for detecting the drainage pipeline through deep learning improves efficiency and saves labor cost.
In general drainage pipeline detection, people mainly pay attention to the accuracy of defect type samples, but ignore the accuracy of normal type samples, so that a large number of normal samples are mistakenly detected as defect samples. The reason for this is that the industry has paid excessive attention to the defect type when collecting the drainage channel data, so that the defect sample is more during training and the normal type sample is not collected enough, and the data collection is not completed all the time, so the problem can be considered to be solved from the aspect of deep learning network.
Since the 2-classification network has simple tasks, only two categories of normal and defect need to be judged, when the same complex network is used, the simpler the task is, the stronger the information extraction capability of the network is, and the more accurate the result is. Therefore, in the conventional technology, in order to improve the accuracy, two networks are usually trained separately, a 2-class network is trained first, a defect sample is extracted, and then the sample determined as a defect by the 2-class network is input into a multi-class network, and is specifically predicted to be a specific defect category. Although the accuracy of the normal class sample can be improved by using the 2-class network, the accuracy of the 2-class network cannot be guaranteed to reach 100%, so that a large accumulated error is generated by successively passing through the two class networks.
Disclosure of Invention
In view of the above defects or improvement requirements of the prior art, the present invention provides an image multi-classification network structure based on feature remapping and a training method, and aims to modify a multi-classification network by using a two-classification network when training the multi-classification network, thereby improving the accuracy of the multi-classification network in recognizing normal samples.
To achieve the above object, according to an aspect of the present invention, there is provided an image multi-classification network structure based on feature remapping, including:
the multi-classification network is used for realizing m kinds of classification after the samples input into the multi-classification network sequentially pass through the convolutional layer, the full connection layer and the softmax layer and outputting a classification result [ outsoft mcls0 ,outsoft mcls1 ,outsoft mcls2 ,……,outsoft mclsn-1 ,outsoft mclsn ]Wherein, outsoft mclsn Representing normal class probability, and the rest representing different defect probabilities respectively, wherein n = m-1, and m is more than or equal to 3;
the two-classification network is used for inputting samples of the two-classification network to realize 2 kinds of classification after sequentially passing through the convolutional layer, the full connection layer and the softmax layer, and outputting a classification result
Figure BDA0003799436560000021
Wherein, the first and the second end of the pipe are connected with each other,
Figure BDA0003799436560000022
respectively representing defect class probability and normal class probability;
the characteristic remapping network is used for correcting the classification result output by the multi-classification network according to the classification result output by the two classification networks so as to enable the multi-classification network to output a normal class probability outsoft during training mclsn Approaching normal class probability of the two-class network output
Figure BDA0003799436560000023
Implementing feature remapping;
and the loss calculation module is used for calculating the training loss and reversely adjusting the parameters of the multi-classification network so as to converge the loss.
In one embodiment, the two-class network and the multi-class network share a sample input.
In one embodiment, the feature remapping network comprises:
a weight parameter adjusting module for obtaining the output result [ out ] of the multi-classification network full-connection layer mcls0 ,out mcls1 ,out mcls2 ,……,out mclsn-1 ,out mclsn ]Output result of full connection layer of two-classification network
Figure BDA0003799436560000024
After splicing, outputting a weight parameter k through a full connection layer;
a normal class probability correction module for correcting the normal class probability out by using the weight parameter k mclsn Corrected to out mclsn ', wherein,
Figure BDA0003799436560000031
the loss calculation module comprises a feature difference loss module and a classification loss module, the feature difference loss module is used for calculating feature difference loss between input features of a multi-classification network full-connection layer and input features of a two-classification network full-connection layer, the classification loss module is used for calculating classification loss of the multi-classification network after feature remapping, and the loss calculation module takes the sum of the classification loss and the feature difference loss as training loss.
In one embodiment, the multi-class network includes a backbone network and a feature delivery enhancement module connected to the backbone network, the backbone network includes a plurality of convolutional layers, full-link layers, and softmax layers connected in sequence, and the feature enhancement module includes:
the middle-high frequency feature transfer enhancement module is used for extracting edge and texture information of a shallow layer, is bridged at the front two ends of a front end convolution layer J of a trunk network and comprises a 1 x 1 convolution layer, a middle-high frequency domain channel attention module and a first cross attention module, the features in the trunk network are divided into three paths when reaching the input end of the middle-high frequency feature transfer enhancement module, the first path is continuously transmitted forwards before passing through the front end convolution layer J, the second path is converged into a trunk path after sequentially passing through the 1 x 1 convolution layer, the middle-high frequency domain channel attention module and the 1 x 1 convolution layer to extract high-frequency information of an image, and the third path is converged into the trunk path after sequentially passing through the 1 x 1 convolution layer, the first cross attention module and the 1 x 1 convolution layer to extract long-distance dependency relations among pixels at different positions;
the middle-low frequency feature transmission enhancing module is used for extracting deep semantic information, is bridged at the two ends of the rear end convolution layer J of a trunk network and comprises a 1 x 1 convolution layer, a middle-low frequency domain channel attention module and a second cross attention module, the features in the trunk network are divided into three paths when reaching the input end of the middle-low frequency feature transmission enhancing module, the first path is continuously transmitted forward after passing through the rear end convolution layer J, the second path sequentially passes through the 1 x 1 convolution layer, the middle-low frequency domain channel attention module and the 1 x 1 convolution layer to extract the low frequency information of an image and then is merged into a main trunk line, and the third path sequentially passes through the 1 x 1 convolution layer, the second cross attention module and the 1 x 1 convolution layer to extract the long-distance dependency relationship among different position pixels and then is merged into the main trunk line.
In one embodiment, the system further comprises a feature remapping control module for judging the normal class probability
Figure BDA0003799436560000041
Whether it is greater than the preset value, when the normal classification probability
Figure BDA0003799436560000042
When the probability is larger than the preset value, the characteristic remapping network is accessed to carry out characteristic remapping, and when the probability of the normal category is larger than the preset value
Figure BDA0003799436560000043
And when the characteristic is not greater than the preset value, the characteristic remapping network is cut off and the characteristic remapping is not carried out.
According to another aspect of the present invention, there is provided a method for training an image multi-classification network based on feature remapping, comprising:
will sampleInputting the data into a multi-classification network to train the multi-classification network, wherein the multi-classification network realizes m classifications, and outputs a classification result [ outsoft ] after passing through a softmax layer of the multi-classification network mcls0 ,outsoft mcls1 ,outsoft mcls2 ,……,outsoft mclsn-1 ,outsoft mclsn ]Wherein, outsoft mclsn Representing normal class probability, and the rest representing different defect probabilities respectively, wherein n = m-1, and m is more than or equal to 3;
inputting the same sample into the trained two-classification network, and outputting a classification result after passing through a softmax layer of the two-classification network
Figure BDA0003799436560000044
Wherein the content of the first and second substances,
Figure BDA0003799436560000045
respectively representing defect class probability and normal class probability;
correcting the classification result output by the multi-classification network by using the classification result output by the two-classification network, so that the normal class probability outsoft output by the multi-classification network during training mclsn Approaching the normal class probability of the two-class network output
Figure BDA0003799436560000046
Implementing feature remapping;
calculating training loss and reversely adjusting parameters of the multi-classification network, continuing training until the loss is converged to an expected range, and ending the training.
In one embodiment, the modifying the classification result output by the multi-classification network by using the classification result output by the two-classification network comprises:
obtaining output result of multi-classification network full connection layer mcls0 ,out mcls1 ,out mcls2 ,……,out mclsn-1 ,out mclsn ]And output result of full connection layer of two-classification network
Figure BDA0003799436560000047
Carry out the splicingOutputting a weight parameter k through a full connection layer;
normal class probability out is divided by weight parameter k mclsn Corrected to out mclsn ', wherein,
Figure BDA0003799436560000048
calculating a training loss comprising:
and calculating the characteristic difference loss between the input characteristics of the multi-classification network full-connection layer and the input characteristics of the two classification network full-connection layers and the classification loss of the multi-classification network after characteristic remapping, and finishing training after the sum of the classification loss and the characteristic difference loss converges to an expected range.
In one embodiment, before the classification result output by the multi-classification network is corrected by using the classification result output by the two-classification network, the probability of normal classification is judged
Figure BDA0003799436560000051
Whether it is greater than the preset value, when the normal classification probability
Figure BDA0003799436560000052
When the probability is larger than the preset value, the characteristic remapping is carried out, and when the probability of the normal category is larger than the preset value
Figure BDA0003799436560000053
And when the value is not greater than the preset value, the characteristic remapping is not carried out.
In one embodiment, the preset value is 0.9.
According to still another aspect of the present invention, there is provided an image multi-classification method including:
and inputting the images into a trained multi-classification network to obtain a classification result, wherein the multi-classification network is obtained by training according to the image multi-classification network training method based on the characteristic remapping.
In general, compared with the prior art, the above technical solutions conceived by the present invention can achieve the following beneficial effects:
in the invention, during the training of the multi-classification network, the trained two-classification network is used, and the features output by the normal class samples in the training process in the multi-classification network are approximated to the features of the normal class samples output by the 2-classification network through a feature remapping method. The task of the two-classification network is relatively simple, the simpler the task is, the stronger the information extraction capability of the network is, the more accurate the result is, and the recognition precision of the two-classification network on a normal sample is higher than that of a multi-classification network under the same sample. Therefore, in the application, during the training of the multi-classification network, the normal sample classification result of the multi-classification network is corrected through the trained two classification networks, so that the recognition accuracy of the multi-classification network on the normal class sample can be improved, and particularly under the condition of fewer normal samples, the multi-classification network with higher recognition accuracy on the normal class sample can be obtained.
Drawings
FIG. 1 is a schematic structural diagram of an image multi-classification network structure based on feature remapping according to an embodiment;
FIG. 2 is a block diagram of an embodiment of a medium-high frequency feature transfer enhancement module;
FIG. 3 is a block diagram of a low and medium frequency feature transfer enhancement module according to an embodiment;
FIG. 4 is a flowchart illustrating steps of a method for training an image multi-classification network based on feature remapping according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Fig. 1 is a schematic structural diagram of an image multi-classification network structure based on feature remapping in an embodiment. The image multi-classification network structure comprises four functional structures which are a multi-classification network, a two-classification network, a feature remapping network and a loss calculation module respectively.
The multi-classification network comprises a convolution layer, a full connection layer and a softmax layer which are connected in sequence. The convolution layer is used for extracting image features, the full-connection layer is used for calculating the logit values (probability values which are not normalized) of all the classifications, and the softmax layer is used for performing normalization processing on all the classified logit values to obtain the final classification result, namely the probability of each classification. Specifically, the multi-classification network realizes m classification, m is more than or equal to 3, and a classification result is output [ outoft ] mcls0 ,outsoft mcls1 ,outsoft mcls2 ,……,outsoft mclsn-1 ,outsoft mclsn ]Wherein, outsoft mclsi Indicates the probability of the (i + 1) th class, in particular, outsoft mclsn Representing normal class probabilities and the remainder representing different defect probabilities, e.g. outsoft mcls0 Indicates the probability of type 1 defects, outsoft mcls1 Probability of class 2 defect, and so on, outsoft mclsn-1 Indicating the probability of the nth type of defect, n = m-1.
The two-classification network comprises a convolution layer, a full connection layer and a softmax layer which are connected in sequence. Wherein, the functions of the convolution layer, the full connection layer and the softmax layer are the same as those of the multi-classification network. Specifically, the multi-classification network realizes 2 kinds of classification and outputs the classification result
Figure BDA0003799436560000071
Wherein the content of the first and second substances,
Figure BDA0003799436560000072
and the two classification networks only distinguish the defects from the normality.
A feature remapping network for modifying the classification result output by the multi-classification network according to the classification result output by the two-classification network, so that the normal class probability outsoft output by the multi-classification network during training mclsn Approaching the normal class probability of the two-class network output
Figure BDA0003799436560000073
Feature remapping is implemented. It can be understood that, in particular, the multi-classification network parameters are modified according to the classification result output by the two-classification network, so that the normal class probability outsoft output by the multi-classification network is obtained mclsn Approaching normal class probability of the two-class network output
Figure BDA0003799436560000074
It should be noted that the present application is not limited to the specific way of modifying the multi-class network parameters according to the classification result outputted by the two-class network, as long as the normal class probability outsoft outputted by the multi-class network can be achieved during training mclsn Approaching the normal class probability of the two-class network output
Figure BDA0003799436560000075
And (4) finishing.
And the loss calculation module is used for calculating the training loss and reversely adjusting the parameters of the multi-classification network so as to make the loss converge, and continuously training until the loss converges to an expected range, and then finishing the training.
The image multi-classification network structure based on the feature remapping can train two classification networks to obtain the trained two classification networks, then simultaneously input the samples into the two classification networks and the multi-classification networks during the training of the multi-classification networks, and enable the features output by normal classification samples in the training process in the multi-classification networks to approach the features of normal classification samples output by the 2 classification networks through the feature remapping method, so that the recognition accuracy of the multi-classification networks on the normal classification samples is improved, and particularly under the condition of less normal samples, the multi-classification networks with higher recognition accuracy on the normal classification samples can be obtained.
In an embodiment, the two-class network and the multi-class network use the same network model, which may be specifically a MobileNet-V3 network model, or may be a network model such as ShuffleNet, squeezeNet, xception, or the like. The same network model is beneficial to feature remapping, so that the features output by the normal category samples in the multi-classification network better approach the features of the normal category samples output by the 2-classification network.
In one embodiment, the two-class network and the multi-class network share a sample input end, when the multi-class network is trained, the structural parameters of the 2-class network need to be frozen, and then the images input into the multi-class network are simultaneously input into the 2-class network through the same input end to predict normal class information.
In one embodiment, with continued reference to FIG. 1, the feature remapping network includes a weight parameter adjustment module and a normal class probability correction module.
Wherein, the weight parameter adjusting module is used for obtaining the output result [ out ] of the multi-classification network full connection layer mcls0 ,out mcls1 ,out mcls2 ,……,out mclsn-1 ,out mclsn ]And output result of full connection layer of two-classification network
Figure BDA0003799436560000087
And outputting a weight parameter k through a full connection layer after splicing. out mclsi And outsoft mclsi Corresponding to out mclsi After the normalization processing of softmax, the product is outsoft mclsi
Specifically, the weight parameter adjusting module may include a feature concatenation layer (concat) and a plurality of full connection layers connected in sequence, the feature concatenation layer being connected to an output of the full connection layer of the multi-classification network and an output of the full connection layer of the binary network, respectively, to obtain the information of the multi-classification network and the full connection layer of the binary network, respectively
Figure BDA0003799436560000081
Figure BDA0003799436560000082
And
Figure BDA0003799436560000083
splicing is carried out to obtain splicing characteristics
Figure BDA0003799436560000084
Figure BDA0003799436560000085
And then outputting a 1-dimensional characteristic value which is a weight parameter k after passing through a plurality of full connection layers. In fig. 1, specifically, for example, m =17, a 19-dimensional feature is obtained after feature splicing, the 19-dimensional feature passes through a first full-connected layer, an output dimension is 17, and passes through a second full-connected layer, and a weight parameter k with a dimension of 1 is output.
The normal class probability modification module is used for utilizing a weight parameter k to modify the normal class probability out mclsn Corrected to out mclsn '. Specifically, the correction can be performed by the following correction formula:
Figure BDA0003799436560000086
in this embodiment, the weight parameter k is used to control the mapping ratio, and is used when k is 0
Figure BDA0003799436560000091
Is not mapped.
The above correction formula is derived as follows:
when m =3, let the output of the 2-class network after softmax be [ ab](where A represents the probability of defect class and B represents the probability of normal class), the logic value (output result of the full link layer) output by the 3-class network is [ x ] 1 ,x 2 ,x 3 ](x 1 And x 2 Logic values, x, representing two different classes of defects 3 Logic value representing normal category), and after softmax, [ X, Y, Z]. The most direct way to achieve the feature approximation is to approximate the probabilities of the corresponding classes, and since softmax (x) is a monotonically increasing function, the change of the probabilities can be further reflected on the logic value of the network output.
Let the probability of the 3 classified network output after the feature remapping is completed be [ X, Y, Z'],lThe value of ogit is [ x ] 1 ,x 2 ,x′ 3 ]。
The first step is as follows: the normal class probability in the 3-class network is close to the normal class probability in the 2-class network:
Figure BDA0003799436560000092
for convenience of calculation, when x 3 When > 0, map it to value x' 3 The value range is specified to be (0, infinity); when x is 3 If < 0, map it to value x' 3 The value range is specified to be (-infinity, 0); when x is 3 And 0, its value is not changed. The characteristic of monotonic increase by softmax (x) is given by:
Figure BDA0003799436560000093
Figure BDA0003799436560000094
synthesizing the formulas (1) - (3) to obtain x 'after mapping' 3
Figure BDA0003799436560000095
Figure BDA0003799436560000096
Unifying the formula (4) and the formula (5) and preventing the 3-class network from losing and mutating due to the overlarge difference of the output characteristics of the normal samples in the 3-class network and the 2-class network, finally adding momentum operation into the formulas (4) and (5), so that the addition approximate solution can be obtained:
Figure BDA0003799436560000101
to be provided with
Figure BDA0003799436560000102
Namely, it is
Figure BDA0003799436560000103
For example, (1) if x 3 X 'when k is greater than 0' 3 >x 3 In this case, Z can be approached to Z'; (2) If x 3 X 'when k is less than 0' 3 >x 3 In this case, Z can be approached to Z'; in that
Figure BDA0003799436560000104
In addition to the above two cases, the other cases may cause the feature output by the 3-class network to be far from the feature output by the 2-class network, but based on the network connection relationship in the present embodiment, the weight coefficient k may be adjusted by training loss feedback, that is, the degree of approximation may be changed by training the weight coefficient, and finally the magnitude of the weight coefficient is controlled by back propagation of the loss. At the moment, the multi-classification network not only has the capability of judging whether the sample belongs to the normal class according to the network parameter of the multi-classification network, but also has the capability of learning and judging the normal class sample from the 2-classification network, and finally the Z direction is approached.
In fig. 1, m =17 is specifically taken as an example. 2 output of the classification network softmax is
Figure BDA0003799436560000105
The logic value of the 17 classification network output is out _ logic:
Figure BDA0003799436560000106
Figure BDA0003799436560000107
the result after softmax is outsoft _17cls:
Figure BDA0003799436560000108
Figure BDA0003799436560000109
the mapping method comprises the following steps:
Figure BDA00037994365600001010
analogy between equation (7) and equations (1) - (6) yields equation (8):
Figure BDA00037994365600001011
wherein
Figure BDA00037994365600001012
To use the pair of formula (8)
Figure BDA00037994365600001013
The updated value of logit for the normal class sample output (at this time softmax has not elapsed).
By analogy, for m classification networks, the correction formula is:
Figure BDA0003799436560000111
meanwhile, the loss calculation module comprises a feature difference loss module and a classification loss module. The characteristic difference loss module is used for calculating the characteristic difference loss between the input characteristics of the multi-classification network full-connection layer and the input characteristics of the two-classification network full-connection layer. In an embodiment, the characteristic difference loss may be calculated by using KL divergence as a loss function, and specifically, JS divergence, wasserstein distance, and the like may be used. And (3) reducing the dimension of the output features by adopting Principal Component Analysis (PCA), calculating the similarity of the features by adopting KL divergence, and reversely propagating the similarity as loss, so that the feature information output by the normal class samples in the multi-classification network is close to the feature information output by the normal class samples in the 2-classification network. The classification loss module is used for calculating the classification loss of the multi-classification network after feature remapping, namely the difference between a predicted value and a true value, and the following specific method can use the Focal loss. And the loss calculation module takes the sum of the classification loss and the characteristic difference loss as the training loss, and finishes the training when the training loss converges to the expected range.
In an embodiment, the image multi-classification network structure further includes a feature remapping control module for determining a normal class probability of the two-classification network
Figure BDA0003799436560000112
Whether the normal class probability of the two-class network is greater than the preset value or not
Figure BDA0003799436560000113
When the probability is larger than the preset value, the access characteristic remapping network carries out characteristic remapping, and when the probability of the normal category is larger than the preset value
Figure BDA0003799436560000114
And if the characteristic is not greater than the preset value, cutting off the characteristic remapping network and not carrying out characteristic remapping. The preset value can be flexibly set, the larger the preset value is, the larger the probability value of the normal sample of the used two-classification network is, the higher the correction precision is, but the correction frequency is reduced. In the embodiment, through comprehensive analysis, the overall correction effect is better when the preset value is 0.9.
In one embodiment, as shown in fig. 1, the multi-class network includes a backbone network and a feature transfer enhancement module connected to the backbone network, the backbone network includes a plurality of convolutional layers, full connection layers and softmax layers connected in sequence, and the feature enhancement module includes a medium-high frequency (edge of shallow layer, texture information) feature transfer enhancement module (FTSMA) and a medium-low frequency (deep semantic information) feature transfer enhancement module (FTSMB).
As shown in fig. 2, a medium-high frequency feature transfer enhancement module (FTSMA) is connected across the front convolutional layer J of the backbone network Front part Two ends of the main network comprise a 1 multiplied by 1 convolutional layer, a middle-high frequency domain channel attention module and a first cross attention module, the characteristics in the main network are divided into three paths when reaching the input end of the middle-high frequency characteristic transmission enhancement module, and the first path passes through a front end convolutional layer J Front side Go on to the frontAnd transmitting, wherein the second path sequentially passes through the 1 × 1 convolutional layer, the middle-high frequency domain channel attention module and the 1 × 1 convolutional layer to extract high-frequency information of the image and then is converged into the trunk path, and the third path sequentially passes through the 1 × 1 convolutional layer, the first cross attention module and the 1 × 1 convolutional layer to extract long-distance dependency among pixels at different positions and then is converged into the trunk path. In particular, the medium-high frequency feature transfer enhancement module may be located at a 1/3 location of the multi-class network.
As shown in FIG. 3, the middle and low frequency feature transfer enhancement module is connected across the back convolution layer J of the backbone network Rear end Two ends of the main network comprise a 1 multiplied by 1 convolutional layer, a middle and low frequency domain channel attention module and a second cross attention module, the characteristics in the main network are divided into three paths when reaching the input end of the middle and low frequency characteristic transmission enhancement module, and the first path passes through a back-end convolutional layer J Rear end And continuously transmitting the image, wherein the second path sequentially passes through the 1 × 1 convolutional layer, the middle-low frequency domain channel attention module and the 1 × 1 convolutional layer to extract low-frequency information of the image and then is converged into the trunk path, and the third path sequentially passes through the 1 × 1 convolutional layer, the second cross attention module and the 1 × 1 convolutional layer to extract long-distance dependency among pixels at different positions and then is converged into the trunk path. In particular, the medium and low frequency feature delivery enhancement module may be located at a location of the multi-class network 2/3.
In the embodiment, a feature enhancement module is added, so that the problem of loss of features in the network transmission process can be solved, firstly, a medium-high frequency feature transmission enhancement module is adopted to extract the long-distance dependency relationship between the high-frequency information of the image and pixels at different positions, and the loss of the high-frequency information (textures, edges and the like) in the network feature transmission process is reduced; and then, a long-distance dependency relationship between the low-frequency information of the image and pixels at different positions is extracted by adopting a medium-low frequency characteristic transmission enhancement module, so that the phenomenon that the network excessively focuses on the high-frequency information and loses too much low-frequency information is prevented.
In the image multi-classification network structure based on the feature remapping, the complex classification network is approached to the simple classification network by using the feature remapping method, so that the complex classification network learns the key information extracted by the simple classification network. The method adopts a double-branch structure to realize the learning of the characteristics of the multi-classification network, and uses a characteristic transfer enhancement module (FTSM) to enable the network to pay attention to high-frequency information in the forward propagation process and pay attention to the attention relationship among the defect positions in the image, thereby improving the characteristic transfer capability of the network.
Correspondingly, the application also relates to a feature remapping-based image multi-classification network training method, which can finish training by depending on the above-described feature remapping-based image multi-classification network structure or not depending on the above-described feature remapping-based image multi-classification network structure. As shown in fig. 4, the training method includes:
step S100: and simultaneously inputting the samples into a multi-classification network to be trained and a trained two-classification network, and training the multi-classification network.
Specifically, the sample is input into a trained two-classification network, and a classification result is output after the sample passes through a softmax layer of the two-classification network
Figure BDA0003799436560000131
Wherein, in the step (A),
Figure BDA0003799436560000132
Figure BDA0003799436560000133
respectively representing a defect class probability and a normal class probability. Inputting samples into a multi-classification network to train the multi-classification network, wherein the multi-classification network realizes m classifications, and outputs a classification result [ outsoft ] after passing through a softmax layer of the multi-classification network mcls0 ,outsoft mcls1 ,outsoft mcls2 ,……,outsoft mclsn-1 ,outsoft mclsn ]Wherein, outsoft mclsn Representing normal class probability, the rest representing different defect probability, n = m-1, m ≧ 3.
In one embodiment, a distillation network with the same random sampling number in a data set and the same defect-normal sample data is input into a two-class network, and the two-class network is trained by combining the trained distillation network. Specifically, the distillation network may be an EfficientNet-B6 network model, or a model having a large number of layers such as ResNet-101, resNet-152, and SE-Net.
When a multi-classification network is trained, all data are divided into a training set and a verification set according to the proportion of 7: 3, wherein the training set is used for training the network, and the verification set is used for selecting a better training result; after the training data is set, the image to be trained is input into the two branches of the model. Alternatively, the ratio of training set to validation set may be divided into 6: 4, 7: 3, 8: 2, 9: 1, etc., depending on the amount of total data.
In one embodiment, before a network is trained by using image data in a sample, data enhancement is realized by preprocessing the image data, so that the anti-interference capability and the generalization capability of the network are improved. Optionally, the data enhancement mode includes geometric transformation, such as flipping, rotating, clipping, scaling, translating, and dithering; pixel transformation, such as salt and pepper noise and Gaussian noise, is performed to perform Gaussian blur, adjust HSV contrast, adjust brightness and saturation, equalize histograms, adjust white balance and the like.
Step S200: and correcting the classification result output by the multi-classification network by using the classification result output by the two-classification network, so that the normal class probability output by the multi-classification network during training approaches to the normal class probability output by the two-classification network, and the feature remapping is realized.
In a specific embodiment, step S200 includes:
step S210: and acquiring the output result of the multi-classification network full-connection layer and the output result of the two-classification network full-connection layer, splicing, and outputting a weight parameter k through the full-connection layer.
Wherein, the output result of the multi-classification network full-connection layer can be expressed as [ out mcls0 ,out mcls1 ,out mcls2 ,……,out mclsn-1 ,out mclsn ]The output result of the full-connection layer of the two-classification network can be expressed as
Figure BDA0003799436560000143
Step S220: normal class probability out is divided by weight parameter k mclsn Corrected to out mclsn ′。
Wherein, the correction formula is:
Figure BDA0003799436560000141
the derivation of the correction formula is described above and will not be described herein.
And through the characteristic remapping process, when the multi-classification network is trained, the logic value output by the normal class sample in the multi-classification network is supervised by using the logic value output by the normal class sample in the trained 2-classification network, and the logic value output by the normal class sample in the multi-classification network is updated according to a trainable weight coefficient in the supervision process.
In an embodiment, before performing step S200, the method further includes:
judging normal class probability
Figure BDA0003799436560000142
Whether the class is greater than a preset value or not, when the class is normal
Figure BDA0003799436560000151
When the probability is larger than the preset value, the characteristic remapping is carried out, and when the probability of the normal category is larger than the preset value
Figure BDA0003799436560000152
And when the value is not greater than the preset value, the characteristic remapping is not carried out. Specifically, the preset value may be 0.9.
Step S300: calculating training loss and reversely adjusting parameters of the multi-classification network, continuing training until the loss is converged to an expected range, and ending the training.
Specifically, the training loss is calculated, including:
and calculating the characteristic difference loss between the input characteristics of the multi-classification network full-connection layer and the input characteristics of the two classification network full-connection layers and the classification loss of the multi-classification network after characteristic remapping, and finishing training after the sum of the classification loss and the characteristic difference loss converges to an expected range. The selection of the specific loss function can be referred to the above description, and will not be described herein.
According to the image multi-classification network training method based on the feature remapping, during the training of the multi-classification network, the normal sample classification result of the multi-classification network is corrected through the trained two classification networks, the recognition accuracy of the multi-classification network on the normal classification sample can be improved, and particularly under the condition that the number of the normal samples is small, the multi-classification network with high recognition accuracy on the normal classification sample can be obtained.
Correspondingly, the application also relates to an image multi-classification method, which is used for inputting the images into a trained multi-classification network to obtain a classification result, wherein the multi-classification network is obtained by training according to the training method. Specifically, the input image may be an image of the drainage pipeline, and the drainage pipeline defect is classified through the multi-classification network. It can be understood that the method is not limited to classification of the drainage pipeline defects, and can also be used for classification of defects in other occasions.
In the following, 17 kinds of pipeline defect images are taken as samples, wherein the original size of the acquired image is 480 × 270 pixels, and 25939 images are calculated in total, and a multi-classification network model trained by different methods is input, and the following results are compared to show the effectiveness of the scheme. Wherein, baseline represents a multi-classification network test result without feature remapping, baseline + FL represents a test result using Focal Loss (FL) as a loss function, baseline + FL + FRM represents a multi-classification network with Feature Remapping (FRM), and Baseline + FL + FRM + FTEM represents a multi-classification network test result with feature remapping and an FTEM module.
The comparison results of the network model based on MobileNet-V3 are shown in the following table:
table one: comparison result when using MobileNet-V3 as basic network model
Method Normal class sample accuracy rate Average recall rate
Baseline 40.31% 99.70%
Baseline+FL 45.73% 99.33%
Baseline+FL+FRM 46.51% 99.36%
Baseline+FL+FRM+FTEM 49.61% 99.27%
The comparison results when using ShuffLeNet-V2 as the basic network model are shown in the following table two:
table two: comparison result when using ShuffleNet-V2 as basic network model
Method Normal class sample accuracy rate Average recall rate
Baseline 32.56% 99.67%
Baseline+FL 51.16% 98.33%
Baseline+FL+FRM 55.03% 96.63%
Baseline+FL+FRM+FTEM 60.47% 94.79%
The comparison results when the EfficientNet-B0 is used as a basic network model are shown in the third table:
a third table: comparison result when EfficientNet-B0 is used as basic network model
Method Normal class sample accuracy rate Average recall rate
Baseline 33.33% 99.90%
Baseline+FL 42.64% 99.67%
Baseline+FL+FRM 66.67% 97.06%
Baseline+FL+FRM+FTEM 69.77% 92.57%
The results show that the accuracy of the multi-classification network on normal classification samples can be improved by training the multi-classification network through a feature remapping method. Moreover, after the FTEM module is added to the multi-classification network, the effect can be further improved, and therefore the effectiveness of the scheme is verified.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. An image multi-classification network structure based on feature remapping, comprising:
the multi-classification network is used for realizing m kinds of classification after the samples input into the multi-classification network sequentially pass through the convolutional layer, the full connection layer and the softmax layer and outputting a classification result [ outsoft mcls0 ,outsoft mcls1 ,outsoft mcls2 ,……,outsoft mclsn-1 ,outsoft mclsn ]Wherein, outsoft mclsn Representing normal class probability, the rest representing different defect probabilityThe ratio n = m-1, m is more than or equal to 3;
two classification networks, wherein the samples input into the two classification networks sequentially pass through the convolutional layer, the full connection layer and the softmax layer to realize 2 kinds of classification, and classification results are output
Figure FDA0003799436550000011
Wherein, the first and the second end of the pipe are connected with each other,
Figure FDA0003799436550000012
respectively representing defect class probability and normal class probability;
the characteristic remapping network is used for correcting the classification result output by the multi-classification network according to the classification result output by the two classification networks so as to enable the multi-classification network to output a normal class probability outsoft during training mclsn Approaching the normal class probability of the two-class network output
Figure FDA0003799436550000013
Implementing feature remapping;
and the loss calculation module is used for calculating the training loss and reversely adjusting the parameters of the multi-classification network so as to converge the loss.
2. The feature remapping based image multi-classification network structure as recited in claim 1, wherein said two classification networks and said multi-classification network share a sample input.
3. The feature remapping based image multi-classification network structure of claim 1, wherein said feature remapping network includes:
a weight parameter adjusting module for obtaining the output result [ out ] of the multi-classification network full-connection layer mcls0 ,out mcls1 ,out mcls2 ,……,out mclsn-1 ,out mclsn ]Output result of full connection layer of two-classification network
Figure FDA0003799436550000014
After splicing, outputting a weight parameter k through a full connection layer;
a normal class probability correction module for correcting the normal class probability out by using the weight parameter k mclsn Corrected to out mclsn ', wherein,
Figure FDA0003799436550000015
the loss calculation module comprises a feature difference loss module and a classification loss module, the feature difference loss module is used for calculating feature difference loss between input features of a multi-classification network full-connection layer and input features of a two-classification network full-connection layer, the classification loss module is used for calculating classification loss of the multi-classification network after feature remapping, and the loss calculation module takes the sum of the classification loss and the feature difference loss as training loss.
4. The feature remapping based image multi-classification network structure of claim 1, said multi-classification network including a backbone network and a feature transfer enhancing module connected to said backbone network, said backbone network including a plurality of convolutional layers, fully-connected layers and softmax layers connected in sequence, said feature enhancing module including:
a middle-high frequency characteristic transfer enhancing module for extracting shallow layer edge and texture information, a front-end convolution layer J bridged on the trunk network Front part Two ends of the main network comprise a 1 multiplied by 1 convolutional layer, a middle-high frequency domain channel attention module and a first cross attention module, the characteristics in the main network are divided into three paths when reaching the input end of the middle-high frequency characteristic transmission enhancement module, and the first path passes through a front end convolutional layer J Front side The second path is sequentially passed through a 1 × 1 convolutional layer, a middle-high frequency domain channel attention module and the 1 × 1 convolutional layer to extract high-frequency information of the image and then is converged into the trunk path, and the third path is sequentially passed through the 1 × 1 convolutional layer, a first cross attention module and the 1 × 1 convolutional layer to extract long-distance dependency among pixels at different positions and then is converged into the trunk path;
a middle and low frequency feature transfer enhancement module for extracting deep semantic information and bridged on the back end convolution layer J of the backbone network Rear end Two ends of the main network comprise a 1 multiplied by 1 convolutional layer, a middle and low frequency domain channel attention module and a second cross attention module, the characteristics in the main network are divided into three paths when reaching the input end of the middle and low frequency characteristic transmission enhancement module, and the first path passes through a back-end convolutional layer J Rear end And continuously transmitting the image, wherein the second path sequentially passes through the 1 × 1 convolutional layer, the middle-low frequency domain channel attention module and the 1 × 1 convolutional layer to extract low-frequency information of the image and then is converged into the trunk path, and the third path sequentially passes through the 1 × 1 convolutional layer, the second cross attention module and the 1 × 1 convolutional layer to extract long-distance dependency among pixels at different positions and then is converged into the trunk path.
5. The feature remapping-based image multi-classification network structure of claim 1, further comprising a feature remapping control module for determining normal class probabilities
Figure FDA0003799436550000021
Whether the class is greater than a preset value or not, when the class is normal
Figure FDA0003799436550000031
When the probability is larger than the preset value, the access characteristic remapping network carries out characteristic remapping, and when the probability of the normal category is larger than the preset value
Figure FDA0003799436550000032
And when the characteristic is not greater than the preset value, the characteristic remapping network is cut off and the characteristic remapping is not carried out.
6. An image multi-classification network training method based on feature remapping is characterized by comprising the following steps:
inputting samples into a multi-classification network to train the multi-classification network, wherein the multi-classification network realizes m classifications, and outputs a classification result [ outsoft ] after passing through a softmax layer of the multi-classification network mcls0 ,outsoft mcls1 ,outsoft mcls2 ,……,outsoft mclsn-1 ,outsoft mclsn ]Wherein, outsoft mclsn Representing normal class probability, and the rest representing different defect probabilities respectively, wherein n = m-1, and m is more than or equal to 3;
inputting the same sample into the trained two-classification network, and outputting a classification result after passing through a softmax layer of the two-classification network
Figure FDA0003799436550000033
Wherein the content of the first and second substances,
Figure FDA0003799436550000034
respectively representing defect class probability and normal class probability;
correcting the classification result output by the multi-classification network by using the classification result output by the two-classification network, so that the normal class probability outsoft output by the multi-classification network during training mclsn Approaching normal class probability of the two-class network output
Figure FDA0003799436550000035
Implementing feature remapping;
calculating training loss and reversely adjusting parameters of the multi-classification network, continuing training until the loss is converged to an expected range, and ending the training.
7. The method as claimed in claim 6, wherein the modifying the classification result output by the multi-classification network with the classification result output by the two-classification network comprises:
obtaining output result of multi-classification network full connection layer mcls0 ,out mcls1 ,out mcls2 ,……,out mclsn-1 ,out mclsn ]Output result of full connection layer of two-classification network
Figure FDA0003799436550000036
Outputting a weight parameter k through a full connection layer after splicing;
normal class probability out is divided by weight parameter k mclsn Corrected to out mclsn ', wherein,
Figure FDA0003799436550000041
calculating a training loss comprising:
and calculating the characteristic difference loss between the input characteristics of the full connection layer of the multi-classification network and the input characteristics of the full connection layer of the two classification networks, and the classification loss of the multi-classification network after characteristic remapping, and ending the training after the sum of the classification loss and the characteristic difference loss is converged to an expected range.
8. The method of claim 6, wherein before the classification result outputted by the multi-classification network is modified by the classification result outputted by the two-classification network, the normal class probability is determined
Figure FDA0003799436550000042
Whether it is greater than the preset value, when the normal classification probability
Figure FDA0003799436550000043
When the probability is larger than the preset value, the characteristic remapping is carried out, and when the probability of the normal category is larger than the preset value
Figure FDA0003799436550000044
And when the value is not greater than the preset value, the characteristic remapping is not carried out.
9. The method according to claim 8, wherein the predetermined value is 0.9.
10. An image multi-classification method, comprising:
inputting the image into a trained multi-classification network to obtain a classification result, wherein the multi-classification network is obtained by training according to the feature remapping-based image multi-classification network training method of any one of claims 6 to 9.
CN202210978719.7A 2022-08-16 2022-08-16 Image multi-classification network structure based on feature remapping and training method Pending CN115439681A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210978719.7A CN115439681A (en) 2022-08-16 2022-08-16 Image multi-classification network structure based on feature remapping and training method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210978719.7A CN115439681A (en) 2022-08-16 2022-08-16 Image multi-classification network structure based on feature remapping and training method

Publications (1)

Publication Number Publication Date
CN115439681A true CN115439681A (en) 2022-12-06

Family

ID=84242792

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210978719.7A Pending CN115439681A (en) 2022-08-16 2022-08-16 Image multi-classification network structure based on feature remapping and training method

Country Status (1)

Country Link
CN (1) CN115439681A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117316286A (en) * 2023-11-29 2023-12-29 广州燃石医学检验所有限公司 Data processing method, device and storage medium for tumor tracing

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117316286A (en) * 2023-11-29 2023-12-29 广州燃石医学检验所有限公司 Data processing method, device and storage medium for tumor tracing
CN117316286B (en) * 2023-11-29 2024-02-27 广州燃石医学检验所有限公司 Data processing method, device and storage medium for tumor tracing

Similar Documents

Publication Publication Date Title
CN109919108B (en) Remote sensing image rapid target detection method based on deep hash auxiliary network
CN111325751B (en) CT image segmentation system based on attention convolution neural network
CN108509978B (en) Multi-class target detection method and model based on CNN (CNN) multi-level feature fusion
CN113313657B (en) Unsupervised learning method and system for low-illumination image enhancement
CN111126386B (en) Sequence domain adaptation method based on countermeasure learning in scene text recognition
CN110048827B (en) Class template attack method based on deep learning convolutional neural network
CN110109060B (en) Radar radiation source signal sorting method and system based on deep learning network
CN110532932B (en) Method for identifying multi-component radar signal intra-pulse modulation mode
Liu et al. Remote sensing image change detection based on information transmission and attention mechanism
CN108694346B (en) Ship radiation noise signal identification method based on two-stage CNN
CN111862093A (en) Corrosion grade information processing method and system based on image recognition
CN112396110A (en) Method for generating anti-cascade network augmented image
CN112434732A (en) Deep learning classification method based on feature screening
CN113408605A (en) Hyperspectral image semi-supervised classification method based on small sample learning
CN110245620B (en) Non-maximization inhibition method based on attention
CN111932577B (en) Text detection method, electronic device and computer readable medium
CN113657491A (en) Neural network design method for signal modulation type recognition
CN112651917A (en) Space satellite low-illumination image enhancement method based on generation countermeasure network
CN111767860A (en) Method and terminal for realizing image recognition through convolutional neural network
CN112580661A (en) Multi-scale edge detection method under deep supervision
CN115439681A (en) Image multi-classification network structure based on feature remapping and training method
CN115564983A (en) Target detection method and device, electronic equipment, storage medium and application thereof
CN115272776A (en) Hyperspectral image classification method based on double-path convolution and double attention and storage medium
CN115421141A (en) Two-stage self-adaptive sampling unbalanced SAR target identification method
CN113989256A (en) Detection model optimization method, detection method and detection device for remote sensing image building

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination