CN115439681A - Image multi-classification network structure based on feature remapping and training method - Google Patents
- Publication number
- CN115439681A (application CN202210978719.7A)
- Authority
- CN
- China
- Prior art keywords
- classification
- network
- classification network
- remapping
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/7715—Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Abstract
The invention discloses an image multi-classification network structure based on feature remapping, together with a training method. The normal-sample classification result of the multi-classification network is corrected by a trained two-classification network, which improves the recognition accuracy of the multi-classification network on normal-class samples; in particular, a multi-classification network with high recognition accuracy on normal-class samples can be obtained even when normal samples are scarce.
Description
Technical Field
The invention belongs to the technical field of computer vision deep learning, and particularly relates to an image multi-classification network structure based on feature remapping and a training method.
Background
In recent years, with ever-expanding urban construction, the task of inspecting underground drainage pipelines has become increasingly burdensome. Traditional underground drainage pipeline inspection requires a large amount of manpower, which poses challenges in both time and efficiency. Detecting drainage pipelines with deep learning improves efficiency and saves labor cost.
In typical drainage pipeline detection, attention focuses mainly on the accuracy of defect-class samples while the accuracy of normal-class samples is ignored, so a large number of normal samples are falsely detected as defect samples. The reason is that the industry has paid excessive attention to defect classes when collecting drainage pipeline data: defect samples dominate the training set while normal-class samples are under-collected, and data collection is never truly complete. The problem can therefore be addressed from the side of the deep learning network.
The 2-classification task is simple: only the two categories of normal and defect need to be judged, and for the same network complexity, the simpler the task, the stronger the network's information extraction capability and the more accurate its result. Hence the conventional approach to improving accuracy is to train two networks separately: a 2-classification network is trained first to extract defect samples, and the samples it judges defective are then input into a multi-classification network to predict the specific defect category. Although using the 2-classification network improves accuracy on normal-class samples, its own accuracy cannot be guaranteed to reach 100%, so passing samples successively through the two networks accumulates a large error.
Disclosure of Invention
In view of the above defects or improvement requirements of the prior art, the present invention provides an image multi-classification network structure based on feature remapping and a training method, aiming to use a two-classification network to correct the multi-classification network during its training, thereby improving the accuracy of the multi-classification network in recognizing normal samples.
To achieve the above object, according to an aspect of the present invention, there is provided an image multi-classification network structure based on feature remapping, including:
the multi-classification network, used to realize m-class classification after an input sample sequentially passes through convolutional layers, a fully connected layer and a softmax layer, outputting the classification result [outsoft_mcls0, outsoft_mcls1, outsoft_mcls2, …, outsoft_mclsn−1, outsoft_mclsn], where outsoft_mclsn represents the normal-class probability and the rest represent the probabilities of the different defect classes, with n = m − 1 and m ≥ 3;
the two-classification network, used to realize 2-class classification after an input sample sequentially passes through convolutional layers, a fully connected layer and a softmax layer, outputting the classification result [outsoft_2cls0, outsoft_2cls1], where outsoft_2cls0 and outsoft_2cls1 represent the defect-class probability and the normal-class probability respectively;
the feature remapping network, used to correct the classification result output by the multi-classification network according to the classification result output by the two-classification network, so that during training the normal-class probability outsoft_mclsn output by the multi-classification network approaches the normal-class probability outsoft_2cls1 output by the two-classification network, implementing feature remapping;
and the loss calculation module, used to calculate the training loss and adjust the parameters of the multi-classification network by back-propagation so that the loss converges.
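The interaction of the four components can be sketched roughly as follows. The function names and the multiplicative logit update are illustrative assumptions (the patent's correction formula is rendered only as an image in the source); only the general mechanism, nudging the multi-class normal logit so its probability approaches the binary network's normal probability, is taken from the text:

```python
import math

def softmax(logits):
    # numerically stable softmax
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Multi-classification network output (m = 4 classes: 3 defect types + normal).
# The last entry, index n = m - 1, is the normal-class logit.
out_mcls = [1.0, 0.5, -0.2, 0.3]   # logits from the m-class fully connected layer
out_2cls = [0.1, 2.0]              # logits from the binary fully connected layer
outsoft_2cls = softmax(out_2cls)   # [defect prob, normal prob]

# Feature remapping: nudge the multi-class normal logit so its softmax
# probability moves toward the binary network's normal probability.
# This multiplicative update is a stand-in, NOT the patent's formula;
# k plays the role of the weight parameter.
k = 0.5
remapped = out_mcls[:]
remapped[-1] = out_mcls[-1] * math.exp(k)  # hypothetical, sign-preserving mapping

before = softmax(out_mcls)[-1]
after = softmax(remapped)[-1]
print(before < after)  # True: raising the normal logit raises its probability
```

A classification loss would then be computed on the remapped probabilities and back-propagated into the multi-classification network, while the binary network stays frozen.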
In one embodiment, the two-class network and the multi-class network share a sample input.
In one embodiment, the feature remapping network comprises:
a weight parameter adjusting module for obtaining the output result [out_mcls0, out_mcls1, out_mcls2, …, out_mclsn−1, out_mclsn] of the fully connected layer of the multi-classification network and the output result [out_2cls0, out_2cls1] of the fully connected layer of the two-classification network, splicing them, and then outputting a weight parameter k through a fully connected layer;
a normal class probability correction module for correcting the normal-class probability out_mclsn to out'_mclsn by means of the weight parameter k.
the loss calculation module comprises a feature difference loss module and a classification loss module, the feature difference loss module is used for calculating feature difference loss between input features of a multi-classification network full-connection layer and input features of a two-classification network full-connection layer, the classification loss module is used for calculating classification loss of the multi-classification network after feature remapping, and the loss calculation module takes the sum of the classification loss and the feature difference loss as training loss.
In one embodiment, the multi-classification network includes a backbone network and feature transfer enhancement modules connected to it. The backbone network includes a plurality of convolutional layers, a fully connected layer and a softmax layer connected in sequence, and the feature transfer enhancement modules include:
the middle-high frequency feature transfer enhancement module, used to extract shallow edge and texture information. It is bridged across the two ends of a front-end convolutional layer J_front of the backbone network and comprises 1 × 1 convolutional layers, a middle-high frequency domain channel attention module and a first cross attention module. Features in the backbone split into three paths on reaching the module's input: the first path continues forward through the front-end convolutional layer J_front; the second path passes sequentially through a 1 × 1 convolutional layer, the middle-high frequency domain channel attention module and a 1 × 1 convolutional layer to extract the high-frequency information of the image, then merges into the trunk path; the third path passes sequentially through a 1 × 1 convolutional layer, the first cross attention module and a 1 × 1 convolutional layer to extract long-distance dependencies between pixels at different positions, then merges into the trunk path;
the middle-low frequency feature transfer enhancement module, used to extract deep semantic information. It is bridged across the two ends of a back-end convolutional layer J_back of the backbone network and comprises 1 × 1 convolutional layers, a middle-low frequency domain channel attention module and a second cross attention module. Features in the backbone split into three paths on reaching the module's input: the first path continues forward through the back-end convolutional layer J_back; the second path passes sequentially through a 1 × 1 convolutional layer, the middle-low frequency domain channel attention module and a 1 × 1 convolutional layer to extract the low-frequency information of the image, then merges into the trunk path; the third path passes sequentially through a 1 × 1 convolutional layer, the second cross attention module and a 1 × 1 convolutional layer to extract long-distance dependencies between pixels at different positions, then merges into the trunk path.
In one embodiment, the system further comprises a feature remapping control module for judging whether the normal-class probability outsoft_2cls1 output by the two-classification network is greater than a preset value: when it is greater, the feature remapping network is connected and feature remapping is performed; when it is not, the feature remapping network is cut off and no feature remapping is performed.
According to another aspect of the present invention, there is provided a method for training an image multi-classification network based on feature remapping, comprising:
inputting a sample into the multi-classification network to train it, where the multi-classification network realizes m-class classification and outputs, after its softmax layer, the classification result [outsoft_mcls0, outsoft_mcls1, outsoft_mcls2, …, outsoft_mclsn−1, outsoft_mclsn], where outsoft_mclsn represents the normal-class probability and the rest represent the probabilities of the different defect classes, with n = m − 1 and m ≥ 3;
inputting the same sample into the trained two-classification network, which outputs, after its softmax layer, the classification result [outsoft_2cls0, outsoft_2cls1], where outsoft_2cls0 and outsoft_2cls1 represent the defect-class probability and the normal-class probability respectively;
correcting the classification result output by the multi-classification network with the classification result output by the two-classification network, so that during training the normal-class probability outsoft_mclsn output by the multi-classification network approaches the normal-class probability outsoft_2cls1 output by the two-classification network, implementing feature remapping;
calculating the training loss and adjusting the parameters of the multi-classification network by back-propagation; training continues until the loss converges to the expected range, and then ends.
In one embodiment, correcting the classification result output by the multi-classification network with the classification result output by the two-classification network comprises:
obtaining the output result [out_mcls0, out_mcls1, out_mcls2, …, out_mclsn−1, out_mclsn] of the fully connected layer of the multi-classification network and the output result [out_2cls0, out_2cls1] of the fully connected layer of the two-classification network, splicing them, and outputting a weight parameter k through a fully connected layer;
correcting the normal-class probability out_mclsn to out'_mclsn by means of the weight parameter k.
In one embodiment, calculating the training loss comprises: calculating the feature-difference loss between the input features of the fully connected layer of the multi-classification network and those of the two-classification network, and the classification loss of the multi-classification network after feature remapping; training ends when the sum of the classification loss and the feature-difference loss converges to the expected range.
In one embodiment, before the classification result output by the multi-classification network is corrected with that of the two-classification network, whether the normal-class probability outsoft_2cls1 is greater than a preset value is judged; feature remapping is performed when outsoft_2cls1 is greater than the preset value, and is not performed otherwise.
In one embodiment, the preset value is 0.9.
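The gating rule above can be sketched directly; the function name is illustrative, not from the patent:

```python
def apply_feature_remapping(outsoft_2cls_normal, preset=0.9):
    # Feature remapping control: the multi-class result is only corrected
    # when the binary network's normal-class probability exceeds the preset
    # value (0.9 in the embodiment); "greater than", not "greater or equal".
    return outsoft_2cls_normal > preset

print(apply_feature_remapping(0.95))  # True: remapping network is connected
print(apply_feature_remapping(0.60))  # False: remapping network is cut off
```

A larger preset value makes corrections rarer but more reliable, matching the trade-off described later in the description.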
According to still another aspect of the present invention, there is provided an image multi-classification method including:
inputting an image into a trained multi-classification network to obtain a classification result, where the multi-classification network is obtained by training with the above image multi-classification network training method based on feature remapping.
In general, compared with the prior art, the above technical solutions conceived by the present invention can achieve the following beneficial effects:
in the invention, during the training of the multi-classification network, the trained two-classification network is used, and the features output by the normal class samples in the training process in the multi-classification network are approximated to the features of the normal class samples output by the 2-classification network through a feature remapping method. The task of the two-classification network is relatively simple, the simpler the task is, the stronger the information extraction capability of the network is, the more accurate the result is, and the recognition precision of the two-classification network on a normal sample is higher than that of a multi-classification network under the same sample. Therefore, in the application, during the training of the multi-classification network, the normal sample classification result of the multi-classification network is corrected through the trained two classification networks, so that the recognition accuracy of the multi-classification network on the normal class sample can be improved, and particularly under the condition of fewer normal samples, the multi-classification network with higher recognition accuracy on the normal class sample can be obtained.
Drawings
FIG. 1 is a schematic structural diagram of an image multi-classification network structure based on feature remapping according to an embodiment;
FIG. 2 is a block diagram of an embodiment of a medium-high frequency feature transfer enhancement module;
FIG. 3 is a block diagram of a low and medium frequency feature transfer enhancement module according to an embodiment;
FIG. 4 is a flowchart illustrating steps of a method for training an image multi-classification network based on feature remapping according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Fig. 1 is a schematic structural diagram of an image multi-classification network structure based on feature remapping in an embodiment. The image multi-classification network structure comprises four functional structures which are a multi-classification network, a two-classification network, a feature remapping network and a loss calculation module respectively.
The multi-classification network comprises convolutional layers, a fully connected layer and a softmax layer connected in sequence. The convolutional layers extract image features; the fully connected layer computes the logit value (un-normalized probability value) of each class; the softmax layer normalizes the logit values of all classes to obtain the final classification result, i.e. the probability of each class. Specifically, the multi-classification network realizes m-class classification, m ≥ 3, and outputs the classification result [outsoft_mcls0, outsoft_mcls1, outsoft_mcls2, …, outsoft_mclsn−1, outsoft_mclsn], where outsoft_mclsi indicates the probability of the (i + 1)-th class. In particular, outsoft_mclsn represents the normal-class probability and the rest represent different defect probabilities: e.g. outsoft_mcls0 is the probability of the type-1 defect, outsoft_mcls1 the probability of the type-2 defect, and so on up to outsoft_mclsn−1, the probability of the type-n defect, with n = m − 1.
The two-classification network likewise comprises convolutional layers, a fully connected layer and a softmax layer connected in sequence, whose functions match those in the multi-classification network. Specifically, the two-classification network realizes 2-class classification and outputs the classification result [outsoft_2cls0, outsoft_2cls1], where outsoft_2cls0 and outsoft_2cls1 represent the defect-class and normal-class probabilities respectively; the two-classification network only distinguishes defect from normal.
The feature remapping network corrects the classification result output by the multi-classification network according to the classification result output by the two-classification network, so that during training the normal-class probability outsoft_mclsn output by the multi-classification network approaches the normal-class probability outsoft_2cls1 output by the two-classification network, implementing feature remapping. It can be understood that, specifically, the parameters of the multi-classification network are modified according to the classification result output by the two-classification network so that outsoft_mclsn approaches outsoft_2cls1. Note that the present application does not limit the specific way in which the multi-classification network's parameters are modified according to the two-classification network's output, as long as outsoft_mclsn can be made to approach outsoft_2cls1 during training.
The loss calculation module calculates the training loss and adjusts the parameters of the multi-classification network by back-propagation so that the loss converges; training continues until the loss converges to the expected range, and then ends.
With this image multi-classification network structure based on feature remapping, the two-classification network is trained first; then, while training the multi-classification network, samples are input simultaneously into both networks, and the feature remapping method makes the features output for normal-class samples in the multi-classification network approach those output by the 2-classification network. This improves the recognition accuracy of the multi-classification network on normal-class samples; in particular, when normal samples are scarce, a multi-classification network with higher recognition accuracy on normal-class samples can still be obtained.
In an embodiment, the two-classification network and the multi-classification network use the same network model, which may specifically be the MobileNet-V3 network model, or a network model such as ShuffleNet, SqueezeNet or Xception. Using the same network model benefits feature remapping, letting the features output for normal-class samples in the multi-classification network approach those output by the 2-classification network more closely.
In one embodiment, the two-classification network and the multi-classification network share a sample input end. When training the multi-classification network, the structural parameters of the 2-classification network are frozen, and the images input into the multi-classification network are simultaneously input into the 2-classification network through the same input end to predict normal-class information.
In one embodiment, with continued reference to FIG. 1, the feature remapping network includes a weight parameter adjustment module and a normal class probability correction module.
The weight parameter adjusting module obtains the output result [out_mcls0, out_mcls1, out_mcls2, …, out_mclsn−1, out_mclsn] of the fully connected layer of the multi-classification network and the output result [out_2cls0, out_2cls1] of the fully connected layer of the two-classification network, splices them, and outputs a weight parameter k through a fully connected layer. Here out_mclsi corresponds to outsoft_mclsi: out_mclsi becomes outsoft_mclsi after softmax normalization.
Specifically, the weight parameter adjusting module may comprise a feature concatenation layer (concat) and several fully connected layers connected in sequence. The feature concatenation layer is connected to the outputs of the fully connected layers of the multi-classification network and the two-classification network, splices [out_mcls0, …, out_mclsn] with [out_2cls0, out_2cls1] to obtain the spliced feature, and then outputs a 1-dimensional feature value, the weight parameter k, after passing through the fully connected layers. In Fig. 1, taking m = 17 as an example, a 19-dimensional feature is obtained after splicing; it passes through a first fully connected layer with output dimension 17, then through a second fully connected layer that outputs the weight parameter k with dimension 1.
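The dimension flow of the weight parameter adjusting module (concat 17 + 2 = 19, then 19 → 17 → 1) can be checked with a toy sketch; the random weights are illustrative placeholders for the trained fully connected layers:

```python
import random

random.seed(0)

def linear(x, in_dim, out_dim):
    # toy fully connected layer with random weights (illustrative only)
    assert len(x) == in_dim
    w = [[random.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

m = 17
out_mcls = [0.1 * i for i in range(m)]  # 17 logits from the multi-class FC layer
out_2cls = [0.3, 1.2]                   # 2 logits from the binary FC layer

spliced = out_mcls + out_2cls           # concat: 17 + 2 = 19 dimensions
h = linear(spliced, 19, 17)             # first fully connected layer: 19 -> 17
k = linear(h, 17, 1)[0]                 # second fully connected layer: 17 -> 1

print(len(spliced), len(h))             # 19 17
```

The scalar k is then used to correct the normal-class logit of the multi-classification network.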
The normal class probability correction module corrects the normal-class probability out_mclsn to out'_mclsn by a correction formula in which the weight parameter k controls the mapping ratio; when k is 0, out_mclsn is not mapped and remains unchanged.
The correction formula is derived as follows.
When m = 3, let the output of the 2-classification network after softmax be [A, B] (A the defect-class probability, B the normal-class probability), and let the logit values (the output of the fully connected layer) of the 3-classification network be [x_1, x_2, x_3] (x_1 and x_2 the logits of the two defect classes, x_3 the logit of the normal class), which become [X, Y, Z] after softmax. The most direct way to achieve feature approximation is to make the probabilities of the corresponding classes approach each other; since softmax(x) is a monotonically increasing function, a change in probability can equivalently be effected on the logit values output by the network.
Let the probabilities output by the 3-classification network after feature remapping be [X, Y, Z'], with logit values [x_1, x_2, x'_3].
The first step: make the normal-class probability of the 3-classification network approach the normal-class probability of the 2-classification network.
for convenience of calculation, when x 3 When > 0, map it to value x' 3 The value range is specified to be (0, infinity); when x is 3 If < 0, map it to value x' 3 The value range is specified to be (-infinity, 0); when x is 3 And 0, its value is not changed. The characteristic of monotonic increase by softmax (x) is given by:
synthesizing the formulas (1) - (3) to obtain x 'after mapping' 3 :
Unifying the formula (4) and the formula (5) and preventing the 3-class network from losing and mutating due to the overlarge difference of the output characteristics of the normal samples in the 3-class network and the 2-class network, finally adding momentum operation into the formulas (4) and (5), so that the addition approximate solution can be obtained:
to be provided withNamely, it isFor example, (1) if x 3 X 'when k is greater than 0' 3 >x 3 In this case, Z can be approached to Z'; (2) If x 3 X 'when k is less than 0' 3 >x 3 In this case, Z can be approached to Z'; in thatIn addition to the above two cases, the other cases may cause the feature output by the 3-class network to be far from the feature output by the 2-class network, but based on the network connection relationship in the present embodiment, the weight coefficient k may be adjusted by training loss feedback, that is, the degree of approximation may be changed by training the weight coefficient, and finally the magnitude of the weight coefficient is controlled by back propagation of the loss. At the moment, the multi-classification network not only has the capability of judging whether the sample belongs to the normal class according to the network parameter of the multi-classification network, but also has the capability of learning and judging the normal class sample from the 2-classification network, and finally the Z direction is approached.
In Fig. 1, m = 17 is taken as the example. The softmax output of the 2-classification network is [outsoft_2cls0, outsoft_2cls1]; the logit vector output by the 17-classification network is out_logit = [out_mcls0, …, out_mcls16], and the result after softmax is outsoft_17cls = [outsoft_mcls0, …, outsoft_mcls16]. The mapping is given as formula (7); by analogy between formula (7) and formulas (1)–(6), formula (8) is obtained, where out'_mcls16 is the logit value of the normal-class output updated by formula (8) (before softmax). By further analogy, the correction formula for an m-classification network is obtained.
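Formulas (4)–(8) appear only as images in the source text. As an illustration of the constraints the mapping must satisfy, sign preservation of x3, identity at k = 0, and x'3 > x3 in the two cases (x3 > 0, k > 0) and (x3 < 0, k < 0), one candidate mapping with all of these properties is x'3 = x3 · e^k. This is an assumption for illustration, not the patent's formula:

```python
import math

def remap(x3, k):
    # hypothetical mapping x3 -> x3 * exp(k); NOT the patent's formula, but
    # it satisfies conditions (1)-(3) and the two cases discussed above
    return x3 * math.exp(k)

# condition (3): k = 0 leaves the logit unchanged
print(remap(0.7, 0.0))           # 0.7

# case (1): x3 > 0 and k > 0  ->  x3' > x3
print(remap(0.7, 0.4) > 0.7)     # True

# case (2): x3 < 0 and k < 0  ->  x3' > x3 (closer to zero, sign preserved)
print(remap(-0.7, -0.4) > -0.7)  # True
print(remap(-0.7, -0.4) < 0)     # True: range (-inf, 0) preserved
```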
meanwhile, the loss calculation module comprises a feature difference loss module and a classification loss module. The characteristic difference loss module is used for calculating the characteristic difference loss between the input characteristics of the multi-classification network full-connection layer and the input characteristics of the two-classification network full-connection layer. In an embodiment, the characteristic difference loss may be calculated by using KL divergence as a loss function, and specifically, JS divergence, wasserstein distance, and the like may be used. And (3) reducing the dimension of the output features by adopting Principal Component Analysis (PCA), calculating the similarity of the features by adopting KL divergence, and reversely propagating the similarity as loss, so that the feature information output by the normal class samples in the multi-classification network is close to the feature information output by the normal class samples in the 2-classification network. The classification loss module is used for calculating the classification loss of the multi-classification network after feature remapping, namely the difference between a predicted value and a true value, and the following specific method can use the Focal loss. And the loss calculation module takes the sum of the classification loss and the characteristic difference loss as the training loss, and finishes the training when the training loss converges to the expected range.
In an embodiment, the image multi-classification network structure further includes a feature remapping control module for determining whether the normal-class probability outsoft_2cls1 of the two-classification network is greater than a preset value: when it is greater, the feature remapping network is connected and feature remapping is performed; when it is not, the feature remapping network is cut off and no feature remapping is performed. The preset value can be set flexibly: the larger it is, the higher the probability threshold for trusting the two-classification network's normal-sample prediction and the higher the correction precision, but the fewer corrections are applied. In this embodiment, comprehensive analysis shows the overall correction effect is best with a preset value of 0.9.
In one embodiment, as shown in fig. 1, the multi-classification network includes a backbone network and a feature transfer enhancement module connected to the backbone network. The backbone network includes a plurality of convolutional layers, a full connection layer and a softmax layer connected in sequence, and the feature transfer enhancement module includes a medium-high frequency feature transfer enhancement module (FTSMA), which targets shallow edge and texture information, and a medium-low frequency feature transfer enhancement module (FTSMB), which targets deep semantic information.
As shown in fig. 2, the medium-high frequency feature transfer enhancement module (FTSMA) is bridged across the two ends of a front-end convolutional layer J_front of the backbone network and comprises 1×1 convolutional layers, a medium-high frequency domain channel attention module and a first cross attention module. The features in the backbone network are split into three paths at the input end of the module: the first path continues through the front-end convolutional layer J_front; the second path passes in sequence through a 1×1 convolutional layer, the medium-high frequency domain channel attention module and a 1×1 convolutional layer to extract the high-frequency information of the image and then merges back into the trunk path; and the third path passes in sequence through a 1×1 convolutional layer, the first cross attention module and a 1×1 convolutional layer to extract the long-distance dependencies between pixels at different positions and then merges back into the trunk path. In particular, the medium-high frequency feature transfer enhancement module may be located at the 1/3 position of the multi-classification network.
As shown in fig. 3, the medium-low frequency feature transfer enhancement module is bridged across the two ends of a rear-end convolutional layer J_rear of the backbone network and comprises 1×1 convolutional layers, a medium-low frequency domain channel attention module and a second cross attention module. The features in the backbone network are split into three paths at the input end of the module: the first path continues through the rear-end convolutional layer J_rear; the second path passes in sequence through a 1×1 convolutional layer, the medium-low frequency domain channel attention module and a 1×1 convolutional layer to extract the low-frequency information of the image and then merges back into the trunk path; and the third path passes in sequence through a 1×1 convolutional layer, the second cross attention module and a 1×1 convolutional layer to extract the long-distance dependencies between pixels at different positions and then merges back into the trunk path. In particular, the medium-low frequency feature transfer enhancement module may be located at the 2/3 position of the multi-classification network.
In this embodiment, the feature transfer enhancement modules mitigate the loss of features during propagation through the network. First, the medium-high frequency feature transfer enhancement module extracts the high-frequency information of the image together with the long-distance dependencies between pixels at different positions, reducing the loss of high-frequency information (textures, edges, and the like) during feature propagation. Then, the medium-low frequency feature transfer enhancement module extracts the low-frequency information of the image together with the long-distance dependencies between pixels at different positions, preventing the network from focusing excessively on high-frequency information and losing too much low-frequency information.
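The three-path layout shared by both enhancement modules can be sketched as follows, with plain callables standing in for the bridged convolution and the two attention branches. All names are illustrative, and the real channel-attention and cross-attention sub-modules are not implemented here.

```python
import numpy as np

def ftsm(x, trunk_fn, channel_attn_fn, cross_attn_fn):
    # Feature transfer enhancement: the incoming feature map is split into
    # three parallel paths whose outputs merge back into the trunk.
    trunk = trunk_fn(x)        # path 1: the bridged convolutional layer
    freq = channel_attn_fn(x)  # path 2: 1x1 conv -> frequency-domain channel attention -> 1x1 conv
    dep = cross_attn_fn(x)     # path 3: 1x1 conv -> cross attention (long-range dependencies) -> 1x1 conv
    return trunk + freq + dep  # merge the three paths
```

Because the two attention paths are added onto the trunk output, the module acts as a residual-style enhancement around the bridged convolution rather than a replacement for it.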
In the image multi-classification network structure based on feature remapping, the feature remapping method draws the complex classification network toward the simple classification network, so that the complex network learns the key information extracted by the simple one. A dual-branch structure is adopted to realize this feature learning, and the feature transfer enhancement module (FTSM) makes the network attend to high-frequency information during forward propagation and to the attention relationships among defect positions in the image, thereby improving the feature transfer capability of the network.
Correspondingly, the application also relates to a feature-remapping-based image multi-classification network training method, which can be carried out either with or without the feature-remapping-based image multi-classification network structure described above. As shown in fig. 4, the training method includes:
step S100: and simultaneously inputting the samples into a multi-classification network to be trained and a trained two-classification network, and training the multi-classification network.
Specifically, the sample is input into the trained two-classification network, which, after its softmax layer, outputs a two-class result consisting of a defect class probability and a normal class probability. The same sample is input into the multi-classification network to train it; the multi-classification network realizes m classes and, after its softmax layer, outputs the classification result [outsoft_mcls0 , outsoft_mcls1 , …, outsoft_mclsn−1 , outsoft_mclsn ], wherein outsoft_mclsn represents the normal class probability and the remaining entries represent the probabilities of the different defect classes, with n = m − 1 and m ≥ 3.
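The two softmax heads can be illustrated as follows; the logit values are made up for the example, and the variable names mirror the notation above.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax for a single logit vector.
    e = np.exp(z - z.max())
    return e / e.sum()

# m-class head (m = 4 here, so n = 3): the last entry is the normal
# class probability outsoft_mclsn; the logit values are illustrative.
m_logits = np.array([1.0, 0.5, -0.2, 2.0])
outsoft_mcls = softmax(m_logits)

# two-class head: [defect class probability, normal class probability]
b_logits = np.array([0.3, 1.7])
outsoft_bcls = softmax(b_logits)
```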
In one embodiment, samples randomly drawn from the data set, with equal numbers of defect and normal images, are input into the two-classification network, and the two-classification network is trained in combination with a trained distillation (teacher) network. Specifically, the distillation network may be an EfficientNet-B6 network model, or a model with a large number of layers such as ResNet-101, ResNet-152 or SE-Net.
When the multi-classification network is trained, all data are divided into a training set and a validation set at a ratio of 7:3; the training set is used to train the network, and the validation set is used to select the better training result. Once the training data are set, the images to be trained are input into the two branches of the model. Alternatively, the training/validation ratio may be 6:4, 7:3, 8:2, 9:1, and so on, depending on the total amount of data.
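A sketch of the 7:3 split; the function name and fixed seed are illustrative choices, not from the patent.

```python
import random

def split_dataset(samples, train_ratio=0.7, seed=42):
    # Shuffle and split into training and validation sets (7:3 by default;
    # pass 0.6, 0.8 or 0.9 for the other ratios mentioned above).
    items = list(samples)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_ratio)
    return items[:cut], items[cut:]
```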
In one embodiment, before the network is trained with the image data in the samples, the image data are preprocessed for data enhancement, improving the anti-interference capability and generalization capability of the network. Optionally, the data enhancement includes geometric transformations such as flipping, rotation, cropping, scaling, translation and jitter, and pixel transformations such as adding salt-and-pepper or Gaussian noise, applying Gaussian blur, adjusting HSV contrast, adjusting brightness and saturation, histogram equalization, and white balance adjustment.
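A minimal augmentation sketch combining one geometric and one pixel-level transform from the lists above; the noise scale is an illustrative parameter, not a value from the text.

```python
import numpy as np

def augment(img, rng):
    # One geometric transform (random horizontal flip) followed by one
    # pixel transform (additive Gaussian noise), clipped back to the
    # valid 8-bit intensity range.
    out = np.fliplr(img) if rng.random() < 0.5 else img
    noise = rng.normal(0.0, 5.0, size=out.shape)
    return np.clip(out + noise, 0.0, 255.0)
```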
Step S200: correct the classification result output by the multi-classification network with the classification result output by the two-classification network, so that the normal class probability output by the multi-classification network during training approaches the normal class probability output by the two-classification network, realizing the feature remapping.
In a specific embodiment, step S200 includes:
Step S210: obtain the output result of the full connection layer of the multi-classification network and that of the full connection layer of the two-classification network, splice them, and output a weight parameter k through a further full connection layer.
Wherein the output result of the full connection layer of the multi-classification network can be expressed as [out_mcls0 , out_mcls1 , …, out_mclsn−1 , out_mclsn ], and the output result of the full connection layer of the two-classification network can be expressed as a two-dimensional vector consisting of the defect-class and normal-class logits.
Step S220: the normal class probability out_mclsn is corrected to out_mclsn′ by using the weight parameter k.
The correction formula and its derivation are described above and are not repeated here.
Through this feature remapping process, when the multi-classification network is trained, the logit output for normal-class samples by the multi-classification network is supervised by the logit output for normal-class samples by the trained two-classification network, and is updated according to a trainable weight coefficient during supervision.
In an embodiment, before performing step S200, the method further includes:
It is judged whether the normal class probability output by the two-classification network is greater than a preset value; when it is greater than the preset value, feature remapping is performed, and when it is not greater than the preset value, feature remapping is not performed. Specifically, the preset value may be 0.9.
Step S300: calculate the training loss and adjust the parameters of the multi-classification network by back-propagation; training continues until the loss converges to the expected range, at which point training ends.
Specifically, the training loss is calculated, including:
and calculating the characteristic difference loss between the input characteristics of the multi-classification network full-connection layer and the input characteristics of the two classification network full-connection layers and the classification loss of the multi-classification network after characteristic remapping, and finishing training after the sum of the classification loss and the characteristic difference loss converges to an expected range. The selection of the specific loss function can be referred to the above description, and will not be described herein.
According to the feature-remapping-based image multi-classification network training method, the normal-sample classification result of the multi-classification network is corrected by the trained two-classification network during training, which improves the recognition accuracy of the multi-classification network on normal-class samples; in particular, a multi-classification network with high recognition accuracy on normal-class samples can be obtained even when the number of normal samples is small.
Correspondingly, the application also relates to an image multi-classification method in which an image is input into a trained multi-classification network to obtain a classification result, the multi-classification network having been trained according to the above training method. Specifically, the input image may be an image of a drainage pipeline, whose defects are classified by the multi-classification network. It can be understood that the method is not limited to the classification of drainage pipeline defects and can also be used to classify defects in other settings.
In the following, images of 17 kinds of pipeline defects are taken as samples (original size 480×270 pixels, 25939 images in total) and input into multi-classification network models trained by different methods; the results are compared to show the effectiveness of the scheme. Baseline denotes the test result of the multi-classification network without feature remapping, Baseline+FL the result using focal loss (FL) as the loss function, Baseline+FL+FRM the multi-classification network with feature remapping (FRM), and Baseline+FL+FRM+FTEM the multi-classification network with both feature remapping and the feature transfer enhancement module (FTEM).
The comparison results of the network model based on MobileNet-V3 are shown in the following table:
Table one: comparison results when using MobileNet-V3 as the basic network model
| Method | Normal class sample accuracy rate | Average recall rate |
|---|---|---|
| Baseline | 40.31% | 99.70% |
| Baseline+FL | 45.73% | 99.33% |
| Baseline+FL+FRM | 46.51% | 99.36% |
| Baseline+FL+FRM+FTEM | 49.61% | 99.27% |
The comparison results when using ShuffleNet-V2 as the basic network model are shown in Table two:
Table two: comparison results when using ShuffleNet-V2 as the basic network model
| Method | Normal class sample accuracy rate | Average recall rate |
|---|---|---|
| Baseline | 32.56% | 99.67% |
| Baseline+FL | 51.16% | 98.33% |
| Baseline+FL+FRM | 55.03% | 96.63% |
| Baseline+FL+FRM+FTEM | 60.47% | 94.79% |
The comparison results when using EfficientNet-B0 as the basic network model are shown in Table three:
Table three: comparison results when using EfficientNet-B0 as the basic network model
| Method | Normal class sample accuracy rate | Average recall rate |
|---|---|---|
| Baseline | 33.33% | 99.90% |
| Baseline+FL | 42.64% | 99.67% |
| Baseline+FL+FRM | 66.67% | 97.06% |
| Baseline+FL+FRM+FTEM | 69.77% | 92.57% |
The results show that training the multi-classification network with the feature remapping method improves its accuracy on normal-class samples. Moreover, adding the FTEM module to the multi-classification network further improves the effect, verifying the effectiveness of the scheme.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (10)
1. An image multi-classification network structure based on feature remapping, comprising:
the multi-classification network is used for realizing m classes after the samples input into the multi-classification network sequentially pass through a convolutional layer, a full connection layer and a softmax layer, and for outputting a classification result [outsoft_mcls0 , outsoft_mcls1 , …, outsoft_mclsn−1 , outsoft_mclsn ], wherein outsoft_mclsn represents the normal class probability, the remaining entries represent the probabilities of different defect classes, n = m − 1, and m ≥ 3;
the two-classification network, wherein the samples input into the two-classification network sequentially pass through a convolutional layer, a full connection layer and a softmax layer to realize 2 classes, and a classification result comprising a defect class probability and a normal class probability is output;
the feature remapping network is used for correcting the classification result output by the multi-classification network according to the classification result output by the two-classification network, so that the normal class probability outsoft_mclsn output by the multi-classification network during training approaches the normal class probability output by the two-classification network, realizing feature remapping;
and the loss calculation module is used for calculating the training loss and reversely adjusting the parameters of the multi-classification network so as to converge the loss.
2. The feature remapping based image multi-classification network structure as recited in claim 1, wherein said two-classification network and said multi-classification network share a sample input.
3. The feature remapping based image multi-classification network structure of claim 1, wherein said feature remapping network includes:
a weight parameter adjusting module for obtaining the output result [out_mcls0 , out_mcls1 , …, out_mclsn−1 , out_mclsn ] of the full connection layer of the multi-classification network and the output result of the full connection layer of the two-classification network, splicing them, and outputting a weight parameter k through a further full connection layer;
a normal class probability correction module for correcting the normal class probability out_mclsn to out_mclsn′ by using the weight parameter k.
the loss calculation module comprises a feature difference loss module and a classification loss module, the feature difference loss module is used for calculating feature difference loss between input features of a multi-classification network full-connection layer and input features of a two-classification network full-connection layer, the classification loss module is used for calculating classification loss of the multi-classification network after feature remapping, and the loss calculation module takes the sum of the classification loss and the feature difference loss as training loss.
4. The feature remapping based image multi-classification network structure of claim 1, wherein said multi-classification network includes a backbone network and a feature transfer enhancement module connected to said backbone network, said backbone network including a plurality of convolutional layers, a full connection layer and a softmax layer connected in sequence, said feature transfer enhancement module including:
a medium-high frequency feature transfer enhancement module for extracting shallow edge and texture information, bridged across the two ends of a front-end convolutional layer J_front of the backbone network and comprising 1×1 convolutional layers, a medium-high frequency domain channel attention module and a first cross attention module, wherein the features in the backbone network are split into three paths at the input end of the module: the first path continues through the front-end convolutional layer J_front; the second path passes in sequence through a 1×1 convolutional layer, the medium-high frequency domain channel attention module and a 1×1 convolutional layer to extract the high-frequency information of the image and then merges into the trunk path; and the third path passes in sequence through a 1×1 convolutional layer, the first cross attention module and a 1×1 convolutional layer to extract the long-distance dependencies between pixels at different positions and then merges into the trunk path;
a medium-low frequency feature transfer enhancement module for extracting deep semantic information, bridged across the two ends of a rear-end convolutional layer J_rear of the backbone network and comprising 1×1 convolutional layers, a medium-low frequency domain channel attention module and a second cross attention module, wherein the features in the backbone network are split into three paths at the input end of the module: the first path continues through the rear-end convolutional layer J_rear; the second path passes in sequence through a 1×1 convolutional layer, the medium-low frequency domain channel attention module and a 1×1 convolutional layer to extract the low-frequency information of the image and then merges into the trunk path; and the third path passes in sequence through a 1×1 convolutional layer, the second cross attention module and a 1×1 convolutional layer to extract the long-distance dependencies between pixels at different positions and then merges into the trunk path.
5. The feature remapping-based image multi-classification network structure of claim 1, further comprising a feature remapping control module for judging whether the normal class probability output by the two-classification network is greater than a preset value, wherein when it is greater than the preset value, the feature remapping network is connected and feature remapping is performed, and when it is not greater than the preset value, the feature remapping network is disconnected and no feature remapping is performed.
6. An image multi-classification network training method based on feature remapping is characterized by comprising the following steps:
inputting samples into a multi-classification network to train it, wherein the multi-classification network realizes m classes and, after its softmax layer, outputs a classification result [outsoft_mcls0 , outsoft_mcls1 , …, outsoft_mclsn−1 , outsoft_mclsn ], wherein outsoft_mclsn represents the normal class probability and the remaining entries represent the probabilities of different defect classes, with n = m − 1 and m ≥ 3;
inputting the same samples into a trained two-classification network, which, after its softmax layer, outputs a classification result comprising a defect class probability and a normal class probability;
correcting the classification result output by the multi-classification network with the classification result output by the two-classification network, so that the normal class probability outsoft_mclsn output by the multi-classification network during training approaches the normal class probability output by the two-classification network, realizing feature remapping;
calculating training loss and reversely adjusting parameters of the multi-classification network, continuing training until the loss is converged to an expected range, and ending the training.
7. The method as claimed in claim 6, wherein the modifying the classification result output by the multi-classification network with the classification result output by the two-classification network comprises:
obtaining the output result [out_mcls0 , out_mcls1 , …, out_mclsn−1 , out_mclsn ] of the full connection layer of the multi-classification network and the output result of the full connection layer of the two-classification network, splicing them, and outputting a weight parameter k through a further full connection layer;
correcting the normal class probability out_mclsn to out_mclsn′ by using the weight parameter k;
calculating a training loss comprising:
and calculating the characteristic difference loss between the input characteristics of the full connection layer of the multi-classification network and the input characteristics of the full connection layer of the two classification networks, and the classification loss of the multi-classification network after characteristic remapping, and ending the training after the sum of the classification loss and the characteristic difference loss is converged to an expected range.
8. The method of claim 6, wherein before the classification result output by the multi-classification network is corrected with the classification result output by the two-classification network, it is judged whether the normal class probability output by the two-classification network is greater than a preset value; when it is greater than the preset value, feature remapping is performed, and when it is not greater than the preset value, feature remapping is not performed.
9. The method according to claim 8, wherein the predetermined value is 0.9.
10. An image multi-classification method, comprising:
inputting the image into a trained multi-classification network to obtain a classification result, wherein the multi-classification network is obtained by training according to the feature remapping-based image multi-classification network training method of any one of claims 6 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210978719.7A CN115439681A (en) | 2022-08-16 | 2022-08-16 | Image multi-classification network structure based on feature remapping and training method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115439681A true CN115439681A (en) | 2022-12-06 |
Family
ID=84242792
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117316286A (en) * | 2023-11-29 | 2023-12-29 | 广州燃石医学检验所有限公司 | Data processing method, device and storage medium for tumor tracing |
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |