CN116977711A - Image classification method and system based on improved CFNet network - Google Patents

Image classification method and system based on improved CFNet network

Info

Publication number
CN116977711A
CN116977711A (application CN202310721943.2A)
Authority
CN
China
Prior art keywords: images, network, cfnet, image, feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310721943.2A
Other languages
Chinese (zh)
Inventor
钟雪菲 (Zhong Xuefei)
殷萧峰 (Yin Xiaofeng)
袁焕然 (Yuan Huanran)
潘雨晨 (Pan Yuchen)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Normal University
Original Assignee
Shandong Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Shandong Normal University
Priority: CN202310721943.2A
Publication: CN116977711A
Legal status: Pending

Classifications

    • G06V 10/764: Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06N 3/045: Computing arrangements based on biological models; neural network architectures; combinations of networks
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06V 10/806: Fusion, i.e. combining data from various sources at the sensor, preprocessing, feature extraction or classification level, of extracted features
    • G06V 10/82: Image or video recognition or understanding using neural networks


Abstract

The invention discloses an image classification method and system based on an improved CFNet network. The method comprises: acquiring an image to be classified, inputting the preprocessed image into the improved CFNet network, and outputting an image classification result. The improved CFNet network comprises an average pooling layer and a plurality of parallel branch networks, each containing a cascade network. The image to be classified is input into the improved CFNet network and, after the average pooling layer, is divided equally into n×n parts; each part is enlarged to the original image size and, together with the original image, enters its corresponding branch network, which extracts a feature map. The feature maps extracted by the parallel branch networks are concatenated, and the image classification result is output through a de Broglie wave-based vision multi-layer perceptron (Wave-MLP). The invention classifies images with the improved CFNet network, improving classification accuracy and optimizing the classification effect.

Description

Image classification method and system based on improved CFNet network
Technical Field
The invention relates to the technical field of image recognition, in particular to an image classification method and system based on an improved CFNet network.
Background
With the rapid development of machine learning, deep learning methods based on big data now far exceed traditional recognition and detection methods. Deep learning is a branch of machine learning: a family of algorithms that abstract data at a high level using multiple processing layers composed of complex structures or multiple nonlinear transformations. The convolutional neural network (Convolutional Neural Network, CNN), one of the most popular deep learning methods, gradually extracts high-level image features by alternating convolution and pooling operations on the image, and then classifies these features with a neural network to complete the recognition task.
Convolutional neural network algorithms based on big-data deep learning are widely used in image classification, e.g. the FPN, LeNet, AlexNet, VGG, GoogLeNet, ResNet and MobileNet families. Although these models can solve the image classification problem, their generalization performance is insufficient and their classification effect is poor. Recently, a novel multi-scale feature fusion architecture, CFNet (Cascade Fusion Network), has been proposed. Unlike the widely used FPN (Feature Pyramid Network) and its variants, which fuse the multi-scale features extracted by a backbone with a lightweight fusion module, CFNet adopts a serial cascade fusion design: it extracts multi-scale features through several cascaded stage modules and integrates the feature fusion operation into the backbone itself. Backbone parameters can thus be used effectively for multi-scale learning, more parameters are available for feature fusion, the richness of feature fusion is greatly increased, and the final image classification effect is improved. CFNet has been applied to image object detection, image segmentation and image classification tasks, achieving better recognition and classification results than peer network models on several mainstream dense prediction tasks.
However, CFNet extracts multi-scale features with a serial network structure, which is computationally expensive for large images; features at each scale are processed independently, so feature information across scales cannot be fully fused and some important feature information is lost. In addition, the transfer module of CFNet fuses features by element-wise addition or concatenation, which can only fuse shallow features, not deeper ones. Multi-scale networks also suffer from a hyperparameter selection problem: the hyperparameters of the model are usually chosen by an optimizer after repeated training runs and then fixed, so the model generalizes poorly and adapts badly. For these reasons, image classification with the above CFNet network is less effective.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides an image classification method and system based on an improved CFNet network. The structure of the conventional CFNet network is improved: parallel and serial connections are used globally, a multi-scale input module and an adaptive depth adjustment mechanism are added, and the original feature fusion method is improved. Classifying images with the improved CFNet network improves classification accuracy and optimizes the classification effect.
In a first aspect, the present disclosure provides an image classification method based on an improved CFNet network.
An image classification method based on an improved CFNet network, comprising:
acquiring an image to be classified, and preprocessing the image to be classified;
inputting the preprocessed image to be classified into an improved CFNet network, and outputting an image classification result of the image to be classified;
wherein the improved CFNet network comprises an average pooling layer and a plurality of parallel branch networks, each containing a cascade network; the image to be classified is input into the improved CFNet network and, after the average pooling layer, is divided equally into n×n parts; each part is enlarged to the original image size and, together with the original image, enters its corresponding branch network, which extracts a feature map; the feature maps extracted by the parallel branch networks are concatenated, and the image classification result is output through a de Broglie wave-based vision multi-layer perceptron.
In a second aspect, the present disclosure provides an image classification system based on an improved CFNet network.
An image classification system based on an improved CFNet network, comprising:
the image acquisition module is used for acquiring images to be classified;
the image preprocessing module is used for preprocessing the images to be classified;
the image classification module is used for inputting the preprocessed images to be classified into the improved CFNet network and outputting image classification results of the images to be classified;
wherein the improved CFNet network comprises an average pooling layer and a plurality of parallel branch networks, each containing a cascade network; the image to be classified is input into the improved CFNet network and, after the average pooling layer, is divided equally into n×n parts; each part is enlarged to the original image size and, together with the original image, enters its corresponding branch network, which extracts a feature map; the feature maps extracted by the parallel branch networks are concatenated, and the image classification result is output through a de Broglie wave-based vision multi-layer perceptron.
In a third aspect, the present disclosure also provides an electronic device comprising a memory and a processor, and computer instructions stored on the memory and running on the processor, which when executed by the processor, perform the steps of the method of the first aspect.
In a fourth aspect, the present disclosure also provides a computer readable storage medium storing computer instructions which, when executed by a processor, perform the steps of the method of the first aspect.
The one or more of the above technical solutions have the following beneficial effects:
The invention provides an image classification method and system based on an improved CFNet network. The existing CFNet network structure is improved: parallel and serial connections are used globally so that multi-scale features are extracted comprehensively and the loss of important feature information is avoided; the original feature fusion method is improved so that deeper features are fused; a multi-scale input module and an adaptive depth adjustment mechanism are added, improving the generalization capability and adaptability of the model; and image classification with the improved CFNet network improves classification accuracy and optimizes the classification effect.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
Fig. 1 is a schematic diagram of a conventional CFNet network;
FIG. 2 is a schematic diagram of the structure of each CFNet stage in a conventional CFNet network;
fig. 3 is a schematic structural diagram of a Block used in the existing CFNet network;
fig. 4 is a schematic structural diagram of a transfer module used in a conventional CFNet network;
fig. 5 is a schematic structural diagram of an improved CFNet network in an embodiment of the present invention;
fig. 6 is a schematic diagram of a transfer module used in the CFNet network of the present invention.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for describing particular embodiments only and is not intended to limit exemplary embodiments of the invention. As used herein, the singular also includes the plural unless the context clearly indicates otherwise. Furthermore, the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices and components, and/or combinations thereof.
Example 1
CFNet (Cascade Fusion Network) is a novel multi-scale feature fusion network. It adopts a cascade fusion architecture and integrates the feature fusion operation into the backbone network, so that more parameters can be used for feature fusion and the richness of feature fusion is greatly increased. The existing CFNet structure is shown in fig. 1 and comprises, in sequence, an input layer, a Block, a downsampling convolution layer and a cascade network (together forming the backbone). The cascade network comprises a plurality of serially connected CFNet stages; as shown in fig. 2, each stage comprises two Blocks with downsampling convolution layers (i.e. a first Block, a first downsampling convolution layer, a second Block and a second downsampling convolution layer connected in sequence), a focus module and a transfer module. Further, as shown in fig. 3, a Block may be a Swin Transformer block or a ConvNeXt block; the transfer module, shown in fig. 4, fuses features by element-wise addition or concatenation.
Specifically, in the original CFNet architecture, an image is first fed to the input layer, which consists of two convolution layers, each followed by a LayerNorm regularization layer. The features extracted by the input layer pass through a Block (either a Swin Transformer block or a ConvNeXt block) and then a downsampling convolution layer before entering the cascade network, which consists of serially connected CFNet stages. In each stage, the feature map passes through one Block and a downsampling convolution layer, whose output is kept as C3; this output also enters the next Block and a further downsampling convolution layer, yielding the feature map C4, which is likewise kept. C4 then passes through the focus module, which comes in two variants (Swin Transformer and ConvNeXt) and serves to expand the receptive field of the neurons in the last Block group of the stage. The focus module outputs the feature map C5. The transfer module then fuses C3, C4 and C5 and outputs the feature maps P3, P4 and P5, of which only P3 actually fuses the features of C3, C4 and C5; P4 and P5 serve as transitional feature maps and are discarded at the end of each stage. In the last stage, the outputs P3, P4 and P5 pass through the final classifier, which outputs the image classification result.
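The data flow of one original CFNet stage described above can be sketched at the shape level as follows. The helper names (block, downsample, focus, transfer) and the convention that each downsampling convolution halves the spatial size while doubling the channels are illustrative assumptions, not the patent's exact configuration.

```python
# Shape-level sketch of one CFNet stage: a feature map is represented only
# by its (channels, height, width) shape.

def block(shape):
    # Swin Transformer or ConvNeXt block: shape-preserving
    return shape

def downsample(shape):
    # downsampling convolution: halve H and W, double channels (assumption)
    c, h, w = shape
    return (2 * c, h // 2, w // 2)

def focus(shape):
    # focus module: expands the receptive field, shape-preserving here
    return shape

def transfer(c3, c4, c5):
    # fuses C3, C4, C5; P3 keeps the shape of C3, while P4 and P5
    # are simply C4 and C5 passed through unfused
    return c3, c4, c5

def cfnet_stage(shape):
    c3 = downsample(block(shape))       # first Block + downsampling conv
    c4 = downsample(block(c3))          # second Block + downsampling conv
    c5 = focus(c4)                      # focus module
    p3, p4, p5 = transfer(c3, c4, c5)   # P4/P5 are transitional, discarded
    return p3, p4, p5

p3, p4, p5 = cfnet_stage((64, 56, 56))
print(p3)  # (128, 28, 28): only P3 is passed to the next stage
```

Running the sketch on a (64, 56, 56) input shows that only P3 retains the stage-input scale divided by two, matching the description that P3 alone carries the fused features forward.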
This CFNet network has problems in feature extraction and fusion. First, its serial structure loses feature information: features at each scale are processed independently, so feature information across scales cannot be fully fused and some important feature information is lost; moreover, a serial network is computationally expensive on large images. Second, the original transfer module fuses features by element-wise addition or concatenation, which works only for shallow features and cannot fuse deeper ones. Third, multi-scale networks always face a parameter selection problem: they have many parameters, and every branch needs careful parameter selection and tuning, otherwise model performance suffers; and since the conventional way of choosing hyperparameters is repeated training and screening by an optimizer, the resulting hyperparameters are fixed, so the model generalizes poorly and adapts badly.
To solve the problems that, when the existing CFNet network is used for image classification, image feature extraction is incomplete, only shallow features are fused, and model generalization is poor, leading to a poor final classification effect, this embodiment provides an image classification method based on an improved CFNet network, comprising:
acquiring an image to be classified, and preprocessing the image to be classified;
inputting the preprocessed image to be classified into an improved CFNet network, and outputting an image classification result of the image to be classified;
wherein the improved CFNet network comprises an average pooling layer and a plurality of parallel branch networks, each containing a cascade network; the image to be classified is input into the improved CFNet network and, after the average pooling layer, is divided equally into n×n parts; each part is enlarged to the original image size and, together with the original image, enters its corresponding branch network, which extracts a feature map; the feature maps extracted by the parallel branch networks are concatenated, and the image classification result is output through a de Broglie wave-based vision multi-layer perceptron.
In this embodiment, an image to be classified is first obtained and preprocessed; the preprocessing includes image cropping, image scaling, brightness enhancement and the like, and lays the foundation for subsequent image recognition.
The preprocessed image to be classified is input into the improved CFNet network, which outputs the classification result. The focus of this embodiment is the improved CFNet network, whose structure is shown in fig. 5. Its backbone comprises an average pooling layer, a plurality of parallel branch networks, and a concatenation layer (Concat layer) that gathers the outputs of all branches; image features are extracted by the backbone, concatenated by the Concat layer, and the classification result is finally output by Wave-MLP. Each branch network comprises, connected in sequence, a downsampling convolution layer, a cascade network, a fusion module and a maximum pooling layer. The image to be classified is input into the network and, after one average pooling, is divided equally into n×n sub-images (padding with zeros where the division is not exact). Each sub-image is enlarged to the original image size and enters its branch network together with the original image. As shown in fig. 5, the improved CFNet network comprises n×n+1 parallel branches: each sub-image serves as one part (patch) and the original image as the last one, and each part is fed to one branch network. Adding the original image supplies global information and avoids incomplete feature extraction.
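The n×n split plus original-image input described above can be sketched in pure Python on a toy image represented as nested lists. The function name is illustrative, the image is assumed to divide evenly by n (so no zero padding is needed), and the enlargement of each patch back to the original size is omitted.

```python
# Minimal sketch of the n x n split that feeds the parallel branches
# (pure Python on nested lists; a real implementation would use tensors).

def split_patches(image, n):
    """Split an H x W image (list of rows) into n*n equal sub-images,
    then append the original image as the final (n*n + 1)-th input."""
    h, w = len(image), len(image[0])
    ph, pw = h // n, w // n            # assumes H and W divide evenly by n
    patches = []
    for bi in range(n):
        for bj in range(n):
            patch = [row[bj * pw:(bj + 1) * pw]
                     for row in image[bi * ph:(bi + 1) * ph]]
            patches.append(patch)
    patches.append(image)              # original image supplies global context
    return patches

image = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 toy image
parts = split_patches(image, 2)
print(len(parts))   # 5 inputs: 2*2 patches + the original image
print(parts[0])     # top-left 2x2 patch: [[0, 1], [4, 5]]
```

With n = 2 the function produces the n×n+1 = 5 parallel inputs the text describes, the last being the unsplit image for global information.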
Further, the cascade network of this embodiment follows the same basic concept as in the original CFNet network: it comprises a plurality of serially connected CFNet stages, and each stage comprises two Blocks with downsampling convolution layers (i.e. a first Block, a first downsampling convolution layer, a second Block and a second downsampling convolution layer connected in sequence), a focus module and an improved transfer module.
After each part is enlarged to the original image size, it enters its corresponding branch network together with the original image, and the branch extracts a feature map. Specifically, the input image first passes through a downsampling convolution layer to extract a feature map, which enters the cascade network and passes through the CFNet stages in sequence, finally outputting the feature maps P3, P4 and P5. The fusion module adds P3, P4 and P5 element by element, and the maximum pooling layer then outputs feature maps of equal size. Every parallel branch outputs a feature map of the same size; all feature maps are concatenated by the Concat layer, classified by the vision multi-layer perceptron Wave-MLP, which is based on de Broglie (matter) waves, and the final classification result is output.
The process by which the feature map passes through the CFNet stages and finally yields P3, P4 and P5 is as follows. In each stage, the feature map passes through a Block and a downsampling convolution layer, which outputs the feature map C3; C3 also enters the next Block and, through its downsampling convolution layer, yields the feature map C4, which is kept and passed to the focus module. The focus module comes in two variants (Swin Transformer and ConvNeXt) and expands the receptive field of the neurons of the last Block group in each stage; it outputs the feature map C5. The improved transfer module then fuses C3, C4 and C5 and outputs P3, P4 and P5 (in fact P4 is C4 and P5 is C5; these two maps are not fused with other features). Only P3 fuses the features of C3, C4 and C5; P4 and P5 serve as transitional feature maps and are discarded at the end of each stage. The feature map P3 then proceeds to the next stage.
The structure of the improved transfer module of this embodiment is shown in fig. 6. The feature maps C3, C4 and C5 are input into the transfer module; each is average-pooled so that the image sizes are unified, and after a concatenation (concat) operation they enter a multi-head self-attention module. The multi-head self-attention mechanism processes different parts of the input sequence, and the multiple heads capture dependencies between different positions simultaneously, giving a more comprehensive information representation. An LN regularization layer (LayerNorm) then outputs the final feature map P3.
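A toy, pure-Python sketch of the fusion step in this improved transfer module is given below: one pooled token per scale (C3, C4, C5) is concatenated into a sequence, mixed by self-attention, and LayerNorm-normalized. As simplifications not taken from the patent, a single head with identity Q/K/V projections is used, and each scale is reduced to a single token.

```python
# Simplified transfer-module fusion: concat pooled tokens, self-attend, LayerNorm.
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(tokens):
    """Single-head scaled dot-product attention with identity Q/K/V projections."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        weights = softmax(scores)
        # weighted sum of value vectors, per output dimension
        out.append([sum(w * v[j] for w, v in zip(weights, tokens))
                    for j in range(d)])
    return out

def layer_norm(tokens, eps=1e-5):
    """Normalize each token to zero mean and unit variance."""
    normed = []
    for t in tokens:
        mu = sum(t) / len(t)
        var = sum((x - mu) ** 2 for x in t) / len(t)
        normed.append([(x - mu) / math.sqrt(var + eps) for x in t])
    return normed

# One pooled token per scale (C3, C4, C5); 4-dim features for illustration
c3 = [1.0, 0.0, 0.0, 0.0]
c4 = [0.0, 1.0, 0.0, 0.0]
c5 = [0.0, 0.0, 1.0, 1.0]
fused = layer_norm(self_attention([c3, c4, c5]))
print(len(fused), len(fused[0]))  # 3 tokens of dimension 4
```

The point of the sketch is the ordering concat → self-attention → LayerNorm; in the real module each scale contributes many tokens and several attention heads run in parallel.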
Furthermore, the hyperparameter optimization of the network model adopts the latest greedy solution strategy. Since the computations for the different image parts are mutually independent, they can be carried out in parallel; although the total number of parameters is larger than in the original CFNet network, the computation time does not increase noticeably.
Further, after the input image has passed through the downsampling convolution layer to extract the feature map, and before it enters the CFNet stages, the number of CFNet stages (i.e. the network depth) is calculated from the two-dimensional entropy of the input image.
The depth of a neural network is normally a manually set hyperparameter: the optimal depth is calculated by an optimizer and does not change once set, i.e. the network depth is fixed. With this conventional approach, a depth parameter would have to be set for each of the n×n+1 parallel cascade networks, which is time-consuming and laborious. In this embodiment, along with the parallel architecture, a different approach is taken: after an image is divided into patches, each patch generally carries a different amount of information, so a fixed network depth cannot simply be used for all of them. The information content of a patch is therefore measured by its two-dimensional image entropy, and the network depth is made inversely related to this entropy: when a patch carries much information, a shallow network already extracts enough of it, whereas a patch with little information requires a deep network. The network depth is determined from the image entropy as follows:
First, the neighborhood gray mean of the image is selected as the spatial characteristic of the gray distribution; together with the pixel gray value it forms a feature pair (i, j), where i is the gray value of the pixel (0 ≤ i ≤ 255) and j is the neighborhood gray mean (0 ≤ j ≤ 255). For three-channel images, the gray value of each pixel is taken as the mean of the three channels. The joint probability p(i, j) = f(i, j) / N² reflects the combined characteristic of the gray value at a pixel position and the gray distribution of the surrounding pixels, where f(i, j) is the frequency of occurrence of the feature pair (i, j) and N is the scale of the image. The two-dimensional entropy H of the image is then defined as H = -Σ p(i, j) log2 p(i, j), where the sum runs over all i and j from 0 to 255.
Second, the network depth is obtained from the calculated two-dimensional entropy as d = a / H, where a is a hyperparameter to be determined, set manually or calculated by an optimizer.
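The two-step depth rule above (two-dimensional entropy, then d = a/H) can be sketched as follows. The 3×3 neighborhood, the square-image assumption, the value a = 12, and the rounding to at least one stage are illustrative choices, not fixed by the patent.

```python
# Sketch of the image-entropy-based adaptive depth rule (grayscale, pure Python).
import math

def two_dim_entropy(img):
    """Two-dimensional entropy over (pixel value, 3x3 neighborhood mean) pairs."""
    n = len(img)                       # square N x N image assumed
    counts = {}
    for r in range(n):
        for c in range(n):
            nb = [img[rr][cc]
                  for rr in range(max(0, r - 1), min(n, r + 2))
                  for cc in range(max(0, c - 1), min(n, c + 2))]
            j = sum(nb) // len(nb)     # neighborhood gray mean
            key = (img[r][c], j)
            counts[key] = counts.get(key, 0) + 1
    total = n * n
    # H = -sum over pairs of p(i, j) * log2 p(i, j), with p = f / N^2
    return -sum((f / total) * math.log2(f / total) for f in counts.values())

def branch_depth(img, a=12.0, eps=1e-9):
    """Adaptive depth d = a / H; a is the single shared hyperparameter."""
    return max(1, round(a / (two_dim_entropy(img) + eps)))

flat = [[128] * 8 for _ in range(8)]   # uniform patch: zero entropy
print(two_dim_entropy(flat))           # low information -> deep network needed
```

As the text explains, a high-entropy patch gets a shallow branch and a low-entropy patch a deep one, and only the single hyperparameter a is shared across all branches.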
With this image-entropy-based depth setting, only the single hyperparameter a needs to be set for all the parallel cascade networks. Although the depth must still be computed n×n+1 times from the determined a (once per branch), compared with the conventional approach of determining and setting n×n+1 depth hyperparameters, this scheme reduces the per-branch depth hyperparameters to one, reducing the number of hyperparameters to set and improving the generalization capability and adaptability of the model.
The image classification method based on the improved CFNet network of this embodiment improves the existing CFNet network structure: parallel and serial connections are used globally so that multi-scale features are extracted comprehensively and the loss of important feature information is avoided; the original feature fusion method is improved so that deeper features are fused; a multi-scale input module and an adaptive depth adjustment mechanism are added, improving the generalization capability and adaptability of the model; and image classification with the improved CFNet network improves classification accuracy and optimizes the classification effect.
Example two
The embodiment provides an image classification system based on an improved CFNet network, which comprises:
the image acquisition module is used for acquiring images to be classified;
the image preprocessing module is used for preprocessing the images to be classified;
the image classification module is used for inputting the preprocessed images to be classified into the improved CFNet network and outputting image classification results of the images to be classified;
wherein the improved CFNet network comprises an average pooling layer and a plurality of parallel branch networks, each containing a cascade network; the image to be classified is input into the improved CFNet network and, after the average pooling layer, is divided equally into n×n parts; each part is enlarged to the original image size and, together with the original image, enters its corresponding branch network, which extracts a feature map; the feature maps extracted by the parallel branch networks are concatenated, and the image classification result is output through a de Broglie wave-based vision multi-layer perceptron.
Example III
The present embodiment provides an electronic device comprising a memory, a processor, and computer instructions stored in the memory and runnable on the processor; when the computer instructions are executed by the processor, the steps of the image classification method based on the improved CFNet network described above are performed.
Example IV
The present embodiment also provides a computer readable storage medium storing computer instructions that, when executed by a processor, perform the steps in the improved CFNet network based image classification method described above.
The steps involved in the second to fourth embodiments correspond to the first method embodiment; for details, refer to the relevant description of the first embodiment. The term "computer-readable storage medium" should be understood to include a single medium or multiple media containing one or more sets of instructions; it should also be understood to include any medium capable of storing, encoding or carrying a set of instructions for execution by a processor that cause the processor to perform any one of the methods of the present invention.
It will be appreciated by those skilled in the art that the modules or steps of the invention described above may be implemented by general-purpose computing means; alternatively, they may be implemented by program code executable by computing means, so that they may be stored in storage means and executed by the computing means, or they may be made separately into individual integrated circuit modules, or a plurality of the modules or steps may be made into a single integrated circuit module. The present invention is not limited to any specific combination of hardware and software.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
While the foregoing description of the embodiments of the present invention has been presented in conjunction with the drawings, it should be understood that it is not intended to limit the scope of the invention, but rather, it is intended to cover all modifications or variations within the scope of the invention as defined by the claims of the present invention.

Claims (10)

1. An image classification method based on an improved CFNet network is characterized by comprising the following steps:
acquiring an image to be classified, and preprocessing the image to be classified;
inputting the preprocessed image to be classified into an improved CFNet network, and outputting an image classification result of the image to be classified;
wherein the improved CFNet network comprises an average pooling layer and a plurality of branch networks connected in parallel, each branch network comprising a cascade network; the images to be classified are input into the improved CFNet network and, after the average pooling layer, are equally divided into n×n parts; after each part is enlarged to the original image size, the parts and the original image respectively enter the corresponding branch networks, and feature maps are extracted through the branch networks; after the feature maps respectively extracted by the plurality of parallel branch networks are concatenated, the image classification result of the images to be classified is output through a visual multi-layer perceptron based on de Broglie waves.
2. The image classification method based on the improved CFNet network according to claim 1, wherein each branch network comprises a downsampling convolution layer, a cascade network, a fusion module and a maximum pooling layer which are connected in sequence; the cascade network comprises a plurality of CFNet stages connected in series, each CFNet stage comprising two Blocks and a downsampling convolution layer which are sequentially connected, a focus module, and an improved transfer module.
3. The image classification method based on the improved CFNet network of claim 1, wherein said extracting a feature map through a branch network comprises:
the input image first passes through the downsampling convolution layer to extract a feature map; the extracted feature map enters the cascade network and passes through the plurality of CFNet stages in sequence, outputting feature maps P3, P4 and P5; the feature maps P3, P4 and P5 are added element by element through the fusion module, and finally feature maps of the same size are output through the maximum pooling layer.
4. The image classification method based on the improved CFNet network according to claim 3, wherein the feature map passing through the CFNet stages and outputting the feature maps P3, P4 and P5 comprises:
in each CFNet stage, the feature map passes through one Block and then a downsampling convolution layer to output a feature map C3; the feature map C3 is input into the next Block and passes through a downsampling convolution layer to output a feature map C4; and the feature map C4 passes through the focus module to output a feature map C5;
the improved transfer module fuses the feature maps C3, C4 and C5 and finally outputs the feature maps P3, P4 and P5; wherein the feature map P3 fuses the feature outputs of the feature maps C3, C4 and C5, while P4 and P5, as transitional feature sub-maps, are discarded after each stage is completed.
5. The image classification method based on the improved CFNet network according to claim 4, wherein in the improved transfer module, each feature map is average-pooled by an average pooling layer to unify the image sizes, input to a multi-head self-attention module after a concatenation operation, and the fused feature map P3 is then output through an LN regularization layer.
6. The method for classifying images based on an improved CFNet network as recited in claim 3, wherein after extracting the feature map by the downsampling convolution layer and before entering the CFNet stage, the method further comprises calculating the number of CFNet stages according to the two-dimensional entropy of the input image.
7. An image classification system based on an improved CFNet network, comprising:
the image acquisition module is used for acquiring images to be classified;
the image preprocessing module is used for preprocessing the images to be classified;
the image classification module is used for inputting the preprocessed images to be classified into the improved CFNet network and outputting image classification results of the images to be classified;
wherein the improved CFNet network comprises an average pooling layer and a plurality of branch networks connected in parallel, each branch network comprising a cascade network; the images to be classified are input into the improved CFNet network and, after the average pooling layer, are equally divided into n×n parts; after each part is enlarged to the original image size, the parts and the original image respectively enter the corresponding branch networks, and feature maps are extracted through the branch networks; after the feature maps respectively extracted by the plurality of parallel branch networks are concatenated, the image classification result of the images to be classified is output through a visual multi-layer perceptron based on de Broglie waves.
8. The image classification system based on an improved CFNet network of claim 7, wherein said extracting feature map through a branched network comprises:
the input image first passes through the downsampling convolution layer to extract a feature map; the extracted feature map enters the cascade network and passes through the plurality of CFNet stages in sequence, outputting feature maps P3, P4 and P5; the feature maps P3, P4 and P5 are added element by element through the fusion module, and finally feature maps of the same size are output through the maximum pooling layer.
9. An electronic device comprising a memory, a processor, and computer instructions stored in the memory and runnable on the processor, wherein the computer instructions, when executed by the processor, perform the steps of the image classification method based on the improved CFNet network according to any one of claims 1 to 6.
10. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the steps of the image classification method based on the improved CFNet network according to any one of claims 1 to 6.
CN202310721943.2A 2023-06-16 2023-06-16 Image classification method and system based on improved CFNet network Pending CN116977711A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310721943.2A CN116977711A (en) 2023-06-16 2023-06-16 Image classification method and system based on improved CFNet network


Publications (1)

Publication Number Publication Date
CN116977711A true CN116977711A (en) 2023-10-31

Family

ID=88478657




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Yuan Huanran; Yin Xiaofeng

Inventor before: Zhong Xuefei; Yin Xiaofeng; Yuan Huanran; Pan Yuchen