CN116266390A - Image recognition method, device, storage medium and computer equipment - Google Patents
Image recognition method, device, storage medium and computer equipment Download PDFInfo
- Publication number
- CN116266390A (application numbers CN202111541244.7A / CN202111541244A)
- Authority
- CN
- China
- Prior art keywords
- feature map
- scale
- image
- channel
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses an image recognition method, device, storage medium and computer equipment in the field of intelligent quality inspection for broadband installation and maintenance. A multi-task learning approach automates intelligent quality inspection of fiber cat (optical modem) configuration during installation and maintenance: the network classifies whether a fiber cat configuration image to be recognized is compliant and simultaneously detects the non-compliant position, so network training is simple and easy to operate, combining classification with detection further improves model prediction accuracy, and the low efficiency, high cost and uncertainty of manual spot checks are overcome. Three different pooling modes capture multi-scale information along the channel dimension, while a spatial multi-scale perception module re-fuses feature maps produced with different dilation rates to obtain deeper multi-scale information, which further resolves inconsistent target sizes. Both modules have few parameters, the acquired multi-scale information is more targeted, and the resulting effect is better.
Description
Technical Field
The invention relates to the technical field of intelligent quality inspection for broadband installation and maintenance, and in particular to an image recognition method, an image recognition device, a storage medium and computer equipment.
Background
With the rapid development of home broadband services, requirements for installation-and-maintenance quality inspection keep rising. Because inspection labor is limited, quality inspection was initially carried out by spot checks: field personnel photograph, archive and upload images of the fiber cat at the construction site as the basis for later quality evaluation. As the home broadband user base grows, however, manual spot-check evaluation can no longer keep up with the explosive increase in inspection image data. In recent years deep learning has been widely and successfully applied across industries, which heralds the intelligent automation of broadband installation-and-maintenance quality inspection.
The patent with publication number CN111523476A, "AI installation-and-maintenance inspection method and system based on the YOLO algorithm", discloses a quality inspection method: images shot in the inspection scene are first classified and predicted by a convolutional neural network, a series of image preprocessing steps such as cleaning, filtering and reasonable cropping are then performed, and finally features are recognized by an improved YOLO algorithm, the final feature recognition result being output in a port-matrix form.
The patent with publication number CN110956366A, "A method for checking construction consistency of a beam splitter in installation-and-maintenance quality inspection", uses target detection, semantic segmentation and OCR technologies in deep learning to automatically check that splitter construction matches the resources assigned by the system. First, the positions of all splitter ports, the pigtail fibers and their connection ports are obtained through a first and a second network to derive the port numbers the pigtails actually access; meanwhile, the port numbers to be constructed are read from labels by an OCR text recognition method; finally, the method judges whether these port numbers are consistent with the port numbers on the work order.
In addition, other patents basically apply existing deep learning algorithms directly to installation-and-maintenance quality inspection, without further optimization for the characteristics of this scene. The network model closest to the one designed by the invention is the following:
the patent with publication number CN110517235A, "An automatic OCT image segmentation method based on GCS-Net", discloses a network model for automatic choroid segmentation. Two modules are designed to automatically select multi-scale information between groups, and end-to-end training yields a choroid segmentation model whose receptive field matches the choroid target area; an OCT image of a normal or highly myopic eye is then fed into the trained model to obtain the corresponding choroid segmentation map.
Existing deep-learning methods for automated installation-and-maintenance quality inspection basically adopt currently mature network frameworks that either classify whether an inspection picture is compliant or detect its non-compliant positions; combining the two is usually done as two separate tasks, which makes the training process cumbersome.
Because inspection pictures are shot by different maintenance personnel, at different times, without a unified standard, the size of the target area of interest often varies. Moreover, the fiber cat configuration pictures addressed by this proposal contain targets of inconsistent scale: whether the fiber cat indicator lamp is lit normally is a small-target detection problem, whether an interface is loose is a medium-target detection problem, and fiber bending is a large-target detection problem. The prior art does not consider this inconsistency, so the final accuracy of network model prediction is low and the prediction is biased.
Disclosure of Invention
The invention aims to remedy the defects of the prior art, namely that quality inspection images cannot be classified and detected simultaneously and that the scale of their target areas is inconsistent, and provides an image recognition method, an image recognition device, a storage medium and computer equipment.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
an image recognition method comprises the following steps:
preprocessing an image to be identified to obtain an initial image feature map;
inputting the initial image feature map into a channel multi-scale sensing module of a multi-task convolutional neural network to obtain a multi-scale channel information feature map;
inputting the initial image feature map into a spatial multi-scale sensing module of a multi-task convolutional neural network to obtain a spatial multi-scale information feature map;
inputting the multi-scale channel information feature map and the spatial multi-scale information feature map into a classifier of a multi-task convolutional neural network to obtain a preliminary classification result;
and carrying out post-processing on the preliminary classification result to realize the identification of the image.
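The five steps above can be sketched as a hypothetical end-to-end pipeline; every function below is a placeholder stub for illustration, not code from the patent.

```python
# Hypothetical skeleton of the five-step recognition pipeline; all names
# and stub behaviors are assumptions made for illustration.

def preprocess(image):
    # step 1: enhancement (gray stretch) and amplification -> initial feature map
    return image

def channel_branch(feat):
    # step 2: channel multi-scale perception module (stub)
    return feat

def spatial_branch(feat):
    # step 3: spatial multi-scale perception module (stub)
    return feat

def classify(ch_feat, sp_feat):
    # step 4: classifier -> (is_compliant, non_compliant_box_or_None)
    return True, None

def postprocess(is_compliant, box):
    # step 5: output the verdict, plus the position when non-compliant
    return "compliant" if is_compliant else ("non-compliant", box)

def recognize(image):
    feat = preprocess(image)
    return postprocess(*classify(channel_branch(feat), spatial_branch(feat)))
```

The sketch only fixes the data flow between the five steps; each stub would be replaced by the corresponding network component.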
Preferably, the preprocessing of the image to be identified to obtain an initial image feature map includes:
carrying out data enhancement and data amplification processing on the image to be identified.
Preferably, the inputting the initial image feature map into a channel multi-scale sensing module of a multi-task convolutional neural network to obtain a multi-scale channel information feature map includes:
respectively carrying out global average pooling, maximum pooling and overlapping pooling on an input group of feature images to obtain three groups of multi-scale channel information;
adding the three groups of multi-scale channel information to serve as a channel weight factor, and multiplying the channel weight factor with the initial image feature map to obtain a multiplication result;
and carrying out residual operation on the multiplication result and the initial image feature map, and outputting the feature map of channel multi-scale information.
Preferably, the inputting of the initial image feature map into a spatial multi-scale sensing module of the multi-task convolutional neural network to obtain a spatial multi-scale information feature map specifically includes:
performing expansion convolution operation with expansion rates of 1, 3 and 5 on the initial image feature images to obtain three groups of feature images with different scales;
fusing again the feature maps obtained by convolution with expansion rates 1 and 3, and those obtained with expansion rates 3 and 5;
adding the fused results again to obtain a further multi-scale information feature map;
the multi-scale information feature map is subjected to convolution operation to obtain a space feature map;
multiplying the space feature map serving as space weight with the initial image feature map to obtain a multiplication result;
and carrying out residual operation on the multiplication result and the input feature map to obtain the spatial multi-scale information feature map.
Preferably, the post-processing is performed on the preliminary classification result, so as to realize the identification of the image, and the method specifically comprises the following steps:
judging the preliminary classification result;
if the classification result is that the fiber cat configuration is standard, directly outputting that the configuration is standard;
if the classification result is that the fiber cat configuration is not standard, outputting that the configuration is not standard and, at the same time, outputting the position where the fiber cat configuration is non-standard.
Preferably, the image to be identified is a fiber cat configuration image.
Preferably, the loss function adopted by the multi-task convolutional neural network during training is the commonly used L_IoU loss in combination with a binary cross-entropy loss function.
An image recognition apparatus comprising:
the preprocessing module is used for preprocessing the image to be identified to obtain an initial image feature map;
the channel multi-scale sensing module is used for inputting the initial image feature map into the channel multi-scale sensing module of the multi-task convolutional neural network to obtain a multi-scale channel information feature map;
the spatial multi-scale sensing module is used for inputting the initial image feature map into the spatial multi-scale sensing module of the multi-task convolutional neural network to obtain a spatial multi-scale information feature map;
the classification module is used for inputting the multi-scale channel information feature map and the spatial multi-scale information feature map into a classifier of the multi-task convolutional neural network to obtain a preliminary classification result;
and the post-processing module is used for carrying out post-processing on the preliminary classification result, so that the identification of the image is realized.
A computer device, comprising a memory in which computer readable instructions are stored, the instructions, when executed by a processor, implementing the steps of the image recognition method described above.
A computer readable storage medium having computer readable instructions stored thereon, the instructions, when executed by a processor, performing the steps of the image recognition method described above.
Compared with the prior art, the invention has the beneficial effects that:
1. Through a novel multi-task convolutional neural network, the invention uses multi-task learning to automate intelligent quality inspection of fiber cat configuration during installation and maintenance: it classifies whether the configuration image to be identified is compliant and simultaneously detects the non-compliant position, so network training is simple and easy to operate; combining classification with detection further improves model prediction accuracy, and the low efficiency, high cost and uncertainty of manual spot checks are overcome.
2. Two brand-new modules acquire multi-scale information from the data in two different ways, solving the problem of inconsistent target areas: both a small-target fiber cat indicator-lamp image and a large-target fiber-bending image can be predicted well, so the final network model has stronger generalization capability and higher prediction accuracy. The brand-new channel multi-scale perception module uses weight sharing to reduce the number of network parameters and avoid overfitting, and acquires channel multi-scale information through three different pooling modes to enhance recognition and classification of fiber cat images.
3. The channel multi-scale perception module uses three different pooling modes to acquire multi-scale information on the channel dimension, and the spatial multi-scale perception module re-fuses feature maps with different dilation rates to obtain deeper multi-scale information, which further resolves inconsistent target sizes; both modules have few parameters, the acquired multi-scale information is more targeted, and the effect is better.
4. The brand-new spatial multi-scale perception module uses weight sharing to reduce the number of network parameters and avoid overfitting, further fuses the three scales obtained by the three dilated convolutions into deeper multi-scale information, and uses the guidance of spatial information to enhance recognition and classification of fiber cat images in another way.
Drawings
FIG. 1 is a flow chart of an image recognition method according to the present invention;
FIG. 2 is a flow chart of an intelligent quality inspection algorithm for fiber optic cat configuration according to the present invention;
FIG. 3 is a block diagram of a multi-scale sensing module of a channel according to the present invention;
fig. 4 is a block diagram of a spatial multi-scale sensing module according to the present invention.
Detailed Description
The following description of embodiments of the present invention is made clearly and completely with reference to the accompanying drawings; the embodiments described are only some, not all, of the embodiments of the invention.
An image recognition method comprises the following steps:
s1, preprocessing an image to be identified to obtain an initial image feature map; .
In the embodiment of the invention, the image is preprocessed in two ways, data enhancement and data amplification, to obtain a group of feature maps; data enhancement normalizes the data by gray stretching, which speeds up network training and improves the precision of model prediction.
The specific formula for normalizing the data by adopting a gray stretching mode for data enhancement is as follows:
where I(i, j) is the original image and g(i, j) is the output image; the original images are fiber cat photos uploaded by quality inspectors. The data is also amplified by random cropping and rotation to alleviate data imbalance.
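The gray-stretch formula itself is not reproduced in the text above, so the sketch below assumes the common linear form that maps the input range of I(i, j) onto [0, 255] for g(i, j); this assumed variant is for illustration only.

```python
# Assumed linear gray stretch: g(i, j) = (I(i, j) - min) / (max - min) * 255.
# The patent's exact formula may differ; this is a standard stand-in.

def gray_stretch(img, out_max=255.0):
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:  # flat image: nothing to stretch
        return [[0.0 for _ in row] for row in img]
    return [[(p - lo) / (hi - lo) * out_max for p in row] for row in img]
```

For example, an image with pixel values 10..40 is mapped so that 10 becomes 0 and 40 becomes 255.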
For the multi-task convolutional neural network, as shown in fig. 2, two brand-new multi-scale perception modules are designed to overcome two defects of the traditional U-shaped network: spatial information is weakened as the encoder downsamples the feature map layer by layer, and the global context obtained by the encoder is weakened as the decoder upsamples layer by layer. Through multi-task learning, the network simultaneously classifies whether the fiber cat configuration image to be identified is non-standard and detects the non-standard position: the feature map obtained by target detection passes through three 3x3 convolutions to further extract classification information, is fed to a fully connected layer, and a sigmoid classifier finally yields the classification result; combining detection and classification further improves the generalization capability and prediction accuracy of the model.
S2, inputting the initial image feature map into a channel multi-scale sensing module of a multi-task convolutional neural network to obtain a multi-scale channel information feature map:
in the embodiment of the invention, fig. 3 shows the channel multi-scale perception module structure. First, the input group of feature maps undergoes global average pooling (G), max pooling (M) and overlapping pooling (O) to capture information at different scales; the three 1x1 convolutions share weight parameters, which reduces the parameter count of the model and the risk of overfitting. The three resulting groups of multi-scale channel information are then added and used as channel weight factors, multiplied with the input group of feature maps, so that effective channel weights grow and ineffective ones shrink; the larger a channel weight, the more that scale contributes to the final network prediction. Finally, a residual operation on the multiplication result and the input feature map yields the feature map of channel multi-scale information.
Here F^(h,w,c) denotes the input group of feature maps, P_i the i-th pooling operation, Conv a 1x1 convolution operation, softmax(x, -1) a softmax operation on feature map x along the channel direction, Sum an addition operation, ⊗ pixel-level multiplication, and F'^(h,w,c) the output feature map with specific multi-scale channel information, so the module computes:
F'^(h,w,c) = F^(h,w,c) + F^(h,w,c) ⊗ Sum_i softmax(Conv(P_i(F^(h,w,c))), -1)
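The channel-weighting computation described above can be sketched in plain Python; the overlapping-pooling variant and the per-channel shared scale `w` (standing in for the shared 1x1 convolutions) are assumptions of this sketch, not details from the patent.

```python
import math

# Sketch of the channel multi-scale perception module: three poolings give
# per-channel descriptors, a shared scale (assumed stand-in for the shared
# 1x1 convolutions) and a channel softmax turn each into weights, the three
# are summed, and channels are reweighted with a residual connection.

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def channel_multiscale(feat, w):
    # feat: C channels, each an H x W list of lists; w: shared per-channel scale
    gap = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]  # global average pooling
    mp = [max(max(row) for row in ch) for ch in feat]                            # global max pooling
    op = [sum(max(row) for row in ch) / len(ch) for ch in feat]                  # assumed overlapping pooling
    weights = [0.0] * len(feat)
    for desc in (gap, mp, op):
        sm = softmax([w[c] * d for c, d in enumerate(desc)])
        for c in range(len(feat)):
            weights[c] += sm[c]          # sum the three softmaxed descriptors
    # channel reweighting plus residual connection
    return [[[p * weights[c] + p for p in row] for row in ch]
            for c, ch in enumerate(feat)]
```

In a real network the pooled descriptors would pass through learned 1x1 convolutions; the scalar `w` only keeps the sketch self-contained.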
S3, inputting the initial image feature map into a spatial multi-scale sensing module of the multi-task convolutional neural network to obtain a spatial multi-scale information feature map;
in the embodiment of the invention, as shown in the spatial multi-scale perception module structure diagram in fig. 4, the input group of feature maps first undergoes dilated convolutions with dilation rates 1, 3 and 5 to obtain three groups of feature maps at different scales. The different dilation rates address the inconsistency of recognition target sizes: rate 1 targets small-target recognition of the fiber cat indicator lamp, rate 3 medium-target recognition of the fiber cat interface, and rate 5 large-target recognition of fiber bending. The three dilated convolutions share weight parameters, which reduces the number of model parameters and avoids overfitting. Because the three target sizes are not absolute (the shooting angle of the fiber cat varies), the three scales alone may not meet actual needs, so the feature maps obtained with rates 1 and 3, and with rates 3 and 5, are fused again and the fused results are added to obtain further multi-scale information, letting the receptive field of the module recognize targets of different sizes. A convolution then turns this multi-scale information into a spatial feature map, which is multiplied as a spatial weight with the input feature map; finally, a residual operation on the product and the input feature map yields the spatial multi-scale information feature map.
Here F^(h,w,c) denotes the input group of feature maps, Conv a 3x3 convolution operation, Dconv@d=2i-1 a dilated convolution with dilation rate 2i-1, Sum_2 the addition of two adjacent feature maps, ⊗ pixel-level multiplication, and F'^(h,w,c) the output feature map with specific multi-scale spatial information, so the module computes:
F'^(h,w,c) = F^(h,w,c) + F^(h,w,c) ⊗ Conv(Sum_2(Dconv@d=2i-1(F^(h,w,c))))
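The spatial branch described above can be sketched for a single channel in plain Python; the sigmoid standing in for the final convolution that produces the spatial weight map is a simplification of this sketch, not a detail from the patent.

```python
import math

# Sketch of the spatial multi-scale perception module: shared-weight 3x3
# dilated convolutions at rates 1, 3, 5, pairwise fusion of adjacent
# scales, a sigmoid as an assumed stand-in for the weight-producing
# convolution, then pixel-wise reweighting with a residual. Zero padding.

def dilated_conv3x3(x, k, d):
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for i in range(3):
                for j in range(3):
                    rr, cc = r + (i - 1) * d, c + (j - 1) * d
                    if 0 <= rr < h and 0 <= cc < w:  # zero padding at borders
                        acc += k[i][j] * x[rr][cc]
            out[r][c] = acc
    return out

def spatial_multiscale(feat, k):
    # k is the shared 3x3 kernel used at all three dilation rates
    s1, s3, s5 = (dilated_conv3x3(feat, k, d) for d in (1, 3, 5))
    h, w = len(feat), len(feat[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            fused = (s1[r][c] + s3[r][c]) + (s3[r][c] + s5[r][c])  # adjacent-scale fusion
            weight = 1.0 / (1.0 + math.exp(-fused))    # spatial weight map
            out[r][c] = feat[r][c] * weight + feat[r][c]  # reweight + residual
    return out
```

The shared kernel `k` mirrors the weight sharing across the three dilation rates described in the text.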
As shown in the network model structure diagram in fig. 2, two brand-new multi-scale perception modules are designed to overcome the defects of the traditional U-shaped network, in which spatial information is weakened by layer-by-layer encoder downsampling and the encoder's global context information is weakened by layer-by-layer decoder upsampling. Through multi-task learning the network simultaneously classifies whether the fiber cat configuration image to be identified is non-standard and detects the non-standard position: the feature map obtained by target detection passes through three 3x3 convolutions to further extract classification information, is fed to a fully connected layer, and a sigmoid classifier finally yields the classification result; combining detection and classification further improves the generalization capability and prediction accuracy of the model. The loss function used by the invention is introduced below.
The loss function adopted in the training process of the multi-task convolutional neural network designed by the invention is the commonly used L_IoU loss in combination with a binary cross-entropy loss function, where the binary cross-entropy loss is:
L_bce = -(1/N) * Sum_{i=1..N} [ p̂_i * log(p_i) + (1 - p̂_i) * log(1 - p_i) ]
where i is the index of the input image in the training set and N is the total number of training-set images; p_i ∈ [0, 1] and p̂_i respectively denote the probability that pixel i is predicted as fiber-cat-configuration-standard and its ground-truth standardness label.
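The combined loss can be sketched as binary cross-entropy for the compliance classification plus an IoU loss for the detected box; the equal weighting of the two terms is an assumption of this sketch.

```python
import math

# Sketch of the L_IoU + binary cross-entropy loss named in the text.
# The weighting factor between the two terms is assumed, not specified.

def bce_loss(p, p_hat):
    # p: predicted probabilities; p_hat: 0/1 ground-truth labels
    eps = 1e-7
    n = len(p)
    return -sum(q * math.log(max(pi, eps)) + (1 - q) * math.log(max(1 - pi, eps))
                for pi, q in zip(p, p_hat)) / n

def iou_loss(box_a, box_b):
    # boxes as (x1, y1, x2, y2); loss = 1 - IoU
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return 1.0 - inter / union if union > 0 else 1.0

def total_loss(p, p_hat, box_pred, box_true, w=1.0):
    return bce_loss(p, p_hat) + w * iou_loss(box_pred, box_true)
```

A perfect prediction (correct label with probability 1 and an exactly matching box) gives a total loss of 0.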
S4, inputting the multi-scale channel information feature map and the spatial multi-scale information feature map into a classifier of the multi-task convolutional neural network to obtain a preliminary classification result;
after the network model produces a preliminary result, the classification result obtained by the classifier is judged: if the fiber cat configuration is classified as standard, "configuration standard" is output directly; otherwise "configuration non-standard" is output, together with the position where the fiber cat configuration is non-standard.
In the embodiment of the present invention, the fusion processing of the multi-scale feature map and the spatial feature map to obtain a feature map with spatial multi-scale information includes:
multiplying the spatial feature map with the multi-scale feature map as spatial weight to obtain multi-scale information with spatial features;
and carrying out residual operation on the multi-scale information with the spatial characteristics and the input characteristic map to obtain the characteristic map of the spatial multi-scale information.
S5, post-processing is carried out on the preliminary classification result, so that the identification of the image is realized;
in the embodiment of the invention, the image to be identified is an optical fiber cat configuration image;
the method further comprises the steps of after classifying according to the feature map with the spatial multi-scale information to obtain a classification result multi-task convolutional neural network:
and determining whether the fiber cat configuration corresponding to the fiber cat configuration image is standard or not according to the classification result.
In the embodiment of the present invention, the determining, according to the classification result, whether the configuration of the optical fiber cat corresponding to the configuration image of the optical fiber cat is standard includes:
and if the fiber cat configuration corresponding to the fiber cat configuration image is determined to be nonstandard, outputting the position of the fiber cat configuration nonstandard in the fiber cat configuration image.
An image recognition apparatus comprising:
the preprocessing module is used for preprocessing the image to be identified to obtain an initial image feature map;
the channel multi-scale sensing module is used for inputting the initial image feature map into the channel multi-scale sensing module of the multi-task convolutional neural network to obtain a multi-scale feature map;
the spatial multi-scale sensing module is used for inputting the feature map into the spatial multi-scale sensing module of the multi-task convolutional neural network to obtain a spatial feature map;
the fusion module is used for carrying out fusion processing on the multi-scale feature map and the space feature map to obtain a feature map with space multi-scale information;
and the classification module is used for classifying, by the multi-task convolutional neural network, according to the feature map with spatial multi-scale information to obtain a classification result.
A computer device, comprising a memory in which computer readable instructions are stored, the instructions, when executed by a processor, performing the steps of the image recognition method described above.
A computer readable storage medium having computer readable instructions stored thereon, the instructions being executed by a processor to perform the steps of the image recognition method described above.
The foregoing is only a preferred embodiment of the present invention, and the scope of the invention is not limited thereto; any equivalent substitution or modification made by a person skilled in the art according to the technical scheme and inventive concept of the present invention shall be covered by the scope of the present invention.
Claims (10)
1. An image recognition method is characterized by comprising the following steps:
preprocessing an image to be identified to obtain an initial image feature map;
inputting the initial image feature map into a channel multi-scale sensing module of a multi-task convolutional neural network to obtain a multi-scale channel information feature map;
inputting the initial image feature map into a spatial multi-scale sensing module of a multi-task convolutional neural network to obtain a spatial multi-scale information feature map;
inputting the multi-scale channel information feature map and the spatial multi-scale information feature map into a classifier of a multi-task convolutional neural network to obtain a preliminary classification result;
and carrying out post-processing on the preliminary classification result to realize the identification of the image.
2. The image recognition method according to claim 1, wherein the preprocessing the image to be recognized to obtain an initial image feature map includes:
and carrying out data enhancement and data amplification processing on the image to be identified.
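The claim names data enhancement and data amplification but does not specify the operations, so the following sketch picks two common ones (horizontal flip and brightness jitter) purely for illustration; the function name and chosen transforms are assumptions.

```python
import torch

def augment(img: torch.Tensor) -> torch.Tensor:
    """Illustrative data-enhancement sketch (assumed operations):
    random horizontal flip plus random brightness scaling on a
    float tensor image with values in [0, 1]."""
    if torch.rand(1).item() < 0.5:
        img = torch.flip(img, dims=[-1])  # horizontal flip along width
    # Brightness jitter: scale by a random factor in [0.8, 1.2], then clamp
    img = (img * (0.8 + 0.4 * torch.rand(1))).clamp(0.0, 1.0)
    return img
```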
3. The image recognition method according to claim 1, wherein inputting the initial image feature map into a channel multi-scale perception module of a multi-task convolutional neural network to obtain a multi-scale channel information feature map, comprises:
respectively performing global average pooling, max pooling and overlapping pooling on an input group of feature maps to obtain three groups of multi-scale channel information;
adding the three groups of multi-scale channel information to serve as a channel weight factor, and multiplying the channel weight factor by the initial image feature map to obtain a multiplication result;
and performing a residual operation on the multiplication result and the initial image feature map to obtain the channel multi-scale information feature map.
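The three-pooling channel weighting of claim 3 can be sketched as follows. The claim specifies only the three poolings, their sum, the multiplication and the residual; the 1x1 convolution, the sigmoid, and the overlapping-pooling parameters (kernel 3, stride 2) here are assumptions added to make the sketch runnable.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelMultiScalePerception(nn.Module):
    """Sketch of the channel multi-scale perception module (claim 3)."""
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 conv over the summed descriptors (an assumption; not in the claim)
        self.fc = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Global average pooling: one descriptor value per channel
        avg = F.adaptive_avg_pool2d(x, 1)
        # Global max pooling
        mx = F.adaptive_max_pool2d(x, 1)
        # Overlapping pooling: stride < kernel, then reduce to 1x1
        ov = F.adaptive_avg_pool2d(
            F.avg_pool2d(x, kernel_size=3, stride=2, padding=1), 1)
        # Sum the three groups of multi-scale channel information
        # and use the result as a channel weight factor
        weight = torch.sigmoid(self.fc(avg + mx + ov))
        # Multiply by the input feature map, then residual connection
        return x * weight + x
```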
4. The image recognition method according to claim 1, wherein inputting the initial image feature map into the spatial multi-scale perception module of the multi-task convolutional neural network to obtain the spatial multi-scale information feature map specifically comprises:
performing dilated convolution operations with dilation rates of 1, 3 and 5 on the initial image feature map to obtain three groups of feature maps of different scales;
fusing the feature maps obtained by the convolutions pairwise, namely the rate-1 result with the rate-3 result, and the rate-3 result with the rate-5 result;
adding the fused results to obtain a further multi-scale information feature map;
performing a convolution operation on the multi-scale information feature map to obtain a spatial feature map;
multiplying the spatial feature map, serving as a spatial weight, by the initial image feature map to obtain a multiplication result;
and performing a residual operation on the multiplication result and the input feature map to obtain the spatial multi-scale information feature map.
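The dilated-convolution branch of claim 4 can be sketched as below. The dilation rates 1, 3, 5, the pairwise fusion, the spatial weighting and the residual come from the claim; the fusion-by-addition, the single-channel weight map, the sigmoid and all layer shapes are assumptions.

```python
import torch
import torch.nn as nn

class SpatialMultiScalePerception(nn.Module):
    """Sketch of the spatial multi-scale perception module (claim 4)."""
    def __init__(self, channels: int):
        super().__init__()
        # Three dilated convolutions with dilation rates 1, 3 and 5;
        # padding matches dilation so spatial size is preserved
        self.d1 = nn.Conv2d(channels, channels, 3, padding=1, dilation=1)
        self.d3 = nn.Conv2d(channels, channels, 3, padding=3, dilation=3)
        self.d5 = nn.Conv2d(channels, channels, 3, padding=5, dilation=5)
        # Final convolution producing the spatial weight map
        # (single output channel is an assumption)
        self.spatial = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f1, f3, f5 = self.d1(x), self.d3(x), self.d5(x)
        # Pairwise fusion of (rate-1, rate-3) and (rate-3, rate-5),
        # then add the fused results together
        fused = (f1 + f3) + (f3 + f5)
        # Spatial weight map, applied multiplicatively
        weight = torch.sigmoid(self.spatial(fused))
        # Multiply by the input feature map, then residual connection
        return x * weight + x
```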
5. The method according to claim 1, wherein the post-processing of the preliminary classification result to realize the identification of the image specifically comprises:
judging the preliminary classification result;
if the classification result for the fiber cat (optical modem) configuration being standard is yes, directly outputting that the configuration is standard;
if the classification result is no, outputting that the configuration is not standard and, in that case, also outputting the position of the optical fiber cat.
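The claim-5 decision can be sketched as a small branch over the classifier output. The probability threshold, the `box` location format and all names here are hypothetical, added only to make the branch concrete.

```python
def postprocess(standard_prob: float, box, threshold: float = 0.5) -> dict:
    """Illustrative post-processing (claim 5): if the configuration is
    judged standard, report that directly; otherwise also return the
    predicted location of the fiber cat. `threshold` and the `box`
    format are assumptions, not specified by the claim."""
    if standard_prob >= threshold:
        return {"standard": True}
    return {"standard": False, "location": box}
```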
6. The method of claim 1, wherein the image to be identified is a fiber cat configuration image.
7. The method of claim 1, wherein the loss function employed by the multi-task convolutional neural network during training is the commonly used L_IoU loss in combination with a binary cross entropy loss function.
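A common form of this combined training objective can be sketched as below. The claim names only "L_IoU plus binary cross entropy"; the soft-IoU formulation, the unweighted sum, and all tensor names here are assumptions.

```python
import torch
import torch.nn.functional as F

def combined_loss(pred_mask: torch.Tensor, target_mask: torch.Tensor,
                  pred_logit: torch.Tensor, target_label: torch.Tensor,
                  eps: float = 1e-6) -> torch.Tensor:
    """Illustrative L_IoU + BCE loss (claim 7). pred_mask holds
    probabilities in [0, 1]; pred_logit is the raw classifier output."""
    # Soft IoU over each sample's predicted region
    inter = (pred_mask * target_mask).sum(dim=(1, 2, 3))
    union = (pred_mask + target_mask - pred_mask * target_mask).sum(dim=(1, 2, 3))
    iou_loss = 1.0 - (inter + eps) / (union + eps)
    # Binary cross entropy on the classification output
    bce = F.binary_cross_entropy_with_logits(pred_logit, target_label)
    # Unweighted sum (the claim does not specify a weighting)
    return iou_loss.mean() + bce
```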
8. An image recognition apparatus, comprising:
the preprocessing module is used for preprocessing the image to be identified to obtain an initial image feature map;
the channel multi-scale sensing module is used for inputting the initial image feature map into the channel multi-scale sensing module of the multi-task convolutional neural network to obtain a multi-scale channel information feature map;
the spatial multi-scale perception module is used for inputting the initial image feature map into the spatial multi-scale perception module of the multi-task convolutional neural network to obtain a spatial multi-scale information feature map;
the classification module is used for inputting the multi-scale channel information feature map and the spatial multi-scale information feature map into a classifier of the multi-task convolutional neural network to obtain a preliminary classification result;
and the post-processing module is used for carrying out post-processing on the preliminary classification result, so that the identification of the image is realized.
9. A computer device comprising a memory having stored therein computer readable instructions which when executed by a processor implement the steps of the image recognition method of any of claims 1 to 7.
10. A computer readable storage medium having stored thereon computer readable instructions which when executed by a processor implement the steps of the image recognition method according to any of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111541244.7A CN116266390A (en) | 2021-12-16 | 2021-12-16 | Image recognition method, device, storage medium and computer equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111541244.7A CN116266390A (en) | 2021-12-16 | 2021-12-16 | Image recognition method, device, storage medium and computer equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116266390A true CN116266390A (en) | 2023-06-20 |
Family
ID=86743070
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111541244.7A Pending CN116266390A (en) | 2021-12-16 | 2021-12-16 | Image recognition method, device, storage medium and computer equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116266390A (en) |
- 2021-12-16: application CN202111541244.7A filed; published as CN116266390A, status Pending.
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20210319561A1 (en) | Image segmentation method and system for pavement disease based on deep learning | |
CN113705478B (en) | Mangrove single wood target detection method based on improved YOLOv5 | |
CN110148130B (en) | Method and device for detecting part defects | |
CN111401419A (en) | Improved RetinaNet-based employee dressing specification detection method | |
CN110264444B (en) | Damage detection method and device based on weak segmentation | |
CN105574550A (en) | Vehicle identification method and device | |
CN111814850A (en) | Defect detection model training method, defect detection method and related device | |
CN114399672A (en) | Railway wagon brake shoe fault detection method based on deep learning | |
CN112528979B (en) | Transformer substation inspection robot obstacle distinguishing method and system | |
CN113642474A (en) | Hazardous area personnel monitoring method based on YOLOV5 | |
CN114049356B (en) | Method, device and system for detecting structure apparent crack | |
CN112613569A (en) | Image recognition method, and training method and device of image classification model | |
CN115880298A (en) | Glass surface defect detection method and system based on unsupervised pre-training | |
CN116385430A (en) | Machine vision flaw detection method, device, medium and equipment | |
CN116071294A (en) | Optical fiber surface defect detection method and device | |
CN115830399A (en) | Classification model training method, apparatus, device, storage medium, and program product | |
CN115995056A (en) | Automatic bridge disease identification method based on deep learning | |
CN115171045A (en) | YOLO-based power grid operation field violation identification method and terminal | |
CN113034446A (en) | Automatic transformer substation equipment defect identification method and system | |
CN117011274A (en) | Automatic glass bottle detection system and method thereof | |
CN116266390A (en) | Image recognition method, device, storage medium and computer equipment | |
CN116977249A (en) | Defect detection method, model training method and device | |
CN115512098A (en) | Electronic bridge inspection system and inspection method | |
CN110991387A (en) | Distributed processing method and system for robot cluster image recognition | |
Pratibha et al. | Deep Learning-Based YOLO Network Model for Detecting Surface Cracks During Structural Health Monitoring |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||