CN110046650B - Express package bar code rapid detection method - Google Patents

Express package bar code rapid detection method

Info

Publication number
CN110046650B
CN110046650B (application CN201910197753.9A)
Authority
CN
China
Prior art keywords
feature
convolution
network
detection method
feature fusion
Prior art date
Legal status
Active
Application number
CN201910197753.9A
Other languages
Chinese (zh)
Other versions
CN110046650A (en)
Inventor
许绍云
易帆
李功燕
Current Assignee
Zhongke Weizhi Technology Co ltd
Original Assignee
Zhongke Weizhi Intelligent Manufacturing Technology Jiangsu Co ltd
Priority date
Filing date
Publication date
Application filed by Zhongke Weizhi Intelligent Manufacturing Technology Jiangsu Co ltd filed Critical Zhongke Weizhi Intelligent Manufacturing Technology Jiangsu Co ltd
Priority to CN201910197753.9A priority Critical patent/CN110046650B/en
Publication of CN110046650A publication Critical patent/CN110046650A/en
Application granted granted Critical
Publication of CN110046650B publication Critical patent/CN110046650B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a rapid detection method for express parcel barcodes, comprising the following steps: constructing a cascaded multi-scale feature fusion network, wherein the feature fusion network comprises a feature reduction network and a feature retention network connected in sequence; inputting the express parcel barcode image into the feature fusion network and extracting a feature map by convolution; and inputting the feature map into a detection module, which performs classification and coordinate regression on the feature map to obtain the detection result. The method offers clear advantages in both accuracy and speed, and can well solve the barcode detection problem in practical applications. In other application environments, the number of feature fusion layers in the network can be customized to specific requirements, giving the method generality.

Description

Express package bar code rapid detection method
Technical Field
The invention relates to the technical field of deep learning and image processing, and in particular to a rapid detection method for express parcel barcodes.
Background
Image processing is an effective way to achieve barcode detection. Under relatively ideal physical conditions, traditional image algorithms typically extract barcode edge-texture features, obtain the barcode region by morphological erosion and dilation, or detect barcode edge lines with the Hough transform. Such algorithms impose strict requirements on the detection environment and achieve good results only under specific physical conditions. In real automatic parcel-sorting scenes, however, illumination conditions and the field environment make the captured images uneven in quality: severe illumination changes, complex background interference, barcode distortion, small barcode targets, and the like easily cause false and missed detections, raising the difficulty of barcode detection. Research into highly reliable and stable barcode detection and localization methods is therefore important for efficient automatic sorting of logistics parcels in complex environments.
Unlike traditional image algorithms, which require manually designed features, deep learning extracts the relevant features through self-learning and can integrate feature extraction, screening, and classification into a single network for joint optimization, a notable advantage. In particular, convolutional neural networks far surpass traditional image algorithms on tasks such as image recognition, image understanding, object detection, and semantic segmentation, and their robustness makes them applicable to a wide range of scene tasks.
Object detection models based on convolutional neural networks fall into two main types. One is the two-stage, candidate-region-based detector represented by Faster R-CNN: a deep convolutional neural network first produces a feature map of the image, a region proposal network (RPN) then generates candidate regions from the feature map, and a classifier, bounding-box regressor, non-maximum suppression, and related algorithms classify and refine the candidate regions into valid targets. The other is the one-stage, regression-based detector represented by YOLO and SSD, which selects feature maps from the output or intermediate layers and performs classification and coordinate regression on them directly. Unlike the two-stage detector, the one-stage detector omits the candidate-region generation of the region proposal network and directly integrates target identification and judgment, greatly reducing computational cost and latency; this makes it important for real-time, end-to-end object detection.
Disclosure of Invention
The object of the invention is achieved by the following technical solution.
Specifically, the invention provides a method for rapidly detecting express package bar codes, which comprises the following steps:
constructing a cascade multi-scale feature fusion network, wherein the feature fusion network comprises a feature reduction network and a feature retention network which are connected in sequence;
inputting the express package barcode image into the feature fusion network, and obtaining a feature map through convolution extraction;
and inputting the feature map into a detection module, and carrying out classification and coordinate regression on the feature map to obtain a detection result.
Preferably, the feature reduction network is used for feature map size reduction and feature information extraction; the feature retention network is used for fusing feature semantic information.
Preferably, the feature reduction network includes a plurality of feature fusion modules, each consisting of three 3x3 convolutional layers; the output of each feature fusion module is obtained by concatenating the feature maps produced by the three convolutions, the outputs of the second and third 3x3 convolutional layers being upsampled by transposed convolution to the same size before concatenation.
Preferably, the feature retention network comprises a plurality of feature fusion modules, each consisting of three 3x3 dilated convolutional layers, the results of the three dilated convolutions being concatenated directly to output feature maps of the same size.
Preferably, the method further comprises the following steps: and adding a 1x1 convolution compression layer between adjacent feature fusion modules to reduce the number of feature maps output by the feature fusion modules.
Preferably, the method further comprises the following steps: and converting the 3x3 convolution layer in the feature reduction network, and splitting the convolution layer into a depth convolution and a dot-product convolution, wherein the number of convolution kernel channels of the depth convolution is 1, and the size of convolution kernel space of the dot-product convolution is 1.
More preferably, the method further comprises the following steps: and further optimizing the dot product convolution by adopting packet convolution, dividing dot product convolution kernels into a plurality of groups, performing convolution respectively, and finally merging and outputting results.
More preferably, the method further comprises the following steps: and adding channel recombination after the grouping convolution, and cross-mixing the feature maps of different groups.
Preferably, the detection module comprises a classifier, a regressor and a non-maxima suppression unit.
The invention has the advantages that: the method offers clear advantages in both accuracy and speed, and can well solve the barcode detection problem in practical applications. In other application environments, the number of feature fusion layers in the network can be customized to specific requirements, giving the method generality.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
fig. 1 shows a flow chart of a quick detection method for express package bar codes according to an embodiment of the invention;
FIG. 2 shows a schematic structural diagram of a feature fusion layer I according to an embodiment of the invention;
FIG. 3 shows a schematic structural diagram of a feature fusion layer II according to an embodiment of the invention;
FIG. 4 illustrates a feature map channel compression diagram according to an embodiment of the present invention;
FIG. 5 illustrates a schematic diagram of a standard convolution kernel and a depth separable convolution kernel according to an embodiment of the present invention;
FIG. 6 shows a schematic diagram of channel reorganization according to an embodiment of the present invention;
fig. 7 shows a schematic diagram of a standard 3x3 convolutional layer structure (left) and a modified 3x3 depth separable convolutional layer structure (right) in accordance with an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Taking a one-stage detector as the basic framework, and considering the particular parcel-sorting scene and the specific characteristics of the barcode targets to be detected, the method designs a lightweight cascaded multi-scale feature fusion network that extracts features directly by convolution and locates barcodes by classification and regression. The model structure and flow are shown in fig. 1. The feature extraction part mainly comprises two networks: a feature reduction network and a feature retention network. The feature reduction network reduces the feature-map size and extracts feature information; the feature retention network further fuses feature semantic information while keeping the feature-map size unchanged. The detection part consists of a classifier, a regressor, and a non-maximum suppression unit, and performs classification and coordinate regression on the resulting feature map to obtain the target's detection box.
The following provides a detailed description of various arrangements and specific parts of the present invention:
1. Anchor setup
In the present invention, anchor setup mainly involves adjusting two kinds of hyperparameters: the aspect ratio R and the area S. Analysis of barcode sizes in the actual application field shows the barcodes are small, so S is set to {32², 48², 64², 80², 96²} and R to {1:1, 1:2, 1:3, 1:4, 2:1, 3:1, 4:1}. The anchors then cover the size range of all barcodes in the barcode dataset, effectively improving the final detection accuracy.
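A minimal sketch of how the anchor set above could be enumerated; the helper name and the rounding are illustrative assumptions, not from the patent:

```python
from itertools import product

# Areas S = {32^2, 48^2, 64^2, 80^2, 96^2} and aspect ratios
# R = {1:1, 1:2, 1:3, 1:4, 2:1, 3:1, 4:1} as described above.
AREAS = [s * s for s in (32, 48, 64, 80, 96)]
RATIOS = [(1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (3, 1), (4, 1)]

def make_anchors(areas=AREAS, ratios=RATIOS):
    """Return (w, h) anchor sizes for every area/aspect-ratio combination."""
    anchors = []
    for area, (rw, rh) in product(areas, ratios):
        # Solve w * h = area subject to w / h = rw / rh.
        h = (area * rh / rw) ** 0.5
        w = area / h
        anchors.append((round(w, 1), round(h, 1)))
    return anchors

anchors = make_anchors()
print(len(anchors))  # 5 areas x 7 ratios = 35 anchors
```

Every feature-map location would then be tiled with these 35 shapes, which is what lets the detector cover the whole range of barcode sizes.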
2. Feature reduction network
As shown in fig. 1, the invention designs a multi-scale feature fusion layer I that enhances barcode features by fusing semantic information at different hierarchical scales, as shown in fig. 2. Each feature fusion layer I corresponds to a feature fusion module consisting of three 3x3 convolutional layers, each with stride 2. The output of fusion layer I is obtained by concatenating (concat) the feature maps after the three convolutions; the outputs of the second and third 3x3 convolutional layers are upsampled by transposed convolution (deconvolution) to the same size before concatenation. In this way, each multi-scale feature fusion layer I downsamples the feature map by a factor of 2, and stacking several fusion layers I rapidly reduces the feature-map size. To ensure the barcode feature points do not vanish under repeated downsampling, the number of feature fusion modules is set to T = 4, so the feature map of the last layer is downsampled to 1/16 of the input image, preserving the integrity of the barcode features while fully extracting semantic information.
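A minimal PyTorch sketch of one fusion layer I under the description above. The channel counts, padding, and transposed-convolution kernel sizes are assumptions filled in for the sketch, since the patent does not fix them:

```python
import torch
import torch.nn as nn

class FusionLayerI(nn.Module):
    """Multi-scale feature fusion layer I: three cascaded stride-2 3x3
    convolutions whose outputs are upsampled to a common scale and
    concatenated, so the layer as a whole downsamples by 2."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, stride=2, padding=1)
        self.conv3 = nn.Conv2d(out_ch, out_ch, 3, stride=2, padding=1)
        # Transposed convolutions bring the 2nd and 3rd outputs back to
        # the size of the 1st output before concatenation.
        self.up2 = nn.ConvTranspose2d(out_ch, out_ch, 2, stride=2)
        self.up3 = nn.ConvTranspose2d(out_ch, out_ch, 4, stride=4)

    def forward(self, x):
        f1 = self.conv1(x)   # H/2
        f2 = self.conv2(f1)  # H/4
        f3 = self.conv3(f2)  # H/8
        # Concatenate along the channel axis at the H/2 scale.
        return torch.cat([f1, self.up2(f2), self.up3(f3)], dim=1)

y = FusionLayerI(3, 16)(torch.randn(1, 3, 64, 64))
print(y.shape)  # torch.Size([1, 48, 32, 32])
```

Stacking T = 4 such layers would downsample the input by 2⁴ = 16, matching the 1/16 figure above.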
3. Feature retention network
To further extract semantic features, a feature fusion layer II is designed and appended after the feature reduction network. Its structure is shown in fig. 3: each feature fusion layer II consists of three 3x3 dilated convolutional layers, each with stride 1, and the three convolution outputs are concatenated directly into feature maps of the same size. Replacing the traditional standard convolution with dilated convolution keeps the feature-map size unchanged while better enlarging the receptive field and enriching the semantic features. The receptive field of the dilated convolution is computed as:
RF=(K+1)×(DR-1)+K
where RF denotes the receptive field of the dilated convolution output, K the convolution kernel size, and DR the dilation rate.
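The formula above can be evaluated directly; the function name is illustrative:

```python
def dilated_rf(k, dr):
    """Receptive field of a dilated convolution per the formula above:
    RF = (K + 1) * (DR - 1) + K."""
    return (k + 1) * (dr - 1) + k

# With DR = 1 the dilated convolution reduces to a standard one (RF = K);
# larger rates widen the receptive field without shrinking the feature map.
for dr in (1, 2, 3):
    print(dr, dilated_rf(3, dr))
```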
4. Optimized acceleration of the network model
To increase the model's processing speed and meet practical application requirements, the invention further accelerates the model in two respects: the output channels and the convolution depth.
(1) Output feature map channel compression
In feature fusion, concatenating feature maps increases the channel count and hence the model's complexity and computation. To address this, a 1x1 convolutional compression layer is added between adjacent fusion modules to reduce the number of feature maps output by each module. Specifically, the number V of 1x1 convolution kernels is set smaller than the number M of input feature-map channels, as shown in fig. 4.
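A sketch of the compression layer; the channel counts M = 48 and V = 16 are illustrative assumptions (the patent only requires V < M):

```python
import torch
import torch.nn as nn

# 1x1 compression: V output channels with V < M input channels, so the
# concatenated fusion-module output is squeezed back down.
M, V = 48, 16
compress = nn.Conv2d(M, V, kernel_size=1)

x = torch.randn(1, M, 32, 32)  # e.g. a concatenated fusion-module output
out = compress(x)
print(out.shape)  # torch.Size([1, 16, 32, 32])
```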
(2) Convolution kernel depth separation
In the convolution computation, every 3x3 standard convolution is converted into an improved depthwise separable convolution, which effectively speeds up the convolutional layers. Take a standard convolution kernel of spatial size K and depth M (fig. 5(a)), with N kernels in total; the convolution can be split into a depthwise convolution (fig. 5(b)) and a pointwise convolution (fig. 5(c)). Each depthwise convolution kernel has a single channel, i.e. each channel of the feature map has its own kernel; this strips out the cross-channel mixing of standard convolution and focuses on extracting spatial features of the feature map. The pointwise convolution kernel has spatial size 1, equivalent to a 1x1 kernel, and, complementary to the depthwise convolution, mixes and propagates feature information across channels.
Generally, the number of kernels N is set large (e.g. 128 or 256) and the kernel sizes are mostly 3x3 or 5x5, so the computation of the depthwise separable convolution is dominated by the pointwise convolution. The 1x1 pointwise convolution can be further optimized with grouped convolution: the pointwise kernels are divided into g groups, each group is convolved separately, and the results are finally merged, reducing the computation by a factor of g. However, grouped convolution tends to isolate information between channels, contrary to the cross-channel mixing effect of the 1x1 pointwise convolution, so a channel shuffle (fig. 6) is added after the grouped convolution to cross-mix the feature maps of different groups.
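A PyTorch sketch of the improved layer: depthwise convolution, grouped 1x1 pointwise convolution, then channel shuffle. The channel counts and class name are illustrative assumptions:

```python
import torch
import torch.nn as nn

def channel_shuffle(x, g):
    """Cross-mix the g groups: reshape to (N, g, C/g, H, W), swap the
    group and channel axes, then flatten back."""
    n, c, h, w = x.shape
    return x.view(n, g, c // g, h, w).transpose(1, 2).reshape(n, c, h, w)

class SeparableConv(nn.Module):
    """Improved 3x3 layer: depthwise conv (one single-channel kernel per
    input channel), grouped pointwise conv (g groups), channel shuffle."""
    def __init__(self, in_ch, out_ch, g=2):
        super().__init__()
        self.g = g
        # Depthwise: groups == channels, so each kernel has depth 1.
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)
        # Pointwise: 1x1 kernels split into g groups to cut computation by g.
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, groups=g)

    def forward(self, x):
        x = self.pointwise(self.depthwise(x))
        return channel_shuffle(x, self.g)

y = SeparableConv(16, 32)(torch.randn(1, 16, 24, 24))
print(y.shape)  # torch.Size([1, 32, 24, 24])
```

The shuffle is what restores cross-group information flow after the grouped pointwise convolution isolates the groups.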
Through the above improvement, the structure of the 3 × 3 convolutional layer is changed as shown in fig. 7.
Assuming an output feature map of size DxD, the computation of the standard convolution is compared with that of the improved depthwise separable convolution as follows:
The standard convolution costs K²·M·N·D² multiplications, while the improved version costs M·N·D²/g for the grouped pointwise convolution plus K²·M·D² for the depthwise convolution, giving the ratio:

(M·N·D²/g + K²·M·D²) / (K²·M·N·D²) = 1/(g·K²) + 1/N
In the invention, the convolution kernel size is 3x3 and the pointwise convolution uses g = 2 groups. Since the number of kernels N is generally large, the second term of the ratio can be neglected, so the improved depthwise separable convolution is theoretically about 18 times faster than the standard convolution (g·K² = 2 × 9 = 18), effectively reducing the model's complexity and parameter count and increasing its speed.
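The 18x figure can be checked numerically. Under the standard per-layer multiply-count model, the cost of the improved convolution relative to the standard one is 1/(g·K²) + 1/N; the function name below is illustrative:

```python
# Ratio of depthwise-separable (with grouped pointwise) convolution cost
# to standard convolution cost: 1/(g*K^2) + 1/N.
def cost_ratio(k, g, n):
    return 1 / (g * k * k) + 1 / n

k, g = 3, 2
for n in (128, 256):
    # Speedup approaches g * K^2 = 18x as N grows.
    print(n, round(1 / cost_ratio(k, g, n), 1))
```

For realistic kernel counts the speedup is already close to the asymptote (about 15.8x at N = 128, 16.8x at N = 256), which is why the 1/N term is safe to ignore.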
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (7)

1. A quick detection method for express package bar codes is characterized by comprising the following steps:
constructing a cascade multi-scale feature fusion network, wherein the feature fusion network comprises a feature reduction network and a feature retention network which are connected in sequence;
inputting the express package barcode image into the feature fusion network, and obtaining a feature map through convolution extraction;
inputting the feature map into a detection module, and carrying out classification and coordinate regression on the feature map to obtain a detection result;
the feature reduction network comprises a plurality of feature fusion modules, each consisting of three 3x3 convolutional layers; the output of each feature fusion module is obtained by concatenating the feature maps produced by the three convolutions, the outputs of the second and third 3x3 convolutional layers being upsampled by transposed convolution to the same size before concatenation;
the feature retention network comprises a plurality of feature fusion layers, each consisting of three 3x3 dilated convolutional layers, the results of the three dilated convolutions being concatenated directly to output feature maps of the same size.
2. The express parcel barcode rapid detection method according to claim 1,
the feature reduction network is used for reducing the size of the feature map and extracting feature information; the feature preserving network is used for feature semantic information fusion.
3. The express package bar code rapid detection method according to claim 1, characterized by further comprising:
and adding a 1x1 convolution compression layer between adjacent feature fusion modules to reduce the number of feature maps output by the feature fusion modules.
4. The express package bar code rapid detection method according to claim 1, characterized by further comprising:
and converting the 3x3 convolution layer in the feature reduction network, and splitting the convolution layer into a depth convolution and a dot-product convolution, wherein the number of convolution kernel channels of the depth convolution is 1, and the size of convolution kernel space of the dot-product convolution is 1.
5. The express package barcode rapid detection method of claim 4, further comprising:
and further optimizing the dot product convolution by adopting packet convolution, dividing dot product convolution kernels into a plurality of groups, performing convolution respectively, and finally merging and outputting results.
6. The express package barcode rapid detection method of claim 5, further comprising:
and adding channel recombination after the grouping convolution, and cross-mixing the feature maps of different groups.
7. The express parcel barcode rapid detection method according to claim 1,
the detection module includes a classifier, a regressor, and a non-maxima suppression unit.
CN201910197753.9A 2019-03-15 2019-03-15 Express package bar code rapid detection method Active CN110046650B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910197753.9A CN110046650B (en) 2019-03-15 2019-03-15 Express package bar code rapid detection method

Publications (2)

Publication Number Publication Date
CN110046650A (en) 2019-07-23
CN110046650B (en) 2021-05-28

Family

ID=67274910

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910197753.9A Active CN110046650B (en) 2019-03-15 2019-03-15 Express package bar code rapid detection method

Country Status (1)

Country Link
CN (1) CN110046650B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108073849B (en) * 2016-11-18 2021-04-30 杭州海康威视数字技术股份有限公司 Bar code detection method, device and system
CN108510012B (en) * 2018-05-04 2022-04-01 四川大学 Target rapid detection method based on multi-scale feature map
CN108985145A (en) * 2018-05-29 2018-12-11 同济大学 The Opposite direction connection deep neural network model method of small size road traffic sign detection identification

Also Published As

Publication number Publication date
CN110046650A (en) 2019-07-23

Similar Documents

Publication Publication Date Title
CN112287940B (en) Semantic segmentation method of attention mechanism based on deep learning
Kamal et al. Automatic traffic sign detection and recognition using SegU-Net and a modified Tversky loss function with L1-constraint
CN107527007B (en) Method for detecting object of interest in vehicle image processing system
CN111104903B (en) Depth perception traffic scene multi-target detection method and system
US10198657B2 (en) All-weather thermal-image pedestrian detection method
CN105512638B (en) A kind of Face datection and alignment schemes based on fusion feature
Zhao et al. Improved vision-based vehicle detection and classification by optimized YOLOv4
Haloi Traffic sign classification using deep inception based convolutional networks
Ribeiro et al. An end-to-end deep neural architecture for optical character verification and recognition in retail food packaging
CN109446922B (en) Real-time robust face detection method
CN105488468A (en) Method and device for positioning target area
CN114155527A (en) Scene text recognition method and device
Ma et al. Fusioncount: Efficient crowd counting via multiscale feature fusion
CN110349167A (en) A kind of image instance dividing method and device
CN117037119A (en) Road target detection method and system based on improved YOLOv8
Cho et al. Semantic segmentation with low light images by modified CycleGAN-based image enhancement
CN111353544A (en) Improved Mixed Pooling-Yolov 3-based target detection method
CN112883926B (en) Identification method and device for form medical images
Sabater et al. Event Transformer+. A multi-purpose solution for efficient event data processing
Yin et al. Road Damage Detection and Classification based on Multi-level Feature Pyramids.
Golgire Traffic Sign Recognition using Machine Learning: A Review
CN116778346A (en) Pipeline identification method and system based on improved self-attention mechanism
CN110046650B (en) Express package bar code rapid detection method
CN112541469B (en) Crowd counting method and system based on self-adaptive classification
CN114694042A (en) Disguised person target detection method based on improved Scaled-YOLOv4

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 214105 No. 299 Dacheng Road, Xishan District, Jiangsu, Wuxi

Applicant after: Zhongke Weizhi intelligent manufacturing technology Jiangsu Co.,Ltd.

Address before: 214105 No. 299 Dacheng Road, Xishan District, Jiangsu, Wuxi

Applicant before: ZHONGKE WEIZHI INTELLIGENT MANUFACTURING TECHNOLOGY JIANGSU Co.,Ltd.

TA01 Transfer of patent application right

Effective date of registration: 20200917

Address after: 214105 No. 299 Dacheng Road, Xishan District, Jiangsu, Wuxi

Applicant after: ZHONGKE WEIZHI INTELLIGENT MANUFACTURING TECHNOLOGY JIANGSU Co.,Ltd.

Address before: Floor 7, ITRI Building, No. 1699 Reed City Road, Kunshan, Suzhou, Jiangsu 215347

Applicant before: KUNSHAN BRANCH, INSTITUTE OF MICROELECTRONICS OF CHINESE ACADEMY OF SCIENCES

GR01 Patent grant
CP03 Change of name, title or address

Address after: No. 979, Antai Third Road, Xishan District, Wuxi City, Jiangsu Province, 214000

Patentee after: Zhongke Weizhi Technology Co.,Ltd.

Address before: No. 299, Dacheng Road, Xishan District, Wuxi City, Jiangsu Province

Patentee before: Zhongke Weizhi intelligent manufacturing technology Jiangsu Co.,Ltd.