CN115358931B - Image reconstruction method and device for warehouse logistics system - Google Patents


Info

Publication number
CN115358931B
CN115358931B (application CN202211283775.5A)
Authority
CN
China
Prior art keywords
image
image reconstruction
feature
characteristic
subnet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211283775.5A
Other languages
Chinese (zh)
Other versions
CN115358931A (en)
Inventor
李华 (Li Hua)
温燕飞 (Wen Yanfei)
谢立杰 (Xie Lijie)
张桓 (Zhang Huan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Y2T Technology Co Ltd
Original Assignee
Y2T Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Y2T Technology Co Ltd filed Critical Y2T Technology Co Ltd
Priority to CN202211283775.5A priority Critical patent/CN115358931B/en
Publication of CN115358931A publication Critical patent/CN115358931A/en
Application granted granted Critical
Publication of CN115358931B publication Critical patent/CN115358931B/en
Priority to NL2035792A priority patent/NL2035792A/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4046Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks


Abstract

The invention discloses an image reconstruction method and device for a warehouse logistics system, belonging to the technical fields of artificial intelligence and image processing. The image reconstruction method comprises the steps of obtaining an image to be processed and a trained image reconstruction model, inputting the image to be processed into the image reconstruction model, performing a feature extraction operation on the image to be processed with a feature extraction subnet, inputting the resulting high-order feature map into an image reconstruction subnet, reconstructing the high-order feature map with the image reconstruction subnet, and then outputting a processed image whose resolution is higher than that of the image to be processed. The invention reconstructs low-resolution label images by means of deep learning, improving the ability of the logistics management system to identify low-resolution label images and automatically read their information while keeping the hardware equipment unchanged; compared with existing image super-resolution reconstruction models, the image reconstruction model provided by the invention has good anti-interference capability against local noise and a good reconstruction effect.

Description

Image reconstruction method and device for warehouse logistics system
Technical Field
The invention belongs to the technical field of warehouse logistics systems and image processing, and particularly relates to an image reconstruction method and device for a warehouse logistics system.
Background
In a modern warehouse logistics management system, an information label is attached to each commodity, and during warehousing, stocktaking, ex-warehouse and similar processes, automatic management of materials can be realized by acquiring label images with equipment and reading the information in them. However, in some scenes, due to limitations such as material stacking positions, sites and hardware performance, the resolution of the label images acquired by the equipment is low, and the information in a label image sometimes cannot be identified and read, which brings great difficulty to warehouse logistics management.
Disclosure of Invention
In view of the above problems, the present invention provides an image reconstruction method and apparatus for a warehouse logistics system, which reconstructs acquired low-resolution label images so as to improve the ability to identify them and read their information.
In order to achieve the above purpose, the solution adopted by the invention is as follows: an image reconstruction method for a warehouse logistics system, comprising the following steps:
p100, acquiring an image to be processed and a trained image reconstruction model;
the image reconstruction model comprises a feature extraction subnet and an image reconstruction subnet which are sequentially arranged, and the feature extraction subnet comprises a plurality of stacked feature extraction re-fusion modules;
p200, inputting the image to be processed into the image reconstruction model, and outputting a high-order characteristic diagram by using the characteristic extraction subnet after performing characteristic extraction operation on the image to be processed by using the characteristic extraction subnet, wherein the high-order characteristic diagram comprises high-order characteristic information of the image to be processed;
p300, inputting the high-order characteristic map into the image reconstruction subnet, and after the high-order characteristic map is reconstructed by using the image reconstruction subnet, outputting a processed image with a resolution larger than that of the image to be processed by the image reconstruction subnet;
the process of extracting the image characteristic information by the characteristic extraction and re-fusion module is expressed as the following mathematical model:
[Equation: mathematical model of the feature extraction re-fusion module (rendered only as an image in the original)]
wherein MI represents the feature map input to the feature extraction re-fusion module, MO represents the feature map output by the feature extraction re-fusion module, f_COV1, f_COV2, f_COV3, f_COV4, f_COV5 and f_COV6 each represent an ordinary convolution operation with a stride of 1, the convolution kernel sizes of f_COV1, f_COV3 and f_COV4 are all 3×3, the convolution kernel sizes of f_COV2 and f_COV5 are all 5×5, the convolution kernel size of f_COV6 is 1×1, f_σ1, f_σ2, f_σ3, f_σ4, f_σ5 and f_σ6 each represent the nonlinear activation function ReLU, f_T1 and f_T2 each represent the nonlinear activation function Tanh, × represents the element-wise product operation, and ⟨•⟩ represents the splicing operation on the feature maps inside it.
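The module's equation appears only as an image in the original, so the exact wiring cannot be recovered; the following PyTorch sketch shows one plausible arrangement that is consistent with the symbol definitions above and the beneficial-effects description (parallel 3×3/5×5 front-end convolutions fused with MI by element-wise product and addition, an f_COV3 stage over the spliced streams, and a Tanh-activated back end feeding the 1×1 f_COV6). The class name, the α/β intermediate maps and the exact splicing choices are assumptions, not the patented formula:

```python
import torch
import torch.nn as nn

class FeatureExtractionReFusion(nn.Module):
    """Hypothetical wiring of the feature extraction re-fusion module (48 channels)."""
    def __init__(self, ch: int = 48):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, stride=1, padding=1)      # f_COV1, 3x3
        self.conv2 = nn.Conv2d(ch, ch, 5, stride=1, padding=2)      # f_COV2, 5x5
        self.conv3 = nn.Conv2d(3 * ch, ch, 3, stride=1, padding=1)  # f_COV3 over spliced streams
        self.conv4 = nn.Conv2d(ch, ch, 3, stride=1, padding=1)      # f_COV4, 3x3
        self.conv5 = nn.Conv2d(ch, ch, 5, stride=1, padding=2)      # f_COV5, 5x5
        self.conv6 = nn.Conv2d(2 * ch, ch, 1, stride=1)             # f_COV6, 1x1

    def forward(self, mi: torch.Tensor) -> torch.Tensor:
        a1 = torch.relu(self.conv1(mi))   # front end: two kernel sizes in parallel
        a2 = torch.relu(self.conv2(mi))
        a3 = a1 * mi                      # fuse with input by element-wise product
        a4 = a2 + mi                      # fuse with input by matrix addition
        a5 = torch.relu(self.conv3(torch.cat([a3, a4, mi], dim=1)))
        a6 = torch.relu(self.conv4(a5))   # back end: two kernel sizes again
        a7 = torch.relu(self.conv5(a5))
        b1 = torch.tanh(a6 + a7)          # f_T1 over the additive fusion
        b2 = torch.tanh(a6 * a7)          # f_T2 over the multiplicative fusion
        return self.conv6(torch.cat([b1, b2], dim=1))  # MO, same H x W x 48
```

All convolutions preserve the spatial size, so MI and MO are both H × W × 48, matching the dimensions stated in embodiment 1.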
Preferably, a low-order convolutional layer is arranged in the image reconstruction model, and the low-order convolutional layer is arranged at the front end of the feature extraction subnet; and performing convolution operation on the input image to be processed by using the low-order convolution layer to generate a low-order feature map, and then inputting the low-order feature map into the feature extraction subnet.
Preferably, an integrated attention unit is arranged in the feature extraction re-fusion module, an integrated modulation map generated and output by the integrated attention unit is used to modulate the feature map MO, and the modulated MO feature map serves as the output of the feature extraction re-fusion module.
Preferably, a hierarchical jump connection is provided in the image reconstruction model, and the low-level feature maps are input into each of the integrated attention units through the hierarchical jump connection.
Preferably, the mathematical model of the integrated attention unit is:
[Equation: mathematical model of the integrated attention unit (rendered only as an image in the original)]
wherein β1 represents the feature map generated after the (α6 + α7) feature map is activated by the f_T1 function, β2 represents the feature map generated after the (α6 × α7) feature map is activated by the f_T2 function, and β3 represents the feature map generated after the α8 feature map passes in turn through the f_COV6 convolution and the f_σ6 activation; MA represents the low-order feature map output by the low-order convolutional layer; the low-order feature map MA and the feature maps α6, α7, β1, β2 and β3 (from the same feature extraction re-fusion module as the integrated attention unit) simultaneously serve as the inputs of the integrated attention unit; ⟨•⟩ represents the splicing operation on feature maps, × represents the element-wise product operation, TbC represents global maximum pooling of a feature map in the channel direction, TvS represents global variance pooling of a feature map in the spatial direction, f_FC1 and f_FC2 each represent a fully connected layer, f_δ1, f_δ2 and f_δ3 each represent a sigmoid activation function, and MS represents the integrated modulation map generated and output by the integrated attention unit.
Preferably, a multi-level information integration unit is arranged in the feature extraction subnet, the multi-level information integration unit includes a splicing operation layer and an integration convolution layer, the splicing operation layer is used for splicing the feature maps output by the feature extraction re-fusion module (when the feature extraction re-fusion module does not have an integrated attention unit, the feature extraction re-fusion module outputs an MO feature map, and when the feature extraction re-fusion module sets an integrated attention unit, the feature extraction re-fusion module outputs an MO feature map modulated by the integrated modulation map) and generating a multi-dimensional feature map, and the integration convolution layer is used for performing convolution operation on the multi-dimensional feature map and then generating and outputting the high-order feature map.
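The multi-level information integration unit described above reduces, in effect, to a splice followed by a 1×1 integration convolution. A minimal PyTorch sketch under that reading (class and parameter names are assumptions; the patent does not publish code):

```python
import torch
import torch.nn as nn

class MultiLevelIntegration(nn.Module):
    """Splices the outputs of the stacked re-fusion modules, then fuses with a 1x1 conv."""
    def __init__(self, ch: int = 48, n_modules: int = 5):
        super().__init__()
        # splicing layer is torch.cat; integration conv reduces n_modules*ch -> ch channels
        self.fuse = nn.Conv2d(ch * n_modules, ch, kernel_size=1, stride=1)

    def forward(self, maps: list[torch.Tensor]) -> torch.Tensor:
        multi_dim = torch.cat(maps, dim=1)   # multi-dimensional feature map
        return self.fuse(multi_dim)          # high-order feature map, H x W x ch
```

With the embodiment's five modules of 48 channels each, the spliced map has 240 channels and the integration convolution brings it back to 48.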
The invention also provides an image reconstruction device for the warehouse logistics system, which comprises a processor and a memory, wherein the memory stores a computer program, and the processor is used for executing the image reconstruction method for the warehouse logistics system by loading the computer program.
The invention has the beneficial effects that:
(1) The low-resolution label image is reconstructed through deep learning and its resolution is increased, improving the ability of the logistics management system to identify low-resolution label images and automatically read their information while keeping the hardware equipment unchanged;
(2) When a model based on an artificial neural network performs super-resolution reconstruction on an image, its only information source is the low-resolution image originally input into the network, so whether the various types of image information in the low-resolution image can be fully acquired, and whether their features can be effectively learned, is one of the key factors influencing the super-resolution reconstruction effect of the model; the front end of the feature extraction re-fusion module performs convolution operations with kernels of different sizes and fuses the resulting feature maps (α1 and α2) with the input MI feature map in different ways (element-wise product and matrix addition), so as to extract the feature information in the input MI feature map from several different perspectives and learn different types of image information features more efficiently and in a more targeted manner; then, during the f_COV3 convolution operation, information useful for image reconstruction can be adaptively selected from the multiple feature streams, noise and repeated information can be filtered more accurately, and the feature extraction effect is improved; at the back end of the feature extraction re-fusion module, convolution kernels of different sizes are again used and the feature maps are fused in multiple ways; to increase the nonlinear fitting capacity of the model and help the f_COV6 convolution mine the differential information between the branches and increase the proportion of high-frequency information in the MO feature map, two Tanh functions are set at the back end of the feature extraction re-fusion module to activate the fused feature maps;
(3) An integrated attention unit is designed for the structure of the feature extraction re-fusion module; the first input end of the integrated attention unit takes several parallel feature maps (α6, α7, β1 and β2) as input and performs a global maximum pooling operation on each feature map in the channel direction, so that the relative relationships between features at different spatial positions in different kinds of feature maps can be learned simultaneously; the integrated attention unit can therefore synthesize more information when generating the integrated modulation map, giving the model good anti-interference capability against local noise and good robustness;
(4) In the invention, the four feature maps α6, α7, β1 and β2 are fused to generate the β3 feature map, and the fused β3 feature map is used in the integrated attention unit: it undergoes global variance pooling in the spatial direction, followed by the f_FC1 fully connected layer and f_δ1 function activation, generating the modulation vector θ2; θ2 and θ1 are then multiplied element-wise, so that θ2 performs internal modulation of each layer of θ1 according to the information in the different kinds of feature maps; the integrated modulation map can therefore modulate the feature map MO in the direction of maximizing the amount of information, improving the model's output at high-frequency detail locations;
(5) Generally, as image information propagates along the depth direction of a network, the abstraction of the feature information gradually increases, while feature losses of different degrees also occur; to reduce the negative influence of this information loss on the performance of the image reconstruction model, the invention also introduces the low-order feature map into the attention unit to generate the θ3 vector, which performs internal modulation on each layer of θ1 and cooperates with θ2, so that the low-order features complement the features of each level, and more effective information is retained in the subsequent high-order feature map through the calibration of the integrated attention unit;
(6) The integrated attention unit creatively adopts a mechanism of global maximum pooling in the front and rear stages with variance-pooling calibration in between, so that various kinds of information are organically integrated and the effect of the attention mechanism in improving the performance of the whole model is enhanced.
Drawings
FIG. 1 is a schematic structural diagram of an image reconstruction model according to example 1;
fig. 2 is a schematic structural diagram of a feature extraction and re-fusion module according to embodiment 1;
FIG. 3 is a schematic structural diagram of an image reconstruction model according to example 2;
fig. 4 is a schematic structural diagram of a feature extraction and re-fusion module in embodiment 2;
FIG. 5 is a schematic diagram of an integrated attention unit;
FIG. 6 is a schematic structural diagram of a feature extraction re-fusion module after setting a CBAM attention mechanism;
in the drawings:
1-to-be-processed image, 2-low-order convolution layer, 3-feature extraction subnet, 31-multi-level information integration unit, 32-splicing operation layer, 33-integration convolution layer, 4-feature extraction re-fusion module, 5-image reconstruction subnet, 51-first reconstruction convolution layer, 52-pixel reconstruction layer, 53-second reconstruction convolution layer, 6-processed image, 7-integrated attention unit and 8-level jump connection.
Detailed Description
The invention is further described below with reference to the accompanying drawings:
example 1:
as shown in fig. 1, in the present embodiment, the front end of the feature extraction subnet 3 is provided with the low-order convolution layer 2, the convolution kernel size of which is 3*3, and the step size is 1. The low-order convolutional layer 2 takes the image 1 to be processed as an input, and after convolution operation, the low-order convolutional layer 2 outputs a low-order feature map (channel is 48). Five feature extraction and re-fusion modules 4 connected in sequence are arranged in the feature extraction subnet 3, the first feature extraction and re-fusion module 4 takes a low-order feature map as input, the downstream feature extraction and re-fusion module 4 takes a feature map output by the feature extraction and re-fusion module 4 at the upstream of the downstream feature extraction and re-fusion module 4 as input, feature information sequentially passes through the feature extraction and re-fusion modules 4 in the depth direction transmission process of the feature extraction subnet 3, and the feature extraction and re-fusion modules 4 sequentially perform feature extraction operation. The tail end of the feature extraction subnet 3 is provided with a multilevel information integration unit 31, the multilevel information integration unit 31 comprises a splicing operation layer 32 and an integration convolution layer 33, the splicing operation layer 32 is used for splicing the feature maps output by the feature extraction complex fusion modules 4 to generate a multidimensional feature map, the convolution kernel size of the integration convolution layer 33 is 1*1, the step length is 1, and the integration convolution layer 33 is used for performing convolution operation on the multidimensional feature map to reduce the number of channels of the multidimensional feature map, realize feature map fusion output by the feature extraction complex fusion modules 4, and then generate and output a high-order feature map (the channel is 48).
The internal structure of the feature extraction re-fusion module 4 is shown in fig. 2, and its mathematical model is as described above. Assuming the size of the image 1 to be processed input into the model is H × W × 3 (height × width × channels, the same below), the size of the feature map MI input into each feature extraction re-fusion module 4 is H × W × 48, the size of the feature map MO output by each feature extraction re-fusion module 4 and the size of the high-order feature map are also H × W × 48, and the sizes of the feature maps α1, α2, α3, α4, α5, α6, α7, β1, β2 and β3 inside each feature extraction re-fusion module 4 are likewise H × W × 48 when the model runs.
The image reconstruction subnet 5 takes the high-order feature map as input, reconstructs it, and then outputs the processed image 6, whose resolution is higher than that of the image 1 to be processed. The image reconstruction subnet 5 of the present embodiment is implemented with the prior art; as shown in fig. 1, it comprises a first reconstruction convolutional layer 51 (convolution kernel 3×3, stride 1), a PixelShuffle layer 52 and a second reconstruction convolutional layer 53 (convolution kernel 3×3, stride 1) arranged in sequence. The first reconstruction convolutional layer 51 outputs a feature map of size H × W × 48G² (G denotes the factor by which the length and width of the image 1 to be processed are increased), the PixelShuffle layer 52 outputs a feature map of size GH × GW × 48, and the second reconstruction convolutional layer 53 outputs the processed image 6 of size GH × GW × 3.
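Since the reconstruction subnet is stated to follow the prior art (conv → PixelShuffle → conv), it can be sketched directly; the class name and default scale are assumptions:

```python
import torch
import torch.nn as nn

class ImageReconstructionSubnet(nn.Module):
    """conv(3x3) -> PixelShuffle(G) -> conv(3x3), as in fig. 1 of the patent."""
    def __init__(self, ch: int = 48, scale: int = 4):
        super().__init__()
        # first reconstruction conv expands channels to ch * G^2 (H x W x 48G^2)
        self.conv1 = nn.Conv2d(ch, ch * scale ** 2, 3, stride=1, padding=1)
        # PixelShuffle rearranges channels into space: GH x GW x 48
        self.shuffle = nn.PixelShuffle(scale)
        # second reconstruction conv maps to the 3-channel output image: GH x GW x 3
        self.conv2 = nn.Conv2d(ch, 3, 3, stride=1, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv2(self.shuffle(self.conv1(x)))
```

For a 16×16 high-order feature map and G = 4, the output is a 64×64 three-channel image, matching the GH × GW × 3 size stated above.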
The present embodiment trains the image reconstruction model using the public DIV2K data set as the training set (the corresponding low-resolution images are obtained through bicubic down-sampling). The L1 function is used as the loss function during training; at 800 epochs the loss function has converged and training ends. Tests were then performed on a self-constructed test set (containing 300 self-acquired label images, with the corresponding low-resolution images obtained by bicubic down-sampling), BSD100 and Urban100, and compared with existing models, with the results shown in the following table:
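The training-pair construction described here (bicubic down-sampling of HR images, L1 loss) can be sketched as follows; `make_lr` is a hypothetical helper name, and the clamp to [0, 1] assumes images normalized to that range:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_lr(hr: torch.Tensor, scale: int = 4) -> torch.Tensor:
    """Bicubic down-sampling of an HR batch (N, 3, H, W) -> (N, 3, H/scale, W/scale)."""
    lr = F.interpolate(hr, scale_factor=1.0 / scale, mode="bicubic",
                       align_corners=False)
    return lr.clamp(0.0, 1.0)

l1_loss = nn.L1Loss()

# One hedged training step, assuming `model` maps LR -> SR and `opt` is an optimizer:
#   sr = model(make_lr(hr)); loss = l1_loss(sr, hr); loss.backward(); opt.step()
```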
[Table: reconstruction-quality comparison of the embodiment 1 model against existing models on the self-constructed test set, BSD100 and Urban100 (rendered only as an image in the original)]
as can be seen from comparison of the data in the table, the image reconstruction model in this embodiment 1 obtains a better reconstruction effect on the self-constructed test set, the BSD100 and the Urban100, which fully illustrates the substantial improvement of the performance of the image reconstruction model provided by the present invention.
Example 2:
the overall structure of the image reconstruction model in this embodiment is shown in fig. 3, in which five feature extraction and re-fusion modules 4 are provided, the structure of the feature extraction and re-fusion modules 4 is shown in fig. 4, an integrated attention unit 7 is provided in the feature extraction and re-fusion modules 4, and the internal structure of the integrated attention unit 7 is shown in fig. 5. A hierarchical jump connection 8 is arranged in the image reconstruction model, and the low-level feature map is transmitted to each integrated attention unit 7 through the hierarchical jump connection 8.
The low-order feature map MA and the feature maps α6, α7, β1, β2 and β3 all have size H × W × 48. After the feature maps α6, α7, β1 and β2 are globally max pooled in the channel direction and spliced in the integrated attention unit 7, the θ1 feature map of size H × W × 4 is obtained. The low-order feature map MA and the β3 feature map are processed in the same way: each is subjected to global variance pooling in the spatial direction to obtain a vector of size 1 × 48, which is then input into a fully connected layer. Each fully connected layer has 48 input nodes and 4 output nodes, and the θ2 or θ3 vector (each of length 4) is obtained after sigmoid activation. Then θ2 and θ3 are each multiplied element-wise with θ1, so that θ2 and θ3 assign different weight parameters to each layer of θ1. Finally, global maximum pooling in the channel direction and activation are performed on the result to generate a two-dimensional matrix MS (of size H × W × 1), namely the integrated modulation map generated and output by the integrated attention unit 7.
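This dimensional walkthrough pins down the attention unit's data flow well enough for a sketch; a minimal PyTorch rendering, with class and variable names assumed (the patent's equation itself is only available as an image):

```python
import torch
import torch.nn as nn

class IntegratedAttention(nn.Module):
    """Sketch of the integrated attention unit 7: channel-max pooling of four maps,
    spatial variance pooling of MA and beta3, FC + sigmoid modulation, final channel-max."""
    def __init__(self, ch: int = 48):
        super().__init__()
        self.fc_b3 = nn.Linear(ch, 4)  # f_FC1: 48 -> 4, fed by beta3's variance pooling
        self.fc_ma = nn.Linear(ch, 4)  # f_FC2: 48 -> 4, fed by MA's variance pooling

    def forward(self, a6, a7, b1, b2, ma, b3):
        # theta1: global max pooling of each map in the channel direction, then splice -> (N, 4, H, W)
        t1 = torch.cat([m.amax(dim=1, keepdim=True) for m in (a6, a7, b1, b2)], dim=1)
        # theta2 / theta3: global variance pooling in the spatial direction -> (N, 48),
        # then fully connected layer and sigmoid -> length-4 modulation vectors
        t2 = torch.sigmoid(self.fc_b3(b3.var(dim=(2, 3))))
        t3 = torch.sigmoid(self.fc_ma(ma.var(dim=(2, 3))))
        # element-wise modulation of each layer of theta1 by theta2 and theta3
        t1 = t1 * t2[:, :, None, None] * t3[:, :, None, None]
        # final channel-direction global max pooling + activation -> MS of size (N, 1, H, W)
        return torch.sigmoid(t1.amax(dim=1, keepdim=True))
```

The output MS then modulates the MO feature map of the host re-fusion module, as described in the Disclosure section.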
For comparison, the present embodiment also constructs a comparative example that places the existing CBAM attention mechanism in the feature extraction re-fusion module 4 (as shown in fig. 6) and adopts the architecture in fig. 1. The image reconstruction model of example 2 and the comparative example were trained using DIV2K as the training set, with the L1 function as the loss function and epoch set to 800. After training was complete, the test results are shown in the following table:
[Table: test results of the example 2 model and the CBAM comparative example (rendered only as an image in the original)]
comparing the test results of example 1, example 2 and comparative example, it can be seen that the improvement of the model performance is limited after setting the CBAM attention mechanism, compared with the improvement of the model performance obtained after setting the integrated attention unit 7 in example 2, which fully illustrates the effectiveness and superiority of the integrated attention unit 7.
The above-mentioned embodiments only express specific implementations of the present invention, and while their description is relatively specific and detailed, it is not to be construed as limiting the scope of the present invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these fall within the scope of the present invention.

Claims (7)

1. An image reconstruction method for a warehouse logistics system, characterized by comprising the following steps:
p100, acquiring an image to be processed and a trained image reconstruction model;
the image reconstruction model comprises a feature extraction subnet and an image reconstruction subnet which are sequentially arranged, and the feature extraction subnet comprises a plurality of stacked feature extraction re-fusion modules;
p200, inputting the image to be processed into the image reconstruction model, and outputting a high-order characteristic diagram by using the characteristic extraction subnet after performing characteristic extraction operation on the image to be processed by using the characteristic extraction subnet, wherein the high-order characteristic diagram comprises high-order characteristic information of the image to be processed;
p300, inputting the high-order characteristic map into the image reconstruction subnet, and after the high-order characteristic map is reconstructed by using the image reconstruction subnet, outputting a processed image with a resolution larger than that of the image to be processed by the image reconstruction subnet;
the process of extracting the image characteristic information by the characteristic extraction and re-fusion module is expressed as the following mathematical model:
[Equation: mathematical model of the feature extraction re-fusion module (rendered only as an image in the original)]
wherein MI represents the feature map input to the feature extraction re-fusion module, MO represents the feature map output by the feature extraction re-fusion module, f_COV1, f_COV2, f_COV3, f_COV4, f_COV5 and f_COV6 each represent an ordinary convolution operation with a stride of 1, the convolution kernel sizes of f_COV1, f_COV3 and f_COV4 are all 3×3, the convolution kernel sizes of f_COV2 and f_COV5 are all 5×5, the convolution kernel size of f_COV6 is 1×1, f_σ1, f_σ2, f_σ3, f_σ4, f_σ5 and f_σ6 each represent the nonlinear activation function ReLU, f_T1 and f_T2 each represent the nonlinear activation function Tanh, × represents the element-wise product operation, and ⟨•⟩ represents the splicing operation on the feature maps inside it.
2. The image reconstruction method for the warehouse logistics system of claim 1, wherein: the image reconstruction model is provided with a low-order convolutional layer, and the low-order convolutional layer is arranged at the front end of the feature extraction subnet; and performing convolution operation on the input image to be processed by using the low-order convolution layer to generate a low-order feature map, and then inputting the low-order feature map into the feature extraction subnet.
3. The image reconstruction method for the warehouse logistics system of claim 2, wherein: and an integrated attention unit is arranged in the characteristic extraction and re-fusion module, an integrated modulation chart output by the integrated attention unit is used for modulating the characteristic chart MO, and the modulated MO characteristic chart is used as the output of the characteristic extraction and re-fusion module.
4. The image reconstruction method for the warehouse logistics system of claim 3, wherein: and the image reconstruction model is provided with hierarchical jump connections, and the low-level feature maps are input into the integrated attention units through the hierarchical jump connections.
5. The image reconstruction method for the warehouse logistics system of claim 4, wherein: the mathematical model of the integrated attention unit is;
Figure 948742DEST_PATH_IMAGE002
wherein β 1 represents (α 6+ α 7) characteristic diagram passing through f T 1 feature map generated after function activation, beta 2 represents (alpha 6 multiplied by alpha 7) feature map passing through f T 2, a characteristic diagram generated after the function is activated, and beta 3 represents that the alpha 8 characteristic diagram passes through f in sequence COV 6 convolution and f σ 6, MA represents a low-level feature map of the output of the low-level convolutional layer, the feature maps alpha 6, alpha 7, beta 1, beta 2, beta 3 and MA are simultaneously used as the input of the integrated attention unit,<•>representing the splicing operation of the feature map, x represents the product operation of element correspondence, tbC represents the global maximum pooling of the feature map in the channel direction, tvS represents the global variance pooling of the feature map in the space direction, f FC 1 and f FC 2 each represents a full-link layer, f δ 1、f δ 2 and f δ 3 each represents a sigmoid activation function, and MS represents an integrated map generated and output by the integrated attention unit.
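The pooling and gating operations named in the claim have direct array equivalents: TbC (global max pooling in the channel direction) reduces a C×H×W map to one H×W map, and TvS (global variance pooling in the spatial direction) reduces it to one value per channel. As an illustrative sketch outside the claims, the `integrated_attention` wiring below, which turns the variance-pooled vector into per-channel modulation weights via one fully connected layer and a sigmoid, is a hypothetical simplification of the claimed model:

```python
import numpy as np

sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))   # the f_delta activations

def channel_max_pool(x):
    """TbC: global max pooling over the channel axis; (C, H, W) -> (H, W)."""
    return x.max(axis=0)

def spatial_var_pool(x):
    """TvS: global variance pooling over the spatial axes; (C, H, W) -> (C,)."""
    return x.var(axis=(1, 2))

def integrated_attention(x, W_fc, b_fc):
    """Hypothetical attention branch: variance-pool each channel, pass the
    pooled vector through a fully connected layer, squash with a sigmoid,
    and use the result to modulate the feature map channel-wise."""
    s = spatial_var_pool(x)          # (C,) pooled statistics
    w = sigmoid(W_fc @ s + b_fc)     # (C,) per-channel weights in (0, 1)
    return x * w[:, None, None]      # modulated feature map, same shape as x
```

Because the sigmoid keeps every weight in (0, 1), the modulation can only attenuate channels, mirroring the gating role of an attention map.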
6. The image reconstruction method for the warehouse logistics system of claim 5, wherein: a multi-level information integration unit is arranged in the feature extraction subnet; the multi-level information integration unit comprises a splicing operation layer and an integration convolutional layer, the splicing operation layer is used for splicing the feature maps output by the feature extraction complex fusion modules into a multi-dimensional feature map, and the integration convolutional layer is used for performing a convolution operation on the multi-dimensional feature map to generate and output the high-order feature map.
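Illustratively (and not as part of the claims), the splice-then-integrate behaviour of the multi-level information integration unit can be sketched as follows: a 1×1 integration convolution across the stacked dimension is exactly a per-pixel weighted sum of the module outputs. The function name and weights here are hypothetical:

```python
import numpy as np

def integrate(feature_maps, weights, bias=0.0):
    """Sketch of the multi-level information integration unit: the splicing
    layer stacks the per-module maps into one multi-dimensional tensor, and
    a 1x1 'integration convolution' reduces the stack to a single
    high-order map as a weighted sum across the stacked dimension."""
    stacked = np.stack(feature_maps)                         # splice: (N, H, W)
    return np.tensordot(weights, stacked, axes=1) + bias     # reduce: (H, W)
```

With uniform weights this degenerates to averaging the module outputs; learned weights would let the unit emphasise the more informative fusion levels.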
7. An image reconstruction device for a warehouse logistics system, comprising a processor and a memory, the memory storing a computer program, characterized in that: the processor is configured to execute the image reconstruction method for the warehouse logistics system according to any one of claims 1 to 6 by loading the computer program.
CN202211283775.5A 2022-10-20 2022-10-20 Image reconstruction method and device for warehouse logistics system Active CN115358931B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211283775.5A CN115358931B (en) 2022-10-20 2022-10-20 Image reconstruction method and device for warehouse logistics system
NL2035792A NL2035792A (en) 2022-10-20 2023-09-14 Image reconstruction method and device for warehouse logistics system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211283775.5A CN115358931B (en) 2022-10-20 2022-10-20 Image reconstruction method and device for warehouse logistics system

Publications (2)

Publication Number Publication Date
CN115358931A CN115358931A (en) 2022-11-18
CN115358931B true CN115358931B (en) 2023-01-03

Family

ID=84008804

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211283775.5A Active CN115358931B (en) 2022-10-20 2022-10-20 Image reconstruction method and device for warehouse logistics system

Country Status (2)

Country Link
CN (1) CN115358931B (en)
NL (1) NL2035792A (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116012688B (en) * 2023-03-27 2023-06-09 成都神鸟数据咨询有限公司 Image enhancement method for urban management evaluation system
CN117152162B (en) * 2023-11-01 2023-12-26 贵州健易测科技有限公司 Image processing method, device and storage medium for food sorting

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112767253A (en) * 2021-02-05 2021-05-07 西南科技大学 Multi-scale feature fusion binocular image super-resolution reconstruction method
CN112862689A (en) * 2021-03-09 2021-05-28 南京邮电大学 Image super-resolution reconstruction method and system
US11037030B1 (en) * 2018-10-29 2021-06-15 Hrl Laboratories, Llc System and method for direct learning from raw tomographic data
CN114037986A (en) * 2021-11-04 2022-02-11 李显德 License plate recognition method and device for smart city
CN114926343A (en) * 2022-06-08 2022-08-19 南京大学 Image super-resolution method based on pyramid fusion attention network
CN115018711A (en) * 2022-07-15 2022-09-06 成都运荔枝科技有限公司 Image super-resolution reconstruction method for warehouse scheduling
CN115170398A (en) * 2022-07-11 2022-10-11 重庆芸山实业有限公司 Image super-resolution reconstruction method and device for chrysanthemum storage warehouse

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11532073B2 (en) * 2018-03-29 2022-12-20 Pixar Temporal techniques of denoising Monte Carlo renderings using neural networks
US11783518B2 (en) * 2019-09-30 2023-10-10 Lawrence Livermore National Security, Llc Few-view computed tomography reconstruction using deep neural network inference


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Image super-resolution reconstruction method based on attention generative adversarial network; Ding Minghang et al.; Computer Systems & Applications; 2020-02-15 (No. 02); full text *

Also Published As

Publication number Publication date
CN115358931A (en) 2022-11-18
NL2035792A (en) 2024-05-08

Similar Documents

Publication Publication Date Title
CN109711481B (en) Neural networks for drawing multi-label recognition, related methods, media and devices
CN115358931B (en) Image reconstruction method and device for warehouse logistics system
Hu et al. Sqn: Weakly-supervised semantic segmentation of large-scale 3d point clouds
Li et al. A closed-form solution to photorealistic image stylization
Robert et al. Hybridnet: Classification and reconstruction cooperation for semi-supervised learning
Jiang et al. Cascaded subpatch networks for effective CNNs
Ju et al. Fusing global and local features for generalized ai-synthesized image detection
US10971221B2 (en) Storage device and methods with fault tolerance capability for neural networks
Wang et al. Context-enhanced representation learning for single image deraining
Volpi et al. On the road to online adaptation for semantic image segmentation
Feng et al. Bag of visual words model with deep spatial features for geographical scene classification
Zhao et al. PCA dimensionality reduction method for image classification
CN112183602B (en) Multi-layer feature fusion fine-grained image classification method with parallel rolling blocks
Qayyum et al. Designing deep CNN models based on sparse coding for aerial imagery: a deep-features reduction approach
CN115546031B (en) Image enhancement method and device for warehouse ceiling inspection
Abdulnabi et al. Episodic camn: Contextual attention-based memory networks with iterative feedback for scene labeling
Yang et al. Improved single image dehazing methods for resource-constrained platforms
CN108154120A (en) video classification model training method, device, storage medium and electronic equipment
Wang et al. Single image deraining via deep shared pyramid network
CN107729885B (en) Face enhancement method based on multiple residual error learning
KR102567128B1 (en) Enhanced adversarial attention networks system and image generation method using the same
Wang et al. A multi-scale attentive recurrent network for image dehazing
Bricman et al. CocoNet: A deep neural network for mapping pixel coordinates to color values
Bertels et al. Convolutional neural networks for medical image segmentation
Srinivasan et al. An Efficient Video Inpainting Approach Using Deep Belief Network.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant