CN114913516A

CN114913516A - Tobacco retail license identification method and system

Info

Publication number: CN114913516A
Application number: CN202210383762.9A
Authority: CN
Inventors: 单宇翔; 金泳; 许珍珍; 杜旋; 郁钢; 高扬华; 岑涌; 陆海良; 王骋; 任琴
Original assignee: China Tobacco Zhejiang Industrial Co Ltd
Current assignee: China Tobacco Zhejiang Industrial Co Ltd
Priority date: 2022-04-12
Filing date: 2022-04-12
Publication date: 2022-08-16

Abstract

The application discloses a method and a system for identifying a tobacco retail license, wherein the identification method comprises the following steps: preprocessing the image of the tobacco retail license to obtain a preprocessed image; inputting the preprocessed image into a convolutional neural network to obtain a first visual feature sequence; the convolutional neural network comprises at least one convolutional submodule, each convolutional submodule comprises a first dense connection network module, a convolutional layer and a first pooling layer which are sequentially connected, and input data of each convolutional layer is dense connection data of output data of all convolutional submodules before the convolutional layer; and coding and decoding the first visual characteristic sequence to obtain the license number, the enterprise name and the valid period on the tobacco retail license. The method and the device have the advantages that segmentation-free identification and tobacco sales information extraction of the tobacco retail license image are achieved, the sensitivity to diversified acquisition conditions and other interference factors is reduced, and the accuracy of feature extraction is improved.

Description

Tobacco retail license identification method and system

Technical Field

The application relates to the field of image processing, in particular to a method and a system for identifying a tobacco retail license.

Background

Tobacco marketing robotic process automation (RPA) involves the intelligent identification of many kinds of scene objects in tobacco operations. For example, to verify the effect of tobacco marketing campaigns on the product market, the textual and graphical information generated in the product retail process must be analyzed and mined in depth. Robotic process automation (sometimes referred to as business process automation) uses "robots" as a digital workforce to perform routine tasks such as processing invoices or transferring data from one database or spreadsheet to another. Most of the data to be processed is unstructured, such as the text of e-mails or even video and pictures, and processing it requires complex algorithms. An AI robot can use techniques such as computer vision to recognize different types of documents, or natural language processing to understand the context of an e-mail message. Intelligent tobacco operations management needs to identify tobacco retail licenses from different regions and extract information from them. However, images of tobacco retail licenses have complex unstructured features, which makes information extraction difficult. Existing feature extraction methods for tobacco retail licenses are sensitive to varied acquisition conditions and other interference factors, and their feature extraction performance is poor.

Disclosure of Invention

The application provides a tobacco retail license identification method and system, which achieve segmentation-free recognition of tobacco retail license images and extraction of tobacco sales information, reduce sensitivity to varied acquisition conditions and other interference factors, and improve the accuracy of feature extraction.

The application provides a method for identifying a tobacco retail license, which comprises the following steps:

preprocessing the image of the tobacco retail license to obtain a preprocessed image;

inputting the preprocessed image into a convolutional neural network to obtain a first visual feature sequence; the convolutional neural network comprises at least one convolutional submodule, each convolutional submodule comprises a first dense connection network module, convolutional layers and a first pooling layer which are connected in sequence, and input data of each convolutional layer is dense connection data of output data of all convolutional submodules before the convolutional layer;

and encoding and decoding the first visual feature sequence to obtain the license number, the enterprise name and the validity period on the tobacco retail license.

Preferably, the encoding and decoding of the first visual feature sequence comprises:

inputting the first visual feature sequence into a first bidirectional long-short term memory module for serialized association processing to obtain a second visual feature sequence of the first bidirectional long-short term memory module;

fusing the second visual feature sequence obtained by the first bidirectional long-short term memory module with its input data to obtain an original feature map corresponding to the first bidirectional long-short term memory module, performing a one-dimensional self-attention operation on the original feature map to obtain an operation result, and inputting the operation result into a fully connected layer to obtain a first attention feature map corresponding to the first bidirectional long-short term memory module;

for each second bidirectional long-short term memory module after the first bidirectional long-short term memory module, performing serialized association processing taking as input the output data of the first bidirectional long-short term memory module or the second bidirectional long-short term memory module preceding it, and obtaining a first attention feature map and an original feature map corresponding to that second bidirectional long-short term memory module;

and calculating the pixel products between the first attention feature map and the original feature map corresponding to the first bidirectional long-short term memory module and to each second bidirectional long-short term memory module, respectively, adding all the pixel products of the same pixel to generate a second attention feature map, identifying the license number, the enterprise name and the validity period according to the second attention feature map, and outputting an identification result.

Preferably, the inputting the preprocessed image into a convolutional neural network to obtain a first visual feature sequence, specifically including:

sequentially obtaining output data of all convolution sub-modules;

and inputting the dense connection data of the output data of all the convolution sub-modules into a second pooling layer, and taking the obtained pooling result as a first visual feature sequence.

Preferably, the densely connected data of the input convolutional layer is a tensor which maps and connects the output data of all convolutional sub-modules before the convolutional layer.

Preferably, the image of the tobacco retail license is preprocessed to obtain a preprocessed image, and the preprocessing specifically includes:

filtering and enhancing the image of the tobacco retail license;

and performing content-based alignment processing on the processing result of the filtering and enhancing processing.

Preferably, the content-based alignment process specifically includes:

adopting a column scanning algorithm to extract edges of the image;

calculating the average slope of the extracted edge clusters;

the image is rotated based on the average slope such that horizontal lines in the processed image are parallel to the image lateral edges.

The application also provides an identification system of the tobacco retail license, which comprises a preprocessing module, a first visual characteristic sequence obtaining module and a coding and decoding module;

the preprocessing module is used for preprocessing the image of the tobacco retail license to obtain a preprocessed image;

the first visual feature sequence obtaining module is used for inputting the preprocessed image into a convolutional neural network to obtain a first visual feature sequence; the convolutional neural network comprises at least one convolutional submodule, each convolutional submodule comprises a first densely connected network module, a convolutional layer and a first pooling layer which are connected in sequence, and the first densely connected network module is connected with the first pooling layers of all convolutional submodules before that convolutional submodule;

and the coding and decoding module is used for coding and decoding the first visual characteristic sequence to obtain the license number, the enterprise name and the valid period on the tobacco retail license.

Preferably, the convolutional neural network further comprises a second densely connected network module and a second pooling layer, the second densely connected network module is connected with the first pooling layers of all the convolutional sub-modules, and the second pooling layer is connected with the second densely connected network module;

the second dense connection network module is used for connecting the output data of all the convolution sub-modules into dense connection data;

the pooling results of the second pooling layer form a first sequence of visual features.

Preferably, the codec module comprises a first codec submodule, at least one second codec submodule and a first decoder;

the first coding and decoding submodule comprises a first bidirectional long-short term memory module and a second decoder, and the first bidirectional long-short term memory module is used for carrying out serialization association processing on the first visual feature sequence to obtain a second visual feature sequence of the first bidirectional long-short term memory module; the second decoder is used for fusing the second visual characteristic sequence obtained by the first bidirectional long-short term memory module with input data thereof to obtain an original characteristic diagram corresponding to the first bidirectional long-short term memory module, performing one-dimensional self-attention operation on the original characteristic diagram to obtain an operation result, and inputting the operation result into the full-connection layer to obtain a first attention characteristic diagram corresponding to the first bidirectional long-short term memory module;

the second coding and decoding sub-module comprises a second bidirectional long-short term memory module and a third decoder, and the second bidirectional long-short term memory module is used for performing serialized association processing on output data of the first bidirectional long-short term memory module or the second bidirectional long-short term memory module in front of the second bidirectional long-short term memory module as input to obtain a second visual characteristic sequence of the second bidirectional long-short term memory module; the third decoder is used for obtaining a first attention feature map and an original feature map corresponding to the second bidirectional long-short term memory module;

the first decoder is used for calculating pixel products between the first attention feature map and the original feature map corresponding to the first bidirectional long-short term memory module and each second bidirectional long-short term memory module respectively, adding all the pixel products of the same pixel points to generate a second attention feature map, identifying the license number, the enterprise name and the valid period according to the second attention feature map, and outputting an identification result.

Preferably, the preprocessing module comprises a filter enhancement module and an alignment module;

the filtering enhancement module is used for filtering and enhancing the image of the tobacco retail license;

and the alignment module is used for carrying out content-based alignment processing on the processing result of the filtering and enhancing processing.

Further features of the present application and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which is to be read in connection with the accompanying drawings.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the application and together with the description, serve to explain the principles of the application.

FIG. 1 is a flow chart of a method of identifying a tobacco retail license provided herein;

FIG. 2 is a block diagram of an identification system for a tobacco retail license as provided herein.

Detailed Description

Various exemplary embodiments of the present application will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present application unless specifically stated otherwise.

The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the application, its application, or uses.

Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate.

In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.

As shown in fig. 1, the method for identifying a tobacco retail license includes:

S110: Preprocess the image of the tobacco retail license to obtain a preprocessed image.

Specifically, as an embodiment, preprocessing the image of the tobacco retail license to obtain a preprocessed image includes the following steps:

S1101: Filter and enhance the image of the tobacco retail license.

As an embodiment, the filtering is performed using Gaussian filtering, and the enhancement is performed using dilation and erosion (morphological operations).

It will be appreciated that other filtering methods (e.g., mean filtering) may be employed to filter the image. Other image enhancement methods (e.g., neighborhood enhancement) may be used for enhancement.
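As a non-limiting illustration, the filtering and enhancement of step S1101 might be sketched as follows. The sketch assumes a grayscale image stored as a list of rows, a 3x3 kernel, border clamping, and dilation followed by erosion; none of these choices is specified by the application, and all function names are hypothetical.

```python
import math

def gaussian_kernel(size=3, sigma=1.0):
    """Return a normalised size x size Gaussian kernel (size must be odd)."""
    half = size // 2
    kernel = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
               for x in range(-half, half + 1)]
              for y in range(-half, half + 1)]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]

def _window(img, r, c, half):
    """Yield (dy, dx, value) over the neighbourhood of (r, c), clamping at borders."""
    h, w = len(img), len(img[0])
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            rr = min(max(r + dy, 0), h - 1)
            cc = min(max(c + dx, 0), w - 1)
            yield dy, dx, img[rr][cc]

def gaussian_filter(img, size=3, sigma=1.0):
    """Smooth the image by convolving it with a Gaussian kernel."""
    k = gaussian_kernel(size, sigma)
    half = size // 2
    return [[sum(k[dy + half][dx + half] * v
                 for dy, dx, v in _window(img, r, c, half))
             for c in range(len(img[0]))]
            for r in range(len(img))]

def dilate(img, size=3):
    """Grayscale dilation: each pixel becomes the maximum of its neighbourhood."""
    half = size // 2
    return [[max(v for _, _, v in _window(img, r, c, half))
             for c in range(len(img[0]))] for r in range(len(img))]

def erode(img, size=3):
    """Grayscale erosion: each pixel becomes the minimum of its neighbourhood."""
    half = size // 2
    return [[min(v for _, _, v in _window(img, r, c, half))
             for c in range(len(img[0]))] for r in range(len(img))]

def filter_and_enhance(img):
    """Gaussian smoothing, then dilation followed by erosion (assumed order)."""
    return erode(dilate(gaussian_filter(img)))
```

Applying dilation before erosion amounts to a morphological closing, which tends to fill small gaps in strokes; the opposite order would instead suppress small bright specks.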

S1102: Perform content-based alignment on the result of the filtering and enhancement.

Specifically, the content-based alignment includes:

S11021: Extract edges from the image using a column scanning algorithm to obtain edge clusters.

S11022: Calculate the average slope k of the extracted edge clusters.

S11023: Rotate the image based on the average slope k so that the horizontal lines in the processed image are parallel to the lateral edges of the image.
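By way of example only, steps S11021 to S11023 might be sketched as below. The application does not detail the column scanning algorithm, so the sketch assumes that each column is scanned from top to bottom for its first dark pixel, that edge points in adjacent columns are grouped into a cluster when their rows differ by at most `max_jump`, and that the rotation uses nearest-neighbour sampling. The names `threshold`, `max_jump` and `fill` are hypothetical parameters.

```python
import math

def column_scan_edges(img, threshold=128):
    """For each column, scan top to bottom and record (col, row) of the first dark pixel."""
    points = []
    for c in range(len(img[0])):
        for r in range(len(img)):
            if img[r][c] < threshold:
                points.append((c, r))
                break
    return points

def edge_clusters(points, max_jump=2):
    """Group edge points of adjacent columns whose rows differ by at most max_jump."""
    clusters, current = [], []
    for p in points:
        if current and (p[0] - current[-1][0] != 1
                        or abs(p[1] - current[-1][1]) > max_jump):
            clusters.append(current)
            current = []
        current.append(p)
    if current:
        clusters.append(current)
    # A slope needs at least two points.
    return [cl for cl in clusters if len(cl) >= 2]

def average_slope(clusters):
    """Mean end-to-end slope k (rows per column) over all edge clusters."""
    slopes = [(cl[-1][1] - cl[0][1]) / (cl[-1][0] - cl[0][0]) for cl in clusters]
    return sum(slopes) / len(slopes) if slopes else 0.0

def rotate(img, k, fill=255):
    """Rotate about the image centre so that lines of slope k become horizontal."""
    theta = math.atan(k)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    out = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            # Inverse mapping: locate the source pixel for each output pixel.
            x, y = c - cx, r - cy
            src_c = int(round(cx + x * cos_t - y * sin_t))
            src_r = int(round(cy + x * sin_t + y * cos_t))
            if 0 <= src_r < h and 0 <= src_c < w:
                out[r][c] = img[src_r][src_c]
    return out

def align(img):
    """Content-based alignment: estimate k from edge clusters, then deskew."""
    k = average_slope(edge_clusters(column_scan_edges(img)))
    return rotate(img, k)
```

Averaging over clusters rather than fitting a single line is intended to make the estimate less sensitive to one badly detected edge, which is consistent with the use of an average slope in step S11022.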

S120: Input the preprocessed image into a convolutional neural network to obtain a first visual feature sequence $I_1$.

The convolutional neural network comprises at least one convolutional submodule (please refer to fig. 2), each convolutional submodule comprises a first dense connection network module, a convolutional layer and a first pooling layer which are connected in sequence, and input data of each convolutional layer is dense connection data of output data of all convolutional submodules before the convolutional layer.

In a densely connected structure, features do not change once they have been generated. As a result, some shallow features that have great potential in the deep layers, and could play a significant role after fine-tuning, are nevertheless treated as redundant in the deep layers of the network and are often pruned away. In view of this, the first densely connected network module preferably further includes an activation function: the densely connected data is input to the activation function, and the output data of the activation function is used as the input data of the convolutional layer. The activation function can extract potentially redundant features and reactivate them so that they better suit feature learning in the deep network, making the most of the network's feature reuse.

Thus, the convolutional layers are connected to one another in a feedforward manner with residual-style connections, so that the l-th convolutional layer takes the feature maps of all preceding layers as input, and the visual feature sequence $I_l$ of the l-th convolutional layer is computed through a nonlinear transformation function:

$$I_l = H_l([I_0, I_1, \ldots, I_{l-1}])$$

where $H_l$ is the nonlinear transformation function of the l-th convolutional layer, and $I_0, I_1, \ldots, I_{l-1}$ are the visual feature sequences output by all convolutional submodules before the l-th layer.

Specifically, as an embodiment, the dense connection data of the input convolutional layer is a tensor in which the output data of all convolution submodules before the convolutional layer is mapped and connected.
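The dense connection rule, in which each layer output is computed from the concatenation of all earlier outputs, can be illustrated with the toy sketch below. It assumes one-dimensional feature maps stored as lists of channels, and it stands in for each nonlinear transformation with a fixed channel-mixing step followed by ReLU. The weights, the `growth` parameter and the function names are hypothetical, and a trained network would learn its weights instead.

```python
def relu(x):
    """Rectified linear activation."""
    return x if x > 0.0 else 0.0

def make_layer(in_channels, out_channels, seed):
    """Build a toy H_l: per-position channel mixing (a 1x1 convolution) then ReLU."""
    # Deterministic pseudo-weights for illustration only.
    weights = [[((seed + 3 * o + 7 * i) % 5 - 2) / 4.0 for i in range(in_channels)]
               for o in range(out_channels)]

    def h(channels):
        length = len(channels[0])
        return [[relu(sum(w * ch[t] for w, ch in zip(row, channels)))
                 for t in range(length)]
                for row in weights]

    return h

def dense_forward(i0, num_layers, growth):
    """Return [I_0, I_1, ..., I_L] with I_l = H_l([I_0, ..., I_{l-1}])."""
    outputs = [i0]
    for l in range(1, num_layers + 1):
        # Dense connection: concatenate every earlier output along the channel axis.
        dense_input = [ch for out in outputs for ch in out]
        h_l = make_layer(len(dense_input), growth, seed=l)
        outputs.append(h_l(dense_input))
    return outputs

if __name__ == "__main__":
    i0 = [[1.0, 2.0, 3.0], [0.5, -1.0, 2.0]]  # 2 channels, 3 positions
    outs = dense_forward(i0, num_layers=3, growth=4)
    # The input width of layer l grows as 2 + 4 * (l - 1) channels.
    print([len(o) for o in outs])
```

Because every layer sees all earlier outputs, each layer only needs to contribute a small number of new channels, which is the feature reuse property the description relies on.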

Preferably, the convolutional neural network further comprises a second densely connected network module and a second pooling layer, the second densely connected network module being disposed between the last convolutional sub-module and the second pooling layer. Specifically, the second pooling layer is a maximum pooling layer.

On the basis of this preferred embodiment, inputting the preprocessed image into the convolutional neural network to obtain the first visual feature sequence $I_1$ specifically includes:

S1201: Sequentially obtain the output data of all convolutional submodules.

S1202: Input the dense connection data of the output data of all convolutional submodules into the second pooling layer, and take the resulting pooling result as the first visual feature sequence $I_1$.

Because the densely connected network modules connect the features of all layers together, each layer receives gradient signals directly from every layer it is connected to during backpropagation, which mitigates the vanishing gradient problem during training to some extent. In addition, because a large number of features are reused, many features can be generated with a small number of convolution kernels, so the final convolutional neural network model is small.

In this convolutional neural network structure, each convolutional layer receives the features of all preceding layers as input. To prevent the feature dimension from growing too quickly as the number of network layers increases, down-sampling first compresses the feature dimension to half that of the current input through the convolutional layer and then performs pooling.
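The effect of this compression on the feature dimension can be checked with simple arithmetic. The sketch below assumes that the input of each convolutional submodule is the concatenation of the input and all earlier submodule outputs, that the convolutional layer halves the channel count, and that pooling changes only the spatial size. The function name and the 16-channel starting value are hypothetical.

```python
def submodule_channels(input_channels, num_submodules, compress=True):
    """Return (index, dense input channels, output channels) per convolutional submodule."""
    outputs = [input_channels]  # channel count of I_0
    rows = []
    for l in range(1, num_submodules + 1):
        dense_in = sum(outputs)                     # concatenation of I_0 .. I_{l-1}
        out = dense_in // 2 if compress else dense_in
        outputs.append(out)                         # pooling keeps the channel count
        rows.append((l, dense_in, out))
    return rows

if __name__ == "__main__":
    for row in submodule_channels(16, 4):
        print("with compression   ", row)
    for row in submodule_channels(16, 4, compress=False):
        print("without compression", row)
```

Under these assumptions the dense input grows by a factor of about 1.5 per submodule with compression, compared with doubling when the convolutional layer preserves the channel count.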

S130: Encode and decode the first visual feature sequence $I_1$ to obtain the license number, the enterprise name and the validity period on the tobacco retail license.

Specifically, a plurality of coding and decoding sub-modules and a first decoder are used for coding and decoding the first visual characteristic sequence, each coding and decoding sub-module comprises a bidirectional long-short term memory module and a decoder, wherein the bidirectional long-short term memory module connected with the convolutional neural network is marked as a first bidirectional long-short term memory module, and the subsequent bidirectional long-short term memory module is marked as a second bidirectional long-short term memory module.

Based on the structure of the coding and decoding, the coding and decoding of the first visual characteristic sequence comprises:

S1301: Input the first visual feature sequence $I_1$ into the first bidirectional long-short term memory module for serialized association processing to obtain the second visual feature sequence $I_2$ of the first bidirectional long-short term memory module.

S1302: Fuse the second visual feature sequence $I_2$ obtained by the first bidirectional long-short term memory module with its input data (i.e., the first visual feature sequence $I_1$) to obtain the original feature map $D_0$ corresponding to the first bidirectional long-short term memory module; perform a one-dimensional self-attention operation on the original feature map $D_0$ to obtain an operation result; and input the operation result into the fully connected layer to obtain the first attention feature map $D_1$ corresponding to the first bidirectional long-short term memory module.

S1303: For each second bidirectional long-short term memory module after the first bidirectional long-short term memory module, perform serialized association processing taking as input the second visual feature sequence $I_2$ output by the first or second bidirectional long-short term memory module preceding it, and obtain the first attention feature map $D_1$ and the original feature map $D_0$ corresponding to that second bidirectional long-short term memory module.

S1304: Calculate the pixel products between the first attention feature map $D_1$ and the original feature map $D_0$ corresponding to the first bidirectional long-short term memory module and to each second bidirectional long-short term memory module, respectively (so that each pixel has one pixel product for the first bidirectional long-short term memory module and one for each second bidirectional long-short term memory module); add all the pixel products of the same pixel to generate the second attention feature map $D_2$; identify the license number, the enterprise name and the validity period according to the second attention feature map $D_2$; and output the identification result.
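Writing $D_0^{(i)}$ and $D_1^{(i)}$ for the original and first attention feature maps of the i-th bidirectional long-short term memory module, step S1304 can be read as $D_2 = \sum_i D_1^{(i)} \odot D_0^{(i)}$, where $\odot$ is the element-wise product. The superscript notation is introduced here for illustration only. The sketch below shows one possible reading of the decoder operations, under several assumptions the application leaves open: fusion is taken to be element-wise addition, the one-dimensional self-attention is scaled dot-product attention without learned projections, and the fully connected layer ends in a sigmoid. The bidirectional long-short term memory module itself is not implemented.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def fuse(seq_out, seq_in):
    """Original feature map D0: element-wise sum of module output and input (assumed)."""
    return [[a + b for a, b in zip(ro, ri)] for ro, ri in zip(seq_out, seq_in)]

def self_attention_1d(d0):
    """Self-attention along the sequence axis (scaled dot-product, no learned weights)."""
    dim = len(d0[0])
    result = []
    for q in d0:
        scores = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                          for k in d0])
        result.append([sum(w * v[j] for w, v in zip(scores, d0))
                       for j in range(dim)])
    return result

def fully_connected(x, weight, bias):
    """Apply one linear layer plus sigmoid to every sequence position (assumed form)."""
    return [[1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, pos)) + b)))
             for row, b in zip(weight, bias)]
            for pos in x]

def second_attention_map(d0_list, d1_list):
    """D2: sum over all modules of the element-wise product of D1 and D0."""
    rows, cols = len(d0_list[0]), len(d0_list[0][0])
    d2 = [[0.0] * cols for _ in range(rows)]
    for d0, d1 in zip(d0_list, d1_list):
        for t in range(rows):
            for j in range(cols):
                d2[t][j] += d1[t][j] * d0[t][j]
    return d2
```

In this reading each $D_1$ acts as a per-element weighting of its own $D_0$, and summing over modules lets features from every depth of the recurrent stack contribute to the map from which the three fields are identified.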

Based on the identification method, the application provides an identification system of a tobacco retail license, which comprises a preprocessing module 210, a first visual feature sequence obtaining module 220 and a coding and decoding module 230.

The preprocessing module 210 is configured to preprocess the image of the tobacco retail license to obtain a preprocessed image.

The pre-processing module 210 includes a filter enhancement module and an alignment module. The filtering enhancement module is used for filtering and enhancing the image of the tobacco retail license. And the alignment module is used for carrying out content-based alignment processing on the processing result of the filtering and enhancing processing.

The first visual feature sequence obtaining module 220 is configured to input the preprocessed image into a convolutional neural network to obtain a first visual feature sequence.

As shown in fig. 2, the convolutional neural network includes at least one convolutional submodule 2201, the convolutional submodule includes a first densely connected network module 22011, a convolutional layer 22012 and a first pooling layer 22013, which are connected in sequence, the first densely connected network module 22011 is connected to the first pooling layers of all convolutional submodules before the convolutional submodule, and the pooling result of the first pooling layer of the last convolutional submodule is taken as a first visual feature sequence.

Preferably, the convolutional neural network further comprises a second densely connected network module 2202 and a second pooling layer 2203, the second densely connected network module 2202 being connected with the first pooling layer of all convolutional sub-modules, the second pooling layer 2203 being connected with the second densely connected network module 2202.

The second densely connected network module 2202 is used to concatenate the output data of all the convolution sub-modules into densely connected data.

The pooling results of the second pooling layer 2203 form a first visual feature sequence.

The encoding and decoding module 230 is configured to encode and decode the first visual characteristic sequence, and obtain the license number, the enterprise name, and the validity period of the tobacco retail license.

The codec module includes a first codec sub-module 2301, at least one second codec sub-module 2302, and a first decoder 2303.

The first codec sub-module 2301 includes a first bidirectional long-term and short-term memory module 23011 and a second decoder 23012, and the first bidirectional long-term and short-term memory module 23011 is configured to perform a serialization association process on the first visual feature sequence to obtain a second visual feature sequence of the first bidirectional long-term and short-term memory module. The second decoder 23012 is configured to fuse the second visual feature sequence obtained by the first bidirectional long-short term memory module with the input data thereof to obtain an original feature map corresponding to the first bidirectional long-short term memory module, perform a one-dimensional self-attention operation on the original feature map to obtain an operation result, and input the operation result into the full connection layer to obtain a first attention feature map corresponding to the first bidirectional long-short term memory module.

The second codec sub-module 2302 includes a second bidirectional long-short term memory module 23021 and a third decoder 23022. The second bidirectional long-short term memory module 23021 is configured to perform serialized association processing, taking as input the output data of the first bidirectional long-short term memory module or the second bidirectional long-short term memory module preceding it, to obtain the second visual feature sequence of the second bidirectional long-short term memory module. The third decoder 23022 is configured to obtain the first attention feature map and the original feature map corresponding to the second bidirectional long-short term memory module.

The first decoder 2303 is configured to calculate pixel products between the first attention feature map and the original feature map corresponding to the first bidirectional long-short term memory module and each second bidirectional long-short term memory module, respectively, add all pixel products of the same pixel point to generate a second attention feature map, identify the license number, the enterprise name, and the validity period according to the second attention feature map, and output an identification result.
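Purely as an illustration of how modules 210, 220 and 230 relate to one another, the system can be sketched as three callables applied in sequence. The class name, the method name and the dictionary key used in the usage example are hypothetical and do not appear in the application.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

@dataclass
class LicenseRecognitionSystem:
    """Chains the three modules of the identification system."""
    preprocess: Callable[[Any], Any]                 # preprocessing module 210
    extract_features: Callable[[Any], Any]           # feature sequence obtaining module 220
    encode_decode: Callable[[Any], Dict[str, str]]   # coding and decoding module 230

    def recognise(self, image: Any) -> Dict[str, str]:
        """Run preprocessing, feature extraction, then encoding and decoding."""
        preprocessed = self.preprocess(image)
        features = self.extract_features(preprocessed)
        return self.encode_decode(features)

if __name__ == "__main__":
    # Stand-in callables; a real system would plug in the trained components.
    system = LicenseRecognitionSystem(
        preprocess=lambda img: img.strip(),
        extract_features=lambda img: list(img),
        encode_decode=lambda feats: {"license_number": "".join(feats)},
    )
    print(system.recognise("  123 "))
```

Keeping the modules as interchangeable callables mirrors the description, in which the filtering method, the enhancement method and the number of codec sub-modules may all vary between embodiments.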

In this application, the pooling layer in a convolutional neural network can be viewed as a special mean-weighted attention mechanism, and the bidirectional long-short term memory model can likewise be regarded as an attention mechanism. The application therefore adopts a dual attention mechanism to achieve segmentation-free recognition of tobacco retail license images and extraction of tobacco sales information, reduces sensitivity to varied acquisition conditions and other interference factors, and can achieve good results in most practical application scenarios.

Although some specific embodiments of the present application have been described in detail by way of example, it should be understood by those skilled in the art that the above examples are for illustrative purposes only and are not intended to limit the scope of the present application. It will be appreciated by those skilled in the art that modifications may be made to the above embodiments without departing from the scope and spirit of the present application. The scope of the application is defined by the appended claims.

Claims

1. A method of identifying a tobacco retail license, comprising:

inputting the preprocessed image into a convolutional neural network to obtain a first visual feature sequence; the convolutional neural network comprises at least one convolutional submodule, each convolutional submodule comprises a first dense connection network module, a convolutional layer and a first pooling layer which are connected in sequence, and input data of each convolutional layer is dense connection data of output data of all convolutional submodules before the convolutional layer;

2. A method of identifying a tobacco retail license according to claim 1, wherein encoding and decoding the first sequence of visual features comprises:

inputting the first visual characteristic sequence into a first bidirectional long-short term memory module for serialized association processing to obtain a second visual characteristic sequence of the first bidirectional long-short term memory module;

fusing the second visual feature sequence obtained by the first bidirectional long-short term memory module with input data thereof to obtain an original feature map corresponding to the first bidirectional long-short term memory module, performing one-dimensional self-attention operation on the original feature map to obtain an operation result, and inputting the operation result into a full connection layer to obtain a first attention feature map corresponding to the first bidirectional long-short term memory module;

for each second bidirectional long-short term memory module after the first bidirectional long-short term memory module, performing serialized association processing by taking output data of the first bidirectional long-short term memory module or the second bidirectional long-short term memory module before the second bidirectional long-short term memory module as input, and obtaining a first attention feature map and an original feature map corresponding to the second bidirectional long-short term memory module;

3. The method according to claim 1 or 2, wherein the step of inputting the preprocessed image into a convolutional neural network to obtain a first visual feature sequence includes:

sequentially obtaining output data of all convolution sub-modules;

and inputting the dense connection data of the output data of all the convolution sub-modules into a second pooling layer, and taking the obtained pooling result as the first visual feature sequence.

4. The method of identifying a tobacco retail license of claim 1 wherein the densely connected data input to the convolutional layer is a tensor which map connects the output data of all convolutional sub-modules before the convolutional layer.

5. The method for identifying a tobacco retail license according to claim 1, wherein the image of the tobacco retail license is preprocessed to obtain a preprocessed image, and the method specifically comprises:

filtering and enhancing the image of the tobacco retail license;

6. A method of identifying a tobacco retail license according to claim 5, wherein the content-based alignment process specifically comprises:

adopting a column scanning algorithm to extract edges of the image;

calculating the average slope of the extracted edge clusters;

and rotating the image based on the average slope so that the horizontal line in the processed image is parallel to the transverse edge of the image.

7. A recognition system of tobacco retail license is characterized by comprising a preprocessing module, a first visual feature sequence obtaining module and a coding and decoding module;

the first visual feature sequence obtaining module is used for inputting the preprocessed image into a convolutional neural network to obtain a first visual feature sequence; the convolutional neural network comprises at least one convolutional submodule, the convolutional submodule comprises a first dense connection network module, a convolutional layer and a first pooling layer which are sequentially connected, and the first dense connection network module is connected with the first pooling layers of all convolutional submodules before the convolutional submodule;

8. A tobacco retail license identification system according to claim 7, characterised in that the convolutional neural network further comprises a second densely connected network module connected with the first pooling layer of all convolutional sub-modules and a second pooling layer connected with the second densely connected network module;

the pooling results of the second pooling layer form the first visual feature sequence.

9. A tobacco retail license identification system according to claim 7, characterised in that the codec module comprises a first codec sub-module, at least one second codec sub-module and a first decoder;

the first coding and decoding submodule comprises a first bidirectional long and short term memory module and a second decoder, wherein the first bidirectional long and short term memory module is used for carrying out serialization association processing on a first visual feature sequence to obtain a second visual feature sequence of the first bidirectional long and short term memory module; the second decoder is used for fusing the second visual feature sequence obtained by the first bidirectional long-short term memory module with input data thereof to obtain an original feature map corresponding to the first bidirectional long-short term memory module, performing one-dimensional self-attention operation on the original feature map to obtain an operation result, and inputting the operation result into a full connection layer to obtain a first attention feature map corresponding to the first bidirectional long-short term memory module;

the second coding and decoding sub-module comprises a second bidirectional long-short term memory module and a third decoder, wherein the second bidirectional long-short term memory module is used for carrying out serialization association processing on output data of the first bidirectional long-short term memory module or the second bidirectional long-short term memory module before the second bidirectional long-short term memory module as input to obtain a second visual characteristic sequence of the second bidirectional long-short term memory module; the third decoder is used for obtaining a first attention feature map and an original feature map corresponding to the second bidirectional long-short term memory module;

the first decoder is used for calculating pixel products between the first attention feature map and the original feature map corresponding to the first bidirectional long-short term memory module and each second bidirectional long-short term memory module respectively, adding all the pixel products of the same pixel point to generate a second attention feature map, and identifying the license number, the enterprise name and the valid period according to the second attention feature map and outputting an identification result.

10. A tobacco retail license identification system according to claim 7, characterized in that the pre-processing module comprises a filter enhancement module and an alignment module;