CN114913516A - Tobacco retail license identification method and system - Google Patents

Info

Publication number
CN114913516A
CN114913516A CN202210383762.9A CN202210383762A CN114913516A CN 114913516 A CN114913516 A CN 114913516A CN 202210383762 A CN202210383762 A CN 202210383762A CN 114913516 A CN114913516 A CN 114913516A
Authority
CN
China
Prior art keywords
term memory
short term
memory module
module
bidirectional long
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210383762.9A
Other languages
Chinese (zh)
Inventor
单宇翔
金泳
许珍珍
杜旋
郁钢
高扬华
岑涌
陆海良
王骋
任琴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Tobacco Zhejiang Industrial Co Ltd
Original Assignee
China Tobacco Zhejiang Industrial Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Tobacco Zhejiang Industrial Co Ltd
Priority to CN202210383762.9A
Publication of CN114913516A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/18Extraction of features or characteristics of the image
    • G06V30/1801Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/19Recognition using electronic means
    • G06V30/191Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/1918Fusion techniques, i.e. combining data from various sources, e.g. sensor fusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition
    • G06V30/41Analysis of document content
    • G06V30/414Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a method and a system for identifying a tobacco retail license, wherein the identification method comprises the following steps: preprocessing the image of the tobacco retail license to obtain a preprocessed image; inputting the preprocessed image into a convolutional neural network to obtain a first visual feature sequence; the convolutional neural network comprises at least one convolutional submodule, each convolutional submodule comprises a first dense connection network module, a convolutional layer and a first pooling layer which are sequentially connected, and input data of each convolutional layer is dense connection data of output data of all convolutional submodules before the convolutional layer; and coding and decoding the first visual characteristic sequence to obtain the license number, the enterprise name and the valid period on the tobacco retail license. The method and the device have the advantages that segmentation-free identification and tobacco sales information extraction of the tobacco retail license image are achieved, the sensitivity to diversified acquisition conditions and other interference factors is reduced, and the accuracy of feature extraction is improved.

Description

Tobacco retail license identification method and system
Technical Field
The application relates to the field of image processing, in particular to a method and a system for identifying a tobacco retail license.
Background
Robotic process automation (RPA) for tobacco marketing involves the intelligent recognition of objects in many tobacco business scenarios. For example, to verify the effect of a tobacco marketing campaign on the product market, the textual and graphical information generated during product retail must be analyzed and mined in depth. Robotic process automation (sometimes referred to as business process automation) uses "robots" as a digital workforce to perform common tasks such as processing invoices or transferring data from one database or spreadsheet to another. Most of the data to be processed is unstructured, such as text in e-mails or even video and pictures, and therefore requires complex algorithms. An AI robot can use techniques such as computer vision to recognize different types of documents, or natural language processing to understand the context of an e-mail message. Intelligent tobacco operations management in particular needs to recognize tobacco retail licenses from different regions and extract information from them. However, images of tobacco retail licenses have complex unstructured features, which makes information extraction difficult, and existing feature extraction methods for tobacco retail licenses are sensitive to varied acquisition conditions and other interference factors, so their feature extraction performance is poor.
Disclosure of Invention
The application provides a tobacco retail license identification method and system that achieve segmentation-free recognition of tobacco retail license images and extraction of tobacco sales information, reduce sensitivity to varied acquisition conditions and other interference factors, and improve the accuracy of feature extraction.
The application provides a method for identifying a tobacco retail license, which comprises the following steps:
preprocessing the image of the tobacco retail license to obtain a preprocessed image;
inputting the preprocessed image into a convolutional neural network to obtain a first visual feature sequence; the convolutional neural network comprises at least one convolutional submodule, each convolutional submodule comprises a first dense connection network module, convolutional layers and a first pooling layer which are connected in sequence, and input data of each convolutional layer is dense connection data of output data of all convolutional submodules before the convolutional layer;
and coding and decoding the first visual characteristic sequence to obtain the license number, the enterprise name and the valid period on the tobacco retail license.
Preferably, the encoding and decoding of the first visual feature sequence comprises:
inputting the first visual characteristic sequence into a first bidirectional long-short term memory module for carrying out serialized association processing to obtain a second visual characteristic sequence of the first bidirectional long-short term memory module;
fusing the second visual characteristic sequence obtained by the first bidirectional long-short term memory module with input data thereof to obtain an original characteristic diagram corresponding to the first bidirectional long-short term memory module, performing one-dimensional self-attention operation on the original characteristic diagram to obtain an operation result, and inputting the operation result into the full-connection layer to obtain a first attention characteristic diagram corresponding to the first bidirectional long-short term memory module;
for each second bidirectional long-short term memory module behind the first bidirectional long-short term memory module, performing serialized association processing by taking output data of the first bidirectional long-short term memory module or the second bidirectional long-short term memory module in front of the second bidirectional long-short term memory module as input, and obtaining a first attention feature map and an original feature map corresponding to the second bidirectional long-short term memory module;
and respectively calculating pixel products between the first attention feature map and the original feature map corresponding to the first bidirectional long-short term memory module and each second bidirectional long-short term memory module, adding all the pixel products of the same pixel points to generate a second attention feature map, identifying the license number, the enterprise name and the valid period according to the second attention feature map, and outputting an identification result.
Preferably, the inputting the preprocessed image into a convolutional neural network to obtain a first visual feature sequence, specifically including:
sequentially obtaining output data of all convolution sub-modules;
and inputting the dense connection data of the output data of all the convolution sub-modules into a second pooling layer, and taking the obtained pooling result as a first visual feature sequence.
Preferably, the densely connected data of the input convolutional layer is a tensor which maps and connects the output data of all convolutional sub-modules before the convolutional layer.
Preferably, the image of the tobacco retail license is preprocessed to obtain a preprocessed image, and the preprocessing specifically includes:
filtering and enhancing the image of the tobacco retail license;
and performing content-based alignment processing on the processing result of the filtering and enhancing processing.
Preferably, the content-based alignment process specifically includes:
adopting a column scanning algorithm to extract edges of the image;
calculating the average slope of the extracted edge clusters;
the image is rotated based on the average slope such that horizontal lines in the processed image are parallel to the image lateral edges.
The application also provides an identification system of the tobacco retail license, which comprises a preprocessing module, a first visual characteristic sequence obtaining module and a coding and decoding module;
the preprocessing module is used for preprocessing the image of the tobacco retail license to obtain a preprocessed image;
the first visual feature sequence obtaining module is used for inputting the preprocessed image into a convolutional neural network to obtain a first visual feature sequence; the convolutional neural network comprises at least one convolutional submodule, each convolutional submodule comprises a first dense connection network module, a convolutional layer and a first pooling layer which are connected in sequence, and the first dense connection network module is connected with the first pooling layers of all convolutional submodules before that convolutional submodule;
and the coding and decoding module is used for coding and decoding the first visual characteristic sequence to obtain the license number, the enterprise name and the valid period on the tobacco retail license.
Preferably, the convolutional neural network further comprises a second densely connected network module and a second pooling layer, the second densely connected network module is connected with the first pooling layers of all the convolutional sub-modules, and the second pooling layer is connected with the second densely connected network module;
the second dense connection network module is used for connecting the output data of all the convolution sub-modules into dense connection data;
the pooling results of the second pooling layer form a first sequence of visual features.
Preferably, the codec module comprises a first codec submodule, at least one second codec submodule and a first decoder;
the first coding and decoding submodule comprises a first bidirectional long-short term memory module and a second decoder, and the first bidirectional long-short term memory module is used for carrying out serialization association processing on the first visual feature sequence to obtain a second visual feature sequence of the first bidirectional long-short term memory module; the second decoder is used for fusing the second visual characteristic sequence obtained by the first bidirectional long-short term memory module with input data thereof to obtain an original characteristic diagram corresponding to the first bidirectional long-short term memory module, performing one-dimensional self-attention operation on the original characteristic diagram to obtain an operation result, and inputting the operation result into the full-connection layer to obtain a first attention characteristic diagram corresponding to the first bidirectional long-short term memory module;
the second coding and decoding sub-module comprises a second bidirectional long-short term memory module and a third decoder, and the second bidirectional long-short term memory module is used for performing serialized association processing on output data of the first bidirectional long-short term memory module or the second bidirectional long-short term memory module in front of the second bidirectional long-short term memory module as input to obtain a second visual characteristic sequence of the second bidirectional long-short term memory module; the third decoder is used for obtaining a first attention feature map and an original feature map corresponding to the second bidirectional long-short term memory module;
the first decoder is used for calculating pixel products between the first attention feature map and the original feature map corresponding to the first bidirectional long-short term memory module and each second bidirectional long-short term memory module respectively, adding all the pixel products of the same pixel points to generate a second attention feature map, identifying the license number, the enterprise name and the valid period according to the second attention feature map, and outputting an identification result.
Preferably, the preprocessing module comprises a filter enhancement module and an alignment module;
the filtering enhancement module is used for filtering and enhancing the image of the tobacco retail license;
and the alignment module is used for carrying out content-based alignment processing on the processing result of the filtering and enhancing processing.
Further features of the present application and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which is to be read in connection with the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the application and together with the description, serve to explain the principles of the application.
FIG. 1 is a flow chart of a method of identifying a tobacco retail license provided herein;
FIG. 2 is a block diagram of an identification system for a tobacco retail license as provided herein.
Detailed Description
Various exemplary embodiments of the present application will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present application unless specifically stated otherwise.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the application, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
The application provides a tobacco retail license identification method and system that achieve segmentation-free recognition of tobacco retail license images and extraction of tobacco sales information, reduce sensitivity to varied acquisition conditions and other interference factors, and improve the accuracy of feature extraction.
As shown in fig. 1, the method for identifying a tobacco retail license includes:
S110: Preprocess the image of the tobacco retail license to obtain a preprocessed image.
Specifically, as an embodiment, preprocessing the image of the tobacco retail license to obtain a preprocessed image includes the following steps:
S1101: Filter and enhance the image of the tobacco retail license.
As an embodiment, the filtering is performed using Gaussian filtering, and the enhancement is performed using dilation and erosion (morphological operations).
It will be appreciated that other filtering methods (e.g., mean filtering) may be employed to filter the image. Other image enhancement methods (e.g., neighborhood enhancement) may be used for enhancement.
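The filtering and enhancement of step S1101 can be illustrated with a minimal sketch. It is not the applicant's implementation: the function names, the 3x3 neighbourhood, the kernel radius and sigma, and the order of operations (blur, then dilation, then erosion) are all assumptions made for illustration, and the image is assumed to be a grayscale list of rows.

```python
import math

def gaussian_kernel(radius, sigma):
    """Return a normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def _clamp(index, size):
    """Replicate border pixels by clamping an index into [0, size - 1]."""
    return max(0, min(size - 1, index))

def gaussian_blur(image, radius=1, sigma=1.0):
    """Separable Gaussian blur: one horizontal pass, then one vertical pass."""
    kernel = gaussian_kernel(radius, sigma)
    height, width = len(image), len(image[0])
    offsets = range(-radius, radius + 1)
    horizontal = [[sum(kernel[j + radius] * image[y][_clamp(x + j, width)]
                       for j in offsets)
                   for x in range(width)] for y in range(height)]
    return [[sum(kernel[j + radius] * horizontal[_clamp(y + j, height)][x]
                 for j in offsets)
             for x in range(width)] for y in range(height)]

def _morph(image, pick):
    """Apply `pick` (max or min) over each pixel's 3x3 neighbourhood."""
    height, width = len(image), len(image[0])
    return [[pick(image[_clamp(y + dy, height)][_clamp(x + dx, width)]
                  for dy in (-1, 0, 1) for dx in (-1, 0, 1))
             for x in range(width)] for y in range(height)]

def dilate(image):
    """Grayscale dilation: each pixel becomes its neighbourhood maximum."""
    return _morph(image, max)

def erode(image):
    """Grayscale erosion: each pixel becomes its neighbourhood minimum."""
    return _morph(image, min)

def filter_and_enhance(image):
    """Assumed order: Gaussian blur, then dilation followed by erosion."""
    return erode(dilate(gaussian_blur(image)))
```

Dilation followed by erosion (a morphological closing) fills small dark specks on a light background; the reverse order would instead remove small bright specks. The text does not state which order is intended.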
S1102: Perform content-based alignment on the result of the filtering and enhancement.
Specifically, the content-based alignment process includes:
S11021: Extract edges from the image using a column scanning algorithm to obtain edge clusters.
S11022: the average slope k of the extracted edge clusters is calculated.
S11023: the image is rotated based on the average slope k so that the horizontal lines in the processed image are parallel to the image lateral edges.
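Steps S11022 and S11023 can be sketched as follows. The sketch assumes that the edge clusters from S11021 are already available as lists of (x, y) points and that each cluster is roughly horizontal; the least-squares slope fit, the function names, and the sign convention of the rotation are illustrative assumptions, and the column scanning algorithm itself is not reproduced.

```python
import math

def cluster_slope(points):
    """Least-squares slope of one edge cluster given as (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx  # assumes the cluster is not vertical (sxx > 0)

def average_slope(clusters):
    """Average slope k over all extracted edge clusters (S11022)."""
    slopes = [cluster_slope(cluster) for cluster in clusters]
    return sum(slopes) / len(slopes)

def deskew_angle(k):
    """Rotation angle in radians that maps a line of slope k to horizontal."""
    return -math.atan(k)

def rotate_point(x, y, angle, cx=0.0, cy=0.0):
    """Rotate (x, y) about (cx, cy) by `angle` radians (S11023, per point)."""
    dx, dy = x - cx, y - cy
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return (cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
```

For example, two clusters with slopes 0.1 and 0.3 give k = 0.2, and rotating by `deskew_angle(0.2)` maps any point on the line y = 0.2x onto the horizontal axis. A full implementation would apply the same rotation to every pixel with interpolation.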
S120: Input the preprocessed image into a convolutional neural network to obtain a first visual feature sequence $I_1$.
The convolutional neural network comprises at least one convolutional submodule (please refer to fig. 2), each convolutional submodule comprises a first dense connection network module, a convolutional layer and a first pooling layer which are connected in sequence, and input data of each convolutional layer is dense connection data of output data of all convolutional submodules before the convolutional layer.
In a densely connected structure, features are not modified after they are generated. As a result, some shallow features that have great potential in deep layers, and could play a significant role after fine-tuning, are still treated as redundant in the deep layers of the network and are therefore often pruned. In view of this, the first dense connection network module preferably further includes an activation function: the dense connection data is input to the activation function, and the output data of the activation function is used as the input data of the convolutional layer. The activation function can pick out potentially redundant features and reactivate them so that they better suit feature learning in the deep network, maximizing the feature reuse efficiency of the network.
Thus, compared with residual connection, each convolutional layer is connected in a feedforward manner, so that the l-th convolutional layer takes the feature maps of all preceding layers as input, and the visual feature sequence $I_l$ of the l-th convolutional layer is calculated through a nonlinear transformation function:
$$I_l = H_l([I_0, I_1, \dots, I_{l-1}])$$
where $H_l$ is the nonlinear transformation function of the l-th convolutional layer, and $I_0, I_1, \dots, I_{l-1}$ are the visual feature sequences output by all convolutional submodules before the l-th layer.
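The dense connection rule can be illustrated with a short sketch in which every layer receives the concatenation of all earlier outputs. Each feature is simplified to a list of scalar channels, and `make_toy_layer` is a hypothetical stand-in for the nonlinear transformation H_l (a real layer would be a learned convolution); neither name comes from the application.

```python
def dense_forward(i0, transforms):
    """Compute I_l = H_l([I_0, ..., I_{l-1}]) for every layer in `transforms`.

    Returns the list [I_0, I_1, ..., I_L].
    """
    features = [i0]
    for h_l in transforms:
        # Channel-wise concatenation of all previously produced features.
        dense_input = [channel for feature in features for channel in feature]
        features.append(h_l(dense_input))
    return features

def make_toy_layer(growth):
    """Hypothetical H_l: emits `growth` channels via a ReLU of the input mean."""
    def layer(channels):
        mean = sum(channels) / len(channels)
        return [max(0.0, mean + j) for j in range(growth)]
    return layer
```

With a 3-channel input and three layers that each emit 2 channels, the layers see 3, 5 and 7 input channels respectively, which shows why the input width grows with depth.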
Specifically, as an embodiment, the dense connection data of the input convolutional layer is a tensor in which the output data of all convolution submodules before the convolutional layer is mapped and connected.
Preferably, the convolutional neural network further comprises a second densely connected network module and a second pooling layer, the second densely connected network module being disposed between the last convolutional sub-module and the second pooling layer. Specifically, the second pooling layer is a maximum pooling layer.
On the basis of this preferred embodiment, inputting the preprocessed image into the convolutional neural network to obtain the first visual feature sequence $I_1$ specifically includes:
S1201: Obtain the output data of all convolutional submodules in sequence.
S1202: Input the dense connection data of the output data of all convolutional submodules into the second pooling layer, and take the pooling result as the first visual feature sequence $I_1$.
Because the dense connection network module connects the features of all layers together, each layer receives the gradient signals of all the previous layers during backpropagation, which alleviates the vanishing gradient problem during training to some extent. In addition, because a large number of features are reused, many features can be generated with a small number of convolution kernels, so the final convolutional neural network model is small.
In the convolutional neural network structure, because each convolutional layer receives the characteristics of all the previous layers as input, in order to avoid that the characteristic dimension grows too fast along with the increase of the number of network layers, when down-sampling is performed, the characteristic dimension is firstly compressed to half of the current input through the convolutional layer, and then pooling is performed.
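The down-sampling described above (halve the feature dimension with a convolution, then pool) can be sketched as follows. The sketch treats the features as a list of channels, each a 1-D sequence, and assumes an even channel count; the 1x1 convolution weights are a fixed placeholder that averages adjacent channel pairs, and max pooling with a window of 2 is an assumption, since the text does not specify the type of the first pooling layer.

```python
def compress_channels(features, weights):
    """1x1 convolution: each output channel is a weighted sum of input channels."""
    steps = len(features[0])
    return [[sum(w * features[c][t] for c, w in enumerate(row))
             for t in range(steps)] for row in weights]

def halving_weights(num_channels):
    """Placeholder for learned weights: average each adjacent channel pair."""
    half = num_channels // 2
    weights = [[0.0] * num_channels for _ in range(half)]
    for out in range(half):
        weights[out][2 * out] = 0.5
        weights[out][2 * out + 1] = 0.5
    return weights

def max_pool(features, size=2):
    """Non-overlapping max pooling along the sequence axis of every channel."""
    return [[max(channel[t:t + size])
             for t in range(0, len(channel) - size + 1, size)]
            for channel in features]

def downsample(features):
    """Compress the channel count to half, then pool."""
    compressed = compress_channels(features, halving_weights(len(features)))
    return max_pool(compressed)
```

A 4-channel input of length 4 becomes a 2-channel output of length 2, so the feature dimension seen by later layers grows more slowly.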
S130: Encode and decode the first visual feature sequence $I_1$ to obtain the license number, the enterprise name and the validity period on the tobacco retail license.
Specifically, a plurality of coding and decoding sub-modules and a first decoder are used for coding and decoding the first visual characteristic sequence, each coding and decoding sub-module comprises a bidirectional long-short term memory module and a decoder, wherein the bidirectional long-short term memory module connected with the convolutional neural network is marked as a first bidirectional long-short term memory module, and the subsequent bidirectional long-short term memory module is marked as a second bidirectional long-short term memory module.
Based on the structure of the coding and decoding, the coding and decoding of the first visual characteristic sequence comprises:
S1301: Input the first visual feature sequence $I_1$ into the first bidirectional long-short term memory module for serialized association processing to obtain a second visual feature sequence $I_2$ of the first bidirectional long-short term memory module.
S1302: Fuse the second visual feature sequence $I_2$ obtained by the first bidirectional long-short term memory module with its input data (i.e., the first visual feature sequence $I_1$) to obtain an original feature map $D_0$ corresponding to the first bidirectional long-short term memory module, perform a one-dimensional self-attention operation on the original feature map $D_0$ to obtain an operation result, and input the operation result into a fully connected layer to obtain a first attention feature map $D_1$ corresponding to the first bidirectional long-short term memory module.
S1303: For each second bidirectional long-short term memory module after the first bidirectional long-short term memory module, perform serialized association processing with the second visual feature sequence $I_2$ output by the preceding first or second bidirectional long-short term memory module as input, and obtain the first attention feature map $D_1$ and the original feature map $D_0$ corresponding to that second bidirectional long-short term memory module.
S1304: For the first bidirectional long-short term memory module and each second bidirectional long-short term memory module, calculate the pixel product between the corresponding first attention feature map $D_1$ and original feature map $D_0$ (so that each pixel point has one pixel product for the first bidirectional long-short term memory module and one for each second bidirectional long-short term memory module), add all the pixel products of the same pixel point to generate a second attention feature map $D_2$, identify the license number, the enterprise name and the validity period according to the second attention feature map $D_2$, and output the identification result.
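The attention steps S1302 and S1304 can be sketched under several stated assumptions: fusion is taken to be element-wise addition, the one-dimensional self-attention is taken to be scaled dot-product attention with queries, keys and values all equal to $D_0$, and the fully connected layer is a plain linear map applied at every sequence step. None of these choices is specified in the text, the bidirectional long-short term memory modules are not modelled, and all function names are hypothetical.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of scores."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(i2, i1):
    """Assumed fusion: element-wise addition of two T x C sequences."""
    return [[a + b for a, b in zip(r2, r1)] for r2, r1 in zip(i2, i1)]

def self_attention_1d(d0):
    """Scaled dot-product self-attention along the sequence axis (Q = K = V = d0)."""
    scale = math.sqrt(len(d0[0]))
    out = []
    for query in d0:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in d0]
        weights = softmax(scores)
        out.append([sum(w * row[c] for w, row in zip(weights, d0))
                    for c in range(len(query))])
    return out

def fully_connected(sequence, weight, bias):
    """Apply one linear layer at every step; `weight` is C_out x C_in."""
    return [[sum(w * x for w, x in zip(w_row, step)) + b
             for w_row, b in zip(weight, bias)] for step in sequence]

def first_attention_map(d0, weight, bias):
    """S1302: self-attention on D0 followed by the fully connected layer."""
    return fully_connected(self_attention_1d(d0), weight, bias)

def second_attention_map(pairs):
    """S1304: D2 = sum over modules of the element-wise product D1 * D0.

    `pairs` is a list of (D1, D0) maps that all share the same shape.
    """
    rows, cols = len(pairs[0][0]), len(pairs[0][0][0])
    d2 = [[0.0] * cols for _ in range(rows)]
    for d1, d0 in pairs:
        for r in range(rows):
            for c in range(cols):
                d2[r][c] += d1[r][c] * d0[r][c]
    return d2
```

Each bidirectional long-short term memory module contributes one (D1, D0) pair, so with one first module and n second modules `second_attention_map` sums n + 1 products per pixel point.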
Based on the identification method, the application provides an identification system of a tobacco retail license, which comprises a preprocessing module 210, a first visual feature sequence obtaining module 220 and a coding and decoding module 230.
The preprocessing module 210 is configured to preprocess the image of the tobacco retail license to obtain a preprocessed image.
The pre-processing module 210 includes a filter enhancement module and an alignment module. The filtering enhancement module is used for filtering and enhancing the image of the tobacco retail license. And the alignment module is used for carrying out content-based alignment processing on the processing result of the filtering and enhancing processing.
The first visual feature sequence obtaining module 220 is configured to input the preprocessed image into a convolutional neural network to obtain a first visual feature sequence.
As shown in fig. 2, the convolutional neural network includes at least one convolutional submodule 2201, the convolutional submodule includes a first densely connected network module 22011, a convolutional layer 22012 and a first pooling layer 22013, which are connected in sequence, the first densely connected network module 22011 is connected to the first pooling layers of all convolutional submodules before the convolutional submodule, and the pooling result of the first pooling layer of the last convolutional submodule is taken as a first visual feature sequence.
Preferably, the convolutional neural network further comprises a second densely connected network module 2202 and a second pooling layer 2203, the second densely connected network module 2202 being connected with the first pooling layer of all convolutional sub-modules, the second pooling layer 2203 being connected with the second densely connected network module 2202.
The second densely connected network module 2202 is used to concatenate the output data of all the convolution sub-modules into densely connected data.
The pooling results of the second pooling layer 2203 form a first visual feature sequence.
The encoding and decoding module 230 is configured to encode and decode the first visual characteristic sequence, and obtain the license number, the enterprise name, and the validity period of the tobacco retail license.
The codec module includes a first codec sub-module 2301, at least one second codec sub-module 2302, and a first decoder 2303.
The first codec sub-module 2301 includes a first bidirectional long-short term memory module 23011 and a second decoder 23012. The first bidirectional long-short term memory module 23011 is configured to perform serialized association processing on the first visual feature sequence to obtain a second visual feature sequence of the first bidirectional long-short term memory module. The second decoder 23012 is configured to fuse the second visual feature sequence obtained by the first bidirectional long-short term memory module with its input data to obtain an original feature map corresponding to the first bidirectional long-short term memory module, perform a one-dimensional self-attention operation on the original feature map to obtain an operation result, and input the operation result into a fully connected layer to obtain a first attention feature map corresponding to the first bidirectional long-short term memory module.
The second codec sub-module 2302 includes a second bidirectional long-short term memory module 23021 and a third decoder 23022. The second bidirectional long-short term memory module 23021 is configured to perform serialized association processing with the output data of the preceding first or second bidirectional long-short term memory module as input, so as to obtain a second visual feature sequence of the second bidirectional long-short term memory module. The third decoder 23022 is configured to obtain the first attention feature map and the original feature map corresponding to the second bidirectional long-short term memory module.
The first decoder 2303 is configured to calculate pixel products between the first attention feature map and the original feature map corresponding to the first bidirectional long-short term memory module and each second bidirectional long-short term memory module, respectively, add all pixel products of the same pixel point to generate a second attention feature map, identify the license number, the enterprise name, and the validity period according to the second attention feature map, and output an identification result.
In this application, the pooling layer in the convolutional neural network can be viewed as a special attention mechanism with mean (uniform) weights. The bidirectional long-short term memory model is likewise regarded as an attention mechanism. The application therefore adopts a dual attention mechanism to achieve segmentation-free identification of the tobacco retail license image and extraction of the tobacco sales information, which reduces sensitivity to varied acquisition conditions and other interference factors and can achieve good results in most practical application scenarios.
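The view of pooling as mean-weighted attention can be illustrated with a small Python sketch: when all attention scores are equal, the softmax weights are uniform and the attention-weighted sum reduces to average pooling. The function names and the example window are hypothetical and are not taken from the application.

```python
import math

def attention_pool(values, scores):
    """Weighted sum of values, with weights given by a softmax over scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return sum(e / total * v for e, v in zip(exps, values))

def mean_pool(values):
    """Ordinary average pooling over one window."""
    return sum(values) / len(values)

window = [1.0, 2.0, 3.0, 6.0]

uniform = attention_pool(window, [0.0, 0.0, 0.0, 0.0])   # equal scores: uniform weights
peaked = attention_pool(window, [0.0, 0.0, 0.0, 10.0])   # one dominant score
```

Here `uniform` matches `mean_pool(window)`, while `peaked` is pulled toward the value with the dominant score, which is the behaviour a learned attention mechanism adds on top of plain pooling.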
Although some specific embodiments of the present application have been described in detail by way of example, it should be understood by those skilled in the art that the above examples are for illustrative purposes only and are not intended to limit the scope of the present application. It will be appreciated by those skilled in the art that modifications may be made to the above embodiments without departing from the scope and spirit of the present application. The scope of the application is defined by the appended claims.

Claims (10)

1. A method of identifying a tobacco retail license, comprising:
preprocessing the image of the tobacco retail license to obtain a preprocessed image;
inputting the preprocessed image into a convolutional neural network to obtain a first visual feature sequence; the convolutional neural network comprises at least one convolutional submodule, each convolutional submodule comprises a first dense connection network module, a convolutional layer and a first pooling layer which are connected in sequence, and input data of each convolutional layer is dense connection data of output data of all convolutional submodules before the convolutional layer;
and encoding and decoding the first visual feature sequence to obtain the license number, the enterprise name and the valid period on the tobacco retail license.
2. A method of identifying a tobacco retail license according to claim 1, wherein encoding and decoding the first sequence of visual features comprises:
inputting the first visual feature sequence into a first bidirectional long-short term memory module for serialized association processing to obtain a second visual feature sequence of the first bidirectional long-short term memory module;
fusing the second visual feature sequence obtained by the first bidirectional long-short term memory module with input data thereof to obtain an original feature map corresponding to the first bidirectional long-short term memory module, performing one-dimensional self-attention operation on the original feature map to obtain an operation result, and inputting the operation result into a fully connected layer to obtain a first attention feature map corresponding to the first bidirectional long-short term memory module;
for each second bidirectional long-short term memory module after the first bidirectional long-short term memory module, performing serialized association processing with, as input, the output data of the bidirectional long-short term memory module preceding it (the first bidirectional long-short term memory module or another second bidirectional long-short term memory module), and obtaining a first attention feature map and an original feature map corresponding to the second bidirectional long-short term memory module;
and respectively calculating pixel products between the first attention feature map and the original feature map corresponding to the first bidirectional long-short term memory module and each second bidirectional long-short term memory module, adding all the pixel products of the same pixel points to generate a second attention feature map, identifying the license number, the enterprise name and the valid period according to the second attention feature map, and outputting an identification result.
3. The method according to claim 1 or 2, wherein the step of inputting the preprocessed image into a convolutional neural network to obtain a first visual feature sequence includes:
sequentially obtaining output data of all convolution sub-modules;
and inputting the dense connection data of the output data of all the convolution sub-modules into a second pooling layer, and taking the obtained pooling result as the first visual feature sequence.
4. The method of identifying a tobacco retail license of claim 1, wherein the densely connected data input to the convolutional layer is a tensor formed by concatenating the output feature maps of all convolutional sub-modules before the convolutional layer.
5. The method for identifying a tobacco retail license according to claim 1, wherein preprocessing the image of the tobacco retail license to obtain a preprocessed image specifically comprises:
filtering and enhancing the image of the tobacco retail license;
and performing content-based alignment processing on the processing result of the filtering and enhancing processing.
6. A method of identifying a tobacco retail license according to claim 5, wherein the content-based alignment process specifically comprises:
adopting a column scanning algorithm to extract edges of the image;
calculating the average slope of the extracted edge clusters;
and rotating the image based on the average slope so that the horizontal line in the processed image is parallel to the transverse edge of the image.
7. A tobacco retail license identification system, characterized by comprising a preprocessing module, a first visual feature sequence obtaining module and a coding and decoding module;
the preprocessing module is used for preprocessing the image of the tobacco retail license to obtain a preprocessed image;
the first visual feature sequence obtaining module is used for inputting the preprocessed image into a convolutional neural network to obtain a first visual feature sequence; the convolutional neural network comprises at least one convolutional submodule, the convolutional submodule comprises a first dense connection network module, a convolutional layer and a first pooling layer which are sequentially connected, and the first dense connection network module is connected with the first pooling layers of all convolutional submodules before the convolutional submodule;
and the coding and decoding module is used for encoding and decoding the first visual feature sequence to obtain the license number, the enterprise name and the valid period on the tobacco retail license.
8. A tobacco retail license identification system according to claim 7, characterised in that the convolutional neural network further comprises a second densely connected network module connected with the first pooling layer of all convolutional sub-modules and a second pooling layer connected with the second densely connected network module;
the second dense connection network module is used for connecting the output data of all the convolution sub-modules into dense connection data;
the pooling results of the second pooling layer form the first visual feature sequence.
9. A tobacco retail license identification system according to claim 7, characterised in that the codec module comprises a first codec sub-module, at least one second codec sub-module and a first decoder;
the first coding and decoding sub-module comprises a first bidirectional long-short term memory module and a second decoder, wherein the first bidirectional long-short term memory module is used for performing serialized association processing on the first visual feature sequence to obtain a second visual feature sequence of the first bidirectional long-short term memory module; the second decoder is used for fusing the second visual feature sequence obtained by the first bidirectional long-short term memory module with input data thereof to obtain an original feature map corresponding to the first bidirectional long-short term memory module, performing one-dimensional self-attention operation on the original feature map to obtain an operation result, and inputting the operation result into a fully connected layer to obtain a first attention feature map corresponding to the first bidirectional long-short term memory module;
the second coding and decoding sub-module comprises a second bidirectional long-short term memory module and a third decoder, wherein the second bidirectional long-short term memory module is used for performing serialized association processing with, as input, the output data of the bidirectional long-short term memory module preceding it (the first bidirectional long-short term memory module or another second bidirectional long-short term memory module), to obtain a second visual feature sequence of the second bidirectional long-short term memory module; the third decoder is used for obtaining a first attention feature map and an original feature map corresponding to the second bidirectional long-short term memory module;
the first decoder is used for calculating pixel products between the first attention feature map and the original feature map corresponding to the first bidirectional long-short term memory module and each second bidirectional long-short term memory module respectively, adding all the pixel products of the same pixel point to generate a second attention feature map, and identifying the license number, the enterprise name and the valid period according to the second attention feature map and outputting an identification result.
10. A tobacco retail license identification system according to claim 7, characterized in that the pre-processing module comprises a filter enhancement module and an alignment module;
the filtering enhancement module is used for filtering and enhancing the image of the tobacco retail license;
and the alignment module is used for carrying out content-based alignment processing on the processing result of the filtering and enhancing processing.
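For illustration only, the content-based alignment recited in claim 6 (average slope of the extracted edges, followed by a rotation) can be sketched in Python as below. The claims do not detail the column scanning algorithm, so the sketch assumes that edge extraction has already produced line segments as pairs of end points. The segment format, the function names and the use of mathematical (y-up) coordinates are assumptions.

```python
import math

def average_slope(segments):
    """Mean slope of edge segments given as ((x0, y0), (x1, y1)) pairs.
    Vertical segments are skipped; at least one non-vertical segment is assumed."""
    slopes = [(y1 - y0) / (x1 - x0)
              for (x0, y0), (x1, y1) in segments if x1 != x0]
    return sum(slopes) / len(slopes)

def rotate_point(point, angle):
    """Rotate a point about the origin by `angle` radians (counter-clockwise)."""
    x, y = point
    return (x * math.cos(angle) - y * math.sin(angle),
            x * math.sin(angle) + y * math.cos(angle))

# Two hypothetical text-line edges, both tilted with slope 0.1.
segments = [((0.0, 0.0), (10.0, 1.0)), ((0.0, 5.0), (20.0, 7.0))]

skew = math.atan(average_slope(segments))           # estimated tilt angle
leveled = [(rotate_point(p, -skew), rotate_point(q, -skew))
           for p, q in segments]                     # rotate back by the tilt
```

After the rotation, both end points of each segment share the same vertical coordinate, i.e. the formerly tilted lines are parallel to the horizontal image edge. A real implementation would apply the same angle to the whole image with an image-processing library.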
CN202210383762.9A 2022-04-12 2022-04-12 Tobacco retail license identification method and system Pending CN114913516A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210383762.9A CN114913516A (en) 2022-04-12 2022-04-12 Tobacco retail license identification method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210383762.9A CN114913516A (en) 2022-04-12 2022-04-12 Tobacco retail license identification method and system

Publications (1)

Publication Number Publication Date
CN114913516A true CN114913516A (en) 2022-08-16

Family

ID=82763956

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210383762.9A Pending CN114913516A (en) 2022-04-12 2022-04-12 Tobacco retail license identification method and system

Country Status (1)

Country Link
CN (1) CN114913516A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115830298A (en) * 2023-02-17 2023-03-21 江苏羲辕健康科技有限公司 Medicine supervision code identification method and system based on neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination