CN116189196A - Express delivery face bill identification and self-correction method and identification system - Google Patents

Express delivery face bill identification and self-correction method and identification system

Info

Publication number
CN116189196A
CN116189196A (application CN202211727783.4A)
Authority
CN
China
Prior art keywords
text
text block
self
express delivery
blocks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211727783.4A
Other languages
Chinese (zh)
Inventor
窦毅
王晓浩
温燕香
唐经航
田昊洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi Guihua Intelligent Manufacturing Co ltd
Original Assignee
Guangxi Guihua Intelligent Manufacturing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi Guihua Intelligent Manufacturing Co ltd filed Critical Guangxi Guihua Intelligent Manufacturing Co ltd
Priority to CN202211727783.4A
Publication of CN116189196A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19173 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/26 Techniques for post-processing, e.g. correcting the recognition result
    • G06V30/262 Techniques for post-processing, e.g. correcting the recognition result using context analysis, e.g. lexical, syntactic or semantic context
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/42 Document-oriented image-based pattern recognition based on the type of document
    • G06V30/424 Postal images, e.g. labels or addresses on parcels or postal envelopes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/43 Editing text-bitmaps, e.g. alignment, spacing; Semantic analysis of bitmaps of text without OCR

Abstract

The invention discloses a method for recognizing and self-correcting express delivery waybills (face sheets) and a corresponding recognition system. The recognition and self-correction method comprises the following steps: acquiring an image of the express waybill, detecting text blocks, and pre-recognizing the character content of each text block; performing relationship-matching analysis on the pre-recognized character content and the positions of the text blocks to obtain the interrelationships among the text blocks and the category of each text block; reasoning over the pre-recognized characters of each text block according to these interrelationships to realize self-correction; and taking the self-corrected character information together with the text block categories as the recognition result of the waybill. The method and system improve the recognition accuracy of express waybills and have a wider range of application.

Description

Express delivery face bill identification and self-correction method and identification system
Technical Field
The invention relates to the technical field of optical character recognition (OCR), and in particular to a method for recognizing and self-correcting express waybills (express delivery face sheets).
Background
Because parcels currently come from many different express companies, recipient and sender information is difficult to acquire and share, so village- and town-level forwarding stations cannot directly sort the parcels they receive. To reduce the workload of manual sorting, the express waybill must be recognized by OCR. However, the layout structures of waybills from different companies differ considerably, purely vision-based layout analysis is difficult, and many waybills suffer from scribbling, occlusion or missing content, so the accuracy of conventional OCR algorithms is low.
At present, recognition of waybill information usually treats the waybill as a special complex table and analyses it with table layout analysis plus character OCR, mainly by three approaches:
1. Direct detection of text box positions. The position of each text box is detected with visual algorithms such as object detection or semantic segmentation, OCR is performed on each box, row, column and cell information is then deduced from the spatial arrangement of the boxes, and a spreadsheet is generated by combining the text content of each box. This approach depends heavily on the OCR detection results and on hand-crafted rules, must be developed specifically for each waybill style, and adapts and scales poorly to new waybill formats.
2. Grid-line detection for layout analysis. Corner points are detected with an object detection algorithm, or table lines are extracted by image morphological transformation, texture extraction, edge detection and the like; row, column and merged-cell information is derived from the corner points and table lines, OCR is performed on the text boxes inside the cells, and a spreadsheet is finally generated by combining the layout analysis results. This approach requires clear and complete table lines and is not suitable for layout analysis of line-free complex forms such as express waybills.
3. End-to-end learning with neural networks. A specially designed neural network model, such as TableBank, converts the table image via image-to-text techniques into a structured description language (for example, HTML tags defining the table structure) containing both the layout format and the text content, from which the spreadsheet can be generated directly. This solution has poor robustness: for certain waybill types the recognition effect is poor, intermediate steps cannot be intervened in for repair, model repair and tuning are difficult, and the approach is not suitable for engineering deployment.
Therefore, how to provide an express waybill recognition and self-correction method with high recognition accuracy is a technical problem to be solved.
Disclosure of Invention
In order to solve the problems of low accuracy and a narrow range of application of recognition algorithms in the prior art, the invention provides an express waybill recognition and self-correction method and a recognition system.
The express waybill recognition and self-correction method provided by the invention comprises the following steps:
Step 1: acquire an image of the express waybill, detect text blocks, and pre-recognize the character content of each text block;
Step 2: perform relationship-matching analysis according to the pre-recognized character content and the positions of the text blocks, so as to obtain the interrelationships among the text blocks and the category of each text block;
Step 3: perform semantic analysis according to the interrelationships among the text blocks and the pre-recognized character content, and reason over the character content of each text block respectively to realize self-correction;
Step 4: take the self-corrected character information and the text block categories as the recognition result of the express waybill.
Further, data frames are captured from a video stream and input into a preset express waybill detection model to obtain the image of the express waybill.
Further, the step 2 comprises:
Step 2.1: perform embedding coding on each text block, taking text blocks as units, so that each text block is encoded into one embedding vector; splice and transform the embedding vectors of all text blocks to obtain the encoding matrix of all text blocks of one express waybill;
Step 2.2: analyse the relative positional relationships among the text blocks to obtain the position-encoding matrix of all text blocks of the express waybill;
Step 2.3: perform multi-head self-attention reasoning on the encoding matrix and the position-encoding matrix of all text blocks of one express waybill, extract the semantic relationship between each text block and the other text blocks, and form a relationship-matching matrix;
Step 2.4: obtain the category of each text block based on the relationship-matching matrix.
Further, the step 2.1 comprises:
segmenting the character content of each text block with a word-segmentation algorithm to obtain S([c1, c2, …, ci, …, cn]), where ci is a minimum semantic unit of the text, and mapping the text into a one-dimensional text word vector S_0 of length L_0 by vocabulary lookup;
splicing the text word vectors, in the order of the text block numbers, into a text mapping matrix Z_t ∈ R^{M0×L0}, where M_0 is the maximum number of text blocks;
setting two trainable matrix variables K_t ∈ R^{L0×N0} and b_t ∈ R^{N0}, where N_0 is the length of the model hidden variables, and computing the encoding matrix of the text blocks of the express waybill E_t = Z_t × K_t + b_t, E_t ∈ R^{M0×N0}.
Further, the step 2.2 comprises:
sorting and numbering the centre-point coordinates of all text blocks in a fixed order, denoting the resulting text block number by d; the corresponding position-encoding vector has length N_0 and values P_d, where P_d follows a sinusoidal position-encoding scheme, with sine used for even vector indices and cosine for odd vector indices;
splicing the text block position-encoding vectors, in number order, into the position-encoding matrix E_p ∈ R^{M0×N0}.
Further, the relationship-matching matrix obtains the category of each text block through FFN network reasoning, and the loss function during reasoning is the multi-class cross entropy
L = -Σ_{d=1}^{M_0} Σ_{i=1}^{types} y_di · log(y_di_pre_prob),
where M_0 is the maximum number of text blocks, types is the number of text block categories, y_di is the label value of a category, and y_di_pre_prob is the probability value output for that category during reasoning.
Further, the step 3 comprises:
Step 3.1: perform embedding coding on the characters in each text block to obtain the character encoding matrix of each text block;
Step 3.2: splice the relationship-matching matrix of the express waybill with all of its character encoding matrices to obtain the text block input encoding matrices, the splicing formula being X_i = RE_i || E_i, where RE_i is the relationship-matching vector of the i-th text block, of length N_0, E_i is the character Embeddings encoding matrix of the i-th text block, of shape [L_0, N_0], and X_i is the input encoding matrix of the i-th text block, of shape [L_0 + 1, N_0];
Step 3.3: perform multi-head self-attention reasoning on the text block input encoding matrix to obtain the character feature matrix;
Step 3.4: obtain the corrected encoding matrix of each text block based on the character feature matrix.
Further, the step 3.1 comprises:
segmenting the pre-recognized characters of the text block with a word-segmentation algorithm to obtain S([c1, c2, …, ci, …, cn]), where ci is a minimum semantic unit of the characters;
mapping the text into a one-dimensional text word vector S_0 of length L_0 by vocabulary lookup;
setting a trainable character dictionary matrix of shape [W_0, N_0] and a trainable position dictionary matrix of shape [L_0, N_0], where W_0 is the total number of character codes;
computing the character encoding matrix of the text block based on these two dictionary matrices.
The express waybill recognition system provided by the invention comprises:
a camera for acquiring images;
an industrial personal computer for obtaining an image of the formatted text from the camera image, detecting text blocks, pre-recognizing the text content of the text blocks, performing relationship-matching analysis according to the pre-recognized text content and the positions of the text blocks so as to obtain the interrelationships among the text blocks and the category of each text block, reasoning over the text content of each text block according to its category and semantic analysis to realize self-correction, and taking the self-corrected text information as the recognition result of the formatted text.
Further, at least two groups of cameras are provided.
Compared with the prior art, the invention has the following beneficial effects.
The invention designs a relationship-matching module based on the attention mechanism that jointly considers text semantics, text block positions and other related information to achieve reliable classification of text blocks. Waybill layouts differ widely: on some waybills the recipient block follows the sender block, on others the sender block comes first, and the text identifiers also differ. A conventional table analysis algorithm cannot accurately parse the layouts of all these waybills, and simple semantic analysis cannot reliably distinguish recipient from sender information. By jointly considering context semantics, text block positions and other related information, the system achieves reliable text block classification.
The invention designs a text self-correction module based on the attention mechanism that, on top of semantic analysis of the text, detects and corrects errors in the original text by combining context relationship-matching information. During transport the waybill content is often scribbled over, covered or damaged, and a conventional OCR algorithm cannot effectively recognize the damaged content. On the basis of the OCR pre-recognition result, the system performs deep semantic analysis of the original recognition result and its context, so damaged content can be effectively detected and corrected. In addition, the system adopts a generative text network structure, supports correction across different character counts (missing characters, extra characters, and so on), and significantly improves the text self-correction accuracy.
The invention designs a multi-stage, multi-branch training method for the dependency relationships among the algorithm modules, and a dedicated data-enhancement strategy for the characteristics of each model's data set, which effectively reduces the work of collecting and processing data sets, lowers the difficulty of model training, and strengthens the generalization capability of the models.
Drawings
The invention is described in detail below with reference to examples and figures, wherein:
FIG. 1 is a system architecture diagram of an embodiment of the present invention.
FIG. 2 is a flowchart of an identification and self-correction algorithm according to an embodiment of the present invention.
FIG. 3 is a block diagram of an identification and self-correction algorithm according to an embodiment of the present invention.
Fig. 4 is a network architecture diagram of a relational matching algorithm Transformer Encoder module according to an embodiment of the invention.
Fig. 5 is a network architecture diagram of a self-correcting algorithm Transformer Encoder module according to an embodiment of the invention.
Detailed Description
In order to make the technical problems to be solved, the technical solutions and the beneficial effects clearer, the invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Thus, a feature referred to throughout this specification is used to describe one embodiment of the invention and does not imply that every embodiment of the invention must include that feature. Furthermore, it should be noted that this specification describes a number of features. Although certain features may be combined to illustrate a possible system design, these features may also be used in other combinations not explicitly described. Thus, unless otherwise indicated, the illustrated combinations are not intended to be limiting.
The invention provides an express waybill recognition and self-correction method, which mainly comprises the following steps.
Step 1: acquire an image of the formatted text, detect the text blocks, and pre-recognize the character content of each text block. That is, the first step obtains the rough content of the text blocks.
Step 2: perform relationship-matching analysis according to the pre-recognized character content and the positions of the text blocks, thereby determining the category of each text block. Waybills of different companies have their own layout styles: on some waybills the recipient's name and phone number are printed before the receiving address and on others after it; on some waybills the sender information precedes the recipient information, while on others the recipient information comes first; many waybills carry only recipient information and no sender information. In addition, a multi-line receiving address spans several text blocks, and the upper and lower lines of the address are closely related semantically. Performing semantic analysis on the relationships among the text blocks therefore effectively improves the classification accuracy of each text block.
Step 3: perform semantic analysis according to the interrelationships among the text blocks and the pre-recognized character content, and reason over the character content of each text block respectively to realize self-correction.
Step 4: finally, take the self-corrected character information and the text block categories as the recognition result of the express waybill.
Although there are many express companies, the waybills of each company follow corresponding rules. For example, the delivery mode printed on a waybill can only take a fixed set of values such as "deliver to the door", "do not carry upstairs" or "self pick-up"; some companies mark the receiving address and the recipient's name with fixed keywords; some companies mask characters with "*" to protect privacy while others use a different masking character; some companies' tracking numbers start with the prefix YT, and JD (Jingdong) tracking numbers start with JD. By fully exploiting the interrelationships of the text blocks on a waybill, the text semantics inside each block can be checked and corrected, which improves the accuracy of text correction.
Through the above steps, all text blocks are first extracted by the text block detection algorithm, and relationship-matching analysis is then performed on the pre-recognized characters and the position coordinates of the text blocks, realizing layout analysis and semantic self-correction of the express waybill. The sketch below illustrates this overall flow.
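As a concrete illustration of how the four steps chain together, the following minimal Python sketch shows the data flow; every argument name is a hypothetical wrapper around one of the modules described below and is not the patent's actual implementation.

```python
# Minimal pipeline sketch; the arguments are hypothetical wrappers around the modules
# described in this disclosure (waybill detection, text block detection,
# single-line recognition, relationship matching, text self-correction).
def recognize_waybill(frame, detector, block_detector, recognizer, matcher, corrector):
    # Step 1: locate the waybill, detect text blocks, pre-recognize their characters.
    waybill_img = detector.detect(frame)                  # cropped waybill image
    blocks = block_detector.detect(waybill_img)           # list of (box, crop) pairs
    pre_texts = [recognizer.recognize(crop) for _, crop in blocks]

    # Step 2: relationship matching over pre-recognized texts and block positions.
    boxes = [box for box, _ in blocks]
    relation_matrix, categories = matcher.match(pre_texts, boxes)

    # Step 3: self-correct the characters of each block using the relation matrix.
    corrected = corrector.correct(pre_texts, relation_matrix)

    # Step 4: corrected characters plus block categories form the recognition result.
    return list(zip(categories, corrected))
```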
In one embodiment, the express waybill image is obtained by capturing a data frame from a video stream and inputting it into a preset express waybill detection model. When an express parcel passes by, the camera captures a video stream of the parcel, data frames are extracted from the stream, and the images corresponding to the data frames are input into the preset formatted-text type detection model to obtain the waybill image on the parcel.
The invention uses a lightweight YoloV4-tiny network to detect the express waybill, because the waybill detection algorithm has to run on the camera-side device. One frame of the high-definition waybill image is read from the camera and resized to the preset resolution [416, 416, 3]; the YoloV4-tiny model then runs inference and prediction on the input image x_img ∈ R^{3×416×416} to obtain the formatted-text category, i.e. the express waybill.
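A minimal sketch of this capture-and-detect step is shown below; it assumes OpenCV for frame grabbing and resizing, and a hypothetical yolov4_tiny_infer callable standing in for the trained waybill detection model, since the patent does not name a specific inference framework.

```python
import cv2
import numpy as np

def grab_and_detect(camera_index, yolov4_tiny_infer):
    """Read one frame, resize it to [416, 416, 3] and run waybill detection."""
    cap = cv2.VideoCapture(camera_index)
    ok, frame = cap.read()                  # one high-definition frame from the camera
    cap.release()
    if not ok:
        return None
    img = cv2.resize(frame, (416, 416))                          # preset resolution
    x_img = img.transpose(2, 0, 1).astype(np.float32) / 255.0    # -> [3, 416, 416]
    # yolov4_tiny_infer is assumed to return boxes/scores for the waybill class
    return yolov4_tiny_infer(x_img[None, ...])
```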
In one embodiment, the invention uses the DBNet algorithm for text block detection. The backbone network is ResNet, with an FPN structure added to obtain the feature map F; a probability map, a threshold map and a binary map are then inferred from F, realizing segmentation of the text boxes. The loss function is L = L_s + α × L_b + β × L_t, where L_s is the probability map loss, L_b the binary map loss and L_t the threshold map loss; α takes the value 1.0 and β the value 10; L_s and L_b are computed with binary cross entropy (BCE), and L_t with an L1 loss.
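The combined detection loss can be written compactly as in the PyTorch sketch below, assuming the three predicted maps and their targets are already available as tensors of matching shape; the map-specific masking and hard-negative mining used in full DBNet implementations are omitted here.

```python
import torch
import torch.nn.functional as F

def dbnet_loss(prob_pred, prob_gt, bin_pred, bin_gt, thr_pred, thr_gt,
               alpha=1.0, beta=10.0):
    """L = L_s + alpha * L_b + beta * L_t, as described for the text block detector."""
    l_s = F.binary_cross_entropy(prob_pred, prob_gt)   # probability map loss (BCE)
    l_b = F.binary_cross_entropy(bin_pred, bin_gt)     # binary map loss (BCE)
    l_t = F.l1_loss(thr_pred, thr_gt)                  # threshold map loss (L1)
    return l_s + alpha * l_b + beta * l_t
```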
In one embodiment, the invention pre-recognizes the character content of a text block with a single-line Chinese recognition algorithm. This algorithm is mainly responsible for pre-recognizing the detected text blocks and is divided into three parts: a backbone network, a detection neck and a detection head. The backbone adopts the lightweight GhostNet network, preserving detection capability as far as possible while greatly reducing the amount of computation. The neck adopts an LSTM to associate each recognized Chinese character with its preceding and following context, making efficient use of the feature information. The head uses the CTC algorithm to support end-to-end recognition of variable-length text. The loss function is the CTC loss L(S) = -Σ_{(x,z)∈S} ln p(z|x), where S denotes the training set, x is the input sequence, z the annotated label sequence, and p(z|x) the accumulated probability of all output sequences of x that can be mapped to z.
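In PyTorch, this CTC loss corresponds to torch.nn.CTCLoss; the snippet below is a minimal usage sketch with illustrative tensor shapes, not the exact recognizer of the patent.

```python
import torch
import torch.nn as nn

# Minimal CTC loss usage: T time steps, N batch items, C character classes (illustrative).
T, N, C = 64, 2, 6000
log_probs = torch.randn(T, N, C).log_softmax(dim=2)        # recognizer output per step
targets = torch.randint(1, C, (N, 20), dtype=torch.long)   # annotated label sequences z
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 20, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)                                   # blank index 0 by convention
loss = ctc(log_probs, targets, input_lengths, target_lengths)  # averaged -ln p(z|x)
```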
In one embodiment, step 2 of the invention performs relationship-matching analysis according to the pre-recognized character content and the positions of the text blocks, so as to obtain the interrelationships among the text blocks and the category of each text block; it specifically includes the following steps.
Step 2.1: perform embedding coding on each text block, taking text blocks as units, so that each text block is encoded into one embedding vector; splice and transform the embedding vectors of all text blocks to obtain the encoding matrix of all text blocks of one express waybill.
Step 2.1 can be divided into the following sub-steps.
Segment the character content of each text block with a word-segmentation algorithm to obtain S([c1, c2, …, ci, …, cn]), where ci is a minimum semantic unit of the text; map the text into a one-dimensional text word vector S_0 of length L_0 by vocabulary lookup.
Splice the text word vectors, in the order of the text block numbers, into the text mapping matrix Z_t ∈ R^{M0×L0}, where M_0 is the maximum number of text blocks.
Set two trainable matrix variables K_t ∈ R^{L0×N0} and b_t ∈ R^{N0}, where N_0 is the length of the model hidden variables, and compute the encoding matrix of the text blocks of the express waybill E_t = Z_t × K_t + b_t, E_t ∈ R^{M0×N0}.
Step 2.2, analyzing the relative position relation among the text blocks to obtain the position coding matrix of all the text blocks of the express delivery face list;
in step 2.2, the coordinates of the center points of the text blocks are numbered in a sequence, the obtained text block numbers are marked as d, and the length of the corresponding position coding vector is N 0 The numerical values are P d Wherein P is d The calculation formula of (2) is
Figure BDA0004030859620000075
Splicing the text block coding vectors into a position coding matrix according to the numbering sequence
Figure BDA0004030859620000076
Step 2.3, performing multi-head self-attention reasoning on the coding matrix and the position coding matrix of all text blocks of an express bill, extracting semantic relations between each text block and other text blocks, and forming a relation matching matrix;
and 2.4, obtaining the category of the text block based on the relation matching matrix.
In one embodiment, the relationship-matching matrix obtains the category of each text block through FFN network reasoning, and the loss function during reasoning is the multi-class cross entropy
L = -Σ_{d=1}^{M_0} Σ_{i=1}^{types} y_di · log(y_di_pre_prob),
where M_0 is the maximum number of text blocks, types is the number of text block categories, y_di is the label value of a category, and y_di_pre_prob is the probability value output for that category during reasoning.
The above step 2 may also be referred to as the relationship-matching algorithm. The relationship-matching algorithm takes the text block as its basic unit and performs semantic relationship reasoning over the text blocks contained in one whole express waybill; its main structure is shown in FIG. 3. The input of the algorithm is the position information of the text blocks and the pre-recognized character content, and its output is the relationship-matching matrix and the category of each text block. First, according to the pre-recognized character information of the text blocks, Embedding coding is performed on each text block, so that each text block is encoded into one Embedding vector; the Embedding vectors of all text blocks are spliced and transformed to obtain the overall text block encoding matrix of the express waybill. Since the number of text blocks contained in each express waybill is limited, the maximum number of text blocks can be set to 64. The overall text block encoding matrix and the position codes of the text blocks are then input into the multi-stage self-attention module for reasoning, giving the relationship-matching matrix among the text blocks. The relationship-matching matrix serves two purposes: first, the specific category of each text block (such as recipient address or sender name) is obtained from it by feed-forward network reasoning; second, it is input into the subsequent text self-correction algorithm, where the interrelationships among the text blocks improve the text self-correction effect.
The relationship-matching algorithm can be divided into a text block embedding calculation method (text block Embeddings calculation), a text block position embedding calculation method (text block position Embeddings calculation) and a text block attention module calculation method.
The text block embedding calculation method mainly comprises the following steps.
First, the character content of each text block is segmented with a word-segmentation algorithm (tokenization) to obtain S([c1, c2, …, ci, …, cn]), where ci is a minimum semantic unit of the characters (a single character for Chinese, a word for English); the characters are mapped by vocabulary lookup into a one-dimensional word vector S_i (i being the vector index), with vector length L_0 (L_0 is the number of characters of the longest text block, taken as 128 in one embodiment); the shortfall is padded with 0.
The text word vectors are spliced, in the order of the corresponding text block numbers (the text blocks are numbered by the text block position embedding calculation method), into the text mapping matrix Z_t ∈ R^{M0×L0} (where M_0 is the maximum number of text blocks, taken as 64 in one embodiment of the invention); missing rows are padded with 0.
Two trainable matrix variables K_t ∈ R^{L0×N0} and b_t ∈ R^{N0} are set (where N_0 is the length of the model hidden variables, taken as 1024 in one embodiment of the invention), and the text block Embeddings matrix of the express waybill is computed as E_t = Z_t × K_t + b_t, E_t ∈ R^{M0×N0}.
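A minimal PyTorch sketch of this text block Embeddings computation is given below, using M_0 = 64, L_0 = 128 and N_0 = 1024 as in the embodiment; the vocabulary lookup is abstracted into a padded integer id matrix, and all variable names are illustrative.

```python
import torch
import torch.nn as nn

M0, L0, N0 = 64, 128, 1024     # max text blocks, max characters per block, hidden size

class TextBlockEmbedding(nn.Module):
    """Computes E_t = Z_t x K_t + b_t from the padded word-id matrix of one waybill."""
    def __init__(self):
        super().__init__()
        self.K_t = nn.Parameter(torch.randn(L0, N0) * 0.02)   # trainable [L0, N0]
        self.b_t = nn.Parameter(torch.zeros(N0))               # trainable [N0]

    def forward(self, Z_t):
        # Z_t: [M0, L0] vocabulary ids per text block, zero-padded, cast to float
        return Z_t.float() @ self.K_t + self.b_t               # E_t: [M0, N0]

Z_t = torch.zeros(M0, L0, dtype=torch.long)   # padded word-id matrix of the waybill
E_t = TextBlockEmbedding()(Z_t)               # [64, 1024]
```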
The text block position embedding calculation method mainly comprises the following steps.
For the text block Embeddings matrix E_t ∈ R^{M0×N0}, the corresponding position-encoding matrix is E_p ∈ R^{M0×N0}.
First, the text blocks are sorted and numbered from left to right and from top to bottom according to their centre-point coordinates; the text block number is denoted d, and the corresponding position-encoding vector has length N_0 with values P_d, where P_d follows a sinusoidal position-encoding scheme: sine is used for even vector indices i and cosine for odd indices i, the index i distinguishing the even and odd components so that the encodings of adjacent block numbers are pushed a little further apart.
The text block position-encoding vectors are then spliced, in number order, into the position-encoding matrix E_p ∈ R^{M0×N0}.
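Assuming the position code follows the standard Transformer sinusoidal scheme (the description only fixes the sine/cosine alternation by index parity; the 10000 base below is the conventional choice and is an assumption), the matrix E_p can be built as follows.

```python
import math
import torch

def block_position_encoding(M0=64, N0=1024):
    """E_p[d, i]: sinusoidal code for text block number d (assumed Transformer-style)."""
    E_p = torch.zeros(M0, N0)
    for d in range(M0):
        for i in range(0, N0, 2):
            angle = d / (10000 ** (i / N0))      # 10000 base is an assumption
            E_p[d, i] = math.sin(angle)          # even indices: sine
            if i + 1 < N0:
                E_p[d, i + 1] = math.cos(angle)  # odd indices: cosine
    return E_p
```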
The text block attention module calculation method includes the following steps.
The text block Embeddings matrix E_t and the position-encoding matrix E_p are reasoned over by the text block attention module to obtain the relationship-matching matrix RE of the express waybill. The text block attention module is a multi-stage self-attention network whose specific structure is shown in FIG. 4: the sum of the text block Embeddings matrix E_t and the position-encoding matrix E_p is used as the input of the K and Q channels, while the V channel directly takes the text block Embeddings matrix E_t; multi-head self-attention reasoning is then performed. The feature matrix output by one text block self-attention module serves as the input text block feature matrix of the next text block self-attention module.
The number N of cascaded text block self-attention modules can be chosen flexibly according to the type and number of text blocks; N = 6 is adopted in one embodiment of the invention. The text feature matrix finally output by the multi-stage self-attention module is the relationship-matching matrix RE of the express waybill, which is input into the text self-correction module for the next stage of reasoning.
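The PyTorch sketch below illustrates this wiring: query and key are taken from E_t + E_p, value is taken from E_t, and N = 6 stages are cascaded. It is a schematic reading of FIG. 4; the internal layer-norm and feed-forward details of each encoder block are simplified.

```python
import torch
import torch.nn as nn

class BlockAttentionStage(nn.Module):
    """One text block self-attention stage: Q and K from features + E_p, V from features."""
    def __init__(self, n0=1024, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(n0, heads, batch_first=True)
        self.norm = nn.LayerNorm(n0)

    def forward(self, feats, E_p):
        qk = feats + E_p                          # [1, M0, N0] query/key input
        out, _ = self.attn(qk, qk, feats)         # V channel takes the features directly
        return self.norm(out + feats)             # residual connection (simplified)

class RelationMatcher(nn.Module):
    """N = 6 cascaded stages; the final feature matrix is the relation matrix RE."""
    def __init__(self, n0=1024, stages=6):
        super().__init__()
        self.stages = nn.ModuleList(BlockAttentionStage(n0) for _ in range(stages))

    def forward(self, E_t, E_p):
        feats = E_t                               # [1, M0, N0]
        for stage in self.stages:
            feats = stage(feats, E_p)
        return feats                              # RE: [1, M0, N0]
```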
A loss function also needs to be computed when training the relationship-matching model.
The relationship-matching matrix RE output by the text block attention module is passed through a feed-forward network (FFN) to directly obtain the category of each text block. The loss function is the multi-class cross entropy, computed as
L = -Σ_{d=1}^{M_0} Σ_{i=1}^{types} y_di · log(y_di_pre_prob),
where M_0 is the maximum number of text blocks, types is the number of text block categories (for an express waybill the categories may include, for example, express company name, recipient phone number, and so on), y_di is the label value of a category, and y_di_pre_prob is the probability value of that category at the model output.
In the invention, the self-correction model automatically corrects the characters within a single text block according to the semantic information inside the block and the semantic relationships among the blocks. The input of the model is the pre-recognized text characters and the relationship-matching matrix RE output by the relationship-matching module, and the output is the self-corrected text information. The main procedure is: the characters in a text block are Embedding-coded and added to the corresponding character position codes, the result is spliced with the relationship-matching matrix RE, and the corrected semantic text is then obtained through self-attention reasoning.
The self-correction mainly involves a character Embeddings module calculation method, an attention encoding module calculation method, and the loss function calculation of the text self-correction algorithm.
The character Embeddings module calculation method comprises the following steps.
First step: the text is segmented by tokenization to obtain S([c1, c2, …, ci, …, cn]), where ci is a minimum semantic unit of the text (a single character for Chinese, a word for English); the text is mapped by vocabulary lookup into a one-dimensional word vector S_0 of length L_0 (L_0 is the maximum number of characters of a text block, taken as 128), and the shortfall is padded with 0.
Second step: a word-segmentation position vector is generated according to the order of the segments in the input encoding vector; the position vector corresponding to S_0 is set, in character order, to P_i([1, 2, …, L_0]). The total length of the position vector is the same as that of the input encoding vector, namely L_0.
Third step: a trainable character dictionary matrix of shape [W_0, N_0] and a trainable position dictionary matrix of shape [L_0, N_0] are set, where W_0 is the total number of character codes, L_0 is the maximum number of characters of a text block, and N_0 is the length of the model hidden variables. The Embeddings code and the position Embeddings code of each character are obtained by table lookup and added, giving the character Embeddings encoding matrix of the text block, of shape [L_0, N_0].
Fourth step: the relationship-matching matrix RE and the character encoding matrix are spliced according to X_i = RE_i || E_i, where RE_i is the relationship-matching vector of the i-th text block, of length N_0; E_i is the character Embeddings encoding matrix of the i-th text block, of shape [L_0, N_0]; and X_i is the input encoding matrix of the i-th text block, of shape [L_0 + 1, N_0].
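The character-level encoding and the splicing X_i = RE_i || E_i can be sketched as follows (PyTorch); realizing the dictionary matrices with nn.Embedding lookup tables is an implementation choice, not something mandated by the text, and W_0 below is illustrative.

```python
import torch
import torch.nn as nn

L0, N0, W0 = 128, 1024, 8000    # max characters, hidden size, character code count

char_dict = nn.Embedding(W0, N0)   # trainable character dictionary matrix [W0, N0]
pos_dict = nn.Embedding(L0, N0)    # trainable position dictionary matrix  [L0, N0]

def block_input_matrix(char_ids, RE_i):
    """char_ids: [L0] padded character ids of block i; RE_i: [N0] relation vector."""
    positions = torch.arange(L0)
    E_i = char_dict(char_ids) + pos_dict(positions)    # character Embeddings [L0, N0]
    X_i = torch.cat([RE_i.unsqueeze(0), E_i], dim=0)   # X_i = RE_i || E_i -> [L0+1, N0]
    return X_i
```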
The character attention encoding module consists of N Transformer Encoder modules connected in series (the specific structure is shown in FIG. 5). Its input value is the text block input encoding matrix X_i, and its output value is the current character feature matrix, which serves as the input of the next-stage Transformer Encoder module for the next round of model reasoning.
For the loss function of the text self-correction algorithm, the text feature matrix output by the character attention module is passed through a feed-forward network (FFN) to directly obtain the corrected text of each text block. The loss function is the multi-class cross entropy, computed as
L = -Σ_{d=1}^{L_0} Σ_{i=1}^{num} y_di · log(y_di_pre_prob),
where L_0 is the maximum number of characters per text block, num is the total number of Chinese characters, y_di is the label value of a character class, and y_di_pre_prob is the probability value of that character class at the model output.
Each algorithm and each model needs to be trained in advance; since the algorithm models have strong data dependencies, they need to be trained step by step.
When training the express waybill detection and waybill pre-recognition algorithms, the waybill detection algorithm is trained directly on annotated waybill picture samples, and data enhancement can be performed by random cropping, rotation, brightness, colour and noise transformations. The text block detection and single-line Chinese recognition modules can directly use public OCR data sets, augmented with a waybill-specific data set, for training.
Training of the relationship-matching algorithm: the data set files of the relationship-matching algorithm are annotated in XML format; each express waybill sample is a main node, the text blocks it contains are the corresponding sub-nodes, and each text block sub-node carries attributes such as the text block number, the pre-recognized text and the text block category index.
In the first step, the trained text block detection algorithm is used to run inference on the collected express waybills to obtain the coordinates and sizes of all text blocks in each waybill; the text blocks are cropped, stored as text block pictures and numbered from left to right and top to bottom. The single-line Chinese recognition algorithm is then used to run inference on the cropped text block pictures to obtain the pre-recognized text of each block. The sample name, text block numbers and pre-recognized texts are filled into the data set file according to the XML specification; the text block category index is then judged manually from the text block information and added to the corresponding node attribute of the annotation file, finally giving the original annotation file.
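A small sketch of assembling such an annotation file with Python's standard xml.etree.ElementTree is given below; the element and attribute names are illustrative, since the patent states which fields are stored but not their exact XML names.

```python
import xml.etree.ElementTree as ET

def build_annotation(sample_name, blocks, out_path):
    """blocks: list of dicts with keys 'number', 'pre_text', 'category' (illustrative names)."""
    root = ET.Element("waybill", attrib={"name": sample_name})   # one main node per waybill
    for b in blocks:
        ET.SubElement(root, "text_block", attrib={
            "number": str(b["number"]),      # text block number
            "pre_text": b["pre_text"],       # pre-recognized text
            "category": str(b["category"]),  # manually judged category index
        })
    ET.ElementTree(root).write(out_path, encoding="utf-8", xml_declaration=True)
```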
In the second step, a dictionary of similar-shaped characters is built for the characters commonly found on express waybills, and the pre-recognized text in the original annotation file is corrupted in fixed proportions: 40% random replacement with similar-shaped characters, 20% random deletion of characters, 20% random insertion of characters and 20% random reversal of character order, giving the data-enhancement annotation file.
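The corruption step can be sketched as follows; the similar-shape dictionary is represented by a plain Python dict, the 40/20/20/20 proportions follow the values stated above, and applying one corruption per string is an illustrative simplification.

```python
import random

def corrupt(text, near_form):
    """Randomly corrupt one pre-recognized string for data enhancement.
    near_form: dict mapping a character to a list of similar-shaped characters."""
    if not text:
        return text
    chars = list(text)
    i = random.randrange(len(chars))
    r = random.random()
    if r < 0.40 and chars[i] in near_form:              # 40%: similar-shape substitution
        chars[i] = random.choice(near_form[chars[i]])
    elif r < 0.60 and len(chars) > 1:                   # 20%: delete a character
        del chars[i]
    elif r < 0.80 and near_form:                        # 20%: insert a random character
        chars.insert(i, random.choice(list(near_form)))
    elif i + 1 < len(chars):                            # 20%: swap adjacent characters
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)
```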
In the third step, the original annotation file and the data-enhancement annotation file are mixed, randomly shuffled and split into a training set and a validation set, with which the training of the relationship-matching algorithm is completed.
The training of the text self-correcting algorithm, i.e., the training of the self-correcting model, is described below.
Because the relationship-matching matrix output by the relationship-matching algorithm is needed as input when training the text self-correction algorithm, the data annotation formats of the two algorithms must be kept consistent. In the invention, the data set of the text self-correction algorithm adds a text label attribute to each text block sub-node, on the basis of the relationship-matching algorithm data set, to represent the true text characters of the text block.
First, on the basis of the relationship-matching algorithm data set, the true text characters of each text block are filled into the text label attribute of the corresponding sub-node by manual checking, completing the construction of the text self-correction algorithm data set.
Second, the prepared data set is split proportionally into a training set and a validation set, and training is then carried out. During training of the text self-correction algorithm, the express waybill main node is taken as the basic unit: all sub-node data under a main node are first input into the relationship-matching algorithm for inference to obtain the corresponding relationship-matching matrix, which is then input, together with the corresponding attribute values in the annotation file, into the text self-correction algorithm for model training.
The invention also provides an express waybill recognition system comprising a camera and an industrial personal computer. The camera is used to acquire the original image.
The industrial personal computer is used to obtain the express waybill image from the original image, detect the text blocks, pre-recognize the text of each block, and perform relationship-matching analysis on the pre-recognized text content and the positions of the text blocks to obtain the interrelationships among the text blocks and the category of each block; it then self-corrects the pre-recognized text of each block according to the interrelationships among the blocks and the character semantics inside the block, and takes the self-corrected text information and the text block categories as the final recognition result.
In one embodiment, at least two groups of cameras are provided, which capture waybills in different height ranges respectively, preventing image blur caused by insufficient depth of field.
FIG. 1 illustrates one embodiment of the express waybill recognition system of the invention. The recognition system for the formatted text comprises cameras 1 and 4, embedded inference modules 2 and 5, an industrial personal computer 3 and a background server 6.
The modules are connected and communicate by wire or wirelessly: the embedded inference module 2 is connected to the camera 1 through a MIPI bus; the embedded inference module 5 is connected to the camera 4 through a MIPI bus; the embedded inference modules 2 and 5 are connected to the industrial personal computer 3 through Ethernet; and the industrial personal computer 3 is connected to the background server 6 through a wired or 4G channel.
As shown in the system frame diagram, each module performs a different function. The cameras 1 and 4 are mainly responsible for collecting video of the shooting area and transmitting the collected digital signals to the embedded inference modules 2 and 5. The embedded inference modules 2 and 5 compress the images and run the detection algorithm to extract the express waybill images. The industrial personal computer 3 is responsible for OCR recognition and layout analysis of the waybill images, completes the semantic self-correction of the characters, displays the waybill images and recognition results on a display in real time, and uploads the picture data, recognition results and configuration information to the background server 6. The background server 6 stores and manages the picture data, recognition results and configuration information, and sends the recognition results to other systems through an API interface.
The main flow of the system in operation is roughly divided into three stages. The first stage is express waybill detection: the device captures one frame of image data from the video stream, preprocesses it, computes the position and size of the express waybill by inference of the waybill detection model, and crops it into an independent waybill image. The second stage is character pre-recognition: text blocks of the rectified waybill are detected, and an OCR algorithm is then applied to each text block in turn for pre-recognition; if the number of text blocks, the number of characters, the confidence and other indicators of the pre-recognition result reach the set thresholds, the subsequent character self-correction is performed. The third stage is character self-correction: the relationship-matching module reasons, via the attention mechanism, over the positions of the text blocks and the pre-recognized character content to judge the interrelationships among the text blocks, thereby predicting the category of each text block (such as recipient address content or sender phone content); at the same time, the output relationship vectors are sent to the self-correction module to complete the error correction of the pre-recognized text characters.
FIG. 3 shows a schematic diagram of the cooperation of the algorithm modules. After the camera captures the video, the corresponding data frame is extracted from the video and fed as the input image into the express waybill detection module, which runs the waybill detection algorithm to pick out the waybill image from the data frame.
The waybill image is then input, as input data, to the waybill pre-recognition module. The pre-recognition module comprises two sub-modules: a text block detection module, which runs the text block detection algorithm to detect all text blocks of each waybill, and a single-line Chinese recognition module, which pre-recognizes the character content of each text block.
The text blocks are then passed to the relationship-matching module, which comprises a text block position-encoding module (named "text block positions" in the figure), a text block attention module and an FFN module. The text block position-encoding module generates the text block position-encoding matrix from the coordinates of the text blocks. The text block encoding module generates the text block encoding matrix from the pre-recognized character content; both are input to the text block attention module to obtain the relationship-matching matrix, which is in turn fed into the FFN module to obtain the category, i.e. the classification, of each text block.
In addition, the relationship-matching matrix is input to the text self-correction module, which comprises a character encoding module (named the "character Embeddings" module in the figure), a character attention module and an FFN module.
The character encoding module encodes the characters of each text block to generate the character encoding matrix; the character encoding matrix and the relationship-matching matrix are input together into the character attention module to obtain the character feature matrix, which passes through the FFN module to yield the self-corrected character information.
It should be noted that the FFN modules of the relationship-matching module and of the text self-correction module are independent FFN modules.
In the invention, the express waybill image is acquired by the camera, all text blocks are extracted by the text block detection algorithm, and relationship-matching analysis is then performed on the position coordinates of the text blocks and the pre-recognized text, realizing waybill layout analysis and OCR semantic self-correction.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

Claims (10)

1. An express waybill recognition and self-correction method, characterized by comprising the following steps:
step 1, acquiring an image of the express waybill, detecting text blocks, and pre-recognizing the character content of each text block;
step 2, performing relationship-matching analysis according to the pre-recognized character content and the positions of the text blocks, so as to obtain the interrelationships among the text blocks and the category of each text block;
step 3, performing semantic analysis according to the interrelationships among the text blocks and the pre-recognized character content, and reasoning over the character content of each text block respectively to realize self-correction;
step 4, taking the character information obtained by self-correction and the text block categories as the recognition result of the express waybill.
2. The express waybill recognition and self-correction method according to claim 1, characterized in that data frames are captured from a video stream and input into a preset express waybill detection model to obtain the image of the express waybill.
3. The express waybill recognition and self-correction method according to claim 1, characterized in that the step 2 comprises:
step 2.1, performing embedding coding on each text block, taking text blocks as units, so that each text block is encoded into one embedding vector, and splicing and transforming the embedding vectors of all text blocks to obtain the encoding matrix of all text blocks of one express waybill;
step 2.2, analysing the relative positional relationships among the text blocks to obtain the position-encoding matrix of all text blocks of the express waybill;
step 2.3, performing multi-head self-attention reasoning on the encoding matrix and the position-encoding matrix of all text blocks of one express waybill, extracting the semantic relationship between each text block and the other text blocks, and forming a relationship-matching matrix;
step 2.4, obtaining the category of each text block based on the relationship-matching matrix.
4. The express waybill recognition and self-correction method according to claim 3, characterized in that the step 2.1 comprises:
segmenting the character content of each text block with a word-segmentation algorithm to obtain S([c1, c2, …, ci, …, cn]), where ci is a minimum semantic unit of the text, and mapping the text into a one-dimensional text word vector S_0 of length L_0 by vocabulary lookup;
splicing the text word vectors, in the order of the text block numbers, into a text mapping matrix Z_t ∈ R^{M0×L0}, where M_0 is the maximum number of text blocks;
setting two trainable matrix variables K_t ∈ R^{L0×N0} and b_t ∈ R^{N0}, where N_0 is the length of the model hidden variables, and computing the encoding matrix of the text blocks of the express waybill E_t = Z_t × K_t + b_t, E_t ∈ R^{M0×N0}.
5. The express waybill recognition and self-correction method according to claim 4, characterized in that the step 2.2 comprises:
sorting and numbering the centre-point coordinates of all text blocks in a fixed order, denoting the resulting text block number by d, the corresponding position-encoding vector having length N_0 and values P_d, where P_d follows a sinusoidal position-encoding scheme with sine used for even vector indices and cosine for odd vector indices;
splicing the text block position-encoding vectors, in number order, into the position-encoding matrix E_p ∈ R^{M0×N0}.
6. The express waybill recognition and self-correction method according to claim 3, characterized in that the relationship-matching matrix obtains the category of each text block through FFN network reasoning, and the loss function during reasoning is the multi-class cross entropy
L = -Σ_{d=1}^{M_0} Σ_{i=1}^{types} y_di · log(y_di_pre_prob),
where M_0 is the maximum number of text blocks, types is the number of text block categories, y_di is the label value of a category, and y_di_pre_prob is the probability value output for that category during reasoning.
7. The express waybill recognition and self-correction method according to claim 1, characterized in that the step 3 comprises:
step 3.1, performing embedding coding on the characters in each text block to obtain the character encoding matrix of each text block;
step 3.2, splicing the relationship-matching matrix of the express waybill with all of its character encoding matrices to obtain the text block input encoding matrices, the splicing formula being X_i = RE_i || E_i, where RE_i is the relationship-matching vector of the i-th text block, of length N_0, E_i is the character Embeddings encoding matrix of the i-th text block, of shape [L_0, N_0], and X_i is the input encoding matrix of the i-th text block, of shape [L_0 + 1, N_0];
step 3.3, performing multi-head self-attention reasoning on the text block input encoding matrix to obtain the character feature matrix;
step 3.4, obtaining the corrected encoding matrix of each text block based on the character feature matrix.
8. The express waybill recognition and self-correction method according to claim 7, characterized in that the step 3.1 comprises:
segmenting the pre-recognized characters of the text block with a word-segmentation algorithm to obtain S([c1, c2, …, ci, …, cn]), where ci is a minimum semantic unit of the characters;
mapping the text into a one-dimensional text word vector S_0 of length L_0 by vocabulary lookup;
setting a trainable character dictionary matrix of shape [W_0, N_0] and a trainable position dictionary matrix of shape [L_0, N_0];
computing the character encoding matrix of the text block based on these two dictionary matrices.
9. An express waybill recognition system, characterized by comprising:
a camera for acquiring images;
an industrial personal computer for obtaining an image of the formatted text from the camera image, detecting text blocks, pre-recognizing the text content of the text blocks, performing relationship-matching analysis according to the pre-recognized text content and the positions of the text blocks so as to obtain the interrelationships among the text blocks and the category of each text block, reasoning over the text content of each text block according to its category and semantic analysis to realize self-correction, and taking the self-corrected text information as the recognition result of the formatted text.
10. The express waybill recognition system according to claim 9, characterized in that at least two groups of cameras are provided.
CN202211727783.4A 2022-12-30 2022-12-30 Express delivery face bill identification and self-correction method and identification system Pending CN116189196A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211727783.4A CN116189196A (en) 2022-12-30 2022-12-30 Express delivery face bill identification and self-correction method and identification system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211727783.4A CN116189196A (en) 2022-12-30 2022-12-30 Express delivery face bill identification and self-correction method and identification system

Publications (1)

Publication Number Publication Date
CN116189196A 2023-05-30

Family

ID=86435749

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211727783.4A Pending CN116189196A (en) 2022-12-30 2022-12-30 Express delivery face bill identification and self-correction method and identification system

Country Status (1)

Country Link
CN (1) CN116189196A (en)

Similar Documents

Publication Publication Date Title
CN111476067B (en) Character recognition method and device for image, electronic equipment and readable storage medium
CN109902622B (en) Character detection and identification method for boarding check information verification
CN110399798B (en) Discrete picture file information extraction system and method based on deep learning
CN110363252B (en) End-to-end trend scene character detection and identification method and system
CN113313111B (en) Text recognition method, device, equipment and medium
CN112818951A (en) Ticket identification method
CN113780087B (en) Postal package text detection method and equipment based on deep learning
CN114155527A (en) Scene text recognition method and device
CN111107107B (en) Network behavior detection method and device, computer equipment and storage medium
CN110647956A (en) Invoice information extraction method combined with two-dimensional code recognition
CN108364037A (en) Method, system and the equipment of Handwritten Chinese Character Recognition
CN105184329A (en) Cloud-platform-based off-line handwriting recognition method
CN111539414B (en) Method and system for character recognition and character correction of OCR (optical character recognition) image
CN112949455A (en) Value-added tax invoice identification system and method
CN111242829A (en) Watermark extraction method, device, equipment and storage medium
CN108090728B (en) Express information input method and system based on intelligent terminal
CN111414889B (en) Financial statement identification method and device based on character identification
CN113657377A (en) Structured recognition method for airplane ticket printing data image
CN113780276A (en) Text detection and identification method and system combined with text classification
CN106897770B (en) Method and device for establishing license plate recognition model
CN116189196A (en) Express delivery face bill identification and self-correction method and identification system
CN115953744A (en) Vehicle identification tracking method based on deep learning
CN115661904A (en) Data labeling and domain adaptation model training method, device, equipment and medium
CN115937862A (en) End-to-end container number identification method and system
CN114359931A (en) Express bill identification method and device, computer equipment and storage medium

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination