CN112818951B - Ticket identification method - Google Patents
Ticket identification method
- Publication number
- CN112818951B (application CN202110265378.4A)
- Authority
- CN
- China
- Prior art keywords
- text
- network
- recognition
- text line
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/414—Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Abstract
The invention discloses a ticket identification method relating to the technical fields of text detection, text recognition and structured information extraction, and solves the technical problem that existing models cannot effectively extract structured information. The training data of the text recognition model are expanded, and the accuracy of the recognition model is improved, by synthesizing data from high-frequency words and the rules governing the text content of specific fields among those words. Because the method is based on convolutional neural networks, it has good parallelism, and computation can be accelerated with a high-performance GPU (Graphics Processing Unit).
Description
Technical Field
The disclosure relates to the technical fields of text detection, text recognition and structured information extraction, and in particular to a ticket recognition method.
Background
Ticket identification refers to the technology of recognizing images that contain text information, such as the invoices, identity cards and bank cards commonly encountered in daily life, and extracting the structured information they contain. Because tickets involve numerous fields and their formats are complex, both recognition and structured extraction face many difficulties.
The ticket structured recognition task can be subdivided into research tasks in several fields, such as text detection and text recognition. The mainstream approach in text detection combines deep-learning object detection or segmentation algorithms with the text detection task. EAST, for example, adopts the FCN (Fully Convolutional Network) structure commonly used in semantic segmentation and, following the regression idea, regresses the text box parameters directly: feature extraction and feature fusion are completed by the FCN structure, the EAST model then predicts a group of text line regression parameters at each position in the image, and finally the text lines in the input image are extracted by non-maximum suppression. This greatly simplifies the text detection pipeline, but current methods of this kind still detect long text poorly and have weak ability to detect small text regions, and these are key problems in ticket identification.
Current methods in the text recognition field are mainly character recognition and sequence recognition. A character recognition method first segments single characters from the image, then classifies each single-character image with a classifier, and finally merges the results into a text-line-level recognition result. A sequence-based text recognition algorithm instead takes the whole text line as the minimum unit of recognition, completes recognition of the whole text sequence through automatic alignment, and can introduce the Seq2Seq model and the attention mechanism from natural language processing to improve recognition. Both approaches have their own problems: character recognition requires character-level supervision and therefore a large amount of labeling work, while the robustness of sequence-based methods is strongly affected by the training data, so images with complex backgrounds or similar characters are prone to being recognized incorrectly.
Moreover, for the ticket structured recognition task, current methods do not consider the problem of structured information extraction, and the unorganized information obtained by recognition cannot be used directly in subsequent work, so this problem needs to be studied and solved.
Disclosure of Invention
The present disclosure provides a ticket identification method that aims to establish a model capable of effectively extracting structured information, addressing the problems of varied image styles, non-uniform table formats and unclear printing found in tickets.
The technical aim of the disclosure is achieved by the following technical scheme:
A ticket recognition method comprises a model training process and a text recognition process, the model training process comprising:
s100: collecting data for text line detection and text image recognition; wherein the data comprises a text line image;
s101: collecting high-frequency words appearing in various ticket scenes, establishing a keyword database through the high-frequency words, counting rules of text contents in specific fields in the high-frequency words, and randomly generating expansion data according to the high-frequency words and the rules;
s102: training a CTPN network through the text line image to obtain a text line position detection model;
s103: training the recognition network through the data and the expansion data to obtain a text recognition model with a self-attention mechanism;
the text recognition process includes:
s200, inputting an image of the ticket into a text line position detection model, wherein the text line position detection model detects the text line position in the ticket and outputs a text image with the detected text line position;
s201: inputting the text image into a text recognition model for text recognition, recognizing the text through a self-attention mechanism of the text recognition model to obtain a recognition result, and carrying out structural extraction on the recognition result according to the keyword database to obtain effective information.
The beneficial effects of the present disclosure are as follows. A text line position detection model is obtained by training a CTPN network, so that key information in the ticket can be located, with robustness to tickets in various forms (tables and the like). The training data of the text recognition model are expanded, and the accuracy of the recognition model is improved, by synthesizing data from the high-frequency words and the rules governing the text content of specific fields among those words. Because the method is based on convolutional neural networks, it has good parallelism, and computation can be accelerated with a high-performance GPU (Graphics Processing Unit).
Drawings
FIGS. 1 and 2 are flowcharts of the model training process of a ticket identification method according to the present invention;
FIGS. 3 and 4 are flowcharts of the text recognition process of a ticket identification method according to the present invention;
FIG. 5 is a block diagram of a text recognition model;
fig. 6 is a schematic flow chart of text line positioning, text recognition and structured extraction according to an embodiment of the present invention.
Detailed Description
The technical scheme of the present disclosure will be described in detail below with reference to the accompanying drawings. In the description of the present disclosure, it should be understood that the terms "first", "second" and "third" are used for descriptive purposes only, merely to distinguish different components, and are not to be interpreted as indicating or implying relative importance or implicitly indicating the number of technical features.
Fig. 1 and fig. 2 are flowcharts of a model training process of a ticket recognition method according to the present invention, where, as shown in fig. 1 and fig. 2, the model training process includes: s100: collecting data for text line detection and text image recognition; wherein the data comprises a text line image.
Specifically, data for text line detection and text image recognition are collected: a large number of published, precisely labeled text line detection data sets and multi-language text image recognition data sets can be obtained by searching the text detection and recognition research field. Data that differ greatly from the ticket identification scene are screened out of the collected data sets, abnormal data found during labeling are removed, and the data thus organized are used to train the CTPN (Connectionist Text Proposal Network) and the recognition network.
S101: collecting high-frequency words appearing in various ticket scenes, establishing a keyword database through the high-frequency words, counting rules of text contents in specific fields in the high-frequency words, and randomly generating expansion data according to the high-frequency words and the rules.
Specifically, randomly generating the expansion data according to the high-frequency words and the rules includes: (1) combining high-frequency words whose word frequency is not smaller than a preset threshold to generate a text; (2) combining the text so that it conforms to the specific format of text in the ticket; (3) randomly selecting a blank or noisy image as a background and rendering the formatted text onto the image to obtain a text image, thereby obtaining the expansion data.
Both the collected data and the expansion data are image data, and the CTPN network and the recognition network are trained directly on features extracted from these images; a sketch of the synthesis step is given below.
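As an illustration of S101, the following is a minimal sketch of how such expansion data could be synthesized: high-frequency words above a frequency threshold are combined into a field-formatted text and rendered onto a blank or noisy background. Pillow is assumed for rendering, and the font path, image size, noise level and field format are illustrative assumptions rather than values from the patent.

```python
import random
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def synthesize_sample(word_freqs, min_freq=5, font_path="simhei.ttf"):
    # (1) keep high-frequency words whose word frequency is not below the preset threshold
    candidates = [w for w, freq in word_freqs.items() if freq >= min_freq]
    # (2) combine them into a text that follows a ticket-like field format (illustrative)
    text = f"{random.choice(candidates)}: {random.randint(0, 9999):04d}"
    # (3) render the formatted text onto a blank or noisy background image
    background = np.full((48, 320, 3), 255, dtype=np.int16)
    if random.random() < 0.5:
        background = background - np.random.randint(0, 40, background.shape)  # add noise
    image = Image.fromarray(np.clip(background, 0, 255).astype(np.uint8))
    ImageDraw.Draw(image).text((8, 8), text, fill=(0, 0, 0),
                               font=ImageFont.truetype(font_path, 28))
    return image, text  # the rendered image and its label for recognition training
```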
S102: and training the CTPN network through the text line image to obtain a text line position detection model.
S103: training the recognition network through the data and the expansion data to obtain a text recognition model.
Fig. 3 and fig. 4 are flowcharts of a text recognition process according to a ticket recognition method of the present invention, and as shown in fig. 3 and fig. 4, the text recognition process includes: s200, inputting the image of the ticket into a text line position detection model, wherein the text line position detection model detects the text line position in the ticket, and outputting the text image with the detected text line position.
S201: inputting the text image into a text recognition model for text recognition, recognizing the text through a self-attention mechanism of the text recognition model to obtain a recognition result, and carrying out structural extraction on the recognition result according to the keyword database to obtain effective information.
Specifically, performing the structured extraction to obtain the effective information includes: calculating the edit distance between each keyword and the recognition results, generating an edit distance matrix, matching each keyword with the recognition result having the minimum edit distance, and determining the position of the keyword in the recognition results according to the matched result, thereby obtaining the effective information. When a keyword cannot be matched to any recognition result, a default value is returned; that is, because the recognition rate is not 100%, a default value is returned when no recognition result can be paired with the keyword by minimum edit distance. Matching the output of the deep neural network by minimum edit distance to obtain the keyword information effectively improves the reliability of the result.
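As a concrete illustration of this matching step, the sketch below computes one row of the edit-distance matrix per keyword and pairs each keyword with the closest recognition result, returning a default value when nothing is close enough; the distance threshold and the example strings are assumptions for demonstration only.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, computed with a rolling 1-D DP row."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # delete ca
                                     dp[j - 1] + 1,      # insert cb
                                     prev + (ca != cb))  # substitute ca with cb
    return dp[-1]

def extract_structured(keywords, recognized_lines, max_distance=2, default="N/A"):
    """Pair each keyword with the recognized line of minimum edit distance."""
    result = {}
    for kw in keywords:
        row = [edit_distance(kw, line) for line in recognized_lines]  # one matrix row
        best = min(range(len(row)), key=row.__getitem__)
        # fall back to a default value when no recognized line matches the keyword well
        result[kw] = recognized_lines[best] if row[best] <= max_distance else default
    return result

print(extract_structured(["Invoice No", "Amount", "Date"],
                         ["Invo1ce No", "Arnount", "Payee"]))
# -> {'Invoice No': 'Invo1ce No', 'Amount': 'Arnount', 'Date': 'N/A'}
```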
As a specific embodiment, step S102 includes:
s102-1: the CTPN network comprises a convolutional neural network, an LSTM (Long Short-Term Memory) network and a 1×1 convolutional layer connected in sequence; each text line comprises at least two text line components, and a plurality of preset anchor boxes with a fixed width of 16 and different heights are preset in the convolutional neural network for positioning the text line components.
S102-2: the initial learning rate of the CTPN network training is 0.001, the momentum is 0.9, and the text line images are put into the CTPN network for training.
In the forward propagation of the CTPN network, feature extraction is first performed on the input text line image by the convolutional neural network (such as VGG16) to obtain a first feature map of size N×C×H×W; a 3×3 convolution is then applied to the first feature map at the position corresponding to each preset anchor box to obtain a second feature map of size N×9C×H×W; the dimensions of the second feature map are converted to (N·H)×W×9C and the reshaped map is sent into the LSTM network to learn the sequence features of each row, giving a third feature map of size (N·H)×W×256; the dimensions of the third feature map are converted to N×512×H×W, and finally the third feature map is input into the 1×1 convolution layer to obtain the prediction result. Here N denotes the number of text line images processed at a time, H the height of the text line images, W their width, and C the number of channels they have during forward propagation of the network.
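The following PyTorch-style sketch mirrors the reshaping described above under simplifying assumptions: the backbone is VGG16 (512-channel features), the 3×3 step is implemented as an ordinary convolution rather than the 9C channel expansion, and a bidirectional LSTM with 128 hidden units gives the 256-dimensional row features. It is an illustration of the data flow, not the patented implementation.

```python
import torch
import torch.nn as nn
import torchvision

class CTPNSketch(nn.Module):
    def __init__(self, num_anchors=10, hidden=128):
        super().__init__()
        vgg = torchvision.models.vgg16(weights=None)
        self.backbone = vgg.features[:-1]                  # first feature map: N x 512 x H x W
        self.rpn_conv = nn.Conv2d(512, 512, 3, padding=1)  # 3x3 conv over each anchor position
        self.lstm = nn.LSTM(512, hidden, bidirectional=True, batch_first=True)
        self.reduce = nn.Conv2d(2 * hidden, 512, 1)        # bring row features back to 512 channels
        self.head = nn.Conv2d(512, num_anchors * 4, 1)     # per anchor: vertical coords, confidence, side offset

    def forward(self, x):                                  # x: N x 3 x H0 x W0
        f = self.rpn_conv(self.backbone(x))                # N x 512 x H x W
        n, c, h, w = f.shape
        rows = f.permute(0, 2, 3, 1).reshape(n * h, w, c)  # (N*H) x W x C: one sequence per image row
        rows, _ = self.lstm(rows)                          # (N*H) x W x 256
        f = rows.reshape(n, h, w, -1).permute(0, 3, 1, 2)  # N x 256 x H x W
        f = self.reduce(f)                                 # N x 512 x H x W
        return self.head(f)                                # 1x1 conv -> per-anchor predictions
```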
S102-3: after the prediction result is obtained, calculating the loss of the CTPN network according to a first loss function, updating the parameter of the CTPN network by using an optimizer SGD (stochastic gradient descent, random gradient descent), putting the text line image into the CTPN network after updating the parameter for training, and repeating the process repeatedly until an optimal prediction result is obtained, and storing the optimal model parameter corresponding to the optimal prediction result to obtain the text line position detection model;
wherein the first loss function is: Loss = λ_v×L_v + λ_conf×L_conf + λ_x×L_x, where L_v denotes the vertical-coordinate loss, namely the Smooth L1 loss between the center-point coordinate and height of the preset anchor box and the center-point coordinate and height of the actual anchor box; L_conf denotes the confidence loss, namely the binary cross-entropy loss between the preset anchor box confidence and whether the actual anchor box contains a text line component; L_x denotes the horizontal-coordinate offset loss, namely the Smooth L1 loss between the predicted and actual offset values of the horizontal coordinate and width of the text line in the anchor box; and λ_v, λ_conf, λ_x are the weights;
the output of each preset anchor box position for a text line component comprises v_j, v_h, s_i and x_side, where v_j and v_h denote the center-point coordinate and height of the preset anchor box, s_i denotes the confidence that the preset anchor box contains a text line component, and x_side denotes the offset values of the horizontal coordinate and width of the text line component.
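A hedged sketch of this weighted loss follows, assuming PyTorch and assuming the predictions and targets have already been gathered per matched anchor; the weight values are placeholders.

```python
import torch
import torch.nn.functional as F

def ctpn_loss(pred_v, gt_v, pred_conf, gt_is_text, pred_x, gt_x,
              lam_v=1.0, lam_conf=1.0, lam_x=1.0):
    """Loss = lam_v*L_v + lam_conf*L_conf + lam_x*L_x over already-matched anchors."""
    l_v = F.smooth_l1_loss(pred_v, gt_v)                     # L_v: center-y and height regression
    l_conf = F.binary_cross_entropy_with_logits(             # L_conf: text / non-text classification
        pred_conf, gt_is_text.float())
    l_x = F.smooth_l1_loss(pred_x, gt_x)                     # L_x: horizontal (side) offset regression
    return lam_v * l_v + lam_conf * l_conf + lam_x * l_x
```

The optimizer settings given in S102-2 would then drive the parameter updates, e.g. torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9).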
As a specific embodiment, step S103 includes:
s103-1: the recognition network comprises a feature extraction network, a feature fusion network, an encoding network, a single fully-connected layer and a decoding algorithm connected in sequence, as shown in fig. 5.
S103-2: the initial learning rate of the identification network is 0.0001, the beta value of the optimizer Adam is (0.9,0.999), and the data and the expansion data are put into the identification network for training;
in the forward propagation process of the identification network, carrying out feature extraction on an image with the size of H multiplied by W through the feature extraction network to obtain a first feature;
fusing the first features through the feature fusion network, and sampling the fused first features to enable the height of the fused first features to be 1, so as to obtain second features;
inputting the second characteristic into the coding network to code so as to obtain a coding characteristic;
inputting the coding features into the full-connection layer for decoding to obtain a decoding result;
finally, aligning the decoding result through the decoding algorithm to obtain an identification result;
wherein the feature extraction network is a ResNet50 network, the feature fusion network is an FPEM (Feature Pyramid Enhancement Module) network, the encoding network is an Encoder network, and the decoding algorithm is a CTC (Connectionist Temporal Classification) algorithm whose loss function is L_CTC = -log Σ_{C: k(C)=Y'} Π_t p(c_t | Y), where Y denotes the decoding result, Y' the correctly labeled recognition result, t the sequence length of the encoded feature, k the alignment function of the CTC network, C: k(C)=Y' the set of all sequences C that the CTC algorithm aligns to the correctly labeled recognition result Y', p a probability, and p(c_t | Y) the probability of obtaining element c_t of a sequence c of length t given Y.
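The following PyTorch-style sketch traces this pipeline under simplifying assumptions: ResNet50 features, a 1×1 convolution standing in for the FPEM fusion stage, mean pooling of the feature height to 1, a small self-attention encoder in place of the full Encoder network, and a fully-connected layer producing per-timestep class scores for CTC. Channel sizes and layer counts are illustrative, not taken from the patent.

```python
import torch
import torch.nn as nn
import torchvision

class RecognizerSketch(nn.Module):
    def __init__(self, num_classes, d_model=256):
        super().__init__()
        resnet = torchvision.models.resnet50(weights=None)
        self.features = nn.Sequential(*list(resnet.children())[:-2])  # N x 2048 x h x w
        self.fuse = nn.Conv2d(2048, d_model, 1)      # placeholder for the FPEM fusion stage
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)  # self-attention encoder
        self.fc = nn.Linear(d_model, num_classes)    # per-timestep class scores (incl. CTC blank)

    def forward(self, x):                            # x: N x 3 x 32 x W
        f = self.features(x)                         # N x 2048 x h x w
        f = self.fuse(f)                             # N x d_model x h x w
        f = f.mean(dim=2)                            # pool the height to 1 -> N x d_model x w
        f = f.permute(0, 2, 1)                       # N x w x d_model (sequence along the width)
        f = self.encoder(f)                          # self-attention encoding
        return self.fc(f)                            # N x w x num_classes, decoded by CTC
```

For training, torch.nn.CTCLoss could play the role of the CTC loss function given above, and the Adam settings of S103-2 correspond to torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999)).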
The ResNet50 network is a residual network used to extract visual features of the image. The FPEM network is a convolutional network that fuses multi-stage visual features of the image; fusing multi-stage features enlarges the receptive field of the model and thereby improves its accuracy. The Encoder network is a feature encoding network based on a self-attention mechanism; adopting self-attention lets the model extract the effective information in the features more accurately, improving the robustness of the text recognition model. The CTC algorithm is a decoding algorithm for the output sequence; for example, the output sequence "cccaaat" becomes "cat" after alignment by the CTC algorithm.
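As a concrete illustration of that alignment step, the following is a sketch of CTC greedy decoding: the most likely symbol at each timestep is kept, consecutive repeats are collapsed, and the blank symbol is removed. The blank character used here is an assumption for demonstration.

```python
def ctc_greedy_decode(timestep_labels, blank="-"):
    decoded = []
    prev = None
    for ch in timestep_labels:
        if ch != prev and ch != blank:   # collapse repeats, then drop the blank symbol
            decoded.append(ch)
        prev = ch
    return "".join(decoded)

print(ctc_greedy_decode(list("cccaaat")))    # -> "cat"
print(ctc_greedy_decode(list("cc-aa--tt")))  # blanks separate repeated characters -> "cat"
```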
The Encoder network is the encoder part of the Transformer model widely used in the fields of natural language processing and computer vision; this part of the model benefits from the excellent feature-capturing performance of its stackable encoding modules, each of which comprises a Multi-Head Attention part and a Feed Forward part, with the Multi-Head Attention part expressed as follows:
Multi-Head Attention(x) = x + Self-Attention(FC(x), FC(x), FC(x));
Self-Attention(Q, K, V) = softmax(Q·K^T / √d_k)·V;
wherein the input of the Encoder is passed through three fully-connected layers FC and the results are input into the Self-Attention module as Q, K and V, d_k is the dimension of the input, and T denotes the matrix transpose; the Feed Forward part consists of one fully-connected layer FC, one ReLU activation function and one fully-connected layer FC.
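A minimal sketch of one such encoding module, assuming PyTorch: three fully-connected projections produce Q, K and V, scaled dot-product self-attention is added back to the input as a residual, and an FC-ReLU-FC feed-forward part follows. Multi-head splitting and layer normalization are omitted, and the residual connection around the feed-forward part is an assumption borrowed from the standard Transformer rather than stated above.

```python
import math
import torch
import torch.nn as nn

class EncoderBlockSketch(nn.Module):
    def __init__(self, d_model=256, d_ff=1024):
        super().__init__()
        self.fc_q = nn.Linear(d_model, d_model)
        self.fc_k = nn.Linear(d_model, d_model)
        self.fc_v = nn.Linear(d_model, d_model)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))

    def forward(self, x):                      # x: N x T x d_model
        q, k, v = self.fc_q(x), self.fc_k(x), self.fc_v(x)
        # Self-Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
        attn = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(q.size(-1)), dim=-1)
        x = x + attn @ v                       # residual connection around the attention
        return x + self.ff(x)                  # feed-forward part (second residual is assumed)
```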
S103-3: after the identification result is obtained, the loss of the identification network is calculated through the loss function of the CTC algorithm, the parameters of the identification network are updated by using an optimizer Adam, the data and the expanded data are input into the identification network after the parameters are updated for training, the process is repeated repeatedly until the optimal identification result is obtained, and the optimal model parameters corresponding to the optimal identification result are stored, so that the text identification model is obtained.
Fig. 6 is a schematic flow chart of text line positioning, text recognition and structured extraction provided by an embodiment of the invention: a single ticket image is input into the text line position detection model (CTPN model) loaded with the optimal parameters to obtain a text line detection result, and redundant text boxes are filtered out with a confidence threshold, giving the text positioning boxes of the key areas on the image.
When recognizing text line content, the height of the text line image is generally adjusted to 32 pixels before it is sent to the text recognition model for recognition, which specifically comprises: (1) the text line image is scaled with its original aspect ratio maintained, so that the scaled image height is h' = 32 and the scaled image width is w' = w × (h'/h), where w and h are the original image width and height; (2) the single image is input into the text recognition model loaded with the optimal parameters to obtain a recognition vector; (3) the recognition vector is processed by the CTC decoding algorithm to obtain the text sequence with the highest confidence.
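A small sketch of step (1), assuming OpenCV for the image handling (the interpolation mode is an assumption):

```python
import cv2

def resize_text_line(image, target_height=32):
    h, w = image.shape[:2]
    new_w = max(1, int(round(w * target_height / h)))   # w' = w * (h'/h), aspect ratio preserved
    return cv2.resize(image, (new_w, target_height), interpolation=cv2.INTER_LINEAR)
```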
The structured extraction to obtain the effective information then comprises: (1) calculating the edit distance between each keyword and the text recognition results, where a larger edit distance means a lower degree of matching; (2) generating an edit distance matrix and finding, for each keyword, the pairing with the minimum edit distance; (3) determining the position of the keyword in the recognition result according to the pairing and obtaining the text content. Finally, the located key information is extracted, organized into structured data according to its corresponding type and output; if no located key information can be matched to a keyword, a statistically obtained default value is used to fill it in.
The foregoing is an exemplary embodiment of the disclosure, the scope of which is defined by the claims and their equivalents.
Claims (3)
1. A method of ticket recognition, characterized by a model training process and a text recognition process, the model training process comprising:
s100: collecting data for text line detection and text image recognition; wherein the data comprises a text line image;
s101: collecting high-frequency words appearing in various ticket scenes, establishing a keyword database through the high-frequency words, counting rules of field text contents in the high-frequency words, and randomly generating expansion data according to the high-frequency words and the rules;
s102: training a CTPN network through the text line image to obtain a text line position detection model;
s103: training the recognition network through the data and the expansion data to obtain a text recognition model with a self-attention mechanism;
the text recognition process includes:
s200, inputting an image of the ticket into a text line position detection model, wherein the text line position detection model detects the text line position in the ticket and outputs a text image with the detected text line position;
s201: inputting the text image into a text recognition model for text recognition, recognizing the text through a self-attention mechanism of the text recognition model to obtain a recognition result, and carrying out structural extraction on the recognition result according to the keyword database to obtain effective information;
in the step S101, generating the extended data randomly according to the high-frequency word and the rule includes:
combining the high-frequency words with word frequency not smaller than a preset threshold value to generate a text;
combining the text into a specific format conforming to the text in the ticket;
randomly selecting blank or noisy images as a background, and rendering the text conforming to a specific format onto the images to obtain images of the text, namely obtaining the expansion data;
the step S102 includes:
s102-1: the CTPN network comprises a convolutional neural network, an LSTM network and a 1×1 convolutional layer which are connected in sequence; each text line comprises at least two text line components, and a plurality of preset anchor boxes with a fixed width of 16 and different heights are preset in the convolutional neural network and used for positioning the text line components;
s102-2: the initial learning rate of the CTPN network training is 0.001, the momentum is 0.9, and the text line images are put into the CTPN network for training;
in the forward propagation process of the CTPN network, firstly, feature extraction is carried out on an input text line image through the convolutional neural network to obtain a first feature map with the size of N×C×H×W; then the first feature map is convolved with a 3×3 convolution at the position corresponding to each preset anchor box to obtain a second feature map with the size of N×9C×H×W; the dimensions of the second feature map are then converted into (N·H)×W×9C, the second feature map with dimensions (N·H)×W×9C is sent into the LSTM network to learn the sequence feature of each line in the second feature map, and a third feature map with output size (N·H)×W×256 is obtained; the dimensions of the third feature map are converted into N×512×H×W, and finally the third feature map with dimensions N×512×H×W is input into the convolution layer for convolution to obtain a prediction result; wherein N represents the number of text line images processed each time, H represents the height of the text line images, W represents the width of the text line images, and C represents the number of channels of the text line images in forward propagation of the network;
s102-3: after the prediction result is obtained, calculating the loss of the CTPN network according to a first loss function, updating the parameter of the CTPN network by using an optimizer SGD, putting the text line image into the CTPN network after updating the parameter for training, repeating the process repeatedly until an optimal prediction result is obtained, and storing the optimal model parameter corresponding to the optimal prediction result to obtain the text line position detection model;
wherein the first loss function is: Loss = λ_v×L_v + λ_conf×L_conf + λ_x×L_x, wherein L_v represents the vertical-coordinate loss, namely the Smooth L1 loss between the center-point coordinate and height of the preset anchor box and the center-point coordinate and height of the actual anchor box; L_conf represents the confidence loss, namely the binary cross-entropy loss between the preset anchor box confidence and whether the actual anchor box contains a text line component; L_x represents the horizontal-coordinate offset loss, namely the Smooth L1 loss between the offset values of the horizontal coordinate and width of the text line in the predicted anchor box and the offset values of the horizontal coordinate and width of the text line in the actual anchor box; λ_v, λ_conf, λ_x represent the weights;
the output result of the text line component at each preset anchor box position comprises: v_j, v_h, s_i, x_side, wherein v_j, v_h represent the center-point coordinate and height of the preset anchor box, s_i represents the confidence that the preset anchor box contains a text line component, and x_side represents the offset values of the horizontal coordinate and width of the text line component;
the step S103 includes:
s103-1: the identification network comprises a feature extraction network, a feature fusion network, a coding network, a full-connection layer of one layer and a decoding algorithm which are connected in sequence;
s103-2: the initial learning rate of the identification network is 0.0001, the beta value of the optimizer Adam is (0.9,0.999), and the data and the expansion data are put into the identification network for training;
in the forward propagation process of the identification network, carrying out feature extraction on an image with the size of H multiplied by W through the feature extraction network to obtain a first feature;
fusing the first features through the feature fusion network, and sampling the fused first features to enable the height of the fused first features to be 1, so as to obtain second features;
inputting the second characteristic into the coding network to code so as to obtain a coding characteristic;
inputting the coding features into the full-connection layer for decoding to obtain a decoding result;
finally, aligning the decoding result through the decoding algorithm to obtain an identification result;
wherein the feature extraction network is a ResNet50 network, the feature fusion network is an FPEM network, the encoding network is an Encoder network, the decoding algorithm is a CTC algorithm, and the loss function of the CTC algorithm is L_CTC = -log Σ_{C: k(C)=Y'} Π_t p(c_t | Y), wherein Y represents the decoding result, Y' represents the correctly labeled recognition result, t represents the sequence length of the encoded feature, k represents the alignment function of the CTC network, C: k(C)=Y' represents the set of all sequences C that the CTC algorithm aligns to the correctly labeled recognition result Y', p represents probability, and p(c_t | Y) represents the probability of obtaining element c_t of a sequence c of length t given Y;
s103-3: after the identification result is obtained, the loss of the identification network is calculated through the loss function of the CTC algorithm, the parameters of the identification network are updated by using an optimizer Adam, the data and the expanded data are input into the identification network after the parameters are updated for training, the process is repeated repeatedly until the optimal identification result is obtained, and the optimal model parameters corresponding to the optimal identification result are stored, so that the text identification model is obtained.
2. A method for identifying a ticket as claimed in claim 1, wherein in step S201, the performing a structured extraction to obtain valid information includes:
calculating the editing distance between each keyword and the recognition result, generating an editing distance matrix, matching the pairing recognition result with the minimum editing distance for each keyword, and determining the position of the keyword in the recognition result according to the pairing recognition result to obtain the effective information;
and returning to a default value when the keyword is not matched with the pairing identification result.
3. A ticket recognition method according to claim 2, wherein in step S201, when recognizing a text line image by the text recognition model, the text line image is adjusted to 32 pixels in height and then sent to the text recognition model for recognition.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110265378.4A CN112818951B (en) | 2021-03-11 | 2021-03-11 | Ticket identification method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110265378.4A CN112818951B (en) | 2021-03-11 | 2021-03-11 | Ticket identification method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112818951A (en) | 2021-05-18
CN112818951B (en) | 2023-11-21
Family
ID=75863141
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110265378.4A Active CN112818951B (en) | 2021-03-11 | 2021-03-11 | Ticket identification method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112818951B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113255645B (en) * | 2021-05-21 | 2024-04-23 | 北京有竹居网络技术有限公司 | Text line picture decoding method, device and equipment |
CN113255646B (en) * | 2021-06-02 | 2022-10-18 | 北京理工大学 | Real-time scene text detection method |
CN113298179B (en) * | 2021-06-15 | 2024-05-28 | 南京大学 | Customs commodity abnormal price detection method and device |
CN113657377B (en) * | 2021-07-22 | 2023-11-14 | 西南财经大学 | Structured recognition method for mechanical bill image |
CN113591772B (en) * | 2021-08-10 | 2024-01-19 | 上海杉互健康科技有限公司 | Method, system, equipment and storage medium for structured identification and input of medical information |
CN115019327B (en) * | 2022-06-28 | 2024-03-08 | 珠海金智维信息科技有限公司 | Fragment bill recognition method and system based on fragment bill segmentation and Transformer network |
CN115713777A (en) * | 2023-01-06 | 2023-02-24 | 山东科技大学 | Contract document content identification method |
CN116912852B (en) * | 2023-07-25 | 2024-10-01 | 京东方科技集团股份有限公司 | Method, device and storage medium for identifying text of business card |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108446621A (en) * | 2018-03-14 | 2018-08-24 | 平安科技(深圳)有限公司 | Bank slip recognition method, server and computer readable storage medium |
CN108921166A (en) * | 2018-06-22 | 2018-11-30 | 深源恒际科技有限公司 | Medical bill class text detection recognition method and system based on deep neural network |
CN110097049A (en) * | 2019-04-03 | 2019-08-06 | 中国科学院计算技术研究所 | A kind of natural scene Method for text detection and system |
CN110263694A (en) * | 2019-06-13 | 2019-09-20 | 泰康保险集团股份有限公司 | A kind of bank slip recognition method and device |
CN110399845A (en) * | 2019-07-29 | 2019-11-01 | 上海海事大学 | Continuously at section text detection and recognition methods in a kind of image |
CN110807455A (en) * | 2019-09-19 | 2020-02-18 | 平安科技(深圳)有限公司 | Bill detection method, device and equipment based on deep learning and storage medium |
CN110866495A (en) * | 2019-11-14 | 2020-03-06 | 杭州睿琪软件有限公司 | Bill image recognition method, bill image recognition device, bill image recognition equipment, training method and storage medium |
CN111340034A (en) * | 2020-03-23 | 2020-06-26 | 深圳智能思创科技有限公司 | Text detection and identification method and system for natural scene |
CN111832423A (en) * | 2020-06-19 | 2020-10-27 | 北京邮电大学 | Bill information identification method, device and system |
CN112115934A (en) * | 2020-09-16 | 2020-12-22 | 四川长虹电器股份有限公司 | Bill image text detection method based on deep learning example segmentation |
- 2021-03-11: application CN202110265378.4A (CN) granted as patent CN112818951B (en), status Active
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108446621A (en) * | 2018-03-14 | 2018-08-24 | 平安科技(深圳)有限公司 | Bank slip recognition method, server and computer readable storage medium |
WO2019174130A1 (en) * | 2018-03-14 | 2019-09-19 | 平安科技(深圳)有限公司 | Bill recognition method, server, and computer readable storage medium |
CN108921166A (en) * | 2018-06-22 | 2018-11-30 | 深源恒际科技有限公司 | Medical bill class text detection recognition method and system based on deep neural network |
CN110097049A (en) * | 2019-04-03 | 2019-08-06 | 中国科学院计算技术研究所 | A kind of natural scene Method for text detection and system |
CN110263694A (en) * | 2019-06-13 | 2019-09-20 | 泰康保险集团股份有限公司 | A kind of bank slip recognition method and device |
CN110399845A (en) * | 2019-07-29 | 2019-11-01 | 上海海事大学 | Continuously at section text detection and recognition methods in a kind of image |
CN110807455A (en) * | 2019-09-19 | 2020-02-18 | 平安科技(深圳)有限公司 | Bill detection method, device and equipment based on deep learning and storage medium |
CN110866495A (en) * | 2019-11-14 | 2020-03-06 | 杭州睿琪软件有限公司 | Bill image recognition method, bill image recognition device, bill image recognition equipment, training method and storage medium |
CN111340034A (en) * | 2020-03-23 | 2020-06-26 | 深圳智能思创科技有限公司 | Text detection and identification method and system for natural scene |
CN111832423A (en) * | 2020-06-19 | 2020-10-27 | 北京邮电大学 | Bill information identification method, device and system |
CN112115934A (en) * | 2020-09-16 | 2020-12-22 | 四川长虹电器股份有限公司 | Bill image text detection method based on deep learning example segmentation |
Non-Patent Citations (5)
Title |
---|
Financial Ticket Intelligent Recognition System Based on Deep Learning; Fukang Tian et al.; arXiv; 1-15 *
Ticket Text Detection and Recognition Based on Deep Learning; Xiuxin Chen et al.; 2019 Chinese Automation Congress; 1-5 *
A Survey of Natural Scene Text Detection and Recognition Based on Deep Learning; 王建新, 王子亚, 田萱; Journal of Software; Vol. 31, No. 05; 1465-1496 *
Design and Implementation of Table-Type Work Order Recognition Based on Deep Learning; 潘炜, 刘丰威; Digital Technology & Application; Vol. 38, No. 07; 150-152 *
Scene Text Detection Model Based on High-Resolution Convolutional Neural Network; 陈淼妙, 续晋华; Computer Applications and Software; Vol. 37, No. 10; 138-144 *
Also Published As
Publication number | Publication date |
---|---|
CN112818951A (en) | 2021-05-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112818951B (en) | Ticket identification method | |
CN108898138A (en) | Scene text recognition methods based on deep learning | |
CN111027562A (en) | Optical character recognition method based on multi-scale CNN and RNN combined with attention mechanism | |
CN114155527A (en) | Scene text recognition method and device | |
CN112686219B (en) | Handwritten text recognition method and computer storage medium | |
CN114092930B (en) | Character recognition method and system | |
CN112926379A (en) | Method and device for constructing face recognition model | |
CN114067300A (en) | End-to-end license plate correction and identification method | |
Tang et al. | HRCenterNet: An anchorless approach to Chinese character segmentation in historical documents | |
CN113159071B (en) | Cross-modal image-text association anomaly detection method | |
CN117079288B (en) | Method and model for extracting key information for recognizing Chinese semantics in scene | |
CN114581956A (en) | Multi-branch fine-grained feature fusion pedestrian re-identification method | |
Elaraby et al. | A Novel Siamese Network for Few/Zero-Shot Handwritten Character Recognition Tasks. | |
Zuo et al. | An intelligent knowledge extraction framework for recognizing identification information from real-world ID card images | |
CN111832497B (en) | Text detection post-processing method based on geometric features | |
KR20200068073A (en) | Improvement of Character Recognition for Parts Book Using Pre-processing of Deep Learning | |
Karanje et al. | Survey on text detection, segmentation and recognition from a natural scene images | |
CN111242114B (en) | Character recognition method and device | |
Goel et al. | Text extraction from natural scene images using OpenCV and CNN | |
CN113903043B (en) | Method for identifying printed Chinese character font based on twin metric model | |
CN111178409B (en) | Image matching and recognition system based on big data matrix stability analysis | |
CN114399768A (en) | Workpiece product serial number identification method, device and system based on Tesseract-OCR engine | |
Sahu et al. | A survey on handwritten character recognition | |
CN116311275B (en) | Text recognition method and system based on seq2seq language model | |
Sharma | Recovery of drawing order in handwritten digit images |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |