CN112818951A - Ticket identification method - Google Patents
- Publication number
- CN112818951A (application number CN202110265378.4A)
- Authority
- CN
- China
- Prior art keywords
- text
- network
- recognition
- text line
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06V30/414—Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
- G06F40/216—Parsing using statistical methods
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/045—Combinations of networks
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
- G06N3/084—Backpropagation, e.g. using gradient descent
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
- G06V30/10—Character recognition
Abstract
The invention discloses a ticket identification method, relating to the technical fields of text detection, text recognition and structured information extraction, and solving the technical problem that existing models cannot effectively extract structured information. Data are synthesized from high-frequency words and the rules of specific-field text content, expanding the training data of the text recognition model and improving its accuracy. Being based on convolutional neural networks, the method parallelizes well and can be accelerated with a high-performance GPU (Graphics Processing Unit).
Description
Technical Field
The disclosure relates to the technical fields of text detection, text recognition and structured information extraction, and in particular to a ticket recognition method.
Background
Ticket identification refers to the technology of recognizing images that contain text information from different fields, such as the invoices, identity cards and bank cards common in daily life, and extracting the structured information in them. Because tickets come from many different fields, their formats are complicated, which brings numerous difficulties to identification and structured extraction.
The ticket structured recognition task can be subdivided into research tasks in several fields, such as text detection and text recognition. The mainstream approach in text detection combines an object detection or segmentation algorithm from deep learning with the text detection task. EAST, for example, adopts the FCN (Fully Convolutional Network) structure commonly used in semantic segmentation: it regresses text box parameters based on a regression idea, completes feature extraction and feature fusion by means of the FCN architecture, predicts a group of text line regression parameters at each position in the image, and finally extracts the text lines in the input image by non-maximum suppression. This greatly simplifies the character detection pipeline, but similar methods still detect long texts poorly and have weak detection capability for small text areas, and these are among the more critical problems in ticket identification.
Current methods in text recognition fall mainly into character recognition and sequence recognition. In character recognition, single characters are first segmented from the image, the single-character images are then classified by a classifier, and the results are finally combined into text-line-level recognition results. Text recognition algorithms based on sequence recognition take the whole text line as the minimum unit of recognition and recognize the whole text sequence through automatic alignment, often introducing the Seq2Seq model and the attention mechanism from natural language processing to improve accuracy. Both approaches have problems, however: character recognition needs character-level supervision information and therefore a large amount of labeling work, while the robustness of sequence recognition is strongly affected by the training data, and recognition errors easily occur for images with complicated backgrounds and for similar characters.
Moreover, for the task of ticket structured identification, current methods do not consider the problem of structuring the extracted information, so the disordered information they produce cannot be used directly in subsequent work. The above problems therefore remain to be researched and solved.
Disclosure of Invention
The disclosure provides a ticket identification method that aims to establish a model capable of effectively extracting structured information, addressing problems in tickets such as inconsistent image styles, inconsistent form formats and unclear printing.
The technical purpose of the present disclosure is achieved by the following technical solutions:
A ticket recognition method comprises a model training process and a text recognition process, the model training process comprising:
s100: collecting data for text line detection and text image recognition; wherein the data comprises a text line image;
s101: collecting high-frequency words appearing in various ticket scenes, establishing a keyword database through the high-frequency words, counting rules of specific field text contents in the high-frequency words, and randomly generating expansion data according to the high-frequency words and the rules;
s102: training the CTPN network through the text line image to obtain a text line position detection model;
s103: training a recognition network through the data and the expansion data to obtain a text recognition model with a self-attention mechanism;
the text recognition process includes:
s200, inputting the image of the ticket into a text line position detection model, detecting the text line position in the ticket by the text line position detection model, and outputting the text image of which the text line position is detected;
s201: and inputting the text image into a text recognition model for text recognition, recognizing the text through a self-attention mechanism of the text recognition model to obtain a recognition result, and performing structured extraction on the recognition result according to the keyword database to obtain effective information.
The beneficial effects of this disclosure are as follows: the text line position detection model is obtained by training the CTPN network, so key information in the ticket can be located, with robustness to tickets of various forms (tables and the like); data are synthesized from high-frequency words and the rules of specific-field text content, expanding the training data of the text recognition model and improving its accuracy; and being based on convolutional neural networks, the method parallelizes well and can be accelerated with a high-performance GPU (Graphics Processing Unit).
Drawings
FIGS. 1 and 2 are flow charts of model training processes of a method for ticket identification according to the present invention;
FIGS. 3 and 4 are flow charts of text recognition process of a ticket recognition method according to the present invention;
FIG. 5 is a block diagram of a text recognition model;
fig. 6 is a schematic flow chart of text line positioning, text recognition, and structured extraction according to an embodiment of the present invention.
Detailed Description
The technical scheme of the disclosure will be described in detail with reference to the accompanying drawings. In the description of the present disclosure, it should be understood that the terms "first", "second" and "third" are used for descriptive purposes only; they are not to be construed as indicating or implying relative importance or an implicit number of technical features, but merely distinguish different components.
Fig. 1 and 2 are flowcharts of a model training process of a method for ticket identification according to the present invention, and as shown in fig. 1 and 2, the model training process includes: s100: collecting data for text line detection and text image recognition; wherein the data comprises a text line image.
Specifically, when collecting data for text line detection and text image recognition, a large number of public, precisely labeled text line detection datasets and multilingual text image recognition datasets can be obtained by searching the field of text detection and recognition research. Data that differs greatly from the ticket identification scene is screened out of the collected datasets, abnormal samples are marked and removed, and the sorted data is used for training the CTPN (Connectionist Text Proposal Network) network and the recognition network.
S101: collecting high-frequency words appearing in various ticket scenes, establishing a keyword database through the high-frequency words, counting rules of specific field text contents in the high-frequency words, and randomly generating expansion data according to the high-frequency words and the rules.
Specifically, randomly generating the expansion data according to the high-frequency words and the rules includes: (1) combining high-frequency words whose word frequency is not less than a preset threshold to generate text; (2) assembling the text into a specific format that conforms to the text in the ticket; (3) randomly selecting a blank or noisy image as the background and rendering the formatted text onto it, the resulting text image being the expansion data.
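As a rough illustration, steps (1) and (2) above can be sketched in Python. The function name, the pattern strings and the frequency threshold are hypothetical, and step (3), rendering the text onto a background image, is left as a comment since it would require an imaging library:

```python
import random

def synthesize_text_samples(keyword_freq, field_patterns, threshold=5, n=3, seed=0):
    """Sketch of steps (1)-(2): combine high-frequency words and fit them
    into ticket-style field formats. Step (3), rendering onto a blank or
    noisy background image, is omitted here (it would use e.g. PIL)."""
    rng = random.Random(seed)
    # (1) keep only words whose frequency meets the preset threshold
    frequent = [w for w, f in keyword_freq.items() if f >= threshold]
    samples = []
    for _ in range(n):
        word = rng.choice(frequent)
        pattern = rng.choice(field_patterns)
        # (2) assemble text that conforms to a ticket field format
        samples.append(pattern.format(word))
    return samples
```

For example, `synthesize_text_samples({"Invoice No.": 12, "Date": 9, "rare": 1}, ["{}: X"])` would produce strings such as "Date: X" while never using the low-frequency word.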
The data and the expansion data are both image data; the CTPN network and the recognition network are trained directly on features extracted from these images.
S102: and training the CTPN network through the text line image to obtain a text line position detection model.
S103: and training the recognition network through the data and the expansion data to obtain a text recognition model.
Fig. 3 and 4 are flowcharts of a text recognition process of a ticket recognition method according to the present invention, and as shown in fig. 3 and 4, the text recognition process includes: s200, inputting the image of the ticket into a text line position detection model, detecting the text line position in the ticket by the text line position detection model, and outputting the text image of which the text line position is detected.
S201: and inputting the text image into a text recognition model for text recognition, recognizing the text through a self-attention mechanism of the text recognition model to obtain a recognition result, and performing structured extraction on the recognition result according to the keyword database to obtain effective information.
Specifically, performing structured extraction to obtain effective information includes: calculating the edit distance between each keyword and the recognition results, generating an edit distance matrix, pairing each keyword with the recognition result of minimum edit distance, and determining the position of the keyword in the recognition results according to that pairing to obtain the effective information. Because the recognition rate is not 100%, a keyword may fail to be paired with any recognition result; in that case a default value is returned. Matching the output of the deep neural network by minimum edit distance effectively improves the reliability of the keyword information obtained.
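A minimal sketch of this matching scheme, assuming a plain Levenshtein edit distance and an arbitrary cutoff beyond which a keyword counts as unmatched (the patent does not specify the cutoff value):

```python
def edit_distance(a, b):
    """Classic Levenshtein distance via a rolling dynamic-programming row."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (ca != cb))  # substitution
    return dp[-1]

def match_keywords(keywords, recognized_lines, defaults, max_dist=2):
    """Build the edit-distance matrix and pair each keyword with the
    recognized line of minimum distance; fall back to a default value
    when no line is close enough (max_dist is an assumed cutoff)."""
    matrix = [[edit_distance(k, r) for r in recognized_lines] for k in keywords]
    result = {}
    for i, k in enumerate(keywords):
        j = min(range(len(recognized_lines)), key=lambda c: matrix[i][c])
        if matrix[i][j] <= max_dist:
            result[k] = recognized_lines[j]
        else:
            result[k] = defaults.get(k)  # unmatched keyword: return default
    return result
```

For instance, a recognition output "Dote" would still pair with the keyword "Date" at distance 1, while a keyword with no nearby output falls back to its default.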
As a specific embodiment, step S102 includes:
S102-1: the CTPN network comprises a convolutional neural network, an LSTM (Long Short-Term Memory) network and a 1×1 convolutional layer connected in sequence. Each text line comprises at least two text line components, and a plurality of anchor boxes with a fixed width of 16 pixels and different heights are preset in the convolutional neural network for positioning the text line components.
S102-2: and the initial learning rate of the CTPN network training is 0.001, the momentum is 0.9, and the text line image is put into the CTPN network for training.
In the forward propagation of the CTPN network, feature extraction is first performed on the input text line image by a convolutional neural network (such as VGG16), yielding a first feature map of size N×C×H×W. A 3×3 convolution is then applied on the first feature map at the position corresponding to each preset anchor box, yielding a second feature map of size N×9C×H×W. The second feature map is reshaped to NH×W×9C and fed into the LSTM network to learn the sequence feature of each row, producing a third feature map of size NH×W×256. The third feature map is transformed to N×512×H×W and finally passed through the 1×1 convolutional layer to obtain the prediction result. Here, N represents the number of text line images processed each time, H the height, W the width, and C the number of channels of the text line images during forward propagation.
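The chain of reshapes is the error-prone part of this forward pass, so the following numpy sketch traces only the tensor shapes, with the convolutions and LSTM stubbed by random tensors; the fully connected mapping from 256 to 512 channels is an assumption borrowed from the original CTPN design, since a pure reshape cannot change 256 into 512:

```python
import numpy as np

def ctpn_forward_shapes(N=2, C=512, H=16, W=24):
    """Trace tensor shapes through the CTPN forward pass described above.
    Convolutions and the LSTM are stubbed with random tensors; only the
    reshapes between stages are real."""
    first = np.random.rand(N, C, H, W)          # backbone output: N x C x H x W
    second = np.random.rand(N, 9 * C, H, W)     # 3x3 conv over 9 anchors: N x 9C x H x W
    # each image row becomes one LSTM sequence of length W
    lstm_in = second.transpose(0, 2, 3, 1).reshape(N * H, W, 9 * C)
    third = np.random.rand(N * H, W, 256)       # bidirectional LSTM output: NH x W x 256
    # an assumed fully connected layer (as in the original CTPN) maps 256 -> 512
    fc = np.random.rand(256, 512)
    out = (third @ fc).reshape(N, H, W, 512).transpose(0, 3, 1, 2)  # N x 512 x H x W
    return lstm_in.shape, out.shape             # the 1x1 conv then predicts from out
```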
S102-3: after the prediction result is obtained, calculating the loss of the CTPN network according to a first loss function, updating the parameters of the CTPN network by using an optimizer SGD (stochastic gradient descent), putting the text row image into the CTPN network with the updated parameters for training, repeating the process repeatedly until the optimal prediction result is obtained, and storing the optimal model parameters corresponding to the optimal prediction result to obtain the text row position detection model;
wherein the first loss function is Loss = λ_v×L_v + λ_conf×L_conf + λ_x×L_x, where L_v denotes the ordinate loss, i.e. the Smooth L1 loss between the center point ordinate and height of the preset anchor box and those of the actual anchor box; L_conf denotes the confidence loss, i.e. the binary cross-entropy loss, between the preset anchor box's confidence that a text line component is present and the actual anchor box; L_x denotes the abscissa offset loss, i.e. the Smooth L1 loss between the predicted and actual offsets of the horizontal coordinate and width of the text line in the anchor box; and λ_v, λ_conf, λ_x are weights;
the output for the text line component at each preset anchor box position comprises v_j, v_h, s_i and x_side, where v_j and v_h denote the center point coordinate and height of the preset anchor box, s_i denotes the confidence that the preset anchor box contains a text line component, and x_side denotes the offset of the lateral coordinate and width of the text line component.
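A numeric sketch of the first loss function, with Smooth L1 and binary cross-entropy implemented directly; the λ weights default to equal values here, which is an assumption, since the patent leaves them unspecified:

```python
import numpy as np

def smooth_l1(pred, target):
    """Smooth L1 loss, elementwise mean: quadratic below 1, linear above."""
    d = np.abs(pred - target)
    return np.mean(np.where(d < 1.0, 0.5 * d * d, d - 0.5))

def bce(pred, target, eps=1e-7):
    """Binary cross-entropy on confidences clipped into (0, 1)."""
    p = np.clip(pred, eps, 1 - eps)
    return np.mean(-(target * np.log(p) + (1 - target) * np.log(1 - p)))

def ctpn_loss(v_pred, v_true, s_pred, s_true, x_pred, x_true,
              lam_v=1.0, lam_conf=1.0, lam_x=1.0):
    """Weighted sum of the three terms of the first loss function:
    ordinate loss, confidence loss, and abscissa offset loss."""
    return (lam_v * smooth_l1(v_pred, v_true)
            + lam_conf * bce(s_pred, s_true)
            + lam_x * smooth_l1(x_pred, x_true))
```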
As a specific embodiment, step S103 includes:
s103-1: the identification network comprises a feature extraction network, a feature fusion network, a coding network, a full connection layer and a decoding algorithm which are connected in sequence, and is shown in fig. 5.
S103-2: the initial learning rate of the recognition network is 0.0001, the beta value of an optimizer Adam is (0.9,0.999), and the data and the expansion data are put into the recognition network for training;
in the forward propagation process of the identification network, carrying out feature extraction on the image with the size of H multiplied by W through the feature extraction network to obtain a first feature;
fusing the first feature through the feature fusion network, and sampling the fused first feature to enable the height of the fused first feature to be 1, so as to obtain a second feature;
inputting the second characteristic into the coding network for coding to obtain a coding characteristic;
inputting the coding characteristics into the full-connection layer for decoding to obtain a decoding result;
finally, aligning the decoding results through the decoding algorithm to obtain an identification result;
wherein the feature extraction network is a Resnet50 network, the feature fusion network is an FPEM (Feature Pyramid Enhancement Module) network, the coding network is an Encoder network, and the decoding algorithm is the CTC (Connectionist Temporal Classification) algorithm. The loss function of the CTC algorithm is Loss = −log p(Y′|Y) = −log Σ_{c: k(c)=Y′} Π_t p(c_t|Y), where Y denotes the decoding result, Y′ denotes the correctly labeled recognition result, t indexes the sequence of coding features, k denotes the alignment function of the CTC network, c: k(c)=Y′ denotes all sequences c in the set C that yield the correctly labeled recognition result Y′ through the CTC algorithm, p denotes probability, and p(c_t|Y) denotes the probability of obtaining element c_t at position t given Y.
The Resnet50 network is a residual network for extracting visual image features. The FPEM network is a convolutional network that fuses multi-stage visual image features; fusing multi-stage features enlarges the receptive field of the model and thereby improves its accuracy. The Encoder network is a feature coding network based on a self-attention mechanism, which lets the model extract the effective information in features more accurately and thereby improves the robustness of the text recognition model. The CTC algorithm is a decoding algorithm for the output sequence; for example, the output sequence cccaaat becomes cat after alignment by the CTC algorithm.
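The collapse rule behind that example (merge consecutive repeats, then drop blanks) takes only a few lines; the blank symbol `-` is an assumption, since the patent's example contains no blanks:

```python
def ctc_collapse(sequence, blank="-"):
    """Collapse a CTC output path: merge consecutive repeated symbols,
    then drop the blank symbol."""
    out = []
    prev = None
    for ch in sequence:
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch
    return "".join(out)
```

So `ctc_collapse("cccaaat")` yields "cat" as in the text, and a blank lets a genuine double letter survive, e.g. `ctc_collapse("aa--aa")` yields "aa".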
The Encoder network is the encoder part of the Transformer model widely applied in natural language processing and computer vision; its stackable encoding modules give it excellent feature capture performance. Each encoding module comprises a Multi-Head Attention part and a Feed Forward part, with the Multi-Head Attention part expressed as:
Multi-Head Attention(x)=x+Self-Attention(FC(x),FC(x),FC(x));
wherein the input of the Encoder, after passing through three fully connected layers FC, serves as the Q, K and V inputs of the Self-Attention module, Self-Attention(Q, K, V) = softmax(QK^T/√d_k)V, where d_k is the input dimension and T denotes the matrix transpose; the Feed Forward part consists of one fully connected layer FC, one Relu activation function and one further fully connected layer FC.
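A single-head numpy sketch of the attention computation above, folding the three FC projections into the weight matrices wq, wk, wv for brevity; this illustrates the standard scaled dot-product formula with the residual connection, not the patent's exact implementation:

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention: softmax(QK^T / sqrt(d_k)) V,
    single head, for illustration only."""
    q, k, v = x @ wq, x @ wk, x @ wv
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)
    # numerically stable row-wise softmax over the attention scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def encoder_block(x, wq, wk, wv):
    """Residual connection as in Multi-Head Attention(x) = x + Self-Attention(...);
    the three FC layers are folded into wq/wk/wv here."""
    return x + self_attention(x, wq, wk, wv)
```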
S103-3: and after the recognition result is obtained, calculating the loss of the recognition network through a loss function of the CTC algorithm, updating parameters of the recognition network by using an optimizer Adam, inputting the data and the expanded data into the recognition network with the updated parameters for training, repeating the process repeatedly until the optimal recognition result is obtained, and storing the optimal model parameters corresponding to the optimal recognition result to obtain the text recognition model.
Fig. 6 is a schematic flow chart of text line positioning, text recognition and structured extraction according to an embodiment of the present invention. A single ticket image is input into the text line position detection model (the CTPN model) loaded with the optimal parameters to obtain text line detection results, and redundant text boxes are filtered by a confidence threshold to obtain the text positioning boxes of the key areas on the image.
When recognizing text line content, the height of the text line image is generally adjusted to 32 pixels before it is sent to the text recognition model, specifically: (1) scale the text line image while keeping its original aspect ratio, so that the scaled height h′ is 32 and the scaled width w′ is w×(h′/h), where w and h are the original width and height of the image; (2) input the single image into the text recognition model loaded with the optimal parameters to obtain a recognition vector; (3) process the recognition vector with the CTC decoding algorithm to obtain the text sequence with the highest confidence.
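The aspect-ratio-preserving resize in step (1) reduces to one line of arithmetic; rounding the width to the nearest integer (with a floor of 1) is an assumption, since the patent does not state how fractional widths are handled:

```python
def scaled_size(w, h, target_h=32):
    """Keep the aspect ratio: h' = target_h, w' = w * (h'/h),
    rounded to a whole pixel count and clamped to at least 1."""
    return max(1, round(w * target_h / h)), target_h
```

For example, a 128×64 text line image is resized to 64×32.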
Structured extraction is then performed to obtain effective information: (1) calculate the edit distance between each keyword and the text recognition results, a larger edit distance meaning a lower matching degree; (2) generate an edit distance matrix and find, for each keyword, the pairing with minimum edit distance; (3) determine the position of the keyword in the recognition results according to that pairing to obtain the text content. Finally, the located key information is extracted, organized into structured data according to its corresponding type and output; any key information that is not matched is filled in with a statistically obtained default value.
The foregoing is an exemplary embodiment of the present disclosure, and the scope of the present disclosure is defined by the claims and their equivalents.
Claims (6)
1. A method of ticket recognition, characterized by a model training process and a text recognition process, the model training process comprising:
s100: collecting data for text line detection and text image recognition; wherein the data comprises a text line image;
s101: collecting high-frequency words appearing in various ticket scenes, establishing a keyword database through the high-frequency words, counting rules of specific field text contents in the high-frequency words, and randomly generating expansion data according to the high-frequency words and the rules;
s102: training the CTPN network through the text line image to obtain a text line position detection model;
s103: training a recognition network through the data and the expansion data to obtain a text recognition model with a self-attention mechanism;
the text recognition process includes:
s200, inputting the image of the ticket into a text line position detection model, detecting the text line position in the ticket by the text line position detection model, and outputting the text image of which the text line position is detected;
s201: and inputting the text image into a text recognition model for text recognition, recognizing the text through a self-attention mechanism of the text recognition model to obtain a recognition result, and performing structured extraction on the recognition result according to the keyword database to obtain effective information.
2. The method according to claim 1, wherein randomly generating expansion data according to the high-frequency words and the rules in step S101 comprises:
combining the high-frequency words with the word frequency not less than a preset threshold value to generate a text;
combining the texts into a specific format which accords with the texts in the ticket;
and randomly selecting a blank or noisy image as a background, and rendering the text conforming to a specific format on the image to obtain an image of the text, namely the expansion data.
3. The method according to claim 1, wherein the performing of structured extraction to obtain valid information in step S201 includes:
calculating the editing distance between each keyword and the recognition result, generating an editing distance matrix, matching a matched recognition result with the minimum editing distance for each keyword, and determining the position of the keyword in the recognition result according to the matched recognition result to obtain the effective information;
and when the keyword is not matched with the pairing identification result, returning a default value.
4. A method of ticket identification according to any of claims 1-3, wherein step S102 comprises:
s102-1: the CTPN network comprises a convolutional neural network, an LSTM network and a 1 x 1 convolutional layer which are sequentially connected; each text line comprises at least two text line components, and a plurality of preset anchor boxes with fixed width as 16 and different heights are preset in the convolutional neural network and are used for positioning the text line components;
s102-2: the initial learning rate of the CTPN network training is 0.001, the momentum is 0.9, and the text line image is put into the CTPN network for training;
in the forward propagation of the CTPN network, feature extraction is first performed on the input text line image by the convolutional neural network, yielding a first feature map of size N×C×H×W; a 3×3 convolution is then applied on the first feature map at the position corresponding to each preset anchor box, yielding a second feature map of size N×9C×H×W; the second feature map is reshaped to NH×W×9C and fed into the LSTM network to learn the sequence feature of each row, producing a third feature map of size NH×W×256; the third feature map is transformed to N×512×H×W and finally passed through the 1×1 convolutional layer to obtain the prediction result; wherein N denotes the number of text line images processed each time, H the height, W the width, and C the number of channels of the text line images during forward propagation;
S102-3: after the prediction result is obtained, calculating the loss of the CTPN network according to a first loss function, updating the parameters of the CTPN network with the optimizer SGD, putting the text line images into the updated CTPN network for training, and repeating this process until the optimal prediction result is obtained; the optimal model parameters corresponding to the optimal prediction result are saved to obtain the text line position detection model;
wherein the first loss function is: Loss = λv·Lv + λconf·Lconf + λx·Lx, where Lv denotes the vertical-coordinate loss, i.e. the Smooth L1 loss between the centre-point ordinate and height of the preset anchor box and those of the actual anchor box; Lconf denotes the confidence loss, i.e. the binary cross-entropy between the preset anchor box's confidence that it contains a text line component and that of the actual anchor box; Lx denotes the horizontal-offset loss, i.e. the Smooth L1 loss between the offset values of the horizontal coordinate and width of the text line in the predicted anchor box and those in the actual anchor box; λv, λconf and λx are weights;
the output result at each preset anchor box position comprises vj, vh, si and xside, where vj and vh denote the centre-point ordinate and height of the preset anchor box, si denotes the confidence that the preset anchor box contains a text line component, and xside denotes the offset values of the horizontal coordinate and width of the text line component.
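As a hedged illustration of how the first loss function combines its three terms, the following NumPy sketch implements Smooth L1 and binary cross-entropy and sums them with unit weights; the anchor values and weights are hypothetical toy data, not taken from the patent:

```python
import numpy as np

def smooth_l1(pred, target):
    # Smooth L1: 0.5*d^2 for |d| < 1, |d| - 0.5 otherwise, averaged
    d = np.abs(pred - target)
    return float(np.mean(np.where(d < 1.0, 0.5 * d ** 2, d - 0.5)))

def bce(p, y, eps=1e-7):
    # binary cross-entropy between predicted confidence p and label y
    p = np.clip(p, eps, 1 - eps)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

def ctpn_loss(v_pred, v_gt, s_pred, s_gt, x_pred, x_gt,
              lam_v=1.0, lam_conf=1.0, lam_x=1.0):
    # Loss = lam_v*Lv + lam_conf*Lconf + lam_x*Lx (the first loss function)
    return (lam_v * smooth_l1(v_pred, v_gt)      # Lv: centre-y and height
            + lam_conf * bce(s_pred, s_gt)       # Lconf: text/no-text confidence
            + lam_x * smooth_l1(x_pred, x_gt))   # Lx: side-refinement offsets

# toy values for two anchors (hypothetical, not from the patent)
v_pred, v_gt = np.array([0.1, -0.2]), np.array([0.0, 0.0])
s_pred, s_gt = np.array([0.9, 0.2]), np.array([1.0, 0.0])
x_pred, x_gt = np.array([0.05]), np.array([0.0])
total = ctpn_loss(v_pred, v_gt, s_pred, s_gt, x_pred, x_gt)
print(round(total, 4))
```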
5. The ticket identification method of claim 4, wherein step S103 comprises:
s103-1: the recognition network comprises a feature extraction network, a feature fusion network, an encoding network, a fully connected layer and a decoding algorithm which are connected in sequence;
s103-2: the initial learning rate of the recognition network is 0.0001 and the beta values of the Adam optimizer are (0.9, 0.999); the data and the augmented data are put into the recognition network for training;
in the forward propagation process of the recognition network, feature extraction is performed on an image of size H×W by the feature extraction network to obtain a first feature;
the first feature is fused by the feature fusion network, and the fused first feature is downsampled until its height is 1, yielding a second feature;
the second feature is input into the encoding network for encoding to obtain an encoded feature;
the encoded feature is input into the fully connected layer for decoding to obtain a decoding result;
finally, the decoding results are aligned by the decoding algorithm to obtain a recognition result;
the feature extraction network is a ResNet50 network, the feature fusion network is an FPEM network, the encoding network is an Encoder network, and the decoding algorithm is the CTC algorithm, whose loss function is Loss_CTC = −ln p(Y′|Y) = −ln Σ_{c∈C: k(c)=Y′} Π_{t=1}^{T} p(c_t|Y), where Y denotes the decoding result, Y′ denotes the correctly labelled recognition result, T denotes the sequence length of the encoded feature, k denotes the alignment function of the CTC network, C: k(c)=Y′ denotes the set of all sequences c that yield the correctly labelled recognition result Y′ under the CTC algorithm, p denotes probability, and p(c_t|Y) denotes the probability of obtaining symbol c_t at step t given Y;
s103-3: after the recognition result is obtained, the loss of the recognition network is calculated with the loss function of the CTC algorithm and the parameters of the recognition network are updated with the Adam optimizer; the data and the augmented data are then input into the recognition network with the updated parameters for training, and this process is repeated until the optimal recognition result is obtained; the optimal model parameters corresponding to the optimal recognition result are saved to obtain the text recognition model.
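The CTC loss above (negative log of the total probability of all frame-wise paths that collapse to the label) can be checked by brute-force enumeration for tiny sequence lengths. This sketch only illustrates the formula; a training-time implementation would use the forward-backward dynamic program, and the probabilities below are invented toy values:

```python
import numpy as np
from itertools import product

def collapse(path, blank=0):
    # CTC alignment function k: merge repeated symbols, then drop blanks
    out, prev = [], None
    for s in path:
        if s != prev and s != blank:
            out.append(s)
        prev = s
    return tuple(out)

def ctc_loss(probs, target, blank=0):
    # -ln of the summed probability of every length-T path that collapses
    # to `target` (brute force over all K^T paths; fine only for tiny T)
    T, K = probs.shape
    total = 0.0
    for path in product(range(K), repeat=T):
        if collapse(path, blank) == tuple(target):
            total += float(np.prod([probs[t, path[t]] for t in range(T)]))
    return -np.log(total)

# 3 frames over a 2-symbol alphabet {0: blank, 1: 'a'}; label "a"
probs = np.array([[0.4, 0.6],
                  [0.5, 0.5],
                  [0.3, 0.7]])
loss = ctc_loss(probs, [1])
print(round(float(loss), 4))
```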
6. The method according to claim 5, wherein in step S201, when a text line image is recognized by the text recognition model, the height of the text line image is first scaled to 32 pixels before the image is sent to the text recognition model for recognition.
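A minimal sketch of this height-normalisation step, using nearest-neighbour resizing in plain NumPy as a stand-in for whatever image library an implementation would actually use (cv2, PIL, etc.); the input size is a made-up example:

```python
import numpy as np

def resize_height_32(img):
    # nearest-neighbour resize of an H x W grayscale text-line image to
    # height 32, preserving the aspect ratio (stand-in for cv2/PIL resize)
    h, w = img.shape
    new_h = 32
    new_w = max(1, round(w * new_h / h))
    rows = (np.arange(new_h) * h // new_h).astype(int)
    cols = (np.arange(new_w) * w // new_w).astype(int)
    return img[np.ix_(rows, cols)]

# hypothetical 64 x 200 text-line crop
img = np.random.default_rng(0).integers(0, 256, size=(64, 200), dtype=np.uint8)
out = resize_height_32(img)
print(out.shape)
```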
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110265378.4A CN112818951B (en) | 2021-03-11 | 2021-03-11 | Ticket identification method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112818951A true CN112818951A (en) | 2021-05-18 |
CN112818951B CN112818951B (en) | 2023-11-21 |
Family
ID=75863141
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110265378.4A Active CN112818951B (en) | 2021-03-11 | 2021-03-11 | Ticket identification method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112818951B (en) |
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108446621A (en) * | 2018-03-14 | 2018-08-24 | 平安科技(深圳)有限公司 | Bank slip recognition method, server and computer readable storage medium |
WO2019174130A1 (en) * | 2018-03-14 | 2019-09-19 | 平安科技(深圳)有限公司 | Bill recognition method, server, and computer readable storage medium |
CN108921166A (en) * | 2018-06-22 | 2018-11-30 | 深源恒际科技有限公司 | Medical bill class text detection recognition method and system based on deep neural network |
CN110097049A (en) * | 2019-04-03 | 2019-08-06 | 中国科学院计算技术研究所 | Natural scene text detection method and system |
CN110263694A (en) * | 2019-06-13 | 2019-09-20 | 泰康保险集团股份有限公司 | Bank slip recognition method and device |
CN110399845A (en) * | 2019-07-29 | 2019-11-01 | 上海海事大学 | Method for detecting and recognizing continuous segmented text in images |
CN110807455A (en) * | 2019-09-19 | 2020-02-18 | 平安科技(深圳)有限公司 | Bill detection method, device and equipment based on deep learning and storage medium |
CN110866495A (en) * | 2019-11-14 | 2020-03-06 | 杭州睿琪软件有限公司 | Bill image recognition method, bill image recognition device, bill image recognition equipment, training method and storage medium |
CN111340034A (en) * | 2020-03-23 | 2020-06-26 | 深圳智能思创科技有限公司 | Text detection and identification method and system for natural scene |
CN111832423A (en) * | 2020-06-19 | 2020-10-27 | 北京邮电大学 | Bill information identification method, device and system |
CN112115934A (en) * | 2020-09-16 | 2020-12-22 | 四川长虹电器股份有限公司 | Bill image text detection method based on deep learning example segmentation |
Non-Patent Citations (5)
Title |
---|
FUKANG TIAN et al.: "Financial Ticket Intelligent Recognition System Based on Deep Learning", arXiv, pages 1-15 *
XIUXIN CHEN et al.: "Ticket Text Detection and Recognition Based on Deep Learning", 2019 Chinese Automation Congress, pages 1-5 *
PAN Wei; LIU Fengwei: "Design and Implementation of Table-Type Work Order Recognition Based on Deep Learning", Digital Technology and Application, vol. 38, no. 07, pages 150-152 *
WANG Jianxin; WANG Ziya; TIAN Xuan: "A Survey of Natural Scene Text Detection and Recognition Based on Deep Learning", Journal of Software, vol. 31, no. 05, pages 1465-1496 *
CHEN Miaomiao; XU Jinhua: "A Scene Text Detection Model Based on High-Resolution Convolutional Neural Networks", Computer Applications and Software, vol. 37, no. 10, pages 138-144 *
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113255645B (en) * | 2021-05-21 | 2024-04-23 | 北京有竹居网络技术有限公司 | Text line picture decoding method, device and equipment |
CN113255645A (en) * | 2021-05-21 | 2021-08-13 | 北京有竹居网络技术有限公司 | Method, device and equipment for decoding text line picture |
CN113255646B (en) * | 2021-06-02 | 2022-10-18 | 北京理工大学 | Real-time scene text detection method |
CN113255646A (en) * | 2021-06-02 | 2021-08-13 | 北京理工大学 | Real-time scene text detection method |
CN113298179A (en) * | 2021-06-15 | 2021-08-24 | 南京大学 | Customs commodity abnormal price detection method and device |
CN113298179B (en) * | 2021-06-15 | 2024-05-28 | 南京大学 | Customs commodity abnormal price detection method and device |
CN113657377B (en) * | 2021-07-22 | 2023-11-14 | 西南财经大学 | Structured recognition method for mechanical bill image |
CN113657377A (en) * | 2021-07-22 | 2021-11-16 | 西南财经大学 | Structured recognition method for airplane ticket printing data image |
CN113591772B (en) * | 2021-08-10 | 2024-01-19 | 上海杉互健康科技有限公司 | Method, system, equipment and storage medium for structured identification and input of medical information |
CN113591772A (en) * | 2021-08-10 | 2021-11-02 | 上海杉互健康科技有限公司 | Method, system, equipment and storage medium for structured recognition and entry of medical information |
CN115019327A (en) * | 2022-06-28 | 2022-09-06 | 珠海金智维信息科技有限公司 | Fragment bill recognition method and system based on fragment bill participle and Transformer network |
CN115019327B (en) * | 2022-06-28 | 2024-03-08 | 珠海金智维信息科技有限公司 | Fragment bill recognition method and system based on fragment bill segmentation and Transformer network |
CN115713777A (en) * | 2023-01-06 | 2023-02-24 | 山东科技大学 | Contract document content identification method |
CN116912852A (en) * | 2023-07-25 | 2023-10-20 | 京东方科技集团股份有限公司 | Method, device and storage medium for identifying text of business card |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112818951B (en) | Ticket identification method | |
CN110334705B (en) | Language identification method of scene text image combining global and local information | |
CN111027562A (en) | Optical character recognition method based on multi-scale CNN and RNN combined with attention mechanism | |
CN114155527A (en) | Scene text recognition method and device | |
CN110689012A (en) | End-to-end natural scene text recognition method and system | |
CN113537227B (en) | Structured text recognition method and system | |
Malik et al. | An efficient segmentation technique for Urdu optical character recognizer (OCR) | |
CN115862045B (en) | Case automatic identification method, system, equipment and storage medium based on image-text identification technology | |
CN114677687A (en) | ViT and convolutional neural network fused writing brush font type rapid identification method | |
CN111680684B (en) | Spine text recognition method, device and storage medium based on deep learning | |
CN116311310A (en) | Universal form identification method and device combining semantic segmentation and sequence prediction | |
Zhou et al. | Learning-based scientific chart recognition | |
CN114187595A (en) | Document layout recognition method and system based on fusion of visual features and semantic features | |
CN115116074A (en) | Handwritten character recognition and model training method and device | |
Tang et al. | HRCenterNet: An anchorless approach to Chinese character segmentation in historical documents | |
Tayyab et al. | Recognition of Visual Arabic Scripting News Ticker From Broadcast Stream | |
CN111832497B (en) | Text detection post-processing method based on geometric features | |
CN116524521B (en) | English character recognition method and system based on deep learning | |
CN111242114B (en) | Character recognition method and device | |
CN111507348A (en) | Character segmentation and identification method based on CTC deep neural network | |
CN110929013A (en) | Image question-answer implementation method based on bottom-up entry and positioning information fusion | |
CN113159071B (en) | Cross-modal image-text association anomaly detection method | |
CN116110047A (en) | Method and system for constructing structured electronic medical record based on OCR-NER | |
CN112329389B (en) | Chinese character stroke automatic extraction method based on semantic segmentation and tabu search | |
CN113903043A (en) | Method for identifying printed Chinese character font based on twin metric model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||