CN112818951A - Ticket identification method - Google Patents

Ticket identification method

Info

Publication number
CN112818951A
Authority
CN
China
Prior art keywords
text
network
recognition
text line
image
Prior art date
Legal status
Granted
Application number
CN202110265378.4A
Other languages
Chinese (zh)
Other versions
CN112818951B (en)
Inventor
路通
黄智衡
朱立平
易欣
Current Assignee
Nanjing University
Original Assignee
Nanjing University
Priority date
Filing date
Publication date
Application filed by Nanjing University
Priority to CN202110265378.4A
Publication of CN112818951A
Application granted
Publication of CN112818951B
Legal status: Active
Anticipated expiration

Classifications

    • G06V 30/414 Extracting the geometrical structure, e.g. layout tree; block segmentation, e.g. bounding boxes for graphics or text
    • G06F 40/216 Parsing using statistical methods
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 Combinations of networks
    • G06N 3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G06V 20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V 30/10 Character recognition

Abstract

The invention discloses a ticket identification method, which relates to the technical fields of text detection, text recognition and structured information extraction and solves the technical problem that existing models cannot effectively extract structured information. Training data for the text recognition model are expanded by synthesizing data from high-frequency words and the rules governing the text content of specific fields, which improves the accuracy of the recognition model; and because the method is based on convolutional neural networks, it parallelizes well and its computation can be accelerated with a high-performance GPU (Graphics Processing Unit).

Description

Ticket identification method
Technical Field
The disclosure relates to the technical fields of text detection, text recognition and structured information extraction, and in particular to a ticket identification method.
Background
Ticket identification refers to the technology of recognizing images that contain text information, such as the invoices, identity cards and bank cards common in daily life, and extracting the structured information they carry. Because tickets come from many different domains, their formats vary widely, which makes identification and structured extraction difficult.
The ticket structured recognition task can be subdivided into research tasks in several fields, such as text detection and text recognition. The mainstream approach in text detection combines deep-learning object detection or segmentation algorithms with the text detection task. EAST, for example, adopts the FCN (Fully Convolutional Network) structure commonly used for semantic segmentation but regresses text-box parameters following a regression idea: feature extraction and feature fusion are completed by the FCN architecture, the model then predicts a group of text-line regression parameters at each position in the image, and finally the text lines are extracted from the input image by non-maximum suppression. This greatly simplifies the text detection pipeline, but such methods still detect long texts poorly and struggle with small text regions, which are among the more critical problems in ticket identification.
Current methods in text recognition fall mainly into character recognition and sequence recognition. Character recognition first segments single characters from the image, then classifies each character image with a classifier, and finally merges the results into a text-line-level recognition result. Sequence-recognition-based algorithms instead take the whole text line as the minimum recognition unit and recognize the entire text sequence through automatic alignment, often introducing the Seq2Seq model and attention mechanisms from natural language processing to improve accuracy. Both approaches have drawbacks: character recognition requires character-level supervision and therefore a large amount of annotation work, while the robustness of sequence recognition depends heavily on the training data and is prone to errors on images with complicated backgrounds or on visually similar characters.
Moreover, for the ticket structured identification task, current methods do not address the extraction of structured information, and the unorganized information they output cannot be used directly in downstream work. These problems remain to be researched and solved.
Disclosure of Invention
The disclosure provides a ticket identification method that aims to establish a model capable of effectively extracting structured information, addressing problems in tickets such as inconsistent image styles, inconsistent form formats and unclear printing.
The technical purpose of the present disclosure is achieved by the following technical solutions:
a ticket recognition method comprises a model training process and a text recognition process, the model training process comprising:
s100: collecting data for text line detection and text image recognition; wherein the data comprises a text line image;
s101: collecting high-frequency words appearing in various ticket scenes, establishing a keyword database from the high-frequency words, summarizing the rules governing the text content of specific fields among the high-frequency words, and randomly generating expansion data according to the high-frequency words and the rules;
s102: training the CTPN network through the text line image to obtain a text line position detection model;
s103: training a recognition network through the data and the expansion data to obtain a text recognition model with a self-attention mechanism;
the text recognition process includes:
s200: inputting a ticket image into the text line position detection model, which detects the text line positions in the ticket and outputs text images of the detected text lines;
s201: inputting the text images into the text recognition model for text recognition, recognizing the text through the self-attention mechanism of the text recognition model to obtain recognition results, and performing structured extraction on the recognition results according to the keyword database to obtain effective information.
The beneficial effects of this disclosure lie in: the invention obtains a text line position detection model by training the CTPN network, so that key information in the ticket can be located, with robustness across tickets of various forms (tables and the like); training data for the text recognition model are expanded by synthesizing data from high-frequency words and the rules governing the text content of specific fields, improving the accuracy of the recognition model; and because the method is based on convolutional neural networks, it parallelizes well and its computation can be accelerated with a high-performance GPU (Graphics Processing Unit).
Drawings
FIGS. 1 and 2 are flowcharts of the model training process of a ticket identification method according to the present invention;
FIGS. 3 and 4 are flowcharts of the text recognition process of a ticket identification method according to the present invention;
FIG. 5 is a block diagram of a text recognition model;
fig. 6 is a schematic flow chart of text line positioning, text recognition, and structured extraction according to an embodiment of the present invention.
Detailed Description
The technical solution of the disclosure will be described in detail with reference to the accompanying drawings. In the description of the present disclosure, it is to be understood that the terms "first", "second" and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or the number of technical features indicated; they merely distinguish different components.
Fig. 1 and 2 are flowcharts of the model training process of a ticket identification method according to the present invention. As shown in figs. 1 and 2, the model training process includes: S100: collecting data for text line detection and text image recognition, wherein the data comprise text line images.
Specifically, when collecting data for text line detection and text image recognition, a large number of public, accurately labeled text line detection datasets and multilingual text image recognition datasets can be obtained by searching the text detection and recognition research field. Data differing greatly from ticket identification scenes are screened out of the collected datasets, abnormal samples are marked and removed, and the organized data are used to train the CTPN (Connectionist Text Proposal Network) and the recognition network.
S101: collecting high-frequency words appearing in various ticket scenes, establishing a keyword database through the high-frequency words, counting rules of specific field text contents in the high-frequency words, and randomly generating expansion data according to the high-frequency words and the rules.
Specifically, randomly generating expansion data according to the high-frequency words and the rules includes: (1) combining high-frequency words whose word frequency is not less than a preset threshold to generate text; (2) assembling the text into a specific format that conforms to the text in a ticket; (3) randomly selecting a blank or noisy image as the background and rendering the formatted text onto the image, the resulting text image being the expansion data.
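As one way to realize steps (1) to (3), the sketch below renders rule-generated field text onto a blank or noisy background with PIL; the word pool, field rules and font path are hypothetical placeholders, since the patent does not publish its keyword database or field rules.

```python
import random
import numpy as np
from PIL import Image, ImageDraw, ImageFont

# Placeholder keyword pool and field rules; the patent's actual
# high-frequency word lists and per-field rules are not published.
HIGH_FREQ_WORDS = ["Invoice No.", "Date", "Amount", "Payer"]
FIELD_RULES = {
    "Date": lambda: f"2021-{random.randint(1, 12):02d}-{random.randint(1, 28):02d}",
    "Amount": lambda: f"{random.uniform(1, 9999):.2f}",
    "Invoice No.": lambda: "".join(random.choices("0123456789", k=8)),
}

def synthesize_text_line(width=320, height=32, font_path="simsun.ttc"):
    """Render one rule-generated text line on a blank or noisy background."""
    word = random.choice(HIGH_FREQ_WORDS)
    value = FIELD_RULES[word]() if word in FIELD_RULES else ""
    text = f"{word}: {value}".rstrip(": ")           # assemble ticket-style format

    if random.random() < 0.5:                        # blank background
        bg = np.full((height, width), 255, dtype=np.uint8)
    else:                                            # noisy background
        bg = np.clip(np.random.normal(235, 15, (height, width)), 0, 255).astype(np.uint8)

    img = Image.fromarray(bg, mode="L")
    ImageDraw.Draw(img).text((4, 2), text,
                             font=ImageFont.truetype(font_path, 24),  # font file is an assumption
                             fill=0)
    return img, text                                 # image plus its ground-truth label
```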
Both the collected data and the expansion data are image data, and the CTPN network and the recognition network are trained directly on features extracted from these images.
S102: and training the CTPN network through the text line image to obtain a text line position detection model.
S103: and training the recognition network through the data and the expansion data to obtain a text recognition model.
Fig. 3 and 4 are flowcharts of the text recognition process of a ticket identification method according to the present invention. As shown in figs. 3 and 4, the text recognition process includes: S200: inputting a ticket image into the text line position detection model, which detects the text line positions in the ticket and outputs text images of the detected text lines.
S201: and inputting the text image into a text recognition model for text recognition, recognizing the text through a self-attention mechanism of the text recognition model to obtain a recognition result, and performing structured extraction on the recognition result according to the keyword database to obtain effective information.
Specifically, performing structured extraction to obtain effective information includes: calculating the edit distance between each keyword and the recognition results, generating an edit distance matrix, pairing each keyword with the recognition result of minimum edit distance, and determining the position of the keyword in the recognition results from this pairing to obtain the effective information. When a keyword cannot be paired with a recognition result, a default value is returned; that is, since the recognition rate is not 100%, a keyword may fail to match any recognition result even at minimum edit distance, and the default value is used in that case. Matching the output of the deep neural network by minimum edit distance effectively improves the reliability of the result.
As a specific embodiment, step S102 includes:
S102-1: the CTPN network comprises a convolutional neural network, an LSTM (Long Short-Term Memory) network and a 1×1 convolutional layer connected in sequence; each text line comprises at least two text line components, and a plurality of anchor boxes with a fixed width of 16 and different heights are preset in the convolutional neural network to locate the text line components.
S102-2: and the initial learning rate of the CTPN network training is 0.001, the momentum is 0.9, and the text line image is put into the CTPN network for training.
During forward propagation of the CTPN network, features are first extracted from the input text line image by a convolutional neural network (such as VGG16), giving a first feature map of size N×C×H×W; a 3×3 convolution is then applied at each preset anchor position on the first feature map to obtain a second feature map of size N×9C×H×W, whose dimensions are reshaped to NH×W×9C; the reshaped feature map is fed into the LSTM network, which learns the sequence features of each row and outputs a third feature map of size NH×W×256; the third feature map is transformed to dimension N×512×H×W and finally passed through the 1×1 convolutional layer to obtain the prediction result. Here N represents the number of text line images processed each time, H the height and W the width of the text line images, and C the number of channels during network forward propagation.
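The following PyTorch sketch traces just these shape transformations, assuming a VGG16 backbone; the 3×3 window is realized with nn.Unfold so that C channels become 9C as stated, while the 256-to-512 mapping and the six-values-per-anchor head layout are assumptions the patent does not spell out.

```python
import torch
import torch.nn as nn
import torchvision

class CTPNSketch(nn.Module):
    """Shape-level sketch of the forward pass described above; the head
    layout and the 256-to-512 mapping are assumptions."""

    def __init__(self, k=9):                         # k preset anchors per position
        super().__init__()
        self.backbone = torchvision.models.vgg16(weights=None).features[:-1]  # N x 512 x H x W
        self.unfold = nn.Unfold(kernel_size=3, padding=1)   # 3x3 window -> 9C channels
        self.lstm = nn.LSTM(9 * 512, 128, bidirectional=True, batch_first=True)
        self.proj = nn.Conv2d(256, 512, 1)           # map sequence features to 512 channels
        self.head = nn.Conv2d(512, 6 * k, 1)         # 1x1 conv: v_j, v_h, s_i, x_side per anchor

    def forward(self, x):                            # x: N x 3 x H0 x W0 text-line images
        f = self.backbone(x)                         # first feature map, N x C x H x W
        n, c, h, w = f.shape
        f = self.unfold(f).reshape(n, 9 * c, h, w)   # second feature map, N x 9C x H x W
        seq = f.permute(0, 2, 3, 1).reshape(n * h, w, 9 * c)   # NH x W x 9C row sequences
        seq, _ = self.lstm(seq)                      # third feature map, NH x W x 256
        f = seq.reshape(n, h, w, 256).permute(0, 3, 1, 2)      # back to N x 256 x H x W
        return self.head(self.proj(f))               # per-anchor prediction result

# e.g. CTPNSketch()(torch.randn(1, 3, 64, 256)).shape -> (1, 54, 4, 16)
```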
S102-3: after the prediction result is obtained, calculating the loss of the CTPN network according to a first loss function, updating the parameters of the CTPN network by using an optimizer SGD (stochastic gradient descent), putting the text row image into the CTPN network with the updated parameters for training, repeating the process repeatedly until the optimal prediction result is obtained, and storing the optimal model parameters corresponding to the optimal prediction result to obtain the text row position detection model;
wherein the first loss function is Loss = λ_v×L_v + λ_conf×L_conf + λ_x×L_x, where L_v is the ordinate loss, namely the Smooth L1 loss between the center-point ordinate and height of the preset anchor box and those of the actual anchor box; L_conf is the confidence loss, namely the binary cross-entropy loss between the preset and actual anchor boxes on whether a text line component is present; L_x is the abscissa offset loss, namely the Smooth L1 loss between the predicted and actual offsets of the horizontal coordinate and width of the text line; and λ_v, λ_conf, λ_x are weights.
The output for a text line component at each preset anchor position comprises v_j, v_h, s_i and x_side, where v_j and v_h represent the center-point ordinate and height of the preset anchor box, s_i represents the confidence that the preset anchor box contains a text line component, and x_side represents the offset of the lateral coordinate and width of the text line component.
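A minimal PyTorch sketch of this weighted sum, assuming the per-anchor predictions and targets have already been matched and gathered; the λ weights shown are placeholders rather than values from the patent.

```python
import torch.nn.functional as F

def ctpn_loss(pred_v, gt_v, pred_logit, gt_label, pred_x, gt_x,
              lam_v=1.0, lam_conf=1.0, lam_x=1.0):
    """Loss = lambda_v*L_v + lambda_conf*L_conf + lambda_x*L_x, as defined above."""
    l_v = F.smooth_l1_loss(pred_v, gt_v)          # center-y / height regression (Smooth L1)
    l_conf = F.binary_cross_entropy_with_logits(  # text / non-text confidence
        pred_logit, gt_label)
    l_x = F.smooth_l1_loss(pred_x, gt_x)          # side (abscissa) offset regression
    return lam_v * l_v + lam_conf * l_conf + lam_x * l_x
```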
As a specific embodiment, step S103 includes:
S103-1: the recognition network comprises a feature extraction network, a feature fusion network, an encoding network, a fully connected layer and a decoding algorithm connected in sequence, as shown in fig. 5.
S103-2: the initial learning rate of the recognition network is 0.0001, the beta value of an optimizer Adam is (0.9,0.999), and the data and the expansion data are put into the recognition network for training;
during forward propagation of the recognition network, features are extracted from the input image of size H×W by the feature extraction network to obtain first features;
the first features are fused by the feature fusion network, and the fused features are sampled so that their height becomes 1, giving second features;
the second features are input into the encoding network for encoding to obtain encoding features;
the encoding features are input into the fully connected layer for decoding to obtain decoding results;
finally, the decoding results are aligned by the decoding algorithm to obtain the recognition result;
wherein the feature extraction network is a Resnet50 network, the feature fusion network is an FPEM (Feature Pyramid Enhancement Module) network, the encoding network is an Encoder network, and the decoding algorithm is the CTC (Connectionist Temporal Classification) algorithm, whose loss function is
Loss_CTC = -log p(Y'|Y) = -log Σ_{c: k(c)=Y'} Π_t p(c_t|Y),
where Y represents the decoding result, Y' the correctly labeled recognition result, t the sequence length of the encoding features, and k the alignment function of the CTC network; c: k(c)=Y' denotes all sequences c in the set C that yield the correctly labeled recognition result Y' through the CTC algorithm, p denotes probability, and p(c_t|Y) denotes the probability of obtaining the sequence element c_t at step t given Y.
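This objective is available as a built-in in common frameworks; the sketch below exercises PyTorch's nn.CTCLoss with illustrative shapes (the real sequence lengths and vocabulary size are not stated in the patent).

```python
import torch
import torch.nn as nn

ctc = nn.CTCLoss(blank=0, zero_infinity=True)

T, N, V = 40, 4, 100                                  # time steps, batch, vocab incl. blank
log_probs = torch.randn(T, N, V, requires_grad=True).log_softmax(2)  # FC-layer output
targets = torch.randint(1, V, (N, 10))                # correctly labeled sequences Y'
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 10, dtype=torch.long)

loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()                                       # drives the Adam parameter update
```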
The Resnet50 network is a residual network for extracting visual image features. The FPEM network is a convolutional network that fuses multi-stage visual image features; fusing multi-stage features enlarges the receptive field of the model and thus improves its accuracy. The Encoder network is a feature encoding network based on a self-attention mechanism, which lets the model extract the effective information in the features more accurately and thereby improves the robustness of the text recognition model. The CTC algorithm is a decoding algorithm for the output sequence; for example, the output sequence cccaaat becomes cat after CTC alignment.
The Encoder network is the encoder part of the Transformer model that is widely applied in natural language processing and computer vision; its stackable encoding modules give the model excellent feature-capture performance. Each encoding module comprises a Multi-Head Attention part and a Feed Forward part, with the Multi-Head Attention part expressed as follows:
Multi-Head Attention(x) = x + Self-Attention(FC(x), FC(x), FC(x));
Self-Attention(Q, K, V) = softmax(QK^T / √d_k)·V;
wherein the Encoder input, after passing through three fully connected layers FC, serves as the Q, K and V inputs of the Self-Attention module, d_k is the input dimension, and T denotes matrix transposition. The Feed Forward part consists of one fully connected layer FC, one ReLU activation function and one further fully connected layer FC.
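The sketch below implements the two formulas literally: three FC layers produce Q, K and V, scaled dot-product Self-Attention is added back to the input, and an FC, ReLU, FC feed-forward part follows; the head count, dimensions and the residual around the feed-forward part are assumptions.

```python
import math
import torch
import torch.nn as nn

class EncoderBlockSketch(nn.Module):
    """Sketch of the encoding module: Multi-Head Attention(x) =
    x + Self-Attention(FC(x), FC(x), FC(x)), followed by the feed-forward
    part (FC, ReLU, FC). Head count and dimensions are assumptions."""

    def __init__(self, d=256, heads=4):
        super().__init__()
        self.heads, self.d_k = heads, d // heads
        self.fc_q, self.fc_k, self.fc_v = (nn.Linear(d, d) for _ in range(3))
        self.ff = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))

    def forward(self, x):                      # x: batch x seq_len x d
        b, t, d = x.shape
        split = lambda z: z.view(b, t, self.heads, self.d_k).transpose(1, 2)
        q, k, v = split(self.fc_q(x)), split(self.fc_k(x)), split(self.fc_v(x))
        att = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(self.d_k), -1)  # softmax(QK^T/sqrt(d_k))
        x = x + (att @ v).transpose(1, 2).reshape(b, t, d)   # residual + attention output
        return x + self.ff(x)                  # feed-forward (this residual is an assumption)
```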
S103-3: and after the recognition result is obtained, calculating the loss of the recognition network through a loss function of the CTC algorithm, updating parameters of the recognition network by using an optimizer Adam, inputting the data and the expanded data into the recognition network with the updated parameters for training, repeating the process repeatedly until the optimal recognition result is obtained, and storing the optimal model parameters corresponding to the optimal recognition result to obtain the text recognition model.
Fig. 6 is a schematic flowchart of text line positioning, text recognition and structured extraction according to an embodiment of the present invention. A single ticket image is input into the text line position detection model (the CTPN model) loaded with the optimal parameters to obtain the text line detection results, and redundant text boxes are filtered out by a confidence threshold, giving the text locating boxes of the key areas on the image.
When recognizing text line content, the text line image is generally rescaled to a height of 32 pixels before being sent to the text recognition model, specifically: (1) scaling the text line image at its original aspect ratio, so that the scaled height h' = 32 and the scaled width w' = w×(h'/h), where w and h are the original width and height of the image; (2) inputting the single image into the text recognition model loaded with the optimal parameters to obtain a recognition vector; (3) processing the recognition vector with the CTC decoding algorithm to obtain the text sequence with the highest confidence.
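A sketch of this inference path, assuming a trained model whose output is laid out as batch x time x vocabulary; the decode shown is the simplest greedy form of step (3), following the argmax path rather than a beam search.

```python
import torch
from PIL import Image
from torchvision.transforms.functional import to_tensor

def recognize_line(model, image_path, charset, blank=0):
    """Rescale to height 32 at the original aspect ratio, run the model,
    and greedily CTC-decode the recognition vector."""
    img = Image.open(image_path).convert("L")
    w, h = img.size
    img = img.resize((max(1, round(w * 32 / h)), 32))   # h' = 32, w' = w * (h'/h)
    x = to_tensor(img).unsqueeze(0)                     # 1 x 1 x 32 x w'

    with torch.no_grad():
        ids = model(x).argmax(-1).squeeze(0).tolist()   # best class per time step

    out, prev = [], None
    for i in ids:                                       # collapse repeats, drop blanks
        if i != prev and i != blank:                    # e.g. "cccaaat" -> "cat"
            out.append(charset[i])
        prev = i
    return "".join(out)
```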
Then, structured extraction is performed to obtain the effective information: (1) calculating the edit distance between each keyword and each text recognition result, a larger edit distance indicating a lower degree of match; (2) generating the edit distance matrix and finding, for each keyword, the pairing with minimum edit distance; (3) determining the position of the keyword in the recognition results from this pairing to obtain the text content. Finally, the located key information is extracted, organized into structured data according to its corresponding type, and output; key information that cannot be matched is filled in with a statistically derived default value.
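The matching step can be sketched as follows: a Levenshtein edit-distance matrix over keywords and recognized lines, a minimum-distance pairing per keyword, and a fallback default when nothing matches; the distance threshold is an assumption, as the patent does not state one.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (single-row DP)."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def extract_fields(keywords, lines, defaults, max_dist=3):
    """Pair each keyword with its closest recognized line; fall back to a
    statistical default when nothing matches. max_dist is an assumption."""
    matrix = [[edit_distance(k, line) for line in lines] for k in keywords]
    result = {}
    for k, row in zip(keywords, matrix):
        best = min(range(len(row)), key=row.__getitem__) if row else None
        if best is not None and row[best] <= max_dist:
            result[k] = lines[best]          # keyword located in this line
        else:
            result[k] = defaults.get(k)      # unmatched: return the default value
    return result
```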
The foregoing is an exemplary embodiment of the present disclosure, and the scope of the present disclosure is defined by the claims and their equivalents.

Claims (6)

1. A ticket recognition method, characterized by comprising a model training process and a text recognition process, the model training process comprising:
s100: collecting data for text line detection and text image recognition; wherein the data comprises a text line image;
s101: collecting high-frequency words appearing in various ticket scenes, establishing a keyword database from the high-frequency words, summarizing the rules governing the text content of specific fields among the high-frequency words, and randomly generating expansion data according to the high-frequency words and the rules;
s102: training the CTPN network through the text line image to obtain a text line position detection model;
s103: training a recognition network through the data and the expansion data to obtain a text recognition model with a self-attention mechanism;
the text recognition process includes:
s200: inputting a ticket image into the text line position detection model, which detects the text line positions in the ticket and outputs text images of the detected text lines;
s201: inputting the text images into the text recognition model for text recognition, recognizing the text through the self-attention mechanism of the text recognition model to obtain recognition results, and performing structured extraction on the recognition results according to the keyword database to obtain effective information.
2. The method according to claim 1, wherein randomly generating expansion data according to the high-frequency words and the rules in step S101 comprises:
combining high-frequency words whose word frequency is not less than a preset threshold to generate text;
assembling the text into a specific format that conforms to the text in a ticket;
and randomly selecting a blank or noisy image as the background and rendering the formatted text onto the image to obtain a text image, namely the expansion data.
3. The method according to claim 1, wherein performing structured extraction to obtain effective information in step S201 comprises:
calculating the edit distance between each keyword and the recognition results, generating an edit distance matrix, pairing each keyword with the recognition result of minimum edit distance, and determining the position of the keyword in the recognition results from this pairing to obtain the effective information;
and returning a default value when a keyword cannot be paired with a recognition result.
4. The method according to any one of claims 1 to 3, wherein step S102 comprises:
s102-1: the CTPN network comprises a convolutional neural network, an LSTM network and a 1×1 convolutional layer connected in sequence; each text line comprises at least two text line components, and a plurality of anchor boxes with a fixed width of 16 and different heights are preset in the convolutional neural network to locate the text line components;
s102-2: the initial learning rate of CTPN training is 0.001 with momentum 0.9, and the text line images are fed into the CTPN network for training;
during forward propagation of the CTPN network, features are first extracted from the input text line image by the convolutional neural network, giving a first feature map of size N×C×H×W; a 3×3 convolution is then applied at each preset anchor position on the first feature map to obtain a second feature map of size N×9C×H×W, whose dimensions are reshaped to NH×W×9C; the reshaped feature map is fed into the LSTM network, which learns the sequence features of each row and outputs a third feature map of size NH×W×256; the third feature map is transformed to dimension N×512×H×W and finally passed through the 1×1 convolutional layer to obtain the prediction result; wherein N represents the number of text line images processed each time, H the height and W the width of the text line images, and C the number of channels during network forward propagation;
s102-3: after the prediction result is obtained, the loss of the CTPN network is calculated from the first loss function and the parameters of the CTPN network are updated with the SGD optimizer; the text line images are then fed into the updated CTPN network for further training, and this process is repeated until the optimal prediction result is obtained, whereupon the optimal model parameters corresponding to the optimal prediction result are saved, giving the text line position detection model;
wherein the first loss function is Loss = λ_v×L_v + λ_conf×L_conf + λ_x×L_x, where L_v is the ordinate loss, namely the Smooth L1 loss between the center-point ordinate and height of the preset anchor box and those of the actual anchor box; L_conf is the confidence loss, namely the binary cross-entropy loss between the preset and actual anchor boxes on whether a text line component is present; L_x is the abscissa offset loss, namely the Smooth L1 loss between the predicted and actual offsets of the horizontal coordinate and width of the text line; and λ_v, λ_conf, λ_x are weights;
the output for a text line component at each preset anchor position comprises v_j, v_h, s_i and x_side, wherein v_j and v_h represent the center-point ordinate and height of the preset anchor box, s_i represents the confidence that the preset anchor box contains a text line component, and x_side represents the offset of the lateral coordinate and width of the text line component.
5. The method according to claim 4, wherein step S103 comprises:
s103-1: the recognition network comprises a feature extraction network, a feature fusion network, an encoding network, a fully connected layer and a decoding algorithm connected in sequence;
s103-2: the initial learning rate of the recognition network is 0.0001, the β values of the Adam optimizer are (0.9, 0.999), and the data and the expansion data are fed into the recognition network for training;
during forward propagation of the recognition network, features are extracted from the input image of size H×W by the feature extraction network to obtain first features;
the first features are fused by the feature fusion network, and the fused features are sampled so that their height becomes 1, giving second features;
the second features are input into the encoding network for encoding to obtain encoding features;
the encoding features are input into the fully connected layer for decoding to obtain decoding results;
finally, the decoding results are aligned by the decoding algorithm to obtain the recognition result;
the feature extraction network is a Resnet50 network, the feature fusion network is an FPEM network, the encoding network is an Encoder network, the decoding algorithm is the CTC algorithm, and the loss function of the CTC algorithm is
Loss_CTC = -log p(Y'|Y) = -log Σ_{c: k(c)=Y'} Π_t p(c_t|Y),
wherein Y represents the decoding result, Y' the correctly labeled recognition result, t the sequence length of the encoding features, and k the alignment function of the CTC network; c: k(c)=Y' denotes all sequences c in the set C that yield the correctly labeled recognition result Y' through the CTC algorithm, p denotes probability, and p(c_t|Y) denotes the probability of obtaining the sequence element c_t at step t given Y;
s103-3: after the recognition result is obtained, the loss of the recognition network is calculated through the loss function of the CTC algorithm and the parameters of the recognition network are updated with the Adam optimizer; the data and the expansion data are then fed into the updated recognition network for further training, and this process is repeated until the optimal recognition result is obtained, whereupon the optimal model parameters corresponding to the optimal recognition result are saved, giving the text recognition model.
6. The method according to claim 5, wherein in step S201, when a text line image is recognized by the text recognition model, its height is adjusted to 32 pixels before it is sent to the text recognition model for recognition.
CN202110265378.4A 2021-03-11 2021-03-11 Ticket identification method Active CN112818951B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110265378.4A CN112818951B (en) 2021-03-11 2021-03-11 Ticket identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110265378.4A CN112818951B (en) 2021-03-11 2021-03-11 Ticket identification method

Publications (2)

Publication Number Publication Date
CN112818951A (en) 2021-05-18
CN112818951B (en) 2023-11-21

Family

ID=75863141

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110265378.4A Active CN112818951B (en) 2021-03-11 2021-03-11 Ticket identification method

Country Status (1)

Country Link
CN (1) CN112818951B (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108446621A (en) * 2018-03-14 2018-08-24 平安科技(深圳)有限公司 Bank slip recognition method, server and computer readable storage medium
WO2019174130A1 (en) * 2018-03-14 2019-09-19 平安科技(深圳)有限公司 Bill recognition method, server, and computer readable storage medium
CN108921166A (en) * 2018-06-22 2018-11-30 深源恒际科技有限公司 Medical bill class text detection recognition method and system based on deep neural network
CN110097049A (en) * 2019-04-03 2019-08-06 中国科学院计算技术研究所 A kind of natural scene Method for text detection and system
CN110263694A (en) * 2019-06-13 2019-09-20 泰康保险集团股份有限公司 A kind of bank slip recognition method and device
CN110399845A (en) * 2019-07-29 2019-11-01 上海海事大学 Continuously at section text detection and recognition methods in a kind of image
CN110807455A (en) * 2019-09-19 2020-02-18 平安科技(深圳)有限公司 Bill detection method, device and equipment based on deep learning and storage medium
CN110866495A (en) * 2019-11-14 2020-03-06 杭州睿琪软件有限公司 Bill image recognition method, bill image recognition device, bill image recognition equipment, training method and storage medium
CN111340034A (en) * 2020-03-23 2020-06-26 深圳智能思创科技有限公司 Text detection and identification method and system for natural scene
CN111832423A (en) * 2020-06-19 2020-10-27 北京邮电大学 Bill information identification method, device and system
CN112115934A (en) * 2020-09-16 2020-12-22 四川长虹电器股份有限公司 Bill image text detection method based on deep learning example segmentation

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
FUKANG TIAN et al.: "Financial Ticket Intelligent Recognition System Based on Deep Learning", arXiv, pages 1-15 *
XIUXIN CHEN et al.: "Ticket Text Detection and Recognition Based on Deep Learning", 2019 Chinese Automation Congress, pages 1-5 *
潘炜; 刘丰威: "Design and Implementation of Table-Type Work Order Recognition Based on Deep Learning", Digital Technology and Application (数字技术与应用), vol. 38, no. 07, pages 150-152 *
王建新; 王子亚; 田萱: "Survey of Natural Scene Text Detection and Recognition Based on Deep Learning", Journal of Software (软件学报), vol. 31, no. 05, pages 1465-1496 *
陈淼妙; 续晋华: "Scene Text Detection Model Based on High-Resolution Convolutional Neural Network", Computer Applications and Software (计算机应用与软件), vol. 37, no. 10, pages 138-144 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255645B (en) * 2021-05-21 2024-04-23 北京有竹居网络技术有限公司 Text line picture decoding method, device and equipment
CN113255645A (en) * 2021-05-21 2021-08-13 北京有竹居网络技术有限公司 Method, device and equipment for decoding text line picture
CN113255646B (en) * 2021-06-02 2022-10-18 北京理工大学 Real-time scene text detection method
CN113255646A (en) * 2021-06-02 2021-08-13 北京理工大学 Real-time scene text detection method
CN113298179A (en) * 2021-06-15 2021-08-24 南京大学 Customs commodity abnormal price detection method and device
CN113298179B (en) * 2021-06-15 2024-05-28 南京大学 Customs commodity abnormal price detection method and device
CN113657377B (en) * 2021-07-22 2023-11-14 西南财经大学 Structured recognition method for mechanical bill image
CN113657377A (en) * 2021-07-22 2021-11-16 西南财经大学 Structured recognition method for airplane ticket printing data image
CN113591772B (en) * 2021-08-10 2024-01-19 上海杉互健康科技有限公司 Method, system, equipment and storage medium for structured identification and input of medical information
CN113591772A (en) * 2021-08-10 2021-11-02 上海杉互健康科技有限公司 Method, system, equipment and storage medium for structured recognition and entry of medical information
CN115019327A (en) * 2022-06-28 2022-09-06 珠海金智维信息科技有限公司 Fragment bill recognition method and system based on fragment bill participle and Transformer network
CN115019327B (en) * 2022-06-28 2024-03-08 珠海金智维信息科技有限公司 Fragment bill recognition method and system based on fragment bill segmentation and Transformer network
CN115713777A (en) * 2023-01-06 2023-02-24 山东科技大学 Contract document content identification method
CN116912852A (en) * 2023-07-25 2023-10-20 京东方科技集团股份有限公司 Method, device and storage medium for identifying text of business card

Also Published As

Publication number Publication date
CN112818951B (en) 2023-11-21

Similar Documents

Publication Publication Date Title
CN112818951B (en) Ticket identification method
CN110334705B (en) Language identification method of scene text image combining global and local information
CN111027562A (en) Optical character recognition method based on multi-scale CNN and RNN combined with attention mechanism
CN114155527A (en) Scene text recognition method and device
CN110689012A (en) End-to-end natural scene text recognition method and system
CN113537227B (en) Structured text recognition method and system
Malik et al. An efficient segmentation technique for Urdu optical character recognizer (OCR)
CN115862045B (en) Case automatic identification method, system, equipment and storage medium based on image-text identification technology
CN114677687A (en) ViT and convolutional neural network fused writing brush font type rapid identification method
CN111680684B (en) Spine text recognition method, device and storage medium based on deep learning
CN116311310A (en) Universal form identification method and device combining semantic segmentation and sequence prediction
Zhou et al. Learning-based scientific chart recognition
CN114187595A (en) Document layout recognition method and system based on fusion of visual features and semantic features
CN115116074A (en) Handwritten character recognition and model training method and device
Tang et al. HRCenterNet: An anchorless approach to Chinese character segmentation in historical documents
Tayyab et al. Recognition of Visual Arabic Scripting News Ticker From Broadcast Stream
CN111832497B (en) Text detection post-processing method based on geometric features
CN116524521B (en) English character recognition method and system based on deep learning
CN111242114B (en) Character recognition method and device
CN111507348A (en) Character segmentation and identification method based on CTC deep neural network
CN110929013A (en) Image question-answer implementation method based on bottom-up entry and positioning information fusion
CN113159071B (en) Cross-modal image-text association anomaly detection method
CN116110047A (en) Method and system for constructing structured electronic medical record based on OCR-NER
CN112329389B (en) Chinese character stroke automatic extraction method based on semantic segmentation and tabu search
CN113903043A (en) Method for identifying printed Chinese character font based on twin metric model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant