CN112818951B - Ticket identification method - Google Patents

Ticket identification method

Info

Publication number
CN112818951B
CN112818951B (application CN202110265378.4A)
Authority
CN
China
Prior art keywords
text
network
recognition
text line
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110265378.4A
Other languages
Chinese (zh)
Other versions
CN112818951A (en)
Inventor
路通
黄智衡
朱立平
易欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN202110265378.4A priority Critical patent/CN112818951B/en
Publication of CN112818951A publication Critical patent/CN112818951A/en
Application granted granted Critical
Publication of CN112818951B publication Critical patent/CN112818951B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition
    • G06V30/41Analysis of document content
    • G06V30/414Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Probability & Statistics with Applications (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Character Discrimination (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a ticket identification method, which relates to the technical fields of text detection, text recognition and structured information extraction, and addresses the technical problem that existing models cannot effectively extract structured information. The training data of the text recognition model are expanded by synthesizing data from high-frequency words and from the rules governing the text content of specific fields among those words, which improves the accuracy of the recognition model. Because the method is based on convolutional neural networks, it has good parallelism, and a high-performance GPU (Graphics Processing Unit) can be used to accelerate computation.

Description

Ticket identification method
Technical Field
The disclosure relates to the technical fields of text detection, text recognition and structured information extraction, and in particular to a ticket recognition method.
Background
Ticket identification refers to the technology of recognizing images that contain textual information from different domains common in daily life, such as invoices, identity cards and bank cards, and of extracting the structured information they contain. Because tickets involve many fields and their formats are complex, both recognition and structured extraction are difficult.
The ticket structured recognition task can be subdivided into research tasks in several fields, such as text detection and text recognition. The mainstream approach in text detection combines deep-learning object detection or segmentation algorithms with the text detection task. EAST, for example, adopts the FCN (Fully Convolutional Network) structure commonly used in semantic segmentation and regresses text box parameters directly: the FCN structure performs feature extraction and feature fusion, the EAST model then predicts a set of text line regression parameters at every position in the image, and finally non-maximum suppression extracts the text lines from the input image. This greatly simplifies the text detection pipeline, but current methods of this kind still detect long text poorly and have weak capability on small text regions, and these are precisely the key problems in ticket identification.
Current text recognition methods fall mainly into character recognition and sequence recognition. Character recognition first segments single characters from the image, classifies each character image with a classifier, and finally assembles the results into line-level recognition output. Sequence-based text recognition algorithms instead take the whole text line as the minimum recognition unit, recognize the entire text sequence through automatic alignment, and borrow the Seq2Seq model and attention mechanism from natural language processing to improve the recognition effect. Both approaches have drawbacks: character recognition requires character-level supervision and therefore a large amount of labeling work, while the robustness of sequence-based methods depends heavily on the training data, so images with complex backgrounds or visually similar characters are prone to misrecognition.
Moreover, for the ticket structured recognition task, current methods do not consider how to extract the information structure, so the unorganized information they produce cannot be used directly in subsequent work; this problem therefore needs to be studied and solved.
Disclosure of Invention
To address the problems of varied image styles, non-uniform table formats and unclear printing in tickets, the present disclosure provides a ticket identification method that aims to build a model capable of effectively extracting structured information.
The technical aim of the disclosure is achieved by the following technical scheme:
A ticket recognition method comprises a model training process and a text recognition process, the model training process comprising:
s100: collecting data for text line detection and text image recognition; wherein the data comprises a text line image;
s101: collecting high-frequency words appearing in various ticket scenes, establishing a keyword database from the high-frequency words, compiling statistics on the rules of the text content of specific fields among the high-frequency words, and randomly generating expansion data according to the high-frequency words and the rules;
s102: training a CTPN network through the text line image to obtain a text line position detection model;
s103: training the recognition network through the data and the expansion data to obtain a text recognition model with a self-attention mechanism;
the text recognition process includes:
s200, inputting an image of the ticket into a text line position detection model, wherein the text line position detection model detects the text line position in the ticket and outputs a text image with the detected text line position;
s201: inputting the text image into a text recognition model for text recognition, recognizing the text through a self-attention mechanism of the text recognition model to obtain a recognition result, and carrying out structural extraction on the recognition result according to the keyword database to obtain effective information.
The beneficial effects of the present disclosure are as follows. The text line position detection model is obtained by training a CTPN network, so key information in the ticket can be located, with robustness to tickets of various forms (tables, etc.). The training data of the text recognition model are expanded by synthesizing data from the high-frequency words and from the rules of the text content of specific fields among them, which improves the accuracy of the recognition model. Because the method is based on convolutional neural networks, it has good parallelism, and a high-performance GPU (Graphics Processing Unit) can be used to accelerate computation.
Drawings
FIGS. 1 and 2 are flowcharts of the model training process of the ticket identification method according to the present invention;
FIGS. 3 and 4 are flowcharts of the text recognition process of the ticket identification method according to the present invention;
FIG. 5 is a block diagram of a text recognition model;
fig. 6 is a schematic flow chart of text line positioning, text recognition and structured extraction according to an embodiment of the present invention.
Detailed Description
The technical scheme of the present disclosure will be described in detail below with reference to the accompanying drawings. In the description of the present disclosure, it should be understood that the terms "first," "second," and "third" are used for descriptive purposes only, serve merely to distinguish different components, and are not to be interpreted as indicating or implying relative importance or implicitly indicating the number of technical features referred to.
Fig. 1 and fig. 2 are flowcharts of the model training process of the ticket identification method according to the present invention. As shown in fig. 1 and fig. 2, the model training process includes: s100: collecting data for text line detection and text image recognition, wherein the data comprise text line images.
Specifically, data for text line detection and text image recognition are collected; a large number of published, accurately labeled text line detection datasets and multilingual text image recognition datasets can be found in the text detection and recognition research field. Data that differ greatly from the ticket identification scene are screened out of the collected datasets, abnormal samples found during labeling are removed, and the resulting data are used to train the CTPN (Connectionist Text Proposal Network) network and the recognition network.
S101: collecting high-frequency words appearing in various ticket scenes, establishing a keyword database through the high-frequency words, counting rules of text contents in specific fields in the high-frequency words, and randomly generating expansion data according to the high-frequency words and the rules.
Specifically, randomly generating expansion data according to the high-frequency words and the rules includes: (1) combining high-frequency words whose word frequency is not smaller than a preset threshold to generate a text; (2) combining the text so that it conforms to the specific format of text in the ticket; (3) randomly selecting a blank or noisy image as background and rendering the format-conforming text onto the image to obtain a text image, thereby obtaining the expansion data.
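As a rough sketch only (the keyword list, the amount-field rule and the font path below are hypothetical placeholders, not taken from the disclosure), the expansion data could be generated along these lines in Python:

import random
from PIL import Image, ImageDraw, ImageFont

# Hypothetical high-frequency keywords and one field rule (a two-decimal amount) for illustration.
KEYWORDS = ["发票代码", "发票号码", "开票日期", "金额", "税额"]
FONT_PATH = "simhei.ttf"  # assumed font file able to render the keywords

def make_text():
    """Combine a high-frequency word with a rule-conforming field value into one text line."""
    keyword = random.choice(KEYWORDS)
    amount = f"{random.uniform(1, 10000):.2f}"  # rule: amount printed with two decimals
    return f"{keyword}: {amount}"

def render_line(text, height=32):
    """Render the text onto a blank (or lightly tinted) background image."""
    font = ImageFont.truetype(FONT_PATH, height - 6)
    width = font.getbbox(text)[2] + 10
    background = random.choice([(255, 255, 255), (240, 235, 230)])
    img = Image.new("RGB", (width, height), background)
    ImageDraw.Draw(img).text((5, 3), text, fill=(0, 0, 0), font=font)
    return img

if __name__ == "__main__":
    for i in range(5):
        line = make_text()
        render_line(line).save(f"synthetic_{i}.png")  # each rendered image plus its source string is one sample

Each rendered image together with its source string forms one labeled expansion sample for the recognition network.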
Both the collected data and the expansion data are image data; the CTPN network and the recognition network are trained directly on features extracted from these images.
S102: and training the CTPN network through the text line image to obtain a text line position detection model.
S103: training the recognition network through the data and the expansion data to obtain a text recognition model.
Fig. 3 and fig. 4 are flowcharts of the text recognition process of the ticket identification method according to the present invention. As shown in fig. 3 and fig. 4, the text recognition process includes: s200, inputting the image of the ticket into the text line position detection model, wherein the text line position detection model detects the text line positions in the ticket and outputs a text image with the detected text line positions.
S201: inputting the text image into a text recognition model for text recognition, recognizing the text through a self-attention mechanism of the text recognition model to obtain a recognition result, and carrying out structural extraction on the recognition result according to the keyword database to obtain effective information.
Specifically, performing the structured extraction to obtain the effective information includes: calculating the edit distance between each keyword and every recognition result, generating an edit distance matrix, matching each keyword to the recognition result with the minimum edit distance, and determining the position of the keyword within that paired recognition result to obtain the effective information. When a keyword cannot be matched to a paired recognition result, a default value is returned; since the recognition rate is never 100%, a default value is returned whenever no recognition result can be paired with the keyword at the minimum edit distance. Matching the output of the deep neural network by minimum edit distance to obtain keyword information effectively improves the reliability of the result.
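A minimal sketch of this edit-distance matching follows; the acceptance threshold and the default-value table are assumed parameters, not values given by the disclosure:

def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein distance computed with a single-row dynamic program."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def extract_fields(keywords, recognized_lines, defaults, max_dist=3):
    """Match every keyword to the recognized line with the minimum edit distance."""
    result = {}
    for kw in keywords:
        # One row of the edit-distance matrix: distances from this keyword to every recognized line.
        dists = [edit_distance(kw, line) for line in recognized_lines]
        best = min(range(len(dists)), key=dists.__getitem__)
        if dists[best] <= max_dist:
            result[kw] = recognized_lines[best]   # matched: keep the paired recognition result
        else:
            result[kw] = defaults.get(kw, "")     # no acceptable pairing: fall back to the default value
    return result

Here max_dist plays the role of the acceptance tolerance; any keyword whose best pairing exceeds it falls back to its default value, mirroring the behaviour described above.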
As a specific embodiment, step S102 includes:
s102-1: the CTPN network comprises a convolutional neural network, an LSTM (Long Short-Term Memory) network and a 1×1 convolutional layer connected in sequence; each text line comprises at least two text line components, and a plurality of preset anchor boxes with a fixed width of 16 pixels and different heights are preset in the convolutional neural network for locating the text line components.
S102-2: the initial learning rate of the CTPN network training is 0.001, the momentum is 0.9, and the text line images are put into the CTPN network for training.
In the forward propagation of the CTPN network, feature extraction is first performed on the input text line image by the convolutional neural network (e.g. VGG16), producing a first feature map of size N×C×H×W. A 3×3 convolution is then applied at the position corresponding to each preset anchor box, producing a second feature map of size N×9C×H×W, whose dimensions are converted to NH×W×9C. The NH×W×9C feature map is fed into the LSTM network, which learns the sequence features of each row and outputs a third feature map of size NH×W×256; its dimensions are converted to N×512×H×W, and finally the N×512×H×W feature map is passed through the 1×1 convolutional layer to obtain the prediction result. Here N represents the number of text line images processed at a time, H represents the height of the text line images, W represents the width of the text line images, and C represents the number of channels the text line images have in the forward propagation of the network.
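The forward pass just described can be sketched in PyTorch roughly as follows. The tensor shapes follow the description; the VGG16 backbone choice is taken from the example above, while the BiLSTM hidden size (128 per direction, giving 256), the 256-to-512 projection and the per-anchor output layout are assumptions added only to keep the shapes consistent:

import torch
import torch.nn as nn
import torchvision

class CTPN(nn.Module):
    """Sketch of the CTPN forward pass; shapes follow the description in the text."""

    def __init__(self, c=512, num_anchors=10):
        super().__init__()
        # Convolutional backbone (VGG16 conv layers); C = 512 output channels assumed.
        self.backbone = torchvision.models.vgg16(weights=None).features[:-1]
        # 3x3 convolution producing the 9C-channel second feature map.
        self.conv3x3 = nn.Conv2d(c, 9 * c, kernel_size=3, padding=1)
        # Bidirectional LSTM over each row; 128 hidden units per direction -> 256 per step (assumed).
        self.lstm = nn.LSTM(9 * c, 128, bidirectional=True, batch_first=True)
        # Projection 256 -> 512 so the third feature map has 512 channels as stated (assumed detail).
        self.proj = nn.Linear(256, 512)
        # 1x1 convolution head; per anchor: v_j, v_h, s_i, x_side (4 values, see below).
        self.head = nn.Conv2d(512, num_anchors * 4, kernel_size=1)

    def forward(self, images):                    # images: N x 3 x H0 x W0
        feat = self.backbone(images)              # first feature map:  N x C x H x W
        n, c, h, w = feat.shape
        feat = self.conv3x3(feat)                 # second feature map: N x 9C x H x W
        seq = feat.permute(0, 2, 3, 1).reshape(n * h, w, 9 * c)   # NH x W x 9C
        seq, _ = self.lstm(seq)                   # third feature map:  NH x W x 256
        seq = self.proj(seq)                      # NH x W x 512
        feat = seq.reshape(n, h, w, 512).permute(0, 3, 1, 2)      # N x 512 x H x W
        return self.head(feat)                    # per-position, per-anchor prediction result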
S102-3: after the prediction result is obtained, calculating the loss of the CTPN network according to a first loss function, updating the parameter of the CTPN network by using an optimizer SGD (stochastic gradient descent, random gradient descent), putting the text line image into the CTPN network after updating the parameter for training, and repeating the process repeatedly until an optimal prediction result is obtained, and storing the optimal model parameter corresponding to the optimal prediction result to obtain the text line position detection model;
wherein the first loss function is: Loss = λ_v × L_v + λ_conf × L_conf + λ_x × L_x, where L_v represents the vertical-coordinate loss, i.e. the Smooth L1 loss between the center-point coordinate and height of the preset anchor box and the center-point coordinate and height of the actual anchor box; L_conf represents the confidence loss, i.e. the binary cross-entropy loss between the preset anchor box confidence and whether the actual anchor box contains a text line component; L_x represents the horizontal-offset loss, i.e. the Smooth L1 loss between the predicted and actual offset values of the horizontal coordinate and width of the text line in the anchor box; and λ_v, λ_conf and λ_x represent the weights;
the output result for the text line component at each preset anchor box position comprises v_j, v_h, s_i and x_side, where v_j and v_h represent the center-point coordinate and height of the preset anchor box, s_i represents the confidence that the preset anchor box contains a text line component, and x_side represents the offset value of the lateral coordinate and width of the text line component.
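Under the same caveat (a sketch, not the original implementation), the first loss function and the SGD update of step S102-3 could be wired up as follows; the λ weights and the construction of the regression and confidence targets are placeholders:

import torch
import torch.nn.functional as F

def ctpn_loss(pred_v, tgt_v, pred_conf, tgt_conf, pred_x, tgt_x,
              lambda_v=1.0, lambda_conf=1.0, lambda_x=1.0):   # weight values are assumptions
    """Loss = lambda_v * L_v + lambda_conf * L_conf + lambda_x * L_x, with the terms named in the text."""
    l_v = F.smooth_l1_loss(pred_v, tgt_v)                             # vertical coordinate / height regression
    l_conf = F.binary_cross_entropy_with_logits(pred_conf, tgt_conf)  # text / non-text confidence
    l_x = F.smooth_l1_loss(pred_x, tgt_x)                             # horizontal (side) offset regression
    return lambda_v * l_v + lambda_conf * l_conf + lambda_x * l_x

# SGD update with the stated hyperparameters (learning rate 0.001, momentum 0.9).
model = CTPN()                                     # sketch class defined above
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
# loss = ctpn_loss(...)                            # targets come from matching preset anchors to ground truth
# optimizer.zero_grad(); loss.backward(); optimizer.step()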
As a specific embodiment, step S103 includes:
s103-1: the recognition network comprises a feature extraction network, a feature fusion network, an encoding network, a one-layer fully connected layer and a decoding algorithm connected in sequence, as shown in fig. 5.
S103-2: the initial learning rate of the identification network is 0.0001, the beta value of the optimizer Adam is (0.9,0.999), and the data and the expansion data are put into the identification network for training;
in the forward propagation process of the identification network, carrying out feature extraction on an image with the size of H multiplied by W through the feature extraction network to obtain a first feature;
fusing the first features through the feature fusion network, and downsampling the fused first features so that their height becomes 1, so as to obtain second features;
inputting the second characteristic into the coding network to code so as to obtain a coding characteristic;
inputting the coding features into the full-connection layer for decoding to obtain a decoding result;
finally, aligning the decoding result through the decoding algorithm to obtain an identification result;
wherein the feature extraction network is a ResNet50 network, the feature fusion network is an FPEM (Feature Pyramid Enhancement Module) network, the encoding network is an Encoder network, and the decoding algorithm is the CTC (Connectionist Temporal Classification) algorithm. The loss function of the CTC algorithm is Loss = −log Σ_{C: k(C)=Y′} Π_{t=1..T} p(c_t | Y), where Y represents the decoding result, Y′ represents the correctly labeled recognition result, T represents the sequence length of the coding features, k represents the alignment function of the CTC network, C: k(C)=Y′ denotes the set of all sequences C that the alignment function maps to the correctly labeled recognition result Y′, p represents probability, and p(c_t | Y) represents the probability of obtaining character c_t at step t given Y.
The ResNet50 network is a residual network for extracting visual features from the image; the FPEM network is a convolutional network that fuses multi-stage image features, and fusing multi-stage features enlarges the receptive field of the model and thus improves its accuracy. The Encoder network is a feature encoding network based on a self-attention mechanism; the self-attention mechanism lets the model extract effective information from the features more accurately, improving the robustness of the text recognition model. The CTC algorithm is a decoding algorithm for the output sequence: for example, the output sequence "cccaaat" becomes "cat" after alignment by the CTC algorithm.
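As an illustrative sketch under assumptions (the alphabet, the blank index and the tensor shapes are placeholders, not taken from the disclosure), the CTC training loss and the repeat-collapsing greedy decode described above might look like this in PyTorch:

import torch
import torch.nn as nn

# Assumed character set for illustration; index 0 is the CTC blank symbol.
ALPHABET = ["-"] + list("abcdefghijklmnopqrstuvwxyz0123456789")

def ctc_greedy_decode(logits: torch.Tensor) -> str:
    """logits: T x num_classes. Take the argmax path, collapse repeats, drop blanks ('cccaaat' -> 'cat')."""
    best = logits.argmax(dim=-1).tolist()
    chars, prev = [], None
    for idx in best:
        if idx != prev and idx != 0:      # skip repeated symbols and the blank
            chars.append(ALPHABET[idx])
        prev = idx
    return "".join(chars)

# Training-time CTC loss (PyTorch's built-in implementation of the loss described above).
ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)
# log_probs: T x N x num_classes (log-softmax outputs of the recognition network)
# loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)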
The Encoder network is the encoder part of the Transformer model widely used in natural language processing and computer vision; this part of the model benefits from the excellent feature-capturing performance of stackable encoding modules, where each encoding module comprises a Multi-Head Attention part and a Feed Forward part, and Multi-Head Attention is expressed as follows:
Multi-Head Attention(x)=x+Self-Attention(FC(x),FC(x),FC(x));
Self-Attention(Q, K, V) = softmax(QK^T / √d_k) V;
wherein the input of the Encoder is passed through 3 fully connected layers FC to produce Q, K and V, which are fed into the Self-Attention module; d_k is the dimension of the input, and T denotes the matrix transpose. The Feed Forward part is composed of 1 fully connected layer FC, 1 ReLU activation function and 1 fully connected layer FC.
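For concreteness, here is a minimal sketch (not the original disclosure) of one such encoding module: three fully connected projections feed a scaled dot-product self-attention with a residual connection, followed by the FC-ReLU-FC feed-forward part. The model dimension, the feed-forward width, the single-head simplification and the residual on the feed-forward path are assumptions:

import math
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """One encoding module: Multi-Head Attention (with residual) followed by Feed Forward."""

    def __init__(self, d_model=512, d_ff=2048):   # dimensions are assumed values
        super().__init__()
        self.fc_q = nn.Linear(d_model, d_model)   # the 3 fully connected layers producing Q, K, V
        self.fc_k = nn.Linear(d_model, d_model)
        self.fc_v = nn.Linear(d_model, d_model)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))

    def forward(self, x):                         # x: N x T x d_model
        q, k, v = self.fc_q(x), self.fc_k(x), self.fc_v(x)
        # Self-Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
        d_k = q.size(-1)
        attn = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(d_k), dim=-1) @ v
        x = x + attn                              # Multi-Head Attention(x) = x + Self-Attention(FC(x), FC(x), FC(x))
        return x + self.ff(x)                     # feed-forward part (residual connection assumed)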
S103-3: after the identification result is obtained, the loss of the identification network is calculated through the loss function of the CTC algorithm, the parameters of the identification network are updated by using an optimizer Adam, the data and the expanded data are input into the identification network after the parameters are updated for training, the process is repeated repeatedly until the optimal identification result is obtained, and the optimal model parameters corresponding to the optimal identification result are stored, so that the text identification model is obtained.
Fig. 6 is a schematic flow chart of text line location, text recognition and structured extraction provided by the embodiment of the invention, a single ticket image is input into a text line position detection model (CTPN model) loaded with optimal parameters to obtain a text line detection result, and redundant text boxes are filtered through a confidence threshold to obtain text location boxes of key areas on the image.
When recognizing text line content, the text line image is generally rescaled to a height of 32 pixels before being sent to the text recognition model, which specifically comprises: (1) scaling the text line image while keeping the original aspect ratio, so that the scaled image height is h′ = 32 and the image width is w′ = w × (h′/h), where w and h are the original image width and height; (2) inputting the single image into the text recognition model loaded with the optimal parameters to obtain a recognition vector; (3) processing the recognition vector with the CTC decoding algorithm to obtain the text sequence with the highest confidence.
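A one-function sketch of the aspect-ratio-preserving resize (h′ = 32, w′ = w × (h′/h)) before recognition, using Pillow:

from PIL import Image

def resize_for_recognition(img: Image.Image, target_h: int = 32) -> Image.Image:
    """Scale a text line image to a height of 32 pixels while keeping the original aspect ratio."""
    w, h = img.size
    new_w = max(1, round(w * target_h / h))   # w' = w * (h'/h)
    return img.resize((new_w, target_h), Image.BILINEAR)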
Structured extraction is then performed to obtain the effective information: (1) the edit distance between each keyword and each text recognition result is calculated, where a larger edit distance means a lower matching degree; (2) an edit distance matrix is generated and the pairing with the minimum edit distance is found for each keyword; (3) the position of the keyword in the recognition result is determined from the pairing, giving the text content. Finally, the located key information is extracted, organized into structured data according to its type and output; if a keyword cannot be matched, the corresponding field is supplemented with a default value obtained from statistics.
The foregoing is an exemplary embodiment of the disclosure, the scope of which is defined by the claims and their equivalents.

Claims (3)

1. A method of ticket recognition, characterized by comprising a model training process and a text recognition process, the model training process comprising:
s100: collecting data for text line detection and text image recognition; wherein the data comprises a text line image;
s101: collecting high-frequency words appearing in various ticket scenes, establishing a keyword database from the high-frequency words, compiling statistics on the rules of the field text content among the high-frequency words, and randomly generating expansion data according to the high-frequency words and the rules;
s102: training a CTPN network through the text line image to obtain a text line position detection model;
s103: training the recognition network through the data and the expansion data to obtain a text recognition model with a self-attention mechanism;
the text recognition process includes:
s200, inputting an image of the ticket into a text line position detection model, wherein the text line position detection model detects the text line position in the ticket and outputs a text image with the detected text line position;
s201: inputting the text image into a text recognition model for text recognition, recognizing the text through a self-attention mechanism of the text recognition model to obtain a recognition result, and carrying out structural extraction on the recognition result according to the keyword database to obtain effective information;
in the step S101, generating the extended data randomly according to the high-frequency word and the rule includes:
combining the high-frequency words with word frequency not smaller than a preset threshold value to generate a text;
combining the text into a specific format conforming to the text in the ticket;
randomly selecting blank or noisy images as a background, and rendering the text conforming to a specific format onto the images to obtain images of the text, namely obtaining the expansion data;
the step S102 includes:
s102-1: the CTPN network comprises a convolutional neural network, an LSTM network and a 1×1 convolutional layer which are connected in sequence; each text line comprises at least two text line components, and a plurality of preset anchor boxes with a fixed width of 16 and different heights are preset in the convolutional neural network and used for locating the text line components;
s102-2: the initial learning rate of the CTPN network training is 0.001, the momentum is 0.9, and the text line images are put into the CTPN network for training;
in the forward propagation of the CTPN network, feature extraction is first performed on the input text line image by the convolutional neural network, producing a first feature map of size N×C×H×W; a 3×3 convolution is then applied at the position corresponding to each preset anchor box, producing a second feature map of size N×9C×H×W, whose dimensions are converted to NH×W×9C; the NH×W×9C feature map is fed into the LSTM network to learn the sequence features of each row, yielding a third feature map of size NH×W×256, whose dimensions are converted to N×512×H×W; finally the N×512×H×W feature map is input into the convolutional layer for convolution to obtain a prediction result; wherein N represents the number of text line images processed each time, H represents the height of the text line images, W represents the width of the text line images, and C represents the number of channels of the text line images in forward propagation of the network;
s102-3: after the prediction result is obtained, calculating the loss of the CTPN network according to a first loss function, updating the parameters of the CTPN network with the SGD optimizer, putting the text line images into the updated CTPN network for further training, repeating this process until an optimal prediction result is obtained, and saving the optimal model parameters corresponding to the optimal prediction result to obtain the text line position detection model;
wherein the first loss function is: Loss = λ_v × L_v + λ_conf × L_conf + λ_x × L_x, where L_v represents the vertical-coordinate loss, i.e. the Smooth L1 loss between the center-point coordinate and height of the preset anchor box and the center-point coordinate and height of the actual anchor box; L_conf represents the confidence loss, i.e. the binary cross-entropy loss between the preset anchor box confidence and whether the actual anchor box contains a text line component; L_x represents the horizontal-offset loss, i.e. the Smooth L1 loss between the predicted and actual offset values of the horizontal coordinate and width of the text line in the anchor box; and λ_v, λ_conf and λ_x represent the weights;
the output result for the text line component at each preset anchor box position comprises v_j, v_h, s_i and x_side, where v_j and v_h represent the center-point coordinate and height of the preset anchor box, s_i represents the confidence that the preset anchor box contains a text line component, and x_side represents the offset value of the lateral coordinate and width of the text line component;
the step S103 includes:
s103-1: the identification network comprises a feature extraction network, a feature fusion network, an encoding network, a one-layer fully connected layer and a decoding algorithm which are connected in sequence;
s103-2: the initial learning rate of the identification network is 0.0001, the beta value of the optimizer Adam is (0.9,0.999), and the data and the expansion data are put into the identification network for training;
in the forward propagation process of the identification network, carrying out feature extraction on an image with the size of H multiplied by W through the feature extraction network to obtain a first feature;
fusing the first features through the feature fusion network, and downsampling the fused first features so that their height becomes 1, so as to obtain second features;
inputting the second characteristic into the coding network to code so as to obtain a coding characteristic;
inputting the coding features into the full-connection layer for decoding to obtain a decoding result;
finally, aligning the decoding result through the decoding algorithm to obtain an identification result;
wherein the feature extraction network is a ResNet50 network, the feature fusion network is an FPEM network, the encoding network is an Encoder network, and the decoding algorithm is the CTC algorithm; the loss function of the CTC algorithm is Loss = −log Σ_{C: k(C)=Y′} Π_{t=1..T} p(c_t | Y), where Y represents the decoding result, Y′ represents the correctly labeled recognition result, T represents the sequence length of the coding features, k represents the alignment function of the CTC network, C: k(C)=Y′ denotes the set of all sequences C that the alignment function maps to the correctly labeled recognition result Y′, p represents probability, and p(c_t | Y) represents the probability of obtaining character c_t at step t given Y;
s103-3: after the identification result is obtained, calculating the loss of the identification network with the loss function of the CTC algorithm, updating the parameters of the identification network with the Adam optimizer, inputting the data and the expansion data into the updated identification network for further training, repeating this process until an optimal identification result is obtained, and saving the optimal model parameters corresponding to the optimal identification result, so that the text identification model is obtained.
2. A method for identifying a ticket as claimed in claim 1, wherein in step S201, the performing a structured extraction to obtain valid information includes:
calculating the edit distance between each keyword and each recognition result, generating an edit distance matrix, matching each keyword to the recognition result with the minimum edit distance, and determining the position of the keyword in that paired recognition result to obtain the effective information;
and returning a default value when the keyword cannot be matched with a paired recognition result.
3. A ticket recognition method according to claim 2, wherein in step S201, when recognizing a text line image by the text recognition model, the text line image is adjusted to 32 pixels in height and then sent to the text recognition model for recognition.
CN202110265378.4A 2021-03-11 2021-03-11 Ticket identification method Active CN112818951B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110265378.4A CN112818951B (en) 2021-03-11 2021-03-11 Ticket identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110265378.4A CN112818951B (en) 2021-03-11 2021-03-11 Ticket identification method

Publications (2)

Publication Number Publication Date
CN112818951A CN112818951A (en) 2021-05-18
CN112818951B true CN112818951B (en) 2023-11-21

Family

ID=75863141

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110265378.4A Active CN112818951B (en) 2021-03-11 2021-03-11 Ticket identification method

Country Status (1)

Country Link
CN (1) CN112818951B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255645B (en) * 2021-05-21 2024-04-23 北京有竹居网络技术有限公司 Text line picture decoding method, device and equipment
CN113255646B (en) * 2021-06-02 2022-10-18 北京理工大学 Real-time scene text detection method
CN113298179B (en) * 2021-06-15 2024-05-28 南京大学 Customs commodity abnormal price detection method and device
CN113657377B (en) * 2021-07-22 2023-11-14 西南财经大学 Structured recognition method for mechanical bill image
CN113591772B (en) * 2021-08-10 2024-01-19 上海杉互健康科技有限公司 Method, system, equipment and storage medium for structured identification and input of medical information
CN115019327B (en) * 2022-06-28 2024-03-08 珠海金智维信息科技有限公司 Fragment bill recognition method and system based on fragment bill segmentation and Transformer network
CN115713777A (en) * 2023-01-06 2023-02-24 山东科技大学 Contract document content identification method
CN116912852B (en) * 2023-07-25 2024-10-01 京东方科技集团股份有限公司 Method, device and storage medium for identifying text of business card

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108446621A (en) * 2018-03-14 2018-08-24 平安科技(深圳)有限公司 Bank slip recognition method, server and computer readable storage medium
CN108921166A (en) * 2018-06-22 2018-11-30 深源恒际科技有限公司 Medical bill class text detection recognition method and system based on deep neural network
CN110097049A (en) * 2019-04-03 2019-08-06 中国科学院计算技术研究所 A kind of natural scene Method for text detection and system
CN110263694A (en) * 2019-06-13 2019-09-20 泰康保险集团股份有限公司 A kind of bank slip recognition method and device
CN110399845A (en) * 2019-07-29 2019-11-01 上海海事大学 Continuously at section text detection and recognition methods in a kind of image
CN110807455A (en) * 2019-09-19 2020-02-18 平安科技(深圳)有限公司 Bill detection method, device and equipment based on deep learning and storage medium
CN110866495A (en) * 2019-11-14 2020-03-06 杭州睿琪软件有限公司 Bill image recognition method, bill image recognition device, bill image recognition equipment, training method and storage medium
CN111340034A (en) * 2020-03-23 2020-06-26 深圳智能思创科技有限公司 Text detection and identification method and system for natural scene
CN111832423A (en) * 2020-06-19 2020-10-27 北京邮电大学 Bill information identification method, device and system
CN112115934A (en) * 2020-09-16 2020-12-22 四川长虹电器股份有限公司 Bill image text detection method based on deep learning example segmentation

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108446621A (en) * 2018-03-14 2018-08-24 平安科技(深圳)有限公司 Bank slip recognition method, server and computer readable storage medium
WO2019174130A1 (en) * 2018-03-14 2019-09-19 平安科技(深圳)有限公司 Bill recognition method, server, and computer readable storage medium
CN108921166A (en) * 2018-06-22 2018-11-30 深源恒际科技有限公司 Medical bill class text detection recognition method and system based on deep neural network
CN110097049A (en) * 2019-04-03 2019-08-06 中国科学院计算技术研究所 A kind of natural scene Method for text detection and system
CN110263694A (en) * 2019-06-13 2019-09-20 泰康保险集团股份有限公司 A kind of bank slip recognition method and device
CN110399845A (en) * 2019-07-29 2019-11-01 上海海事大学 Continuously at section text detection and recognition methods in a kind of image
CN110807455A (en) * 2019-09-19 2020-02-18 平安科技(深圳)有限公司 Bill detection method, device and equipment based on deep learning and storage medium
CN110866495A (en) * 2019-11-14 2020-03-06 杭州睿琪软件有限公司 Bill image recognition method, bill image recognition device, bill image recognition equipment, training method and storage medium
CN111340034A (en) * 2020-03-23 2020-06-26 深圳智能思创科技有限公司 Text detection and identification method and system for natural scene
CN111832423A (en) * 2020-06-19 2020-10-27 北京邮电大学 Bill information identification method, device and system
CN112115934A (en) * 2020-09-16 2020-12-22 四川长虹电器股份有限公司 Bill image text detection method based on deep learning example segmentation

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Financial ticket intelligent recognition system based on deep learning; Fukang Tian et al.; arXiv; 1-15 *
Ticket text detection and recognition based on deep learning; Xiuxin Chen et al.; 2019 Chinese Automation Congress; 1-5 *
A survey of natural scene text detection and recognition based on deep learning; 王建新, 王子亚, 田萱; Journal of Software; Vol. 31, No. 05; 1465-1496 *
Design and implementation of table-type work order recognition based on deep learning; 潘炜, 刘丰威; Digital Technology and Application; Vol. 38, No. 07; 150-152 *
A scene text detection model based on high-resolution convolutional neural network; 陈淼妙, 续晋华; Computer Applications and Software; Vol. 37, No. 10; 138-144 *

Also Published As

Publication number Publication date
CN112818951A (en) 2021-05-18

Similar Documents

Publication Publication Date Title
CN112818951B (en) Ticket identification method
CN108898138A (en) Scene text recognition methods based on deep learning
CN111027562A (en) Optical character recognition method based on multi-scale CNN and RNN combined with attention mechanism
CN114155527A (en) Scene text recognition method and device
CN112686219B (en) Handwritten text recognition method and computer storage medium
CN114092930B (en) Character recognition method and system
CN112926379A (en) Method and device for constructing face recognition model
CN114067300A (en) End-to-end license plate correction and identification method
Tang et al. HRCenterNet: An anchorless approach to Chinese character segmentation in historical documents
CN113159071B (en) Cross-modal image-text association anomaly detection method
CN117079288B (en) Method and model for extracting key information for recognizing Chinese semantics in scene
CN114581956A (en) Multi-branch fine-grained feature fusion pedestrian re-identification method
Elaraby et al. A Novel Siamese Network for Few/Zero-Shot Handwritten Character Recognition Tasks.
Zuo et al. An intelligent knowledge extraction framework for recognizing identification information from real-world ID card images
CN111832497B (en) Text detection post-processing method based on geometric features
KR20200068073A (en) Improvement of Character Recognition for Parts Book Using Pre-processing of Deep Learning
Karanje et al. Survey on text detection, segmentation and recognition from a natural scene images
CN111242114B (en) Character recognition method and device
Goel et al. Text extraction from natural scene images using OpenCV and CNN
CN113903043B (en) Method for identifying printed Chinese character font based on twin metric model
CN111178409B (en) Image matching and recognition system based on big data matrix stability analysis
CN114399768A (en) Workpiece product serial number identification method, device and system based on Tesseract-OCR engine
Sahu et al. A survey on handwritten character recognition
CN116311275B (en) Text recognition method and system based on seq2seq language model
Sharma Recovery of drawing order in handwritten digit images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant