CN112818951B - A method of ticket identification - Google Patents
A method of ticket identification
- Publication number
- CN112818951B CN202110265378.4A
- Authority
- CN
- China
- Prior art keywords
- text
- network
- recognition
- text line
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/414—Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Abstract
The invention discloses a ticket recognition method relating to the technical fields of text detection, text recognition, and structured information extraction, and solves the technical problem that existing models cannot effectively extract structured information. The key points of the technical solution are: a CTPN network is trained to obtain a text line position detection model, which locates the key information in a ticket and is robust to tickets of various forms (tables, etc.); data synthesized from high-frequency words and the rules of the text content of specific fields expands the training data of the text recognition model and improves its accuracy; and the method is based on convolutional neural networks, which parallelize well, so computation can be accelerated with high-performance GPUs (Graphics Processing Units).
Description
Technical Field

The present disclosure relates to the technical fields of text detection, text recognition, and structured information extraction, and in particular to a ticket recognition method.

Background Art

Ticket recognition refers to the technology of recognizing images containing textual information in different domains, such as the invoices, ID cards, and bank cards common in daily life, and extracting structured information from them. Because tickets span many domains and their formats are complex and varied, recognition and structured extraction face many difficulties.

The structured ticket recognition task can be subdivided into research tasks in several fields, including text detection and text recognition. The mainstream approach in current text detection combines deep-learning object detection or segmentation algorithms with the text detection task. EAST, for example, adopts the FCN (Fully Convolutional Networks) structure commonly used in semantic segmentation; in essence it still regresses the text-box parameters, using the FCN architecture for feature extraction and feature fusion. The EAST model then predicts a set of text line regression parameters at each position in the image and finally extracts the text lines of the input image with a non-maximum suppression operation. This approach greatly simplifies the text detection pipeline, but similar methods still detect long text poorly and handle small text regions badly, and precisely these problems are critical in ticket recognition.

Current text recognition methods fall mainly into two categories: character recognition and sequence recognition. Character recognition first segments individual characters out of the image, classifies each character image with a classifier, and finally merges the results into text line-level recognition results. Sequence-based text recognition algorithms instead take the entire text line as the smallest unit of recognition and recognize the whole character sequence with automatic alignment, introducing the Seq2Seq model and attention mechanism from natural language processing to improve recognition. Both methods have their own problems: character recognition requires character-level supervision and therefore a large amount of annotation work, while the robustness of sequence-based methods is strongly affected by the training data, and they are prone to misrecognition on images with complex backgrounds and on similar-looking characters.

Therefore, for the structured ticket recognition task, current methods do not consider the extraction of structured information, and the unorganized information they produce cannot be used directly in subsequent work, so the above problems remain to be researched and solved.
Summary of the Invention

The present disclosure provides a ticket recognition method whose technical purpose is to build a model that can effectively extract structured information, addressing problems in tickets such as inconsistent image styles, non-uniform table formats, and unclear printing.

The above technical purpose of the present disclosure is achieved through the following technical solution:

A ticket recognition method, comprising a model training process and a text recognition process, the model training process including:

S100: collecting data for text line detection and text image recognition, wherein the data includes text line images;

S101: collecting high-frequency words that appear in various ticket scenarios, building a keyword database from the high-frequency words, compiling the rules of the text content of specific fields among the high-frequency words, and randomly generating expanded data according to the high-frequency words and the rules;

S102: training a CTPN network with the text line images to obtain a text line position detection model;

S103: training a recognition network with the data and the expanded data to obtain a text recognition model with a self-attention mechanism;

the text recognition process including:

S200: inputting an image of a ticket into the text line position detection model, which detects the positions of the text lines in the ticket and outputs text images of the detected text line positions;

S201: inputting the text images into the text recognition model for text recognition, recognizing the text through the self-attention mechanism of the text recognition model to obtain recognition results, and performing structured extraction on the recognition results according to the keyword database to obtain the effective information.

The beneficial effects of the present disclosure are as follows: the invention trains a CTPN network to obtain a text line position detection model, thereby locating the key information in a ticket while remaining robust to tickets of various forms (tables, etc.); it expands the training data of the text recognition model by synthesizing data from high-frequency words and the rules of the text content of specific fields, improving the recognition model's accuracy; and it is based on convolutional neural networks, which parallelize well, so computation can be accelerated with high-performance GPUs (Graphics Processing Units).
Brief Description of the Drawings

Figures 1 and 2 are flowcharts of the model training process of the ticket recognition method according to the present invention;

Figures 3 and 4 are flowcharts of the text recognition process of the ticket recognition method according to the present invention;

Figure 5 is a structural diagram of the text recognition model;

Figure 6 is a schematic flowchart of the text line positioning, text recognition, and structured extraction provided by an embodiment of the present invention.
Detailed Description of Embodiments

The technical solution of the present disclosure is described in detail below with reference to the accompanying drawings. In the description of the present disclosure, it should be understood that the terms "first", "second", and "third" are used for descriptive purposes only; they cannot be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features, and serve only to distinguish different components.

Figures 1 and 2 are flowcharts of the model training process of the ticket recognition method according to the present invention. As shown in Figures 1 and 2, the model training process includes: S100: collecting data for text line detection and text image recognition, wherein the data includes text line images.

Specifically, the data for text line detection and text image recognition can be collected by retrieving, from the text detection and recognition research field, a large number of public, accurately annotated, multilingual text line detection sets and text image recognition datasets. Data that differ significantly from the ticket recognition scenario are screened out of the collected datasets, abnormal data are marked and removed, and the cleaned data are used to train the CTPN (Connectionist Text Proposal Network) network and the recognition network.

S101: collecting high-frequency words that appear in various ticket scenarios, building a keyword database from the high-frequency words, compiling the rules of the text content of specific fields among the high-frequency words, and randomly generating expanded data according to the high-frequency words and the rules.
Specifically, randomly generating expanded data according to the high-frequency words and the rules includes: (1) combining the high-frequency words whose word frequency is not less than a preset threshold to generate text; (2) assembling the text into the specific formats that text in tickets follows; (3) randomly selecting a blank or noisy image as the background and rendering the formatted text onto the image, so that the resulting image of the text constitutes the expanded data.
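As an illustration of steps (1) to (3), the following is a minimal Python sketch of such synthesis, assuming Pillow for rendering; the font path, the "key: value" field format, and the Gaussian noise model are assumptions of this sketch, not details given in the patent.

```python
# Hypothetical sketch of step S101's data synthesis; font path, field
# format, and noise level are illustrative assumptions.
import random
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def synthesize_text_line(high_freq_words, font_path="simhei.ttf",
                         height=32, noise_sigma=10.0):
    # (1) Combine high-frequency words into a candidate text line.
    text = "".join(random.sample(high_freq_words, k=random.randint(1, 3)))
    # (2) Format it like a ticket field, e.g. "key: value" (assumed format).
    text = f"{text}: {random.randint(0, 9999):04d}"
    font = ImageFont.truetype(font_path, size=height - 8)
    width = int(font.getlength(text)) + 16
    # (3) Blank grayscale background with additive Gaussian noise.
    background = np.full((height, width), 255, dtype=np.float32)
    background += np.random.normal(0.0, noise_sigma, background.shape)
    img = Image.fromarray(np.clip(background, 0, 255).astype(np.uint8))
    ImageDraw.Draw(img).text((8, 4), text, fill=0, font=font)
    return img, text  # image plus its ground-truth label

# Usage: image, label = synthesize_text_line(["金额", "发票号码", "日期"])
```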
Both the data and the expanded data described above are in fact image data, and the CTPN network and the recognition network are trained directly on features extracted from this image data.

S102: training the CTPN network with the text line images to obtain the text line position detection model.

S103: training the recognition network with the data and the expanded data to obtain the text recognition model.

Figures 3 and 4 are flowcharts of the text recognition process of the ticket recognition method according to the present invention. As shown in Figures 3 and 4, the text recognition process includes: S200: inputting an image of a ticket into the text line position detection model, which detects the positions of the text lines in the ticket and outputs text images of the detected text line positions.

S201: inputting the text images into the text recognition model for text recognition, recognizing the text through the self-attention mechanism of the text recognition model to obtain recognition results, and performing structured extraction on the recognition results according to the keyword database to obtain the effective information.
Specifically, performing structured extraction to obtain effective information includes: computing the edit distance between each keyword and the recognition results, generating an edit distance matrix, matching each keyword to the paired recognition result with the smallest edit distance, and determining the position of the keyword in the recognition results according to the paired recognition result, thereby obtaining the effective information. When a keyword matches no paired recognition result, a default value is returned; that is, the recognition rate is not 100%, and whenever a keyword cannot be matched to the paired recognition result with the smallest edit distance, a default value is returned in its place. Matching the output of the deep neural network to keyword information by minimum edit distance effectively improves the reliability of the results.
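The following Python sketch illustrates this matching as described: a standard dynamic-programming edit distance fills the matrix, and each keyword takes the recognized line at minimum distance, falling back to a default value when no credible pairing exists. The rejection threshold and the default value are illustrative assumptions.

```python
# Sketch of the keyword matching in S201; standard Levenshtein DP.
def edit_distance(a: str, b: str) -> int:
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (ca != cb))  # substitution
    return dp[-1]

def match_keywords(keywords, recognized_lines, default="N/A"):
    """Build the edit-distance matrix and pair each keyword with the
    recognized line closest to it; unmatched keywords get the default."""
    matrix = [[edit_distance(k, line) for line in recognized_lines]
              for k in keywords]
    result = {}
    for k, row in zip(keywords, matrix):
        best = min(range(len(row)), key=row.__getitem__) if row else None
        # Reject pairings too distant to be credible (assumed threshold).
        if best is None or row[best] > len(k):
            result[k] = default
        else:
            result[k] = recognized_lines[best]
    return result
```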
As a specific embodiment, step S102 includes:

S102-1: the CTPN network includes a convolutional neural network, an LSTM (Long Short-Term Memory) network, and a 1×1 convolutional layer connected in sequence; each text line includes at least two text line components, and multiple preset anchor boxes with a fixed width of 16 and different heights are preset in the convolutional neural network to locate the text line components.

S102-2: the CTPN network is trained with an initial learning rate of 0.001 and a momentum of 0.9, and the text line images are fed into the CTPN network for training.
In the forward propagation of the CTPN network, a convolutional neural network (e.g., VGG16) first extracts features from the input text line images, producing a first feature map of size N×C×H×W. A 3×3 convolution is then applied on the first feature map at the position corresponding to each preset anchor box, producing a second feature map of size N×9C×H×W. The second feature map is reshaped to NH×W×9C and fed into the LSTM network, which learns the sequence features of each row of the second feature map and outputs a third feature map of size NH×W×256. The third feature map is transformed to dimension N×512×H×W and finally passed through the 1×1 convolutional layer to obtain the prediction result. Here N is the number of text line images processed at a time, H the height of the text line images, W their width, and C the number of channels during the network's forward propagation.
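The following PyTorch sketch traces the stated shapes (N×C×H×W → N×9C×H×W → NH×W×9C → NH×W×256 → N×512×H×W). It is a shape-level illustration only: the VGG16 backbone is named in the text, but the bidirectional LSTM with hidden size 128 (giving the stated 256-dimensional output), the 1×1 projection standing in for the 256-to-512 transform, and the four-values-per-anchor output layout are assumptions of this sketch.

```python
# Shape-level sketch of the CTPN forward pass described above.
import torch.nn as nn
import torchvision

class CTPNSketch(nn.Module):
    def __init__(self, c=512):
        super().__init__()
        vgg = torchvision.models.vgg16(weights=None)
        self.backbone = vgg.features[:-1]           # -> N x 512 x H x W
        self.rpn_conv = nn.Conv2d(c, 9 * c, 3, padding=1)  # anchor context
        self.bilstm = nn.LSTM(9 * c, 128, bidirectional=True,
                              batch_first=True)     # 2 * 128 = 256 outputs
        self.project = nn.Conv2d(256, 512, 1)       # stand-in for the
                                                    # 256 -> 512 transform
        # 4 values per anchor: v_j, v_h, s_i, x_side (layout assumed).
        self.head = nn.Conv2d(512, 9 * 4, 1)

    def forward(self, x):
        f = self.backbone(x)                        # N x C x H x W
        f = self.rpn_conv(f)                        # N x 9C x H x W
        n, c9, h, w = f.shape
        f = f.permute(0, 2, 3, 1).reshape(n * h, w, c9)    # NH x W x 9C
        f, _ = self.bilstm(f)                       # NH x W x 256
        f = f.reshape(n, h, w, 256).permute(0, 3, 1, 2)    # N x 256 x H x W
        f = self.project(f)                         # N x 512 x H x W
        return self.head(f)                         # per-anchor predictions
```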
S102-3: after the prediction result is obtained, the loss of the CTPN network is computed with a first loss function, the parameters of the CTPN network are updated with the SGD (stochastic gradient descent) optimizer, and the text line images are fed into the CTPN network with the updated parameters for further training. This process is repeated until the best prediction result is obtained, and the best model parameters corresponding to that result are saved, yielding the text line position detection model.

The first loss function is Loss = λ_v × L_v + λ_conf × L_conf + λ_x × L_x, where L_v is the vertical-coordinate loss, i.e., the Smooth L1 loss between the center-point coordinate and height of the preset anchor box and those of the actual anchor box; L_conf is the confidence loss, i.e., the binary cross-entropy loss between the confidence of the preset anchor box and whether the actual anchor box contains a text line component; L_x is the horizontal-offset loss, i.e., the Smooth L1 loss between the predicted offsets of the horizontal coordinate and width of the text line in the anchor box and the actual offsets; and λ_v, λ_conf, λ_x are weights.

The outputs for the text line components at each preset anchor box position include v_j, v_h, s_i, and x_side, where v_j and v_h are the center-point coordinate and height of the preset anchor box, s_i is the confidence that the preset anchor box contains a text line component, and x_side is the offset of the horizontal coordinate and width of the text line component.
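A minimal sketch of the first loss function under PyTorch assumptions; the λ values shown and the tensor layout are illustrative, since the patent does not specify the weights.

```python
# Sketch of Loss = λv*Lv + λconf*Lconf + λx*Lx for the CTPN training step.
import torch
import torch.nn.functional as F

def ctpn_loss(pred_v, gt_v, pred_conf, gt_conf, pred_x, gt_x,
              lambda_v=1.0, lambda_conf=1.0, lambda_x=2.0):
    # L_v: Smooth L1 on anchor center coordinate and height (v_j, v_h).
    l_v = F.smooth_l1_loss(pred_v, gt_v)
    # L_conf: binary cross-entropy on whether an anchor holds a text part.
    l_conf = F.binary_cross_entropy_with_logits(pred_conf, gt_conf)
    # L_x: Smooth L1 on the horizontal/width offset x_side.
    l_x = F.smooth_l1_loss(pred_x, gt_x)
    return lambda_v * l_v + lambda_conf * l_conf + lambda_x * l_x

# Training step under the patent's settings (SGD, lr 0.001, momentum 0.9):
# optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
```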
As a specific embodiment, step S103 includes:

S103-1: the recognition network includes a feature extraction network, a feature fusion network, an encoding network, a single fully connected layer, and a decoding algorithm connected in sequence, as shown in Figure 5.

S103-2: the recognition network is trained with an initial learning rate of 0.0001 and Adam optimizer beta values of (0.9, 0.999), and the data and the expanded data are fed into the recognition network for training.

In the forward propagation of the recognition network, an image of size H×W is passed through the feature extraction network to obtain the first features.

The first features are then fused by the feature fusion network, and the fused first features are sampled so that their height becomes 1, yielding the second features.

The second features are input into the encoding network to obtain the encoded features.

The encoded features are input into the fully connected layer for decoding, producing the decoding result.

Finally, the decoding result is aligned by the decoding algorithm to obtain the recognition result.
Here the feature extraction network is a Resnet50 network, the feature fusion network is an FPEM (Feature Pyramid Enhancement Module) network, the encoding network is an Encoder network, and the decoding algorithm is the CTC (Connectionist Temporal Classification) algorithm. The loss function of the CTC algorithm is L_CTC = -log Σ_{c∈C: k(c)=Y'} Π_t p(c_t|Y), where Y denotes the decoding result, Y' the correctly annotated recognition result, t the sequence length of the encoded features, and k the alignment function of the CTC network; C: k(c)=Y' means that every sequence c in the set C yields the correctly annotated recognition result Y' under the CTC algorithm, p denotes probability, and p(c_t|Y) is the probability of obtaining the sequence c_t of length t given Y.
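A hedged skeleton of this pipeline, assuming PyTorch: ResNet50 is named in the text, but the 1×1 convolution standing in for the FPEM fusion, the two-layer Transformer encoder configuration, and the height-averaging used to sample the fused features down to height 1 are assumptions of this sketch.

```python
# Skeleton of the recognition network in S103-1:
# ResNet50 features -> fusion -> encoder -> FC -> CTC.
import torch.nn as nn
import torchvision

class RecognizerSketch(nn.Module):
    def __init__(self, num_classes, d_model=512):
        super().__init__()
        resnet = torchvision.models.resnet50(weights=None)
        self.features = nn.Sequential(*list(resnet.children())[:-2])
        self.fuse = nn.Conv2d(2048, d_model, 1)     # stand-in for FPEM
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.fc = nn.Linear(d_model, num_classes)   # classes incl. CTC blank

    def forward(self, x):                           # x: N x 3 x 32 x W
        f = self.fuse(self.features(x))             # N x d x h x w
        f = f.mean(dim=2)                           # sample height down to 1
        f = f.permute(0, 2, 1)                      # N x w x d (a sequence)
        f = self.encoder(f)
        return self.fc(f).log_softmax(-1)           # log-probs for nn.CTCLoss

# nn.CTCLoss(blank=0) would consume these log-probs permuted to (T, N, C).
```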
The Resnet50 network is a residual network that extracts visual features from the image. The FPEM network is a convolutional network that fuses multi-stage visual features; fusing multi-stage features enlarges the model's receptive field and thereby improves its accuracy. The Encoder network is a feature encoding network based on the self-attention mechanism; self-attention lets the model extract the effective information in the features more accurately, improving the robustness of the text recognition model. The CTC algorithm decodes the output sequence: for example, the output sequence "cccaaat" becomes "cat" after CTC alignment.
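The "cccaaat" to "cat" alignment corresponds to greedy CTC decoding, sketched below; the blank symbol "-" and the second example string are illustrative.

```python
# Minimal greedy CTC decoding: collapse repeated symbols, then drop blanks.
def ctc_greedy_decode(tokens, blank="-"):
    decoded, prev = [], None
    for t in tokens:
        if t != prev and t != blank:   # collapse runs, skip the blank
            decoded.append(t)
        prev = t
    return "".join(decoded)

print(ctc_greedy_decode("cccaaat"))    # -> "cat"
print(ctc_greedy_decode("cc-c-aat"))   # -> "ccat" (blank splits repeats)
```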
The Encoder network is the encoder part of the Transformer, a model widely used in both natural language processing and computer vision. It owes its excellent feature-capture performance to stackable encoding modules, each containing a Multi-Head Attention part and a Feed Forward part. The Multi-Head Attention part is expressed mathematically as follows:
Multi-Head Attention(x)=x+Self-Attention(FC(x),FC(x),FC(x));Multi-Head Attention(x)=x+Self-Attention(FC(x),FC(x),FC(x));
Self-Attention(Q, K, V) = softmax(QK^T/√d_k)V, where the Encoder's input passes through three fully connected layers FC and enters the Self-Attention module as Q, K, and V respectively, d_k is the dimension of the input, and T denotes matrix transposition. The feed-forward part consists of one fully connected layer FC, one ReLU activation function, and one more fully connected layer FC.
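A sketch of one encoding module matching the two formulas above, written in PyTorch; it is single-head for brevity (the patent's module is multi-head), and the residual connection around the feed-forward part follows the standard Transformer rather than anything stated here.

```python
# One encoder block: x + Self-Attention(FC(x), FC(x), FC(x)), then FC-ReLU-FC.
import math
import torch
import torch.nn as nn

class EncoderBlockSketch(nn.Module):
    def __init__(self, d_model=512, d_ff=2048):
        super().__init__()
        self.wq = nn.Linear(d_model, d_model)   # the three FC projections
        self.wk = nn.Linear(d_model, d_model)
        self.wv = nn.Linear(d_model, d_model)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))  # FC-ReLU-FC

    def forward(self, x):                        # x: N x T x d_model
        q, k, v = self.wq(x), self.wk(x), self.wv(x)
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
        attn = torch.softmax(scores, dim=-1) @ v # softmax(QK^T/sqrt(dk))V
        x = x + attn                             # x + Self-Attention(...)
        return x + self.ff(x)                    # feed-forward (residual
                                                 # assumed, per standard use)
```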
S103-3: after the recognition result is obtained, the loss of the recognition network is computed with the loss function of the CTC algorithm, the parameters of the recognition network are updated with the Adam optimizer, and the data and the expanded data are fed into the recognition network with the updated parameters for further training. This process is repeated until the best recognition result is obtained, and the best model parameters corresponding to that result are saved, yielding the text recognition model.

Figure 6 is a schematic flowchart of the text line positioning, text recognition, and structured extraction provided by an embodiment of the present invention. A single ticket image is input into the text line position detection model (the CTPN model) loaded with the best parameters to obtain the text line detection results, and redundant text boxes are filtered out with a confidence threshold, yielding the text positioning boxes of the key regions of the image.
When recognizing text line content, the height of a text line image is generally adjusted to 32 pixels before it is sent to the text recognition model, specifically: (1) the text line image is scaled with its original aspect ratio preserved, so that the scaled image height is h' = 32 and the scaled image width is w' = w × (h'/h), where w and h are the original width and height of the image; (2) the single image is input into the text recognition model loaded with the best parameters to obtain the recognition vector; (3) the recognition vector is processed by the CTC decoding algorithm to obtain the text sequence with the highest confidence.
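A sketch of the scaling in step (1), assuming OpenCV; the formula w' = w × (h'/h) is taken from the text, while the interpolation mode is an assumption.

```python
# Aspect-preserving resize of a text line image to a 32-pixel height.
import cv2

def resize_text_line(image, target_height=32):
    h, w = image.shape[:2]
    new_w = max(1, int(round(w * (target_height / h))))  # w' = w * (h'/h)
    return cv2.resize(image, (new_w, target_height),
                      interpolation=cv2.INTER_LINEAR)
```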
Structured extraction is then performed to obtain the effective information: (1) the edit distance between each keyword and the text recognition results is computed, a larger edit distance meaning a lower degree of matching; (2) an edit distance matrix is generated, and each keyword is matched to the pairing with the smallest edit distance; (3) the position of the keyword in the recognition results is determined from the pairing, and the text content is obtained. Finally, the located key information is extracted and organized by type into structured data for output; if a keyword is not matched, the default value obtained from statistics is used as a supplement.

The above are exemplary embodiments of the present disclosure; the protection scope of the present disclosure is defined by the claims and their equivalents.
Claims (3)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110265378.4A CN112818951B (en) | 2021-03-11 | 2021-03-11 | A method of ticket identification |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110265378.4A CN112818951B (en) | 2021-03-11 | 2021-03-11 | A method of ticket identification |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112818951A CN112818951A (en) | 2021-05-18 |
CN112818951B true CN112818951B (en) | 2023-11-21 |
Family
ID=75863141
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110265378.4A Active CN112818951B (en) | 2021-03-11 | 2021-03-11 | A method of ticket identification |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112818951B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113255645B (en) * | 2021-05-21 | 2024-04-23 | 北京有竹居网络技术有限公司 | Text line picture decoding method, device and equipment |
CN113255646B (en) * | 2021-06-02 | 2022-10-18 | 北京理工大学 | A real-time scene text detection method |
CN113298179B (en) * | 2021-06-15 | 2024-05-28 | 南京大学 | Customs commodity abnormal price detection method and device |
CN113657377B (en) * | 2021-07-22 | 2023-11-14 | 西南财经大学 | Structured recognition method for mechanical bill image |
CN113591772B (en) * | 2021-08-10 | 2024-01-19 | 上海杉互健康科技有限公司 | Method, system, equipment and storage medium for structured identification and input of medical information |
CN115019327B (en) * | 2022-06-28 | 2024-03-08 | 珠海金智维信息科技有限公司 | Fragment bill recognition method and system based on fragment bill segmentation and Transformer network |
CN115713777A (en) * | 2023-01-06 | 2023-02-24 | 山东科技大学 | Contract document content identification method |
CN116912852B (en) * | 2023-07-25 | 2024-10-01 | 京东方科技集团股份有限公司 | Method, device and storage medium for identifying text of business card |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108446621A (en) * | 2018-03-14 | 2018-08-24 | 平安科技(深圳)有限公司 | Bank slip recognition method, server and computer readable storage medium |
CN108921166A (en) * | 2018-06-22 | 2018-11-30 | 深源恒际科技有限公司 | Medical bill class text detection recognition method and system based on deep neural network |
CN110097049A (en) * | 2019-04-03 | 2019-08-06 | 中国科学院计算技术研究所 | A kind of natural scene Method for text detection and system |
CN110263694A (en) * | 2019-06-13 | 2019-09-20 | 泰康保险集团股份有限公司 | A kind of bank slip recognition method and device |
CN110399845A (en) * | 2019-07-29 | 2019-11-01 | 上海海事大学 | A method for detecting and recognizing text in continuous segments in images |
CN110807455A (en) * | 2019-09-19 | 2020-02-18 | 平安科技(深圳)有限公司 | Bill detection method, device, device and storage medium based on deep learning |
CN110866495A (en) * | 2019-11-14 | 2020-03-06 | 杭州睿琪软件有限公司 | Bill image recognition method, bill image recognition device, bill image recognition equipment, training method and storage medium |
CN111340034A (en) * | 2020-03-23 | 2020-06-26 | 深圳智能思创科技有限公司 | Text detection and identification method and system for natural scene |
CN111832423A (en) * | 2020-06-19 | 2020-10-27 | 北京邮电大学 | A kind of bill information identification method, device and system |
CN112115934A (en) * | 2020-09-16 | 2020-12-22 | 四川长虹电器股份有限公司 | Bill image text detection method based on deep learning example segmentation |
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108446621A (en) * | 2018-03-14 | 2018-08-24 | 平安科技(深圳)有限公司 | Bank slip recognition method, server and computer readable storage medium |
WO2019174130A1 (en) * | 2018-03-14 | 2019-09-19 | 平安科技(深圳)有限公司 | Bill recognition method, server, and computer readable storage medium |
CN108921166A (en) * | 2018-06-22 | 2018-11-30 | 深源恒际科技有限公司 | Medical bill class text detection recognition method and system based on deep neural network |
CN110097049A (en) * | 2019-04-03 | 2019-08-06 | 中国科学院计算技术研究所 | A kind of natural scene Method for text detection and system |
CN110263694A (en) * | 2019-06-13 | 2019-09-20 | 泰康保险集团股份有限公司 | A kind of bank slip recognition method and device |
CN110399845A (en) * | 2019-07-29 | 2019-11-01 | 上海海事大学 | A method for detecting and recognizing text in continuous segments in images |
CN110807455A (en) * | 2019-09-19 | 2020-02-18 | 平安科技(深圳)有限公司 | Bill detection method, device, device and storage medium based on deep learning |
CN110866495A (en) * | 2019-11-14 | 2020-03-06 | 杭州睿琪软件有限公司 | Bill image recognition method, bill image recognition device, bill image recognition equipment, training method and storage medium |
CN111340034A (en) * | 2020-03-23 | 2020-06-26 | 深圳智能思创科技有限公司 | Text detection and identification method and system for natural scene |
CN111832423A (en) * | 2020-06-19 | 2020-10-27 | 北京邮电大学 | A kind of bill information identification method, device and system |
CN112115934A (en) * | 2020-09-16 | 2020-12-22 | 四川长虹电器股份有限公司 | Bill image text detection method based on deep learning example segmentation |
Non-Patent Citations (5)
Title |
---|
Financial Ticket Intelligent Recognition System Based on Deep Learning; Fukang Tian et al.; arXiv; 1-15 *
Ticket Text Detection and Recognition Based on Deep Learning; Xiuxin Chen et al.; 2019 Chinese Automation Congress; 1-5 *
A Survey of Natural Scene Text Detection and Recognition Based on Deep Learning; Wang Jianxin, Wang Ziya, Tian Xuan; Journal of Software; Vol. 31, No. 05; 1465-1496 *
Design and Implementation of Table-Type Work Order Recognition Based on Deep Learning; Pan Wei, Liu Fengwei; Digital Technology & Application; Vol. 38, No. 07; 150-152 *
A Scene Text Detection Model Based on High-Resolution Convolutional Neural Networks; Chen Miaomiao, Xu Jinhua; Computer Applications and Software; Vol. 37, No. 10; 138-144 *
Also Published As
Publication number | Publication date |
---|---|
CN112818951A (en) | 2021-05-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112818951B (en) | A method of ticket identification | |
CN108664996B (en) | A method and system for ancient text recognition based on deep learning | |
US20230106873A1 (en) | Text extraction method, text extraction model training method, electronic device and storage medium | |
CN107729865A (en) | A kind of handwritten form mathematical formulae identified off-line method and system | |
CN112464781A (en) | Document image key information extraction and matching method based on graph neural network | |
CN115311463B (en) | Category-guided multi-scale decoupling marine remote sensing image text retrieval method and system | |
CN109635808B (en) | A Method for Extracting Chinese Keyword and Context in Natural Scene Images | |
CN111680684B (en) | Spine text recognition method, device and storage medium based on deep learning | |
CN114647715A (en) | Entity recognition method based on pre-training language model | |
CN106127222A (en) | The similarity of character string computational methods of a kind of view-based access control model and similarity determination methods | |
CN111488732A (en) | Deformed keyword detection method, system and related equipment | |
CN109460725A (en) | Receipt consumption details content mergence and extracting method | |
Tang et al. | HRCenterNet: an anchorless approach to Chinese character segmentation in historical documents | |
CN114898372A (en) | Vietnamese scene character detection method based on edge attention guidance | |
CN117373042A (en) | Card image structuring processing method and device | |
Essa et al. | Enhanced technique for Arabic handwriting recognition using deep belief network and a morphological algorithm for solving ligature segmentation | |
CN116645694A (en) | Text-target retrieval method based on dynamic self-evolution information extraction and alignment | |
CN115019319A (en) | A Structured Image Content Recognition Method Based on Dynamic Feature Extraction | |
CN114419636A (en) | Text recognition method, device, equipment and storage medium | |
CN116343237A (en) | Bill identification method based on deep learning and knowledge graph | |
CN111242060B (en) | Method and system for extracting key information of document image | |
CN117975216A (en) | Salient object detection method based on multi-modal feature refinement and fusion | |
CN116110047A (en) | Method and system for constructing structured electronic medical records based on OCR-NER | |
CN116580388A (en) | End-to-end text recognition method | |
CN113221885A (en) | Hierarchical modeling method and system based on whole words and radicals |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |