CN112464845B - Bill recognition method, equipment and computer storage medium - Google Patents


Info

Publication number
CN112464845B
CN112464845B
Authority
CN
China
Prior art keywords
result
image
character recognition
model
text
Prior art date
Legal status
Active
Application number
CN202011415040.4A
Other languages
Chinese (zh)
Other versions
CN112464845A (en)
Inventor
朱焱
姜浩
蔡权雄
牛昕宇
Current Assignee
Shandong Industry Research Kunyun Artificial Intelligence Research Institute Co ltd
Original Assignee
Shandong Industry Research Kunyun Artificial Intelligence Research Institute Co ltd
Priority date
Filing date
Publication date
Application filed by Shandong Industry Research Kunyun Artificial Intelligence Research Institute Co ltd
Priority to CN202011415040.4A
Publication of CN112464845A
Application granted
Publication of CN112464845B
Legal status: Active


Classifications

    • G06V 30/40: Document-oriented image-based pattern recognition
    • G06F 40/232: Orthographic correction, e.g. spell checking or vowelisation
    • G06F 40/30: Semantic analysis
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods (neural networks)
    • G06V 10/267: Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V 10/993: Evaluation of the quality of the acquired pattern
    • G06V 20/63: Scene text, e.g. street names
    • G06V 30/10: Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Quality & Reliability (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses a bill identification method, equipment and a computer storage medium. The method comprises the following steps: preprocessing an image to be recognized to generate an image preprocessing result; inputting the image preprocessing result into a text positioning model, detecting and positioning the text in the image preprocessing result, and generating a text positioning result; inputting the text positioning result into a character recognition model, performing character recognition on the text in the text positioning result, and generating a character recognition result; and inputting the character recognition result into a semantic correction model, performing semantic correction on the characters in the character recognition result, and generating a final recognition result. The invention solves the problem of poor recognition of dot-matrix-printed bills, effectively reduces computer resource occupancy and improves training efficiency, thereby improving the recognition accuracy for dot-matrix-printed bills.

Description

Bill recognition method, equipment and computer storage medium
Technical Field
The invention relates to the field of image recognition, in particular to a bill recognition method, bill recognition equipment and a computer storage medium.
Background
With the improvement of medical services in modern society, a large amount of medical invoice data must be entered into computers and processed every day. Traditionally, the information on each bill is typed in by hand, which is costly and inefficient; the entry workload is heavy and intense, so entry staff tire easily and make mistakes. With the continuous development of optical character recognition technology in the field of pattern recognition, recognition rates have improved greatly, and text in medical invoices can now be recognized quickly and accurately, which plays a key role in the automatic recognition and warehousing of invoices.
At present, the main text recognition methods include template matching and geometric feature extraction. These methods have certain limitations: recognition accuracy is low, and the recognition effect is especially poor in the presence of noise. In addition, existing bill recognition methods all target standard printed fonts, which are regular, free of break points and easy to recognize. Medical institutions, however, mainly print with dot-matrix (needle) printers, and the characters they produce contain break points, so general text recognition methods are no longer applicable.
Disclosure of Invention
In view of the above, a bill identification method, a bill identification device and a computer storage medium are provided to solve the problem of poor recognition of dot-matrix-printed medical bills.
The embodiment of the application provides a bill identification method, which comprises the following steps:
preprocessing an image to be recognized to generate an image preprocessing result;
inputting the image preprocessing result into a text positioning model, detecting and positioning the text in the image preprocessing result, and generating a text positioning result;
inputting the text positioning result into a character recognition model, and performing character recognition on the text in the text positioning result to generate a character recognition result;
and inputting the character recognition result into a semantic correction model, and performing semantic correction on characters in the character recognition result to generate a final recognition result.
In an embodiment, the performing a preprocessing operation on the image to be recognized to generate an image preprocessing result includes:
carrying out image correction on the image to be recognized;
carrying out graying processing on the result after the image correction;
performing threshold segmentation on the grayed result;
and generating an image preprocessing result.
In one embodiment, the training process of the text positioning model includes:
constructing a bill data set;
and training the improved Faster-RCNN by using the bill data set to generate a text positioning model.
In one embodiment, the modified Faster-RCNN comprises:
extracting local features and global features of the bill data set by adopting a multi-scale convolution kernel; and
updating the bill data set weights by using a learning-rate error under an adaptive learning strategy.
In one embodiment, the multi-scale convolution kernel comprises a preset number of 1 × 1 and 3 × 3 convolution kernels replacing the fixed-size 3 × 3 convolution kernel in the original Faster-RCNN model.
In one embodiment, the training process of the character recognition model includes:
constructing a character data set;
and training the improved Alexnet by using the character data set to generate a character recognition model.
In one embodiment, the modified Alexnet network includes:
replacing the 11 × 11 convolution kernel of the 1st convolution layer in the original Alexnet network with a 9 × 9 convolution kernel; and
replacing the 5 × 5 convolution kernel of the 2nd convolution layer in the original Alexnet network with two 3 × 3 convolution kernels.
In one embodiment, the training process of the semantic correction model includes:
acquiring a preset number of medical terms, labeling the medical terms, and generating a medical term corpus;
and training the RNN by using the medical term corpus to generate a semantic correction model.
In an embodiment, the inputting the character recognition result into a semantic correction model, performing semantic correction on the characters in the character recognition result, and generating a final recognition result includes:
when the character recognition result is consistent with the output result of the semantic correction model, the character recognition result is the final recognition result; or,
when the character recognition result is inconsistent with the output result of the semantic correction model, the output result of the semantic correction model is the final recognition result.
To achieve the above object, there is also provided a computer-readable storage medium having stored thereon a ticket recognition method program which, when executed by a processor, implements the steps of any of the methods described above.
In order to achieve the above object, there is also provided a bill identifying apparatus, including a memory, a processor, and a bill identifying method program stored in the memory and executable on the processor, where the processor implements the steps of any one of the above methods when executing the bill identifying method program.
One or more technical solutions provided in the embodiments of the present application have at least the following technical effects or advantages. Preprocessing the image to be recognized to generate an image preprocessing result: this step prepares data for the subsequent steps, processing the image to be recognized into an input format that matches the text positioning model, so as to ensure the positioning model's accuracy. Inputting the image preprocessing result into a text positioning model, detecting and positioning the text, and generating a text positioning result: the text positioning model in this step has a strong capability to capture image features and therefore strong text detection capability. Inputting the text positioning result into a character recognition model, performing character recognition on the text, and generating a character recognition result: the character recognition model in this step has strong character feature extraction capability; it can extract character features from the character image and recognize characters accurately. Inputting the character recognition result into a semantic correction model, performing semantic correction on the characters, and generating a final recognition result: this step semantically corrects the character recognition result, further ensuring the correctness of the final result. The invention solves the problem of poor recognition of dot-matrix-printed bills, effectively reduces computer resource occupancy and improves training efficiency, thereby improving the recognition accuracy for dot-matrix-printed bills.
Drawings
Fig. 1 is a schematic hardware architecture diagram of a bill identification method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of a first embodiment of a bill identification method according to the present application;
FIG. 3 is a schematic flow chart of a bill identification method according to the present application;
FIG. 4 is a schematic flowchart of detailed steps of step S110 in the first embodiment of the document identification method of the present application;
FIG. 5 is a flowchart illustrating the detailed steps of step S120 in the first embodiment of the document identification method of the present application;
FIG. 6 is a schematic diagram of a multi-scale convolution of a bill identification method according to the present application;
FIG. 7 is a flowchart illustrating the detailed step of step S130 in the first embodiment of the document identification method of the present application;
FIG. 8 is a schematic view of a character recognition process of the bill recognition method of the present application;
FIG. 9 is a flowchart illustrating the detailed steps of step S140 in the first embodiment of the document identification method of the present application;
fig. 10 is a flowchart illustrating a second embodiment of the ticket recognition method according to the present application.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The main solution of the embodiment of the invention is as follows: preprocess the image to be recognized to generate an image preprocessing result; input the image preprocessing result into a text positioning model, detect and position the text in it, and generate a text positioning result; input the text positioning result into a character recognition model, perform character recognition on the text, and generate a character recognition result; and input the character recognition result into a semantic correction model, perform semantic correction on the characters, and generate a final recognition result. The invention solves the problem of poor recognition of dot-matrix-printed bills, effectively reduces computer resource occupancy and improves training efficiency, thereby improving the recognition accuracy for dot-matrix-printed bills.
In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.
As shown in fig. 1, the bill identifying device 010 of the present application includes: at least one processor 012 and a memory 011.
The processor 012 may be an integrated circuit chip having signal processing capability. In implementation, the steps of the method may be performed by hardware integrated logic circuits in the processor 012 or by instructions in the form of software. The processor 012 may be a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, and may implement or perform the methods, steps and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The software module may be located in RAM, flash memory, ROM, PROM, EPROM, registers or another storage medium well known in the art. The storage medium is located in the memory 011; the processor 012 reads the information in the memory 011 and completes the steps of the method in combination with its hardware.
It is to be understood that the memory 011 in embodiments of the present invention can be volatile memory, non-volatile memory, or both. The non-volatile memory may be Read-Only Memory (ROM), Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), or flash memory. The volatile memory may be Random Access Memory (RAM), which serves as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The memory 011 of the systems and methods described in connection with the embodiments of the invention is intended to comprise, without being limited to, these and any other suitable types of memory.
Referring to fig. 2, fig. 2 is a first embodiment of a bill identification method of the present application, which includes the following steps:
step S110: and carrying out preprocessing operation on the image to be recognized to generate an image preprocessing result.
The image to be recognized may be a dot-matrix-printed bill, such as a medical bill, an invoice, a receipt or a shopping receipt, without limitation.
The preprocessing operation processes the image to be recognized uniformly so that it conforms to a preset format and can be conveniently input into a model for detection.
The preprocessing operation may include image correction, graying processing, and threshold segmentation, or may be other preprocessing methods, which are not limited herein.
Step S120: and inputting the image preprocessing result into a text positioning model, detecting and positioning the text in the image preprocessing result, and generating a text positioning result.
When identifying characters on an image, the position of a text region needs to be determined, and the process of finding out the character region from the image is called text positioning.
The text positioning model takes the image preprocessing result as input and carries out accurate detection and positioning on the text in the image preprocessing result.
The text positioning result includes attributes such as the position, size, number, etc. of the text region, and is not limited herein.
Step S130: and inputting the text positioning result into a character recognition model, and performing character recognition on the text in the text positioning result to generate a character recognition result.
The character recognition model recognizes the characters in the text positioning result.
Step S140: and inputting the character recognition result into a semantic correction model, and performing semantic correction on characters in the character recognition result to generate a final recognition result.
A semantic correction step is added after character recognition, further ensuring the correctness of the recognition.
Fig. 3 is a schematic diagram showing the whole process of the bill identification method of the present application.
The beneficial effects of the above embodiment are as follows. Preprocessing the image to be recognized to generate an image preprocessing result: this step prepares data for the subsequent steps, processing the image to be recognized into an input format that matches the text positioning model, so as to ensure the positioning model's accuracy. Inputting the image preprocessing result into a text positioning model, detecting and positioning the text, and generating a text positioning result: the text positioning model in this step has a strong capability to capture image features and therefore strong text detection capability. Inputting the text positioning result into a character recognition model, performing character recognition on the text, and generating a character recognition result: the character recognition model in this step has strong character feature extraction capability; it can extract character features from the character image and recognize characters accurately. Inputting the character recognition result into a semantic correction model, performing semantic correction on the characters, and generating a final recognition result: this step semantically corrects the character recognition result, further ensuring the correctness of the final result. The invention solves the problem of poor recognition of dot-matrix-printed medical bills, effectively reduces computer resource occupancy and improves training efficiency, thereby improving the recognition accuracy for dot-matrix-printed medical bills.
Referring to fig. 4, fig. 4 is a detailed refinement step of step S110 in the first embodiment of the document identification method of the present application, where the preprocessing operation is performed on the image to be identified to generate an image preprocessing result, and the method includes:
step S111: and carrying out image correction on the image to be recognized.
Image correction refers to restoration processing performed on a distorted image. Causes of image distortion include: aberrations, distortion and bandwidth limitations of the imaging system; geometric distortion due to imaging device pose and scanning non-linearity; and distortion due to motion blur, radiation distortion or introduced noise. The basic idea of image correction is to build a mathematical model from the cause of the distortion, extract the required information from the contaminated or distorted image signal, and restore the original image by inverting the distortion process. The actual restoration amounts to designing a filter that computes, from the distorted image, an estimate of the true image that is as close to it as possible under a predetermined error criterion.
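For illustration only, the following is a minimal deskew sketch in Python, assuming OpenCV; the patent does not disclose a concrete correction algorithm, so the Hough-based angle estimate and the function name deskew are assumptions, not the claimed method.

```python
import cv2
import numpy as np

def deskew(image: np.ndarray) -> np.ndarray:
    """Estimate the dominant text angle of a color scan and rotate it upright."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=100,
                            minLineLength=100, maxLineGap=10)
    if lines is None:
        return image  # nothing detected; return the input unchanged
    angles = [np.degrees(np.arctan2(y2 - y1, x2 - x1))
              for x1, y1, x2, y2 in lines[:, 0]]
    angle = float(np.median(angles))  # median is robust to stray lines
    h, w = image.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(image, m, (w, h), flags=cv2.INTER_LINEAR,
                          borderMode=cv2.BORDER_REPLICATE)
```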
Step S112: and carrying out graying processing on the result after the image correction.
Graying converts the color medical invoice image into a single-channel image, which facilitates subsequent threshold segmentation.
Step S113: and performing threshold segmentation on the grayed result.
Threshold segmentation is a region-based image segmentation technique whose principle is to divide image pixels into several classes. Image thresholding is the most common traditional segmentation method; because it is simple to implement, computationally cheap and stable, it has become the most basic and most widely used segmentation technique. It is particularly suitable for images in which the object and the background occupy different gray-scale ranges. It not only compresses a great amount of data but also greatly simplifies the analysis and processing steps, and is therefore in many cases a necessary preprocessing step before image analysis, feature extraction and pattern recognition. The purpose of image thresholding is to divide the set of pixels by gray level, each resulting subset forming a region corresponding to the real scene, with consistent properties within each region and differing properties between adjacent regions. Such a division is achieved by choosing one or more threshold values on the gray scale.
The threshold segmentation may employ the Otsu, Niblack or Kittler algorithm, without limitation.
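As a minimal sketch of the graying and thresholding steps, assuming OpenCV and a placeholder file name ("ticket.png"):

```python
import cv2

# Load the corrected bill image directly as single-channel (graying, step S112),
# then binarize it with Otsu's method (threshold segmentation, step S113).
gray = cv2.imread("ticket.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
```

The Niblack or Kittler variants mentioned above would slot in at the same point in place of the Otsu flag.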
Step S114: and generating an image preprocessing result.
Through the preprocessing operations of image correction, graying and threshold segmentation, the image preprocessing result is finally generated.
The image preprocessing process may also include denoising and breakpoint processing of the thresholded result. The core of breakpoint processing is applying Gaussian blur and image enhancement operations to the image to eliminate break points in the characters.
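A sketch of the breakpoint processing described above, assuming OpenCV. The patent names Gaussian blur plus image enhancement but not the exact enhancement, so the morphological closing and re-thresholding used here are assumptions.

```python
import cv2
import numpy as np

def repair_breakpoints(binary: np.ndarray) -> np.ndarray:
    """Smear the gaps left by dot-matrix pins and re-solidify the strokes."""
    blurred = cv2.GaussianBlur(binary, (3, 3), 0)          # bridge small gaps
    kernel = np.ones((2, 2), np.uint8)
    closed = cv2.morphologyEx(blurred, cv2.MORPH_CLOSE, kernel)
    _, repaired = cv2.threshold(closed, 0, 255,
                                cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return repaired
```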
The beneficial effects existing in the above embodiment are as follows: the step of preprocessing the image to be recognized and generating the image preprocessing result is specifically provided, so that the generated image preprocessing result is more accurate and is a data guarantee for the correctness of subsequent text positioning and character recognition.
Referring to fig. 5, fig. 5 is a detailed step of step S120 in the first embodiment of the document identification method of the present application, and the training process of the text positioning model includes:
step S121: and constructing a bill data set.
The tickets may be medical bills, invoices, receipts or shopping receipts, where each ticket database contains a single category of ticket; i.e., if the tickets are medical bills, all tickets in the corresponding database are medical-related.
The tickets may be collected from a medical system.
Step S122: and training the improved Faster-RCNN by using the bill data set to generate a text positioning model.
Faster-RCNN is an object detection algorithm that adds an RPN (Region Proposal Network) on top of Fast-RCNN, greatly increasing detection speed.
The improved Faster-RCNN can be trained with the bill data set to generate a text positioning model.
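The improved Faster-RCNN itself is not published, so the sketch below only shows setting up a stock torchvision Faster-RCNN as a text detector; the two-class head (background vs. text region) and the optimizer settings are assumptions.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Stock Faster-RCNN baseline; the patent's multi-scale and adaptive-learning
# modifications would replace parts of this network.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Re-head the box predictor for two classes: background and text region.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)

optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)
```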
The beneficial effects existing in the above embodiment are as follows: specifically, a training process of the text positioning model is given, and the training effect of the text positioning model is guaranteed.
In one embodiment, the modified Faster-RCNN comprises:
extracting local features and global features of the bill data set by adopting a multi-scale convolution kernel; and
updating the bill data set weights by using a learning-rate error under an adaptive learning strategy.
The multi-scale convolution kernel extracts feature information at different scales by using convolution kernels of different sizes within the same convolution layer. At the same time, deepening the network increases the abstraction of the feature information, improving its ability to describe the target.
The local feature may be a local expression of an image feature, which reflects local characteristics of the image and is suitable for matching, searching and other applications of the image.
The global feature may refer to a feature that can represent the whole image, and the global feature is relative to the local feature of the image and is used for describing the whole features such as the color and the shape of the image or the object.
The advantage of the adaptive learning strategy is that the learning rate varies slightly with each iteration, decreasing as the loss decreases and increasing as the loss increases. A larger learning rate helps training jump out of local minima toward the global minimum, so the network finds the direction of steepest descent more quickly.
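A minimal sketch of this adaptive strategy, assuming a PyTorch-style optimizer; the adjustment factor and bounds are assumptions, since the patent gives no concrete values.

```python
def adapt_lr(optimizer, loss, prev_loss, factor=1.05, lr_min=1e-5, lr_max=1e-2):
    """Nudge the learning rate in the same direction the loss moved."""
    for group in optimizer.param_groups:
        if loss < prev_loss:      # loss fell: shrink the learning rate slightly
            group["lr"] = max(group["lr"] / factor, lr_min)
        elif loss > prev_loss:    # loss rose: grow it to escape local minima
            group["lr"] = min(group["lr"] * factor, lr_max)
```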
The beneficial effects existing in the above embodiment are as follows: the multi-scale convolution kernel can effectively fuse the features of adjacent regions with different sizes of the image, the large-scale convolution kernel extracts the global features of the image, and the small-scale convolution kernel extracts the local features of the image, so that the capability of capturing the image features by a network is stronger, and the text detection capability of the model is greatly improved.
In one embodiment, the multi-scale convolution kernel comprises a preset number of 1 × 1 and 3 × 3 convolution kernels replacing the fixed-size 3 × 3 convolution kernel in the original Faster-RCNN model.
As shown in fig. 6, two 1 × 1 convolution kernels and two 3 × 3 convolution kernels may replace the fixed-size 3 × 3 convolution kernel in the original Faster-RCNN model: the result of one 1 × 1 convolution is fed into a 3 × 3 convolution, and that result is feature-fused with the result of the other 1 × 1 convolution and the result of the 3 × 3 convolution; that is, the obtained global and local features are fused.
The preset number is not limited herein.
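Under these assumptions, the structure of fig. 6 can be sketched in PyTorch as follows; the per-branch channel split and the use of concatenation for the feature fusion are assumptions, since the patent names only the kernel sizes.

```python
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    """Three branches as in fig. 6: 1x1 -> 3x3, plain 1x1, plain 3x3."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        branch_ch = out_ch // 3
        self.branch_1x1_3x3 = nn.Sequential(
            nn.Conv2d(in_ch, branch_ch, kernel_size=1),
            nn.Conv2d(branch_ch, branch_ch, kernel_size=3, padding=1),
        )
        self.branch_1x1 = nn.Conv2d(in_ch, branch_ch, kernel_size=1)
        self.branch_3x3 = nn.Conv2d(in_ch, out_ch - 2 * branch_ch,
                                    kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Feature fusion by channel concatenation of the three branches.
        return torch.cat([self.branch_1x1_3x3(x),
                          self.branch_1x1(x),
                          self.branch_3x3(x)], dim=1)
```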
The beneficial effects existing in the above embodiment are as follows: and specifically, the setting of a multi-scale convolution kernel is given, the capability of feature extraction is enhanced, and the detection and positioning accuracy of the text positioning model is ensured.
Referring to fig. 7, fig. 7 is a detailed step of step S130 in the first embodiment of the document recognition method of the present application, and the training process of the character recognition model includes:
step S131: a character data set is constructed.
Characters are the written symbols of a language.
The character data set contains all the characters that can be collected. In one embodiment, the data set may be a Chinese character data set, for example the Chinese characters included in a modern Chinese dictionary.
Step S132: and training the improved Alexnet by using the character data set to generate a character recognition model.
Alexnet was the first to apply a deep convolutional neural network structure to a large-scale image data set, and implemented an efficient GPU convolution structure.
The improved Alexnet can be trained with the character data set to generate a character recognition model.
The beneficial effects existing in the above embodiment are as follows: after the improved Alexnet is trained, the network has learned deep features of the dot-matrix font and can recognize character images. The trained Alexnet network, i.e. the generated character recognition model, has strong character feature extraction capability: it can extract character features from the character image and thereby recognize the characters.
In one embodiment, the modified Alexnet network includes:
replacing the 11 × 11 convolution kernel of the 1st convolution layer in the original Alexnet network with a 9 × 9 convolution kernel; and
replacing the 5 × 5 convolution kernel of the 2nd convolution layer in the original Alexnet network with two 3 × 3 convolution kernels.
As shown in fig. 8, the convolution kernel of the 1st convolution layer of the Alexnet network is changed from 11 × 11 to 9 × 9; the 5 × 5 convolution kernel of the 2nd layer is replaced with two 3 × 3 kernels; and the number of feature maps in each layer is reduced.
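A PyTorch sketch of the modified front layers described above; the input channel count (1, for grayscale character crops), the strides and the reduced feature-map numbers are assumptions, since only the kernel-size changes are specified.

```python
import torch.nn as nn

features = nn.Sequential(
    nn.Conv2d(1, 48, kernel_size=9, stride=4, padding=2),  # was 11x11 in AlexNet
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(48, 128, kernel_size=3, padding=1),   # first 3x3 of the pair
    nn.ReLU(inplace=True),
    nn.Conv2d(128, 128, kernel_size=3, padding=1),  # second 3x3, replacing 5x5
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
)
```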
The beneficial effects of the above embodiment are as follows: the occupation of computer resources is reduced by changing the size of the convolution kernel, the number of the characteristic diagram and the convolution layer, the training efficiency is improved, and the response speed of the character recognition model is accelerated.
Referring to fig. 9, fig. 9 is a detailed step of step S140 in the first embodiment of the document identification method of the present application, and the training process of the semantic correction model includes:
step S141: acquiring a preset number of medical terms, labeling the medical terms, and generating a medical term corpus.
A preset number of medical terms included in the medical field may be obtained, and the medical terms are labeled to form a training set, i.e., a medical term corpus, required by the training model.
Step S142: and training the RNN by using the medical term corpus to generate a semantic correction model.
A sentence, clause or phrase can be regarded as a sequence of related elements, and RNNs are well suited to sequence prediction problems. The core idea of an RNN is to build, through its loop structure, connections between earlier and later events, predicting what is about to occur from what has already occurred. In this application, the content of a medical bill usually takes the form of words and phrases, so there must be a certain relationship between adjacent characters.
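A sketch of such a model as a character-level language model; the patent says only "RNN", so the LSTM cell, embedding size and vocabulary handling here are assumptions.

```python
import torch
import torch.nn as nn

class TermLM(nn.Module):
    """Character-level language model over the medical term corpus."""
    def __init__(self, vocab_size: int, embed_dim: int = 128, hidden: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        out, _ = self.rnn(self.embed(tokens))
        return self.head(out)  # next-character logits at each position
```

Trained on the medical term corpus with next-character prediction, such a model assigns a probability to any recognized string, and low-probability strings signal likely recognition errors.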
The beneficial effects of the above embodiment are as follows: and specifically, a training process of the semantic correction model is given, and the effect of the semantic correction model is ensured.
Referring to fig. 10, fig. 10 shows a second embodiment of the bill identification method of the present application, in which the step of inputting the character recognition result into a semantic correction model, performing semantic correction on the characters in the character recognition result, and generating a final recognition result includes:
step S210: preprocessing an image to be recognized to generate an image preprocessing result;
step S220: inputting the image preprocessing result into a text positioning model, detecting and positioning the text in the image preprocessing result, and generating a text positioning result;
step S230: inputting the text positioning result into a character recognition model, and performing character recognition on the text in the text positioning result to generate a character recognition result;
step S240: when the character recognition result is consistent with the output result of the semantic correction model, the character recognition result is a final recognition result; or the like, or, alternatively,
and when the character recognition result is inconsistent with the output result of the semantic correction model, the output result of the semantic correction model is the final recognition result.
Compared with the first embodiment, the second embodiment includes step S240, and other steps are the same as the first embodiment and are not repeated herein.
In this embodiment, because the captured bill image is unclear or bent, the character recognition model may misread the medical term 维生素 (vitamin) as the visually similar non-word 维生紊; the characters 素 and 紊 are so alike in shape that the improved Alexnet alone cannot resolve the confusion. However, because the probability that the semantic correction model assigns to 维生素 is far greater than that assigned to 维生紊, a recognition error is detected, and the wrong result 维生紊 can be corrected to 维生素.
That is, when the character recognition result 维生紊 is inconsistent with the semantic correction model's output 维生素, the output 维生素 becomes the final recognition result, completing the semantic correction process.
In a specific implementation, a preset number of characters, or a combination of characters, from the character recognition result is taken as the input to the semantic correction model, and the semantic correction result is compared with the character recognition result. If the output of the semantic correction model is consistent with the character recognition result, the character recognition result is the final recognition result; if not, the output of the semantic correction model is taken as the final recognition result.
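The selection rule reduces to a short sketch; correct() is a hypothetical helper standing in for decoding the semantic model's most probable string for the given input.

```python
def finalize(recognized: str, correction_model) -> str:
    """Apply step S240: keep the recognized string, or the corrected one."""
    corrected = correction_model.correct(recognized)  # hypothetical API
    if corrected == recognized:
        return recognized   # consistent: recognition result is final
    return corrected        # inconsistent: semantic model's output is final
```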
The present application further provides a computer-readable storage medium having stored thereon a ticket recognition method program, which when executed by a processor, implements the steps of any of the above-described methods.
The application also provides bill identification equipment, which comprises a memory, a processor and a bill identification method program which is stored on the memory and can run on the processor, wherein the processor realizes the steps of any one of the methods when executing the bill identification method program.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all changes and modifications that fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (6)

1. A method of bill identification, the method comprising:
preprocessing an image to be recognized to generate an image preprocessing result;
inputting the image preprocessing result into a text positioning model, detecting and positioning the text in the image preprocessing result, and generating a text positioning result, wherein the text positioning model is generated based on an improved Faster-RCNN, and the improved Faster-RCNN comprises: extracting local features and global features of the bill data set by adopting a multi-scale convolution kernel, and updating the bill data set weights by using a learning-rate error under an adaptive learning strategy, wherein the multi-scale convolution kernel comprises a preset number of 1 × 1 and 3 × 3 convolution kernels replacing the fixed-size 3 × 3 convolution kernel in the original Faster-RCNN model, the result of a 1 × 1 convolution is input into a 3 × 3 convolution kernel to obtain a convolution result, and that convolution result is feature-fused with the result of the other 1 × 1 convolution and the result of the 3 × 3 convolution;
inputting the text positioning result into a character recognition model, performing character recognition on the text in the text positioning result, and generating a character recognition result, wherein the character recognition model is generated based on an improved Alexnet, and the improved Alexnet network comprises: replacing the 11 × 11 convolution kernel of the 1st convolution layer in the original Alexnet network with a 9 × 9 convolution kernel, and replacing the 5 × 5 convolution kernel of the 2nd convolution layer in the original Alexnet network with two 3 × 3 convolution kernels;
inputting the character recognition result into a semantic correction model and performing semantic correction on the characters in the character recognition result, wherein, when the character recognition result is consistent with the output result of the semantic correction model, the character recognition result is the final recognition result, or, when the character recognition result is inconsistent with the output result of the semantic correction model, the output result of the semantic correction model is the final recognition result, and wherein the training process of the semantic correction model comprises: acquiring a preset number of medical terms, labeling the medical terms to generate a medical term corpus, and training an RNN with the medical term corpus to generate the semantic correction model.
2. The bill identifying method according to claim 1, wherein the pre-processing operation on the image to be identified to generate an image pre-processing result comprises:
carrying out image correction on the image to be recognized;
carrying out graying processing on the result after the image correction;
performing threshold segmentation on the grayed result;
and generating an image preprocessing result.
3. The bill recognition method according to claim 1, wherein the training process of the text positioning model comprises:
constructing a bill data set;
and training the improved Faster-RCNN by using the bill data set to generate a text positioning model.
4. The bill recognition method according to claim 1, wherein the training process of the character recognition model comprises:
constructing a character data set;
and training the improved Alexnet by using the character data set to generate a character recognition model.
5. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a ticket identification method program which, when executed by a processor, implements the steps of the method of any one of claims 1-4.
6. A bill identifying apparatus comprising a memory, a processor and a bill identifying method program stored on the memory and executable on the processor, the processor implementing the steps of the method of any one of claims 1 to 4 when executing the bill identifying method program.
CN202011415040.4A 2020-12-04 2020-12-04 Bill recognition method, equipment and computer storage medium Active CN112464845B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011415040.4A CN112464845B (en) 2020-12-04 2020-12-04 Bill recognition method, equipment and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011415040.4A CN112464845B (en) 2020-12-04 2020-12-04 Bill recognition method, equipment and computer storage medium

Publications (2)

Publication Number Publication Date
CN112464845A CN112464845A (en) 2021-03-09
CN112464845B (en) 2022-09-16

Family

ID=74801144

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011415040.4A Active CN112464845B (en) 2020-12-04 2020-12-04 Bill recognition method, equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN112464845B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113012265B (en) * 2021-04-22 2024-04-30 中国平安人寿保险股份有限公司 Method, apparatus, computer device and medium for generating needle-type printed character image
CN113435437A (en) * 2021-06-24 2021-09-24 随锐科技集团股份有限公司 Method and device for identifying state of switch on/off indicator and storage medium
CN113807416B (en) * 2021-08-30 2024-04-05 国泰新点软件股份有限公司 Model training method and device, electronic equipment and storage medium
CN114328831A (en) * 2021-12-24 2022-04-12 江苏银承网络科技股份有限公司 Bill information identification and error correction method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107844740A (en) * 2017-09-05 2018-03-27 中国地质调查局西安地质调查中心 A kind of offline handwriting, printing Chinese character recognition methods and system
CN110147788A (en) * 2019-05-27 2019-08-20 东北大学 A kind of metal plate and belt Product labelling character recognition method based on feature enhancing CRNN

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108446621A (en) * 2018-03-14 2018-08-24 平安科技(深圳)有限公司 Bank slip recognition method, server and computer readable storage medium
SG10201904825XA (en) * 2019-05-28 2019-10-30 Alibaba Group Holding Ltd Automatic optical character recognition (ocr) correction
CN111062397A (en) * 2019-12-18 2020-04-24 厦门商集网络科技有限责任公司 Intelligent bill processing system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107844740A (en) * 2017-09-05 2018-03-27 中国地质调查局西安地质调查中心 A kind of offline handwriting, printing Chinese character recognition methods and system
CN110147788A (en) * 2019-05-27 2019-08-20 东北大学 A kind of metal plate and belt Product labelling character recognition method based on feature enhancing CRNN

Also Published As

Publication number Publication date
CN112464845A (en) 2021-03-09


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant