CN112464845B - Bill recognition method, equipment and computer storage medium - Google Patents


Info

Publication number
CN112464845B
CN112464845B
Authority
CN
China
Prior art keywords
result
image
character recognition
model
text
Prior art date
Legal status
Active
Application number
CN202011415040.4A
Other languages
Chinese (zh)
Other versions
CN112464845A (en)
Inventor
朱焱
姜浩
蔡权雄
牛昕宇
Current Assignee
Shandong Industry Research Kunyun Artificial Intelligence Research Institute Co ltd
Original Assignee
Shandong Industry Research Kunyun Artificial Intelligence Research Institute Co ltd
Priority date
Filing date
Publication date
Application filed by Shandong Industry Research Kunyun Artificial Intelligence Research Institute Co ltd
Priority to CN202011415040.4A
Publication of CN112464845A
Application granted
Publication of CN112464845B
Legal status: Active


Classifications

    • G06V 30/40: Document-oriented image-based pattern recognition
    • G06F 40/232: Orthographic correction, e.g. spell checking or vowelisation
    • G06F 40/30: Semantic analysis
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods (neural networks)
    • G06V 10/267: Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V 10/993: Evaluation of the quality of the acquired pattern
    • G06V 20/63: Scene text, e.g. street names
    • G06V 30/10: Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Quality & Reliability (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses a bill identification method, equipment and a computer storage medium. The method comprises the following steps: preprocessing an image to be recognized to generate an image preprocessing result; inputting the image preprocessing result into a text positioning model, detecting and positioning the text in the image preprocessing result, and generating a text positioning result; inputting the text positioning result into a character recognition model, performing character recognition on the text in the text positioning result, and generating a character recognition result; and inputting the character recognition result into a semantic correction model, performing semantic correction on the characters in the character recognition result, and generating a final recognition result. The invention solves the problem of poor recognition of dot-matrix-printed bills, effectively reduces computer resource occupancy and improves training efficiency, thereby improving the recognition accuracy for dot-matrix-printed bills.

Description

Bill recognition method, equipment and computer storage medium
Technical Field
The invention relates to the field of image recognition, in particular to a bill recognition method, bill recognition equipment and a computer storage medium.
Background
With the improvement of medical services in modern society, a large amount of medical invoice data must be entered into computers and processed every day. Traditionally, the information on each bill is typed in by hand, which is costly and inefficient; the entry workload is heavy and intense, so entry staff tire easily and make mistakes. With the continuous development of optical character recognition technology in the field of pattern recognition, recognition rates have improved greatly, and text in medical invoices can now be recognized quickly and accurately, which plays a key role in the automatic recognition and warehousing of invoices.
At present, the main text recognition methods include template matching and geometric feature extraction. These methods have certain limitations: recognition accuracy is low, and the recognition effect is especially poor in the presence of noise. In addition, existing bill recognition methods all target standard printed fonts, which are regular, free of break points and easy to recognize. Medical institutions, however, mainly print with dot-matrix (needle) printers, and the characters they produce contain break points, so general text recognition methods are no longer applicable.
Disclosure of Invention
In view of the above, a bill identification method, a bill identification device and a computer storage medium are provided to solve the problem of poor recognition of dot-matrix-printed medical bills.
The embodiment of the application provides a bill identification method, which comprises the following steps:
preprocessing an image to be recognized to generate an image preprocessing result;
inputting the image preprocessing result into a text positioning model, detecting and positioning the text in the image preprocessing result, and generating a text positioning result;
inputting the text positioning result into a character recognition model, and performing character recognition on the text in the text positioning result to generate a character recognition result;
and inputting the character recognition result into a semantic correction model, and performing semantic correction on characters in the character recognition result to generate a final recognition result.
In an embodiment, the performing a preprocessing operation on the image to be recognized to generate an image preprocessing result includes:
carrying out image correction on the image to be recognized;
carrying out graying processing on the result after the image correction;
performing threshold segmentation on the grayed result;
and generating an image preprocessing result.
In one embodiment, the training process of the text positioning model includes:
constructing a bill data set;
and training the improved Faster-RCNN by using the bill data set to generate a text positioning model.
In one embodiment, the modified Faster-RCNN comprises:
extracting local features and global features of the bill data set by adopting a multi-scale convolution kernel; and
updating the bill data set weights by using a learning-rate error under an adaptive learning strategy.
In one embodiment, the multi-scale convolution kernel comprises a preset number of 1 × 1 and 3 × 3 convolution kernels replacing the fixed-size 3 × 3 convolution kernel in the original Faster-RCNN model.
In one embodiment, the training process of the character recognition model includes:
constructing a character data set;
and training the improved Alexnet by using the character data set to generate a character recognition model.
In one embodiment, the modified Alexnet network includes:
replacing the 11 × 11 convolution kernel of the 1st convolution layer in the original Alexnet network with a 9 × 9 convolution kernel; and
replacing the 5 × 5 convolution kernel of the 2nd convolution layer in the original Alexnet network with two 3 × 3 convolution kernels.
In one embodiment, the training process of the semantic correction model includes:
acquiring a preset number of medical terms, labeling the medical terms, and generating a medical term corpus;
and training the RNN by using the medical term corpus to generate a semantic correction model.
In an embodiment, the inputting the character recognition result into a semantic correction model, performing semantic correction on the characters in the character recognition result, and generating a final recognition result includes:
when the character recognition result is consistent with the output result of the semantic correction model, the character recognition result is the final recognition result; or,
when the character recognition result is inconsistent with the output result of the semantic correction model, the output result of the semantic correction model is the final recognition result.
To achieve the above object, there is also provided a computer-readable storage medium having stored thereon a ticket recognition method program which, when executed by a processor, implements the steps of any of the methods described above.
In order to achieve the above object, there is also provided a bill identifying apparatus, including a memory, a processor, and a bill identifying method program stored in the memory and executable on the processor, where the processor implements the steps of any one of the above methods when executing the bill identifying method program.
One or more technical solutions provided in the embodiments of the present application have at least the following technical effects or advantages. Preprocessing the image to be recognized to generate an image preprocessing result: this step prepares data for the subsequent steps, processing the image to be recognized into an input format that matches the text positioning model, so as to ensure the positioning model's accuracy. Inputting the image preprocessing result into a text positioning model, detecting and positioning the text, and generating a text positioning result: the text positioning model in this step has a strong capability to capture image features and therefore strong text detection capability. Inputting the text positioning result into a character recognition model, performing character recognition on the text, and generating a character recognition result: the character recognition model in this step has strong character feature extraction capability; it can extract character features from the character image and recognize characters accurately. Inputting the character recognition result into a semantic correction model, performing semantic correction on the characters, and generating a final recognition result: this step semantically corrects the character recognition result, further ensuring the correctness of the final result. The invention solves the problem of poor recognition of dot-matrix-printed bills, effectively reduces computer resource occupancy and improves training efficiency, thereby improving the recognition accuracy for dot-matrix-printed bills.
Drawings
Fig. 1 is a schematic hardware architecture diagram of a bill identification method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of a first embodiment of a bill identification method according to the present application;
FIG. 3 is a schematic flow chart of a bill identification method according to the present application;
FIG. 4 is a schematic flowchart of detailed steps of step S110 in the first embodiment of the document identification method of the present application;
FIG. 5 is a flowchart illustrating the detailed steps of step S120 in the first embodiment of the document identification method of the present application;
FIG. 6 is a schematic diagram of a multi-scale convolution of a bill identification method according to the present application;
FIG. 7 is a flowchart illustrating the detailed step of step S130 in the first embodiment of the document identification method of the present application;
FIG. 8 is a schematic view of a character recognition process of the bill recognition method of the present application;
FIG. 9 is a flowchart illustrating the detailed steps of step S140 in the first embodiment of the document identification method of the present application;
fig. 10 is a flowchart illustrating a second embodiment of the ticket recognition method according to the present application.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The main solution of the embodiment of the invention is as follows: preprocess the image to be recognized to generate an image preprocessing result; input the image preprocessing result into a text positioning model, detect and position the text in it, and generate a text positioning result; input the text positioning result into a character recognition model, perform character recognition on the text, and generate a character recognition result; and input the character recognition result into a semantic correction model, perform semantic correction on the characters, and generate a final recognition result. The invention solves the problem of poor recognition of dot-matrix-printed bills, effectively reduces computer resource occupancy and improves training efficiency, thereby improving the recognition accuracy for dot-matrix-printed bills.
In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.
As shown in fig. 1, the bill identifying device 010 of the present application includes: at least one processor 012 and a memory 011.
The processor 012 may be an integrated circuit chip having signal processing capability. In implementation, the steps of the method may be performed by hardware integrated logic circuits in the processor 012 or by instructions in the form of software. The processor 012 may be a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, and may implement or perform the methods, steps and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The software module may be located in RAM, flash memory, ROM, PROM, EPROM, registers or another storage medium well known in the art. The storage medium is located in the memory 011; the processor 012 reads the information in the memory 011 and completes the steps of the method in combination with its hardware.
It is to be understood that the memory 011 in embodiments of the present invention can be volatile memory, non-volatile memory, or both. The non-volatile memory may be Read-Only Memory (ROM), Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), or flash memory. The volatile memory may be Random Access Memory (RAM), which serves as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The memory 011 of the systems and methods described in connection with the embodiments of the invention is intended to comprise, without being limited to, these and any other suitable types of memory.
Referring to fig. 2, fig. 2 is a first embodiment of a bill identification method of the present application, which includes the following steps:
step S110: and carrying out preprocessing operation on the image to be recognized to generate an image preprocessing result.
The image to be recognized may be a dot-matrix-printed bill, such as a medical bill, an invoice, a receipt or a shopping receipt, without limitation.
The preprocessing operation processes the image to be recognized uniformly so that it conforms to a preset format and can be conveniently input into a model for detection.
The preprocessing operation may include image correction, graying processing, and threshold segmentation, or may be other preprocessing methods, which are not limited herein.
Step S120: and inputting the image preprocessing result into a text positioning model, detecting and positioning the text in the image preprocessing result, and generating a text positioning result.
When identifying characters on an image, the position of a text region needs to be determined, and the process of finding out the character region from the image is called text positioning.
The text positioning model takes the image preprocessing result as input and carries out accurate detection and positioning on the text in the image preprocessing result.
The text positioning result includes attributes such as the position, size, number, etc. of the text region, and is not limited herein.
Step S130: and inputting the text positioning result into a character recognition model, and performing character recognition on the text in the text positioning result to generate a character recognition result.
The character recognition model recognizes the characters in the text positioning result.
Step S140: and inputting the character recognition result into a semantic correction model, and performing semantic correction on characters in the character recognition result to generate a final recognition result.
A semantic correction step is added after character recognition, further ensuring the correctness of the recognition.
Fig. 3 is a schematic diagram showing the whole process of the bill identification method of the present application.
The beneficial effects of the above embodiment are as follows. Preprocessing the image to be recognized to generate an image preprocessing result: this step prepares data for the subsequent steps, processing the image to be recognized into an input format that matches the text positioning model, so as to ensure the positioning model's accuracy. Inputting the image preprocessing result into a text positioning model, detecting and positioning the text, and generating a text positioning result: the text positioning model in this step has a strong capability to capture image features and therefore strong text detection capability. Inputting the text positioning result into a character recognition model, performing character recognition on the text, and generating a character recognition result: the character recognition model in this step has strong character feature extraction capability; it can extract character features from the character image and recognize characters accurately. Inputting the character recognition result into a semantic correction model, performing semantic correction on the characters, and generating a final recognition result: this step semantically corrects the character recognition result, further ensuring the correctness of the final result. The invention solves the problem of poor recognition of dot-matrix-printed medical bills, effectively reduces computer resource occupancy and improves training efficiency, thereby improving the recognition accuracy for dot-matrix-printed medical bills.
Referring to fig. 4, fig. 4 is a detailed refinement step of step S110 in the first embodiment of the document identification method of the present application, where the preprocessing operation is performed on the image to be identified to generate an image preprocessing result, and the method includes:
step S111: and carrying out image correction on the image to be recognized.
Image correction refers to restoration processing performed on a distorted image. Causes of image distortion include: aberrations, distortion and bandwidth limitations of the imaging system; geometric distortion due to imaging device pose and scanning non-linearity; and distortion due to motion blur, radiation distortion or introduced noise. The basic idea of image correction is to build a mathematical model from the cause of the distortion, extract the required information from the contaminated or distorted image signal, and restore the original image by inverting the distortion process. The actual restoration amounts to designing a filter that computes, from the distorted image, an estimate of the true image that is as close to it as possible under a predetermined error criterion.
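For illustration only, the following is a minimal deskew sketch in Python, assuming OpenCV; the patent does not disclose a concrete correction algorithm, so the Hough-based angle estimate and the function name deskew are assumptions, not the claimed method.

```python
import cv2
import numpy as np

def deskew(image: np.ndarray) -> np.ndarray:
    """Estimate the dominant text angle of a color scan and rotate it upright."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=100,
                            minLineLength=100, maxLineGap=10)
    if lines is None:
        return image  # nothing detected; return the input unchanged
    angles = [np.degrees(np.arctan2(y2 - y1, x2 - x1))
              for x1, y1, x2, y2 in lines[:, 0]]
    angle = float(np.median(angles))  # median is robust to stray lines
    h, w = image.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(image, m, (w, h), flags=cv2.INTER_LINEAR,
                          borderMode=cv2.BORDER_REPLICATE)
```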
Step S112: and carrying out graying processing on the result after the image correction.
Graying converts the color medical invoice image into a single-channel image, which facilitates subsequent threshold segmentation.
Step S113: and performing threshold segmentation on the grayed result.
Threshold segmentation is a region-based image segmentation technique whose principle is to divide image pixels into several classes. Image thresholding is the most common traditional segmentation method; because it is simple to implement, computationally cheap and stable, it has become the most basic and most widely used segmentation technique. It is particularly suitable for images in which the object and the background occupy different gray-scale ranges. It not only compresses a great amount of data but also greatly simplifies the analysis and processing steps, and is therefore in many cases a necessary preprocessing step before image analysis, feature extraction and pattern recognition. The purpose of image thresholding is to divide the set of pixels by gray level, each resulting subset forming a region corresponding to the real scene, with consistent properties within each region and differing properties between adjacent regions. Such a division is achieved by choosing one or more threshold values on the gray scale.
The threshold segmentation may employ the Otsu, Niblack or Kittler algorithm, without limitation.
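As a minimal sketch of the graying and thresholding steps, assuming OpenCV and a placeholder file name ("ticket.png"):

```python
import cv2

# Load the corrected bill image directly as single-channel (graying, step S112),
# then binarize it with Otsu's method (threshold segmentation, step S113).
gray = cv2.imread("ticket.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
```

The Niblack or Kittler variants mentioned above would slot in at the same point in place of the Otsu flag.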
Step S114: and generating an image preprocessing result.
Through the preprocessing operations of image correction, graying and threshold segmentation, the image preprocessing result is finally generated.
The image preprocessing process may also include denoising and breakpoint processing of the thresholded result. The core of breakpoint processing is applying Gaussian blur and image enhancement operations to the image to eliminate break points in the characters.
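A sketch of the breakpoint processing described above, assuming OpenCV. The patent names Gaussian blur plus image enhancement but not the exact enhancement, so the morphological closing and re-thresholding used here are assumptions.

```python
import cv2
import numpy as np

def repair_breakpoints(binary: np.ndarray) -> np.ndarray:
    """Smear the gaps left by dot-matrix pins and re-solidify the strokes."""
    blurred = cv2.GaussianBlur(binary, (3, 3), 0)          # bridge small gaps
    kernel = np.ones((2, 2), np.uint8)
    closed = cv2.morphologyEx(blurred, cv2.MORPH_CLOSE, kernel)
    _, repaired = cv2.threshold(closed, 0, 255,
                                cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return repaired
```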
The beneficial effects existing in the above embodiment are as follows: the step of preprocessing the image to be recognized and generating the image preprocessing result is specifically provided, so that the generated image preprocessing result is more accurate and is a data guarantee for the correctness of subsequent text positioning and character recognition.
Referring to fig. 5, fig. 5 is a detailed step of step S120 in the first embodiment of the document identification method of the present application, and the training process of the text positioning model includes:
step S121: and constructing a bill data set.
The tickets may be medical bills, invoices, receipts or shopping receipts, where each ticket database contains a single category of ticket; i.e., if the tickets are medical bills, all tickets in the corresponding database are medical-related.
The tickets may be collected from a medical system.
Step S122: and training the improved Faster-RCNN by using the bill data set to generate a text positioning model.
Faster-RCNN is an object detection algorithm that adds an RPN (Region Proposal Network) on top of Fast-RCNN, greatly increasing detection speed.
The improved Faster-RCNN can be trained with the bill data set to generate a text positioning model.
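The improved Faster-RCNN itself is not published, so the sketch below only shows setting up a stock torchvision Faster-RCNN as a text detector; the two-class head (background vs. text region) and the optimizer settings are assumptions.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Stock Faster-RCNN baseline; the patent's multi-scale and adaptive-learning
# modifications would replace parts of this network.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Re-head the box predictor for two classes: background and text region.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)

optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)
```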
The beneficial effects existing in the above embodiment are as follows: specifically, a training process of the text positioning model is given, and the training effect of the text positioning model is guaranteed.
In one embodiment, the modified Faster-RCNN comprises:
extracting local features and global features of the bill data set by adopting a multi-scale convolution kernel; and
updating the bill data set weights by using a learning-rate error under an adaptive learning strategy.
The multi-scale convolution kernel extracts feature information at different scales by using convolution kernels of different sizes within the same convolution layer. At the same time, deepening the network increases the abstraction of the feature information, improving its ability to describe the target.
The local feature may be a local expression of an image feature, which reflects local characteristics of the image and is suitable for matching, searching and other applications of the image.
The global feature may refer to a feature that can represent the whole image, and the global feature is relative to the local feature of the image and is used for describing the whole features such as the color and the shape of the image or the object.
The advantage of the adaptive learning strategy is that the learning rate varies slightly with each iteration, decreasing as the loss decreases and increasing as the loss increases. A larger learning rate helps training jump out of local minima toward the global minimum, so the network finds the direction of steepest descent more quickly.
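A minimal sketch of this adaptive strategy, assuming a PyTorch-style optimizer; the adjustment factor and bounds are assumptions, since the patent gives no concrete values.

```python
def adapt_lr(optimizer, loss, prev_loss, factor=1.05, lr_min=1e-5, lr_max=1e-2):
    """Nudge the learning rate in the same direction the loss moved."""
    for group in optimizer.param_groups:
        if loss < prev_loss:      # loss fell: shrink the learning rate slightly
            group["lr"] = max(group["lr"] / factor, lr_min)
        elif loss > prev_loss:    # loss rose: grow it to escape local minima
            group["lr"] = min(group["lr"] * factor, lr_max)
```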
The beneficial effects existing in the above embodiment are as follows: the multi-scale convolution kernel can effectively fuse the features of adjacent regions with different sizes of the image, the large-scale convolution kernel extracts the global features of the image, and the small-scale convolution kernel extracts the local features of the image, so that the capability of capturing the image features by a network is stronger, and the text detection capability of the model is greatly improved.
In one embodiment, the multi-scale convolution kernel comprises a preset number of 1 × 1 and 3 × 3 convolution kernels replacing the fixed-size 3 × 3 convolution kernel in the original Faster-RCNN model.
As shown in fig. 6, two 1 × 1 convolution kernels and two 3 × 3 convolution kernels may replace the fixed-size 3 × 3 convolution kernel in the original Faster-RCNN model: the result of one 1 × 1 convolution is fed into a 3 × 3 convolution, and that result is feature-fused with the result of the other 1 × 1 convolution and the result of the 3 × 3 convolution; that is, the obtained global and local features are fused.
The preset number is not limited herein.
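Under these assumptions, the structure of fig. 6 can be sketched in PyTorch as follows; the per-branch channel split and the use of concatenation for the feature fusion are assumptions, since the patent names only the kernel sizes.

```python
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    """Three branches as in fig. 6: 1x1 -> 3x3, plain 1x1, plain 3x3."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        branch_ch = out_ch // 3
        self.branch_1x1_3x3 = nn.Sequential(
            nn.Conv2d(in_ch, branch_ch, kernel_size=1),
            nn.Conv2d(branch_ch, branch_ch, kernel_size=3, padding=1),
        )
        self.branch_1x1 = nn.Conv2d(in_ch, branch_ch, kernel_size=1)
        self.branch_3x3 = nn.Conv2d(in_ch, out_ch - 2 * branch_ch,
                                    kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Feature fusion by channel concatenation of the three branches.
        return torch.cat([self.branch_1x1_3x3(x),
                          self.branch_1x1(x),
                          self.branch_3x3(x)], dim=1)
```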
The beneficial effects existing in the above embodiment are as follows: and specifically, the setting of a multi-scale convolution kernel is given, the capability of feature extraction is enhanced, and the detection and positioning accuracy of the text positioning model is ensured.
Referring to fig. 7, fig. 7 is a detailed step of step S130 in the first embodiment of the document recognition method of the present application, and the training process of the character recognition model includes:
step S131: a character data set is constructed.
Characters are the written symbols of a language.
The character data set contains all the characters that can be collected. In one embodiment, the data set may be a Chinese character data set, for example the Chinese characters included in a modern Chinese dictionary.
Step S132: and training the improved Alexnet by using the character data set to generate a character recognition model.
Alexnet was the first to apply a deep convolutional neural network structure to a large-scale image data set, and implemented an efficient GPU convolution structure.
The improved Alexnet can be trained with the character data set to generate a character recognition model.
The beneficial effects existing in the above embodiment are as follows: after the improved Alexnet is trained, the network has learned deep features of the dot-matrix font and can recognize character images. The trained Alexnet network, i.e. the generated character recognition model, has strong character feature extraction capability: it can extract character features from the character image and thereby recognize the characters.
In one embodiment, the modified Alexnet network includes:
replacing the 11 × 11 convolution kernel of the 1st convolution layer in the original Alexnet network with a 9 × 9 convolution kernel; and
replacing the 5 × 5 convolution kernel of the 2nd convolution layer in the original Alexnet network with two 3 × 3 convolution kernels.
As shown in fig. 8, the convolution kernel of the 1st convolution layer of the Alexnet network is changed from 11 × 11 to 9 × 9; the 5 × 5 convolution kernel of the 2nd layer is replaced with two 3 × 3 kernels; and the number of feature maps in each layer is reduced.
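A PyTorch sketch of the modified front layers described above; the input channel count (1, for grayscale character crops), the strides and the reduced feature-map numbers are assumptions, since only the kernel-size changes are specified.

```python
import torch.nn as nn

features = nn.Sequential(
    nn.Conv2d(1, 48, kernel_size=9, stride=4, padding=2),  # was 11x11 in AlexNet
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(48, 128, kernel_size=3, padding=1),   # first 3x3 of the pair
    nn.ReLU(inplace=True),
    nn.Conv2d(128, 128, kernel_size=3, padding=1),  # second 3x3, replacing 5x5
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
)
```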
The beneficial effects of the above embodiment are as follows: the occupation of computer resources is reduced by changing the size of the convolution kernel, the number of the characteristic diagram and the convolution layer, the training efficiency is improved, and the response speed of the character recognition model is accelerated.
Referring to fig. 9, fig. 9 is a detailed step of step S140 in the first embodiment of the document identification method of the present application, and the training process of the semantic correction model includes:
step S141: acquiring a preset number of medical terms, labeling the medical terms, and generating a medical term corpus.
A preset number of medical terms included in the medical field may be obtained, and the medical terms are labeled to form a training set, i.e., a medical term corpus, required by the training model.
Step S142: and training the RNN by using the medical term corpus to generate a semantic correction model.
A sentence, clause or phrase can be regarded as a sequence of related elements, and RNNs are well suited to sequence prediction problems. The core idea of an RNN is to build, through its loop structure, connections between earlier and later events, predicting what is about to occur from what has already occurred. In this application, the content of a medical bill usually takes the form of words and phrases, so there must be a certain relationship between adjacent characters.
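A sketch of such a model as a character-level language model; the patent says only "RNN", so the LSTM cell, embedding size and vocabulary handling here are assumptions.

```python
import torch
import torch.nn as nn

class TermLM(nn.Module):
    """Character-level language model over the medical term corpus."""
    def __init__(self, vocab_size: int, embed_dim: int = 128, hidden: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        out, _ = self.rnn(self.embed(tokens))
        return self.head(out)  # next-character logits at each position
```

Trained on the medical term corpus with next-character prediction, such a model assigns a probability to any recognized string, and low-probability strings signal likely recognition errors.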
The beneficial effects of the above embodiment are as follows: and specifically, a training process of the semantic correction model is given, and the effect of the semantic correction model is ensured.
Referring to fig. 10, fig. 10 shows a second embodiment of the bill identification method of the present application, in which the step of inputting the character recognition result into a semantic correction model, performing semantic correction on the characters in the character recognition result, and generating a final recognition result includes:
step S210: preprocessing an image to be recognized to generate an image preprocessing result;
step S220: inputting the image preprocessing result into a text positioning model, detecting and positioning the text in the image preprocessing result, and generating a text positioning result;
step S230: inputting the text positioning result into a character recognition model, and performing character recognition on the text in the text positioning result to generate a character recognition result;
step S240: when the character recognition result is consistent with the output result of the semantic correction model, the character recognition result is a final recognition result; or the like, or, alternatively,
and when the character recognition result is inconsistent with the output result of the semantic correction model, the output result of the semantic correction model is the final recognition result.
Compared with the first embodiment, the second embodiment includes step S240, and other steps are the same as the first embodiment and are not repeated herein.
In this embodiment, because the captured bill image is unclear or bent, the character recognition model may misread the medical term 维生素 (vitamin) as the visually similar non-word 维生紊; the characters 素 and 紊 are so alike in shape that the improved Alexnet alone cannot resolve the confusion. However, because the probability that the semantic correction model assigns to 维生素 is far greater than that assigned to 维生紊, a recognition error is detected, and the wrong result 维生紊 can be corrected to 维生素.
That is, when the character recognition result 维生紊 is inconsistent with the semantic correction model's output 维生素, the output 维生素 becomes the final recognition result, completing the semantic correction process.
In a specific implementation, a preset number of characters, or a combination of characters, from the character recognition result is taken as the input to the semantic correction model, and the semantic correction result is compared with the character recognition result. If the output of the semantic correction model is consistent with the character recognition result, the character recognition result is the final recognition result; if not, the output of the semantic correction model is taken as the final recognition result.
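The selection rule reduces to a short sketch; correct() is a hypothetical helper standing in for decoding the semantic model's most probable string for the given input.

```python
def finalize(recognized: str, correction_model) -> str:
    """Apply step S240: keep the recognized string, or the corrected one."""
    corrected = correction_model.correct(recognized)  # hypothetical API
    if corrected == recognized:
        return recognized   # consistent: recognition result is final
    return corrected        # inconsistent: semantic model's output is final
```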
The present application further provides a computer-readable storage medium having stored thereon a ticket recognition method program, which when executed by a processor, implements the steps of any of the above-described methods.
The application also provides bill identification equipment, which comprises a memory, a processor and a bill identification method program which is stored on the memory and can run on the processor, wherein the processor realizes the steps of any one of the methods when executing the bill identification method program.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all changes and modifications that fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (6)

1. A method of bill identification, the method comprising:
preprocessing an image to be recognized to generate an image preprocessing result;
inputting the image preprocessing result into a text positioning model, detecting and positioning the text in the image preprocessing result, and generating a text positioning result, wherein the text positioning model is generated based on an improved Faster-RCNN, and the improved Faster-RCNN comprises: extracting local features and global features of the bill data set by adopting a multi-scale convolution kernel, and updating the bill data set weights by using a learning-rate error under an adaptive learning strategy, wherein the multi-scale convolution kernel comprises a preset number of 1 × 1 and 3 × 3 convolution kernels replacing the fixed-size 3 × 3 convolution kernel in the original Faster-RCNN model, the result of a 1 × 1 convolution is input into a 3 × 3 convolution kernel to obtain a convolution result, and that convolution result is feature-fused with the result of the other 1 × 1 convolution and the result of the 3 × 3 convolution;
inputting the text positioning result into a character recognition model, performing character recognition on the text in the text positioning result, and generating a character recognition result, wherein the character recognition model is generated based on an improved Alexnet, and the improved Alexnet network comprises: replacing the 11 × 11 convolution kernel of the 1st convolution layer in the original Alexnet network with a 9 × 9 convolution kernel, and replacing the 5 × 5 convolution kernel of the 2nd convolution layer in the original Alexnet network with two 3 × 3 convolution kernels;
inputting the character recognition result into a semantic correction model and performing semantic correction on the characters in the character recognition result, wherein, when the character recognition result is consistent with the output result of the semantic correction model, the character recognition result is the final recognition result, or, when the character recognition result is inconsistent with the output result of the semantic correction model, the output result of the semantic correction model is the final recognition result, and wherein the training process of the semantic correction model comprises: acquiring a preset number of medical terms, labeling the medical terms to generate a medical term corpus, and training an RNN with the medical term corpus to generate the semantic correction model.
2. The bill identifying method according to claim 1, wherein the pre-processing operation on the image to be identified to generate an image pre-processing result comprises:
carrying out image correction on the image to be recognized;
carrying out graying processing on the result after the image correction;
performing threshold segmentation on the grayed result;
and generating an image preprocessing result.
3. The bill recognition method according to claim 1, wherein the training process of the text positioning model comprises:
constructing a bill data set;
and training the improved Faster-RCNN by using the bill data set to generate a text positioning model.
4. The bill recognition method according to claim 1, wherein the training process of the character recognition model comprises:
constructing a character data set;
and training the improved Alexnet by using the character data set to generate a character recognition model.
5. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a ticket identification method program which, when executed by a processor, implements the steps of the method of any one of claims 1-4.
6. A bill identifying apparatus comprising a memory, a processor and a bill identifying method program stored on the memory and executable on the processor, the processor implementing the steps of the method of any one of claims 1 to 4 when executing the bill identifying method program.
CN202011415040.4A 2020-12-04 2020-12-04 Bill recognition method, equipment and computer storage medium Active CN112464845B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011415040.4A CN112464845B (en) 2020-12-04 2020-12-04 Bill recognition method, equipment and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011415040.4A CN112464845B (en) 2020-12-04 2020-12-04 Bill recognition method, equipment and computer storage medium

Publications (2)

Publication Number Publication Date
CN112464845A CN112464845A (en) 2021-03-09
CN112464845B (en) 2022-09-16

Family

ID=74801144

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011415040.4A Active CN112464845B (en) 2020-12-04 2020-12-04 Bill recognition method, equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN112464845B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113012265B (en) * 2021-04-22 2024-04-30 中国平安人寿保险股份有限公司 Method, apparatus, computer device and medium for generating needle-type printed character image
CN113435437A (en) * 2021-06-24 2021-09-24 随锐科技集团股份有限公司 Method and device for identifying state of switch on/off indicator and storage medium
CN113807416B (en) * 2021-08-30 2024-04-05 国泰新点软件股份有限公司 Model training method and device, electronic equipment and storage medium
CN114328831A (en) * 2021-12-24 2022-04-12 江苏银承网络科技股份有限公司 Bill information identification and error correction method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107844740A (en) * 2017-09-05 2018-03-27 中国地质调查局西安地质调查中心 A kind of offline handwriting, printing Chinese character recognition methods and system
CN110147788A (en) * 2019-05-27 2019-08-20 东北大学 A kind of metal plate and belt Product labelling character recognition method based on feature enhancing CRNN

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108446621A (en) * 2018-03-14 2018-08-24 平安科技(深圳)有限公司 Bank slip recognition method, server and computer readable storage medium
SG10201904825XA (en) * 2019-05-28 2019-10-30 Alibaba Group Holding Ltd Automatic optical character recognition (ocr) correction
CN111062397A (en) * 2019-12-18 2020-04-24 厦门商集网络科技有限责任公司 Intelligent bill processing system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107844740A (en) * 2017-09-05 2018-03-27 中国地质调查局西安地质调查中心 A kind of offline handwriting, printing Chinese character recognition methods and system
CN110147788A (en) * 2019-05-27 2019-08-20 东北大学 A kind of metal plate and belt Product labelling character recognition method based on feature enhancing CRNN

Also Published As

Publication number Publication date
CN112464845A (en) 2021-03-09


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant