WO2019174130A1 - Bill recognition method, server, and computer readable storage medium

Info

Publication number: WO2019174130A1
Application number: PCT/CN2018/089202
Authority: WO
Inventors: 田野; 刘鹏; 王健宗; 肖京
Original assignee: 平安科技（深圳）有限公司
Priority date: 2018-03-14
Filing date: 2018-05-31
Publication date: 2019-09-19
Also published as: CN108446621A

Abstract

Disclosed is a bill recognition method. The method comprises: receiving a bill image to be recognized, and processing the bill image by means of a pre-trained bill image recognition model; performing text detection on the bill image by using a pre-trained text detection model, and determining a target character zone comprising characters in the bill image and fields to be recognized in the target character zone; and invoking, for the fields to be recognized, a corresponding text recognition model for character recognition, so as to separately recognize character information contained in the multiple fields to be recognized in the target character zone, and outputting the recognition result. The present application further provides a server and a computer readable storage medium. The bill recognition method, the server, and the computer readable storage medium provided in the present application can improve the digitalization efficiency of bills, reduce the work intensity of service staff, and enhance the accuracy or refinement level of data.

Description

Ticket identification method, server and computer readable storage medium

Priority claim

The present application claims priority to Chinese Patent Application No. 201810208586.9, entitled "Ticket Identification Method, Server and Computer Readable Storage Medium", filed on March 14, 2018, the entire contents of which are incorporated herein by reference.

Technical field

The present application relates to the field of image recognition, and in particular, to a ticket identification method, a server, and a computer readable storage medium.

Background technique

With economic development and rising living standards, more and more people purchase medical, commercial, financial, and other insurance. Some insurance companies have gradually introduced self-service claims. For example, in a medical claim, the user only needs to upload photos of outpatient or hospitalization invoices to the insurance company's system, and the insurance company's staff enter the information shown on the uploaded invoice pictures into the claims system for the next step of processing. This greatly simplifies claim settlement for the user, but it also increases the pressure on the insurance company, mainly because a great deal of manpower is needed to process the uploaded pictures. In many cases the staff also tire of such repetitive work, which raises the data entry error rate.

Introducing ticket identification technology can, under certain conditions, improve the efficiency of ticket digitization, reduce the workload of business personnel, and improve the accuracy or granularity of the data. Compared with traditional recognition of scanned tickets, pictures uploaded by users are much harder to recognize, mainly because the shooting environment, lighting, rotation angle, image sharpness, occlusion, and even the completeness of the ticket differ from one picture to another. These factors pose great challenges to ticket identification.

Summary of the invention

In view of this, the present application proposes a ticket identification method, a server, and a computer readable storage medium to solve the problem of how to identify a ticket picture quickly and accurately.

First, in order to achieve the above object, the present application provides a ticket identification method, the method comprising the steps of:

Receiving a picture of the ticket to be identified, and processing the ticket picture with a pre-trained ticket picture recognition model to obtain a processed ticket picture;

Performing text detection on the processed ticket picture using a pre-trained text detection model, and determining a target character region that contains characters in the processed ticket picture and the to-be-identified fields included in the target character region;

Calling, for each to-be-identified field, a corresponding text recognition model for character recognition, the text recognition model identifying the character information included in the to-be-identified field and generating a confidence level for the recognized character information;

Comparing the confidence level with a preset confidence threshold; if the confidence level is higher than the confidence threshold, outputting the character information included in the target character region according to a preset method; if the confidence level is lower than the confidence threshold, passing the ticket picture to a third party for verification and outputting the third party's verification result;

The preset method includes: retaining the first ten digits of the ticket number; matching the hospital field to a hospital name using cosine similarity with the tf-idf algorithm; extracting the year, month, and day from the original string output by the algorithm as the date; converting amounts written in capital Chinese numerals to Arabic numerals; and formatting all amounts output by the algorithm uniformly by removing unrelated characters and keeping two decimal places.
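As an illustration only, the field-level rules of the preset method could be sketched as follows. The function names, the regular expressions, the output date format, and the numeral tables are assumptions made for this sketch and are not taken from the application; hospital name matching is not covered here.

```python
import re
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional

# Assumed lookup tables for capital (financial) Chinese numerals.
CN_DIGITS = {ch: i for i, ch in enumerate("零壹贰叁肆伍陆柒捌玖")}
CN_UNITS = {"拾": 10, "佰": 100, "仟": 1000}
CN_SECTIONS = {"万": 10**4, "亿": 10**8}


def keep_ticket_number(raw: str) -> str:
    """Retain the first ten digits of the recognized ticket number."""
    return re.sub(r"\D", "", raw)[:10]


def extract_date(raw: str) -> Optional[str]:
    """Extract year, month and day from a noisy date string."""
    m = re.search(r"(\d{4})\D{0,2}(\d{1,2})\D{0,2}(\d{1,2})", raw)
    if m is None:
        return None
    year, month, day = (int(g) for g in m.groups())
    return f"{year:04d}-{month:02d}-{day:02d}"


def format_amount(raw: str) -> Optional[str]:
    """Remove unrelated characters and keep two decimal places."""
    cleaned = re.sub(r"[^\d.]", "", raw)
    if not re.fullmatch(r"\d+(\.\d*)?|\.\d+", cleaned):
        return None
    return str(Decimal(cleaned).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))


def capital_amount_to_decimal(text: str) -> Decimal:
    """Convert an amount in capital Chinese numerals to a Decimal."""
    total = section = digit = cents = 0
    for ch in text:
        if ch in CN_DIGITS:
            digit = CN_DIGITS[ch]
        elif ch in CN_UNITS:
            section += (digit or 1) * CN_UNITS[ch]
            digit = 0
        elif ch in CN_SECTIONS:
            total += (section + digit) * CN_SECTIONS[ch]
            section = digit = 0
        elif ch in "元圆":
            total += section + digit
            section = digit = 0
        elif ch == "角":
            cents += digit * 10
            digit = 0
        elif ch == "分":
            cents += digit
            digit = 0
        # Any other character (for example 整) is ignored.
    total += section + digit  # amount written without a trailing 元
    return Decimal(total) + Decimal(cents) / 100
```

Under these assumptions, `format_amount` applied to a string such as "¥1,234.5元" keeps only the digits and the decimal point before rounding to two places.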

In addition, in order to achieve the above object, the present application further provides a server including a memory, a processor, and a ticket identification system stored on the memory and operable on the processor, the ticket identification system, when executed by the processor, implementing the steps of the ticket identification method as described above.

Further, in order to achieve the above object, the present application further provides a computer readable storage medium storing a ticket identification system, the ticket identification system being executable by at least one processor to cause the at least one processor to perform the steps of the ticket identification method as described above.

Compared with the prior art, the ticket identification method, the server, and the computer readable storage medium proposed by the present application first receive a picture of the ticket to be identified and preprocess the ticket picture according to a preset rule using the pre-trained ticket picture recognition model. Second, text detection is performed on the ticket picture using a pre-trained text detection model to obtain a target character region containing characters in the ticket picture, the target character region including a plurality of to-be-identified fields. Third, for the target character region, a corresponding text recognition model is called for character recognition to identify the character information included in each of the to-be-identified fields. Finally, the confidence level generated by the text recognition model when identifying the character information in the target character region is acquired and compared with a preset confidence threshold: if the confidence level is higher than the confidence threshold, the character information contained in the target character region is output according to a preset method; if the confidence level is lower than the confidence threshold, the ticket picture is passed to a third party for verification, and the third party's verification result is output. The present application can thereby improve the efficiency of ticket digitization, reduce the workload of business personnel, and improve the accuracy or granularity of the data. By combining deep learning algorithms with third-party assistance, tickets can be identified more accurately; compared with the prior art, the present application is more convenient, faster, and more accurate, and significantly reduces cost.

Drawings

FIG. 1 is a schematic diagram of an optional hardware architecture of the server of the present application;

FIG. 2 is a schematic diagram of a program module of a first embodiment of the ticket identification system of the present application;

FIG. 3 is a schematic diagram of a program module of a second embodiment of the ticket identification system of the present application;

FIG. 4 is a schematic flow chart of a first embodiment of the ticket identification method of the present application;

FIG. 5 is a schematic flow chart of a second embodiment of the ticket identification method of the present application;

FIG. 6 is a schematic flow chart of a third embodiment of the ticket identification method of the present application;

FIG. 7 is a schematic flow chart of a fourth embodiment of the ticket identification method of the present application.

The implementation, functional features and advantages of the present application will be further described with reference to the accompanying drawings.

Detailed description

In order to make the objects, technical solutions, and advantages of the present application more comprehensible, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It is understood that the specific embodiments described herein are merely illustrative of the application and are not intended to be limiting. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of the present application.

It should be noted that descriptions such as "first" and "second" in the present application are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features referred to. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with each other, but only on the basis that a person of ordinary skill in the art can implement the combination; when a combination of technical solutions is contradictory or cannot be implemented, the combination should be considered not to exist and not to fall within the scope of protection claimed by this application.

Referring to FIG. 1, it is a schematic diagram of an optional hardware architecture of the server 1 of the present application.

In this embodiment, the server 1 may include, but is not limited to, a memory 11, a processor 12, and a network interface 13 that are communicably connected to each other through a system bus. Note that FIG. 1 only shows the server 1 with components 11-13; it should be understood that not all illustrated components are required, and more or fewer components may be implemented instead.

The server 1 may be a computing device such as a rack server, a blade server, or a tower server. The server 1 may be an independent server or a server cluster composed of multiple servers.

The memory 11 includes at least one type of readable storage medium, including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read only memory (ROM), an electrically erasable programmable read only memory (EEPROM), a programmable read only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 11 may be an internal storage unit of the server 1, such as a hard disk or memory of the server 1. In other embodiments, the memory 11 may be an external storage device of the server 1, such as a plug-in hard disk, a smart memory card (SMC), a Secure Digital (SD) card, or a flash card equipped on the server 1. Of course, the memory 11 can also include both the internal storage unit of the server 1 and its external storage device. In this embodiment, the memory 11 is generally used to store the operating system and the various types of application software installed in the server 1, such as the program code of the ticket identification system 2. Further, the memory 11 can also be used to temporarily store various types of data that have been output or are to be output.

The processor 12 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 12 is typically used to control the overall operation of the server 1. In this embodiment, the processor 12 is configured to run program code or process data stored in the memory 11, such as running the ticket identification system 2 and the like.

The network interface 13 may comprise a wireless network interface or a wired network interface, which is typically used to establish a communication connection between the server 1 and other electronic devices.

So far, the hardware structure and functions of the devices related to this application have been described in detail. Various embodiments of the present application are presented below on the basis of the above description.

First, the present application proposes a ticket identification system 2.

Referring to FIG. 2, it is a program block diagram of the first embodiment of the ticket identification system 2 of the present application.

In the present embodiment, the ticket identification system 2 includes a series of computer program instructions stored on the memory 11; when the computer program instructions are executed by the processor 12, the ticket identification operations of the embodiments of the present application can be implemented. In some embodiments, the ticket identification system 2 can be divided into one or more modules based on the particular operations implemented by the various portions of the computer program instructions. For example, in FIG. 2, the ticket identification system 2 can be divided into a pre-processing module 21, a text detection module 22, a text recognition module 23, a comparison module 24, and an output module 25, wherein:

The pre-processing module 21 is configured to receive a picture of the ticket to be identified and to process the ticket picture with the pre-trained ticket picture recognition model to obtain the processed ticket picture.

Specifically, the pre-processing module 21 receives the ticket picture to be identified and preprocesses it according to preset steps, which may be classifying, denoising, correcting, and intercepting the ticket picture. Following these preset steps, the pre-trained ticket picture recognition model performs classification, denoising, correction, and ticket interception on the ticket picture.

Specifically, classification sorts the received ticket pictures by category to facilitate subsequent processing. Denoising eliminates noise points in the ticket picture while introducing little blur. Because tickets uploaded by users may be rotated at various angles, the ticket must be rotated to the correct orientation before the next step; correction rotates the ticket picture to that orientation. Intercepting the ticket means cropping the ticket out of the original ticket picture: the original picture contains both the ticket and a background image, and cropping removes the interference of the background.

Specifically, the ticket picture recognition model is a deep convolutional neural network (for example, an SSD (Single Shot MultiBox Detector) algorithm model selected in a CaffeNet environment). Its base network uses the VGG16 network structure with the last fully connected layer removed and six additional feature layers of different scales added.

The deep convolutional neural network model used in this application consists of 1 input layer, 13 convolutional layers, 5 pooling layers, 2 fully connected layers, and 1 classification layer. The detailed structure of the deep convolutional neural network model is shown in Table 1.

Table 1

In Table 1, the Layer Name column gives the name of each layer: Input denotes the input layer; Conv denotes a convolutional layer of the model, and Conv1 denotes the first convolutional layer; MaxPool denotes a maximum pooling layer, and MaxPool1 denotes the first maximum pooling layer; Fc denotes a fully connected layer, and Fc1 denotes the first fully connected layer; Softmax denotes the Softmax classifier. Batch Size is the number of input images of the current layer. Kernel Size is the scale of the convolution kernel of the current layer (for example, a Kernel Size of 3 means a 3×3 convolution kernel). Stride Size is the step size of the convolution kernel, that is, the distance it moves to the next convolution position after completing one convolution. Pad Size is the size of the image padding in the current network layer.
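For orientation, the way Kernel Size, Stride Size, and Pad Size determine a layer's output size can be sketched with the standard convolution arithmetic below. The function name and the example values (a 224-pixel input, a 3×3 kernel with stride 1 and padding 1, a 2×2 pooling window with stride 2) are assumptions typical of VGG16-style networks and are not taken from Table 1.

```python
def output_size(input_size: int, kernel_size: int, stride: int, pad: int) -> int:
    """Spatial output size of a convolution or pooling layer along one axis."""
    return (input_size + 2 * pad - kernel_size) // stride + 1


# Assumed example: a 3x3 convolution with stride 1 and padding 1 keeps the size,
# and a 2x2 max pooling with stride 2 and no padding halves it.
after_conv = output_size(224, kernel_size=3, stride=1, pad=1)
after_pool = output_size(after_conv, kernel_size=2, stride=2, pad=0)
```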

The text detection module 22 is configured to perform text detection, using a pre-trained text detection model, on the ticket picture processed by the pre-processing module 21, and to determine the target character region that contains characters in the ticket picture and the to-be-identified fields included in the target character region.

Specifically, the text detection model uses a CTPN (Connectionist Text Proposal Network) model based on CaffeNet. The CTPN model structure includes VGG16 (a convolutional neural network), an LSTM, a fully connected layer, and so on, wherein VGG is a network developed from AlexNet, and LSTM (Long Short-Term Memory) is a long short-term memory network, which is a kind of recurrent neural network.

The step of performing text detection on the ticket picture using the text detection model includes:

Using VGG16 to obtain deep features;

Using boxes with a fixed width (for example, 16 pixels) to detect text proposals (parts of a text line), and serializing the features corresponding to the boxes in the same row and inputting them into the LSTM;

Using the fully connected layer for regression and classification, and merging the eligible text proposals into a final text line, which is the character region.

By combining an RNN with a CNN, CTPN makes full use of the contextual connections within a text line and improves the accuracy of text detection.

The text recognition module 23 is configured to call, for each to-be-identified field, a corresponding text recognition model for character recognition, where the text recognition model identifies the character information included in the to-be-identified field and generates a confidence level for the recognized character information.

Specifically, the text recognition model is based on a CNN+LSTM+CTC model structure in MXNet, wherein CNN (Convolutional Neural Network) is a convolutional neural network, and CTC (Connectionist Temporal Classification) is attached to the last layer of the CNN network and is used for sequence learning. The structure of the text recognition model includes Convolutional Layers, Recurrent Layers, and a Transcription Layer. The steps of character recognition of the target character region include:

The Convolutional Layers extract features from the input image slices;

On all channels of the output of the last convolutional layer, the columns are concatenated from left to right to obtain a feature sequence;

The feature sequence is fed into the Recurrent Layers for character recognition;

The recognition result is processed by the Transcription Layer, and the final recognition result is generated according to the character dictionary.
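The transcription step can be illustrated with greedy CTC decoding, one common way to turn per-frame label probabilities into a string. This is a sketch under assumptions: the application does not say which decoding strategy the Transcription Layer uses, and the blank index and the `labels` dictionary layout below are hypothetical.

```python
BLANK = 0  # assumed index of the CTC blank label in the character dictionary


def ctc_greedy_decode(frame_probs, labels):
    """Take the best label per frame, merge repeats, then drop blanks."""
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    decoded = []
    previous = BLANK
    for index in best_path:
        if index != BLANK and index != previous:
            decoded.append(labels[index])
        previous = index
    return "".join(decoded)
```

Here `frame_probs` is a list with one probability vector per position of the feature sequence, and `labels[i]` is the character for index `i`. A blank between two identical labels keeps both characters, which is how repeated digits in an amount survive decoding.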

The training steps of the model include:

Obtaining a preset number (for example, 100,000) of ticket picture samples, and dividing the samples into a first data set and a second data set according to a ratio of X:Y (for example, 8:2), where the number of picture samples in the first data set is greater than the number in the second data set, the first data set is used as the training set, and the second data set is used as the test set;

Sending the picture samples in the first data set to the text recognition model for model training, and testing the model with the second data set at intervals (for example, every 1000 iterations) to evaluate the currently trained model. During testing, the trained model identifies the character information of the pictures in the second data set, and the result is compared with the name of each tested picture (the labeled result) to calculate the error between the recognition result and the labeled result. If the model's recognition error on the ticket pictures diverges during testing, the training parameters are adjusted and the model is retrained so that the error can converge. When the error converges, training ends and the resulting model is used as the final text recognition model.
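The split and the periodic evaluation described above can be sketched as follows. The helper names are hypothetical, and the convergence test (the error varies by no more than a tolerance over the last few evaluations) is an assumption, since the application does not define how convergence is judged.

```python
import random


def split_samples(samples, train_ratio=0.8, seed=0):
    """Shuffle the samples and split them into a training and a test set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def error_rate(predictions, labels):
    """Fraction of test pictures whose recognized text differs from the label."""
    wrong = sum(p != y for p, y in zip(predictions, labels))
    return wrong / len(labels)


def has_converged(error_history, window=3, tolerance=0.01):
    """Assumed criterion: the last `window` test errors differ by at most `tolerance`."""
    if len(error_history) < window:
        return False
    recent = error_history[-window:]
    return max(recent) - min(recent) <= tolerance
```

In a training loop, `error_rate` would be appended to `error_history` at each evaluation interval (for example, every 1000 iterations), and training would stop once `has_converged` returns true.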

Specifically, when the text recognition model performs character recognition, it generates a corresponding confidence level for the recognized character information. The confidence level may be obtained by estimating a generalized confidence with a corresponding formula for each kind of to-be-identified field, and then deriving the confidence level from the generalized confidence. For example, the generalized confidence may be calculated from the distance between the unknown sample and a representative sample, or obtained with a multi-layer feedforward neural network, and the confidence level may then be inferred from the generalized confidence using a statistical method. A technician can select a suitable formula and tool as needed to generate the confidence level for the recognized character information; details are not described herein.

The comparing module 24 is configured to compare the obtained confidence level with a preset confidence threshold.

Specifically, if the confidence level is higher than the confidence threshold, the character information included in the target character region is retained; if the confidence level is lower than the confidence threshold, the ticket picture is passed to a third party for verification.
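A minimal sketch of this comparison follows. The `FieldResult` structure, the threshold value of 0.9, and the choice to treat a confidence equal to the threshold as low are all assumptions; the application only specifies the higher and lower cases.

```python
from dataclasses import dataclass


@dataclass
class FieldResult:
    name: str          # hypothetical field name, e.g. "amount"
    text: str          # recognized character information
    confidence: float  # confidence generated by the text recognition model


def compare_with_threshold(results, threshold=0.9):
    """Split results into retained fields and fields needing third-party verification."""
    retained, low_confidence = [], []
    for result in results:
        if result.confidence > threshold:
            retained.append(result)
        else:
            # Assumption: equality is treated like the low-confidence case.
            low_confidence.append(result)
    return retained, low_confidence
```

Under this sketch, a ticket picture would be passed to the third party whenever `low_confidence` is not empty.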

The output module 25 is configured to output the character recognition result according to the output of the comparison module 24: if the comparison module 24 finds that the confidence level is higher than the confidence threshold, the character information of the target character region is output according to a preset rule; if the confidence level is lower than the confidence threshold, the ticket picture is passed to a third party for verification, and the third party's verification result is output.

Specifically, the preset rule includes: retaining the first ten digits of the ticket number; matching the hospital field to the best hospital name using cosine similarity with the tf-idf algorithm; extracting the year, month, and day from the original string that the algorithm outputs for the date part; converting amounts written in capital Chinese numerals to Arabic numerals; and formatting all amounts output by the algorithm uniformly by removing unrelated characters and keeping two decimal places.
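The hospital name rule can be illustrated with a small tf-idf and cosine similarity matcher. This is a sketch under assumptions: the application does not specify the tokenization, so character bigrams are used here, the idf formula is one common smoothed variant, and the hospital names in the usage note are invented placeholders.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Split a name into overlapping character n-grams (spaces removed)."""
    text = text.replace(" ", "")
    return [text[i:i + n] for i in range(max(len(text) - n + 1, 1))]


def build_idf(names):
    """Smoothed inverse document frequency over the known hospital names."""
    doc_freq = Counter()
    for name in names:
        doc_freq.update(set(char_ngrams(name)))
    total = len(names)
    return {g: math.log((1 + total) / (1 + df)) + 1.0 for g, df in doc_freq.items()}


def tfidf_vector(text, idf):
    """Sparse tf-idf vector; n-grams unseen in the known names are dropped."""
    counts = Counter(char_ngrams(text))
    return {g: c * idf[g] for g, c in counts.items() if g in idf}


def cosine(a, b):
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def match_hospital(recognized, known_names):
    """Return (similarity, name) for the known name closest to the OCR output."""
    idf = build_idf(known_names)
    query = tfidf_vector(recognized, idf)
    return max((cosine(query, tfidf_vector(name, idf)), name) for name in known_names)
```

For example, with the placeholder list `["First People's Hospital", "Second People's Hospital", "Union Medical Center"]`, an OCR output such as "First Peop1e's Hospita1" still shares most of its bigrams with the first entry, so that entry is returned as the best match.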

Specifically, the third party may be a crowdsourcing platform. Crowdsourcing refers to the practice of a company or organization outsourcing tasks previously performed by employees to a non-specific (and usually large) network of people on a free and voluntary basis. The crowdsourcing platform mainly performs the following tasks:

1. Assisting algorithm development, including data annotation, data cleaning, and returning manual verification results to the recognition learning system for continued training, so as to continuously improve the accuracy of the recognition model;

2. Combining the algorithm with manual work: for complex fields, the algorithm detects the text blocks, and the parts that are difficult for the algorithm are completed manually, such as complex text segmentation and unconventional text recognition;

3. Manually correcting the algorithm's output: outputs with low confidence are transferred to the crowdsourcing platform and verified manually to improve the final recognition accuracy.

Specifically, when the algorithm's output is corrected manually, the third party ensures accuracy by distributing tasks randomly: each task is distributed to a certain number of users, and the answer given by the majority is adopted. That is, the final result is collected through a cross-validation mechanism.
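The random distribution and majority rule can be sketched as follows. The function names, the number of copies per task, and the requirement of a strict majority are assumptions for illustration.

```python
import random
from collections import Counter


def assign_task(worker_ids, copies, rng):
    """Randomly pick `copies` distinct workers to receive the same task."""
    return rng.sample(worker_ids, copies)


def majority_answer(answers, min_share=0.5):
    """Return the answer given by more than `min_share` of the workers, else None."""
    if not answers:
        return None
    value, count = Counter(answers).most_common(1)[0]
    return value if count / len(answers) > min_share else None
```

When `majority_answer` returns `None`, no answer reached a majority; a platform might then redistribute the task, although the application does not describe that case.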

Referring to FIG. 3, it is a program block diagram of the second embodiment of the ticket identification system 2 of the present application. In this embodiment, the pre-processing module 21 in the ticket identification system 2 includes a classification module 210, a denoising module 220, a correction module 230, and an intercepting module 240.

Specifically, the classification module 210 is configured to, after receiving the ticket picture to be processed, identify the ticket category in the received picture using the pre-trained ticket picture recognition model and output the category identification result (for example, the categories of medical tickets include outpatient tickets, hospitalization tickets, and other types of tickets).

Specifically, the denoising module 220 performs image smoothing and wavelet filtering on the ticket picture. Image smoothing may use neighborhood averaging or median filtering. Neighborhood averaging assigns the average value of all pixels in a pixel's neighborhood to the corresponding pixel in the output image: a window slides over the image, and the value at the center of the window is replaced by the average of the points inside the window, so that the gray level of one pixel is replaced by the average gray level of several pixels. Median filtering is a nonlinear smoothing filter based on order statistics that suppresses noise effectively. Its principle is as follows: first, a neighborhood centered on a given pixel is determined, generally a square neighborhood; then the gray values of the pixels in the neighborhood are sorted, and the median is taken as the new gray value of the center pixel. The neighborhood is usually called a window; as the window moves across the image, median filtering smooths the image well. Because each output pixel is determined by the median of its neighborhood, median filtering is far less sensitive than averaging to extreme pixel values (pixels whose gray values differ greatly from those of the surrounding pixels), so it removes isolated noise points while introducing little blur.
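The two smoothing filters can be sketched on a grayscale image stored as a list of rows. This is a plain-Python illustration, not the application's implementation; the 3×3 window (radius 1) and the clipping of the window at the image border are assumptions, and wavelet filtering is not shown.

```python
from statistics import mean, median


def sliding_window_filter(image, reducer, radius=1):
    """Replace each pixel by `reducer` applied to its square neighborhood."""
    height, width = len(image), len(image[0])
    output = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            # Assumption: the window is clipped where it crosses the border.
            window = [image[j][i]
                      for j in range(max(0, y - radius), min(height, y + radius + 1))
                      for i in range(max(0, x - radius), min(width, x + radius + 1))]
            output[y][x] = reducer(window)
    return output


def neighborhood_average(image, radius=1):
    return sliding_window_filter(image, mean, radius)


def median_filter(image, radius=1):
    return sliding_window_filter(image, median, radius)
```

On a flat patch with a single bright outlier, the median filter restores the outlier to the surrounding gray value, whereas neighborhood averaging spreads part of the outlier into its neighbors.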

Specifically, the correction module 230 performs a correction process on the ticket picture such that the ticket is rotated to the correct direction.

Specifically, the intercept module 240 intercepts the ticket from the original ticket picture.

In addition, the present application also proposes a ticket identification method.

Referring to FIG. 4, it is a schematic flowchart of the first embodiment of the ticket identification method of the present application. In this embodiment, the order of execution of the steps in the flowchart shown in FIG. 4 may be changed according to different requirements, and some steps may be omitted.

Step S110: Receive a picture of the ticket to be identified, and process the ticket picture by a pre-trained ticket picture recognition model.

Specifically, the processing manner includes classifying, denoising, correcting, and intercepting the ticket picture.

Step S120: Perform text detection on the ticket picture by using a pre-trained text detection model, and determine a target character area including characters in the ticket picture and a to-be-identified field included in the target character area.

Specifically, region recognition is performed on the character regions of the ticket picture: small boxes that contain character information and have a fixed width of a preset value (for example, 16 pixels) are identified in the ticket picture, and the small boxes whose character information lies on the same line are stitched together in order to form a target line character region containing the character information.

Specifically, the input ticket picture may be processed as follows:

First, a feature map (W*H*C) is obtained from the first five convolutional layers of VGG16.

Second, a 3*3*C window of features is taken at each position of the feature map of the fifth convolutional layer, and these features are used to predict the category information and location information of the k anchors at that position.

Third, the 3*3*C features of all windows in each row (W*3*3*C) are input into the LSTM to obtain a W*256 output.

Fourth, the W*256 output of the LSTM is input into a 512-dimensional fully connected layer.

Fifth, the features of the fully connected layer are input into three classification or regression layers. Because the width of each anchor is 16 by default and does not change, the width of the regressed rectangles is fixed.

Sixth, a simple text line construction algorithm merges the thin rectangular text proposals into text lines.
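The final merging step can be illustrated with a simplified text line construction. This sketch is an assumption rather than the CTPN procedure itself: proposals are `(x1, y1, x2, y2)` boxes, and the gap and vertical overlap thresholds are hypothetical values.

```python
def vertical_overlap(a, b):
    """Overlap of two boxes along the y axis, relative to the shorter box."""
    intersection = min(a[3], b[3]) - max(a[1], b[1])
    shorter = min(a[3] - a[1], b[3] - b[1])
    return max(intersection, 0) / shorter if shorter > 0 else 0.0


def merge_proposals(boxes, max_gap=16, min_overlap=0.7):
    """Chain fixed-width proposals lying on the same row into text line boxes."""
    lines = []
    for box in sorted(boxes):  # left to right
        for line in lines:
            last = line[-1]
            if (box[0] - last[2] <= max_gap
                    and vertical_overlap(last, box) >= min_overlap):
                line.append(box)
                break
        else:
            lines.append([box])  # no existing line fits: start a new one
    return [(min(b[0] for b in line), min(b[1] for b in line),
             max(b[2] for b in line), max(b[3] for b in line))
            for line in lines]
```

Each returned tuple is the bounding box of one text line, which corresponds to a target line character region.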

Step S130: Call a corresponding text recognition model for character recognition to identify the character information included in each of the to-be-identified fields in the target character region, and acquire the confidence level generated by the text recognition model when it identifies the character information contained in the target character region.

Step S140: Compare the obtained confidence level with a preset confidence threshold. If the confidence level is higher than the confidence threshold, the character information included in the target character region is output according to a preset method. If the confidence level is lower than the confidence threshold, the ticket picture is passed to a third party for verification, and the third party's verification result is output.

As shown in FIG. 5, it is a schematic flowchart of a second embodiment of the ticket identification method of the present application. In this embodiment, the preprocessing in step S110 of the ticket identification method includes the following steps:

Step S210, classifying the ticket picture.

Step S220, denoising the ticket picture.

Step S230, correcting the picture of the ticket.

Specifically, the correcting process includes the steps of:

Determining the center point of the ticket and the position of the center point of the stamp in the ticket picture;

Determining the rotation angle of the ticket according to the relative positional relationship between the center point of the ticket and the center point of the stamp;

Rotating the ticket to the horizontal direction according to this angle (clockwise or counterclockwise).
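The text does not state how the rotation angle is derived from the two center points. One possible reading, given here purely as an assumption, is that on an upright ticket the stamp center lies in a known reference direction from the ticket center, so the correction is the difference between that reference direction and the observed direction. The function name and the `reference_deg` parameter are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates


def rotation_angle(ticket_center: Point, stamp_center: Point,
                   reference_deg: float = 0.0) -> float:
    """Angle in degrees, in [-180, 180), to add to the observed direction
    of the stamp so that it matches the assumed reference direction."""
    dx = stamp_center[0] - ticket_center[0]
    dy = stamp_center[1] - ticket_center[1]
    observed = math.degrees(math.atan2(dy, dx))
    # Wrap the difference so the shorter rotation is chosen.
    return (reference_deg - observed + 180.0) % 360.0 - 180.0
```

The sign of the result is expressed in the same angular sense as `atan2(dy, dx)`; whether a positive value means clockwise or counterclockwise on screen depends on whether the y axis of the image points down or up.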

Step S240, intercepting the ticket picture.

FIG. 6 is a schematic flowchart diagram of a third embodiment of the ticket identification method of the present application. In this embodiment, the training step of the text detection model in step S120 of the ticket identification method includes:

Step S310, preparing a preset number of ticket picture samples marked with corresponding picture categories for each preset ticket picture category.

Specifically, the preset picture category includes an outpatient ticket and a hospitalization ticket, and the preset number is 1000 sheets.

Step S320, the picture samples corresponding to each preset picture category are divided into a training subset of a first ratio and a verification subset of a second ratio; the picture samples in the training subsets are mixed to obtain a training set, and the picture samples in the verification subsets are mixed to obtain a verification set.

Specifically, the first ratio and the second ratio are 80% and 20%.

Step S330, training the ticket picture recognition model by using the training set.

Step S340, the accuracy of the trained ticket picture recognition model is verified by using the verification set. If the accuracy rate is greater than or equal to the preset accuracy rate, the training ends; if the accuracy rate is less than the preset accuracy rate, the number of picture samples corresponding to each preset picture category is increased, and the above steps are re-executed.

Specifically, the preset accuracy rate may be 90%.
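The per-category split and mixing of steps S310 to S320 can be sketched as below. This is a minimal illustration under stated assumptions: samples are represented by hypothetical file names, the shuffle seed is arbitrary, and the function name `split_and_mix` is not from the source.

```python
import random
from typing import Dict, List, Tuple

Labeled = Tuple[str, str]  # (sample identifier, picture category)


def split_and_mix(samples_by_category: Dict[str, List[str]],
                  train_ratio: float = 0.8,
                  seed: int = 0) -> Tuple[List[Labeled], List[Labeled]]:
    """Split each category by train_ratio, then mix the subsets across categories."""
    rng = random.Random(seed)
    train: List[Labeled] = []
    val: List[Labeled] = []
    for category, samples in samples_by_category.items():
        shuffled = samples[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_ratio)
        train += [(s, category) for s in shuffled[:cut]]
        val += [(s, category) for s in shuffled[cut:]]
    rng.shuffle(train)  # mix the training subsets into one training set
    rng.shuffle(val)    # mix the verification subsets into one verification set
    return train, val
```

With 1000 samples in each of the two categories and the 80%/20% ratios given above, this yields 1600 training samples and 400 verification samples. If the verified accuracy falls below the preset accuracy rate, more samples would be added per category and the split and training repeated.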

FIG. 7 is a schematic flowchart diagram of a third embodiment of the ticket identification method of the present application. In this embodiment, the step in which the text recognition model in step S130 of the ticket identification method identifies the characters in the character region of the ticket picture includes:

Step S410, the convolution layer performs feature extraction on the slices of the ticket picture.

Step S420, the feature maps on all channels output by the convolution layer are spliced column by column from left to right to obtain a feature sequence.

Step S430, the obtained feature sequence is fed into the recurrent network layer for character recognition.

Step S440, the translation layer processes the recognition result and generates the final recognition result according to the character dictionary.
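Steps S420 and S440 can be illustrated with a small sketch. The column-wise splicing follows the description above. The decoding rule in `greedy_translate` is an assumption: the text only says that the translation layer uses a character dictionary, and the sketch applies a CTC-style greedy rule (take the best class per step, collapse repeats, drop a blank class at index 0). Both function names are hypothetical.

```python
from typing import List, Sequence


def map_to_sequence(feature_map: List[List[List[float]]]) -> List[List[float]]:
    """Turn a [channels][height][width] map into one feature vector per column,
    ordered from left to right."""
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    return [
        [feature_map[c][h][w] for c in range(channels) for h in range(height)]
        for w in range(width)
    ]


def greedy_translate(step_scores: Sequence[Sequence[float]],
                     dictionary: Sequence[str], blank: int = 0) -> str:
    """Assumed CTC-style greedy decoding against a character dictionary."""
    chars: List[str] = []
    prev = blank
    for scores in step_scores:
        best = max(range(len(scores)), key=scores.__getitem__)
        if best != blank and best != prev:
            chars.append(dictionary[best])
        prev = best
    return "".join(chars)
```

In a full model the recurrent network layer of step S430 would sit between the two functions, turning each column vector into per-class scores.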

The serial numbers of the embodiments of the present application are for description only and do not represent the relative merits of the embodiments.

Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the foregoing embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course can also be implemented by hardware, but in many cases the former is the better implementation. Based on such understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the methods described in the various embodiments of the present application.

The above is only a preferred embodiment of the present application and is not intended to limit the patent scope of the present application. Any equivalent structure or equivalent process transformation made by using the contents of the specification and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of the present application.

Claims

A ticket identification method applied to a server, characterized in that the method comprises the steps of:

Receiving a picture of the ticket to be identified, processing the picture of the ticket by using a pre-trained ticket picture recognition model to obtain a processed picture of the ticket;

Performing text detection on the processed ticket picture by using a pre-trained text detection model, and determining a target character region containing characters in the processed ticket picture and a to-be-identified field included in the target character region;

For the to-be-identified field, calling a corresponding text recognition model for character recognition, the text recognition model identifying the character information contained in the to-be-identified field and generating a confidence level for the recognized character information;

Comparing the confidence level with a preset confidence threshold; if the confidence level is higher than the confidence threshold, outputting the character information contained in the target character region according to a preset method; if the confidence level is lower than the confidence threshold, submitting the ticket picture for third-party verification and outputting the result of the third-party verification;

The preset method includes: retaining the first ten digits of the bill number; matching the hospital name in the hospital field by using cosine similarity with the tf-idf algorithm; extracting the date and time from the original string output by the algorithm as the date; converting the amount written in uppercase Chinese characters into Arabic numerals; and formatting all amount portions output by the algorithm by removing irrelevant characters and retaining two decimal places.
The ticket identification method according to claim 1, wherein the processing of the ticket picture by the ticket picture recognition model comprises: performing classification processing, denoising processing, correction processing, and ticket interception processing on the ticket picture, and taking the ticket picture so processed as the processed ticket picture.
The ticket identification method according to claim 2, wherein the classification processing comprises: dividing the ticket pictures into three categories: outpatient tickets, hospitalization tickets, and other types of tickets; the denoising processing comprises: performing image smoothing processing and wavelet filtering processing on the ticket picture; the correction processing comprises the steps of: determining the ticket center point of the ticket picture and the position of the stamp center point in the ticket picture, determining the rotation angle of the ticket according to the relative positional relationship between the ticket center point and the stamp center point, and rotating the ticket to the horizontal direction according to the angle; and the ticket interception processing comprises: cropping the ticket out of the original ticket picture to remove the background of the original ticket picture.
The ticket identification method according to any one of claims 1-3, wherein the ticket picture recognition model is a deep convolutional neural network, the deep convolutional neural network being an SSD (Single Shot MultiBox Detector) algorithm model selected in a CaffeNet environment, and the training process of the ticket picture recognition model includes the steps of:

Preparing a preset number of ticket picture samples marked with corresponding picture categories for each preset ticket picture category;

Dividing the picture samples corresponding to each preset picture category into a training subset of a first ratio and a verification subset of a second ratio, mixing the picture samples in the training subsets to obtain a training set, and mixing the picture samples in the verification subsets to obtain a verification set;

Training the ticket picture recognition model with the training set; and

Verifying the accuracy of the trained ticket picture recognition model by using the verification set; if the accuracy rate is greater than or equal to a preset accuracy rate, the training ends; if the accuracy rate is less than the preset accuracy rate, increasing the number of picture samples corresponding to each preset picture category and re-executing the above steps;

The preset picture category includes an outpatient ticket and a hospitalization ticket, the preset number is 1000, and the first ratio and the second ratio are 80% and 20%.
The ticket identification method according to claim 1, wherein the text detection model is a CaffeNet-based CTPN (Connectionist Text Proposal Network) model; the text detection model performs region identification on the character regions of the processed ticket picture, identifies from the processed ticket picture small frames that contain character information and have a fixed width equal to a preset value, and splices the small frames containing character information in the same line in order to form a target line character region containing the character information, wherein the preset value is 16 pixels.
The ticket identification method according to claim 5, wherein the training process of the text detection model comprises the steps of:

S1. Obtain a preset number of bill picture samples for the to-be-identified field;

S2, setting, at intervals of a first preset number of pixels on each ticket picture sample, a second preset number of small frames with different aspect ratios and a fixed width equal to a preset value; marking, on each ticket picture sample, the small frames that contain part or all of the character information of the to-be-identified field; classifying the ticket picture samples that contain character information of the to-be-identified field into a first training set; and classifying the ticket picture samples that do not contain character information of the to-be-identified field into a second training set;

S3, extracting a first preset ratio of ticket picture samples from each of the first training set and the second training set as sample pictures to be trained, and using the remaining ticket picture samples in the first training set and the second training set as sample pictures to be verified;

S4: performing model training by using the extracted sample images to be trained to generate the text recognition model, and verifying the generated text recognition model by using each sample image to be verified; and

S5, if the verification pass rate is greater than or equal to a preset threshold, the training is completed; if the verification pass rate is less than the preset threshold, increasing the number of ticket picture samples and repeating steps S2, S3, and S4;

The preset number is 100, the first preset number is 16, the second preset number is 10, the preset value is 16 pixels, the first preset ratio is 80%, and the preset threshold is 98%.
The ticket identification method according to claim 1, wherein the text recognition model comprises a convolution layer, a recurrent network layer, and a translation layer, and the step in which the text recognition model performs character recognition on the target character region comprises:

The convolution layer performs feature extraction on the slices of the processed ticket picture;

The feature maps on all channels output by the convolution layer are spliced column by column from left to right to obtain a feature sequence;

The obtained feature sequence is fed into the recurrent network layer for character recognition;

The translation layer processes the recognition result and generates the final recognition result according to the character dictionary.
The ticket identification method according to claim 7, wherein the training process of the text recognition model comprises the steps of:

Obtaining a preset number of ticket picture samples, and dividing the ticket picture samples into a first data set and a second data set according to a preset ratio, wherein the number of picture samples in the first data set is greater than the number of picture samples in the second data set, the first data set serves as a training set, and the second data set serves as a test set;

Sending the picture samples in the first data set to the text recognition model for model training, and, after every preset number of iterations, testing the text recognition model by using the second data set; if the test shows that the error of the text recognition model in recognizing ticket pictures diverges, adjusting the training parameters and retraining, so that the error of the text recognition model in recognizing ticket pictures converges during training.
A server, comprising: a memory, a processor, and a ticket identification system stored on the memory and operable on the processor, wherein the ticket identification system, when executed by the processor, implements the following steps:

Receiving a picture of the ticket to be identified, processing the picture of the ticket by using a pre-trained ticket picture recognition model to obtain a processed picture of the ticket;

Performing text detection on the processed ticket picture by using a pre-trained text detection model, and determining a target character region containing characters in the processed ticket picture and a to-be-identified field included in the target character region;

For the to-be-identified field, calling a corresponding text recognition model for character recognition, the text recognition model identifying the character information contained in the to-be-identified field and generating a confidence level for the recognized character information;

Comparing the confidence level with a preset confidence threshold; if the confidence level is higher than the confidence threshold, outputting the character information contained in the target character region according to a preset method; if the confidence level is lower than the confidence threshold, submitting the ticket picture for third-party verification and outputting the result of the third-party verification;

The preset method includes: retaining the first ten digits of the bill number; matching the hospital name in the hospital field by using cosine similarity with the tf-idf algorithm; extracting the date and time from the original string output by the algorithm as the date; converting the amount written in uppercase Chinese characters into Arabic numerals; and formatting all amount portions output by the algorithm by removing irrelevant characters and retaining two decimal places.
The server according to claim 9, wherein the processing of the ticket picture by the ticket picture recognition model comprises: performing classification processing, denoising processing, correction processing, and ticket interception processing on the ticket picture, and taking the ticket picture so processed as the processed ticket picture.
The server according to claim 10, wherein the classification processing comprises: dividing the ticket pictures into three categories: outpatient tickets, hospitalization tickets, and other types of tickets; the denoising processing comprises: performing image smoothing processing and wavelet filtering processing on the ticket picture; the correction processing comprises the steps of: determining the ticket center point of the ticket picture and the position of the stamp center point in the ticket picture, determining the rotation angle of the ticket according to the relative positional relationship between the ticket center point and the stamp center point, and rotating the ticket to the horizontal direction according to the angle; and the ticket interception processing comprises: cropping the ticket out of the original ticket picture to remove the background of the original ticket picture.
The server according to any one of claims 9-11, wherein the ticket picture recognition model is a deep convolutional neural network, the deep convolutional neural network being an SSD (Single Shot MultiBox Detector) algorithm model selected in a CaffeNet environment, and the training process of the ticket picture recognition model includes the steps of:

Preparing a preset number of ticket picture samples marked with corresponding picture categories for each preset ticket picture category;

Dividing the picture samples corresponding to each preset picture category into a training subset of a first ratio and a verification subset of a second ratio, mixing the picture samples in the training subsets to obtain a training set, and mixing the picture samples in the verification subsets to obtain a verification set;

Training the ticket picture recognition model with the training set; and

Verifying the accuracy of the trained ticket picture recognition model by using the verification set; if the accuracy rate is greater than or equal to a preset accuracy rate, the training ends; if the accuracy rate is less than the preset accuracy rate, increasing the number of picture samples corresponding to each preset picture category and re-executing the above steps;

The preset picture category includes an outpatient ticket and a hospitalization ticket, the preset number is 1000, and the first ratio and the second ratio are 80% and 20%.
The server according to claim 9, wherein the text detection model is a CaffeNet-based CTPN (Connectionist Text Proposal Network) model; the text detection model performs region identification on the character regions of the processed ticket picture, identifies from the processed ticket picture small frames that contain character information and have a fixed width equal to a preset value, and splices the small frames containing character information in the same line in order to form a target line character region containing the character information, wherein the preset value is 16 pixels.
The server according to claim 13, wherein the training process of the text detection model comprises the steps of:

S1. Obtain a preset number of bill picture samples for the to-be-identified field;

S2, setting, at intervals of a first preset number of pixels on each ticket picture sample, a second preset number of small frames with different aspect ratios and a fixed width equal to a preset value; marking, on each ticket picture sample, the small frames that contain part or all of the character information of the to-be-identified field; classifying the ticket picture samples that contain character information of the to-be-identified field into a first training set; and classifying the ticket picture samples that do not contain character information of the to-be-identified field into a second training set;

S3, extracting a first preset ratio of ticket picture samples from each of the first training set and the second training set as sample pictures to be trained, and using the remaining ticket picture samples in the first training set and the second training set as sample pictures to be verified;

S4: performing model training by using the extracted sample images to be trained to generate the text recognition model, and verifying the generated text recognition model by using each sample image to be verified; and

S5, if the verification pass rate is greater than or equal to a preset threshold, the training is completed; if the verification pass rate is less than the preset threshold, increasing the number of ticket picture samples and repeating steps S2, S3, and S4;

The preset number is 100, the first preset number is 16, the second preset number is 10, the preset value is 16 pixels, the first preset ratio is 80%, and the preset threshold is 98%.
The server according to claim 9, wherein the text recognition model comprises a convolution layer, a recurrent network layer, and a translation layer, and the step in which the text recognition model performs character recognition on the target character region comprises:

The convolution layer performs feature extraction on the slices of the processed ticket picture;

The feature maps on all channels output by the convolution layer are spliced column by column from left to right to obtain a feature sequence;

The obtained feature sequence is fed into the recurrent network layer for character recognition;

The translation layer processes the recognition result and generates the final recognition result according to the character dictionary.
The server according to claim 15, wherein the training process of the text recognition model comprises the steps of:

Obtaining a preset number of ticket picture samples, and dividing the ticket picture samples into a first data set and a second data set according to a preset ratio, wherein the number of picture samples in the first data set is greater than the number of picture samples in the second data set, the first data set serves as a training set, and the second data set serves as a test set;

Sending the picture samples in the first data set to the text recognition model for model training, and, after every preset number of iterations, testing the text recognition model by using the second data set; if the test shows that the error of the text recognition model in recognizing ticket pictures diverges, adjusting the training parameters and retraining, so that the error of the text recognition model in recognizing ticket pictures converges during training.
A computer readable storage medium storing a ticket identification system, wherein the ticket identification system, when executed by at least one processor, implements the following steps:

Receiving a picture of the ticket to be identified, processing the picture of the ticket by using a pre-trained ticket picture recognition model to obtain a processed picture of the ticket;

Performing text detection on the processed ticket picture by using a pre-trained text detection model, and determining a target character region containing characters in the processed ticket picture and a to-be-identified field included in the target character region;

For the to-be-identified field, calling a corresponding text recognition model for character recognition, the text recognition model identifying the character information contained in the to-be-identified field and generating a confidence level for the recognized character information;

Comparing the confidence level with a preset confidence threshold; if the confidence level is higher than the confidence threshold, outputting the character information contained in the target character region according to a preset method; if the confidence level is lower than the confidence threshold, submitting the ticket picture for third-party verification and outputting the result of the third-party verification;

The preset method includes: retaining the first ten digits of the bill number; matching the hospital name in the hospital field by using cosine similarity with the tf-idf algorithm; extracting the date and time from the original string output by the algorithm as the date; converting the amount written in uppercase Chinese characters into Arabic numerals; and formatting all amount portions output by the algorithm by removing irrelevant characters and retaining two decimal places.
The computer readable storage medium according to claim 17, wherein the processing of the ticket picture by the ticket picture recognition model comprises: performing classification processing, denoising processing, correction processing, and ticket interception processing on the ticket picture, and taking the ticket picture so processed as the processed ticket picture.
The computer readable storage medium according to claim 18, wherein the classification processing comprises: dividing the ticket pictures into three categories: outpatient tickets, hospitalization tickets, and other types of tickets; the denoising processing comprises: performing image smoothing processing and wavelet filtering processing on the ticket picture; the correction processing comprises the steps of: determining the ticket center point of the ticket picture and the position of the stamp center point in the ticket picture, determining the rotation angle of the ticket according to the relative positional relationship between the ticket center point and the stamp center point, and rotating the ticket to the horizontal direction according to the angle; and the ticket interception processing comprises: cropping the ticket out of the original ticket picture to remove the background of the original ticket picture.
The computer readable storage medium according to any one of claims 17-19, wherein the ticket picture recognition model is a deep convolutional neural network, the deep convolutional neural network being an SSD (Single Shot MultiBox Detector) algorithm model selected in a CaffeNet environment, and the training process of the ticket picture recognition model includes the steps of:

Preparing a preset number of ticket picture samples marked with corresponding picture categories for each preset ticket picture category;

Dividing the picture samples corresponding to each preset picture category into a training subset of a first ratio and a verification subset of a second ratio, mixing the picture samples in the training subsets to obtain a training set, and mixing the picture samples in the verification subsets to obtain a verification set;

Training the ticket picture recognition model with the training set; and

Verifying the accuracy of the trained ticket picture recognition model by using the verification set; if the accuracy rate is greater than or equal to a preset accuracy rate, the training ends; if the accuracy rate is less than the preset accuracy rate, increasing the number of picture samples corresponding to each preset picture category and re-executing the above steps;

The preset picture category includes an outpatient ticket and a hospitalization ticket, the preset number is 1000, and the first ratio and the second ratio are 80% and 20%.